Saturday, April 01, 2017

Techniques for Protecting Comey's Twitter: A Taxonomy

Person in the know calling me out.
After my post about how the Comey Twitter leak was the most exciting thing ever for information flow security researchers, I had some conversations with people wanting to know how to tell the difference between information that is directly leaked and information that is deduced. Someone also pointed out that I didn't mention differential privacy, a kind of statistical privacy that bounds how much information an observer can infer. It's true: there are many mechanisms for protecting sensitive information, and I focused on a particular one, both because it was the relevant one and because it's what I work on. :)

Since this Comey Twitter leak is such a nice example, I'm going to provide more context by revisiting a taxonomy I used in my spring software security course, adding statistical privacy to the list. (Last time I had to use a much less exciting example, about my mother spying on my browser cookies.)

  • Access control mechanisms resolve permissions on individual pieces of data, independently of the program that uses the data. An access control policy could say, for instance, that only Comey's followers could see who he is following. You can use access control policies to check data as it's leaving a database, or anywhere else in the code. What people care about with respect to access control is that the policy language can express the desired policies, that there are provable guarantees that policies won't accidentally grant access, and that policies can be checked reasonably efficiently.
  • Information flow mechanisms check the interaction of sensitive data with the rest of the program. In the case of this Comey leak, access control policies were in place some of the time. For example, if you went to Comey's profile page, you couldn't see who he was following. The journalist ended up finding his page by looking at the other users the recommendation algorithm suggested after she requested to follow hypothesized-Comey. (This was aided by the fact that Comey follows few people.) In this case, it seems that Instagram was feeding secret follow information into the recommendation algorithm without realizing that the results could leak follow information. An information flow mechanism would make sure that no computation based on secret follow information could make its way into the output of a recommendation algorithm. If the follow list is secret, then so is the length of that list, the people followed by people on the follow list, photos of people from the list, etc.
  • Statistical privacy mechanisms prevent aggregate computations from revealing too much information about individual sensitive values. For instance, you might want to develop a machine learning algorithm that uses medical patient records to do automated diagnosis given symptoms. It's clear that individual patient record information needs to be kept secret--in fact, there are laws that require people to keep this secret. But there is a lot of good to be done if we can use sensitive patient information to help other patients. What we want, then, is to allow algorithms to use this data, but with a guarantee that an observer has a very low probability of tracing diagnoses back to individual patients. The most popular formulation of statistical privacy is differential privacy, a property of computations that allows only those computations whose outputs let an observer distinguish the original data from slightly different data with very low probability. Differential privacy is very hot right now: you may have read that Apple is starting to use it. It's also not a solved problem: my collaborator and co-instructor Matt Fredrikson has an interesting paper about the tension between differential privacy and social good, calling for a reformulation of statistical privacy to address the current flaws. (A minimal sketch of the most common mechanism follows this list.)
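To make the statistical privacy idea concrete, here is a minimal sketch of the Laplace mechanism, the workhorse of differential privacy, applied to a counting query over made-up patient records. The records, the predicate, and the epsilon value are illustrative assumptions on my part, not anyone's production system (a real deployment would also track a privacy budget and harden the noise sampling):

    import random

    # A counting query has sensitivity 1: adding or removing one patient changes
    # the true answer by at most 1, so Laplace noise with scale 1/epsilon gives
    # epsilon-differential privacy for this query.

    def laplace_sample(scale):
        # The difference of two exponentials with mean `scale` is Laplace(0, scale).
        return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

    def private_count(records, predicate, epsilon):
        true_count = sum(1 for r in records if predicate(r))
        return true_count + laplace_sample(1.0 / epsilon)

    patients = [
        {"age": 34, "diagnosis": "flu"},
        {"age": 70, "diagnosis": "diabetes"},
        {"age": 51, "diagnosis": "flu"},
    ]
    # The released count is close to the true count (2) but noisy enough that no
    # single patient's record can be pinned down from it.
    print(private_count(patients, lambda r: r["diagnosis"] == "flu", epsilon=0.5))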
For those wondering why I didn't talk about encryption: encryption focuses on the orthogonal problem of putting a lock on an individual piece of data, where locks can have varying cost and varying strength. Encryption involves a different kind of math--and we also don't cover encryption in my spring course for this reason.

Another discussion I had on Twitter.
Discussion. Some people may wonder whether the Comey Twitter leak is an information flow leak or some other kind of leak. It is true that in many cases this Instagram bug may not be so obvious, because someone is following many people and the recommendation algorithm has more to work with. I would argue that it is squarely in the purview of information flow mechanisms. If follow information is secret, then recommendation algorithms should not be able to compute using this data. (Here, it seems like what one means by "deducible" is "computed from," and that's an information flow property.) We're not in a situation where these recommendation engines are taking information from thousands of users and doing something important. It's very easy for information to leak here, and it's simply not worth the loss to privacy!

Poor, and in violation of our privacy settings.
Takeaways. We should stand up for ourselves when it comes to our data. Companies like Facebook are making recommendations based on private information all the time, and not only is it creepy, but it can violate our privacy settings--and they can definitely do something about it. My student Scott recently made $1000 from Facebook's bug bounty program by reporting that photos from protected accounts were showing up in keep-in-touch emails from Instagram. If principles alone don't provide enough motivation, maybe the $$ will incentivize you to call tech companies out when you encounter sloppy data privacy practices.

Friday, March 31, 2017

Five Research Ideas Instagram Could Have Used to Protect Comey's Secret Twitter

Even though cybersecurity is one of the hottest topics on the Internet, my specific area of research, information flow security, has remained relatively obscure. Until now, that is.

You may have heard of "information flow" as a term that has been thrown around with terms like "data breach," "information leak," and "1337 hax0r." You may not be aware that information flow is a specific term, referring to the practice of tracking sensitive data as it flows through a program. While techniques like access control and encryption protect individual pieces of data (for instance, as they leave a database), information flow techniques additionally protect the results of any computations on sensitive data.

Information flow bugs are usually not the kinds of glamorous bugs that make headlines. Many of the data leaks that have been in the public consciousness, for instance the Target and Sony hacks, happened because the data was not protected properly at all. In these cases, having the appropriate access checks, or encrypting the data, should do the trick. But "why we need to protect data more better" is harder to explain. Up through my PhD thesis defense, I had such a difficult time finding headlines that were actually information flow bugs that I resorted to general software security motivations (cars! skateboards! rifles!) instead.

From the article.
Then along came "This Is Almost Certainly James Comey's Twitter Account," an article I have been waiting for since I started working on information flow in 2010. The basic idea behind the article is this: a journalist named Ashley Feinberg wanted to find FBI director James Comey's secret Twitter account, and so started digging around the Internet. Feinberg succeeded within four hours thanks to some cleverness and a key information leak in Instagram: when you request to follow an Instagram account, Instagram makes algorithmic suggestions of other accounts to follow. In this case, the algorithmic suggestions for Comey's son Brien included several family members, among them James Comey's wife--and the account that Feinberg deduced to be James Comey's. It also seems that Comey uses the same "anonymous" handle on Instagram as he does on Twitter. And so Instagram's failure to protect Brien Comey's protected "following" list led to the discovery of James Comey's Twitter account.

So what happened here? Instagram promises to protect secret accounts, which it (sometimes*) does. When you directly view the Instagram page of a protected user, you cannot see that person's photos, who that user is following, or who follows that user. This might lead a person to think that all of this information is protected all of the time. Wrong! It turns out the protected account information is visible to the algorithms that suggest other users to follow, a feature whose output becomes--incorrectly--visible to all viewers once a follow is requested, presumably because whoever implemented this functionality forgot an access check. The leak is particularly insidious because while the profile photos and names of the users shown are all already public, they are likely shown as the result of a computation on secret information: Brien Comey's protected follow information. (This is a subtle case to remember to check!) In information flow nomenclature, this is called an implicit flow. When someone is involved in a lot of Instagram activity, the implicit flow of the follow information may not be so apparent. But when many of the recommended follows are Comey family members, many of whom use their actual names, the leak becomes much more serious!
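To make the implicit flow concrete, here is a minimal sketch of the missing-check pattern described above. The User class, the field names, and the recommendation logic are hypothetical stand-ins for illustration, not Instagram's actual code:

    from dataclasses import dataclass, field

    @dataclass
    class User:
        name: str
        is_protected: bool = False
        followers: list = field(default_factory=list)   # follower names
        following: list = field(default_factory=list)   # User objects

    def view_following(viewer, user):
        # Direct profile view: the access check is present here.
        if user.is_protected and viewer.name not in user.followers:
            raise PermissionError("This account is private.")
        return [u.name for u in user.following]

    def recommend_follows(viewer, requested_user):
        # Recommendation path: nothing checks requested_user.following, so the
        # suggestions are computed from the protected follow list and leak it.
        suggestions = set()
        for followed in requested_user.following:
            suggestions.add(followed.name)
            suggestions.update(u.name for u in followed.following)
        suggestions.discard(viewer.name)
        return sorted(suggestions)

    brien = User("brien", is_protected=True)
    brien.following.append(User("reinholdniebuhr"))
    feinberg = User("ashley")
    # view_following(feinberg, brien) raises PermissionError, but:
    print(recommend_follows(feinberg, brien))   # ['reinholdniebuhr']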

Creepy Facebook search, from express.co.uk.
In the world of information flow, this article is a Big Deal because it so perfectly illustrates why information flow analyses are useful. For years, I had been jumping up and down and waving my arms (see here and here, for instance) about why we need to check data in more places than the point where it leaves the database. Applications aren't just showing sensitive values directly anymore, but the results of all kinds of computations on those values! (In this case it was a recommendations algorithm.) We don't always know where sensitive data is eventually going! (As was the case when Brien Comey's protected "following" list was handed over to the algorithm.) Policies might depend on sensitive data! We may even compute where sensitive data is going based on other sensitive data! In a world where we can search over anything, no data is safe!

Until recently, my explanations have seemed subtle and abstract to most, in direct contrast to the sexy flashy security work that people imagine after watching Hackers or reading Crypto. By now, though, we information flow researchers should have your attention. We have all kinds of computations over all kinds of data going to all kinds of people, and nobody has any clue what is going on in the code. Even though digital security should be one of the main concerns of the FBI, Comey is not able to avoid the problems that arise from the mess of policy spaghetti that is modern code.

Fortunately, information flow researchers have been working for years on preventing precisely this kind of Comey leak**. In fashionable BuzzFeed style, I will list exactly five research ideas Instagram could adapt to prevent such leaks in the future:
  1. Static label-based type-checking. In most programming languages, program values have types. Types usually tell you simple things like whether something is a number or a list, but they can be arbitrarily fancy. Types may be checked at compile time, before the program runs, or at run time, while the program is running. There has been a line of work on static (compile-time) label-based information flow type systems (starting with Jif for Java, with a survey paper here describing more of this work) that allow programmers to label data values with security levels (for instance, secret or not) as part of their types, and that propagate those labels through the program to make sure sensitive information does not flow to places that are less sensitive. These type systems give guarantees about every run of a program that type-checks. The beauty of these type systems is that while they look simple, they are clever enough to capture the kind of implicit flow that we saw with algorithms leaking Brien Comey's follow information. (We'd label the follow lists as sensitive, and then any values computed from them couldn't be leaked!)
  2. Static verification. Label-based type-checking is a light-weight way of proving the correctness of programs according to some logical specification. There are also heavier-weight ways of doing it, using systems that translate programs automatically into logical representations and check them against the specification. Various lines of work on refinement types, super fancy types that depend on program values, could be used for information flow. (An example of a refinement type is {int x | canSee(alice, x)}, the type of an integer x that can only exist if user "alice" is allowed to see it according to the "canSee" function/predicate.) Researchers have also demonstrated ways of proving information flow properties in systems like Ironclad and mCertiKOS. These efforts are pretty hardcore and require a lot of programmer effort, but they allow people to prove all sorts of boutique guarantees on boutique systems (as opposed to the generic guarantees a type system gives for the subset of a language it supports).
  3. Label-based dynamic information flow tracking. Static label-based type-checking, while useful, often requires the programmer to put labels all over the program. Systems such as HiStar, Flume (the specific motivation of which was the OKCupid web server), and Hails allow labeling of data in a way similar to static label-based type systems, but track the flow of information dynamically, while the program is running. The run-time tracking frees programmers from having to put labels everywhere, but it comes at a cost. First, it introduces performance slowdowns. Second, we can't know before a program runs whether it will hit some kind of "access denied" error, so there could be accesses denied all over the place. Many of these systems handle these problems by doing things at the process level: if there is an unintended leak anywhere in the process, the whole process aborts. (Those who haven't heard of processes can think of a process as encapsulating a whole big task, rather than an individual action like a single arithmetic operation.) A minimal sketch of this style of tracking appears after this list.
  4. Secure multi-execution. Secure multi-execution is a nice trick for running black-box code (code that you don't want to--or can't--change) in a way that is secure with respect to information flow. The trick is this: every time you reach a sensitive value, you run the computation on the sensitive value in one process, and you spawn another process that runs on a secure default input instead. The process separation guarantees that sensitive values won't leak into the process containing the default value, so you know you can always show the result of that one. As you might guess, secure multi-execution can slow the program down quite a bit, since it needs to spawn a new process every time it sees a sensitive value. To mitigate this, my collaborators Tom Austin and Cormac Flanagan developed a faceted execution semantics for programs that lets you execute a program on multiple values at the same time, with all of the security guarantees of secure multi-execution. (The second sketch after this list shows the core trick.)
  5. Policy-agnostic programming. While all of these other approaches can prevent sensitive values from leaking, if we want programs to run most of the time, somebody needs to make sure that programs are written not to leak information in the first place. It turns out this is pretty difficult, so I have been working on a programming model that factors information flow policies out of the rest of the program. (If I'm going to write a whole essay about information flow, of course I'm going to write about my own research too!) Instead of having to implement information flow policies as checks across the program, where any missing check can lead to a bug, type error, or runtime "access denied," programmers can now specify each policy once, associated with the data, along with a default value, and rely on the language runtime and/or compiler to make the code execute according to the policies. In a policy-agnostic system, the programmer can say that Brien Comey's follows should only be visible to his followers, and the machine becomes responsible for making sure this policy is enforced everywhere, including in the code implementing the recommendation algorithm. The challenges are that policies can depend on sensitive values, that sensitive values may be shown to viewers whose identities are computed from sensitive values, and that enforcing policies usually means implementing access checks across the code. Our semantics for the Jeeves programming language (paper here) addresses all of these issues using a dynamic faceted execution approach, and we have also extended this programming model to handle applications with a SQL database backend (paper here). We are also working on a static type-driven repair approach (draft here). (The last sketch after this list shows the flavor of attaching a policy to data.)
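Here is a minimal sketch of idea 3, label-based dynamic information flow tracking. The Labeled wrapper and the two-point "public"/"secret" label lattice are assumptions made for illustration; real systems such as HiStar, Flume, and Hails track labels on coarser-grained abstractions (processes, threads, database records) rather than individual values:

    class Labeled:
        def __init__(self, value, label):
            self.value = value
            self.label = label              # "public" or "secret"

        def map(self, fn):
            # Anything computed from a labeled value inherits its label.
            return Labeled(fn(self.value), self.label)

    def join(a, b):
        # A value computed from two inputs gets the more secret of their labels.
        return "secret" if "secret" in (a, b) else "public"

    def combine(x, y, fn):
        return Labeled(fn(x.value, y.value), join(x.label, y.label))

    def output_to(channel_label, labeled_value):
        # The check happens at the output sink, not when data leaves the database.
        if labeled_value.label == "secret" and channel_label == "public":
            raise PermissionError("refusing to send secret data to a public channel")
        return labeled_value.value

    follows = Labeled(["reinholdniebuhr", "wife_account"], "secret")
    num_follows = follows.map(len)                       # the count is still secret
    mixed = combine(follows, Labeled(["nyc_pizza"], "public"), lambda a, b: a + b)
    print(mixed.label)                                   # "secret"
    try:
        output_to("public", num_follows)                 # denied at run time
    except PermissionError as err:
        print(err)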
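Next, the core trick of idea 4, secure multi-execution, again with made-up function and data names. A real implementation isolates the two runs in separate processes and routes every input and output channel by security level (and faceted execution folds the runs into one); this sketch only shows that an unauthorized viewer sees output computed solely from the default input:

    def recommend(follow_list):
        # Black-box code: we make no attempt to change or inspect it.
        return sorted(set(follow_list))[:3]

    def multi_execute(secret_follows, default_follows, viewer_is_follower):
        high_output = recommend(secret_follows)    # visible only to authorized viewers
        low_output = recommend(default_follows)    # computed from the secure default
        return high_output if viewer_is_follower else low_output

    secret = ["reinholdniebuhr", "wife_account"]
    print(multi_execute(secret, [], viewer_is_follower=True))    # real suggestions
    print(multi_execute(secret, [], viewer_is_follower=False))   # [] -- no leak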
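Finally, a sketch of the flavor of idea 5, policy-agnostic programming with faceted values. The Facet class, the policy signature, and concretize() are invented for illustration and are not the actual Jeeves API:

    class Facet:
        def __init__(self, secret, default, policy):
            self.secret = secret      # the real, sensitive value
            self.default = default    # what unauthorized viewers see instead
            self.policy = policy      # viewer -> bool, specified once, with the data

        def map(self, fn):
            # Computations run on both facets, so the policy follows derived values
            # through black-box code like a recommendation algorithm.
            return Facet(fn(self.secret), fn(self.default), self.policy)

        def concretize(self, viewer):
            # The runtime, not the application code, decides which facet to show.
            return self.secret if self.policy(viewer) else self.default

    brien_follows = Facet(
        secret=["reinholdniebuhr", "wife_account"],
        default=[],
        policy=lambda viewer: viewer in {"approved_follower"},
    )
    recommendations = brien_follows.map(lambda fs: sorted(fs)[:3])
    print(recommendations.concretize("approved_follower"))   # ['reinholdniebuhr', 'wife_account']
    print(recommendations.concretize("ashley_feinberg"))     # []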
I don't know how much this Twitter account leak upset the Comeys, but reading this article was pretty much the most exciting thing that I have ever done. Up until now, most people have thought about security in terms of protecting individual data items, rather than in terms of a complex and subtle interaction with the programs that use them. This has started to change in the last few years as people have been realizing just how much of our data is online, and just how unreliable the code is that we trust with this data. I hope that this Comey leak will cause even more people to realize how important it is to reason deeply about what our software is doing. (And to fund and use our research. :))

* A student in my spring software security course (basically, programming languages applied to security), Scott, had noticed earlier this semester that emails from Instagram allowed previews of protected accounts he was not following. He reported this to Facebook's bug bounty program and made $1000. I told him to please write in the course reviews that the course helped him make money.
** Note that a lot of other things are going on in this Comey story. The reporter used facts about Comey to figure out the setup, and also some clever inference. But this clever inference exploited a specific information leak from the secret follows list to the recommendations list, and this post focuses on this kind of leak.

Wednesday, March 08, 2017

Autoresponse: Striking: A Day Without a Woman

In front of the Federal Building, Pittsburgh.
Dear Message Sender,

  I am not responding to email on March 8, 2017 because I am observing A Day Without a Woman. In the afternoon, I will be joining students at CMU in a silent protest and attending a rally at the City-County building in downtown Pittsburgh.

  Despite the efforts and progress made towards gender equality, women do not have an equal voice, and we are not appreciated equally in society. For example:
  • The gender wage gap persists, and two-thirds of minimum wage earners are women.
  • The House and Senate are currently 19% women. This means an 81% male group is making decisions that affect women's health and lives.
  • The United States still has not had a female president, even though many countries we'd like to think we are more progressive than currently have a woman in power.
  • Only 24 of the Fortune 500 CEOs are women. Money is power, and women have less of it.
Some may say that women are simply less ambitious, or don't want to be in positions of power and influence as much as men do. Study after study--and I'm happy to talk in more detail--has shown that women who do have the ambition face far more obstacles than men do. Also, my statistics above focus on what people like to call "privileged" women, but the undervaluing of female labor (including domestic and emotional labor) makes life even harder for those in less fortunate circumstances.

  There are many ways you can show support. The first is to attend local rallies, especially if you have an employment situation where you will have few consequences. Even if you are not a woman and/or not striking today, here are some things you can do:
  • Listen to women, and call people out when women's voices are not heard.
  • Question your own biases. (You can have biases even if you are a woman!)
  • Vote for women. Champion women. Mentor women. (In that order.)
  • Support people who are striking, and who are more actively fighting for women's rights and the appreciation of women's labor, both financially and by amplifying their voices.

Yours in solidarity,
Jean

Friday, December 23, 2016

Let's Talk About How We Talk About Science

A while ago, Brian Burg commented on Twitter that he would like to see more discussion of marketing in academia. I decided I'd rather write a meta-post about how we need to talk about how marketing is affecting our evaluation of science.

Beyonce.
Kim.
If you want to be on the cover of Glamour magazine, you know what to do. Put your hair in glamorous waves, wear something small, and stare directly at the camera with slightly open lips. It helps if you have the Look. (Has anyone else noticed that Beyonce and Kim are being airbrushed to look more and more like each other all the time?)

If you want to be on the cover of a glamour journal, things are not much different. Open with a deep-sounding but incontestable vision of where you think the world is going. Home in on a specific problem. Make the problem sound hard. Make your solution easy for a casual reader to understand. Write with the voice of a winner. It helps to have picked a topic that a science journalist might drool over. Oh, and if you are going for the cover: make sure to have good images.

But, you might say, fashion magazines are frivolous, and science is Serious*. I'll be the first to agree that the investigation of the fundamental truths of reality is a worthy endeavor requiring brilliance, hard work, persistence, and all kinds of other positive qualities. (Side note: beauty is also hard work, and is used to oppress women.) But people determine which science is higher-profile than other science. People live in society, and it is widely acknowledged that society is superficial. Many a fairy tale involves a causal relationship between the changing of clothes and the changing of fortune. In Thomas Carlyle's satirical novel Sartor Resartus, religion itself is a matter of clothing.

In fact, a major part of my metamorphosis into a Real Researcher has involved accepting that appearance matters. When my advisor and I used to get papers rejected in the beginning of my PhD, we would spend a long time thinking about how to make the work so good that the paper was not rejectable. I have come to realize that this is the equivalent of failing to impress on a first date and hoping that soul-searching will address the issue for the future. Looking deeply into one's soul, while usually good in the long term, often does not address the problem of first impressions.

Sure, part of preparing one's research for wider dissemination involves doing what everyone would expect of good communication: having a clear description of the goals, clear explanations of the solutions, and a clear explanation of the context with respect to previous work. Good logical reasoning goes a long way. Good evaluation of results does as well. But if we look at the papers that do--and don't--make it into the "glamour" conferences and journals, we begin to suspect that there are other factors at play.

If we look more closely, we can see that American** science replicates patterns of elitism and gatekeeping that we see in the rest of American society. In Privilege: The Making of an Adolescent Elite, Columbia sociology professor Shamus Khan reports on behavioral traits that characterize the new elite. Khan describes how, rather than stemming from family prestige, the social status of the boarding school students he observes comes from an ease of moving through social situations and a cultural omnivorousness (embracing both the high-brow and the low-brow). Especially since these behaviors are learned at elite institutions, they serve a gate-keeping function similar to explicit markers of socioeconomic status. People look for this ease and this omnivorousness, for instance when interviewing candidates, justifying their choice with some idea that such traits somehow make people more deserving. There is also a mythology about hard work that serves more as a justification than an explanation for elite status: students feel that they are receiving the benefits they do from society not because they were born into it, but because they "worked so hard to get there."

As it turns out, the training of elite scientists also involves learning gatekeeping behaviors. In science there is a similar mythology about hard work being responsible for differential success. In Computer Science, the privileged behaviors I've observed include having research vision (as opposed to making solid technical contributions), being aggressive about imposing that research vision upon others, and having a "genius quality," which involves pattern-matching on similarities to previously successful scientists (often white men). Like ease of interaction and cultural omnivorousness, these traits are often associated with people deserving of recognition, but their presence does not mean the work will be good. I would not be surprised if having research vision and exhibiting genius quality were more correlated with being educated at an elite American institution than with potential for long-term scientific impact. With this premise, the recipe for academic fame involves not only marketing one's work as making positive contributions to science, but also demonstrating a combination of privilege and flash. The privilege here is more subtle than that of having cover-girl looks, but it is a very real kind of privilege nonetheless.

But how, you may wonder, do people not see through the shiny exterior? Those who have been following American politics in the last year may be familiar with the answer: insufficient attention. Publications are reviewed by researchers under increasingly high demands to pass quick judgments. Between December 2015 and February 2016, for instance, I had accidentally agreed to be on two concurrent major conference Program Committees, and had a reviewing load of over 60 full-length (12-page, 9-10 pt font) papers. (And I am not the only person who had such reviewing volume!) Had I only been on one Program Committee, the reviewing load would have still required me to evaluate, on average, a paper every two days over the course of two months. Under such reviewing pressure, it is easy to succumb to flash judgments, emotional first responses to a paper's Introduction section. It is easy to accept the paper with the good story over a paper with a deeper but more subtle result.

Despite all this, I believe in the future of science, and that we can shift back to a situation where we are making space for "real" science, what science looks like before the makeup and airbrushing. To do so, we need to wage a campaign similar to the one people waged on unreasonable beauty standards. We need to teach people to recognize--and be skeptical of--"Photoshopped" results: all that is too slick, too inspiring, and too good to be true, in both individual papers and in the story of a scientist's career. We need to raise more awareness about what "real" science looks like: the incremental results required on the way to big discoveries, the science that is foundational, necessary, and often with subtleties difficult to communicate to non-experts. Making structural changes that reduce reviewing loads and allow for deeper evaluation would also reduce the incentives that have led to the proliferation of these practices.

Elite institutions are much more than a finishing school for scientists, but we have been moving to a model where the marketing is coming to dominate the science. To protect the pursuit of truth, we need to admit that people can be shallow when it comes to evaluating science. We need to talk about how we talk about science so we can make space for science that is slow, science that is subtle, and science that is outside the mainstream.

With thanks to Seth Stephens-Davidowitz, who told me my first draft lacked a cohesive point, and Adeeti Ullal, who very patiently helped me with the last paragraph.

* I don't believe fashion magazines are blanket frivolous, but you might.
** I don't have the depth of experience to comment on how this generalizes to other cultures.

Saturday, December 17, 2016

The Structured Procrastination Trap

A wise professor once told me to take advice with a grain of salt, as it is mostly highlights and wishful thinking. Structured procrastination is a prime example of wishful thinking doled out to students eager to ease growing pains.

Structured procrastination promises a productive life with minimal pain. The basic premise is that if you always do something other than the task you are supposed to do, you will always be doing something that you want to do. Don't want to write that report? Play ping-pong with your students instead, and people will be impressed with how easily you take life. Don't want to respond to emails? Read papers you like instead, and people will be surprised you make time to read papers. If you keep waiting, you will eventually want to do the thing you have been putting off, and then you can live a completely pain-free life!

Now let's look at the premises of structured procrastination. It requires that there is always a task that you can and want to do that is productive. It requires that deadlines make tasks miraculously desirable: that it is the fact that something is due soon, more than other factors (like how able you are to do the task), that makes a task easier to do. It requires that you have a good sense of how long tasks should take. For structured procrastination to make sense, you need to be at a point where life is simply a matter of execution.

In my many years of being alive, I have discovered that these premises often do not hold. When I was a graduate student and looking for shortcuts to the Productive Life, I felt like I was doing something wrong. When I aggressively tried to apply structured procrastination to my life, I produced a lot of bad work. There were long periods of time when I would try to get into immersive "flow states" where I could have pleasurable levels of focus, but everything felt difficult. I've spent a cumulative total of days, maybe weeks, of my life wondering why it takes me so long to write a paper, or to prepare a talk, or to debug my code. For years I thought that it was possible for life to always be easy, but I had somehow not figured out how to do it.

What I realized is that life is hard, and especially hard if you want to do things you have never done before. If you are doing something that requires you to grow, what you need is a lot of time, and the discipline to force yourself to keep doing something even when it feels like the most painful thing in the world. If you are doing a high-growth activity, you need to abandon the idea of structured procrastination. You need set hours when you are going to sit down (or stand up, or lie down) and stare at your notebook, or laptop, or the wall, dedicated to making progress on the Very Important Task. Limiting these hours makes the work psychologically bearable. Making these hours the same time every day makes it more likely you can keep to them.

Of course, structured procrastination is not all bad. I have recommended this technique to many people, as it is a great way to get oneself out of unproductive loops when a looming deadline kills all desire to do anything. If you allow yourself to admit that you are not going to work on your Very Important Task, then you can at least do "productive" things (like make Ryan Gosling memes) instead of sitting around angsting (which could also be productive according to some value systems). Procrastination is also a good way to trick yourself into doing more things, because deadlines often do make people more efficient.

While structured procrastination provides a useful execution framework, there are times in life when you need to suck it up and do the Very Important Task. In fact, structured procrastination may be most seductive when what you need most is structure. For this reason, you should always think before you procrastinate, and avoid the trap of false busyness.

Wednesday, December 14, 2016

What Professors Can Do About the Collaboration Problem

A few weeks ago, I wrote a blog post about the "collaboration problem" that sparked a significant amount of discussion among my colleagues in academic computer science, in large part because many people had observed the same problem without great ideas for solutions. Here are some emails I've exchanged with Ben Zhao, a professor of Computer Science at UC Santa Barbara, and my colleague Claire Le Goues in the School of Computer Science at CMU about how to address the problem in the courses we teach. (Ben recently posted this article on social media and had quite an extensive discussion with many people in the field about how to address the problem.)

I hope this will generate even more discussion that brings us closer to solutions.

--

from:Ben Zhao
to:Justine Sherry,
Jean Yang
date:Tue, Dec 13, 2016 at 1:50 PM
subject:looking for advice

hey Justine, Jean.

Random email out of nowhere, hope you’re both doing great, and happy holidays!! :)

So I’ve been thinking and reading a fair bit on group dynamics in CS classes, esp. w.r.t. female students, with a fair bit coming from you guys. There aren’t that many in my classes (I teach undergrad networking and OS, so they’re almost all juniors/seniors by the time they make it to my class). So I’d like to make sure that I’m not contributing any more to the gender imbalance.

I need help. As strong women in CS, would love your take on a couple of key questions (but also would love any general advice you want to share, period).  And I know you're super busy, but hopefully this is something that won't take too much time. Either way, your advice would immediately impact 10s of female students in the coming quarter...

Key questions on my mind right now are:
- How should I do group assignments for larger classes with moderate to heavy projects? About half of my networking class homeworks are in groups of 2, and nearly all of my OS class homeworks/projects are in groups of 2-3.
- From what I have thought about and seen in past classes, I think my past practice of letting students choose their own groups doesn’t work. I recall something like 1/2 of all groups with at least 1 female student experiencing some type of malfunction, either due to the male student(s) flaking out or just failing out.

Right now, I’m considering something like the following:
  - Beginning of quarter, I reach out individually to all female students in the class (maybe 10-15, 20 if I’m lucky), and I ask them to attend an open discussion with me on campus.
  - I ask them for their experiences and concerns in the class, and esp. for group projects
  - I lay out what I think are challenges that they could face
  - I give them the option to find partners within the group, before the overall group formation process starts.

What do you think? Would this work? Would female students react negatively to being singled out? What happens if they don’t care and don’t show up?

Thanks in advance, and again, I’d love to hear any thoughts on this or on any other topic..

thank you thank you!
Ben

--

from:Jean Yang
to:Ben Zhao
cc:Justine Sherry,
Claire Le Goues
date:Wed, Dec 14, 2016 at 12:16 PM

Hi Ben,

  Thanks for writing! These are great questions, I'm glad you're asking them. I'm looping in Claire Le Goues, another professor at CMU, because we've been talking about how we could address some of these collaboration issues with curricular changes, and about proposing an audit of the curriculum to make sure students are learning collaboration skills.

  Here are some things I learned from people after my blog post about the collaboration problem:
- It's important to keep in mind that all students have trouble with collaboration. It may disproportionately affect female and other minority students because 1) there are already so many factors that wear away at their desire to participate, and 2) students without strong social ties within Computer Science may not have access to as desirable of a partner pool. But an important take-away is that all students struggle with learning how to collaborate well, that we don't teach it in lower-level courses, and that in upper-level courses collaboration ability becomes important for academic success all of a sudden.
- There is strong evidence that self-selected groups are not as good as instructor-assigned groups.
- There are many resources out there for helping students work in teams more effectively. I was given this as a starting point:
- There are ways to get students to more actively work on their collaboration skills. Claire addresses this in the software engineering course she teaches. Some professors have reported having students assess how collaborations went, and docking points for students who didn't collaborate well.
- Several women, including myself, said that their best collaborations during undergraduate were with other women. I'm still not sure what to make of this in the context of other findings.

  Given this, I have the following thoughts about your proposed plan:
- I like the idea of talking to students about collaboration issues, but there are two main reasons I wouldn't do it only with the women. First, collaboration is an everyone problem, and not just a women's problem. It also affects people along lines of race, sexual orientation, socioeconomic background, etc. Second, even if the problem were only one of gender, it's a problem to be addressed by people of both genders. I've long believed that in order to solve the gender problem, we need to address the stereotypes associated with both femininity and masculinity. Only involving one of the genders in the conversation places all of the burden on that gender, and when it's the women, we are burdening an already burdened group. For these reasons I'd encourage a discussion about collaboration with the entire class, and then support to ensure collaborations are going as smoothly as possible throughout the semester.
- It doesn't hurt to check in with women and minority students, but without making them feel like they are being singled out, or because you are interested in them primarily because of the women in CS problem. My undergraduate professors paid a lot of attention to me, and I always assumed it was because I was a woman, and in fact this made me feel like I was less deserving of attention.
- I do like the idea of making it easier for minority students to find each other, but I don't know that it's your place to do it as the instructor. I don't know if there's a non-awkward way to bring this up during the whole-group discussion. Also, based on what people say it actually seems better to assign the partners as the instructor, and then it would not seem appropriate to assign people to work together based on their minority status. I'm still really not sure how to think of partner choice vs. partner assignment, and welcome discussion about this!

  Curious to hear your thoughts after your Facebook post about this topic blew up. :)

Best,
Jean

P.S. This discussion is interesting. What do you think about me posting this to my blog, maybe after Claire/Justine chime in?

--

from:Ben Zhao
to:Jean Yang
cc:Justine Sherry,
Claire Le Goues
date:Wed, Dec 14, 2016 at 1:27 PM
hey Jean.

Great thoughts.

I’ve been learning a lot from the various viewpoints on the FB post, but I’m slightly frustrated by the lack of consensus as to the right solution. First, I agree with all the viewpoints that the problem is broader (re: your point on everyone having collaboration issues and others’ points about male students sharing in the solution), and any effort to address it should be more inclusive. That I think is very doable: I can talk about the issue early on in the class with some of Sarita’s slides she shared on the FB post. Hopefully I can do it in a tactful way that doesn’t alienate any group.

But beyond that, I'm sort of torn. It's clear that different personalities play into how different women reacted to my suggestions. Some, like my senior colleague Linda Petzold, reacted fairly negatively because (I think at least in part) she has a really strong personality, and perhaps had less of an issue handling those situations herself. Perhaps I'm generalizing too much based on a sample set of maybe 2-3, but I'm guessing there might be an inverse relationship between a student's own ability to deal with these challenging situations and their sensitivity to being singled out. In other words, is it possible the women (or other minority groups) who are most vulnerable to the negative situations (because they're less assertive or more introverted) would be less concerned about being singled out as a group?  I don't want to downplay comments from you or Linda (and a couple others on the FB thread) about being singled out, but do you think that sensitivity might then be less of an issue for less assertive students? Given my slightly biased sample of strong female colleagues, I'm not quite sure how far off I am on this line of thinking.

My overarching concern is that a broader discussion, while very positive and definitely much better than nothing, is still not quite enough. I worry that individual students will find it difficult to reach out to me the professor to discuss group issues. This has been very much my experience in the past, that students don’t want to appear like they’re a hassle, and no matter how I try to make myself approachable and less intimidating, there’s always a high barrier to overcome (especially for those more shy/introverted students). So all those comments/suggestions that involve groups reaching out and giving me feedback about their own individual group dynamics, I think they’re somewhat naively optimistic.

So I will definitely do what I can for the broader class. But I worry that won’t be enough. Beyond that, I can do random group assignments. But there I foresee lots of complaints by students unable to work with their friends, and any personality conflicts will be blamed on me (which is ok). There I worry that the disruption to the class group formation as a whole will produce more issues, and I haven’t convinced myself that random assignment is a better solution in general.  The other option is more proactively reaching out to female students. There the question is do the ends justify the means: would the potential benefits of helping women students form self-selected groups outweigh the initial discomfort of being “singled out”?

Any/all thoughts welcome, and thanks for spending your time on this. I’m fine with whatever you want to post on your blog about the issue. I think more exposure can only help, as I’m pretty sure that most (if not all) of my male colleagues in the dept have no idea group dynamics is even an issue.

thanks,
Ben

--

from:Claire Le Goues
to:Ben Zhao
cc:Jean Yang,
Justine Sherry
date:Wed, Dec 14, 2016 at 5:46 PM
I don't think that the following response is by any means complete, but here are some offhand thoughts:

I also wouldn't single out female students. I do think you can signal that you are a supportive ally in various subtle and not-so-subtle ways, especially at the start of class.  For example, giving the students a survey wherein you ask for their names, preferred names/nicknames, and pronouns indicates that you are a person who understands that pronouns are a thing worth asking/caring about.  This can indicate to marginalized students that you are more likely to be educated/aware of gender dynamics overall and thus they may feel more comfortable approaching you with concerns.  

My take is: Many women or other members of underrepresented groups *know* that life can be challenging as a woman in homework groups.  Having someone tell me those challenges neither solves them, nor makes me feel much better.  On the other hand, having you publicly talk to everyone, men included, about challenges that groups face, covering elements like subconscious bias, diversity/groupthink, etc, and the ways those forces hinder effective teamwork, might frankly resonate more with the women than singling them out, and might actually get the guys to think about their lives/privilege/behavior a little bit.

What you might do, if you don't necessarily want to go the random route, is ask *all* the students at the beginning (as part of a start of semester survey) if they have someone they want to work with.  You can say something like "I haven't decided yet how to assign groups but I am willing to entertain suggestions, so let me know by filling out this form; I will not share your answers with anyone."  If all the women pick someone reasonable, and all pairs are matched (like I say Jean and Jean says me), then you let them pick their own.   That way you're neither saying "HEY WOMEN YOU ARE BEING SINGLED OUT" but you can still get at the information you want.

We do assign students to groups pseudo-randomly, which is honestly pretty consistent with the literature.  We ask for schedule availability (there's an online tool for this I can dig up) and use that to assign groups, looking to maximize times they can work together while honestly trying, when we can, to split up known cliques.

This is all made easier by the fact that I teach a class explicitly about software engineering, including teamwork and process, and so we can very easily and truthfully say: You will go work for a company and get assigned to work with a team of people you do not know, and so being able to do that effectively is one of the learning goals of this class.  The students are generally receptive to this argument, even though there are always a dysfunctional team or two.  Teaching a systems-y class lends itself to the same line of argumentation: either they're going to industry or academia, and regardless, they need to learn to work with people who are not their friends.

The literature is mixed on group composition.  There is some evidence that putting members of underrepresented groups together is good in early classes (100-200 level), and additional evidence that past that, it doesn't really matter (because if you haven't dropped out yet, group composition is unlikely to be the deciding factor?).

Other thoughts on what we do: 
(a) We provide opportunities for individual assignments/assessments, ideally with each group assignment or milestone (so the first part is group work, and a smaller component is to be done individually).  This lets us identify malfunction and ensures that we are actually assessing individual as well as group performance, 
(b) We do not do peer grading by default but reserve the right to start if teams report serious problems.   Peer grading skews incentives within groups in a way that interferes with our particular learning goals in an SE context. However, it might work in your context where the "learn to work in teams" is less explicitly a goal of the course. 
(c) (speaking to your concern about students being hesitant to surface or discuss issues with you) At various points in the semester we specifically survey the groups about how they think they're doing and aggregate the feedback from everyone to send it back to them (these are teams of 3--5, so it's easier to do anonymously than teams of 2).  The form we used is based on literature on assessing group performance; I can find it if you're interested.  We then reach out to students who are reporting problems.  We talk to them individually and then also reach out to the whole group as appropriate (if the individual student says they're OK with it).  We do encourage them to try to sort it out by talking about it amongst themselves, and regardless, we follow up with the individual students to see how they feel after a week or so.  We do not use those feedback forms for grading in any way (and we tell them so).  I have found that about half the time, the frustrated students just want to vent for a half hour and then say "yeah I feel better, no need to do anything else." ;-)  When we bring them in as groups, we try to make them do as much of the talking as possible.  Like "Hey everyone, how's it going?  What do you think you're doing that's working well?  What are you doing that's not working well?  How can you fix it?"
(d) We cover effective teamwork in class at the start of the semester; practices like "have specific roles that you rotate; document agreements and assignment of responsibilities" (also from the literature---I think the Oakley article Jean linked).  I'd emphasize both the explicit assignment as well as the "rotation" aspect of the roles---otherwise, the female students tend to get "scribe" duties all the time. We've debated ways to enforce those practices (like asking for documentation of who does what), but have never formally done so.  

Honestly, I've never heard anyone say that having the students pick their groups worked out particularly well, frankly.  There are profs who say "Oh, but they complain if we assign them randomly!" And perhaps I'm too cavalier, but my response is: so?  They also complain if the tests are hard and if we give them too much homework, and we do it anyway because it serves our pedagogical goals.  I feel the same way about assigned groups.

Assigned groups don't solve all sources of team dysfunction, of course, and so I think we should as a curricular point do more to mitigate the risks that groupwork poses particularly to marginalized students and to teach students how to work together.  They spend their childhoods being told they're not allowed to work with others, and then we throw them into teamwork situations with no training, and then are surprised when they're terrible at it.  I think covering those challenges and strategies to mitigate them and proactively paying attention to team dynamics over the course of the semester is important to help them learn, though I by no means think we have a complete answer on that.

-CLG

Tuesday, November 29, 2016

Why We Need to Talk About the Collaboration Problem

Today I spoke with a Computer Science professor who is finishing a semester of teaching a notoriously challenging advanced undergraduate course.

"I figured out the problem with my female students," he told me. "It's their partners."

All semester, this colleague--let's call him Albus Dumbledore--had been telling me about the strange phenomenon of drama with his female students and their project partners. The course has a significant project component, and successful completion of the project usually depended on both partners pulling their weight. Mediating partner disputes became the responsibility of the instructor. And what the instructor noticed was that an alarming fraction of the disputes seemed to happen when one of the partners was female.

After wondering all semester how bias might contribute to the drama of the female students' partners, Albus had a revelation. The female students complaining about their partners all seemed to have better overall grades than their partners. Not only did the partners have lower GPAs, but many of them were from outside of Computer Science. Albus surmised that these partners were, in fact, probably not pulling their weight, and that the students had every right to complain.

"But why would these strong students choose such bad partners?" he asked.

That female students had bad partners was, to me, not surprising. After all, nobody had asked me to work on any problem set until the second semester of my sophomore year, and a fellow student only asked me after obtaining an unprotected copy of course grades on our department servers and discovering I had the second-highest midterm score in one of our courses. I told Albus about how a friend once confessed to me that before she had gotten to know me, she had forbidden her boyfriend from working with me. I told him about how problem set partners often preferred to solve problems for me rather than with me. My best collaboration in college had been with another woman, and she had been so initially skeptical of my abilities that it took me at least half of a semester to win her over with how fast and how correct my code was.

"So it's not by choice," Albus concluded. "What can we do about this?"

Important question. For my first few years of college, the collaboration problem had left me feeling so isolated and so much in doubt of my abilities that I often thought about switching away from Computer Science. If not for a chance encounter with a friend, one year behind me and facing similar problems, I might have left. What began as a quick hello as our paths intersected on the way back from class turned into a long discussion about the difficulties we both had in finding people who would collaborate with us. I had graded this woman's homework in multiple classes, so I knew the problem was not that she was not capable. This was when I began to realize that the problem may not be with me, but with the way people perceived me--and other women.

Years later, when I was starting Graduate Women at MIT, this conversation led me to put together a panel on collaboration--specifically, on collaborating as women in male-dominated fields. I felt so validated when the panelists--three women at various stages in their careers, each at the top of her field--said what I had observed for years, but had never dared to say out loud. It can be hard to collaborate with men, one panelist said: they often talk at you rather than to you because they are socialized to impress women. It can be harder to collaborate with two men, another panelist said: they will often talk only to each other while trying to impress you. (I don't like to make blanket statements about all people of a gender, just like I don't like to make blanket statements of all people from a culture, but these kinds of conversations can be helpful for recognizing patterns.) While much of this advice was unsurprising, and also depressing, it felt incredibly powerful to hear someone else say these statements out loud. Talking about this explicitly seemed like the first step towards solving the problem.

In the intervening years, I've collected much more evidence of the problem than I have solutions. It is undeniable that collaborations account for much of people's success in technical settings. Albus talked about how, in his class, the students with subpar partners struggled to complete their projects. A recent study I read* cited female academics' ability to travel for international collaboration as one of the biggest determinants of their success. Yet collaboration seems to remain a problem. At a recent Women@SCS lunch in my department, I spoke about my experiences with Graduate Women at MIT, including the collaboration panel, and the students kept returning to the issue of collaborating in a male-dominated field. Students asked about how to find collaborators who would take them seriously. Students asked about what to do in groups when people may not be listening to them. A student asked what to do if she has had so many negative collaboration experiences that she is reluctant to collaborate anymore. A student said that she, too, felt like male collaborators were often trying to impress her rather than work with her, but she had thought it was in her head.

After the recent lunch, a student asked me about the benefit of talking explicitly about these issues. Wouldn't it be better, she asked, to not draw attention to gender and wait for the problems to go away? I, too, would love to live in a post-gender world where people can just be people. Unfortunately, it seems that collaboration is a topic we need to address explicitly. Not only do these cross-gender/culture problems not seem to be going away on their own, but they also seem to be increasing certain inequalities. Especially in Computer Science, smart people have done an excellent job of solving many other problems of gender equality. I have full confidence that once we recognize this as a problem, we can find good solutions. I would love to hear your ideas.

* In the process of looking for this citation... Let me know if you have it!