Saturday, June 25, 2016

Counter-Advice for the PhD

Recently I attended the Programming Languages Mentoring Workshop, a program to introduce advanced undergraduates and early-stage PhD students to research in general and research in our field. (By the way, this is a fantastic workshop and I highly encourage students to attend!) While listening to advice from other academics and talking to the students about their questions, I realized that I have come to disagree with much of the conventional wisdom and advice for PhD students, some of which I have been guilty of re-dispensing. I express my dissent here.

Clarification: The workshop was not the source of all of the quotes! It was simply what got me thinking about the dangers of taking any one piece of advice too seriously. (The workshop itself is great for showcasing different points of view.)

--

"To decide what to work on, read lots of papers and then choose the best problem."
One of my undergraduate professors once told me, "Take all advice with a grain of salt. Most advice is highlights and wishful thinking." This was one of the best pieces of advice anyone has ever given me. It is easy for people to give this kind of advice about choosing research problems after they have learned what makes a good research problem. The advice is far more difficult to follow for people who are still developing their research taste. While some people probably have chosen research problems this way and while it is helpful to more deeply understand your area, early-stage researchers sometimes just have to jump in, do things, and learn from the confusion.

Also, in the early stages of a PhD, developing research skills (project management, time management, and communication of results) can be far more important than working on the best problem. In this case, I would recommend working on a problem that a mentor is sufficiently invested in to help you gain the skills you need.

--

"Choose an advisor you are completely compatible with."
This advice goes in the same category as the previous one. Once you have gone through the PhD and developed a deep understanding of who you are and who your advisor is, it is easy to think that a good situation can be easily recreated or a bad situation could have been more easily avoided. While you should look enough into your soul and do enough due diligence to make sure there are no glaring red flags, you should not worry if you do not feel like you know enough about your working style or preferences to choose a perfectly compatible advisor. Advisor-advisee relationships, like all other human relationships, depend a lot on many factors, only some of which are under the control of the two main participants, and also can evolve quite a bit over time.

--

"Do a PhD because you are in love."
I completely agree that doing a PhD out of love of learning, love of discovery, or love for a discipline is a much better reason than doing a PhD for the money, fame, or glory. But I've seen many students get stuck in the "passion trap," the idea that you need to be completely in love with something before you invest significant amounts of time and energy into it. According to Cal Newport, who has written extensively about this, passion is something that often comes later, after you have become an expert and people recognize you for your contributions.

What I have noticed is that people often differ more in the narratives they have about their relationships with their work than in their actual relationships with their work. In my Quora answer to the question "How common is it for PhD students to do work they are not passionate in?" I talk about how one's relationship with a project often follows a trajectory similar to a romantic relationship: infatuation, followed by a steady state that comes sometime later, often much later, with a period of confusion and negotiation in between. I've seen every researcher I know well experience the confusion phase, but some researchers are more open than others to talking about it.

A side observation is that relationships with research seem to vary culturally: for instance, being blanket positive about one's own research seems to go along with the American tendency to be blanket positive about one's own life.

--

"Superstars are born, not made."
No one has said this specific phrase to me, but many people have implied it with the qualities they value in students. Once a professor told me that some students "just can't cut it." I've seen professors pick favorites based on internal metrics they have (often, it seems, based on how much a student reminds them of themselves). I've seen students decide someone is the smartest among them because of confidence, or some other "star" quality that doesn't necessarily correlate directly with research skill. While there is a baseline level of intelligence, curiosity, drive, and tolerance for uncertainty that someone needs to be a good researcher, many of the qualities that make a great researcher--discipline and persistence, to name two--are not entirely innate and definitely not strongly correlated with the confidence and charisma that seem to build many star reputations. (Note: confidence and charisma can also be learned.)

--

"The PhD is lonely without a significant other, especially if you are a woman in a male-dominated field."
I was surprised that people have told me this--and that, at least when I was starting my PhD, there was a common conception that having a romantic partner was somehow necessary for enduring the trials of the PhD. While it is important to nurture healthy relationships with supportive people, a significant other does not need to be one of them. Especially in relationships between people of similar levels of ambition, it can become tricky to negotiate coevolution and colocation, thus adding unnecessary pressure to the PhD experience. (And, unfortunately, because of society's insistence on holding on to gender roles, women who date men often find themselves with more pressure to conform to their partner's desires.) During my PhD, I had many friends, some of them women in male-dominated fields, many of whom ended up becoming stars in their fields, who were happily single for significant portions of their PhD.

--

"The most successful PhD students work all the time."
See my answer (and other answers) to the Quora question "Do Ph.D. students have time for hobbies?" Nurturing a healthy relationship with a significant other also counts as a hobby, so you can do that too if that's what you want.

--

"If it's not hard, it's not worth doing."
There is something to be said for only doing things that help you grow in some way, but there is necessary challenge and then there is unnecessary challenge. During the PhD, it is necessary to come to terms with uncertainty, confusion, and possible rejection of your ideas from the community. This is crucial for one's development into a full-fledged researcher. What is less necessary, however, is depriving yourself of food or sleep, always working to the point of exhaustion, or mismanaging your time so that you are always under deadline pressure. For some people it may be necessary to endure toxic advisor or collaborator relationships, but I would encourage those people to seek ways out of that if possible--abuse does not need to go hand-in-hand with growth. Self-inflicted struggle only makes the necessary struggle more difficult.

--

I hope you realize by now that there is no single right way to do the PhD and that there are many valid--and sometimes conflicting--views on what a good path is. Have fun with the confusion. :)

Sunday, May 22, 2016

"We run things, things don't run we"

Yesterday was Porchfest, an annual event where local musicians perform on their porches all around Somerville, MA. My friend Stefan Anderson, who performs as the solo act The Stefan Banderson, played his cover of Miley Cyrus's "We Can't Stop," which is my favorite cover of all time. (In summer 2013, I heard him perform the cover before I heard the actual song.)

Something I like about Stefan's covers is how they highlight the absurdity of pop song lyrics. During this performance I became obsessed with the line "We run things, things don't run we." After I spent far longer thinking about this line than a serious adult should, I had the following enlightening email exchange with the friends I went to Porchfest with.

This is another post in the series where I experiment with publishing emails the way RMO does. This post is about not being able to stop. It is also about what happens when science PhDs close-read Miley lyrics.

(While we're on the topic of Porchfest I'd also like to plug my friend Christiana's band Paper Waves. Check them out--they're great!)

--

from:Jean Yang
to:Alison Hill,
Elizabeth Brown,
Ali Rabi
date:Sun, May 22, 2016 at 10:46 AM
subject:We run things, things don't run we



Guys I looked this up and this is a real lyric of the song. I really like it. I think I'll make it my new life motto.


---




from:Elizabeth Brown
to:Jean Yang
cc:Alison Hill,
Ali Rabi
date:Sun, May 22, 2016 at 11:05 AM
subject:Re: We run things, things don't run we

Oh my god that is wonderful. It makes a great motto!

--

from:Alison Hill
to:Elizabeth Brown
cc:Jean Yang,
Ali Rabi
date:Sun, May 22, 2016 at 12:55 PM


the final "we" of the third line also serves as the subject of the fourth line .. optionally .. or you could interpret the lines separately, and then the last one is not a statement but an imperative

but all this just makes me like it even more!

--

from:Jean Yang
to:Alison Hill
cc:Elizabeth Brown,
Ali Rabi
date:Sun, May 22, 2016 at 3:18 PM

Yeah, one can also interpret as a fight to control the forces that control us.

I read this article about how "We Can't Stop" is actually a sad song:
http://www.businessinsider.com/why-miley-cyrus-we-cant-stop-is-actually-the-saddest-song-of-the-summer-2013-8

This particular stanza is particularly interesting as a commentary on the power you give up when you enter into a super glam life--whether it's partying or academia. You act like you run the show but you've already bought in to something much bigger and you "can't stop." "Don't take nothing from nobody" and maybe you can escape the greater forces.

SO MUCH SUBTLETY I LOVE IT.

Friday, May 20, 2016

A Recent Exchange on Money and Time

Below is a recent exchange with my friend Rob Ochshorn, who often writes emails instead of blog posts to work out his thoughts. Here I am borrowing not only his technique, but also his thoughts, on a topic I have recently begun to think about and would like to think about more.

Seth and Aliza were included non-consensually. I continue to include them for context: this conversation happened with an audience.

---

from:Jean Yang
to:Aliza Aufrichtig,
Robert M Ochshorn,
Seth Stephens-Davidowitz
date:Sun, May 15, 2016 at 10:57 AM
subject:Cultural shorthand for money and time

Our society has a widely used abstraction for things that cost money ($$$) but there's no similar concept for time.

I started thinking about this because I wanted a way to express things that are time-expensive (was thinking ⌚⌚⌚).

Related to this, it would be nice if people could tell me how much time things cost, rather than just how much money. Thinking about this made me wonder how much our lack of shorthand for this idea is a result of our entire society not caring about time as much as money, or because the people who shape our cultural shorthand (for instance the people running Yelp) care more about money than time.

Zooming out even further, isn't it interesting that Silicon Valley has become so obsessed with helping people live forever--while perpetuating a culture that steals people's time and youth in an unprecedented all-consuming way? (Based on how you think about it, it is or isn't unprecedented. Let's discuss.) I wonder what this means...

---

from:Robert Ochshorn
to:Jean Yang
cc:Aliza Aufrichtig,
Seth Stephens-Davidowitz
date:Mon, May 16, 2016 at 2:33 AM
subject:Re: Cultural shorthand for money and time

If only the singularity-upload fantasy of an eternal life were based on a mature understanding of leisure! I’m using “leisure” to mean the non-financialized use of time. This distinguishes it from the manner of time that Silicon Valley loves to “save” you. I mean, most of these stupid startups justify their work in terms of saving you time. Some startups allow you to convert your money into somebody else’s time (InstaCart, TaskRabbit, Magic), while others use automation and interface, in the grand tradition of the dishwasher, to let you do your work/chores faster (“so you can focus on …”). 

I would dispute your claim that our society cares about money more than time. I think it’s worse than that: much tech marketing and ideology[0] is based on the myth of a temporal-financial relativity: the conversion of money into time (the inverse, time->money, being what we call a job).

Your Yelp example is interesting. It makes me think of the 50s fantasy of “fast food.” Silicon Valley has proposed a modernization of this concept (Soylent), which should make clear what the purpose of this time-saving is: that we will have more time to work! In other words: what is the point of “saving time” if not to prepare or enjoy a nice meal?

Iconographically, there’s some design precedent creeping into popular consciousness. Medium, for example, numerically estimates the time an article will take its average user to ingest (“5 min read”). I’m kicking myself for not introducing you to my friend Tristan, who just passed through Cambridge for a Berkman lecture and runs a Time Well Spent movement that sees itself as a time-respecting “Fair Trade” equivalent for tech. What I like about your “⌚⌚⌚” is that it implies a depth and prestige to a potential long-form experience—it makes me feel like I will be taken out of my normal, fragmentary, hectic existence and transported into a deep, coherent, and focused place for a while. The watches culturally suggest wealth and tradition. It’s a very different feel with hourglasses (⌛⌛⌛)—perhaps the difference between being in control of one’s time versus our lives slipping away from us.

There’s a media-theoretical concept that’s important here: Marshall McLuhan’s notion of the reversal. The automobile makes us faster, but when you extend the concept as far as it goes, we’re stuck sitting in traffic. Ivan Illich took this even further, making a brutal calculation[1] of a car’s speed based on all of the factors that allow us to occasionally sit in a car cruising down the open road.

So for precedent I would propose McDonald’s, the dishwasher, and the automobile. Think about the ways they play together: teenagers working at McDonald’s to buy a car while their mothers enter the workforce, aided at home by the dishwasher.

Warm greetings from Ramallah! Time definitely seems to work differently here.

Your correspondent,
R.M.O.


[0] This is slightly off-topic, but it’s too lovely to omit. From Levy’s In the Plex, a great snapshot of how time and latency are discussed/traded within Google (emph mine):

After the Code Yellow, Google set a companywide OKR (the objective key result metric Google uses to set goals) to fight latency. To help meet its goals, the company created a market-based incentive program for product teams to juice up performance—a cap-and-trade model in which teams were mandated latency ceilings or maximum performance times. If a team didn’t make its benchmarks, says Hölzle, it accrued a debt that had to be paid off by barter with a team that exceeded its benchmarks. “You could trade for an engineer or machines. Whatever,” he says. The metric for this exchange was, oddly enough, human lives. The calculation goes like this: average human life expectancy is seventy years. That’s about two billion seconds. If a product has 100 million users and unnecessarily wastes four seconds of a user’s time every day, that was more than a hundred people killed in a year. So if the Gmail team wasn’t meeting its goals, it might go to the Picasa team and ask for ten lives to lift its speed budget into the black. In exchange, the Gmailers might yield a thousand servers from its allocation or all its massage tickets for the next month.
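
The "human lives" arithmetic in that passage is easy to redo by hand. Below is a minimal Python sketch of it. The variable names are hypothetical, and the only inputs are the figures quoted above (a seventy-year life, 100 million users, four wasted seconds per user per day); it is not a reconstruction of how Google actually ran the numbers.

```python
# Back-of-the-envelope version of the "human lives" latency metric.
# All inputs are the figures quoted in the passage; names are hypothetical.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

life_expectancy_years = 70
users = 100_000_000
wasted_seconds_per_user_per_day = 4

# Seconds in a 70-year life, computed exactly and as the passage's
# "about two billion".
seconds_per_life_exact = life_expectancy_years * SECONDS_PER_YEAR
seconds_per_life_rounded = 2_000_000_000

# Total user time wasted over one year, in seconds.
wasted_seconds_per_year = users * wasted_seconds_per_user_per_day * 365

lives_exact = wasted_seconds_per_year / seconds_per_life_exact
lives_rounded = wasted_seconds_per_year / seconds_per_life_rounded

print(f"lives per year (exact seconds per life): {lives_exact:.1f}")
print(f"lives per year (two billion seconds per life): {lives_rounded:.1f}")
```

With these inputs the sketch comes out at roughly 66 to 73 lifetimes per year. That is the same order of magnitude as the quoted "more than a hundred" but somewhat lower, so the book's figure presumably rests on rounding or on inputs the excerpt does not spell out.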

[1] From Wikipedia:

…the concept of counterproductivity: when institutions of modern industrial society impede their purported aims. For example, Ivan Illich calculated that, in America in the 1970s, if you add the time spent to work to earn the money to buy a car, the time spent in the car (including traffic jam), the time spent in the health care industry because of a car crash, the time spent in the oil industry to fuel cars ...etc., and you divide the number of kilometres traveled per year by that, you obtain the following calculation: 10000 km per year per person divided by 1600 hours per year per American equals 6 km per hour. So the real speed of a car would be about 3.7 miles per hour.
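
Illich's effective-speed figure can be checked the same way. This is a minimal Python sketch using only the two numbers in the Wikipedia excerpt above; the variable names are hypothetical, and the conversion factor is the standard international mile.

```python
# Illich-style "effective speed" of a car, using the figures quoted above.
# Variable names are hypothetical; inputs come from the Wikipedia excerpt.

KM_PER_MILE = 1.609344  # international mile

km_per_year = 10_000     # distance traveled per person per year
hours_per_year = 1_600   # all car-related hours per person per year

# Distance divided by total time invested gives the effective speed.
effective_kmh = km_per_year / hours_per_year
effective_mph = effective_kmh / KM_PER_MILE

# The excerpt rounds to 6 km/h before converting to miles per hour.
rounded_mph = 6 / KM_PER_MILE

print(f"effective speed: {effective_kmh:.2f} km/h ({effective_mph:.1f} mph)")
print(f"from the rounded 6 km/h: {rounded_mph:.1f} mph")
```

The unrounded division gives 6.25 km/h, or about 3.9 mph. The excerpt's 3.7 mph follows from rounding down to 6 km/h first.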

Friday, May 13, 2016

Networking Tips for Younger PhD Students

This post was a collaboration with Nadia Polikarpova and Shachar Itzhaky, done while we were supposed to be collaborating on other things.

A younger student in the group where I did my PhD is going to his first conference next week and my advisor sent him my way for advice. Nadia, Shachar, and I had already been discussing research (and attending a BBQ) for hours at this point, so we welcomed the opportunity to discuss something else. Here's what we came up with.
  • Be prepared to show off your research. A main goal of attending a conference is to get your name out there, associated with good work. At a conference, you'll be lucky to get more than five minutes in with someone, especially somebody established. It would serve you well to prepare a succinct, memorable elevator pitch for your work. If you have a demo, it doesn't hurt to have that ready in case someone wants to see. Bonus: if you can tailor your pitch based on the interests of who you're talking to, they'll like it more.
  • Make your networking bingo sheet--and play it. Make a list of people who you'd like to talk to: people about whose work you have questions, people whose work you cite/whose papers you read, people you'd like to tell about your work, and people whose work you admire in general. You may want to consult your advisor and/or collaborators for a good list. Having a list helps keep you on track for making the most of your time at the conference. I also like feeling like I'm on a mission.
  • Don't be afraid to ask for introductions. While most people in my community (programming languages) are pretty friendly, it can often be easier to talk to someone if you get introduced. Don't be afraid to ask people if they are able to introduce you to someone on your bingo sheet.
  • Don't sit with the same people twice. This is a conference, not vacation with your best friends. My former advisor Saman Amarasinghe liked to tell his PhD students to split up at all meals so they can meet new people. It's fine to have a friend you go around the conference with, but make sure you're talking to new people during each break and meal.
  • Prepare questions and talking points. When I was a first-year PhD student attending my first POPL, my friend Luke and I were so excited to see Xavier Leroy, one of our research heroes, standing by himself during the break that we ran up to him and introduced ourselves. As we had no further game plan, we answered the questions he asked us about who we were and then we ran away. At the next conference, PLDI, I was determined to do better. I asked his student, Jean-Baptiste, if I could have lunch with them on one of the days. I figured that since Jean-Baptiste was my friend, Xavier could become my friend by transitivity. The conference flew by and we ran out of lunches, but Jean-Baptiste said I was welcome to walk with them while Xavier fetched his suitcase and walked to get a cab. Again, I was very excited, but again, I had nothing to say and the conversation more or less consisted of me answering questions that Xavier politely asked me. Ever since, I've always made sure to prepare a couple of questions and/or talking points if I really want to talk to someone. It also doesn't hurt to prepare a couple of general stories/talking points to break the ice when you sit at that lunch table full of people you don't know.
  • Listen more than you talk. It is well known that Level 1 networking for graduate students involves ambushing innocent passers-by with a well-rehearsed elevator pitch. While this more or less does the job, there are greater heights to aspire to. The next level involves listening to and interacting with the other person. In How to Win Friends and Influence People, Dale Carnegie talks about how much more people like you if you let them talk first and figure out what they want to talk about. This is also true in research settings. I, for one, tend to be much more impressed with someone if they can ask insightful questions/offer useful suggestions about my work than if they simply presented to me interesting ideas about their own work.
  • Dress appropriately. Dressing appropriately increases one's efficacy in all situations and conferences are no different. Your main fashion goals at a conference are 1) not to stand out too much, 2) to be sufficiently mobile to move between groups and between the conference venue and evening activities, and 3) to be sufficiently comfortable that you can last from the morning until late at night. For 2), make sure your backpack isn't too big and you don't have too much stuff but have your jacket/comfortable shoes if you're going to head out with a group for dinner and/or drinks.
  • Carry a notebook. If you're doing it right, you'll be having lots of conversations. It will be useful to write down things you learn and things to follow up on. Notebooks are also useful for drawing figures to describe your research.
  • Always wear your nametag. People are going to remember who you are a lot better if they see your name every time they look at you.
  • Mind your manners. You want people to remember you for your research without being distracted by poor manners. It's a good idea to be careful not to interrupt people and not to make a mess when you eat. I also try not to make too big of a deal out of my dietary restrictions when we're making decisions about where to eat. It makes it a lot easier, especially in large groups, if you try to be agreeable and go with the flow.
Finally, have fun. Conferences have helped me solidify friendships with many people in my research area. Especially as you spend more time in a community, conferences can become more like a family reunion than a serious networking event with faceless paper authors.

As always, let us know if you have other tips!

Saturday, May 07, 2016

Why It's Not Academia's Job to Produce Code That Ships

My scientist friends often scoff at crime show writers' creative interpretation of technology's limits.

The technology-shiny world of CSI: Cyber.
"Let's zoom in here," a character says in an investigation room with floor-to-ceiling screens showing high-definition maps of the show's major metropolitan area. A flick of the fingers reveals an image of the suspect at mouth-watering resolution.

In another scene, the characters listen to a voice mail from the suspect. "What's that in the background?" one investigator asks. Using an interface that deadmau5 would kill to have, the hacker of the bunch strips out the talking, the other sounds. They say some words like "triangulation" and, eureka, they deduce the suspect's exact location.

Yes, real police technology is nowhere near this sophisticated. Yes, nobody (except maybe the government, secretly) has technology like this. But those who criticize the lack of realism are missing the point.

The realities that art constructs take us out of our existing frames of perception--not only for fun, but also for profit. Many important technological advances, from the submarine to the cell phone, appeared in fiction well before they appeared in real life. Correlation does not imply causation, but many dare say that fiction inspires science.

Some complaints against academic Computer Science.
This brings us to the relationship between academic Computer Science and the tech industry. Recently, people in industry have made similar criticisms of academic computer science. Mike Hoye of Mozilla started the conversation by saying he was "extremely angry" with academics for making it difficult for industry to access the research results. This unleashed a stream of Internet frustration against academics about everything from lack of Open Access (not our faults) to squandering government funding (not entirely true) to not caring about reproducibility or sharing our code (addressed in an earlier blog post).

At the heart of the frustration is a legitimate accusation*: that academics care more about producing papers than about producing anything immediately (or close to immediately) useful for the real world. I have been hearing some variation of this criticism, from academics as well as industry people, for longer than I have been doing research. But these criticisms are equivalent to saying that TV writers care more about making a good show than being technically realistic. While both are correct observations, they should not be complaints. The real problem here is not that academics don't care about relevance or that industry does not care about principles, but that there is a mismatch in expectations.

It makes sense that people expect academic research results to work in companies right away. Research that makes tangible, measurable contributions is often what ends up being most popular with funding sources (including industry), media outlets, and other academics reviewing papers, faculty applications, and promotion cases. As a result, academic researchers are increasingly under pressure to do research that can be described as "realistic" and "practical," to explicitly make connections between academic work and the real, practical work that goes on in industry.

In reality, most research--and much of the research worth doing--is far from being immediately practical. For very applied research, the connections are natural and the claims of practicality may be a summer internship or startup away from being true. Everything else is a career bet. Academics bet years, sometimes the entirety, of their careers on visions of what the world will be like in five, ten, twenty years. Many, many academics spend many years doing what others consider "irrelevant," "crazy," or "impossible" so that the ideas are ready by the time the other factors--physical hardware, society--are in place.

The paths to becoming billion-dollar industries.
In Computer Science, it is especially easy to forget that longer-term research is important when we can already do so much with existing ideas. But even if we look at what ends up making money, evidence shows that career bets are responsible for much of the technology we have today. The book Innovation in Information Technology talks about how ideas in computer science turned into billion-dollar ideas. A graphic from the book (on right) shows that the Internet started as a university project in the sixties. Another graphic shows there were similarly long tech transfer trajectories for ideas such as relational databases, the World Wide Web, speech recognition, and broadband in the last mile.

The story of slow transfer is true across Computer Science. People often ask me why I do research in programming languages if most of the mainstream programming languages were created by regular programmers. If we look closely, however, most of the features in mainstream languages came out of decades of research. Yes, Guido van Rossum was a programmer and not a researcher before he became the Benevolent Dictator of Python. But Python's contribution is not in innovating in terms of any particular paradigm, but in combining well features like object orientation (Smalltalk, 1972, and Clu, 1975), anonymous lambda functions (the lambda calculus, 1937), and garbage collection (1959) with an interactive feel (1960s). As programming languages researchers, we're looking at what's next: how to address problems now that people without formal training are programming, now that we have all these security and privacy concerns. In a media interview about my Jeeves language for automatically enforcing security and privacy policies, I explained the purpose of creating research languages as follows: "We’re taking a crazy idea, showing that it can work at all, and then fleshing it out so that it can work in the real world."

Some may believe that all of the deep, difficult work has already been done in Computer Science--and now we should simply capitalize on the efforts of researchers past. History has shown that progress has always gone beyond people's imaginations. Henry Leavitt Ellsworth, the first Commissioner of the US Patent Office, is known to have made fun of the notion that progress is ending, saying, "The advancement of the arts, from year to year, taxes our credulity and seems to presage the arrival of that period when human improvement must end." And common sense tells us otherwise. All of our data is becoming digitized and we have no clue how to make sure we're not leaking too much information. We're using software to design drugs and diagnose illness without really understanding what the software is doing. To say we have finished making progress is to be satisfied with an unsatisfying status quo.

The challenge, then, is not to get academics to be more relevant, but to preserve the separate roles of industry and academia while promoting transfer of ideas. As academics, we can do better in communicating the expectations of academic research (an outreach problem) and developing more concrete standards of expectations for "practical" research (something that Artifact Evaluation Committees have been doing, but that could benefit from more input from industry). As a society, we also need to work towards having more patience with the pace of research--and with scientists taking career bets that don't pay off. Part of the onus is on scientists for better communicating the actual implications of the work. But everyone else also has a responsibility to understand that if we're in the business of developing tools for an unpredictable future--as academics are--it is unreasonable to expect that we can fill in all the details right away, or that we're always right.

It is exciting that we live in a time when it is possible to see technical ideas go from abstract formulations to billion-dollar industries in the course of a single lifetime. It is clear we need to rethink how academia and industry should coexist under these new circumstances. Asking academics to conform to the standards of industry, however, is like asking TV writers to conform to the standards of scientists--unnecessary and stifling to creativity. I invite you to think with me about how we can do better.

With thanks to Rob Miller and Emery Berger for helping with references.

* Note that this post does not address @mhoye's main complaint about reproducibility, for which the response is that, at least in Programming Languages and Software Engineering, we recognize this can be a problem (though not as big of a problem as some may think) and have been working on it through the formation of Artifact Evaluation Committees. This post addresses the more general "what are academics even doing?!" frustration that arose from the thread.

--

Addendum: Many have pointed out that @mhoye was mainly asking for researchers to share their code. I address the specific accusation about academics not sharing code in a previous blog post. I should add that I'm all for sharing of usable code, when that's relevant to the work. In fact, I'm co-chairing the POPL 2017 Artifact Evaluation Committee for this reason. I'm also all for bridging the gaps between academia and industry. This is why I started the Cybersecurity Factory accelerator for commercializing security research.

What I'm responding to in this post is the deeper underlying sentiment responsible for the misperception that academics do not share their code, the sentiment that academics are not relevant. This relevance, translating roughly into "something that can be turned into a commercial idea" or "something that can be implemented in a production platform," is what I mean by "shipping code." For those who wonder if people really expect this, the answer is yes. I've been asked everything from "why work on something if it's not usable in industry in the next five years?" to "why work on something if you're not solving the problems industry has right now?"

What I'd like is for people to recognize that in order for us to take bets on the future, not all research is going to seem relevant right away--and some of it might never be relevant. It's a sad state of affairs when would-be Nobel laureates end up driving car dealership shuttles because they failed to demonstrate immediate relevance. Supporting basic science in computer science involves patience with research.

Friday, May 06, 2016

Dual Booting Windows 10 and Ubuntu 16.04

For a couple of years, secure boot was making it harder every time to install a Windows/Linux dual boot. I've just finished successfully setting up a dual boot on my Lenovo X1 Carbon and am happy to report that it requires more or less the same tedious steps as before.

Here are step-by-step instructions for setting up a dual boot, where Windows has already been installed.
  1. Shrink the size of your Windows partition. Create a partition for Linux, an optional swap partition for Linux, and an optional other partition if you want to share files between your two partitions. You don't need to format your Linux and swap partitions. You will want to format the optional shared partition as NTFS.
  2. Get an Ubuntu image onto a DVD or a USB drive.
  3. Turn off Fast Boot in Windows. If you don't do this, your system is going to boot straight into Windows every time.
  4. In your BIOS, disable Secure Boot, enable UEFI, and disable Legacy Boot. You can mess with your BIOS settings by restarting and then intercepting startup by pressing "enter."
  5. Boot from your image. You can do this by restarting and intercepting startup to boot from a device. When choosing the installation option, make sure to choose "something else" instead of the ones that erase all your files.
  6. Follow the instructions and install Linux onto the partition(s) you've set aside for it. You will need to select the intended Linux partition and format it as ext[something] (I did ext3) and set the mount point as the top directory ("/"). You'll also need to designate the swap partition as such. You'll also have to explicitly specify your boot partition. Since you already have Windows installed, this will be the first partition formatted FAT32. (This was different from before. I don't recall ever having to explicitly reformat my Linux partition or choose my boot partition.)
  7. From Linux, run Boot-Repair to reinstall your GRUB. Otherwise you'll boot straight into Windows every time. (If you accidentally booted back into Windows, you can repair your GRUB by running Linux live off your boot device.)
  8. If you want to share files between Windows and Linux, you'll also need to configure your shared partition so you can use it in Linux. (Instructions here. I had to restart before Dropbox let me put my directory there.)
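
For steps 4 and 8, here is a minimal sketch of two checks you can run from the new Linux install. It is only one way to do this and may differ from the linked instructions. The function names, the placeholder UUID, the /mnt/shared mount point, and the uid/gid of 1000 are all assumptions made for illustration; substitute your own values.

```python
import os


def booted_in_uefi(efi_dir: str = "/sys/firmware/efi") -> bool:
    """Linux exposes this directory only when it was booted through UEFI."""
    return os.path.isdir(efi_dir)


def fstab_line(uuid: str, mount_point: str, uid: int = 1000, gid: int = 1000) -> str:
    """Build an /etc/fstab entry that mounts an NTFS partition via ntfs-3g."""
    # uid/gid make the mounted files belong to your Linux user (assumed 1000 here).
    options = f"defaults,uid={uid},gid={gid},umask=022"
    return f"UUID={uuid} {mount_point} ntfs-3g {options} 0 0"


if __name__ == "__main__":
    print("UEFI boot:", booted_in_uefi())
    # Placeholder UUID; look up the real one with `sudo blkid`.
    print(fstab_line("0123456789ABCDEF", "/mnt/shared"))
```

The script only prints the line. You would still need to create the mount point and add the line to /etc/fstab yourself, with root privileges.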
Enjoy!

P.S. Does anyone know when I'd ever want to use Legacy Boot? According to everything I've read it doesn't seem like I ever need it for anything. Why is it there?

Sunday, May 01, 2016

Myth: "CS Researchers Don't Publish Code or Data"

A collaboration with Sam Tobin-Hochstadt, Assistant Professor at Indiana University.

There has been some buzz on social media about this "Extremely Angry" Twitter thread. Mike Hoye, Engineering Community Manager for Firefox at Mozilla, expressed frustration about getting access to the products of research. It turns out that many other people are angry about this too.

While there are certainly legitimate aspects to these complaints, we’d like to address a specific misperception from this Twitter thread: the claim that "CS researchers don't publish code or data." The data simply shows this is not true.

First of all, while the Repeatability in Computer Science study from a few years ago highlighted some issues with reproducibility in our field, it revealed that a significant fraction of researchers (226 out of 402) in systems conferences have code available either directly linked from the paper, or on request.

Additionally, in the last few years, conferences in Programming Languages and Software Engineering have been pushing for more standardization of code-sharing and repeatability of results through Artifact Evaluation Committees. There is a comprehensive summary of Artifact Evaluation in our field here. (In fact, Jean is co-chairing the POPL 2017 AEC with Stephen Chong.) According to the site, artifacts are evaluated according to the following criteria:
  • Consistent with the paper. Does the artifact substantiate and help to reproduce the claims in the paper?
  • Complete. What is the fraction of the results that can be reproduced?
  • Well documented. Does the artifact describe and demonstrate how to apply the presented method to a new input?
  • Easy to reuse. How easy is it to reuse the provided artifact? 
The most detailed documentation is associated with the AEC for OOPSLA 2013, where 50 papers were accepted, 18 artifacts passed evaluation, and 3 artifacts were rejected. For PLDI 2014, 20 of 50 papers submitted artifacts and 12 passed. By PLDI 2015, 27 papers (out of 52) had approved artifacts. Even POPL, the “theoretical” PL conference, had 21 papers with approved artifacts by 2016.
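
To put these counts on a common scale, here is a rough sketch that converts them to percentages. It uses only the numbers quoted above. Treating the paper count as the denominator for every venue is our simplifying assumption, and the names AEC_COUNTS, pass_rate, and RATES are made up for this example.

```python
# Counts quoted in the post: (venue, papers, artifacts that passed evaluation).
# Using "papers" as the denominator for every venue is an assumption.
AEC_COUNTS = [
    ("OOPSLA 2013", 50, 18),
    ("PLDI 2014", 50, 12),
    ("PLDI 2015", 52, 27),
]


def pass_rate(papers: int, passed: int) -> float:
    """Percentage of papers with an approved artifact, to one decimal place."""
    return round(100.0 * passed / papers, 1)


RATES = {venue: pass_rate(papers, passed) for venue, papers, passed in AEC_COUNTS}

if __name__ == "__main__":
    for venue, rate in RATES.items():
        print(f"{venue}: {rate}% of papers had an approved artifact")
```

These are different conferences with different submission rules, so the percentages are a rough guide and not a controlled comparison.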

For those wondering why more artifacts are not passing yet, here is a transcribed discussion by Edward Yang from PLDI 2014. The biggest takeaways are that 1) many people care about getting the community to share reproducible and reusable code and 2) it takes time to figure out the best ways to share research code. (That academia’s job is not to produce shippable products, as Sam pointed out on Twitter, is the subject of a longer conversation.)

While it’s going to take time for us to develop practices and standards that encourage reproducibility and reusability, we’ve already seen some improvements. Over the years, Artifact Evaluation has become more standardized and committees have moved towards asking researchers to package code in VMs if possible to ensure long-term reproducibility. Here are the latest instructions for authors.

Yes, we can always do better to push towards making all of our papers and code available and reusable. Yes, researchers can do better in helping bridge the communication gap between academia and industry--and this is something we've both worked at. But the evidence shows that the academic community is certainly sharing our code--and that we’ve been doing a better job of it each year.

Note: It would be really cool if someone did a survey of individual researchers. As Sam pointed out on Twitter, many of our colleagues use GitHub or other social version control and push their code even before the papers come out.

--

UPDATE! Here is a survey for academics to report on how we share code. Please fill it out so we can see what the numbers are like! Thanks to Emery Berger, Professor at UMass Amherst, for conducting the survey.

Related update. Some conversations with others reminded me that the times I haven't shared my code, it has been because I was collaborating with companies and corporate IP policies prevented me from sharing. (In fact, this was one of the reasons I preferred to stay in academia.) The survey above asks about this. I'm curious how the numbers come out.