Sunday, May 01, 2016

Myth: "CS Researchers Don't Publish Code or Data"

A collaboration with Sam Tobin-Hochstadt, Assistant Professor at Indiana University.

There has been some buzz on social media about this "Extremely Angry" Twitter thread. Mike Hoye, Engineering Community Manager for Firefox at Mozilla, expressed frustration about getting access to the products of research. It turns out that many other people are angry about this too.

While there are certainly legitimate aspects to these complaints, we’d like to address a specific misperception from this Twitter thread: the claim that "CS researchers don't publish code or data." The data simply shows this is not true.

First of all, while the Repeatability in Computer Science study from a few years ago highlighted some issues with reproducibility in our field, it revealed that a significant fraction of researchers (226 out of 402) in systems conferences have code available either directly linked from the paper, or on request.

Additionally, in the last few years, conferences in Programming Languages and Software Engineering have been pushing for more standardization of code-sharing and repeatability of results through Artifact Evaluation Committees. There is a comprehensive summary of Artifact Evaluation in our field here. (In fact, Jean is co-chairing the POPL 2017 AEC with Stephen Chong.) According to the site, artifacts are evaluated according to the following criteria:
  • Consistent with the paper. Does the artifact substantiate and help to reproduce the claims in the paper?
  • Complete. What is the fraction of the results that can be reproduced?
  • Well documented. Does the artifact describe and demonstrate how to apply the presented method to a new input?
  • Easy to reuse. How easy is it to reuse the provided artifact? 
The most detailed documentation is associated with the AEC for OOPSLA 2013, where 50 papers were accepted, 18 artifacts passed evaluation, and 3 artifacts were rejected. For PLDI 2014, 20 of 50 papers submitted artifacts and 12 passed. By PLDI 2015, 27 papers (out of 52) had approved artifacts. Even POPL, the “theoretical” PL conference, had 21 papers with approved artifacts by 2016.
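
To put the counts quoted in this post on a common scale, here is a minimal Python sketch that turns them into percentages. The labels and the `share` helper are our own illustration, not part of any AEC tooling. It also assumes that the 50 in the PLDI 2014 figure is the total number of papers, which is how we read the sentence above. POPL is left out because the post gives no total for it.

```python
# Counts quoted in this post: (label, papers with code or approved artifact, total)
# Assumption: for PLDI 2014 the total of 50 is read as the number of papers.
figures = [
    ("Repeatability study: code available", 226, 402),
    ("OOPSLA 2013: approved artifacts", 18, 50),
    ("PLDI 2014: approved artifacts", 12, 50),
    ("PLDI 2015: approved artifacts", 27, 52),
]


def share(count, total):
    """Return count/total as a percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for label, count, total in figures:
    print(f"{label}: {count}/{total} = {share(count, total)}%")
```

Note that the first row counts code that is available (linked or on request), while the other rows count artifacts that passed a committee's evaluation, so the rows are not directly comparable.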

For those wondering why more artifacts are not passing yet, here is a transcribed discussion by Edward Yang from PLDI 2014. The biggest takeaways are that 1) many people care about getting the community to share reproducible and reusable code and 2) it takes time to figure out the best ways to share research code. (That academia’s job is not to produce shippable products, as Sam pointed out on Twitter, is the subject of a longer conversation.)

While it’s going to take time for us to develop practices and standards that encourage reproducibility and reusability, we’ve already seen some improvements. Over the years, Artifact Evaluation has become more standardized and committees have moved towards asking researchers to package code in VMs if possible to ensure long-term reproducibility. Here are the latest instructions for authors.

Yes, we can always do better to push towards making all of our papers and code available and reusable. Yes, researchers can do better in helping bridge the communication gap between academia and industry--and this is something we've both worked at. But the evidence shows that the academic community is certainly sharing our code--and that we’ve been doing a better job of it each year.

Note: It would be really cool if someone did a survey of individual researchers. As Sam pointed out on Twitter, many of our colleagues use GitHub or other social version control and push their code even before the papers come out.

--

UPDATE! Here is a survey for academics to report on how we share code. Please fill it out so we can see what the numbers are like! Thanks to Emery Berger, Professor at UMass Amherst, for conducting the survey.

Related update. Some conversations with others reminded me that the times I haven't shared my code, it has been because I was collaborating with companies and corporate IP policies prevented me from sharing. (In fact, this was one of the reasons I preferred to stay in academia.) The survey above asks about this. I'm curious how the numbers come out.

88 comments:

Tragic Cynic said...

To paraphrase what you've said:
> 50% of 'public research' have either:
> public examples of reproducible experiments (ie. code)
> or code on request. (good luck contacting the dead! :P)
>
> we need a committee to make sure experiments are repeatable
>
> people complain because our experiments (code) cannot be even attempted (won't run).
> it's hard to make reproducible experiments. But this is not that bad, because (direct quote)
> > "That academia’s job is not to produce shippable
> > products..." (Last sentence, paragraph 6)
> yes, we are bad at this. But we are improving year by year. (no hurry :<} )

This paraphrasing, of course, assumes that being able to run code is evidence for the thesis at hand.

When you guys:
Stop taking government checks
Make lectures in shorts & thongs, like Carmack (maybe chub&tuck as well?)
keep all your tax-payer(industry) funded findings public
By all that, I mean not doing this
Or DMCA claiming a student's homework for using your code libraries rather than dealing with the issue on-campus, asking for a library gitignore, etc...

Then,
and only then would I consider your expensive existence ( To those who fund you, through tax or fee ) a boon.

Anonymous said...

> This paraphrasing, of course, assumes that being able to run code is evidence for the thesis at hand.

Indeed, for a huge portion of CS research providing a runnable piece of code is not a priority.

> Make lectures in shorts & thongs, like Carmack (maybe chub&tuck as well?)

Is this a joke?

> keep all your tax-payer(industry) funded findings public

Believe me, most of the researchers support Open Access models.

> and only then would I consider your expensive existence ( To those who fund you, through tax or fee ) a boon.

So let me be clear, unless a researcher produces a runnable piece of code, you consider their research useless?

Unknown said...

First, don't confuse scientists with those suing Swartz — those were scientific publishers. I didn't know about MIT, but I don't support their decision. We'd like to get rid of scientific publishers, but there's lots of inertia, even though we don't get one penny out of what one might pay to read an article. Yet, it's happening.

Second, the device you're typing on wouldn't exist without research, and some of that research is too expensive for companies to fund — so it's either government-funded or won't happen. Even this functional programming thing that Carmack now loves talking about exists thanks to researchers (who've been at it for >50 years), not thanks to the companies who've ignored it.

Grigori Fursin said...

As an artifact evaluation chair from CGO and PPoPP (http://cTuning.org/ae), I can mention that the major problem is not about code and data sharing, but about sharing artifacts as reusable and customizable components. This would allow researchers not only to validate current claims, but also to try techniques in a different environment and with different parameters.

However, after surveying authors, we noticed that there is simply no supporting technology for that. This motivated us to develop an open-source framework (Collective Knowledge) to let researchers share their artifacts as Python components with JSON API and meta information, and even to crowdsource empirical experiments (we focus mainly on application performance analysis and optimization in computer systems' research):
* http://github.com/ctuning/ck

We also promote a new publication model where articles and artifacts are submitted to an open archive at the time of submission, while all reviews are public (i.e. it's a cooperative effort to validate and improve shared code, data and results rather than just shaming and rejecting problematic ones). We tried it at our ADAPT workshop this year, and it was very successful:
* http://adapt-workshop.org
* http://arxiv.org/abs/1406.4020

Hope it will help address some of the raised issues!

Vicky Steeves said...

So there's a big difference between reproducibility & making code and data available. While open access is a big ingredient in reproducibility, there's the problem of "dependency hell." This basically amounts to the crazy amount of dependencies required to successfully rerun and reuse someone else's code & data -- for instance, if you are on a different OS, if you have a different version of a software library, if you don't document what you are doing (publishing a bunch of CSVs and scripts does nothing to help people reproduce things), then your research won't be reproducible. While there's great OA in CS, there's not great reproducibility.
