Sunday, December 09, 2012

Treat Yo Self: Clean Up Your Code

A couple years ago, I discovered what I thought was the shortcut to building research systems.  Forget good software engineering practice!  Forget functional abstraction!  Copy-and-paste code all over the place; modify it to fit your needs.  Thinking before coding?  So college.  After all, premature optimization is the root of all evil.

In the beginning, this worked out well.  In my first year, even my advisor told me he was impressed with how quickly I got things working.  I saw in other research code the same patterns that I was learning to adopt: monstrous tangles of functionality with scant documentation.  I have found the secret to research productivity, I thought.

A couple deadlines later, I began to feel the consequences of my actions.  Pre-deadline, systems would begin to fall apart: a patch here revealed another hole there.  Post-deadline, I had no desire to go back to disgusting soups of one-off functions, barely usable in the first place and certainly not reusable.  I spent much of my time either avoiding my code or writing replacement code from scratch.

In other researchers, I observed that the few who had designed their systems well were able to make quick bug fixes and extensions.  For everyone else, either their code bit-rotted in obscurity or they became slaves to maintaining their systems for barely-satisfied users.  During paper deadlines, those who had built up good infrastructure could build comfortably on previous work while others ran around fighting fires and despairing.

During the course of grad school, my relationship with my code has become increasingly important.  Clean, modular, and well-documented code (with tests!) is not only less likely to have bugs but will be useful for longer.  Clean code provides a solid foundation for you and potentially other researchers.  Modular code makes it easier to reuse parts of your code. Also, knowing exactly what your code does just feels good.
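
A minimal sketch of the idea, assuming a hypothetical helper called normalize (the name and the data are invented for illustration, not taken from any real research system): instead of pasting the same scaling loop into every experiment script, the logic lives in one documented function with a few edge-case tests.

```python
def normalize(scores):
    """Scale a list of numbers so that they sum to 1.

    Returns an empty list for empty input and raises ValueError
    when the total is zero, since no scaling is defined in that case.
    """
    if not scores:
        return []
    total = sum(scores)
    if total == 0:
        raise ValueError("cannot normalize scores that sum to zero")
    return [s / total for s in scores]


def test_normalize():
    # Edge cases first: empty input and an all-zero input.
    assert normalize([]) == []
    try:
        normalize([0, 0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a zero total")
    # Ordinary inputs.
    assert normalize([2, 2]) == [0.5, 0.5]
    assert normalize([1, 3]) == [0.25, 0.75]


if __name__ == "__main__":
    test_normalize()
```

Every script that needs this behavior can import the one function, and a fix made there reaches all of them at once.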

On his blog, Harvard professor Michael Mitzenmacher advises graduate students to take a day every now and then to find better tools: for organizing papers read, for recording ideas and progress, etc.  Rewriting and refactoring code has become an important part of these activities for me.  Not only is refactoring useful, but it is also a relatively low-effort way to achieve a feeling of progress*.  For me, refactoring has become a treat for working hard.

Don't tell too many people now, but I think this is the real secret to research productivity.

* The importance of the feeling of progress is a topic worthy of its own blog post.


prabhasp said...

Question: do you write tests?
Before or after you write code? Is this different depending on whether it's refactor mode or original make-shit-work mode?

Jean said...

I am not as good about tests as I would like to be. I usually write a few tests for a couple of edge cases in the beginning but they don't become thorough until refactor mode.

What about you?

Tom said...

I spent my first year on a project where I neglected software engineering practices. It didn't take long for me to regret it for all the reasons you describe. Eventually I declared bankruptcy, created a new git repository, and recreated the project by copying code from the old project, but taking care to modularize and clean things up in the process. It helped a lot, so I took it one step further over the summer.

On my next project I had an aggressive prototype-every-other-week schedule that I maintained by starting every prototype from scratch, but reusing code from previous prototypes by copying and cleaning up the good parts. I was able to work quickly because most of the time, most of the code was well-designed. It even looks like a significant amount of the code is going to be reused by another researcher in their project. Paying technical debt as you go is absolutely the right thing to do.

Jeremy H said...

"Not only is refactoring useful, but it is also a relatively low-effort way to achieve a feeling of progress."

This is one of the main reasons I dropped out of (theory) grad school and switched to industry.

These days, I use refactoring to jumpstart my brain when I'm having an off day. That way, I never fall into a slump and the "expected productivity" for a day can't dip (too much :)).

prabhasp said...

@Jean: I've recently come around to TDD, and try to force myself to do it when I write code, the little I do. It helps make sure I don't write code that I can't use later on, and sometimes even helps during the development process.
