Wednesday, October 15, 2008

The JY Intro CS curriculum

I have long been of the opinion that assembly (or even bits) is the best first programming language to teach, and I have long been appalled at the practice of teaching Python to poor naive starting programmers, but recent events* have caused me to change my opinion. The introductory curriculum I propose for people who have had little to no programming experience is:
  • Introduction to Python via requiring students to write something making good use of Python's very nice libraries for doing nasty things, such as I/O, regular expressions, and interacting with the WWW.
  • Introduction to why Python is not great by requiring students to write something in Python that is very slow for large sizes. Maybe implement a minimum spanning tree algorithm or something like that.
  • Introduction to C! We will write the same thing in C. After a couple of days of pointer chasing and pulling hair out, we will emerge as better, stronger programmers. Now we can run our programs 1000 times in the time it took to run our last program once!
  • Introduction to the necessary evil of object-oriented programming through revealing some difficulties of practicing good data abstraction in C. Maybe have people write something like a binary tree in C and have them try to make their library be more opaque about data members. (So OO programming is not actually necessary, but might be a necessary evil in present-day CS education for practical reasons.)
  • Now we program with actually cool languages that do things right. Haskell, anyone? (This might be the bonus extra part of the class that might happen after the semester ends.)
When I am a professor, students will have evolved enough that such a curriculum will not be considered ambitious at all. ;)
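
As a rough sketch of what the step-two exercise might look like, here is Prim's algorithm for a minimum spanning tree in plain Python. The graph format (a dict of adjacency lists), the function name, and the tiny example graph are all assumptions made for illustration, not a prescribed assignment. On four nodes it runs instantly; the point of the exercise would be to feed it a graph large enough that the interpreter overhead starts to hurt.

```python
import heapq


def minimum_spanning_tree(graph, start):
    """Prim's algorithm.

    graph: dict mapping node -> list of (weight, neighbor) pairs (undirected,
           so each edge appears in both adjacency lists).
    Returns (total_weight, list of (from_node, to_node, weight) tree edges).
    """
    visited = {start}
    # Heap entries are (weight, from_node, to_node); the cheapest edge pops first.
    frontier = [(weight, start, neighbor) for weight, neighbor in graph[start]]
    heapq.heapify(frontier)

    tree_edges = []
    total_weight = 0
    while frontier and len(visited) < len(graph):
        weight, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # this edge would close a cycle, so skip it
        visited.add(v)
        tree_edges.append((u, v, weight))
        total_weight += weight
        for next_weight, x in graph[v]:
            if x not in visited:
                heapq.heappush(frontier, (next_weight, v, x))
    return total_weight, tree_edges


# Hypothetical four-node example graph.
example_graph = {
    "a": [(1, "b"), (4, "c")],
    "b": [(1, "a"), (2, "c"), (6, "d")],
    "c": [(4, "a"), (2, "b"), (3, "d")],
    "d": [(6, "b"), (3, "c")],
}

total_weight, edges = minimum_spanning_tree(example_graph, "a")
print(total_weight, edges)
```

Porting the same function to C (step three) means replacing the dict, the set, and the heap with hand-built structures, which is where the pointer chasing comes in.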

Python, despite the fact that it is incredibly unprincipled, is a decent first language because it gives you great mileage. You can go very far with Python in a week; you can go very far with Python with a few lines of code. (Within a couple of days, Adam was writing code that logged onto password-protected sites and parsed the HTML for keywords to download specific files.) Starting with Python makes people excited about their potential power and shows them that tools can be powerful. It is, however, bad to stay in Pythonland for too long because Python enforces no good programming practices. (You get too far being a bad programmer and so are in danger of never becoming a good programmer.)
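
To give a feel for the "few lines of code" claim, here is a minimal sketch of the parsing half of a script like the one described above, using only the standard library. It is not Adam's actual code: the class name, function name, sample page, and keyword are hypothetical, and logging in and downloading are left out so the example runs without a network connection.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_downloads(html, keyword_pattern):
    """Return hrefs whose link text matches keyword_pattern (case-insensitive)."""
    parser = LinkCollector()
    parser.feed(html)
    pattern = re.compile(keyword_pattern, re.IGNORECASE)
    return [href for href, text in parser.links if pattern.search(text)]


# Hypothetical page content standing in for a real, fetched page.
sample_page = """
<html><body>
<a href="/files/q3_report.csv">Quarterly report (Q3)</a>
<a href="/about.html">About us</a>
<a href="/files/q4_report.csv">Quarterly REPORT (Q4)</a>
</body></html>
"""

matches = find_downloads(sample_page, r"report")
print(matches)
```

A real version would fetch the page first (for instance with `urllib.request`) and handle authentication, which is site-specific and omitted here.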

Why would you want to become a good programmer? Besides the obvious reasons ("you will get much further when you build large projects", etc.), you can't live in Pythonland forever unless you plan to live forever (and have deadlines extend forever). Though Python has nice foreign function interfaces to C, there are times when you will want to write your own C. One of my favorite conversations demonstrating my power from a good choice of language (in discussing some small simulation to solve a problem in a randomized algorithms course):
  • Naive friend: Does your thing take a really long time to run for n=10,000?
  • Powerful Jean: Um yeah. It takes a whole minute, maybe?
  • NF: Oh. Mine has been running since before dinner.
  • PJ: Haha. Should have used C.

Why I previously did not believe in any Python at all. Programming abstractions are not opaque, so many things in programming make much more sense when one understands why they are the way they are. (For instance, the difference between a linked list and an array, and the reason why linked lists exist at all, makes more sense when you have a feel for memory and issues regarding contiguous memory.) It doesn't make a lot of sense to teach programming with broken tools unless 1) students understand why the tools are broken (otherwise things will seem arbitrary to them) and 2) you are getting some mileage out of the tools. (I see bits as the least broken abstraction, and then assembly, etc.) One of my first languages was Java, one of the most broken languages of all time, and things made little sense to me until I learned C.

Note. Learning to program is not the same thing as learning to think. Learning to program is a process that involves getting to know one's tools. Learning to think is a process that involves developing one's mind so that one can make good use of tools. Thinking models are, unfortunately, not yet the same as the tools we currently have, so as of now these two sorts of things should be (initially) taught separately and in parallel. As of now I lump really high-level things like Scheme with the high-level thinking stuff because 1) Scheme is so simple and nice (has only one main rule and is unityped) that it is great for teaching reasoning about recursion, continuations, and other such things and 2) Scheme isn't the best for helping people develop great programming practices (modularity, incremental testing, good abstractions, etc.).

*I have been advising my boyfriend, Adam, an economics consultant, in learning to program. So far he has written a lot of useful Python to automate his work and he has written a binary tree in C. He is the coolest beginning C programmer around because I showed him gdb, valgrind, and gprof.


Jesse Tov said...

Wow, I couldn't disagree with you more. As you accurately point out, Python does not encourage good programming. The problem with people starting in Python is that from the beginning they learn bad style, and they will tend to continue with Python. It's a gross, unprincipled language designed by a guy that says foldl is too hard to understand. Ruby and Perl have all the advantages of Python without the stupidity—though I wouldn't start an intro class with them, either.

If your Python isn't fast enough, you should 1) blame Python's implementers for not making it fast enough, and 2) not switch to C, of all things. There's no good reason Python can't be implemented with an optimizing compiler, and then it would at least be competitive with Java for speed, but they don't know how.

More to the point, why use an unsafe, pain-in-the-ass language like C if there are safe languages that will run just as quickly? We may still need C for a while for writing device drivers and garbage collectors, but if you're writing general purpose code that needs to run quickly, you're far better off with MLton (SML) or OCaml. These are safe languages with good abstractions and unparalleled programming-in-the-large features, and it's not hard to write code in them that runs just as fast.

I also think you unfairly malign Scheme. The recent Scheme standard (R6RS) provides a very advanced module system and a real record system, which together mean that Scheme now supports data abstraction. (It already did, but you had to hide everything in a closure—no longer.) There are also several Scheme compilers (Ikarus, Larceny, and Chez Scheme) that produce very fast code, not quite competitive with C, but kicking the pants off Java.

What would I start my intro course in? Not Scheme, because it's too complicated; not Python, because it's too stupid; not C, because it introduces too many irrelevant concerns. I am now fully convinced by the HTDP approach, which uses a series of small, Scheme-like languages to teach programming. The HTDP approach teaches a programming methodology that has students writing good, elegant code within the first couple of weeks, and they learn to develop their programs from scratch, rather than filling things in. By the third week, HTDP students can design their own data structures, use structural recursion to traverse them, and are writing purely-functional animations. By learning first in a small teaching language, they learn programming techniques that they can use in whatever language they choose, rather than getting bogged down in language quirks and a specific library. I am amazed by how well it works in teaching almost any college freshman to write good programs.

It's true that they come out of the course knowing no real programming language, but that's hardly a problem. They pick up real Scheme and Java very quickly, and I'm sure they could learn something like ML with no trouble.

jxyz said...

Scheme is so nice that it does not enforce good programming habits. I want to teach people to program with a language that requires programmer discipline to produce readable, correct code. The other stuff comes later, or in the "how to think" section. You can't get very far as a programmer without being forced to face ugliness, because in life you will have to. This is the unfortunate truth.

Also, notice that you did not say "they will pick up C quickly." Most (programming, serious programming) people will require C at some point in their lives. Also, every job interview I've had has tested me extensively on my understanding of pointers, memory efficiency, and things like that.

pg said...

my main gripe with your elevated focus on teaching people how to write high-performance code (at the expense of increased development/debugging time) is that nowadays even people who are good programmers tend to regularly use high-level languages to interface with modules that are specially implemented for high performance. e.g., for doing scientific data processing, my friends in computational bio/chem use python to manage and manipulate their data then use some API for some heavy-duty machine learning package (that's probably implemented in C, fortran, or whatever, and compiled to run on multi-core or GPUs or whatever) ... so my argument is that most programmers nowadays (unless you're working as a SYSTEMS engineer) don't need to program high-performance stuff ... it's no longer n00bs that write in high-level dynamically-typed languages - expert programmers do too.

of course, to your credit, these expert programmers all know C very well and understand what goes on 'under the hood', so they are able to better utilize high-level languages.

(arggg i hate typing anything more than a sentence into these text boxes!)