Tuesday, April 01, 2014

Run Your Research Demo Site on the Cloud

Last week, Travis Hance and I spent hours wading through the many blog posts of the internet to figure out how to set up a simple website on Amazon EC2 using our Jeeves language, which runs on Python and C++. Because we want to spare you this trouble, we put together this definitive* post for people who want to run the simplest possible research demo site on Amazon EC2. We cover the following:
  1. How to set up an Amazon EC2 instance and SSH to it (to then install and configure whatever you like).
  2. How to set up and configure an Apache web server on your Amazon EC2 instance.
  3. How to set up your database and what to do if you want to host your own database on your Amazon EC2 instance.
  4. How to configure virtual hosts on your Apache web server if you want to use the same server to host different projects on different subdomains.
This post assumes you have experience using Django and testing things on your local machine. We're using Django 1.6.2 with Apache 2.4.9. These instructions are tailored for an Ubuntu instance, but they probably generalize to other Linux distributions as well.

Is Amazon EC2 for me?

The first thing to do is to determine whether you need to run your own EC2 server. Amazon's Elastic Compute Cloud (EC2) gives you elastic compute in the cloud. The biggest win is you can easily change how much capacity you have with minimal friction. It's also just a nice way to host servers without managing your own physical machines.

If you just need vanilla Django hosting, then you should probably find some other hosting service that can manage things for you. In our case, we wanted to use the Z3 SMT solver, which is written in C++, so we needed to run our own server.

Fellow CSAILers may be interested to learn that I have also set up a mirror site on our department's OpenStack cloud. This is free for people in our department and is useful if you don't need permanent cloud data storage.

How to set up a cloud instance.

Once you decide you want to set things up on EC2, it's pretty easy to get started. As of the time we signed up, there was a free Linux tier that gives you 750 hours per month at no cost. Amazon recently announced further price cuts, so the situation may be even more exciting by now. To set up your own Amazon EC2 server, sign up here and follow the instructions for launching a new instance.

SSHing to your EC2 instance.

In order to SSH to your instance, you will need to set the permissions of your server to allow this. You can do this by going to your EC2 management console and adding your IP address (or all IP addresses if you want to live on the edge) to the "Inbound" list of allowed SSH addresses.

You'll also have to use an RSA key, which you should have generated sometime during the setup. Go to the "Instances" tab under your console to get the public DNS name. Then you can SSH to your instance:

ssh -i [location of your RSA private key] [username]@[public DNS name]

For Ubuntu instances, the username is "ubuntu".
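
One common stumbling block here is that ssh refuses a private key file that other users can read. As a rough sketch (the function names are made up for illustration, not part of any tool), the following Python checks the key's permissions and assembles the same command shown above:

```python
import os
import stat


def key_is_private(key_path):
    """Return True if only the file's owner has any permissions on the key."""
    mode = os.stat(key_path).st_mode
    # Any permission bit set for "group" or "other" makes the key too open.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def build_ssh_command(key_path, username, public_dns):
    """Build the argument list for: ssh -i KEY USER@HOST."""
    return ["ssh", "-i", key_path, "{0}@{1}".format(username, public_dns)]
```

If key_is_private returns False for your key, running chmod 400 on the key file is the usual fix. The list from build_ssh_command can be passed to subprocess.call if you want to script the login.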


Installing software.

Congratulations! You now have root access on an EC2 instance. You have the freedom to install software the way you would on any other machine. You can check out a copy of your code, as well as everything you need to run it, this way.

How to run a web server.

We'll be describing how to use the Apache HTTP web server for serving websites off your machine. To run your server, first install Apache and the WSGI (Web Server Gateway Interface) module for interfacing with Python programs.

sudo apt-get install apache2 libapache2-mod-wsgi

Once you have done this, you should be able to edit the Apache configuration file at /etc/apache2/apache2.conf. This file tells Apache how to serve your site, for instance by describing how URL paths should be resolved.

To make sure your Apache server knows about your demo project, first you'll want to set your Python path and alias your / path to wherever your WSGI configuration file is.

WSGIScriptAlias / /home/ubuntu/srv/testproject/testproject/wsgi.py
WSGIPythonPath /home/ubuntu/srv/testproject
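
Before pointing WSGIScriptAlias at your Django project's wsgi.py, it can help to confirm that Apache and mod_wsgi are wired up at all. Here is a minimal sketch of a stand-alone WSGI script for that purpose. It is not the wsgi.py that Django generates, and the greeting text is arbitrary. The only fixed convention is that mod_wsgi looks for a callable named application.

```python
def application(environ, start_response):
    """Smallest possible WSGI app: always answers with a plain-text greeting."""
    body = b"mod_wsgi is working\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    # start_response sends the status line and headers back to the server.
    start_response("200 OK", headers)
    # A WSGI app returns an iterable of byte strings.
    return [body]
```

If you temporarily point WSGIScriptAlias at a file containing this and see the greeting in your browser, any remaining problems are in the Django configuration rather than in Apache.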

You'll also want to add "Alias" entries for the static/ and media/ directories:

Alias /static /home/ubuntu/code/jeeves/demo/conf/static
Alias /media /home/ubuntu/code/jeeves/demo/conf/media

Finally, you'll want to add a "Directory" entry to set the permissions for the directory where you'll be serving your Python files from.

<Directory /usr/share>
  AllowOverride None
  Require all granted
</Directory>

To put these changes into effect, restart your Apache server:

sudo /etc/init.d/apache2 restart

You'll also want to change the ownership of your static/ and media/ directories so that they belong to the www-data user and group, which Apache runs as on Ubuntu.

sudo chown -R www-data:www-data path/to/static/
sudo chown -R www-data:www-data path/to/media/

Now everything should work! Go to your hostname in the browser and see for yourself. Okay, so it is likely that there were some configuration errors and you get a "Bad request" or other error. When this happens, it is helpful to check your Apache error log, which can be found in /var/log/apache2/error.log.
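
When debugging, you usually only care about the most recent entries in that log. Here is a small sketch of a helper that returns the last few lines of a file. The function name and the default of 20 lines are arbitrary choices.

```python
from collections import deque


def tail(path, count=20):
    """Return the last `count` lines of the file at `path`."""
    with open(path, "r", errors="replace") as log:
        # A deque with maxlen keeps only the newest `count` lines in memory.
        return [line.rstrip("\n") for line in deque(log, maxlen=count)]
```

Calling tail("/var/log/apache2/error.log") and printing each line shows the newest errors. Depending on the log's permissions, you may need to run the script with sudo.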

Oh, and for Apache configuration files: a gotcha is that order matters, so for redirects you should put the most specific first and the most general last. A consequence of this gotcha is that if you have aliased '/' and you already have a Directory entry for '/', you need to move this to be after the Directory entry for the directory aliased to '/'.
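
To see why order matters, here is a toy model of "first matching rule wins." It is only an illustration and not how Apache is implemented, and the /srv paths are made up.

```python
def resolve(url_path, rules):
    """Return the file path chosen by the first rule whose prefix matches."""
    for prefix, target in rules:
        if url_path.startswith(prefix):
            # Swap the matched prefix for the rule's target directory.
            return target + url_path[len(prefix):]
    return None


# Hypothetical rule lists: same rules, different order.
specific_first = [("/static", "/srv/static"), ("/", "/srv/app/")]
general_first = [("/", "/srv/app/"), ("/static", "/srv/static")]

print(resolve("/static/site.css", specific_first))  # /srv/static/site.css
print(resolve("/static/site.css", general_first))   # /srv/app/static/site.css
```

With the general '/' rule first, the more specific /static rule is never reached, which is the situation the gotcha describes.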


Setting up your database.

If your Django application uses a database, you'll want to hook that up as well. Django has pretty good documentation for how to edit your settings.py for the database of your choice. You may need to install Python-specific libraries for interfacing with these databases. For instance, for MySQL you will want to install the python-mysqldb Ubuntu package. Once you have configured your database settings, running "syncdb" will set up your tables:

python manage.py syncdb
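
For reference, here is roughly what the database section of settings.py looks like for MySQL. The ENGINE string is Django's built-in MySQL backend. The database name, user, and password are placeholders to replace with your own.

```python
# Sketch of the DATABASES setting in settings.py for a MySQL database.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "demo_db",        # placeholder database name
        "USER": "demo_user",      # placeholder MySQL user
        "PASSWORD": "change-me",  # placeholder; don't commit a real password
        "HOST": "localhost",      # database hosted on the same machine
        "PORT": "",               # empty string means the default port
    }
}
```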

We found that our site ran much faster if we hosted the database locally. We followed the standard instructions for installing and running a MySQL database. For those who have never done this before, here is what you should expect to do:
  1. Install MySQL server.
  2. Configure your server by, for instance, setting a password for the root user.
  3. Start your MySQL server.
  4. Create a new MySQL database for use by your web application.
EC2-related: if you want to be able to access your database from other hosts (for instance, to back up your database from elsewhere), you will need to add a MySQL entry (port 3306 by default) to your instance's inbound security settings permitting access from the allowed IP address(es).


Getting ready for production.

Now you are ready to go! For your website to look the most professional, you will want to set DEBUG = False in your settings.py file. Once you do this, you will need to make sure the ALLOWED_HOSTS list includes your domain. An easy way to do this is to add the host '*' to the list, but that turns off Django's Host header validation, so listing your actual domain names is the safer choice.

And make sure the secret key you use in production is secret! 
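
One way to keep the key out of your repository is to read it from a file that lives only on the server. Here is a sketch of the relevant part of settings.py. The helper name, the file path, and example.com are all made up for illustration.

```python
def read_secret_key(path):
    """Read the secret key from a file kept outside version control."""
    with open(path) as key_file:
        key = key_file.read().strip()
    if not key:
        raise ValueError("secret key file %s is empty" % path)
    return key


DEBUG = False

# Placeholder domain; a leading dot also matches its subdomains.
ALLOWED_HOSTS = [".example.com"]

# In a real settings.py you would then write something like:
# SECRET_KEY = read_secret_key("/etc/demo/secret_key.txt")  # hypothetical path
```

The file needs to be readable by the user Apache runs as (www-data on Ubuntu) and by nobody who shouldn't see it.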


How to host multiple projects on one server.

You might want to serve multiple demos, each with its own Django project. There are a couple of ways to do this. One is to do the appropriate aliasing in your Apache configuration file for different subdirectories (for instance, example.com/project1). If you go this route, you will have to make sure your redirects, includes, etc. point to the right place.

Another option, the one we took, is to use virtual hosts to put each project on its own subdomain. Here is how to add each new virtual host:
  1. Add a VirtualHost entry to your /etc/apache2/sites-available/[site name].conf file. For the main site the file is 000-default.conf.
  2. Enable this site:
    sudo a2ensite [site name]
  3. Reload your Apache configuration:
    sudo /etc/init.d/apache2 reload
Here is my VirtualHost configuration for jconf.jeeves.csail.mit.edu that lives in my /etc/apache2/sites-available/jconf.jeeves.csail.mit.edu.conf file. This post is getting long so I'm getting too lazy to explain all the parts, but you can see how I'm specifying paths, aliases, listening on port 80, and all that good stuff.

<VirtualHost *:80>
    ServerName jconf.jeeves.csail.mit.edu
    DocumentRoot /home/ubuntu/code/jeeves/demo/conf

    WSGIDaemonProcess jconf processes=5 threads=1
    WSGIScriptAlias / /home/ubuntu/code/jeeves/demo/conf/wsgi.py
    ErrorLog /var/log/apache2/jconf-error.log

    Alias /static /home/ubuntu/code/jeeves/demo/conf/static
    Alias /media /home/ubuntu/code/jeeves/demo/conf/media
    Alias /logs /home/ubuntu/code/jeeves/demo/conf/logs

    <Directory /home/ubuntu/code/jeeves/demo/conf>
      <Files wsgi.py>
        Order deny,allow
        Allow from all
      </Files>
    </Directory>
</VirtualHost>
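
If you end up with several demos, the per-site files differ only in a few names. Here is a hedged sketch of generating them from a template. It is a simplified variant rather than the exact configuration above: it uses the Apache 2.4 "Require all granted" syntax, and the function name and example values are made up.

```python
VHOST_TEMPLATE = """<VirtualHost *:80>
    ServerName {server_name}
    DocumentRoot {project_dir}

    WSGIScriptAlias / {project_dir}/wsgi.py
    ErrorLog /var/log/apache2/{short_name}-error.log

    Alias /static {project_dir}/static
    Alias /media {project_dir}/media

    <Directory {project_dir}>
      <Files wsgi.py>
        Require all granted
      </Files>
    </Directory>
</VirtualHost>
"""


def render_vhost(server_name, project_dir, short_name):
    """Fill in the template for one project and return the config text."""
    return VHOST_TEMPLATE.format(
        server_name=server_name,
        project_dir=project_dir,
        short_name=short_name,
    )
```

For example, render_vhost("demo.example.com", "/home/ubuntu/code/demo", "demo") returns text you could save as /etc/apache2/sites-available/demo.example.com.conf before enabling the site.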

Note that if you want things to run on subdomains, 1) you will need to use your own domain (rather than Amazon EC2's dynamically assigned DNS) and 2) you need to make sure you have DNS entries for the subdomains (you need to tell someone which IP addresses you would like for these subdomains to resolve to). There are instructions here about setting up your own domain name with EC2. Instructions for mapping subdomains will vary based on domain manager. (For CSAIL domains created with WebDNS, you can create subdomains by editing your hostname file and adding aliases for your subdomains.)
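
Once the DNS entries are in place, a quick check that each subdomain resolves to your instance can save some head-scratching. Here is a minimal sketch using Python's standard socket module. The function name is made up, and the hostnames and IP address in the usage note are placeholders.

```python
import socket


def check_subdomains(hostnames, expected_ip, resolve=socket.gethostbyname):
    """Map each hostname to True if it resolves to expected_ip, else False."""
    results = {}
    for name in hostnames:
        try:
            results[name] = (resolve(name) == expected_ip)
        except socket.gaierror:
            # The name does not resolve at all (yet).
            results[name] = False
    return results
```

For example, check_subdomains(["project1.example.com", "project2.example.com"], "203.0.113.10") returns a dictionary telling you which subdomains already point at that address. DNS changes can take a while to propagate, so a False right after editing your records is not necessarily an error.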


A final word.

There are a lot of details (version numbers; deprecation; death) involved with these web things, but it is so satisfying to get everything working. And if at first you don't succeed, try, try, try again.

* This claim is intended to be tongue-in-cheek. I had told Travis that there was so much misinformation on the internet that I wanted to write the definitive blog post. He laughed because this sentiment surely motivates every other post out there.