Monday, February 23, 2009

Make An Effort(s)

I wrote some simple code today: a batch job that counts the number of items in a queue, then emails the count to a couple of people. My first version sent a message like this:

There are 3 item(s) in the queue.

Then I smacked myself and dove back into the code. Now the message looks like:

There are 3 items in the queue.

or:

There is 1 item in the queue.

Maybe ten minutes to code and test, and I feel better. The item(s) dodge is lazy, and it looks so cheesy.
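
The fix is trivial in any language. My batch job isn't in Python, but a quick sketch of the idea (names are mine, obviously) looks like this:

    def queue_message(count):
        # Pick the right verb and noun instead of dodging with "item(s)".
        noun = "item" if count == 1 else "items"
        verb = "is" if count == 1 else "are"
        return f"There {verb} {count} {noun} in the queue."

    print(queue_message(3))  # There are 3 items in the queue.
    print(queue_message(1))  # There is 1 item in the queue.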

Friday, February 20, 2009

Lotus Domino - Sometimes Store And Forward Wins

In my day job, I babysit a fairly large Lotus Domino web application – call it ELF – at a very large financial services company. I do all the development and most of the production support.

Some parts of ELF work like this:
  1. A customer request comes in through the web front end. (ELF handles about a dozen different kinds of requests);

  2. The web server stuffs the request into a Domino database;

  3. A batch job moves the request to a relational database.

At first glance, step 2 seems redundant. Why slow things down? Why not just go directly to the relational database? After all, data redundancy is a bad thing. It can lead to discrepancies, it's wasteful, it's needless complexity.

All true, but this basic store-and-forward architecture has a real advantage: it's more robust than the typical web application (call it TWA). TWA looks something like this:
  • Web server running on one machine;

  • Application server, a separate piece of software, running as a separate process on the same machine, or on another machine;

  • Database server, a separate piece of software, almost certainly running on another machine.

That's three potential points of failure. Plus network risk. Maybe the application server is using web services to store the data – another potential point of failure. And there's configuration risk – more about that later.

Domino OTOH is a large, monolithic monster. It provides everything - the web server, the application server, the database server, even a batch scheduler. All in one server, on one machine. This eliminates a lot of the risks. If the web server is up, then the database server is up, and the web server can talk to it. If Domino can serve up pages, it can store the data that come back.

For ELF, this means independence from the back-end relational databases. If they crash, if they go off-line for maintenance, if there's a network problem, ELF doesn't care. It keeps accepting customer requests and storing them in Domino. Every couple of hours, it tries to reach the relational databases. If they're available, fine – it transfers the requests. If not, no worries, it tries again later. And the website stays up.
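
The transfer job doesn't need to be clever. ELF's lives inside Domino's batch scheduler and works through Domino documents, but the idea translates to anything. Here's a minimal sketch in Python, with an SQLite table standing in for the local store (the schema and function names are my own invention, not ELF's):

    import sqlite3

    def transfer_pending(queue_db_path, insert_into_backend):
        # insert_into_backend is whatever writes one request to the
        # relational database; it raises if the back end is unreachable.
        conn = sqlite3.connect(queue_db_path)
        rows = conn.execute(
            "SELECT id, payload FROM requests WHERE status = 'pending'"
        ).fetchall()
        for row_id, payload in rows:
            try:
                insert_into_backend(payload)
            except Exception:
                break  # back end is down; leave the rest for the next run
            # Mark a request done only after it lands in the back end.
            conn.execute(
                "UPDATE requests SET status = 'done' WHERE id = ?", (row_id,)
            )
            conn.commit()
        conn.close()

Run that every couple of hours and you get the behavior described above: if the back end is down, nothing is lost, and the next run picks up where this one left off.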

Here's a real-life example, from last week, that got me thinking about all this. I mentioned configuration risk. In TWA, the application server probably stores its database credentials – userid, password – in a secure configuration file. But what if some overzealous dolt disables the database userid? That kind of thing happens in big companies. And when it does, TWA stops working, till someone figures out what happened and gets it fixed. Again, in a big company, that can take a surprisingly long time.

But when this happened to ELF, it kept rolling along. We didn't even know there was a problem till the people on the relational database side noticed they weren't seeing any inserts. We looked into it and found the credentials problem. The userid had gone bad on a Friday, when I was out, and by the time we got it fixed, it was Tuesday - five days. All that time, ELF accepted customer requests and stored them in Domino. Once the credentials were fixed, it transferred all the requests. No data loss, no unhappy website users.

TWA can't load-balance around a problem like this. But let's reshape TWA a bit. Let's have the application server write all the front-end data into a local store, on the same machine. Maybe it uses the filesystem, maybe SQLite. Whatever's fast and reliable. After that, let some batch job deal with it. Any problems reaching the “real” datastore are now in the background. The application server can still capture the user's data and send good news back to the browser.
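
The capture side might look something like this - again a Python/SQLite sketch under my own assumptions, with a made-up table layout and names:

    import json
    import sqlite3

    def save_request(queue_db_path, request_data):
        # Capture one customer request into the local store. A batch job
        # (like transfer_pending above) forwards it to the real database.
        conn = sqlite3.connect(queue_db_path)
        conn.execute(
            """CREATE TABLE IF NOT EXISTS requests (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   payload TEXT NOT NULL,
                   status TEXT NOT NULL DEFAULT 'pending')"""
        )
        conn.execute(
            "INSERT INTO requests (payload) VALUES (?)",
            (json.dumps(request_data),),
        )
        conn.commit()  # once this returns, the request is safe on disk
        conn.close()
        # Now send the good news back to the browser - no back-end
        # round trip required.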

Obviously, if your customers are trading stocks, you don't want to do this. But if they're buying books or T-shirts, maybe you can get away with it. And it's a whole lot better than “Try again later.”