This is the science fiction that I thought the Pern stories should have been all along. It's fair enough that there is a build-up to this point, although it took a long time and involved a lot more lightweight fiction than I would have liked. This was a good book, and I enjoyed it.
[award: nominee hugo 1992]
A fresh cup mentions the Ruby on Rails exception notifier plugin. The idea is that every time an exception is raised in your code you get an email. This is such a horrible idea that I need to take the time to comment.
I spend all my time dealing with large deployments of software, and email is the worst way of reporting errors I can think of. Think about it:
- Email delivery is unreliable. A message could get queued on the reporting server, on a mail router on the network, or on your delivery server. Worse than that, it could get marked as spam, or randomly discarded.
- Email is expensive. There are two kinds of expense here. First, email needs to be written to disk reliably, which means you sync() when you write the mail to a destination or a queue. For some MTAs, this can mean several sync() calls per email as the mail moves between queues, and there can be more than one of these MTAs on the way to the final delivery target. Second, storing email at the destination is expensive: think of the backups, virus scanning, spam scanning, caching on clients and so forth.
- Email is wasteful. Think of all those headers for probably only a couple of lines of actual error to report.
- One email can result in many deliveries. To make the expense worse, your one email might get delivered many times. Think aliases, blackberries and mailing lists.
- Email is dumb. You won’t get a summary of repetition: if there are 10,000 errors, you’ll get 10,000 emails. Try wading through those trying to work out what went wrong.
- The email could result in other outages. Most MTAs are built to handle “normal workloads”. Delivering 10,000 errors could turn one outage into two by taking down the otherwise unrelated mail system as well.
- You probably don’t care anyway. An exception indicates a bug in your code which you don’t know how to handle. The server should just crash and restart at that point… There isn’t anything that the exception will really help with here. A bug should be reported instead for new instances of exceptions (you could automate this).
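Automating the "report a bug for new instances" step could look something like the sketch below. It is only an illustration: the `fingerprint` function and `NewErrorDetector` class are hypothetical names, and hashing the exception type plus its traceback locations is just one assumed way of deciding that two exceptions are "the same bug".

```python
import hashlib
import traceback


def fingerprint(exc):
    """Hash the exception type plus the code locations in its traceback.

    The message is deliberately left out (an assumption of this sketch) so
    that repeats of the same bug with different data collapse together.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    parts = [type(exc).__name__]
    parts += [f"{fr.filename}:{fr.name}:{fr.lineno}" for fr in frames]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


class NewErrorDetector:
    """Remembers fingerprints so only the first occurrence counts as new."""

    def __init__(self):
        self.seen = set()

    def is_new(self, exc):
        key = fingerprint(exc)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

An exception handler would call `is_new(exc)` and file a bug only when it returns `True`. A real deployment would need to keep the set of seen fingerprints somewhere that survives restarts.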
A better solution? Write log entries to disk when these exceptions are seen, and then provide summaries to developers and operations engineers, perhaps as email, perhaps as something else. Report on the rate of these errors and the number of errors seen recently. These summaries are much lower volume, so email would be an OK solution here.
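As a rough sketch of the log-then-summarize idea, the code below appends one JSON line per exception and then reports a count and a rate over a recent window. The function names `log_exception` and `summarize`, the JSON-lines format, and the one-hour default window are all assumptions made for illustration, not part of any particular plugin.

```python
import json
import time
from collections import Counter


def log_exception(log_path, exc, now=None):
    """Append one JSON line per exception instead of sending an email."""
    entry = {
        "ts": time.time() if now is None else now,
        "type": type(exc).__name__,
        "message": str(exc),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def summarize(log_path, window_seconds=3600, now=None):
    """Count recent errors by type and compute an overall rate per minute."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    counts = Counter()
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["ts"] >= cutoff:
                counts[entry["type"]] += 1
    total = sum(counts.values())
    return {
        "total": total,
        "per_minute": total / (window_seconds / 60),
        "by_type": dict(counts),
    }
```

The dictionary returned by `summarize` is small whether there were ten errors or ten thousand, so it is the sort of thing that could reasonably be mailed out on a schedule.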