Sunday, August 30, 2009

Snow Leopard -- First Impressions

I was planning on picking up Snow Leopard (Mac OS X 10.6) next weekend, but we were by the mall yesterday, so I went ahead and picked up a copy. My initial plan was to wait a week and let any major issues be discovered before I did my upgrades. Of course, I couldn't let it sit around for a week, and with multiple backups of my machines in place, I installed it last night. My initial experience and impressions are:
  • The upgrade process took 22 minutes and consisted of 3 or 4 mouse clicks and a reboot.
  • I have a number of SMB shares mounted from a server running Samba. I had to change the server's security setting from "share" to "user".
  • The upgrade performed by the installer overwrote my /etc/snmp/snmpd.conf file. This prevented any SNMP monitoring from working until I figured out what went wrong. (The installer also re-enables the OS X firewall even if it was disabled before -- probably a good decision.)
  • It seems that Image Capture has gotten significant improvements with 10.6 -- I haven't seen this called out in any of the other reviews. The UI is much more full-featured and even offers the option to delete images and videos from my iPhone. (If this was in the previous version, I hadn't noticed it.)
  • Boot time is noticeably faster.
  • I ended up saving about 10GB of disk space as part of the upgrade.
I've only played with Snow Leopard for a few hours, so I expect to find out more things as I get more time with it.

Friday, June 26, 2009

Getting mail on my iPhone

We got our iPhone 3GSs on launch day, and I spent part of the weekend getting mail working right. I did hit a few issues, but I have everything working acceptably now. This is all based on my previous post on configuring a mail proxy.

The issues I hit were:
  • I couldn't get the phone to connect on a user-specified port for either IMAP or SMTP. There are options for that, but I could never get my phone to (reliably) use them.
  • I could never get STARTTLS to work with the iPhone. Other clients, such as Mail.app, worked just fine. This is annoying, but not such a big deal. The password is still protected in transit (CRAM-MD5 never sends the password itself), and the mail itself would normally be going outside of my systems anyway.
  • There is no "advanced" or "expert" UI for initially setting up email accounts on the iPhone (nor in Mail.app). This means that you have to wait as the phone walks through a lot of default mail options (ports to connect on, SSL or no SSL, etc.) before you get a chance to adjust anything.
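
As a rough illustration of why plain SMTP with CRAM-MD5 is tolerable here: the server sends a one-time challenge, and the client replies with an HMAC-MD5 of that challenge keyed with the password, so the password never crosses the wire. Below is a minimal Python sketch of the client side as described in RFC 2195. The username, password and challenge are made-up values, and this is only meant to show the idea, not what the iPhone actually runs.

```python
import base64
import hashlib
import hmac


def cram_md5_response(username: str, password: str, challenge_b64: str) -> str:
    """Return the base64 client response to a CRAM-MD5 challenge (RFC 2195)."""
    # The server sends its challenge base64-encoded.
    challenge = base64.b64decode(challenge_b64)
    # The password is only used as the HMAC key; it is never transmitted.
    digest = hmac.new(password.encode("utf-8"), challenge, hashlib.md5).hexdigest()
    # The reply is "<username> <hex digest>", base64-encoded.
    return base64.b64encode(f"{username} {digest}".encode("utf-8")).decode("ascii")


# Hypothetical values, for illustration only.
example_challenge = base64.b64encode(b"<12345.67890@mail.example.org>").decode("ascii")
example_reply = cram_md5_response("fred@example.org", "s3cret", example_challenge)
print(base64.b64decode(example_reply).decode("utf-8"))
```

Only the login is protected this way; the message body still travels unencrypted on that connection.
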
So here are the steps. There was some bit of trial and error, but I tried to record everything accurately.
  1. Before you try to set up any mail accounts on the phone, you should import all of your self-signed certs to the phone. To do this, simply upload all of the certs, renamed to something.crt, to a web server and then load that site in Safari on your iPhone. That means putting the .crt files into a directory you can access from your phone (or any browser), not using the certs to SSL encrypt the site. When you load each of those files in Safari (like going to http://your.site.tld/file/mycert.crt), you'll be taken to a screen for importing the cert into your profile.
  2. Now create the accounts on the iPhone as you would otherwise. The settings for your connections are:
    Connection  Username         Authentication  SSL
    Incoming    username         Password        On
    Outgoing    username@domain  CRAM-MD5        Off

  3. When making the initial IMAPS connection, you will be prompted to accept the certificate. Click to continue/accept the cert. I would have thought that importing the certs to my phone's profiles would have taken care of this, but it didn't. Of course there is probably some nuance of PKI that I don't grok (feel free to enlighten me if you know the details).
  4. After much waiting as the phone tries various incarnations of SMTP connections, you'll get prompted to attempt to proceed unencrypted. Say NO here. If you say "yes", it seems to screw up the IMAPS connection that is actually working fine at this point.
  5. Now go into the mail settings and fix the SMTP connection to work right. Turn off SSL and set the authentication to CRAM-MD5.
  6. Open mail on the phone and you should see mail working fine. Try reading some of your mail and try sending some to make sure it all works right.
That's about it. This is far from perfect, but seems to be working reliably and the boss (wife) approves. If someone has details on how to make this better or more efficient, please let me know.

Monday, June 15, 2009

OpenBSD + SMTP AUTH/TLS + IMAPS Proxy

Problem

I want to securely access my home email from my newly ordered iPhone 3GS. Since my mail repository is at home (vs. on someone's webmail/freemail platform) and I want to be able to send email from my domain, I need to connect from essentially anywhere to my home systems.

Solution

I have an existing bastion host running OpenBSD to which I added the mail functions. The documentation that I found was a bit outdated or didn't quite put everything together, so I'm putting it all together here. This article is based on what is available with OpenBSD 4.5.

First, what are the different pieces in this puzzle?
  • Retrieve/view mail: I keep all of my mail on a server at home and provide access via IMAP(S). I extended this by using dovecot as an IMAP proxy on the bastion host. This allows you to view your mail.
  • Sending mail: Sendmail is the base MTA I'm using (it's part of the base install of OpenBSD) and to get the secure authenticated connection for remotely sending mail, two different things are needed.
    • TLS: For my purposes, this encrypts the connection via starttls(8)
    • Authentication: This provides the ability to authenticate users and allow authenticated users to relay mail through the server. This is done via cyrus-sasl.
Dovecot/IMAP Configuration

The configuration for dovecot is rather simple, but I never found it explicitly called out. Acting as a proxy for a particular user is done as part of an entry in the user database. Larger installations with many users can use a database for this, but for a small handful of users, this is more easily done with a passwd-file.
  1. Install dovecot. Either build it from source via ports or install the package. See the OpenBSD FAQ for how to do this.
  2. In /etc/dovecot.conf, find the section specifying the password database as a passwd-file and uncomment it such that you end up with the following. See AuthDatabase/PasswdFile section of the dovecot wiki for more details.
    # passwd-like file with specified location
    #
    passdb passwd-file {
    # [scheme=] [username_format=]
    #
    args = username_format=%n /etc/dovecot.passwd
    }
  3. Create the passwd file, /etc/dovecot.passwd, in your favorite editor filling in the fields as described in the Passwd-file documentation. You should end up with something that looks like the following:
    fred:{PLAIN-MD5}b40ac4fe40284c9de587b992c08f167::::::proxy=y host=my.proxy.domain.tld port=143
    The trailing extra fields are the ones that make the proxy actually work. Note that the TLS/SSL options discussed in the dovecot documentation are only available in newer versions (1.2.rc4+) and not in the stable versions. That means I'm stuck with an unencrypted connection between my bastion host/proxy and my real mail server. This isn't the perfect solution, but I prefer using the proxy to just allowing a direct connection from anywhere on the internet to my internal servers. Create the MD5 password hash with the md5(1) command and put the hex digest it prints after the {PLAIN-MD5} prefix:
    md5 -s password
  4. Configure dovecot to start at boot (if you didn't when you installed it) and start up dovecot. In /etc/rc.local add:
    if [ -x /usr/local/sbin/dovecot ]; then
    echo -n ' dovecot'; /usr/local/sbin/dovecot
    fi
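
If you have more than a couple of users, the passwd-file lines can be generated rather than typed by hand. The following is a minimal Python sketch, assuming the same PLAIN-MD5 scheme and field layout as the entry above. The function name, the user, the password and the backend host name are all hypothetical.

```python
import hashlib


def dovecot_proxy_entry(user: str, password: str, host: str, port: int = 143) -> str:
    """Build one passwd-file line that proxies `user` to `host`:`port`."""
    # Same hex digest that `md5 -s password` prints after the "=" sign.
    digest = hashlib.md5(password.encode("utf-8")).hexdigest()
    # Layout: user:password:uid:gid:gecos:home:shell:extra_fields
    # The five fields between password and extra_fields stay empty here.
    empty_fields = ":" * 6
    return f"{user}:{{PLAIN-MD5}}{digest}{empty_fields}proxy=y host={host} port={port}"


# Hypothetical user, password and backend host.
print(dovecot_proxy_entry("fred", "password", "mail.internal.example"))
```

The printed line can be appended to /etc/dovecot.passwd as-is.
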
Sendmail TLS Configuration

This is the easy part to write up. Follow the steps in the starttls(8) man page. Remember, this just gives you encryption when connecting to send mail.

Sendmail Authentication

This requires installing the Cyrus-SASL libraries, configuring users and saslauthd, recompiling sendmail and configuring sendmail.
  1. Install cyrus-sasl. Either build it from source via ports or install the package. See the OpenBSD FAQ for how to do this.
  2. Configure the sasl auth daemon for authentication from sendmail:
    echo pwcheck_method: saslauthd > /usr/local/lib/sasl2/Sendmail.conf
  3. Create users (these are the users and passwords for sending mail) with saslpasswd2(8) with the following command (you'll be prompted for a password). This will create /etc/sasldb2.db. You will use username@domain as the username for authentication.
    saslpasswd2 -c -u domain username
  4. Configure saslauthd to start at boot by adding the following to /etc/rc.local :
    if [ -x /usr/local/sbin/saslauthd ]; then
    echo -n ' saslauthd'; /usr/local/sbin/saslauthd -a getpwent
    fi
  5. Start saslauthd with /usr/local/sbin/saslauthd -a getpwent
  6. Rebuild sendmail with sasl support:
    • Add WANT_SMTPAUTH=YES to /etc/mk.conf
    • If you don't have the OpenBSD source code installed, install it. See the OpenBSD FAQ for details on doing so if needed.
    • cd to /usr/src/gnu/usr.sbin/sendmail/ and build and install sendmail with make clean obj depend && make && make install
  7. Configure sendmail for all of the new options. Edit /usr/share/sendmail/cf/openbsd-proto.mc as follows:
    • Uncomment (remove the "dnl" from the beginning of each line) the section for TLS/SSL support.
      dnl
      dnl TLS/SSL support; uncomment and read starttls(8) to use.
      dnl
      define(`CERT_DIR', `MAIL_SETTINGS_DIR`'certs')dnl
      define(`confCACERT_PATH', `CERT_DIR')dnl
      define(`confCACERT', `CERT_DIR/mycert.pem')dnl
      define(`confSERVER_CERT', `CERT_DIR/mycert.pem')dnl
      define(`confSERVER_KEY', `CERT_DIR/mykey.pem')dnl
      define(`confCLIENT_CERT', `CERT_DIR/mycert.pem')dnl
      define(`confCLIENT_KEY', `CERT_DIR/mykey.pem')dnl
    • Add the following options for SMTP AUTH.
      dnl
      dnl Set SMTP AUTH options
      dnl
      define(`confAUTH_MECHANISMS',`PLAIN LOGIN CRAM-MD5 DIGEST-MD5')dnl
      TRUST_AUTH_MECH(`PLAIN LOGIN CRAM-MD5 DIGEST-MD5')dnl
      define(`confAUTH_OPTIONS',`p,y')dnl
      define(`confPRIVACY_FLAGS',`authwarnings,goaway')dnl
  8. Rebuild the cf files and install them by:
    • cd /usr/share/sendmail/cf
    • make distribution
  9. Configure sendmail to listen for connections over the network (default configuration is to listen only on localhost) by adding sendmail_flags="-L sm-mta -bd -q30m" to /etc/rc.conf.local
  10. Kill the running sendmail, source the new configuration options and restart sendmail:
    kill `head -n1 /var/run/sendmail.pid`
    . /etc/rc.conf
    /usr/sbin/sendmail $sendmail_flags
That's it! You should now have everything set up and working. In troubleshooting and testing my configuration, I found it very handy to watch traffic with tcpdump(8) with a command like tcpdump -n -s 1500 -vvvX port 25 .
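
Besides tcpdump, a quick check is to look at what the server advertises in its EHLO reply: STARTTLS should be listed once TLS is configured, and an AUTH line should list the SASL mechanisms. Here is a small Python sketch using the standard smtplib module. The helper names and the canned reply are my own, and `probe` is only defined, not called, so nothing touches the network unless you run it against your own host.

```python
import smtplib


def summarize_ehlo(ehlo_text: str) -> dict:
    """Pick the STARTTLS and AUTH lines out of a multi-line EHLO reply."""
    summary = {"starttls": False, "auth": []}
    for line in ehlo_text.splitlines():
        words = line.strip().upper().split()
        if not words:
            continue
        if words[0] == "STARTTLS":
            summary["starttls"] = True
        elif words[0] == "AUTH":
            summary["auth"] = words[1:]
    return summary


def probe(host: str, port: int = 25) -> dict:
    """Connect to a real server and summarize what it advertises."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        code, reply = smtp.ehlo()
        return summarize_ehlo(reply.decode("utf-8", errors="replace"))


# Canned reply (hypothetical host name), so the sketch runs without a network.
sample = "bastion.example.org Hello client\nSTARTTLS\nAUTH PLAIN LOGIN CRAM-MD5 DIGEST-MD5\nHELP"
print(summarize_ehlo(sample))
```

As I understand the `p` flag in confAUTH_OPTIONS, PLAIN and LOGIN are only offered once a TLS layer is active, so a probe over a plain connection may list just CRAM-MD5 and DIGEST-MD5.
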

Credits:
A good bit of the SMTP AUTH configuration steps were taken from http://www.dsrw.org/~dlg/sysadmin/sendmail/ which was written for OpenBSD 3.3. Some things have changed by OpenBSD 4.5, which partly compelled me to write this article.


Wednesday, June 10, 2009

Squid Proxy for Security

The Problem

It's pretty standard these days for everyone connected to the Internet to sit behind some kind of firewall. These firewalls are typically configured to block/filter inbound connections, but allow any connections initiated from inside. This is great for blocking the many scans and connection attempts from the outside, but misses what is the more common risk scenario.

Some non-trivial number of exploits aren't sourced by a connection initiated externally. Things such as spyware get introduced into a system via some means such as email attachments or, more commonly, a web browser exploit and a compromised web site. Once installed, they collect data and perform their real nastiness by sending that collected data back to its owner. All of this is done through connections allowed by the firewall policy.

Another issue is that once a nefarious soul has gained access/control of a machine, they'd like to use it to make connections to other machines to conduct various kinds of mischief. Control of the machine is often through some kind of bot that (a) must connect to some control site to get instructions and (b) must be able to make outbound connections to effect trouble on others.

A Solution

A solution to this is to block all outbound access by default and then only allow what is very specifically needed -- only specific hosts can make outbound connections on specific ports to specific destinations. This can seriously limit the usefulness of your systems to a possible attacker. If they can't call home to get commands, can't ship your personal information back to their home base and can't use your machine(s) to attack others, your systems just aren't very interesting. The problem is that without the ability to connect to various places on the internet, your machine(s) become very un-interesting to you, too.

My solution to this is to proxy all of the outbound connections through one host and only allow that one host to make the outbound connections. This provides several advantages:
  • Regardless of hosts coming and going on my network, I don't have to constantly update firewall rules.
  • With a proxy, I can log all of the outbound connections. This provides an audit source and a place to see what is really requested when outbound connections are made.
  • For a bot or spyware to make an outbound connection, it must understand how to use a proxy and know how to grab the proxy settings. This isn't necessarily very hard, but it is another hoop that must be jumped through.
  • If I haven't specifically configured my proxy for a given protocol, then outbound connections aren't possible, stopping the exploit from being effective.
For many years, I've used squid as a caching proxy to locally cache web content. This made squid an obvious choice to extend for security purposes. The configuration consisted of three steps. First, configure squid to proxy the various protocols I needed to proxy. Second, configure my firewall to only allow the host running squid to have outbound access as needed. Third, configure various clients to use the proxy.

Squid Configuration

I'll assume you have a working squid installation that is already proxying http(s) and ftp traffic. If not, there are a number of how-tos and other documents available on the web for configuring squid for this (and they will do a much better job than I will).

My configuration supports proxying AIM, Yahoo! IM, Google IM/Gtalk (Jabber), MSN Messenger, and rsync. Adding the following to your squid.conf and restarting squid should allow these protocols to be proxied. I've collected some of these configurations from various places on the web, so I can't claim credit for figuring it all out.

In your squid.conf file add:
################
#
# allow AIM access
#
acl AIM_ports port 5190
acl AIM_domains dstdomain .oscar.aol.com .blue.aol.com
acl AIM_domains dstdomain .messaging.aol.com .aim.com
acl AIM_hosts dstdomain login.oscar.aol.com login.glogin.messaging.aol.com
acl AIM_nets dst 64.12.0.0/255.255.0.0
acl AIM_methods method CONNECT
http_access allow AIM_methods AIM_ports AIM_nets
http_access allow AIM_methods AIM_ports AIM_hosts
http_access allow AIM_methods AIM_ports AIM_domains
#
################

################
#
# allow Google IM (Gtalk) access
#
acl GTALK_ports port 5222 5050
acl GTALK_domains dstdomain .google.com
acl GTALK_hosts dstdomain talk.google.com
acl GTALK_methods method CONNECT
http_access allow GTALK_methods GTALK_ports GTALK_hosts
http_access allow GTALK_methods GTALK_ports GTALK_domains
#
################

################
#
# allow MSN Access
#
acl MSN_ports port 1863 443 1503
acl MSN_domains dstdomain .microsoft.com .hotmail.com .live.com .msft.net .msn.com .passport.com
acl MSN_hosts dstdomain messenger.hotmail.com
acl MSN_nets dst 207.46.111.0/255.255.255.0
acl MSN_methods method CONNECT
http_access allow MSN_methods MSN_ports MSN_hosts
#
################

################
#
# allow Yahoo IM Access
#
acl YIM_ports port 5050
acl YIM_domains dstdomain .yahoo.com .yahoo.co.jp
acl YIM_hosts dstdomain scs.msg.yahoo.com cs.yahoo.co.jp
acl YIM_methods method CONNECT
http_access allow YIM_methods YIM_ports YIM_hosts
http_access allow YIM_methods YIM_ports YIM_domains
#
################

################
#
# allow rsync proxy
#
acl RSYNC_ports port 873
acl RSYNC_methods method CONNECT
http_access allow RSYNC_methods RSYNC_ports
################

Now the rest is up to configuring your firewall appropriately and setting your various clients to use the proxy.
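
If the rules above look opaque, it may help to see how squid combines them: values inside one acl are ORed, all acl names on one http_access line are ANDed, and the first http_access line that matches decides the request. The Python sketch below models just the Yahoo IM block under those rules. It is a toy model for reasoning about the config, not squid's real matching code, and the final deny entry is my assumption standing in for a closing "http_access deny all".

```python
from dataclasses import dataclass


@dataclass
class Request:
    method: str
    host: str
    port: int


def in_domain(host: str, domains: tuple) -> bool:
    """dstdomain-style match: '.yahoo.com' covers yahoo.com and its subdomains."""
    return any(host == d.lstrip(".") or host.endswith(d) for d in domains)


# Each acl is a predicate; several values in one acl are ORed together.
ACLS = {
    "YIM_methods": lambda r: r.method == "CONNECT",
    "YIM_ports": lambda r: r.port in {5050},
    "YIM_hosts": lambda r: r.host in {"scs.msg.yahoo.com", "cs.yahoo.co.jp"},
    "YIM_domains": lambda r: in_domain(r.host, (".yahoo.com", ".yahoo.co.jp")),
}

# Each http_access line is (action, [acl names]); names on one line are ANDed.
RULES = [
    ("allow", ["YIM_methods", "YIM_ports", "YIM_hosts"]),
    ("allow", ["YIM_methods", "YIM_ports", "YIM_domains"]),
    ("deny", []),  # assumed final "http_access deny all"
]


def decide(request: Request) -> str:
    """Return the action of the first rule whose acls all match."""
    for action, acl_names in RULES:
        if all(ACLS[name](request) for name in acl_names):
            return action
    return "deny"


print(decide(Request("CONNECT", "scs.msg.yahoo.com", 5050)))
print(decide(Request("CONNECT", "scs.msg.yahoo.com", 25)))
```

Because the first match wins, the position of these blocks in squid.conf matters. If your file still has the stock "http_access deny CONNECT !SSL_ports" line, the allow lines most likely need to sit above it.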



Wednesday, March 25, 2009

The year of the Linux desktop... again.

As usual, there have been a couple of recent articles surrounding Linux on the desktop, including one from Infoworld and another on Datamation. Of course they were covered and discussed by all of the usual suspects. I feel that it's time for me to give my two cents on the topic.

Before we get too far into things, I think we need to get some nomenclature down. I use "Linux Desktop" to describe the case where Linux is the primary or sole operating system through which a user interacts with their computer on a routine basis. This implies that all of the applications that this user uses on anything other than a special-case basis are native Linux applications running on their desktop. If a user must maintain a virtual machine with some other operating system in it to run applications which they routinely (~daily) use, I'm not sure that this person is really running a "Linux Desktop".

There are several different scenarios or places for a "desktop". Each of these has distinct needs and use cases for which Linux may, or may not, be a fit.

The first general area is the corporate setting. There are a couple of places in this setting where Linux makes sense and some where it doesn't.

The simplest and most fitting place is as a desktop for technical users who are doing specific technical work for which there is native Linux software. These users work primarily with applications native to Linux, with other routine desktop tasks such as email, web browsing or office applications (spreadsheets, word processors, etc.) being secondary uses of their machine. For this class of user, Linux makes perfect sense, and other operating systems end up being little more than terminals to some other UNIX machine.

The next place where it makes sense is for very strictly configured desktops providing access to a very specific and limited set of applications -- most likely task-specific, custom-built applications. In this case, the desktop is little more than a modern green-screen terminal.

Then there is the case where Linux simply won't work as the desktop. This is really driven by applications that are not available for Linux. For example, consider a user in the finance department who uses Crystal Reports and Microsoft Excel and shares documents with partner companies. Yes, OpenOffice.org provides a spreadsheet, but sharing anything other than the most basic files can be problematic. There are projects similar to Crystal Reports, but again, they are not 100% identical and are bound to be problematic. It simply doesn't make sense to retrain staff to use tools that differ from those generally accepted and used in their profession and then to deal with document conversions and corrections when sharing with other organizations -- the costs of this are just too significant to justify the Linux desktop for these users. In summary, on the corporate front there are a number of special cases where a Linux desktop makes sense, but plenty where it does not.

My thought of the "perfect" corporate environment is one that recognizes that there are different tools and systems that suit different tasks. Users should use the tools that best suit their needs, not just be forced to use the "corporate standard". The standards should be open standards that allow everything to work together -- think IMAP(S)/SMTP vs MAPI.

The other big area is the home setting, and like the corporate environment there are a number of different use cases. The simple use case, and the one where Linux makes sense, is the technical user or hobbyist. These users specifically seek out Linux and understand its strengths and weaknesses. The next use case is what I'll call "grandma". When reading about "Linux on the desktop" I often read stories that pretty much go like "I blew away Windows on Grandma's computer and installed Linux for her. I remotely manage everything for her via ssh. All she does is basic web browsing and read email via web mail. She couldn't be happier with Linux." The keys to this are that (a) the computer is really nothing more than a basic web browser to the user and (b) there is someone much more technically skilled doing all of the work of patching the system, installing software, and troubleshooting things. For Grandma, it boils down to the computer being a simple tool that someone else maintains. Linux fits perfectly well here, but so does pretty much any other operating system. The last category is "Joe Six-pack" -- the typical person who buys a computer, maintains it themselves and may do any range of tasks on it. This is the largest user base and the one where Linux simply falls flat on its face and doesn't work.

So why doesn't Linux work for "Joe Six-pack"? There are several extremely important reasons that seem to often be lost on proponents of the Linux desktop.
  • Different distributions. Joe thinks of his computer as simply the computer. He may vaguely understand that there is this thing called "Windows" on it, but exactly what it is and the concept of an operating system are outside of his grasp, and more importantly his concern. He bought it at some big box store, plugged it in and turned it on. It's "The Computer". He doesn't care about the details of what's running as long as he can complete his tasks. One of the great strengths of Linux is the many different distributions and the innovation that comes with that. It's also a major problem for Joe. Even assuming he can understand (or even wants to understand) that he's running Linux (whatever that is), having to then further understand that he's got Ubuntu or Fedora or RedHat or Suse or Gentoo or Mandriva or Kubuntu or Edubuntu or whatever is just too much. It's bad enough being an experienced technical user and having to translate what works on one distro to another -- it's simply far too much to ask of Joe. Hit with those questions, he'll look for a simpler solution. Remember that Joe wants a tool to complete a task; he doesn't want to learn all the details of the computer. The more he has to know and think about what's running on his computer, the less useful it is to him.
  • Different desktop environments. All of the different desktops have great attributes, as do the standalone window managers and the simple text-only console. The problem is that this variety, like the distribution issue, can quickly confuse Joe. I've been on a number of Linux User Group mailing lists where "newbies" ask simple UI questions like "my menu thingy disappeared, how do I get it back". Though these are "newbies", they are well above Joe and understand something about the community and joining the mailing lists for help. The answers always start off trying to help the "newbie" figure out which distribution, which desktop and which version of which desktop they are running. Again, this won't work for Joe. He just needs his menu back, and having to go through this series of steps just won't work for him -- it's overly complex and requires effort and knowledge which Joe couldn't care less about.
  • Applications. In the end, Joe's world is all about applications. The problem has two parts: first, finding software, and second, the availability of software that actually meets his needs.
    • Finding Software: Joe is perfectly happy to stop by whatever store is nearby and buy a box with a CD/DVD with the application he needs. This is the same paradigm he uses for everything else in his world. If he needs a hammer, he goes to the store and chooses one from the shelf. If he's hungry, he goes to the grocery store and picks up what looks good. The current state of package managers is simply not good enough. First, it is a different paradigm than everything else in Joe's life (going to the store and buying it), but that is becoming less of an obstacle as people are accustomed to downloading and installing software. Joe is more likely to go to his favorite search engine, find software he wants and then want to click "download and install" and start using the application -- just as he can do with the predominant desktop operating systems. With the plethora of distributions, this doesn't happen. Even if Joe finds a Linux-compatible application, the likelihood of being able to click and install from that product's web site is slim; at the least it would require him to understand which distribution he's using and probably require him to manually resolve some list of dependencies (does he really care what libfoo.1.4.3 is and if it's already on his machine?). The package managers' couple-of-line descriptions of the software are just as confusing to Joe. They don't provide a rich and easy way to find an application and understand if it is what he wants.
    • Getting the right application. There are simply places where Linux does not have applications that would meet Joe's needs. He can't get tax preparation software (TurboTax) to do his taxes. OpenOffice.org is nice for most things Joe needs to do, but stand by for flames the first time he has a problem importing or exporting a Microsoft Office document he got from a friend. Then there are the more difficult tasks such as video editing -- I've heard that it's going to be the year of video editing on Linux about as much as I've heard it's going to be the year of the Linux Desktop, and I'm still waiting. For technically inclined people, the above are not always issues since they understand the challenges, understand their options and in many cases simply enjoy working around the problems and coming up with solutions. Joe just wants to perform a task.
In many cases open source projects provide the best solution, regardless of price. There are, though, areas where these offerings just don't measure up, and the desktop for the typical/average user is one of those places. The desktop for the vast majority of users is about applications and completing a task. If they have to think too much about what an operating system is, let alone which one they have, or can't simply and easily find and use software for any wide range of tasks, then that desktop is simply not a viable option.

I've used Linux and other open source systems and tools for many years. I hope that this article will be considered constructive criticism and not just a rant or attack. I think competition and diversity ultimately result in the best solution, but that often has to be balanced with what the goal is.

Wednesday, March 18, 2009

Back to Window Maker

What seems like ages ago, Window Maker was my choice of window managers on my unix workstations. I ran Window Maker on both my Linux and Solaris machines. As updates to Window Maker got fewer and farther between, I eventually succumbed to the trend of "desktop environments" and started using Gnome. After enough fighting, tweaks and compromises, I got to a tolerable place and just lived with Gnome.

The other day at work, I decided to see how Window Maker was doing after all these years and what it would be like to go back to just a simple window manager instead of a "desktop environment". After a quick install and a log out/log in, I was back in time to Window Maker. A few minutes of minor adjustments later, good ol' version 0.92 was again making X far more pleasurable to use than any "desktop environment" ever did. The simplicity and ease of configuring everything is a pleasant change from the complexity of Gnome.

What is it that makes Window Maker so nice (in my ever so humble opinion):
  • Very lightweight. I don't end up with so many additional processes for very little value (at least to me) as I would have with Gnome.
  • Aesthetically very pleasing. Granted this is mostly a copy of NeXTStep, but there is an amazing utilitarian beauty that's hard to find elsewhere. I spend a lot of time, seems like most of my waking hours, looking at computer screens so I would like something attractive to look at.
  • Simplicity of configuration. A nice GUI tool, WPrefs.app, can do pretty much all of the configuration. Of course, you're not limited to that either. All of the configuration is contained in six very simple text files. No XML. No longer many different files scattered across a myriad of different hidden directories. No more plethora of different GUI tools needed to adjust different aspects of the desktop.
  • Did I mention simple and straightforward and configurable with only vi?

A screen shot of my Linux workstation at home.


It's great to have switched back. It's even more exciting to see that development on the project seems to really be starting back up.

Monday, March 2, 2009

Avoiding Stalingrad

During World War II, the German operations on the eastern front ground to a halt at Stalingrad. Taking this city became a near obsession for the Germans, who lost sight of their real objective, the oil fields of the Caucasus. The battle for Stalingrad embroiled the Wehrmacht in bloody urban combat for which their highly mechanized forces were ill-suited. Ultimately they chose to fight the wrong battle and ended up losing the war.

Though this is a short history lesson far outside of the IT industry, it has a direct correlation to situations we face every day in designing and operating our systems. We end up focusing on the details of the immediate issue and band-aiding it instead of looking at the bigger picture and different solutions that do an end-run around the immediate problem. Moving to a fundamentally different solution is often hard for us for a couple of reasons. First, those of us in the IT industry tend to be very detail focused, and looking at the details of the immediate problem is simply a natural thing for us. Second is our aversion to risk. We've learned, for better or worse, to only take small steps and make small changes in our systems to minimize the risk of something unexpected being introduced. Both of these can be very good traits in our day to day lives, but they can also lead us to our own Stalingrads when the problem is much bigger.

Now let's make all of this a bit more concrete with an example. Let's consider an internet facing OLTP system where, not surprisingly, the database is the single point of failure. This system also has a non-trivial SLA requiring very high levels of availability.

A very popular way to solve this problem, at least for projects that have a decent budget, is to build an active-passive shared storage cluster. This typically comprises the following components:
  • An enterprise RDBMS
  • A pair (or more) of proprietary UNIX servers
  • An enterprise storage array and its associated SAN
  • Clustering software to move LUNs on the storage array between server nodes and start/stop the RDBMS
All of these come with big feature lists and when assembled properly and meticulously maintained can provide rather respectable availability numbers.

This system design began life in the corporate back office, where availability requirements and expectations were rather modest when compared to those of internet facing systems. Problems with this design begin to surface as we move this system to the internet and increase the availability requirement to nearly 100%. On a good day, we find that a simple server node fail over puts our SLA in jeopardy, not to mention the bad day when the complexity of this system results in front page news.
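
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The 99.99% SLA, the 5 minute fail over and the 99.9% per-component figures are assumptions picked purely for illustration, and the chain calculation assumes components fail independently, which is optimistic for a shared storage cluster.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes per year a system may be down and still meet `availability`."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in components:
        result *= a
    return result


# Assumed numbers for illustration: a 99.99% SLA and 5 minute fail overs.
budget = downtime_budget_minutes(0.9999)
print(f"yearly budget: {budget:.1f} minutes")
print(f"fail overs that fit: {int(budget // 5)}")
# Four tightly coupled parts at 99.9% each (again, made-up figures).
print(f"chain availability: {serial_availability(0.999, 0.999, 0.999, 0.999):.4f}")
```

With those assumptions, the whole year's budget is under an hour, and a chain of components is always less available than its weakest part.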

A typical approach to solving this availability problem is to simply look at what causes problems in our cluster and try to band-aid them. We go back to the various vendors that sold us parts of our systems and add more of their features. We may look at their active-active database solutions to try and eliminate node fail over time. We may buy more disks and/or more storage arrays and replicate more copies of our database. We may buy the next greatest (and expensive) proprietary server with claims of greater uptime. We may try a different cluster package in hopes that its weaknesses will be better than the weaknesses of our current package. We keep trying to take this one system and add or incrementally change things to increase its availability.

This attempt to solve the availability problem is a typical occurrence in many companies. The focus is on how to fix a database cluster that doesn't measure up. The changes are incremental (minimized risk) band-aids to that cluster. Like the Germans trying to fight an urban battle with an army designed for Blitzkrieg battles on large open plains, attempts to increase the availability of the closely coupled database cluster are ultimately futile. Closely coupled systems always have some common component that becomes a single point of failure. Adding more features to these clusters only adds complexity, which in the long run further reduces availability. To win this specific war one has to:
  • Stop focusing on the database cluster. It's going to fail at some point no matter what you do. In fact, expect everything to fail at some point, and design and operate around it (see Recovery-Oriented Computing, ROC, as one way to look at this).
  • Solve the system problem, not the database problem. Look at the system as a whole and understand what it needs to functionally accomplish. Build a solution around this need -- don't just try to solve a database problem.
  • Expect to do things differently than in the past, since your requirements are non-trivially different from those of the past. If you find yourself saying "this is what we always do" or anything like that, you're probably going to end up with what you've always had -- something that doesn't meet the new requirements.
The moral of this story is that the skills that help us in our routine day to day life can actually be hindrances when bigger problems emerge. We have to realize when we are in this situation, pull ourselves out of it and fight to win the war, not the battle.