Sun Honeymoon Update: Servers

It’s been two months since we divorced Rackable and married Sun as our new server & storage vendor, and lots of people have been asking how it’s going. The ‘marriage’ is still young, but the server side of things is going really, really well. We’re still starry-eyed in love. Our experience with Sun’s storage hardware isn’t nearly so rosy (in fact, it’s downright bad), but I’ll cover that in a near-future update.

So, what do we love about our new server partner?

  • We can standardize on a single server platform for 99% (if not 100%) of our future server needs. The SunFire X2200 M2 servers are 1U and scale up to 2 x dual-core Opterons with 32GB of RAM (and, as important, down to 1 Opteron w/2GB of RAM). For us, that’s huge. Imagine, if you will, some catastrophe befalling one of our database boxes that requires hardware replacement. Instead of having lots of expensive, idle, duplicate hardware around, we could literally crack open a web server, add some more RAM and an external HBA card, and boom, we have a new DB box. There are many reasons Southwest is the most profitable US airline and a huge one is standard components.
  • Their lights-out management (LOM) is a dream. I dinged the Sun T1000 last year because its LOM is pretty terrible, but the X2200’s LOM is freaking fantastic. How fantastic? Let me count the ways:
    • It’s ethernet rather than serial. Yay!
    • It can share the same ethernet port the OS does. One wire for both LOM and OS! Less datacenter mess. Double yay!
    • It has a built-in Web UI that lets you see and access all of the features, in addition to telnet and SSH.
    • The Web UI lets you actually view the VGA output on the console. Not just serial console redirection – actual video output.
    • The LOM lets you remotely mount ISOs, floppy images, etc. Got a CD or DVD on your desktop at the office that you wish was in the drive at your datacenter? No problem.
    • Built-in email notification ability for status changes.
    • Lots of SNMP settings. Haven’t played with this much yet, but it looks full-featured.
    • Lots and lots of hardware details, like motherboard and BIOS versions, NIC details, etc., are all right there.
    • All of the statuses (fan speeds, temp readings, voltage indicators, etc.), with tons of detail, are at your fingertips.
  • Well built. First of all, it’s amazing what’s crammed into this 1U footprint. But second, it’s gorgeous inside. It’s clear that someone(s) spent a lot of time and energy working on the layout so that everything fit together just right. Feels like a labor of love. Nothing looks out of place.
  • I gave the T1000 props for the way Sun does illustrations on their lids to show which parts are hot-swappable vs cold-swappable, and the X2200 is no exception. The lid is printed with all kinds of useful diagrams that make servicing the hardware much, much easier. I’m a sucker for attention to detail (one reason I love Apple).
  • Turnaround time was excellent with both orders we’ve placed so far. We don’t have the luxury of planning for projects months and months in advance, so moving quickly when we need new hardware is key.
  • Pricing was great. Thanks to Sun’s AMD (and soon, Intel) server platforms, their pricing is competitive with everyone else. I truly believe that the baseline hardware (CPU, RAM, HDDs) has become commodity and that the differentiating value is in the extra technology (like LOM), service, and support. Sun gets this, I think.
  • Their rails just work. This is more rare than you might imagine – sucky rails really suck. Sun’s rails do what they’re supposed to – make it easy to install and, later, get access to your servers.
  • Their diagnostic CD was extremely useful and easy to use. This is an often overlooked area, but we were unlucky enough to get some bad RAM (see below), and this came in handy.
  • Fast. I thought this went without saying, since the performance bits are commodity components, but as you’ll see from the storage problems we had, speed on paper doesn’t always equal speed in the datacenter. These boxes are as fast as they should be – screaming.

So what’s not to like? Nothing’s bad enough that we’d kick Sun outta bed for eating crackers, but there are some quirks:

  • We bought these direct from Sun, with custom configurations, and I believe Sun is still trying to get their head around direct sales (vs VARs). As a result, the servers arrived without all of the RAM already installed. No biggie, we just installed it ourselves. Only thing is, the RAM also wasn’t tested beforehand. We’re used to our systems being fully tested & burned-in prior to delivery, and sure enough, we got a bad piece of RAM. That sucked. For now, we’re just adding a day of burn-in to our install routine, but we’re hoping Sun makes burn-in standard in the future. UPDATE 1: Just got word from Sun: there is an option to have custom configs burned-in at no cost, but it adds an extra 2-3 weeks to the lead time. We’ll have to think about how to best use this, since we usually want our gear fast.
  • As I mentioned in our engagement announcement, the sales and approval process (not the people) sucks. Having to go through the approval process over and over for each order that’s slightly different isn’t pleasant. Dell excels at this, by comparison. They fire off quotes (and hardware!) with lightning speed. Here’s how I wish it would work:
    • Sun goes through the approval process for SmugMug and assigns us a discount.
    • From then on, we can just log in to sun.com and place orders for as much (or as little) hardware as we want that day, and it automagically applies our discount.
    • Should we think our sales volume warrants a bigger discount or something, we re-engage to re-evaluate.
    • Our sales team at Sun gets to focus on keeping us up-to-date on new technology, roadmap changes, and everything else without wasting time on the approval process for small orders that are similar to orders we’ve placed in the past.
    • We’re happy, Sun’s happy, everyone’s happy.

If we could change anything about them, would we? Of course!

  • We’d love to see dual power supplies. Since power supplies are a very common failure point for servers, we like redundancy here. (Surprisingly, it’s not circuit failures we’re worried about: the moving parts in a power supply fail far more often than our circuits do, so we want the second supply to cover a dead power supply, not a dead circuit.)
  • While we’re dreaming, I’d love to see DC power as an option and remove AC from the equation. We could get lower failure rates, better power utilization, and better redundancy in one fell swoop.
  • And if we really want to get pie-in-the-sky, I’d love to see some sort of liquid or gas cooling system so we can get cooling efficiencies too. This is way outside of my field of expertise, so I don’t know how it would work, but Blackbox seems like it has some great stuff along these lines.

Stuff we really haven’t kicked the tires on yet:

  • We typically whip out our amp meter and take power readings as soon as we get new hardware in our datacenter, since power & cooling are huge concerns for us. This time, we were under such a time crunch (and so busy with all of the nasty storage problems I’ll blog about soon) that I haven’t had time. I’m hopeful that all of Sun’s noise about power efficiency is reflected in the real numbers, but I won’t know for sure until I get the meter out and test the hardware.
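
For anyone wondering what we do with those amp readings once we have them, here’s a rough sketch of the arithmetic. Everything in it is an assumption for illustration: the function names are made up, the 120 V supply, 20 A breaker, and 80% continuous-load headroom are placeholders rather than our actual datacenter numbers, and the 2 A reading is hypothetical, not a measurement from an X2200.

```python
def watts_from_amps(amps, volts=120.0, power_factor=1.0):
    """Convert an amp-meter reading to approximate watts.

    volts and power_factor are assumed defaults, not measured values.
    """
    return amps * volts * power_factor


def servers_per_circuit(amps_per_server, breaker_amps=20.0, headroom=0.8):
    """Estimate how many servers fit on one circuit.

    breaker_amps and headroom (load the circuit to 80% of its rating)
    are illustrative assumptions; use your facility's real figures.
    """
    if amps_per_server <= 0:
        raise ValueError("amps_per_server must be positive")
    usable_amps = breaker_amps * headroom
    return int(usable_amps // amps_per_server)


if __name__ == "__main__":
    reading = 2.0  # hypothetical amp-meter reading for one server
    print(watts_from_amps(reading), "watts per server")
    print(servers_per_circuit(reading), "servers per circuit")
```

Swap in your own readings and circuit ratings; the point is just that a single amp reading per server model is enough to estimate both power draw and rack density.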

And finally, everyone at Sun deserves a shout out. They’ve built a great product, and they’ve certainly shown us a great deal of support and personal attention, which we appreciate. If the people we’ve dealt with are any indicator of upcoming success, Sun’s future looks bright. (No pun intended.)

I will post a follow-up shortly detailing the nightmare that our quest for fast DB storage became and what we’ve managed to do about it, but for now, I hope this helps anyone looking for server solutions.

Bottom line: I can’t recommend the X2200 M2 highly enough.