Archive for the ‘datacenter’ Category

MySQL and the Linux swap problem

Thursday, May 1st, 2008

Ever since Peter over at Percona wrote about MySQL and swap, I’ve been meaning to write this post. But after I saw Dathan Pattishall’s post on the subject, I knew I’d better actually do it. :)

There’s a nasty problem with Linux 2.6 even when you have a ton of RAM. No matter what you do, including setting /proc/sys/vm/swappiness = 0, your OS is going to prefer swapping stuff out rather than freeing up system cache. On a single-use machine, where the application is better at utilizing RAM than the system is, this is incredibly stupid. Our MySQL boxes are a perfect example - they run only MySQL and we want InnoDB to have a lot of RAM (32-64GB … and we’re testing 128GB).

You can’t just not have any swap partitions, though, or kswapd will literally dominate one of your CPU cores doing who-knows-what. But you can’t have it swapping to disk, or your performance goes into the toilet. So what to do?

Our solution is to make swap partitions out of RAM disks. Yes, I realize how insane that sounds, but the Linux kernel’s insanity drove us to it. Best part? It works. Here’s how:

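# Put a filesystem on the first ramdisk (-m 0: no reserved blocks) and mount it
mkdir /mnt/ram0
mkfs.ext3 -m 0 /dev/ram0
mount /dev/ram0 /mnt/ram0
# Fill most of the ramdisk with a zeroed file (14634 x 1KB blocks)
dd bs=1024 count=14634 if=/dev/zero of=/mnt/ram0/swapfile
# Format the file as swap and turn it on
mkswap /mnt/ram0/swapfile
swapon /mnt/ram0/swapfile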

That’ll give you a 14MB swap file that lives entirely in RAM, so it’s super-fast. This assumes your kernel is creating 16MB ramdisk partitions, but you can adjust your kernel parameters and/or the ‘dd’ line above to suit whatever size you want.

We’ve found that anywhere from 20MB-40MB tends to be enough (so use /dev/ram1, /dev/ram2, etc), depending on the load on the box. kswapd no longer uses any noticeable CPU, there are always a few MB of free “swap”, and life is back in the fast lane. Just add those lines to your relevant startup file, like /etc/rc.d/rc.local, and it’ll persist after reboots.
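
If you want more than one of these (say /dev/ram0 and /dev/ram1 to get closer to 30MB), the same six commands can be wrapped in a loop. What follows is only a sketch, not the exact script from any production box: the function names, the DRY_RUN switch and the SWAP_KB variable are made up for illustration, and it assumes the default 16MB ramdisks mentioned above. By default it only prints the commands so you can eyeball them first.

```shell
#!/bin/sh
# Sketch: create swap files on several ramdisks (meant to be run as root).
# DRY_RUN and SWAP_KB are hypothetical knobs, not part of the original recipe.
DRY_RUN="${DRY_RUN:-1}"      # 1 = print commands only, 0 = really run them
SWAP_KB="${SWAP_KB:-14634}"  # must fit inside one ramdisk (16MB assumed)

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_ram_swap() {
    # $1 = ramdisk index (0 for /dev/ram0, 1 for /dev/ram1, ...)
    dev="/dev/ram$1"
    mnt="/mnt/ram$1"
    run mkdir -p "$mnt"
    run mkfs.ext3 -m 0 "$dev"
    run mount "$dev" "$mnt"
    run dd bs=1024 count="$SWAP_KB" if=/dev/zero of="$mnt/swapfile"
    run mkswap "$mnt/swapfile"
    run swapon "$mnt/swapfile"
}

setup_all_ram_swap() {
    # $1 = how many ramdisks to turn into swap, starting at /dev/ram0
    i=0
    while [ "$i" -lt "$1" ]; do
        setup_ram_swap "$i"
        i=$((i + 1))
    done
}
```

Calling setup_all_ram_swap 2 prints the twelve commands for /dev/ram0 and /dev/ram1. Set DRY_RUN=0 before calling it (from rc.local, for example) to actually run them.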

Some Linux purists will probably hate this approach, others may have more efficient ways of achieving the same thing, but this works for us. Give it a shot. :)

Oh, and I hope it goes without saying, but make *darn* sure you know what you’re running on your box and what the maximum RAM footprint will be before you try running with only 20-40MB of swap. We’ve never OOMed (Out-Of-Memory) a production MySQL box - but that’s because we’re careful.

UPDATE: See what happens when I wait to blog? I forget that I read another related post over on Kevin Burton’s blog. Like Kevin, we’re using O_DIRECT, but unlike Kevin, this doesn’t solve the problem for us. Linux still swaps. We use the latest 2.6.18-53.1.14.el5 kernel from CentOS 5, btw. (Sorry, had posted 2.6.9 because I was dumb. We’re fully patched)

New Amazon Features: Status Dashboard & Paid Service

Thursday, April 17th, 2008

I realize I’m already way behind blogging about other new Amazon Web Services features like the recent EC2 release with static IPs, availability zones, and user kernels, not to mention the new block storage service.  I’ll still try to get to them - but I didn’t want to wait for this one.

I’ve been pushing Amazon hard to do something like this, and I’m thrilled it’s finally out.  They have a great new service status dashboard complete with historical data and a mechanism for communicating to us, their customers, about any issues they may be having.  Especially cool is that the data is provided via RSS, so you can programmatically poll the status and take steps as necessary.  Awesome!  Get all the details here.
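
Polling a feed like that doesn’t take much. Below is a rough sketch that pulls the item titles out of an RSS 2.0 document with awk. It’s deliberately naive (no CDATA, no XML entities, no namespaces), rss_item_titles is a made-up helper name, and the feed URL is left as a placeholder variable because the real one isn’t given here.

```shell
#!/bin/sh
# rss_item_titles: read an RSS 2.0 document on stdin, print one line per
# <item> title. Hypothetical helper; a real XML parser is more robust.
rss_item_titles() {
    awk '
        { doc = doc $0 " " }                 # slurp the whole feed into one string
        END {
            n = split(doc, items, "<item")   # items[1] is the channel header
            for (i = 2; i <= n; i++) {
                if (match(items[i], /<title>[^<]*<\/title>/)) {
                    # drop the 7-char <title> and the 8-char </title>
                    print substr(items[i], RSTART + 7, RLENGTH - 15)
                }
            }
        }'
}

# Usage (STATUS_FEED_URL is a placeholder you would set yourself):
#   curl -fsS "$STATUS_FEED_URL" | rss_item_titles
```

Run it from cron, compare the output against the previous run, and alert when something new shows up.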

One possible gotcha is that it looks like the dashboard is hosted at Amazon.  We’ve run into outages (very rare) where all of amazon.com is down.  In those cases, it’d be nice to have an externally-hosted site where they could post updates.  Our customers asked us for this recently, so on January 29th, we were happy to comply.  Perhaps Amazon could post to their TypePad blog in events like these, rare as they may be?

Next, they now offer paid premium support.  Need some sort of help that’s not provided on the AWS forums or via searching Google?  No worries - whip out your credit card and pay for it.  Looks like they have two plans which should cover lots of use cases I’ve seen in my own comments and on the forums.

I’d still like to see a pay-per-incident model, personally, even with an extremely high price-tag for each incident.  We rarely use support for AWS, but at the same time, we’re very big customers of theirs, so the monthly price is quite high.  But if we really come up against a big problem, it’d be nice to know I could pay for support just that one time.  I imagine most of their customers will like their Silver and Gold monthly packages, but for us, they’re just not quite the right fit.  Do they work for you?

I’m pretty thrilled about this release, but maybe our use case is different from yours.  Do you like these new features?  Are they missing things you’d like to see?

Death of MySQL read replication highly exaggerated

Wednesday, April 16th, 2008

I know I’m a little late to the discussion, but Brian Aker posted a thought-provoking piece on the imminent death of MySQL replication to scale reads.  His premise is that memcached is so cool and scales so much better, that read replication scaling is going to become a thing of the past.  Other MySQL community people, like Arjen and Farhan, chimed in too.

Now, I love memcached.  We use it as a vital layer in our datacenters, and we couldn’t live without it.  But it’s not a total solution to all reads, so at least for our use case, it’s not going to kill our replica slaves that we use to scale reads.  

Why?  Because we still need to do index lookups to get the keys that we can extract from memcached.  And we have to do lots of those indexed queries.  Most of the row data lives inside of memcached, so this turns out to be a great solution, but we still need read slaves to provide the lists of keys.  Bottom line is that we still use read replication heavily - but we use it for different things than we did in years past.

And then, of course, there’s the issue of memcached failure.  For us, it’s very rare, and thanks to the way memcached works, it rarely hampers system performance, but when a node fails and needs to be re-filled, we have to go back to disk to get it.  And doing that efficiently means read slaves again.

For us, memcached plus MySQL replication is true magic.  Brian’s a very smart guy, and I realize he wrote the post to get people thinking and talking about the issue, but at least for us, read slaves are here to stay. :)

The Sky is Falling! MySQL charging for features!

Wednesday, April 16th, 2008

There’s quite a bit of buzz on the blogosphere from people I respect a great deal, like Jeremy Cole at Proven Scaling and Vadim at Percona, about MySQL’s new Enterprise backup plans.  

The big deal?  They’re releasing a Community version that doesn’t have all the same features as the Enterprise version of Online Backup, including compression and encryption.  The Community version is open-sourced under GPL, the Enterprise version is not.

Personally, I think this is awesome. Don’t get me wrong - I love open source.  We couldn’t have built our business without it, and we love it when we get a chance to contribute back to the community.

But let’s not forget that MySQL is a business.  And that business helps the community and improves the software.  They have customers (I’m one - we’re a paying MySQL Enterprise Platinum customer), and they have to solve those customers’ problems.  This is a virtuous cycle where the community benefits directly as MySQL thrives financially.  

Every time a business like us pays MySQL for a service or feature, MySQL can then invest in better software that benefits all.  The end result in MySQL’s case is more GPL’d code.   In a very real way, without companies like mine, there wouldn’t be a new backup tool at all - let alone the differences this debate is focused on.

Every day, I hear someone saying “Man, I love SmugMug so much!  It has [insert features here] which I love!  Why isn’t it free?”

The answer?  “It wouldn’t be SmugMug if it was free.”  MySQL’s situation is very similar.

I wish more open source projects would make it easier for this cycle to ignite.  Some of them, like Red Hat, refuse to even take our money.  Talk about stupid.  There are *lots* of businesses out there willing to pay for extra services and features, and the community can harness that revenue in amazing ways, including getting more (or better) GPL’d code.

Couple more thoughts:

  • I wouldn’t be surprised if future releases add new Enterprise-only features and some existing Enterprise-only features migrate down to Community.
  • The Community version is open-sourced, so I’m sure the community will develop their own compression and encryption features.
  • This is really no different from Enterprise Monitor, which has been only for Enterprise customers for a while.
  • Lots of other projects do this (and I would argue this benefits those projects and their communities, too)
  • I’m 99% sure that this was the plan before Sun acquired MySQL.

In short, I view this as one of the ways we can both build our business and give back to the open source community.  Keep it up, MySQL!

Thoughts on Google App Engine

Tuesday, April 8th, 2008

First:  Very cool.

Next:  I think it’s interesting that Google has basically taken a sniper scope out and aimed it at a specific cloud computing target.  App Engine is only for web applications.  No batch computing, no cron jobs, no CPU/disk/network access, etc.  

I think this is very smart of Google.  Rather than attacking Amazon head-on, Google has realized there’s a huge playing field for cloud computing, and is attempting to dominate another portion of it, one where they have a lot of expertise.  Very good business move, imho.

Will we use it?  I wouldn’t be surprised.  I’ve long thought that we’ll continue to mix in web services from a variety of providers, and it looks like App Engine can cover a slice of our datacenter needs that other providers don’t yet address.  

I’m more than a little concerned, though, by how much vendor lock-in there is with App Engine.  At first glance, it doesn’t look like the apps will be portable at all.  If I want to switch providers, or add in other providers so I’m not relying solely on Google, I’m outta luck.  

I’m hopeful other languages get supported, too.  I think Python is great - don’t get me wrong - but we have a lot more experience with other languages, so there’ll be a learning curve.

Finally, I’m dying to find out what pricing for an application of our scale will look like.  I can see some immediate, obvious things I’d like to try to do on App Engine, but the beta limits aren’t gonna cut it for us.  :(

Will it replace Amazon?  It sure doesn’t look like it from where I sit.  In fact, I don’t see this as much of a competitor to Amazon Web Services.  There’s some overlap in some small area (hosted web apps on EC2), but I doubt that’s the bulk of Amazon’s business.  As I said, we’ll likely end up using both (and other providers as they come along, too).

My favorite bit?  In theory, Google has solved the data scaling problem.  I don’t mean raw binary (blob) storage, which S3, SmugFS, MogileFS, and plenty of other things have solved, but the “database” scaling problem.  Every popular web app runs into this problem, and it’s typically solved with a combination of memcached, federation, and replication.  But it’s messy.  In theory, Google has automated that piece for us.  I can’t wait to play with it and see if that’s true.

I also can’t wait to see who else is going to wade into this fray.  Sun?  Microsoft?  Yahoo?  IBM?  

Bring it on!

EC2 isn’t 50% slower

Wednesday, February 27th, 2008

I don’t want to start a nerdfight here, but it might be inevitable. :)

Valleywag ran a story today about how Amazon’s EC2 instances are running at 50% of their stated speed/capacity. They based the story on a blog post by Ted Dziuba, of Persai and Uncov fame, whose writing I really love.

Problem is, this time, he’s just wrong. Completely full of FAIL.

I’ll get to that in a minute, but first, let me explain what I think is happening: Amazon’s done a poor job at setting user expectations around how much compute power an instance has. And, to be fair, this really isn’t their fault - both AMD and Intel have been having a hard time conveying that very concept for a few years now.

All of the other metrics - RAM, storage, etc - have very fixed numbers. A GB of RAM is a GB of RAM. Ditto storage. And a megabit of bandwidth is a megabit of bandwidth. But what on earth is a GHz? And how do you compare a 2006 Xeon GHz to a 2007 Opteron GHz? In reality, for mere mortals, you can’t. Which sucks for you, me, and Amazon - not to mention AMD and Intel.

Luckily, there’s an answer - EC2 is so cheap, you can spin up an instance for an hour or two and run some benchmarks. Compare them yourself to your own hardware, and see where they match up. This is exactly what I did, and why I was so surprised to see Ted’s post. It sounded like he didn’t have any empirical data.
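
A benchmark run like that can be as simple as timing a fixed workload on each box and turning the result into dollars. Here’s a minimal sketch, assuming whole-second resolution from date is good enough and that you plug in your own hourly price. time_runs and cost_per_run are hypothetical helper names, and the 0.10 in the usage comment is a placeholder, not a quoted price.

```shell
#!/bin/sh
# time_runs N CMD [ARGS...]
# Run CMD N times and print the elapsed wall-clock time in whole seconds.
time_runs() {
    n="$1"
    shift
    start="$(date +%s)"
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    end="$(date +%s)"
    echo $((end - start))
}

# cost_per_run HOURLY_PRICE SECONDS
# What a workload that takes SECONDS costs on a box billed at HOURLY_PRICE.
cost_per_run() {
    awk -v price="$1" -v secs="$2" 'BEGIN { printf "%.6f\n", price * secs / 3600 }'
}

# Usage (your_benchmark is a stand-in for whatever your app actually does):
#   secs="$(time_runs 10 ./your_benchmark)"
#   cost_per_run 0.10 "$secs"
```

Run the same workload on your own hardware and on the instance, and compare the two numbers instead of comparing GHz.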

Admittedly, we’re pretty insane when it comes to testing hardware out. Rather than trust the power ratings given by the manufacturers, for example, we get our clamp meters out and measure the machines’ power draw under full load. You’d be surprised how much variance there is.

There was one data point in a thread linked from Ted’s post that had me scratching my head, though, and I began to wonder if the Small EC2 instances actually had some sort of problem. (We only use the XLarge instance sizes) This guy had written a simple Ruby script and was seeing a 2X performance difference between his local Intel Core 2 Duo machine and the Small EC2 instance online. Can you spot the problem? I missed it, so I headed over to IRC to find Ted and we proceeded to benchmark a bunch of machines we had around, including all three EC2 instance sizes.

Bottom line? EC2 is right on the money. Ted’s 2.0GHz Pentium 4 performed the benchmark almost exactly as fast as the Small (aka 1.7GHz old Xeon) instance. My 866MHz Pentium 3 was significantly slower, and my modern Opteron was significantly faster.

So what about that guy with the Ruby benchmark? Can you see what I missed, now? See, he’s using a Core 2 Duo. The Core line of processors has completely revolutionized Intel’s performance envelope, and thus, the Core processors perform much better for each clock cycle than the older Pentium line of CPUs. This is akin to AMD, which long ago gave up the GHz race, instead choosing to focus on raw performance (or, more accurately, performance per watt).

Whew. So, what have we learned?

  • All GHz aren’t created equal.
  • CPU architecture & generation matter, too, not just GHz
  • AMD GHz have, for years, been more effective than Intel GHz. Recently, Intel GHz have gotten more effective than older Intel GHz.
  • Comparing old pre-Core Intel numbers with new Intel Core numbers is useless.
  • “top” can be confusing at best, and outright lie at worst, in virtualized instances. Either don’t look at it, or realize the “steal %” column is other VMs on your same hardware doing their thing - not idle CPU you should be able to use
  • Benchmark your own apps yourself to see exactly what the price per compute unit is. Don’t rely on GHz numbers.
  • Don’t believe everything you read online (threads, blogs, etc) - including here! People lie and do stupid things (I’m dumb more often than I’m not, for example). Data is king - get your own.
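
On the “steal” point: if you’d rather have a number than squint at top, the kernel keeps a steal counter on the aggregate cpu line of /proc/stat (it’s the eighth value after the word “cpu”). The sketch below takes two samples of that line and reports what share of the elapsed CPU time was stolen. steal_pct and sample_steal are hypothetical helper names, and it assumes a Linux kernel new enough to report steal at all.

```shell
#!/bin/sh
# steal_pct BEFORE AFTER
# Each argument is the aggregate "cpu ..." line from /proc/stat.
# Prints the steal share of the elapsed CPU time, as a percentage.
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2-9: user nice system idle iowait irq softirq steal.
            # Guest time (fields 10+) is already counted in user, so skip it.
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            t[NR] = total
            s[NR] = $9
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (s[2] - s[1]) / dt
        }'
}

# sample_steal [SECONDS]
# Take two live samples SECONDS apart (default 5) and report the steal share.
sample_steal() {
    before="$(grep '^cpu ' /proc/stat)"
    sleep "${1:-5}"
    after="$(grep '^cpu ' /proc/stat)"
    steal_pct "$before" "$after"
}
```

If sample_steal reports a large number while your benchmark is running, the “missing” CPU went to your neighbors on the same hardware, not to idle time.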

Hope that clears that up. And if I’m dumb, I fully expect you to tell me so in the comments - but you’d better have the data to back it up!

(And yes, I’m still prepping a monster EC2 post about how we’re using it. Sorry I suck!)

More on MySQL & Sun

Wednesday, January 16th, 2008

Laura Thomson has an interesting post about the MySQL acquisition. And I think it really highlights a fundamental disconnect that some companies built on providing open source applications for enterprises face:

Their means of getting revenue are at odds with their customers’ needs.

I’m a paying MySQL Enterprise Platinum customer, and I’m seriously considering not renewing for another year if Laura’s thoughts are on target. In a nutshell, here’s why:

I would pay more for a version of MySQL that has Yasufumi Kinoshita and Google’s patches than I would pay for a version without.

In fact, as I mentioned already, I probably wouldn’t pay for MySQL as it stands today. I paid for it in the hopes that, as a paying customer, my feedback that these patches (and others like them) are vital would be listened to. Thus far, it hasn’t been.

I couldn’t care less about MySQL’s desire to keep their released, supported software dual-licensed (commercial and GPL). I don’t consider our Enterprise subscription to be for the software - mentally, I’m paying for service and support. And the support (fixing InnoDB’s concurrency problems) is increasingly at odds with the business (releasing a commercial binary-only Enterprise release). The two are on a collision course - I’m not the only one who will stop paying for it, resulting in damage to MySQL’s business.

I believe the right (and admittedly scary) thing to do is provide paid support for the GPL’d version and move the ball forward - accept community patches that fix major problems.

You can bet that I’ll be telling Sun this, over and over again. Since they have a history of listening, I’m optimistic.

(BTW, this problem isn’t unique to MySQL. Red Hat has the same dilemma - and they won’t take my money, no matter how hard I try to throw it their way)

Sun acquires MySQL!

Wednesday, January 16th, 2008

Remember when I said Sun was a company that listened? They sure do.

Maybe MySQL will finally start fixing all the performance/concurrency issues with InnoDB (basically, InnoDB’s threading and concurrency aren’t working well with modern multi-core CPUs). Google’s had some fabulous patches for a while, and the brilliant Yasufumi Kinoshita does as well, but they don’t seem to be making their way into MySQL anytime soon.

Personally, I worry they’re focused too much on Falcon and not enough on InnoDB - but luckily Sun listens, so that may change. :)

Amazon announces SimpleDB (in Beta)

Friday, December 14th, 2007

Sweet! Amazon finally took the wraps off of SimpleDB. They’ve been working on this for a while, and as you can probably tell, it’s a natural fit with S3 and EC2. There’s a great write-up about it over on inside looking out.

This is nearly a perfect solution for some of our data-related scaling challenges, except for two issues:

  • Physical proximity. Some of my datacenters aren’t close to Amazon’s, so the actual time to query SimpleDB is query time plus latency. This isn’t a problem if you’re doing all your queries from EC2, but we’re not there yet (we’d like to be, but a few pieces are missing. SimpleDB is one of those pieces, so we’re getting closer…). Amazon has promised me they’re working on the speed of light issue. ;)
  • Attribute size limits. We have some data fields that are longer than 1024 bytes (most aren’t and would work fine). We’ve thought about chunking the data up to get around this, which is a possibility, but it gets messy. Storing them in S3 is both overkill and probably too slow - if I need to get a few thousand photo captions *fast*, doing it through S3 isn’t optimal. If we could solve the latency problem I already mentioned, I’d be fine storing that specific data in some other store and working around it that way.
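
To give a feel for the chunking idea (and why it gets messy), here’s a rough sketch that splits one long value into pieces of at most 1024 bytes and numbers them. The NAME.000, NAME.001 naming scheme and the chunk_attribute helper are invented for illustration and aren’t part of any SimpleDB API. It also assumes single-line values, and since it cuts on bytes it can split a multi-byte character across two chunks.

```shell
#!/bin/sh
# chunk_attribute NAME VALUE [LIMIT]
# Print one "NAME.NNN<TAB>chunk" line per piece of at most LIMIT bytes
# (LIMIT defaults to 1024). Hypothetical helper: single-line values only.
chunk_attribute() {
    name="$1"
    limit="${3:-1024}"
    printf '%s' "$2" |
        LC_ALL=C fold -b -w "$limit" |
        awk -v name="$name" '{ printf "%s.%03d\t%s\n", name, NR - 1, $0 }'
}

# Usage: a 2500-byte caption comes back as caption.000, caption.001 and
# caption.002, which you would store as three separate attributes and
# concatenate in order when reading.
#   chunk_attribute caption "$long_caption"
```

The reading side then has to fetch every piece, sort them and glue them back together, which is exactly the bookkeeping that makes this workaround unattractive.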

On the plus side, SimpleDB should be screaming fast, incredibly scalable, and almost all of our SQL queries would work with no changes other than syntax. Like many of you, I’m sure, we’re using much of our RDBMS as a fairly simple data store and aren’t using many advanced RDBMS capabilities. All of those queries could just use SimpleDB and then we could devote our DB iron to just the rare complex queries. We’re not alone - tons of web apps are gonna love this.

I’m thrilled to see the Amazon AWS stack continue to grow, and I’m shocked that they have as big of a lead as they do. I would have thought Microsoft / Google / Sun / whoever would have been out with some competition by now. It’s gonna happen - but I never would have guessed it would take this long.

Oh, and while I have your attention - SmugMug is now a fairly heavy user of EC2 and I have a write-up coming. So check back later if that’s of interest.

Companies That Listen: Sun

Thursday, December 13th, 2007

I’m a sucker for companies that listen to their customers. I’m sure you are too. How many times have you gotten a product that’s nearly perfect but is missing that final touch? Or worse, the product just doesn’t live up to its expectations? Don’t you usually feel helpless in the face of some huge software/electronics/car/whatever company? I know I do.

For example, the monopolistic cable company I’m forced to use, Comcast, hasn’t figured out how to deliver TV to my house for more than a month (isn’t that sorta what they do?) - and I’m helpless!

I’m happy to report that Sun listens to their customers. Really, truly, listens. Even to small ones like me. Even to small ones like me who complain loudly when a product isn’t right (but who cheer equally loudly when it is).

As you may have gathered from Jonathan Schwartz’s blog post ‘The Internet As Customer’, we were one of the attendees at Sun’s information gathering event, and it was fascinating.

One of my biggest takeaways (other than that Sun listens to their customers) is that Sun’s customer base is amazingly schizophrenic. Check out this small cross sample of some of them:

  • Some customers don’t want to buy Sun hardware unless they’ve embraced Linux (like, say, us). Others are freaked out that Sun is embracing Linux and are afraid it shows a lack of commitment to Solaris. (Wonder what they think about the new Windows deal? :) )
  • Some customers wouldn’t even be customers if it weren’t for AMD/Intel support (us again). Others see this as the death knell for Sun’s custom hardware and are worried.
  • Some customers don’t want to use Sun technologies unless they’re open source (us yet again). Others think Sun’s giving away the farm and that proprietary software (and hardware!) is the only way to survive.
  • Some of us can’t stand the complicated buying process and just want ‘Amazon for servers’ through a web UI (can you guess if this is us?). Others love having complicated, but complete and thorough, ordering channels.
  • A few of them worry that a focus on Java could possibly mean a de-emphasis of datacenter technologies (we don’t use Java, but this isn’t a fear I share). Others wish Sun would just focus on the most important thing to them, Java, and get rid of all this boring datacenter muck already!

I hope you get the general idea - and I’m so super glad that I don’t have to deal with a customer base nearly this broad and fractured. Whew! I don’t know how they do it!

A few quick notes:

  • This was an incredibly expensive event for Sun. Not in the the-food-must-have-cost-a-fortune sense of the word, but in the sheer-man-hours sense of the word. Going to the event, I knew Jonathan was speaking for an hour or so on the first day. I assumed that, being a busy guy with a multi-billion-dollar business to run, he’d speak and then leave to go run Sun. How wrong I was. Jonathan stayed the entire time, as did Scott McNealy, and an amazing braintrust of top executives and engineering talent. I completely believe it was absolutely worth it for much of Sun’s brainpower to be focused on listening to their customers - but honestly, I was surprised to see them actually do it.
  • About 6 months ago, we asked Sun for a product that would be incredibly difficult to design, but would dramatically change how we build datacenters. They nodded, said they’d look into it, and we crossed our fingers. Apparently we weren’t the only ones, because it’s coming - and it’s far better than we had initially asked for.
  • One of the attendees, who spends obscene, ungodly amounts of money with IBM, can’t even get engineering staff on the phone. Apparently, IBM has a big sales force who’s trained to buffer customers away from the engineers. Ugh. It’s an attitude like that which ensured IBM came in dead last in our vendor shoot-out. They literally didn’t want our business. Thank goodness Sun gets me in front of technical people when I need it.
  • I only read the dress code requirements after arriving. They said “Business” for the meetings. Since I don’t even own any “Business” clothes, that was a problem. T-shirt, Crocs, and a baseball cap it was! (And, of course, no-one cared. Or they were polite enough not to say anything :) )

All in all, I’m still feeling pretty dang good about our decision to go with Sun for our servers. An emphasis on innovation and willingness to listen to their customers is a winning strategy in my book.