Archive for the ‘datacenter’ Category

Death of MySQL read replication highly exaggerated

Wednesday, April 16th, 2008

I know I’m a little late to the discussion, but Brian Aker posted a thought-provoking piece on the imminent death of MySQL replication to scale reads.  His premise is that memcached is so cool and scales so much better that read replication scaling is going to become a thing of the past.  Other MySQL community people, like Arjen and Farhan, chimed in too.

Now, I love memcached.  We use it as a vital layer in our datacenters, and we couldn’t live without it.  But it’s not a total solution to all reads, so at least for our use case, it’s not going to kill our replica slaves that we use to scale reads.  

Why?  Because we still need to do index lookups to get the keys that we can extract from memcached.  And we have to do lots of those indexed queries.  Most of the row data lives inside of memcached, so this turns out to be a great solution, but we still need read slaves to provide the lists of keys.  Bottom line is that we still use read replication heavily – but we use it for different things than we did in years past.

And then, of course, there’s the issue of memcached failure.  For us, it’s very rare, and thanks to the way memcached works, it rarely hampers system performance, but when a node fails and needs to be re-filled, we have to go back to disk to get it.  And doing that efficiently means read slaves again.
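
A minimal sketch of that read path, assuming a photo/album schema made up purely for illustration. The `ReadReplica` class and the plain dict standing in for memcached are hypothetical stand-ins, not our actual code or any real client library. It shows the two jobs the read slave keeps: answering the indexed query for keys, and re-filling the cache after a miss or a node failure.

```python
class ReadReplica:
    """Hypothetical stand-in for a MySQL read slave holding rows plus an index."""

    def __init__(self, rows):
        self.rows = rows        # {photo_id: {"album_id": ..., "caption": ...}}
        self.row_reads = 0      # counts row fetches that had to go "to disk"

    def ids_for_album(self, album_id):
        # The indexed query: returns only the keys, never the row data.
        return sorted(pid for pid, row in self.rows.items()
                      if row["album_id"] == album_id)

    def fetch_rows(self, photo_ids):
        # Only called for keys the cache could not serve.
        self.row_reads += len(photo_ids)
        return {pid: self.rows[pid] for pid in photo_ids}


def photos_in_album(album_id, replica, cache):
    """Get keys from the replica, row data from the cache, refill on a miss."""
    ids = replica.ids_for_album(album_id)
    found = {pid: cache[pid] for pid in ids if pid in cache}
    missing = [pid for pid in ids if pid not in found]
    if missing:
        refill = replica.fetch_rows(missing)
        cache.update(refill)    # re-fill the cache for the next reader
        found.update(refill)
    return [found[pid] for pid in ids]
```

With a warm cache, only the key lookup touches the replica. Emptying the dict (the equivalent of losing a memcached node) sends the row reads back to the replica exactly once.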

For us, memcached plus MySQL replication is true magic.  Brian’s a very smart guy, and I realize he wrote the post to get people thinking and talking about the issue, but at least for us, read slaves are here to stay. :)

The Sky is Falling! MySQL charging for features!

Wednesday, April 16th, 2008

There’s quite a bit of buzz in the blogosphere from people I respect a great deal, like Jeremy Cole at Proven Scaling and Vadim at Percona, about MySQL’s new Enterprise backup plans.  

The big deal?  They’re releasing a Community version that doesn’t have all the same features as the Enterprise version of Online Backup, including compression and encryption.  The Community version is open-sourced under GPL, the Enterprise version is not.

Personally, I think this is awesome. Don’t get me wrong – I love open source.  We couldn’t have built our business without it, and we love it when we get a chance to contribute back to the community.

But let’s not forget that MySQL is a business.  And that business helps the community and improves the software.  They have customers (I’m one – we’re a paying MySQL Enterprise Platinum customer), and they have to solve those customers’ problems.  This is a virtuous cycle where the community benefits directly as MySQL thrives financially.  

Every time a business like us pays MySQL for a service or feature, MySQL can then invest in better software that benefits all.  The end result in MySQL’s case is more GPL’d code.   In a very real way, without companies like mine, there wouldn’t be a new backup tool at all – let alone the differences this debate is focused on.

Every day, I hear someone saying “Man, I love SmugMug so much!  It has [insert features here] which I love!  Why isn’t it free?”

The answer?  “It wouldn’t be SmugMug if it was free.”  MySQL’s situation is very similar.

I wish more open source projects would make it easier for this cycle to ignite.  Some of them, like Red Hat, refuse to even take our money.  Talk about stupid.  There are *lots* of businesses out there willing to pay for extra services and features, and the community can harness that revenue in amazing ways, including getting more (or better) GPL’d code.

Couple more thoughts:

  • I wouldn’t be surprised if future releases add new Enterprise-only features and some existing Enterprise-only features migrate down to Community.
  • The Community version is open-sourced, so I’m sure the community will develop their own compression and encryption features.
  • This is really no different from Enterprise Monitor, which has been only for Enterprise customers for a while.
  • Lots of other projects do this (and I would argue this benefits those projects and their communities, too)
  • I’m 99% sure that this was the plan before Sun acquired MySQL.

In short, I view this as one of the ways we can both build our business and give back to the open source community.  Keep it up, MySQL!

Thoughts on Google App Engine

Tuesday, April 8th, 2008

First:  Very cool.

Next:  I think it’s interesting that Google has basically taken a sniper scope out and aimed it at a specific cloud computing target.  App Engine is only for web applications.  No batch computing, no cron jobs, no CPU/disk/network access, etc.  

I think this is very smart of Google.  Rather than attacking Amazon head-on, Google has realized there’s a huge playing field for cloud computing, and is attempting to dominate another portion of it, one where they have a lot of expertise.  Very good business move, imho.

Will we use it?  I wouldn’t be surprised.  I’ve long thought that we’ll continue to mix in web services from a variety of providers, and it looks like App Engine can solve a slice of our datacenter need that other providers don’t yet provide.  

I’m more than a little concerned, though, by how much vendor lock-in there is with App Engine.  At first glance, it doesn’t look like the apps will be portable at all.  If I want to switch providers, or add in other providers so I’m not relying solely on Google, I’m outta luck.  

I’m hopeful other languages get supported, too.  I think Python is great – don’t get me wrong – but we have a lot more experience with other languages, so there’ll be a learning curve.

Finally, I’m dying to find out what pricing for an application of our scale will look like.  I can see some immediate, obvious things I’d like to try to do on App Engine, but the beta limits aren’t gonna cut it for us.  :(

Will it replace Amazon?  It sure doesn’t look like it from where I sit.  In fact, I don’t see this as much of a competitor to Amazon Web Services.  There’s some overlap in one small area (hosted web apps on EC2), but I doubt that’s the bulk of Amazon’s business.  As I said, we’ll likely end up using both (and other providers as they come along, too).

My favorite bit?  In theory, Google has solved the data scaling problem.  I don’t mean raw binary (blob) storage, which S3, SmugFS, MogileFS, and plenty of other things have solved, but the “database” scaling problem.  Every popular web app runs into this problem, and it’s typically solved with a combination of memcached, federation, and replication.  But it’s messy.  In theory, Google has automated that piece for us.  I can’t wait to play with it and see if that’s true.

I also can’t wait to see who else is going to wade into this fray.  Sun?  Microsoft?  Yahoo?  IBM?  

Bring it on!

EC2 isn’t 50% slower

Wednesday, February 27th, 2008

I don’t want to start a nerdfight here, but it might be inevitable. :)

Valleywag ran a story today about how Amazon’s EC2 instances are running at 50% of their stated speed/capacity. They based the story on a blog post by Ted Dziuba, of Persai and Uncov fame, whose writing I really love.

Problem is, this time, he’s just wrong. Completely full of FAIL.

I’ll get to that in a minute, but first, let me explain what I think is happening: Amazon’s done a poor job at setting user expectations around how much compute power an instance has. And, to be fair, this really isn’t their fault – both AMD and Intel have been having a hard time conveying that very concept for a few years now.

All of the other metrics – RAM, storage, etc – have very fixed numbers. A GB of RAM is a GB of RAM. Ditto storage. And a megabit of bandwidth is a megabit of bandwidth. But what on earth is a GHz? And how do you compare a 2006 Xeon GHz to a 2007 Opteron GHz? In reality, for mere mortals, you can’t. Which sucks for you, me, and Amazon – not to mention AMD and Intel.

Luckily, there’s an answer – EC2 is so cheap, you can spin up an instance for an hour or two and run some benchmarks. Compare them yourself to your own hardware, and see where they match up. This is exactly what I did, and why I was so surprised to see Ted’s post. It sounded like he didn’t have any empirical data.
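
A rough sketch of that kind of comparison, assuming you run the same workload on both machines and plug in the numbers yourself. The function names and the hourly price in the test values are placeholders for illustration, not Amazon's actual pricing and not the benchmark we ran.

```python
import time


def time_workload(fn, repeats=3):
    """Return the best wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def relative_speed(local_seconds, remote_seconds):
    """How fast the remote box is relative to the local one (1.0 = same speed)."""
    return local_seconds / remote_seconds


def cost_per_run(seconds, hourly_price):
    """Dollars spent on one run of the workload at a given hourly price."""
    return hourly_price * seconds / 3600.0
```

Run `time_workload` with your own app's hot path on each machine, then compare `relative_speed` and `cost_per_run` rather than the GHz on the spec sheet.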

Admittedly, we’re pretty insane when it comes to testing hardware out. Rather than trust the power ratings given by the manufacturers, for example, we get our clamp meters out and measure the machines’ power draw under full load. You’d be surprised how much variance there is.

There was one data point in a thread linked from Ted’s post that had me scratching my head, though, and I began to wonder if the Small EC2 instances actually had some sort of problem. (We only use the XLarge instance size.) This guy had written a simple Ruby script and was seeing a 2X performance difference between his local Intel Core 2 Duo machine and the Small EC2 instance online. Can you spot the problem? I missed it, so I headed over to IRC to find Ted and we proceeded to benchmark a bunch of machines we had around, including all three EC2 instance sizes.

Bottom line? EC2 is right on the money. Ted’s 2.0GHz Pentium 4 performed the benchmark almost exactly as fast as the Small (aka 1.7GHz old Xeon) instance. My 866MHz Pentium 3 was significantly slower, and my modern Opteron was significantly faster.

So what about that guy with the Ruby benchmark? Can you see what I missed, now? See, he’s using a Core 2 Duo. The Core line of processors has completely revolutionized Intel’s performance envelope, and thus, the Core processors perform much better for each clock cycle than the older Pentium line of CPUs. This is akin to AMD, which long ago gave up the GHz race, instead choosing to focus on raw performance (or, more accurately, performance per watt).

Whew. So, what have we learned?

  • All GHz aren’t created equal.
  • CPU architecture & generation matter, too, not just GHz
  • AMD GHz have, for years, been more effective than Intel GHz. Recently, Intel GHz have gotten more effective than older Intel GHz.
  • Comparing old pre-Core Intel numbers with new Intel Core numbers is useless.
  • “top” can be confusing at best, and outright lie at worst, in virtualized instances. Either don’t look at it, or realize the “steal %” column is other VMs on your same hardware doing their thing – not idle CPU you should be able to use
  • Benchmark your own apps yourself to see exactly what the price per compute unit is. Don’t rely on GHz numbers.
  • Don’t believe everything you read online (threads, blogs, etc) – including here! People lie and do stupid things (I’m dumb more often than I’m not, for example). Data is king – get your own.
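
If you'd rather measure steal than squint at “top”, here is a small sketch, assuming a Linux guest where the aggregate `cpu` line of `/proc/stat` lists steal as its eighth counter (user, nice, system, idle, iowait, irq, softirq, steal). The function names are mine, and the sample lines in the usage note are invented numbers, not real measurements.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percent of CPU time stolen by the hypervisor between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    if len(deltas) < 8:
        return 0.0              # older kernels do not report steal at all
    # Sum only the first eight counters, on the understanding that guest
    # time (fields nine and ten) is already counted inside user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total
```

Take two samples a few seconds apart, for example `steal_percent("cpu 100 0 50 800 0 0 0 40 0 0", "cpu 200 0 100 1550 0 0 0 140 0 0")`. Whatever comes back is time other VMs on the same hardware were running, not idle CPU you failed to use.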

Hope that clears that up. And if I’m dumb, I fully expect you to tell me so in the comments – but you’d better have the data to back it up!

(And yes, I’m still prepping a monster EC2 post about how we’re using it. Sorry I suck!)

More on MySQL & Sun

Wednesday, January 16th, 2008

Laura Thomson has an interesting post about the MySQL acquisition. And I think it really highlights a fundamental disconnect that some companies built on providing open source applications for enterprises face:

Their means of getting revenue are at odds with their customers’ needs.

I’m a paying MySQL Enterprise Platinum customer, and I’m seriously considering not renewing for another year if Laura’s thoughts are on target. In a nutshell, here’s why:

I would pay more for a version of MySQL that has Yasufumi Kinoshita and Google’s patches than I would pay for a version without.

In fact, as I mentioned already, I probably wouldn’t pay for MySQL as it stands today. I paid for it in the hopes that, as a paying customer, my feedback that these patches (and others like them) are vital would be listened to. Thus far, it hasn’t been.

I couldn’t care less about MySQL’s desire to keep their released, supported software dual-licensed (commercial and GPL). I don’t consider our Enterprise subscription to be for the software – mentally, I’m paying for service and support. And the support (fixing InnoDB’s concurrency problems) is increasingly at odds with the business (releasing a commercial binary-only Enterprise release). The two are on a collision course – I’m not the only one who will stop paying for it, resulting in damage to MySQL’s business.

I believe the right (and admittedly scary) thing to do is provide paid support for the GPL’d version and move the ball forward – accept community patches that fix major problems.

You can bet that I’ll be telling Sun this, over and over again. Since they have a history of listening, I’m optimistic.

(BTW, this problem isn’t unique to MySQL. Red Hat has the same dilemma – and they won’t take my money, no matter how hard I try to throw it their way)

Sun acquires MySQL!

Wednesday, January 16th, 2008

Remember when I said Sun was a company that listened? They sure do.

Maybe MySQL will finally start fixing all the performance/concurrency issues with InnoDB (basically, InnoDB’s threading and concurrency aren’t working well with modern multi-core CPUs). Google’s had some fabulous patches for a while, and the brilliant Yasufumi Kinoshita does as well, but they don’t seem to be making their way into MySQL anytime soon.

Personally, I worry they’re focused too much on Falcon and not enough on InnoDB – but luckily Sun listens, so that may change. :)

Amazon announces SimpleDB (in Beta)

Friday, December 14th, 2007

Sweet! Amazon finally took the wraps off of SimpleDB. They’ve been working on this for a while, and as you can probably tell, it’s a natural fit with S3 and EC2. There’s a great write-up about it over on inside looking out.

This is nearly a perfect solution for some of our data-related scaling challenges, except for two issues:

  • Physical proximity. Some of my datacenters aren’t close to Amazon’s, so the actual time to query SimpleDB is query time plus latency. This isn’t a problem if you’re doing all your queries from EC2, but we’re not there yet (we’d like to be, but a few pieces are missing. SimpleDB is one of those pieces, so we’re getting closer…). Amazon has promised me they’re working on the speed of light issue. ;)
  • Attribute size limits. We have some data fields that are longer than 1024 bytes (most aren’t and would work fine). We’ve thought about chunking the data up to get around this, which is a possibility, but it gets messy. Storing them in S3 is both overkill and probably too slow – if I need to get a few thousand photo captions *fast*, doing it through S3 isn’t optimal. If we could solve the latency problem I already mentioned, I’d be fine storing that specific data in some other store and working around it that way.
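
For the curious, the chunking workaround might look something like this. It's a sketch only, assuming the 1024-byte limit applies to the UTF-8 encoded value and that chunks live in numbered attributes like `caption_0000`, a naming scheme I made up for illustration. None of this touches the real SimpleDB API; a plain dict stands in for an item's attributes.

```python
def chunk_for_attributes(text, max_bytes=1024):
    """Split text into pieces whose UTF-8 encoding fits within max_bytes."""
    chunks, current, size = [], [], 0
    for ch in text:
        width = len(ch.encode("utf-8"))
        if current and size + width > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)      # never splits a multi-byte character
        size += width
    if current:
        chunks.append("".join(current))
    return chunks


def to_attributes(name, text, max_bytes=1024):
    """Spread one long field across numbered attributes (hypothetical naming)."""
    pieces = chunk_for_attributes(text, max_bytes)
    return {f"{name}_{i:04d}": piece for i, piece in enumerate(pieces)}


def from_attributes(name, attrs):
    """Reassemble the field by joining its chunks in numeric order."""
    prefix = name + "_"
    keys = sorted(key for key in attrs if key.startswith(prefix))
    return "".join(attrs[key] for key in keys)
```

It works, but every read and write now has to know about the chunk naming, which is the messy part.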

On the plus side, SimpleDB should be screaming fast, incredibly scalable, and almost all of our SQL queries would work with no changes other than syntax. Like many of you, I’m sure, we’re using much of our RDBMS as a fairly simple data store and aren’t using many advanced RDBMS capabilities. All of those queries could just use SimpleDB and then we could devote our DB iron to just the rare complex queries. We’re not alone – tons of web apps are gonna love this.

I’m thrilled to see the Amazon AWS stack continue to grow, and I’m shocked that they have as big a lead as they do. I would have thought Microsoft / Google / Sun / whoever would have been out with some competition by now. It’s gonna happen – but I never would have guessed it would take this long.

Oh, and while I have your attention – SmugMug is now a fairly heavy user of EC2 and I have a write-up coming. So check back later if that’s of interest.

Companies That Listen: Sun

Thursday, December 13th, 2007

I’m a sucker for companies that listen to their customers. I’m sure you are too. How many times have you gotten a product that’s nearly perfect but is missing that final touch? Or worse, the product just doesn’t live up to its expectations? Don’t you usually feel helpless in the face of some huge software/electronics/car/whatever company? I know I do.

For example, the monopolistic cable company I’m forced to use, Comcast, hasn’t figured out how to deliver TV to my house for more than a month (isn’t that sorta what they do?) – and I’m helpless!

I’m happy to report that Sun listens to their customers. Really, truly, listens. Even to small ones like me. Even to small ones like me who complain loudly when a product isn’t right (but who cheer equally loudly when it is).

As you may have gathered from Jonathan Schwartz’s blog post ‘The Internet As Customer’, we were one of the attendees at Sun’s information gathering event, and it was fascinating.

One of my biggest takeaways (other than that Sun listens to their customers) is that Sun’s customer base is amazingly schizophrenic. Check out this small cross sample of some of them:

  • Some customers don’t want to buy Sun hardware unless they’ve embraced Linux (like, say, us). Others are freaked out that Sun is embracing Linux and are afraid it shows a lack of commitment to Solaris. (Wonder what they think about the new Windows deal? :) )
  • Some customers wouldn’t even be customers if it weren’t for AMD/Intel support (us again). Others see this as the death knell for Sun’s custom hardware and are worried.
  • Some customers don’t want to use Sun technologies unless they’re open source (us yet again). Others think Sun’s giving away the farm and that proprietary software (and hardware!) is the only way to survive.
  • Some of us can’t stand the complicated buying process and just want ‘Amazon for servers’ through a web UI (can you guess if this is us?). Others love having complicated, but complete and thorough, ordering channels.
  • A few of them worry that a focus on Java could possibly mean a de-emphasis of datacenter technologies (we don’t use Java, but this isn’t a fear I share). Others wish Sun would just focus on the most important thing to them, Java, and get rid of all this boring datacenter muck already!

I hope you get the general idea – and I’m so super glad that I don’t have to deal with a customer base nearly this broad and fractured. Whew! I don’t know how they do it!

A few quick notes:

  • This was an incredibly expensive event for Sun. Not in the the-food-must-have-cost-a-fortune sense of the word, but in the sheer-man-hours sense of the word. Going to the event, I knew Jonathan was speaking for an hour or so on the first day. I assumed that, being a busy guy with a multi-billion-dollar business to run, he’d speak and then leave to go run Sun. How wrong I was. Jonathan stayed the entire time, as did Scott McNealy, and an amazing braintrust of top executives and engineering talent. I completely believe it was absolutely worth it for much of Sun’s brainpower to be focused on listening to their customers – but honestly, I was surprised to see them actually do it.
  • About 6 months ago, we asked Sun for a product that would be incredibly difficult to design, but would dramatically change how we build datacenters. They nodded, said they’d look into it, and we crossed our fingers. Apparently we weren’t the only ones, because it’s coming – and it’s far better than we had initially asked for.
  • One of the attendees, who spends obscene, ungodly amounts of money with IBM, can’t even get engineering staff on the phone. Apparently, IBM has a big sales force that’s trained to buffer customers away from the engineers. Ugh. It’s an attitude like that which ensured IBM came in dead last in our vendor shoot-out. They literally didn’t want our business. Thank goodness Sun gets me in front of technical people when I need it.
  • I only read the dress code requirements after arriving. They said “Business” for the meetings. Since I don’t even own any “Business” clothes, that was a problem. T-shirt, Crocs, and a baseball cap it was! (And, of course, no-one cared. Or they were polite enough not to say anything :) )

All in all, I’m still feeling pretty dang good about our decision to go with Sun for our servers. An emphasis on innovation and willingness to listen to their customers is a winning strategy in my book.

I get SLAs now. Duh.

Thursday, October 11th, 2007

Ok, so I guess I’m a total n00b. In hindsight, SLAs make a lot of sense after all. The whole point isn’t to compensate SmugMug for our loss, it’s to make it unprofitable for the service provider to keep making the same mistakes.

In other words, let’s say Amazon’s margins on S3 are 15%. (I have no data, I’m just picking that number out of the air). If Amazon has a serious problem during a month, they have to cough up 25% to all their customers. In other words, they lose 10% instead of making 15%.
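
The same arithmetic as a tiny sketch, using the made-up 15% margin and the 25% credit from above. It assumes the credit comes straight off that month's revenue and that costs don't change, which is a simplification of mine rather than anything Amazon has said.

```python
def net_margin_pct(margin_pct, sla_credit_pct):
    """Margin left, in percentage points of revenue, after an SLA credit."""
    return margin_pct - sla_credit_pct


good_month = net_margin_pct(15, 0)    # no outage, no credit paid
bad_month = net_margin_pct(15, 25)    # serious outage, 25% credited back
```

Any credit bigger than the margin turns a profitable month into a losing one, and that's the whole incentive.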

That’s pretty major incentive – and it now totally makes sense why SLAs are so highly valued.

Carry on.

Amazon S3 SLA is here! (Nirvanix dies?)

Monday, October 8th, 2007

Amazon has finally released and put into effect their SLA for S3. I know a lot of my readers will be thrilled about this. :)

I’ve gotten a few questions about Nirvanix in the past month or so, especially about the fact that they offer an SLA (and that S3 didn’t). I think this probably puts the final nail in Nirvanix’ coffin because:

  • Why would you trust Nirvanix, a no-name company, with your precious data?
  • Worse, they’re affiliated with MediaMax/Streamload in some way, who have a reputation for poor service. (I’ve even seen reports of data loss at Streamload, though I haven’t bothered to check).
  • Just how much is an SLA worth when there’s nothing behind it to back it up?
  • They’re more expensive than Amazon. Um, duh.

SLAs don’t mean a lot to us anyway, as I’ve said before, because:

  • Everything fails sometimes.
  • The SLA payment is rarely comparable to the pain and suffering your customers had to deal with.

But I know it’s very important to lots of people, so I expect there’s cheering and dancing in the streets. :)

UPDATE: I get SLAs now. Sorry for being dumb.