Posts Tagged ‘percona’

Hot technologies I care about - Sep ‘08

Wednesday, September 17th, 2008

photo by: ikegami

I’ve been too busy to blog lately, and for that I apologize.  But here’s a quickie detailing the technologies (internet-related and not) I’m excited about right now:

  • Drizzle.  For years now, I’ve felt that MySQL has been going in a direction in opposition to my use case.  Stored procedures, views, etc etc have added bloat and complexity without offering me anything useful.  Turns out I’m not alone - and thus Drizzle was born.  To say I’m *super* excited about this is a serious understatement.
  • Google & Percona’s MySQL patches.  While I wait for Drizzle, I’m stuck dealing with terrible concurrency issues in MySQL/InnoDB that force us to partition data way before we really should have to, making our system more complex.  It’s crazy having a server keel over when it’s neither CPU-bound *nor* IO-bound, but that’s life with MySQL and InnoDB these days - or at least, it was until Google and Percona fixed what I couldn’t get MySQL to fix with our Platinum Enterprise subscriptions.  Open source rules!
  • Flash storage.  I really wish I could talk about this some more (pesky NDAs), but there are datacenter changes coming that are more dramatic than anything I’ve seen in 14 years of working on them. I hope I’ve talked to everyone in the space (and from the companies I’ve talked to, one of them seems to be the *very* clear winner for this upcoming round), but if you’re a storage vendor working on flash appliances and I haven’t talked to you, ping me.  We’re a bleeding edge customer and we’ll put your stuff in production faster than you can deliver it to us.  :)
  • ZFS.  Regardless of flash storage, ZFS is the filesystem of choice - head and shoulders over everything we’ve used or heard of.  The advent of flash just makes this even more compelling.  The downside?  It’s not on Linux.  :(
  • OpenSolaris.  ZFS is so incredible, my hand has been forced, and we’re about to put our first OpenSolaris system into production.  OpenSolaris is, in theory, the Solaris kernel (think ZFS, DTrace, SMF, high concurrency, etc) with the GNU-like userland (think Linux-like).  In practice, it’s still extremely painful for a Linux expert and Solaris n00b like me to use - even on a single-purpose machine like a MySQL server.  Only ZFS makes the pain worth it.  For development, it’s basically unusable for Linuxers (it’s probably fabulous for Solaris guys - lucky ducks).
  • Nexenta.  Unlike OpenSolaris, Nexenta *is* the Solaris kernel plus GNU userland.  Unfortunately, it’s not backed by Sun or anyone else I have any relationship with.  Sun has been absolutely the very best technology vendor we’ve ever dealt with in terms of support, technical knowledge, and just plain listening to us, so that’s a big issue.  I wish Sun had taken Nexenta’s approach (or would just buy them or offer support or something).  If OpenSolaris continues to be painful, we may fall back on Nexenta instead - remember, ZFS is the driving factor here.
  • Amazon Web Services competitors.  They’ve been promising they’d be coming out for years now and I’m shocked they’ve given Amazon this much runway.  But I believe a few more are getting very close (can’t say more, again, pesky NDAs).  Now, we’re extremely happy with Amazon, so we have no plans to switch, but competition is good for everyone - and Amazon is a fierce competitor.  Plus there are still gaps in Amazon’s strategy, and if I can mix & match to plug some of those gaps, awesome - sign me up.
  • Memcached.  This one’s been on my list for years, and it’s still way up there.  Binary protocol on the verge of shipping, nice patch to resolve some networking issues we’ve seen, and talk about scalability.  If you’re building web apps and this isn’t a core part of your infrastructure, you’re doing it wrong.
  • Big RAM.  4GB DIMMs are dirt cheap, so if you’re not loading your DB and Memcached boxes to the gills, you’re missing the boat.  Cheap 2-socket 64GB (and relatively cheap 128GB at 4-sockets) are here.
  • Sun Fire X4140 and X4440.  The best 1U (2-socket) and 2U (4-socket) servers on earth.  Despite being late to the game with quad-core, Opteron RAM performance kills Xeon, so these are the servers we’re buying.  You can load them to the gills with 4GB DIMMs, enjoy the dual-power supplies (yes, in the 1U box too), and crank out some great stuff.
  • OpenSocial, Y!OS, etc.  The big boys are finally getting real about getting open and cross-pollinating data and I think we’re finally nearing an inflection point.  We’re hiring a Sorcerer to do nothing but think and build in this space.  I’m sure magic will ensue.
  • Nikon D90 and Canon 5D MkII.  Nikon’s taken the photography world by storm with amazing high-ISO performance, and Canon just announced a DSLR that shoots full 1080p video.  Both look amazing and both are game-changers.
  • Onkyo TX-SR806.  I’m an A/V junkie and this thing is amazing.  5 HDMI inputs (need more?), THX Ultra2 Plus (the low-volume enhancements are *awesome* with young kids sleeping at home), automatic room EQ, decodes every modern audio encoding, etc.  I don’t even use the amplifier section (I have separates), but it’s turning out to be the best Pre/Pro I’ve ever owned.  Sounds fabulous on my gear.
  • iPhone App Store.  That thing is a game changer, and we’re barely seeing the tip of the iceberg.  All the other players have to respond - which is great for you and me.  And talk about a platform that’s a dream to develop on!
So there you have it.  Those are the most important pieces of tech I’m watching these days.  I’ll *definitely* be writing up our ZFS experiments as they come along and I have interesting data to share.  Stay tuned.  
 
Oh, and if you’re curious about what I *wish* was on the list, there’s really only one thing:  iTunes syncing.  I have two desktops (one at my office, one at home) and two laptops, plus my wife has accounts on my computers.  Keeping those all in sync so that when I update a playlist at the office, the update is waiting for me at home, is a nightmare.  I’d pay lots of money if someone could solve that - seems like iTunes + AWS + a smart coder = solved, no?  Wish I had some time….

MySQL and the Linux swap problem

Thursday, May 1st, 2008

Ever since Peter over at Percona wrote about MySQL and swap, I’ve been meaning to write this post. But after I saw Dathan Pattishall’s post on the subject, I knew I’d better actually do it. :)

There’s a nasty problem with Linux 2.6 even when you have a ton of RAM. No matter what you do, including setting /proc/sys/vm/swappiness = 0, your OS is going to prefer swapping stuff out rather than freeing up system cache. On a single-use machine, where the application is better at utilizing RAM than the system is, this is incredibly stupid. Our MySQL boxes are a perfect example - they run only MySQL and we want InnoDB to have a lot of RAM (32-64GB … and we’re testing 128GB).

You can’t just not have any swap partitions, though, or kswapd will literally dominate one of your CPU cores doing who-knows-what. But you can’t have it swapping to disk, or your performance goes into the toilet. So what to do?

Our solution is to make swap partitions out of RAM disks. Yes, I realize how insane that sounds, but the Linux kernel’s insanity drove us to it. Best part? It works. Here’s how:

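# Create a mount point and put an ext3 filesystem (no reserved blocks) on the first ramdisk
mkdir /mnt/ram0
mkfs.ext3 -m 0 /dev/ram0
mount /dev/ram0 /mnt/ram0
# Fill most of the ramdisk with a zeroed file (14634 x 1KB blocks), then enable it as swap
dd bs=1024 count=14634 if=/dev/zero of=/mnt/ram0/swapfile
mkswap /mnt/ram0/swapfile
swapon /mnt/ram0/swapfile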

That’ll give you a 14MB swap file that’s actually in RAM, so it’s super-fast. This assumes your kernel is creating 16MB ramdisk partitions, but you can adjust your kernel parameters and/or the ‘dd’ line above to suit whatever size you want.
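If your ramdisks are not 16MB, one way to pick the ‘dd’ count is to ask the mounted filesystem how much room it actually has, rather than hard-coding 14634. The sketch below is not from the original setup: the function names `avail_kb` and `swapfile_count` and the 64KB safety margin are hypothetical, chosen purely for illustration.

```shell
#!/bin/sh
# Sketch only: derive a dd block count (bs=1024) from the free space on a
# mounted ramdisk. Function names and the default margin are assumptions.

# Print the free space, in KB, of the filesystem containing path $1.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print a dd count for a swap file on mount point $1,
# leaving $2 KB (default 64, an arbitrary margin) unused.
swapfile_count() {
    free_kb="$(avail_kb "$1")"
    margin_kb="${2:-64}"
    echo $((free_kb - margin_kb))
}
```

After mounting the ramdisk, something like `dd bs=1024 count="$(swapfile_count /mnt/ram0)" if=/dev/zero of=/mnt/ram0/swapfile` would then size the file to fit, whatever ramdisk size the kernel is configured with.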

We’ve found that anywhere from 20MB-40MB tends to be enough (so use /dev/ram1, /dev/ram2, etc, each with its own mount point), depending on the load on the box. kswapd no longer uses any noticeable CPU, there’s always a few MB of free “swap”, and life is back in the fast lane. Just add those lines to your relevant startup file, like /etc/rc.d/rc.local, and it’ll persist after reboots.
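For more than one ramdisk, the same six commands can be wrapped in a loop. This is a rough sketch rather than the script we actually run: the `setup_ram_swap` and `run` names and the `DRY_RUN` switch are made up for illustration, and by default it only prints the commands so you can eyeball them before running anything as root.

```shell
#!/bin/sh
# Sketch only: repeat the ramdisk swap recipe for /dev/ram0, /dev/ram1, ...
# DRY_RUN, run, and setup_ram_swap are hypothetical names, not from the post.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them (needs root)

# Either echo a command or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = number of ramdisks to use, $2 = swap file size in KB (default 14634).
setup_ram_swap() {
    n="$1"
    kb="${2:-14634}"
    i=0
    while [ "$i" -lt "$n" ]; do
        dev="/dev/ram$i"
        mnt="/mnt/ram$i"
        run mkdir -p "$mnt"
        run mkfs.ext3 -m 0 "$dev"
        run mount "$dev" "$mnt"
        run dd bs=1024 count="$kb" if=/dev/zero of="$mnt/swapfile"
        run mkswap "$mnt/swapfile"
        run swapon "$mnt/swapfile"
        i=$((i + 1))
    done
}
```

Calling `setup_ram_swap 2` prints the commands for /dev/ram0 and /dev/ram1 (roughly 28MB of swap in total); setting `DRY_RUN=0` in a startup file such as /etc/rc.d/rc.local would actually execute them.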

Some Linux purists will probably hate this approach, others may have more efficient ways of achieving the same thing, but this works for us. Give it a shot. :)

Oh, and I hope it goes without saying, but make *darn* sure you know what you’re running on your box and what the maximum RAM footprint will be before you try running with only 20-40MB of swap. We’ve never OOMed (Out-Of-Memory) a production MySQL box - but that’s because we’re careful.
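Part of being careful is keeping an eye on how much of that tiny swap is actually in use. Here is a minimal sketch that totals the Size and Used columns of /proc/swaps (both reported in KB on Linux); the `swap_usage_kb` name and its optional file argument are assumptions added for illustration and testing.

```shell
#!/bin/sh
# Sketch only: print "<total_kb> <used_kb>" summed over all active swap areas.
# swap_usage_kb is a hypothetical helper; $1 defaults to /proc/swaps.
swap_usage_kb() {
    awk 'NR > 1 { size += $3; used += $4 }
         END    { printf "%d %d\n", size, used }' "${1:-/proc/swaps}"
}
```

Running `swap_usage_kb` from cron or a monitoring check, and alerting when the used figure creeps toward the total, is one way to notice trouble before the OOM killer does.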

UPDATE: See what happens when I wait to blog? I forget that I read another related post over on Kevin Burton’s blog. Like Kevin, we’re using O_DIRECT, but unlike Kevin, this doesn’t solve the problem for us. Linux still swaps. We use the latest 2.6.18-53.1.14.el5 kernel from CentOS 5, btw. (Sorry, I had originally posted 2.6.9 because I was dumb. We’re fully patched.)