<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SmugBlog: Don MacAskill &#187; datacenter</title>
	<atom:link href="http://blogs.smugmug.com/don/category/datacenter/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.smugmug.com/don</link>
	<description>Thought stream from SmugMug's CEO &#38; Chief Geek</description>
	<lastBuildDate>Fri, 23 Oct 2009 04:38:57 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9-rare</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Great things afoot in the MySQL community</title>
		<link>http://blogs.smugmug.com/don/2008/12/23/great-things-afoot-in-the-mysql-community/</link>
		<comments>http://blogs.smugmug.com/don/2008/12/23/great-things-afoot-in-the-mysql-community/#comments</comments>
		<pubDate>Tue, 23 Dec 2008 23:09:52 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[drizzle]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[Innodb]]></category>
		<category><![CDATA[mark callaghan]]></category>
		<category><![CDATA[percona]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[s7410]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[ssd]]></category>
		<category><![CDATA[sun]]></category>
		<category><![CDATA[transactions]]></category>
		<category><![CDATA[xtradb]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=521</guid>
		<description><![CDATA[tl;dr: The MySQL community rocks.  Percona, XtraDB, Drizzle, SSD storage, InnoDB IO scalability challenges.
For anyone who lives and dies by MySQL and InnoDB, things are finally starting to heat up and get interesting.  I&#8217;ve been banging the &#8220;MySQL/InnoDB scales poorly&#8221; drums for years now, and despite having paid Enterprise licenses, I haven&#8217;t been [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.google.com/search?q=tl;dr">tl;dr</a>: The MySQL community rocks.  <a href="http://www.percona.com/">Percona</a>, <a href="http://www.mysqlperformanceblog.com/2008/12/16/announcing-percona-xtradb-storage-engine-a-drop-in-replacement-for-standard-innodb/">XtraDB</a>, <a href="https://launchpad.net/drizzle">Drizzle</a>, <a href="http://www.sun.com/storage/disk_systems/unified_storage/7410/index.xml">SSD storage</a>, <a href="http://mysqlha.blogspot.com/2008/12/other-performance-problem.html">InnoDB IO scalability challenges</a>.</p>
<p>For anyone who lives and dies by MySQL and InnoDB, things are finally starting to heat up and get interesting.  I&#8217;ve been banging the &#8220;MySQL/InnoDB scales poorly&#8221; drums for years now, and despite having paid Enterprise licenses, I haven&#8217;t been able to get anywhere.  I was pretty excited when Sun bought MySQL since their future is intrinsically tied to concurrency, but things have been pretty slow going over there this year. </p>
<p>But the community has finally taken up arms and is fighting the good fight.  It&#8217;s (finally!) a great time to be a MySQL user because there&#8217;s been lots of recent progress.  Here&#8217;re some of my favorites (and highlights of work left to do):</p>
<p><strong>PERCONA</strong></p>
<p>I can&#8217;t sing <a href="http://www.percona.com/">Percona&#8217;s</a> praises enough.  They&#8217;re probably the most knowledgeable MySQL experts out there (possibly even including Sun).  Absolutely the best bang for the buck in terms of MySQL service and support &#8211; better than MySQL&#8217;s own offering.  (If I had to guess why that is, I&#8217;d bet that MySQL/Sun don&#8217;t want to step on Oracle&#8217;s toes by fixing InnoDB &#8211; but >99% of what we need is related to InnoDB.  Percona has no such tip-toeing limitations.)  Let me quickly count the ways they&#8217;ve helped me in the last few months:</p>
<ul>
<li>They knew of a super obscure configuration setting &#8220;<a href="http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_back_log">back_log</a>&#8221;.  Have you ever heard of it?  I hadn&#8217;t.  But we started seeing latency on MySQL connections (up to *3 seconds*!) on systems that hadn&#8217;t changed recently (exactly 3 seconds sounded awfully suspicious, and sure enough, it was TCP retries).  After going through every single kernel, network, and MySQL tuning parameter I know (and I know a lot), I finally called Percona.  They dug in, investigated the system, and unearthed &#8216;back_log&#8217; within an hour or two.  Popped that into my configuration and boom, everything was fine again.  Whew!</li>
<li>We have servers that easily exceed InnoDB&#8217;s transaction limits.  Did you know InnoDB has a <a href="http://bugs.mysql.com/bug.php?id=26590">concurrent transaction limit of 1024</a>?  (Technically, 1024 INSERTs and 1024 UPDATEs.  But INSERT &#8230; ON DUPLICATE KEY UPDATE manages to chew up one of each).  I know all about it &#8211; I&#8217;ve had bugs open with MySQL Enterprise for more than 2 years on the issue.  What&#8217;s more, these are low-end systems &#8211; 4 cores, 16GB of RAM &#8211; and they&#8217;re nowhere near CPU or IO bound.  It took MySQL months to figure out what the problem was (years, really, to figure out all the final details like the different undo logs for INSERT vs UPDATE).  Their final answer?  It&#8217;ll be fixed in MySQL 6.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' />  Note that 5.1 *just* went GA after years and years.  On the other hand, it took Percona one weekend to diagnose the problem, and 13 days to have a preliminary patch ready to extend it to 4072 undo slots.  Talk about progress!  (And yes, we want Percona to release the patch to the world.)</li>
<li>Solving the CPU scaling problems.  These have been plaguing us for years (we have had some older four-socket systems for a while &#8230; now with quad-core, it&#8217;s even worse), and thanks to <a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches">Google</a> and <a href="http://www.percona.com/percona-lab.html">Percona</a>, this problem is well on its way to being solved.  We&#8217;re sponsoring this work and can&#8217;t wait to see what happens next.</li>
<li><a href="http://www.mysqlperformanceblog.com/2008/12/16/announcing-percona-xtradb-storage-engine-a-drop-in-replacement-for-standard-innodb/">XtraDB</a>.  This is the biggy.  So big it deserves its own heading&#8230;.</li>
</ul>
<p><strong>XTRADB</strong></p>
<p>Oracle&#8217;s done a terrible job of supporting the community with InnoDB.  The conspiracy theorists can all say &#8220;I told you so!  Oracle bought them to halt MySQL progress&#8221; now &#8211; history supports them.  Which is a shame &#8211; Heikki is a great guy and has done amazing work with InnoDB, but the fact remains that it wasn&#8217;t moving forward.  The InnoDB plugin release was disappointing, to say the least.  It addressed none of the CPU or IO scalability issues the community has been crying about for years.</p>
<p>Luckily, Percona finally did what everyone else has been too afraid to do &#8211; they forked InnoDB.  <a href="http://www.mysqlperformanceblog.com/2008/12/16/announcing-percona-xtradb-storage-engine-a-drop-in-replacement-for-standard-innodb/">XtraDB</a> is their storage engine, forked from InnoDB (and then <a href="http://www.mysqlperformanceblog.com/2008/12/18/xtradb-benchmarks-15x-gain/">turbocharged</a>!).  We&#8217;re not running it in production yet, but we are running all of the patches that went into XtraDB and I can tell you they&#8217;re great.  We&#8217;re sponsoring more XtraDB development (and yes, we made sure Percona will be contributing anything they build for us back to the community) with Percona, and I&#8217;m sure that&#8217;ll continue.</p>
<p><strong>DRIZZLE</strong></p>
<p>I&#8217;ve already <a href="http://blogs.smugmug.com/don/2008/09/17/hot-technologies-i-care-about-sep-08/">blogged a bit about Drizzle</a>, but it sure looks like <a href="https://launchpad.net/drizzle">Drizzle</a> + XtraDB might be a match made in heaven.  Drizzle can be thought of as a MySQL engine re-write with an eye towards web workloads and performance, rather than features.  MySQL 4.1, 5.0, and 5.1 added a lot of features that bloated the code without offering anything really useful to web-oriented workloads like ours, so the Drizzle team is ripping all that stuff back out and rethinking the approaches to the things that are being left in.  Very exciting.</p>
<p><strong>SSD STORAGE</strong></p>
<p>The advent of &#8220;cheap enough&#8221; super-fast SSD storage is finally upon us.  I&#8217;ve got <a href="http://www.sun.com/storage/disk_systems/unified_storage/7410/index.xml">Sun S7410</a> storage appliances in production and they&#8217;re blazingly fast.  I have a very thorough review coming, but the short version is that even with NFS latencies, we&#8217;re able to do obscene write workloads to these boxes (let alone reads).  10000+ write IOPS to 10TB of mirrored, crazy durable (thanks <a href="http://blogs.smugmug.com/don/2008/10/10/success-with-opensolaris-zfs-mysql-in-production/">ZFS</a>!) storage is a dream come true.  Once you mix in snapshots, clones, replication, and Analytics &#8211; well, it just doesn&#8217;t get much better than this.  </p>
<p>(Don&#8217;t get sticker shock looking at the web pricing &#8211; no one pays anything even remotely like that.  Sign up for <a href="http://www.sun.com/emrkt/startupessentials/">Startup Essentials</a> if you can, or talk to your Sun sales rep if you can&#8217;t, and you can get them much cheaper.  I nearly had a heart attack myself until I got &#8220;real&#8221; pricing.  Tell them I sent you &#8211; enough Sun people read this blog, it might just help <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ).</p>
<p><strong>STILL NEEDED&#8230;</strong></p>
<p>So, all in all, there&#8217;s been an awful lot of progress this year, which is great.  CPUs are finally scaling under InnoDB, and we finally have storage that isn&#8217;t bounded by physical rotation and mechanical arms.  Unfortunately, great CPU scaling plus amazing IO capabilities isn&#8217;t something InnoDB digests very well.  As is common in complicated systems, once you fix one bottleneck, another one elsewhere in the system crops up.  This time, it&#8217;s IOPS.  It was eerie reading <a href="http://mysqlha.blogspot.com/2008/12/other-performance-problem.html">Mark Callaghan&#8217;s post about this</a> last night &#8211; I&#8217;d come to the exact same conclusions (from an Operations point of view rather than code-level) just yesterday.    </p>
<p>Bottom line:  Despite having ample CPU and ample IO, InnoDB isn&#8217;t capable of using the IO provided.  You can bet we&#8217;ll be working with Percona, Google and Sun (read: sitting back and admiring their brilliant work while writing the occasional check and providing production workload information) to look into fixing this.</p>
<p>In the meantime, we&#8217;re back to the old standbys:  replication and data partitioning.  Yes, we&#8217;re stacking lots of MySQL instances on each S7410 to maximize both our IOPS and our budget.  Fun stuff &#8211; more on that later.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>UPDATE:</strong> Just occurred to me that there are plenty of *new* readers to my blog who haven&#8217;t heard me praise <a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches">Google and their patches</a> before.  <a href="http://mysqlha.blogspot.com/">Mark Callaghan&#8217;s</a> team over at Google definitely deserves a shout-out &#8211; they&#8217;ve really been a catalyst for much of this work along with Percona.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/12/23/great-things-afoot-in-the-mysql-community/feed/</wfw:commentRss>
		<slash:comments>36</slash:comments>
		</item>
		<item>
		<title>On Why Auto-Scaling in the Cloud Rocks</title>
		<link>http://blogs.smugmug.com/don/2008/12/09/on-why-auto-scaling-in-the-cloud-rocks/</link>
		<comments>http://blogs.smugmug.com/don/2008/12/09/on-why-auto-scaling-in-the-cloud-rocks/#comments</comments>
		<pubDate>Tue, 09 Dec 2008 19:32:26 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[amazon]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[amazon web services]]></category>
		<category><![CDATA[auto-scaling]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[george reese]]></category>
		<category><![CDATA[o'reilly]]></category>
		<category><![CDATA[skynet]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=514</guid>
		<description><![CDATA[
In high school, I had a great programmable calculator.  I&#8217;d program it to solve complicated math and science problems &#8220;automatically&#8221; for me.  Most of my teachers got upset if they found out, but I&#8217;ll always remember one especially enlightened teacher who didn&#8217;t.  He said something to the effect of &#8220;Hey, if you [...]]]></description>
			<content:encoded><![CDATA[<div class="center"><img class="photo" src="http://don.smugmug.com/photos/305993656_rDF8S-M.jpg" alt="SkyNet Lives - EC2 at SmugMug" /></div>
<p>In high school, I had a great programmable calculator.  I&#8217;d program it to solve complicated math and science problems &#8220;automatically&#8221; for me.  Most of my teachers got upset if they found out, but I&#8217;ll always remember one especially enlightened teacher who didn&#8217;t.  He said something to the effect of &#8220;Hey, if you managed to write software to solve the equation, you must thoroughly understand the problem.  Way to go!&#8221;.  </p>
<p>George Reese wrote up a blog post over at O&#8217;Reilly the other day called <a href="http://broadcast.oreilly.com/2008/12/why-i-dont-like-cloud-auto-scaling.html">On Why I Don&#8217;t Like Auto-Scaling in the Cloud</a>.  His main argument seems to be that auto-scaling is bad and reflects poor capacity planning. In the comments, he specifically calls SmugMug out, saying we&#8217;re  &#8220;using auto-scaling as a crutch for poor or non-existent capacity planning&#8221;.</p>
<p>George is like one of those math teachers who doesn&#8217;t &#8220;get it&#8221;.  I was tempted not to write this post because he gets it so wrong, I&#8217;d hate to spread that meme.  <a href="http://blogs.smugmug.com/don/2008/06/03/skynet-lives-aka-ec2-smugmug/">SkyNet auto-scales well</a>.  No humans at SmugMug are monitoring it and it just hums along, doing its job.  Why is it so efficient?  Because I understand the equation.  I know what metrics drive our capacity planning and I programmed SkyNet to take these into account.  It checks an awful lot of data points every minute or so &#8211; this isn&#8217;t simply &#8220;oh, we have idle CPU, let&#8217;s kill some instances.&#8221;  (I would argue that, depending on the application, simple auto-scaling based on CPU usage or a similar data point can be very effective, too, though).  </p>
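<p>To make that parenthetical concrete, here&#8217;s a minimal sketch of the simple, single-metric flavor of auto-scaling.  To be clear, this is not SkyNet&#8217;s code: the function name, the 50% CPU target, and the instance limits are all made up for illustration, and a real controller would weigh many more data points.</p>
<pre class="codebox"><code>
import math


def desired_instances(current, avg_cpu, target_cpu=0.5,
                      min_instances=1, max_instances=20):
    """Return the instance count a naive CPU-only scaler would ask for.

    Scales the fleet in proportion to how far average CPU utilization
    (0.0 to 1.0) is from the target, then clamps to the allowed range.
    Every default here is a hypothetical value, not a recommendation.
    """
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    # 8 busy instances at 75% CPU: grow the fleet.
    print(desired_instances(8, 0.75))
    # 8 mostly idle instances at 25% CPU: shrink it.
    print(desired_instances(8, 0.25))
</code></pre>
<p>Even this toy version only works if you know which metric actually drives your capacity, which is exactly the point.</p>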
<p>SkyNet has been in production for over a year with only two incidents of note and SmugMug has more than doubled in size and capacity during that time without adding any new operations people.  How on earth is this a bad thing?  </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/12/09/on-why-auto-scaling-in-the-cloud-rocks/feed/</wfw:commentRss>
		<slash:comments>63</slash:comments>
		</item>
		<item>
		<title>Sweet new Sun storage stuff on Monday, Nov 10th</title>
		<link>http://blogs.smugmug.com/don/2008/11/09/sweet-new-sun-storage-stuff-on-monday-nov-10th/</link>
		<comments>http://blogs.smugmug.com/don/2008/11/09/sweet-new-sun-storage-stuff-on-monday-nov-10th/#comments</comments>
		<pubDate>Sun, 09 Nov 2008 21:31:48 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[datacenter]]></category>
		<category><![CDATA[servers]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[sun]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=485</guid>
		<description><![CDATA[FYI, Sun is announcing some sweet new storage stuff on Monday at 3:30pm PT.  
I&#8217;m reviewing a few of the things they&#8217;re announcing, and hope to publish my thoughts here soon (one of them joins my production network tonight if all goes well).  However, I&#8217;m at Disneyland with my kids (first trip!) from [...]]]></description>
			<content:encoded><![CDATA[<p>FYI, Sun is <a href="http://www.sun.com/whatsnew/?intcmp=2168">announcing some sweet new storage stuff</a> on Monday at 3:30pm PT.  </p>
<p>I&#8217;m reviewing a few of the things they&#8217;re announcing, and hope to publish my thoughts here soon (one of them joins my production network tonight if all goes well).  However, I&#8217;m at Disneyland with my kids (first trip!) from Monday through Thursday, so I don&#8217;t know (yet) when I&#8217;ll be able to write them up.  Bear with me if it takes a few days.</p>
<p>But the gear is exciting, and the direction Sun is headed is even more exciting!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/11/09/sweet-new-sun-storage-stuff-on-monday-nov-10th/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Huge EC2 release: Load Balancing &amp; Auto-Scaling!</title>
		<link>http://blogs.smugmug.com/don/2008/10/27/huge-ec2-release-load-balancing-auto-scaling/</link>
		<comments>http://blogs.smugmug.com/don/2008/10/27/huge-ec2-release-load-balancing-auto-scaling/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 15:57:17 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[amazon web services]]></category>
		<category><![CDATA[auto-scaling]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[ebs]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[elastic block storage]]></category>
		<category><![CDATA[load balancing]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[simpledb]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=471</guid>
		<description><![CDATA[
June 5th, 2008 near Maryville, Missouri by Shane Kirk

In case you didn&#8217;t see it, Amazon had a huge EC2 announcement the other day that included:

EC2 is now out of beta.
EC2 has an SLA!
Windows is now available on EC2
SQL Server is now available on EC2

But the really cool bits, if you ask me, are the announcements [...]]]></description>
			<content:encoded><![CDATA[<div class="center"><a href="http://shanekirk.smugmug.com/gallery/4486827_hgWRJ#308491696_samyX"><img class="photo" src="http://shanekirk.smugmug.com/photos/308491696_samyX-M-2.jpg" alt="June 5th, 2008 near Maryville, Missouri" /></a>
<p class="photoby" style="width: 600px;">June 5th, 2008 near Maryville, Missouri by <a href="http://shanekirk.smugmug.com/">Shane Kirk</a></p>
</div>
<p>In case you didn&#8217;t see it, <a href="http://aws.typepad.com/aws/2008/10/big-day-for-ec2.html">Amazon had a huge EC2 announcement</a> the other day that included:</p>
<ul>
<li>EC2 is now out of beta.</li>
<li>EC2 has an SLA!</li>
<li>Windows is now available on EC2</li>
<li>SQL Server is now available on EC2</li>
</ul>
<p>But the really cool bits, if you ask me, are the announcements about the next wave of related services:</p>
<ul>
<li>Monitoring</li>
<li>Load Balancing</li>
<li>Auto-Scaling</li>
<li>A web-based management console</li>
</ul>
<p>As frequent readers of my blog and/or conference talks will know, this means one of the last important building blocks to creating fully cloud-hosted applications *at scale* is nearly ready for primetime.  </p>
<p>For those keeping score at home, my personal checklist shows that the only thing now missing is a truly scalable, truly bottomless database-like data store.  Neither Elastic Block Store (EBS) nor SimpleDB really solves the entire scope of the problem, though they&#8217;re great building blocks that do solve big pieces (or everything, at smaller scale).  I&#8217;m positive that someone (Amazon or other) will solve this problem and I can start moving more stuff &#8220;to the Cloud&#8221;. </p>
<p>I can&#8217;t wait.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/10/27/huge-ec2-release-load-balancing-auto-scaling/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Live-tweeting Cloud keynote at PDC 2008</title>
		<link>http://blogs.smugmug.com/don/2008/10/27/live-tweeting-cloud-keynote-at-pdc-2008/</link>
		<comments>http://blogs.smugmug.com/don/2008/10/27/live-tweeting-cloud-keynote-at-pdc-2008/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 15:25:29 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[pdc]]></category>
		<category><![CDATA[pdc08]]></category>
		<category><![CDATA[pdc2008]]></category>
		<category><![CDATA[tweet]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=466</guid>
		<description><![CDATA[
UFO OR CLOUD? by Shane Kirk

Microsoft is announcing some exciting Cloud Computing stuff today at their Professional Developers Conference (PDC).  Assuming it&#8217;s the same stuff (and more?) I&#8217;ve been briefed on over the last year, it&#8217;s pretty exciting stuff.
I&#8217;ll be live-tweeting the best bits over on my Twitter account.  If this stuff is [...]]]></description>
			<content:encoded><![CDATA[<div class="center"><a href="http://shanekirk.smugmug.com/gallery/4437987_ShGg8#261065337_rvQnM"><img class="photo" src="http://shanekirk.smugmug.com/photos/261065337_rvQnM-M-14.jpg" alt="UFO OR CLOUD?" /></a>
<p class="photoby" style="width: 600px;">UFO OR CLOUD? by <a href="http://shanekirk.smugmug.com/">Shane Kirk</a></p>
</div>
<p>Microsoft is announcing some exciting Cloud Computing stuff today at their <a href="http://www.microsoftpdc.com/">Professional Developers Conference (PDC)</a>.  Assuming it&#8217;s the same stuff (and more?) I&#8217;ve been briefed on over the last year, it&#8217;s pretty exciting stuff.</p>
<p>I&#8217;ll be live-tweeting the best bits over on <a href="http://twitter.com/DonMacAskill">my Twitter account</a>.  If this stuff is interesting to you, come check it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/10/27/live-tweeting-cloud-keynote-at-pdc-2008/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ZFS &amp; MySQL/InnoDB Compression Update</title>
		<link>http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/</link>
		<comments>http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/#comments</comments>
		<pubDate>Mon, 13 Oct 2008 22:43:45 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[gzip]]></category>
		<category><![CDATA[lzjb]]></category>
		<category><![CDATA[opensolaris]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=425</guid>
		<description><![CDATA[
Network.com setup in Vegas, Thumper disk bay, green by Shawn Ferry

As I expected it would, the fact that I used ZFS compression on our MySQL volume in my little OpenSolaris experiment struck a chord in the comments.  I chose gzip-9 for our first pass for a few reasons:

I wanted to see what the &#8220;best [...]]]></description>
			<content:encoded><![CDATA[<div class="center"><a href="http://lalartu.smugmug.com/gallery/4203013_VhgB8#245651304_UvE8Y"><img class="photo" src="http://lalartu.smugmug.com/photos/245651304_UvE8Y-M.jpg" alt="Network.com setup in Vegas, Thumper disk bay, green by Shawn Ferry" /></a>
<p class="photoby" style="width: 600px;">Network.com setup in Vegas, Thumper disk bay, green by <a href="http://lalartu.smugmug.com/">Shawn Ferry</a></p>
</div>
<p>As I expected it would, the fact that I used ZFS compression on our MySQL volume in <a href="http://blogs.smugmug.com/don/2008/10/10/success-with-opensolaris-zfs-mysql-in-production/">my little OpenSolaris experiment</a> struck a chord in the comments.  I chose gzip-9 for our first pass for a few reasons:</p>
<ol>
<li>I wanted to see what the &#8220;best case&#8221; compression ratio was for our dataset (InnoDB tables)</li>
<li>I wanted to see what the &#8220;worst case&#8221; CPU usage was for our workload</li>
<li>I don&#8217;t have a lot of time.  I need to try something quick &#038; dirty.</li>
</ol>
<p>I got both those data points with enough granularity to be useful:  a 2.12X compression ratio over a large &#038; varied dataset, and the compression was fast enough to not really be noticeable for my end users.  The next step, obviously, is to find out what the best ratio of compression and CPU is for our data.  So I spent the morning testing exactly that.  Here are the details:</p>
<ul>
<li>Created 11 new ZFS volumes (compression = [none | lzjb | gzip-1 through gzip-9])</li>
<li>Grabbed 4 InnoDB tables of varying sizes and compression ratios and loaded them in the disk cache</li>
<li>Measured the time (using &#8216;ptime&#8217;) it took to read each file from cache and write it to disk (using &#8216;cp&#8217;), while watching CPU utilization (using &#8216;top&#8217;, &#8216;prstat&#8217;, and &#8216;mpstat&#8217;)</li>
</ul>
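<p>For anyone who wants to set up something similar, here&#8217;s a rough sketch of how the 11 volumes could be generated.  The parent dataset name, the per-volume naming, and the helper names are assumptions for illustration, not the exact commands I ran.  Note that ZFS spells the uncompressed setting &#8216;off&#8217;, and the script only prints the commands rather than running them.</p>
<pre class="codebox"><code>
def compression_settings():
    """The 11 settings under test: off, lzjb, and gzip-1 through gzip-9."""
    return ["off", "lzjb"] + [f"gzip-{level}" for level in range(1, 10)]


def zfs_create_commands(parent="data/compression"):
    """Build one 'zfs create' command per compression setting.

    The parent dataset is a placeholder and is assumed to exist already.
    Nothing is executed here; the commands are returned as argument lists.
    """
    commands = []
    for setting in compression_settings():
        # Name the uncompressed volume 'none' so the directory is readable.
        name = "none" if setting == "off" else setting
        commands.append(
            ["zfs", "create", "-o", f"compression={setting}", f"{parent}/{name}"]
        )
    return commands


if __name__ == "__main__":
    for command in zfs_create_commands():
        print(" ".join(command))
</code></pre>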
<p>It quickly became obvious that there&#8217;s relatively little difference in compression between gzip-1 and gzip-9 (and, contrary to what people were saying in the comments, relatively little difference between CPU usage, either, in 3 of the 4 cases.  The other case, though&#8230;  yikes!).  So I quickly stopped testing anything but &#8216;none&#8217;, &#8216;lzjb&#8217;, &#8216;gzip-1&#8217;, and &#8216;gzip-9&#8217;.  (<a href="http://en.wikipedia.org/wiki/LZJB">LZJB</a> is the default compression for ZFS &#8211; gzip-N was added later as an option).</p>
<p>Note that all the files were pre-cached in RAM before doing any of the tests, and &#8216;iostat&#8217; verified we were doing zero reads.  Also note that this is writing to two DAS enclosures with 15 x 15K SCSI disks apiece (28 spindles in a striped+mirrored configuration) with 512MB of write cache apiece.  So these tests complete very quickly from an I/O perspective because we&#8217;re either writing to cache (for the smaller files) or writing to tons of fast spindles at once (the bigger files).  In theory, this should mean we&#8217;re testing CPU more than we&#8217;re testing our IO &#8211; which is the whole point.</p>
<p>I ran each &#8216;cp&#8217; at least 10 times, letting the write cache subside each time, selecting the fastest one as the shown result.  Here they are (and be sure to read the CPU utilization note after the tables):</p>
<table cellspacing="5" cellpadding="5" border="1">
<tr>
<td colspan="4">TABLE1</td>
</tr>
<tr>
<td class="strong"><strong>compression</strong></td>
<td class="strong"><strong>size</strong></td>
<td class="strong"><strong>ratio</strong></td>
<td class="strong"><strong>time</strong></td>
</tr>
<tr>
<td>uncompressed</td>
<td>172M</td>
<td>1</td>
<td>0.207s</td>
</tr>
<tr>
<td>lzjb</td>
<td>79M</td>
<td>2.18X</td>
<td>0.234s</td>
</tr>
<tr>
<td>gzip-1</td>
<td>50M</td>
<td>3.44X</td>
<td>0.24s</td>
</tr>
<tr>
<td>gzip-9</td>
<td>46M</td>
<td>3.73X</td>
<td>0.217s</td>
</tr>
</table>
<p>Notes on TABLE1:</p>
<ul>
<li>This dataset seems to be small enough that much of the time is probably spent in system internals, rather than actually reading, compressing, and writing data, so I view this as only an interesting size datapoint, rather than size and time.  Feel free to correct me, though.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<table cellspacing="5" cellpadding="5" border="1">
<tr>
<td colspan="5">TABLE2</td>
</tr>
<tr>
<td class="strong"><strong>compression</strong></td>
<td class="strong"><strong>size</strong></td>
<td class="strong"><strong>size ratio</strong></td>
<td class="strong"><strong>time</strong></td>
<td><strong>time ratio</strong></td>
</tr>
<tr>
<td>uncompressed</td>
<td>631M</td>
<td>1</td>
<td>1.064s</td>
<td>1</td>
</tr>
<tr>
<td>lzjb</td>
<td>358M</td>
<td>1.76X</td>
<td>0.668s</td>
<td>1.59X</td>
</tr>
<tr>
<td>gzip-1</td>
<td>253M</td>
<td>2.49X</td>
<td>1.302s</td>
<td>0.82X</td>
</tr>
<tr>
<td>gzip-9</td>
<td>236M</td>
<td>2.67X</td>
<td>11.1s</td>
<td>0.10X</td>
</tr>
</table>
<p>Notes on TABLE2:</p>
<ul>
<li>gzip-9 is massively slower on this particular hunk of data.  I&#8217;m no expert on gzip, so I have no idea why this would be, but you can see the tradeoff is probably rarely worth it, even if we were using precious storage commodities (say, flash or RAM rather than hard disks).  I ran this one extra times just to make sure.  Seems valid (or a bug).</li>
</ul>
<table cellspacing="5" cellpadding="5" border="1">
<tr>
<td colspan="5">TABLE3</td>
</tr>
<tr>
<td class="strong"><strong>compression</strong></td>
<td class="strong"><strong>size</strong></td>
<td class="strong"><strong>size ratio</strong></td>
<td class="strong"><strong>time</strong></td>
<td><strong>time ratio</strong></td>
</tr>
<tr>
<td>uncompressed</td>
<td>2675M</td>
<td>1</td>
<td>15.041s</td>
<td>1</td>
</tr>
<tr>
<td>lzjb</td>
<td>830M</td>
<td>3.22X</td>
<td>5.274s</td>
<td>2.85X</td>
</tr>
<tr>
<td>gzip-1</td>
<td>246M</td>
<td>10.87X</td>
<td>44.287s</td>
<td>0.34X</td>
</tr>
<tr>
<td>gzip-9</td>
<td>220M</td>
<td>12.16X</td>
<td>52.475s</td>
<td>0.29X</td>
</tr>
</table>
<p>Notes on TABLE3:</p>
<ul>
<li>LZJB really shines here, performance-wise.  It delivers roughly 3X faster performance while also chewing up roughly 3X fewer bytes.  Awesome.</li>
<li>gzip&#8217;s compression ratios are crazy great on this hunk of data, but the performance is pretty awful.  Definitely CPU-bound, not IO-bound.</li>
</ul>
<table cellspacing="5" cellpadding="5" border="1">
<tr>
<td colspan="5">TABLE4</td>
</tr>
<tr>
<td class="strong"><strong>compression</strong></td>
<td class="strong"><strong>size</strong></td>
<td class="strong"><strong>size ratio</strong></td>
<td class="strong"><strong>time</strong></td>
<td><strong>time ratio</strong></td>
</tr>
<tr>
<td>uncompressed</td>
<td>2828M</td>
<td>1</td>
<td>17.09s</td>
<td>1</td>
</tr>
<tr>
<td>lzjb</td>
<td>1814M</td>
<td>1.56X</td>
<td>14.495s</td>
<td>1.18X</td>
</tr>
<tr>
<td>gzip-1</td>
<td>1384M</td>
<td>2.04X</td>
<td>48.895s</td>
<td>0.35X</td>
</tr>
<tr>
<td>gzip-9</td>
<td>1355M</td>
<td>2.09X</td>
<td>54.672s</td>
<td>0.31X</td>
</tr>
</table>
<p>Notes on TABLE4:</p>
<ul>
<li>Again, LZJB performs quite well.  Roughly 1.5X smaller on disk while still being faster than uncompressed.  Nice!</li>
<li>gzip is again very obviously CPU bound, rather than IO-bound.  Dang.</li>
</ul>
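<p>For anyone wanting to double-check the two ratio columns: each is just the uncompressed figure divided by the compressed one, so anything above 1X is a win.  A minimal Python sketch using the TABLE4 numbers (the function and variable names are made up for illustration, not part of any tool):</p>
<pre class="codebox"><code>
# Sizes in MB and 'ptime' wall-clock times in seconds, copied from TABLE4.
TABLE4 = {
    "uncompressed": (2828, 17.09),
    "lzjb": (1814, 14.495),
    "gzip-1": (1384, 48.895),
    "gzip-9": (1355, 54.672),
}

def ratios(table, baseline="uncompressed"):
    """Return {name: (size_ratio, speed_ratio)} relative to the baseline row."""
    base_size, base_time = table[baseline]
    return {
        name: (round(base_size / size, 2), round(base_time / secs, 2))
        for name, (size, secs) in table.items()
    }

for name, (size_ratio, speed_ratio) in ratios(TABLE4).items():
    print(f"{name}: {size_ratio}X smaller, {speed_ratio}X the speed")
</code></pre>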
<p>There&#8217;s one other very important datapoint here that &#8216;ptime&#8217; itself didn&#8217;t show &#8211; <strong>CPU utilization</strong>.  On every run with LZJB, both &#8216;top&#8217; and &#8216;mpstat&#8217; showed idle CPU.  The most I saw it consume was 70% of the aggregate of all 4 CPUs, but the average was typically 30-40%.  gzip, on the other hand, pegged all 4 CPUs on each run.  Both &#8216;top&#8217; and &#8216;mpstat&#8217; verified that 0% CPU was idle, and interactivity on the bash prompt was terrible on gzip runs.</p>
<p>Some other crazy observations that I can&#8217;t explain (yet?):</p>
<ul>
<li>After a copy (even to an uncompressed volume), &#8216;du&#8217; wouldn&#8217;t always show the right bytes.  It took time (many seconds) before showing the right # of bytes, even after doing things like &#8216;md5sum&#8217;.  I have no idea why this might be.</li>
<li>gzip-9 made a smaller file (1355M vs 1380M) on this new volume as opposed to my big production volume (which is gzip-9 also).  I assume this must be due to a different compression dictionary or something, but it was interesting.</li>
<li>Sometimes I&#8217;d get strange error messages trying to copy a file over an existing one (removing the existing one and trying again always worked):
<pre class="codebox"><code>
bash-3.2# ptime cp table4.ibd /data/compression/gzip-1
cp: cannot create /data/compression/gzip-1/table4.ibd: Arg list too long
</code></pre>
</li>
<li>After running lots of these tests, I wasn&#8217;t able to start MySQL anymore.  It crashed on startup, unable to allocate enough RAM for InnoDB&#8217;s buffer pool.  (You may recall from my last post that MySQL seems to be more RAM limited under OpenSolaris than Linux).  I suspect that ZFS&#8217;s ARC might have sucked up all the RAM and was unwilling to relinquish it, but I wasn&#8217;t sure.  So I rebooted and everything was fine.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </li>
</ul>
<p>Conclusion?  Unless you care a great deal about eking out every last byte (using a RAM disk, for example), LZJB seems like a much saner compression choice.  Performance seems to improve, rather than degrade, and it doesn&#8217;t hog your CPU.  I&#8217;m switching my ZFS volume to LZJB right now (on-the-fly changes &#8211; woo!) and will copy all my data so it gets the new compression settings.  I&#8217;ll sacrifice some bytes, but that&#8217;s ok &#8211; performance is king.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Also, my theory that I&#8217;d always have idle CPU with modern multi-core chips so compression wouldn&#8217;t be a big deal seems to be false.  Clearly, with gzip, it&#8217;s possible to hog your entire CPU if you&#8217;re doing big long writes.  We don&#8217;t tend to do high-MB/s reads or writes, but it&#8217;s clearly something to think about.  LZJB seems to be the right balance.</p>
<p>So, what should I test next?  I wouldn&#8217;t mind testing compression latencies on very small reads/writes more along the lines of what our DB actually does, but I don&#8217;t know how to do that in a quick &#038; dirty way like I was able to here.  </p>
<p>Also, I have to admit, I&#8217;m curious about the different checksum options.  Has anyone played with anything other than the default?</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Success with OpenSolaris + ZFS + MySQL in production!</title>
		<link>http://blogs.smugmug.com/don/2008/10/10/success-with-opensolaris-zfs-mysql-in-production/</link>
		<comments>http://blogs.smugmug.com/don/2008/10/10/success-with-opensolaris-zfs-mysql-in-production/#comments</comments>
		<pubDate>Fri, 10 Oct 2008 22:14:30 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[filesystem]]></category>
		<category><![CDATA[filesystem compression]]></category>
		<category><![CDATA[freebsd]]></category>
		<category><![CDATA[fuse]]></category>
		<category><![CDATA[gzip]]></category>
		<category><![CDATA[hardware raid]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[lvm]]></category>
		<category><![CDATA[lvm2]]></category>
		<category><![CDATA[mac os x]]></category>
		<category><![CDATA[md3000]]></category>
		<category><![CDATA[opensolaris]]></category>
		<category><![CDATA[raid]]></category>
		<category><![CDATA[smugmug]]></category>
		<category><![CDATA[software raid]]></category>
		<category><![CDATA[solaris]]></category>
		<category><![CDATA[sun]]></category>
		<category><![CDATA[sunfire]]></category>
		<category><![CDATA[volume management]]></category>
		<category><![CDATA[volume manager]]></category>
		<category><![CDATA[x2200]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=394</guid>
		<description><![CDATA[
Pimp My Drive by Richard and Barb

There&#8217;s remarkably little information online about using MySQL on ZFS, successfully or not, so I did what any enterprising geek would do:  Built a box, threw some data on it, and tossed it into production to see if it would sink or swim.   
I&#8217;m a Linux [...]]]></description>
			<content:encoded><![CDATA[<div class="center"><a href="http://banjon.smugmug.com/gallery/4356946_mQ7BB#307021908_jj9rM"><img class="photo" src="http://banjon.smugmug.com/photos/307021908_jj9rM-M.jpg" alt="Pimp My Drive by Richard and Barb" /></a>
<p class="photoby" style="width: 600px;">Pimp My Drive by <a href="http://banjon.smugmug.com/">Richard and Barb</a></p>
</div>
<p>There&#8217;s remarkably little information online about using <a href="http://dev.mysql.com/tech-resources/articles/mysql-zfs.html">MySQL on ZFS</a>, successfully or not, so I did what any enterprising geek would do:  Built a box, threw some data on it, and tossed it into production to see if it would sink or swim.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I&#8217;m a Linux geek, have been since 1993 (<a href="http://en.wikipedia.org/wiki/Slackware">Slackware</a>!).  All of <a href="http://www.smugmug.com/">SmugMug&#8217;s</a> datacenters (and <a href="http://blogs.smugmug.com/don/2008/06/03/skynet-lives-aka-ec2-smugmug/">our EC2 images</a>) are built on Linux.  But the current state of filesystems on Linux is awful, and it&#8217;s been awful for at least 8 years.  As a result, we&#8217;ve put our first <a href="http://opensolaris.org/">OpenSolaris</a> box into production at SmugMug and I&#8217;ve been pleasantly surprised with the performance (the userland portions of the OS, though, leave a lot to be desired).  Why OpenSolaris?</p>
<p><strong>ZFS.</strong></p>
<p><a href="http://en.wikipedia.org/wiki/ZFS">ZFS</a> is the most amazing filesystem I&#8217;ve ever come across.  Integrated volume management.  Copy-on-write.  Transactional.  End-to-end data integrity.  On-the-fly corruption detection and repair.  Robust checksums.  No RAID-5 write hole.  Snapshots.  Clones (writable snapshots).  Dynamic striping.  Open source software.  It&#8217;s not available on Linux.  Ugh.  Ok, that sucks.  (GPL is a double-edged sword, and this is a perfect example).  Since it&#8217;s open-source, it&#8217;s available on other OSes, like FreeBSD and Mac OS X, but Linux is a no go.  *sigh*  I have a feeling Sun is working towards GPL&#8217;ing ZFS, but these things take time and I&#8217;m sick of waiting.</p>
<p>The OpenSolaris project is working towards making Solaris resemble the Linux (GNU) userland plus the Solaris kernel.  They&#8217;re not there yet, but the goal is commendable and the package management system has taken a few good steps in the right direction.  It&#8217;s still frustrating, but massively less so.  Despite all the rough edges, though, ZFS is just so compelling I basically have no choice.  I need end-to-end data integrity.  The rest of the stuff is just icing on an already delicious cake.</p>
<p>The obvious first place to use ZFS was for our database boxes, so that&#8217;s what I did.  I didn&#8217;t have the time, knowledge of OpenSolaris, or inclination to do any synthetic benchmarking or attempt to create an apples-to-apples comparison with our current software setup, so I took the quickest route I could to have a MySQL box up and running.  I had two immediate performance metrics I cared about:</p>
<ul>
<li>Can a MySQL slave on OpenSolaris with ZFS keep up with the write load with no readers?</li>
<li>If yes, can the slave shoulder its fair share of the reads, too?</li>
</ul>
<p>Simple and to the point.  Here&#8217;s the system:</p>
<ul>
<li><a href="http://blogs.smugmug.com/don/2007/04/11/sun-honeymoon-update-servers/">SunFire X2200 M2</a> w/64GB of RAM and 2 x dual-core 2.6GHz Opterons</li>
<li><a href="http://blogs.smugmug.com/don/2007/10/01/dell-md3000-great-das-db-storage/">Dell MD3000</a> w/15 x 15K SCSI disks and mirrored 512MB battery-backed write caches (these are <strong>really</strong> starting to piss us off, but that&#8217;s another post&#8230;)</li>
</ul>
<p>The quickest path to getting the system up and running resulted in lots of variables in the equation changing:</p>
<ul>
<li>Linux -> OpenSolaris (snv_95 currently)</li>
<li>MySQL 5.0 -> MySQL 5.1</li>
<li>LVM2 + ext3 -> ZFS</li>
<li>Hardware RAID -> Software RAID</li>
<li>No compression -> gzip-9 volume compression</li>
</ul>
<p>Whew!  Lots of changes.  Let me break them down one by one, skipping the obvious first one:</p>
<p><strong>MySQL</strong> &#8211; <a href="http://dev.mysql.com/downloads/mysql/5.1.html">MySQL 5.1</a> is nearing GA, and has a couple of very important bug fixes for us that we&#8217;ve been working around for an awfully long time now.  When I downloaded the MySQL 5.0 Enterprise Solaris packages and they wouldn&#8217;t install properly, that made the decision to dabble with 5.1 even easier &#8211; the <a href="http://cooltools.sunsource.net/coolstack/">CoolStack 5.1</a> binaries from Sun installed just fine.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   </p>
<p>Going to MySQL 5.1 on a ~1TB DB is painful, though, I should warn you up front.  It forced &#8216;REPAIR TABLE&#8217; on lots of my tables, so this step took much longer than I expected.  Also, we found that the query optimizer in some cases did a poor job of choosing which indexes to use for queries.  A few &#8220;simple&#8221; SELECTs (no JOINs or anything) that would take a few milliseconds on our 5.0 boxes took seconds on our 5.1 boxes.  A little bit of code solved the problem and resulted in better efficiency even for the 5.0 boxes, so it was a net win, but painful for a few hours while I tracked it down.  </p>
<p>Finally, after running CoolStack for a few days, we switched (on advice from Sun) to the 5.1.28 Community Edition to <a href="http://blogs.sun.com/realneel/entry/peeling_the_mysql_scalability_onion">fix some scalability issues</a>.  This made a <strong>huge</strong> difference so I highly recommend it.  (On a side note, I wish MySQL provided Enterprise binaries for 5.1 for their paying customers to test with).  The <a href="http://mysqlha.blogspot.com/2008/09/more-patches-than-we-know-what-to-do.html">Google &#038; Percona patches</a> should make a monster difference, too.</p>
<p><strong>Volume management and the filesystem</strong> &#8211; There&#8217;s some debate online as to whether ZFS is a &#8220;layering violation&#8221; or not.  I couldn&#8217;t care less &#8211; it&#8217;s pure heaven to work with.  This is how filesystems should have always been.   The commands to create, manage, and extend pools are so simple and logical you basically don&#8217;t even need man pages (discovering disk names, on the other hand, isn&#8217;t easy.  I finally used &#8216;format&#8217; but even typing it gives me the shivers&#8230;).
<pre class="codebox"><code>zpool create MYPOOL c0t0d0</code></pre>
<p>You just created a ZFS pool.  Want a mirror?
<pre class="codebox"><code>zpool create MYPOOL mirror c0t0d0 c0t0d1</code></pre>
<p>Want a striped mirror (RAID-1+0) w/spare?
<pre class="codebox"><code>zpool create MYPOOL mirror c0t0d0 c0t0d1 mirror c0t0d2 c0t0d3 spare c0t0d4</code></pre>
<p>Want to add another mirror to an already striped mirror (RAID-1+0) pool?
<pre class="codebox"><code>zpool add MYPOOL mirror c0t0d5 c0t0d6</code></pre>
<p>Get the idea?  Super-easy.  Massively easier than LVM2+ext3 where adding a mirror is at least 4 commands: pvcreate, vgextend, lvextend, resize2fs &#8211; usually with an fsck in there too.</p>
<p><strong>Software RAID</strong> &#8211; This is something we&#8217;ve been itching for for quite some time.  With modern system architectures and modern CPUs, there&#8217;s no real reason &#8220;storage&#8221; should be separate from &#8220;servers&#8221;.  A storage device should be just a server with some open-source software and lots of disks.  (The &#8220;open source&#8221; part is important.  I&#8217;m sick of relying on closed-source RAID firmware).  The amount of flexibility, performance, reliability and operational cost savings you can achieve with software RAID rather than hardware is enormous.  With real datacenter-grade flash storage devices just around the corner, this becomes even more vital.  ZFS makes all of this stuff Just Work, including properly adjusting the write caches on the disk, eliminating the RAID-5 write hole, etc.  Our first box still has a battery-backed write-cache between the disks and the CPU for write performance, but all the disks are just exposed as JBOD and striped + mirrored using ZFS.  It rocks.</p>
<p><strong>Compression</strong> &#8211; Ok, so this is where the geek in me decided to get a little crazy.  ZFS allows you to turn on (and off) a variety of compression mechanisms on-the-fly on your pool.  This comes with some unknown (depends on lots of factors, including your workload, CPUs, etc) performance penalty (CPU is required to compress/decompress), but can have performance upsides too (smaller reads and writes = less busy disk).  </p>
<p>InnoDB is notoriously bad at disk usage (we see 2X+ space usage using InnoDB) and while it&#8217;s not an enormous concern, it&#8217;d be something nice to curtail.  On most of our DB boxes, we have idle CPU around (we&#8217;re not really I/O bound either &#8211; MySQL is a strange duck in that you can be concurrency bound without being either CPU or I/O bound fairly easily thanks to poor locking), so I figured I&#8217;d go wild and give it a shot.  </p>
<p>Lo and behold, it worked!  We&#8217;re getting a 2.12X compression ratio on our DB, and performance is keeping up just fine.  I ran some quick performance tests on large linear reads/writes and we were measuring 45.6MB/s sustained decompression and 39MB/s sustained compression on a single-threaded app on an Opteron CPU.  We&#8217;ll probably continue to test compression stuff, and of course if we run into performance bottlenecks, we&#8217;ll turn it off immediately, but so far the mad science experiment is working.</p>
<p><strong>Configuration</strong></p>
<p>Configuring everything was relatively painless.  I bounced a few questions off of Sun (imho, this is where Sun really shines &#8211; they listen to their customers and put technical people with real answers within arms reach) and read the <a href="http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide">Evil Tuning Guide to ZFS</a>.  In the end I really only ended up tweaking two things (plus setting compression to gzip-9):</p>
<ul>
<li>I set the recordsize to match InnoDB&#8217;s &#8211; 16KB.
<pre class="codebox"><code>zfs set recordsize=16K MYPOOL</code></pre>
</li>
<li>I turned off file-level prefetching.  See the Evil Tuning Guide.  (I&#8217;m testing with this on, now, and so far it seems fine).</li>
</ul>
<p>I believe since ZFS is fully checksummed and transactional (so partial writes never occur) I can disable InnoDB&#8217;s doublewrite buffer.  I haven&#8217;t been brave enough to do this yet, but I plan to.  I like performance.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>Performance</strong></p>
<p>This box has been in production in our most important DB cluster for two weeks now.  On the metrics I care about (replication lag, query performance, CPU utilization, etc) it&#8217;s pulling its fair share of the read load and keeping completely up on replication.  Just eyeballing the stats (we haven&#8217;t had time to number crunch comparison stats, though we gave some to Sun that I&#8217;m hoping they crunch), I can&#8217;t tell a difference between this slave and any of the others in the cluster running Linux.  I sure feel a lot better about the data integrity, though.</p>
<p><strong>Why not [insert other OS here]?</strong></p>
<p>We could have gone with <a href="http://www.nexenta.org/os">Nexenta</a>, FreeBSD, Mac OS X, or even *gulp* tried <a href="http://zfs-on-fuse.blogspot.com/">ZFS on FUSE/Linux</a>.  To be honest, Nexenta is the most interesting because it actually *is* the Solaris kernel plus Linux userland, exactly what I wanted.  I&#8217;ve played with it a tiny bit, and plan to play with it more, but this is a mission-critical chunk of data we&#8217;re dealing with, so I need a company like Sun in my corner.  I find myself wishing Sun had taken the Nexenta route (or offered support for it that I could buy or something).  Instead, we&#8217;ll be buying software service &#038; support from Sun for this and any other mission-critical OpenSolaris boxes.</p>
<p>FreeBSD also doesn&#8217;t have the support I need, Mac OS X wasn&#8217;t performant enough the last time I fiddled with it as a server, and most FUSE filesystems are slow so I didn&#8217;t even bother.  </p>
<p><strong>Gotchas</strong></p>
<ul>
<li>On my 64GB Linux boxes, I give InnoDB 54GB of buffer pool size.  With otherwise exactly the same my.cnf settings, MySQL on OpenSolaris crashes with anything more than 40GB.  14GB, or 21.9% of my RAM, that I can&#8217;t seem to use effectively.  Sun is looking into this, I&#8217;ll let you know if I find anything out.</li>
<li>For a Linux geek, OpenSolaris userland is still painful.  Bear in mind that this is a single-purpose box, so all I really want to do is install and configure MySQL, then monitor the software and hardware.  If this were a developer box, I would have already given up.  OpenSolaris is still very early, so I&#8217;m still hopeful, but be prepared to invest some time.  Some of my biggest peeves:
<ul>
<li>Common commands, like &#8216;ps&#8217;, have very different flags.</li>
<li>Some GNU bins are provided in /usr/gnu/bin &#8211; but a better &#8216;ps&#8217; is missing, as is &#8216;top&#8217; (no, &#8216;prstat&#8217; is *not* the same!), &#8216;screen&#8217;, etc (Can anyone even use remote command-line Unix boxes without &#8216;screen&#8217;?  If so, how?)</li>
<li>Packages are crazily named, making finding your stuff to install tough.  Like instead of Apache being called &#8216;apache&#8217; or &#8216;httpd&#8217;, it&#8217;s called &#8216;SUNWapch&#8217;.  What?</li>
<li>After finally figuring out how to search for packages to get the names (&#8216;pkg search -r Apache&#8217; &#8211; which doesn&#8217;t provide pleasant results), I discovered that &#8216;top&#8217; and &#8216;screen&#8217; just simply aren&#8217;t provided (or they&#8217;re named even worse than I thought).  Instead, I had to go to a 3rd party repository, <a href="http://www.blastwave.org/">BlastWave</a>, to get them.  And then, of course, the &#8216;top&#8217; OpenSolaris package wouldn&#8217;t actually install and I had to manually break into the package and extract the binary.  Ugh.</li>
</ul>
</li>
</ul>
<p>Whew!  Big post, but there was a lot of ground to cover.  I&#8217;m sure there are questions, so please post in the comments and I&#8217;ll try to do a follow-up.  As I fiddle, tweak, and change things I&#8217;ll try to post updates, too &#8211; but no promises.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>UPDATE:</strong>  One other gotcha I forgot to mention.  When MySQL (or, presumably, anything else running on the box) gets really busy, user interactivity evaporates on OpenSolaris.  Just hitting enter or any other key at a bash prompt over SSH can take many seconds to register.  I remember when Linux had these sorts of issues in the past, but had blissfully forgotten about them. </p>
<p><strong>UPDATE:</strong> I went <a href="http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/">more in depth on ZFS compression</a> testing and blogged the results.  Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/10/10/success-with-opensolaris-zfs-mysql-in-production/feed/</wfw:commentRss>
		<slash:comments>81</slash:comments>
		</item>
		<item>
		<title>SkyNet Lives! (aka EC2 @ SmugMug)</title>
		<link>http://blogs.smugmug.com/don/2008/06/03/skynet-lives-aka-ec2-smugmug/</link>
		<comments>http://blogs.smugmug.com/don/2008/06/03/skynet-lives-aka-ec2-smugmug/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 16:28:56 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[amazon]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[photo processing]]></category>
		<category><![CDATA[rubberband]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[skynet]]></category>
		<category><![CDATA[smugmug]]></category>
		<category><![CDATA[sqs]]></category>
		<category><![CDATA[video rendering]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=219</guid>
		<description><![CDATA[
Everyone knows that SmugMug is a heavy user of S3, storing well over half a petabyte of data (non-replicated) there.  What you may not know is that EC2 provides a core part of our infrastructure, too.  Thanks to Amazon, the software and hardware that processes all of your high-resolution photos and high-definition video [...]]]></description>
			<content:encoded><![CDATA[<div class="center"><img class="photo" src="http://don.smugmug.com/photos/305993656_rDF8S-M.jpg" alt="SkyNet Lives - EC2 at SmugMug" /></div>
<p>Everyone knows that <a href="http://www.smugmug.com/">SmugMug</a> is a <a href="http://www.google.com/search?q=site:blogs.smugmug.com+%22Amazon+S3%22">heavy user of S3</a>, storing well over half a petabyte of data (non-replicated) there.  What you may not know is that <a href="http://ec2.amazonaws.com/">EC2</a> provides a core part of our infrastructure, too.  Thanks to Amazon, the software and hardware that processes all of your high-resolution photos and <a href="http://blogs.smugmug.com/don/2008/04/25/i-demand-video-to-be-awesome/">high-definition video</a> is totally scalable without any human intervention.  And when I say scalable, I mean both up and down, just the way it should be.  Here&#8217;s our approach in a nutshell:</p>
<p><strong>OVERVIEW</strong></p>
<p>The architecture basically consists of three software components:  the rendering workers, the batch queuing piece, and the controller.  The rendering workers live on EC2, and both the queuing piece and the controller live at SmugMug.  We don&#8217;t use <a href="http://www.amazon.com/Simple-Queue-Service-home-page/b/ref=sc_fe_l_2?ie=UTF8&amp;node=13584001&amp;no=3435361&amp;me=A36L942TSJ2AJA">SQS</a> for our queuing mechanism for a few reasons:</p>
<ul>
<li>We&#8217;d already built a queuing mechanism years ago, and it hasn&#8217;t (yet?) hit any performance or reliability bottlenecks.</li>
<li>SQS&#8217;s pricing used to be outta whack for what we needed.  They&#8217;ve since dramatically lowered the pricing and it&#8217;s now much more in line with what we&#8217;d expect &#8211; but by then, we were done.</li>
<li>The controller consumes historical data to make smart decisions, and our existing queuing system was slightly easier to generate the historical data from.</li>
</ul>
<p><strong>RENDER WORKERS</strong></p>
<p>Our render workers are totally &#8220;dumb&#8221;.  They&#8217;re literally bare-bones CentOS 5 AMIs (you can build your own, or use <a href="http://blog.rightscale.com/2007/10/23/64-bit-centos5-amazon-ec2-image-release/">RightScale&#8217;s</a>, or whatever you&#8217;d like) with a single extra script on them which is executed from /etc/rc.d/rc.local.  What does that script do?  It fetches intelligence.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>When that script executes, it sends an authenticated request to get a software bundle, extracts the bundle, and starts the software inside.  That&#8217;s it.  Further, the software inside the bundle is self-aware and self-updating, too, automatically fetching updated software, terminating older versions, and relaunching itself.  This makes it super-simple to push out new SmugMug software releases &#8211; no bundling up new AMIs and testing them or anything else that&#8217;s messy.  Simply update the software bundle on our servers and all of the render workers automatically get the new release within seconds.</p>
<p>Of course, worker instances might have different roles or be assigned to work with different SmugMug clusters (test vs production, for example), so we have to be able to give them instructions at launch.  We do this through the &#8220;user-data&#8221; launch parameter you can specify for EC2 instances &#8211; it gives the software all the details needed to choose a role, get software, and launch it.   Reading the user-data couldn&#8217;t be easier.  If you haven&#8217;t done it before, just fetch <em>http://169.254.169.254/latest/user-data</em> from your running instance and parse it.</p>
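<p>If you want something to start from, here&#8217;s a minimal Python sketch of that fetch-and-parse step.  The post doesn&#8217;t say what format the user-data is in, so the &#8216;key=value&#8217; lines (and keys like &#8216;role&#8217; or &#8216;cluster&#8217;) are purely an assumption for illustration:</p>
<pre class="codebox"><code>
from urllib.request import urlopen

USER_DATA_URL = "http://169.254.169.254/latest/user-data"

def parse_user_data(text):
    """Parse assumed 'key=value' lines into a dict; blanks and '#' comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def fetch_user_data(url=USER_DATA_URL, timeout=2):
    """Fetch and parse the launch parameters; only works from inside a running instance."""
    with urlopen(url, timeout=timeout) as response:
        return parse_user_data(response.read().decode("utf-8"))
</code></pre>
<p>Keeping the parsing separate from the fetch means the parser can be tested on a laptop, where the 169.254.169.254 address isn&#8217;t reachable.</p>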
<p>Once they&#8217;re up and running, they simply ping the queue service with a &#8220;Hi, I&#8217;m looking for work.  Do you have any?&#8221; request, and the queue service either supplies them with work or gives them some other directive (shutdown, software update, take a short nap, etc).  Once a job is done (or generated an error), the worker stores the work result on S3 and notifies the queue service that the job is done and asks for more work.  Simple.</p>
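<p>The loop described above might look roughly like this in Python.  This is a sketch under assumptions, not the real worker: &#8216;queue&#8217;, &#8216;render&#8217; and &#8216;store&#8217; are hypothetical stand-ins for the queue service client, the rendering code and the S3 upload, the reply format is invented, and the software-update directive is left out to keep it short:</p>
<pre class="codebox"><code>
import time

def run_worker(queue, render, store, sleep=time.sleep):
    """Poll the queue until told to shut down. All arguments are stand-ins."""
    finished = None  # outcome of the previous job, reported with the next request
    while True:
        reply = queue.request_work(finished)
        finished = None
        action = reply["action"]
        if action == "shutdown":
            return
        if action == "nap":
            sleep(reply.get("seconds", 5))
        elif action == "work":
            job = reply["job"]
            try:
                store(job["id"], render(job))   # e.g. write the output to S3
                finished = {"id": job["id"], "status": "done"}
            except Exception as exc:            # report failures instead of dying
                finished = {"id": job["id"], "status": "error", "detail": str(exc)}
</code></pre>
<p>Reporting the previous result in the same call that asks for more work keeps it to one round trip per job, which matches the &#8220;notify and ask for more&#8221; step in the description.</p>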
<p><strong>QUEUE SERVICE</strong></p>
<p>This is your basic queuing service, probably very similar to any other queuing service you&#8217;ve seen before.  Ours supports job types (new upload, rotate, watermark, etc) and priorities (<a href="http://www.smugmug.com/pro/">Pros</a> go to the head of the line, etc) as well as other details.  Upon completion, it also logs historical data such as time to completion.  It also supports time-based re-queuing in the event of a worker outage, miscommunication, error, or whatever.  I haven&#8217;t taken a really hard look at SQS in quite some time, but I can&#8217;t imagine it would be very difficult to implement on SQS for those of you starting fresh.</p>
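<p>For those starting fresh, the two less obvious features (priorities and time-based re-queuing) fit in a few lines.  Here&#8217;s a toy in-memory Python sketch; the class, its method names and the lease timeout are all assumptions for illustration, and a real service would need persistence and the historical logging mentioned above:</p>
<pre class="codebox"><code>
import heapq
import itertools
import time

class JobQueue:
    """In-memory stand-in: lower priority number is served first, FIFO within a priority."""

    def __init__(self, lease_seconds=300, clock=time.time):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.pending = []            # heap of (priority, sequence, job_id)
        self.leased = {}             # job_id -> (deadline, priority)
        self.sequence = itertools.count()

    def put(self, job_id, priority):
        heapq.heappush(self.pending, (priority, next(self.sequence), job_id))

    def requeue_expired(self):
        """Put back any job whose worker has gone quiet past its deadline."""
        now = self.clock()
        for job_id, (deadline, priority) in list(self.leased.items()):
            if now >= deadline:
                del self.leased[job_id]
                self.put(job_id, priority)

    def get(self):
        """Hand out the next job, or None if nothing is waiting."""
        self.requeue_expired()
        if not self.pending:
            return None
        priority, _, job_id = heapq.heappop(self.pending)
        self.leased[job_id] = (self.clock() + self.lease_seconds, priority)
        return job_id

    def done(self, job_id):
        """Mark a job finished so it is never handed out again."""
        self.leased.pop(job_id, None)
</code></pre>
<p>A job handed to a worker that never reports back simply reappears once its lease runs out, which covers outages and miscommunication without any special cases.</p>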
<p><strong>CONTROLLER (aka SkyNet)</strong></p>
<p>For me, this was the fun part.  Initially we called it RubberBand, but we had an unusual partial outage one day which caused it to go berserk and launch ~250 XL instances (~2000 normal EC2 instances) in a single call.  Clearly, it had gained sentience and was trying to take over the world, so we renamed it SkyNet.  (We&#8217;ve since corrected the problem, and given SkyNet more reasonable thresholds and limits.  And yes, I caught it within the hour.)</p>
<p><strong>SkyNet is completely autonomous</strong> &#8211; it operates with zero human interaction, either watching or providing interactive guidance.  No-one at SmugMug even pays attention to it anymore (and we haven&#8217;t for many months) since it operates so efficiently.  (Yes, I realize that means it&#8217;s probably well on its way to world domination.  Sorry in advance to everyone killed in the forthcoming man-machine war.)</p>
<p>Roughly once per minute, SkyNet makes an EC2 decision:  launch instance(s), terminate instance(s), or sleep.  It has a lot of inputs &#8211; it checks anywhere from 30-50 pieces of data to make an informed decision.  One of the reasons for that is we have a variety of different jobs coming in, some of which (uploads) are semi-predictable.  We know that lots of uploads come in every Sunday evening, for example, so we can begin our prediction model there.  Other jobs, though, such as watermarking an entire gallery of 10,000 photos with a single click, aren&#8217;t predictable in a useful way, and we can only respond once the load hits the queue.</p>
<p>A few of the data points SkyNet looks at are:</p>
<ul>
<li>How many jobs are pending?</li>
<li>What&#8217;s the priority of the jobs?</li>
<li>What type of jobs are they?</li>
<li>How complex are the pending jobs? (ex: <a href="http://blogs.smugmug.com/don/2008/04/25/i-demand-video-to-be-awesome/">HD video</a> vs 1Mpix photo)</li>
<li>How time-sensitive are the pending jobs? (ex: Uploads vs rotations)</li>
<li>Current load of the EC2 cluster</li>
<li>Current # of jobs per sample processed</li>
<li>Average time per job per sample</li>
<li>Historical load and job performance</li>
<li>How close any instances are to the end of their 1-hour cost window</li>
<li>Recent SkyNet actions (start/terminate/etc)</li>
</ul>
<p>.. and the list goes on.</p>
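<p>To make one of those inputs concrete, take the 1-hour cost window: if an instance is billed by the hour from launch, the cheapest moment to drop it is just before the next hour starts.  A hedged Python sketch of that single check (the function names, the 5-minute window and the dict of launch times are all invented here, and the real controller weighs far more than this):</p>
<pre class="codebox"><code>
def minutes_into_hour(launch_time, now):
    """Minutes elapsed in the instance's current billed hour (times in seconds)."""
    return ((now - launch_time) % 3600) / 60.0

def pick_terminations(instances, surplus, now, window_minutes=5):
    """Choose up to 'surplus' instances that are about to roll into a new billed hour."""
    expiring = [
        (minutes_into_hour(launched, now), name)
        for name, launched in instances.items()
        if minutes_into_hour(launched, now) >= 60 - window_minutes
    ]
    expiring.sort(reverse=True)          # closest to the hour boundary first
    return [name for _, name in expiring[:max(surplus, 0)]]
</code></pre>
<p>Instances that have only just started a new hour are left alone, since that hour is already paid for and they might as well keep working.</p>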
<p>Our goal is to keep enough slack around to handle surges of unpredictable batch operations, but not enough so it drains our bank account.  We&#8217;ve settled on an average of roughly 25% of excess compute capacity available when averaged over a full 24 hour period and SkyNet keeps us remarkably close to that number.  We always err on the side of more excess (so we get faster processing times) rather than less when we have to make a decision.  It&#8217;s great to save a few bucks here and there that we can plow back into better customer service or a new feature &#8211; but not if photo uploads aren&#8217;t processing, consistently, within 5-30 seconds of upload.</p>
<div class="center"><img class="photo" src="http://don.smugmug.com/photos/306159777_WjVbH-L.jpg" alt="SkyNet Lives - EC2 at SmugMug" /></div>
<p>Our workers like lots of threads, so SkyNet does its best to launch c1.xlarge instances (Amazon calls these &#8220;<a href="http://aws.typepad.com/aws/2008/05/more-ec2-power.html">High-CPU Instances</a>&#8221;), but is smart enough to request equivalent other instance sizes (2 x Large, 8 x Small, etc) in the event it can&#8217;t allocate as many c1.xlarge instances as it would like.  Our application doesn&#8217;t care how big/small the instances are, just that we get lots of CPU cores in aggregate.  (We were in the Beta for the High-CPU feature, so we&#8217;ve been using it for months).</p>
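<p>The fallback can be sketched in Python using the equivalences quoted above (1 x c1.xlarge = 2 x Large = 8 x Small).  Everything else is an assumption: the &#8216;m1.large&#8217; and &#8216;m1.small&#8217; type names are what I take Large and Small to mean, and the &#8216;available&#8217; dict is a simplification, since in practice you only find out what you can get by asking EC2 for it:</p>
<pre class="codebox"><code>
# How many of each type stand in for one c1.xlarge, per the equivalences above.
PER_XLARGE = {"c1.xlarge": 1, "m1.large": 2, "m1.small": 8}

def plan_launch(wanted_xlarge, available):
    """Fill the request with c1.xlarge first, then fall back to equivalent smaller sizes.

    Returns (plan, unmet) where plan maps type name to instance count and
    unmet is the number of c1.xlarge-equivalents that could not be covered.
    """
    plan = {}
    remaining = wanted_xlarge
    for kind in ("c1.xlarge", "m1.large", "m1.small"):
        if remaining == 0:
            break
        multiplier = PER_XLARGE[kind]
        usable = min(available.get(kind, 0) // multiplier, remaining)
        if usable:
            plan[kind] = usable * multiplier
            remaining -= usable
    return plan, remaining
</code></pre>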
<p>One interesting thing we had to take into account when writing SkyNet was the EC2 startup lag.  Don&#8217;t get me wrong &#8211; I think EC2 starts up reasonably fast (~5 mins max, usually less), but when SkyNet is making a decision every minute, that means you could launch too many instances if you don&#8217;t take recent actions into account to cover startup lag (and, conversely, you need to start instances a little earlier than you might actually need them otherwise you get behind).</p>
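<p>Here&#8217;s a rough Python sketch of a single once-a-minute decision that counts still-booting capacity, so a burst of launches isn&#8217;t repeated every tick while the first batch starts up.  It is not SkyNet&#8217;s actual logic: it looks at just three inputs, reads the 25% slack target as &#8220;run 1.25X the busy cores&#8221;, and assumes 8 cores per instance (a c1.xlarge):</p>
<pre class="codebox"><code>
import math

def decide(busy_cores, running_cores, booting_cores, headroom=0.25, cores_per_instance=8):
    """Return ('launch', n), ('terminate', n) or ('sleep', 0) for one controller tick."""
    target = math.ceil(busy_cores * (1 + headroom))
    # Capacity already requested counts, even though it is not serving jobs yet.
    committed = running_cores + booting_cores
    if target > committed:
        return ("launch", math.ceil((target - committed) / cores_per_instance))
    surplus = (running_cores - target) // cores_per_instance
    # Never scale down while launches are still in flight: demand was just rising.
    if surplus > 0 and booting_cores == 0:
        return ("terminate", surplus)
    return ("sleep", 0)
</code></pre>
<p>With 64 busy cores on 64 running cores, the first tick asks for 2 more instances; on the next tick those 16 booting cores are already counted, so the answer is &#8216;sleep&#8217; rather than another launch.</p>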
<p><strong>THE MONEY</strong></p>
<p>SmugMug is a profitable business, and we like to keep it that way.  The secrets to efficiently using EC2, at least in our use case, are as follows:</p>
<ul>
<li>Take advantage of the free S3 transfers.  This is a biggy.  Our workers get and put almost all of their bytes to/from S3.</li>
<li>Make sure you have scaling down working as well as scaling up.  At 3am on an average Wednesday morning, we have very few instances running.</li>
<li>Use the new High-CPU Instances.  Twice the CPU resources for the same $$ if you don&#8217;t need RAM.</li>
<li>Amazon kindly gives you 30 days to monetize your AWS expenses.  Use those 30 days wisely &#8211; generate revenues.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<p><strong>WHY NO WEB SERVERS?</strong></p>
<p>I get asked this question a lot, and it really comes down to two issues, one major and one minor:</p>
<ul>
<li>No complete DB solution.  <a href="http://www.amazon.com/SimpleDB-AWS-Service-Pricing/b/ref=sc_fe_l_2?ie=UTF8&#038;node=342335011&#038;no=3435361&#038;me=A36L942TSJ2AJA">SimpleDB</a> is interesting, and the new <a href="http://www.allthingsdistributed.com/2008/04/persistent_storage_for_amazon.html">EC2 Persistent Storage</a> is too, but neither provides a complete solution for us.  EC2 storage isn&#8217;t performant enough without some serious, painful partitioning to a finer grain than we do now &#8211; which comes with its own set of challenges, and SimpleDB both isn&#8217;t performant enough and doesn&#8217;t address all of our use cases.  Since latency to our DBs matters a great deal to our web servers, this is a deal-killer &#8211; I can&#8217;t have EC2 web servers talking to DBs in my datacenters. (There are a few corner cases we&#8217;re exploring where we probably can, but they&#8217;re the exception &#8211; not the rule).</li>
<li>No load balancing API.  They&#8217;ve got an IP address solution in the form of Elastic IPs, which is awesome and a major step forward, but they don&#8217;t have a simple Load Balancer API that I can throw my web boxes behind.  Yes, I realize I can manually do it using EC2 instances, but that&#8217;s more fragile and difficult (and has unknown scaling properties at our scale).  If the DB issue were solved, I&#8217;d probably dig into this and figure out how to do it ourselves &#8211; but since it&#8217;s not, I can keep asking for this in the meantime.</li>
</ul>
<p>Let me be very clear here:  <strong>I really don&#8217;t want to operate datacenters anymore</strong> despite the fact that we&#8217;re pretty good at it.  It&#8217;s a necessary evil because we&#8217;re an Internet company, but our mission is to be the best photo sharing site.  We&#8217;d rather spend our time giving our customers great service and writing great software than managing physical hardware.  I&#8217;d rather have my awesome Ops team interacting with software remotely for 100% of their duties (and mostly just watching software like SkyNet do its thing).  We&#8217;ll get there &#8211; I&#8217;m confident of that &#8211; we&#8217;re just not there yet.</p>
<p>Until then, we&#8217;ll stick with a hybrid approach.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/06/03/skynet-lives-aka-ec2-smugmug/feed/</wfw:commentRss>
		<slash:comments>53</slash:comments>
		</item>
		<item>
		<title>MySQL and the Linux swap problem</title>
		<link>http://blogs.smugmug.com/don/2008/05/01/mysql-and-the-linux-swap-problem/</link>
		<comments>http://blogs.smugmug.com/don/2008/05/01/mysql-and-the-linux-swap-problem/#comments</comments>
		<pubDate>Fri, 02 May 2008 01:25:05 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[Innodb]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[OOM]]></category>
		<category><![CDATA[percona]]></category>
		<category><![CDATA[RAM]]></category>
		<category><![CDATA[swap]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=217</guid>
		<description><![CDATA[Ever since Peter over at Percona wrote about MySQL and swap, I&#8217;ve been meaning to write this post.  But after I saw Dathan Pattishall&#8217;s post on the subject, I knew I&#8217;d better actually do it.   
There&#8217;s a nasty problem with Linux 2.6 even when you have a ton of RAM.  No [...]]]></description>
			<content:encoded><![CDATA[<p>Ever since Peter over at <a href="http://www.percona.com/">Percona</a> wrote about <a href="http://www.mysqlperformanceblog.com/2008/04/06/should-you-have-your-swap-file-enabled-while-running-mysql/">MySQL and swap</a>, I&#8217;ve been meaning to write this post.  But after I saw <a href="http://mysqldba.blogspot.com/2008/05/linux-64-bit-mysql-swap-and-memory.html">Dathan Pattishall&#8217;s post on the subject</a>, I knew I&#8217;d better actually do it.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>There&#8217;s a nasty problem with Linux 2.6 even when you have a ton of RAM.  No matter what you do, including setting /proc/sys/vm/swappiness = 0, your OS is going to prefer swapping stuff out rather than freeing up system cache.  On a single-use machine, where the application is better at utilizing RAM than the system is, this is incredibly stupid.  Our MySQL boxes are a perfect example &#8211; they run only MySQL and we want InnoDB to have a lot of RAM (32-64GB  &#8230; and we&#8217;re testing 128GB).</p>
<p>You can&#8217;t just not have any swap partitions, though, or kswapd will literally dominate one of your CPU cores doing who-knows-what.  But you can&#8217;t have it swapping to disk, or your performance goes into the toilet.  So what to do?</p>
<p><strong>Our solution is to make swap partitions out of RAM disks.</strong>  Yes, I realize how insane that sounds, but the Linux kernel&#8217;s insanity drove us to it.  Best part?  It works.  Here&#8217;s how:</p>
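<pre class="codebox"><code># Format the first kernel ramdisk (no reserved blocks) and mount it
mkdir /mnt/ram0
mkfs.ext3 -m 0 /dev/ram0
mount /dev/ram0 /mnt/ram0
# Fill a ~14MB file with zeros, leaving room for filesystem overhead
dd bs=1024 count=14634 if=/dev/zero of=/mnt/ram0/swapfile
# Turn the file into swap space and enable it
mkswap /mnt/ram0/swapfile
swapon /mnt/ram0/swapfile</code></pre>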
<p>That&#8217;ll give you a 14MB swap partition that&#8217;s actually in RAM, so it&#8217;s super-fast.  This assumes your kernel is creating 16MB ramdisk partitions, but you can adjust your kernel parameters and/or the &#8216;dd&#8217; line above to suit whatever size you want.</p>
<p>We&#8217;ve found that anywhere from 20MB-40MB tends to be enough (so use /dev/ram1, /dev/ram2, etc), depending on the load on the box.  kswapd no longer uses any noticeable CPU, there&#8217;s always a few MB of free &#8220;swap&#8221;, and life is back in the fast lane.  Just add those lines to your relevant startup file, like /etc/rc.d/rc.local, and it&#8217;ll persist after reboots.</p>
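<p>To repeat the recipe for /dev/ram1, /dev/ram2 and so on, something like the following could work.  It is a sketch under assumptions: the function name is made up, it only prints the commands rather than running them, and it reuses the 14634-block size from the example above, which assumes 16MB ramdisks.</p>

```shell
# Hypothetical helper: print (not run) the commands that set up one
# RAM-backed swap file, so they can be reviewed before use.
#   $1 = ramdisk index (0 for /dev/ram0, 1 for /dev/ram1, ...)
#   $2 = swap file size in 1KB blocks (defaults to the 14634 used above)
ramdisk_swap_commands() {
    idx=$1
    blocks=${2:-14634}
    dev="/dev/ram${idx}"
    mnt="/mnt/ram${idx}"
    cat <<EOF
mkdir -p ${mnt}
mkfs.ext3 -m 0 ${dev}
mount ${dev} ${mnt}
dd bs=1024 count=${blocks} if=/dev/zero of=${mnt}/swapfile
mkswap ${mnt}/swapfile
swapon ${mnt}/swapfile
EOF
}

# Example: print the commands for /dev/ram1 and /dev/ram2
# for i in 1 2; do ramdisk_swap_commands "$i"; done
```

<p>Printing first and running later (as root, for instance from rc.local) makes it harder to format the wrong device by accident.</p>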
<p>Some Linux purists will probably hate this approach, others may have more efficient ways of achieving the same thing, but this works for us.  Give it a shot.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Oh, and I hope it goes without saying, but make *darn* sure you know what you&#8217;re running on your box and what the maximum RAM footprint will be before you try running with only 20-40MB of swap.  We&#8217;ve never OOMed (Out-Of-Memory) a production MySQL box &#8211; but that&#8217;s because we&#8217;re careful.</p>
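<p>One way to keep an eye on how much of that tiny swap is left is to read the SwapFree line from /proc/meminfo.  This is a small sketch, not something from our tooling: the function name is hypothetical, and it simply converts the kB value to whole MB.</p>

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the free swap in whole megabytes (the kernel reports SwapFree in kB).
swap_free_mb() {
    awk '/^SwapFree:/ { printf "%d\n", $2 / 1024 }'
}

# Example: swap_free_mb < /proc/meminfo
```

<p>Feeding the number into whatever monitoring you already run gives some warning before a box with only 20-40MB of swap gets into trouble.</p>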
<p><strong>UPDATE:</strong> See what happens when I wait to blog?  I forget that I read another <a href="http://feedblog.org/2007/09/29/using-o_direct-on-linux-and-innodb-to-fix-swap-insanity/">related post over on Kevin Burton&#8217;s blog</a>.  Like Kevin, we&#8217;re using O_DIRECT, but unlike Kevin, this doesn&#8217;t solve the problem for us.  Linux still swaps.  We use the latest  2.6.18-53.1.14.el5 kernel from CentOS 5, btw. (Sorry, had posted 2.6.9 because I was dumb.  We&#8217;re fully patched)</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/05/01/mysql-and-the-linux-swap-problem/feed/</wfw:commentRss>
		<slash:comments>44</slash:comments>
		</item>
		<item>
		<title>New Amazon Features:  Status Dashboard &amp; Paid Service</title>
		<link>http://blogs.smugmug.com/don/2008/04/17/new-amazon-features-status-dashboard-paid-service/</link>
		<comments>http://blogs.smugmug.com/don/2008/04/17/new-amazon-features-status-dashboard-paid-service/#comments</comments>
		<pubDate>Thu, 17 Apr 2008 16:33:27 +0000</pubDate>
		<dc:creator>Don MacAskill</dc:creator>
				<category><![CDATA[amazon]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[block storage]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[web services]]></category>

		<guid isPermaLink="false">http://blogs.smugmug.com/don/?p=214</guid>
		<description><![CDATA[I realize I&#8217;m already way behind blogging about other new Amazon Web Services features like the recent EC2 release with static IPs, availability zones, and user kernels not to mention the new block storage service.  I&#8217;ll still try to get to them &#8211; but I didn&#8217;t want to wait for this one.
I&#8217;ve been pushing Amazon [...]]]></description>
			<content:encoded><![CDATA[<p>I realize I&#8217;m already way behind blogging about other new Amazon Web Services features like the recent <a href="http://aws.typepad.com/aws/2008/03/new-ec2-feature.html">EC2 release with static IPs, availability zones, and user kernels</a> not to mention the <a href="http://aws.typepad.com/aws/2008/04/block-to-the-fu.html">new block storage service</a>.  I&#8217;ll still try to get to them &#8211; but I didn&#8217;t want to wait for this one.</p>
<p>I&#8217;ve been pushing Amazon hard to do something like this, and I&#8217;m thrilled it&#8217;s finally out.  They have a great new <a href="http://status.aws.amazon.com/">service status dashboard</a> complete with historical data and a mechanism for communicating to us, their customers, about any issues they may be having.  Especially cool is that the data is provided via RSS, so you can programmatically poll the status and take steps as necessary.  Awesome!  Get <a href="http://aws.typepad.com/aws/2008/04/the-service-hea.html">all the details here</a>.</p>
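<p>Polling the RSS data could look roughly like this.  It is a sketch under assumptions: the function name is made up, the feed address is left as a placeholder variable because the post does not give one, and the parsing only handles plain title elements without nested markup or CDATA.</p>

```shell
# Hypothetical helper: read an RSS document on stdin and print the text
# of every <title> element, one per line (channel title first, then items).
rss_titles() {
    grep -o '<title>[^<]*</title>' | sed -e 's/<title>//' -e 's/<\/title>//'
}

# Example (STATUS_FEED_URL is a placeholder, not a real address):
# curl -s "$STATUS_FEED_URL" | rss_titles
```

<p>A cron job could run this every few minutes and alert when the newest item title changes.</p>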
<p>One possible gotcha is that it looks like the dashboard is hosted at Amazon.  We&#8217;ve run into outages (very rare) where all of <a href="http://www.amazon.com/">amazon.com</a> is down.  In those cases, it&#8217;d be nice to have an externally-hosted site where they could post updates.  Our customers asked us for this recently, so on January 29th, <a href="http://smugmug.wordpress.com/">we were happy to comply</a>.  Perhaps Amazon could post to <a href="http://aws.typepad.com/">their TypePad blog</a> in events like these, rare as they may be?</p>
<p>Next, they now offer <a href="http://aws.typepad.com/aws/2008/04/may-we-help-you.html">paid premium support</a>.  Need some sort of help that&#8217;s not provided on the <a href="http://aws.amazon.com/forums">AWS forums</a> or via searching Google?  No worries &#8211; whip out your credit card and pay for it.  Looks like they have two plans which should cover lots of use cases I&#8217;ve seen in my own comments and on the forums.</p>
<p>I&#8217;d still like to see a pay-per-incident model, personally, even with an extremely high price-tag for each incident.  We rarely use support for AWS, but at the same time, we&#8217;re very big customers of theirs, so the monthly price is quite high.  But if we really come up against a big problem, it&#8217;d be nice to know I could pay for support just that one time.  I imagine most of their customers will like their Silver and Gold monthly  packages, but for us, they&#8217;re just not quite the right fit.  Do they work for you? </p>
<p>I&#8217;m pretty thrilled about this release, but maybe our use case is different from yours.  Do you like these new features?  Are they missing things you&#8217;d like to see?</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.smugmug.com/don/2008/04/17/new-amazon-features-status-dashboard-paid-service/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
