<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Silent data corruption on AMD servers</title>
	<atom:link href="http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/</link>
	<description>Thought stream from SmugMug's CEO &#38; Chief Geek</description>
	<lastBuildDate>Tue, 24 Nov 2009 10:26:00 -0800</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9-rare</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Chris</title>
		<link>http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/comment-page-1/#comment-66186</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Fri, 07 Sep 2007 20:07:01 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/#comment-66186</guid>
		<description>Hey Don,
Just a heads up: it looks like that first Kernel.org link is broken.  I realize this is a month+ late, though :-P.

BTW, I just started reading random parts of this blog a few weeks ago, and I have to say thanks.  Your insight and experience sharing are priceless, and I wish more companies were as open as you are regarding hardware/scaling/experiences.  Keep it up.</description>
		<content:encoded><![CDATA[<p>Hey Don,<br />
Just a heads up: it looks like that first Kernel.org link is broken.  I realize this is a month+ late, though <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_razz.gif' alt=':-P' class='wp-smiley' /> .</p>
<p>BTW, I just started reading random parts of this blog a few weeks ago, and I have to say thanks.  Your insight and experience sharing are priceless, and I wish more companies were as open as you are regarding hardware/scaling/experiences.  Keep it up.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Somewhere out there! &#187; Blog Archive &#187; Silent data corruption on AMD servers with 4G+ RAM</title>
		<link>http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/comment-page-1/#comment-57551</link>
		<dc:creator>Somewhere out there! &#187; Blog Archive &#187; Silent data corruption on AMD servers with 4G+ RAM</dc:creator>
		<pubDate>Fri, 27 Jul 2007 23:05:24 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/#comment-57551</guid>
		<description>[...] From smugmug. [...]</description>
		<content:encoded><![CDATA[<p>[...] From smugmug. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Don MacAskill</title>
		<link>http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/comment-page-1/#comment-57404</link>
		<dc:creator>Don MacAskill</dc:creator>
		<pubDate>Fri, 27 Jul 2007 08:02:09 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/#comment-57404</guid>
		<description>@Jeff:

Thanks for the insight - that&#039;s what I was thinking, too.</description>
		<content:encoded><![CDATA[<p>@Jeff:</p>
<p>Thanks for the insight &#8211; that&#8217;s what I was thinking, too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Bonwick</title>
		<link>http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/comment-page-1/#comment-57398</link>
		<dc:creator>Jeff Bonwick</dc:creator>
		<pubDate>Fri, 27 Jul 2007 07:29:07 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/#comment-57398</guid>
		<description>Hey Don -- I don&#039;t know enough about the bug and the BIOS/Linux interaction to say for sure whether ZFS would save you here.  If, as the bug report implies, it&#039;s related to the iommu, that would suggest that the data is corrupted during DMA.  In which case, ZFS would detect it on the next read; and if you were running with mirrors or RAID-Z, ZFS would correct it as well.

Wout is right that if you get silent in-memory corruption *before* the data is written to disk, we currently have no way to detect that.  We&#039;ve considered adding an option to keep in-memory buffers checksummed and verify them before any modification and before any disk write.  It would be insanely expensive, of course, but could come in handy when trying to track down broken hardware.</description>
		<content:encoded><![CDATA[<p>Hey Don &#8212; I don&#8217;t know enough about the bug and the BIOS/Linux interaction to say for sure whether ZFS would save you here.  If, as the bug report implies, it&#8217;s related to the iommu, that would suggest that the data is corrupted during DMA.  In which case, ZFS would detect it on the next read; and if you were running with mirrors or RAID-Z, ZFS would correct it as well.</p>
<p>Wout is right that if you get silent in-memory corruption *before* the data is written to disk, we currently have no way to detect that.  We&#8217;ve considered adding an option to keep in-memory buffers checksummed and verify them before any modification and before any disk write.  It would be insanely expensive, of course, but could come in handy when trying to track down broken hardware.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Don MacAskill</title>
		<link>http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/comment-page-1/#comment-57305</link>
		<dc:creator>Don MacAskill</dc:creator>
		<pubDate>Thu, 26 Jul 2007 16:28:32 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/#comment-57305</guid>
		<description>@Matt:

No, we&#039;re running Linux on our Sun boxes.  CentOS5 to be exact.

But ZFS has us perpetually curious about Solaris.  :)</description>
		<content:encoded><![CDATA[<p>@Matt:</p>
<p>No, we&#8217;re running Linux on our Sun boxes.  CentOS5 to be exact.</p>
<p>But ZFS has us perpetually curious about Solaris.  <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Culbreth</title>
		<link>http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/comment-page-1/#comment-57291</link>
		<dc:creator>Matt Culbreth</dc:creator>
		<pubDate>Thu, 26 Jul 2007 15:31:44 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/#comment-57291</guid>
		<description>Don,

I&#039;m assuming you&#039;re running Solaris on your Sun boxes, not Linux?

P.S.--Great blog.  I&#039;m trying out a Sun box here soon mostly on your blog&#039;s recommendation.</description>
		<content:encoded><![CDATA[<p>Don,</p>
<p>I&#8217;m assuming you&#8217;re running Solaris on your Sun boxes, not Linux?</p>
<p>P.S.&#8211;Great blog.  I&#8217;m trying out a Sun box here soon mostly on your blog&#8217;s recommendation.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wout</title>
		<link>http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/comment-page-1/#comment-57258</link>
		<dc:creator>Wout</dc:creator>
		<pubDate>Thu, 26 Jul 2007 08:42:56 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.smugmug.com/don/2007/07/25/silent-data-corruption-on-amd-servers/#comment-57258</guid>
		<description>Well, from the bugs it seems that data read from the disk is not what was stored. So the system memory is not affected?

ZFS adds the checksum before the write. As soon as you hand off a block of data to the ZFS subsystem, it&#039;s protected. Anything that happens at a lower level, like disk errors, wire problems, controller issues, etc., will be detected.

If the data you give to ZFS is wrong though, there&#039;s nothing that can help you :-) ZFS will simply hand you back the same wrong data when you read it.

So if this bug silently overwrites main memory, ZFS might not be in a position to help.</description>
		<content:encoded><![CDATA[<p>Well, from the bugs it seems that data read from the disk is not what was stored. So the system memory is not affected?</p>
<p>ZFS adds the checksum before the write. As soon as you hand off a block of data to the ZFS subsystem, it&#8217;s protected. Anything that happens at a lower level, like disk errors, wire problems, controller issues, etc., will be detected.</p>
<p>If the data you give to ZFS is wrong though, there&#8217;s nothing that can help you <img src='http://blogs.smugmug.com/don/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ZFS will simply hand you back the same wrong data when you read it.</p>
<p>So if this bug silently overwrites main memory, ZFS might not be in a position to help.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
