Archive for the ‘amazon’ Category

Amazon S3 SLA is here! (Nirvanix dies?)

Monday, October 8th, 2007

Amazon has finally released and put into effect their SLA for S3. I know a lot of my readers will be thrilled about this. :)

I’ve gotten a few questions about Nirvanix in the past month or so, especially about the fact that they offer an SLA (and that S3 didn’t). I think this probably puts the final nail in Nirvanix’ coffin because:

  • Why would you trust Nirvanix, a no-name company, with your precious data?
  • Worse, they’re affiliated with MediaMax/Streamload in some way, who have a reputation for poor service. (I’ve even seen reports of data loss at Streamload, though I haven’t bothered to check.)
  • Just how much is an SLA worth when there’s nothing behind it to back it up?
  • They’re more expensive than Amazon. Um, duh.

SLAs don’t mean a lot to us, anyway, as I’ve said before because:

  • Everything fails sometimes.
  • The SLA payment is rarely comparable to the pain and suffering your customers had to deal with.

But I know it’s very important to lots of people, so I expect there’s cheering and dancing in the streets. :)

UPDATE: I get SLAs now. Sorry for being dumb.

Speaking at ‘The Startup Project’ Wednesday

Tuesday, September 11th, 2007

I should have posted this a while ago. I suck. I’m sorry.

Anyway, I’m speaking at The Startup Project, an Amazon and Kleiner-Perkins event, tomorrow in Silicon Valley. I’ll be talking a little bit about S3, EC2, and FPS, the three announced Amazon Web Services we’re most excited about.

There’ll be a Q&A, and I’m happy to stick around after and answer questions about AWS or anything else under the sun, too, if you have any.

See ya there! RSVPs are required, I believe.

(On a related note, I blew it this year and spoke at and attended too many events. In 2008, I’ll be going to far fewer conferences and will be very selective of the ones I speak at, so if you think I’d be a good fit with your event, ask earlier rather than later please)

Amazon Flexible Payment Service (FPS)

Saturday, August 4th, 2007

To answer the questions, yes, we’re definitely going to be using FPS in a big way (millions of dollars per year) shortly. We aren’t, though, going to be using the part that all the press are talking about – the so-called ‘PayPal killer’. We don’t talk about un-released features at SmugMug, so I’m afraid I have to leave it at that – but feel free to speculate. :)

On a personal note, I’m really excited about FPS because, like many, I hate PayPal. When we were getting SmugMug off the ground, I was interested in using PayPal either as our main payment option, or at least as an alternative. Their developer support was terrible, though, and the ability to do big batches was apparently nonexistent. I even knew people over there, and they’d just shrug with a ‘what can you do?’ look on their faces when I’d ask them if we could use their stuff.

Definitely not Amazon’s approach. :)

UPDATE: Apparently I was too abstract in my initial post about how we’d be using it, so here’s a quick clarification. We’re not going to use FPS to enable you to signup for SmugMug service or buy prints & gifts using FPS. We have something else in mind. :)

Why not use FPS (or PayPal, for that matter) for signup & purchase, you might ask. Our answer is that we’re not totally comfortable passing customers along to a UI that we don’t control and that isn’t branded at such a crucial point in our monetization process. The establishment of brand, and even more specifically, trust in that brand, is extremely important to us. These are people’s priceless photos, after all, and we want to be clear on who’s taking care of them. It’s entirely possible we’re shooting ourselves in the foot with this stance, but that’s our prerogative.

Amazon S3: New pricing model

Tuesday, May 1st, 2007

I’m getting emails about Amazon’s new S3 pricing model, so I guess the news is out. :)

For us, this is great. First, we’ll save money right off the top (we upload a lot, so $0.10/GB uploaded vs $0.20/GB uploaded is a big deal). Second, they finally have tiered download transfer costs. This is a big one for us, because we buy enough bandwidth that $0.20/GB wasn’t cost-effective for us.

I’m going to have to run some numbers (I’m at MIX right now) to see if it’s now good enough for us to start serving more content out of S3 or not, but even if it’s still not perfect for us, it’s a major move in the right direction.

Finally, this illustrates a subtle but important point of using S3. When I buy physical disks at SmugMug, those are sunk costs. They’ll never get cheaper because I’ve already paid for them. At Amazon, though, market forces and changes will cause their pricing model to continue to re-adjust downwards. As disks get cheaper, that $0.15/GB/month fee will drop. And instantly all of your storage magically gets cheaper, no sunk costs to worry about.
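To make the "no sunk costs" point concrete, here’s a tiny sketch of what a price drop does to data that’s already stored. Only the $0.15/GB/month rate comes from this post; the 100TB volume and the lower rate are made-up numbers for illustration, and prices are kept in whole cents to avoid floating-point rounding.

```python
def monthly_bill_cents(stored_gb: int, cents_per_gb_month: int) -> int:
    """Monthly storage bill in cents for data that is already stored."""
    return stored_gb * cents_per_gb_month


STORED_GB = 100_000          # hypothetical: roughly 100 TB already uploaded
CURRENT_RATE = 15            # $0.15/GB/month, the rate quoted in the post
HYPOTHETICAL_NEW_RATE = 12   # made-up lower rate, for illustration only

before = monthly_bill_cents(STORED_GB, CURRENT_RATE)
after = monthly_bill_cents(STORED_GB, HYPOTHETICAL_NEW_RATE)

# The same bytes cost less the month the price changes; disks you already
# bought never get this kind of retroactive discount.
print(f"before: ${before / 100:,.2f}/month")
print(f"after:  ${after / 100:,.2f}/month")
```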

That happened today, and I’m sure it’ll happen over and over again as storage & bandwidth both get cheaper and Amazon is able to leverage their scale to get better deals. The more people use S3, the more Amazon can drive prices down.

Since we were already saving a ton of money using S3, this is music to my ears. :)

ETech 2007 SmugMug Amazon Slides are Up!

Friday, March 30th, 2007

My slides from ETech 2007 about Amazon’s Web Services, especially S3, are up in PDF form.

Holler if something isn’t clear, but hopefully this’ll give anyone who couldn’t make it some good insight into what works and what doesn’t with S3 here at SmugMug.

Enjoy!

ETech Amazon S3 slides are coming

Friday, March 30th, 2007

I think they’ll be up later today, I’m just trying to put some of my speaking notes into them, too, so you’re not left wondering what each bullet point means.

So subscribe (see the right sidebar) or come back later.

Sorry they’re not up yet! :)

UPDATE: They’re up!

Amazon S3: The “speed of light” problem

Thursday, March 8th, 2007

I was interviewed yesterday by Beth Pariseau for an article about Amazon’s S3 at SearchStorage.com. All in all, I think it’s a good article that covers some of Amazon’s strengths and weaknesses, but I’d like to clarify some of my quotes in the article.

I’m quoted as having no read speed issues, but having write speed problems. As is common in articles like this, that’s boiling down a long conversation, and much is lost in translation. :) In reality, Amazon has been blazingly fast for us (both reads and writes), relatively speaking, except for the few times they’ve had problems, which I’ve blogged about before. That particular quote, especially about it being less than a 10th of a second, was my attempt to explain the “speed of light” problem, which applies to both reads and writes. Even mighty Amazon hasn’t yet figured out how to transfer data at faster-than-light speeds. :)

Basically, we’re in California and Amazon isn’t. This means that when we initiate a read or a write to S3, we’re sending bytes to them and they have to cover, at minimum, the physical distance to Amazon’s datacenters (wherever they are) before anything can be done. Assuming that one of their datacenters is on the East Coast, and assuming we have to read or write from that one occasionally, we’re talking 60-80ms of time just to get bits there and back. No-one on Planet Earth can get around this problem, so it bears consideration when you’re planning for S3 usage.
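If you want to see where a number like that comes from, here’s a back-of-the-envelope sketch. The ~4,000 km coast-to-coast distance and the two-thirds-of-c speed for light in fiber are my assumptions for illustration, not measurements of Amazon’s network.

```python
# Best-case round-trip time imposed by distance alone.
# Assumptions (not from any Amazon documentation): light in optical fiber
# travels at roughly two-thirds of its speed in a vacuum, and the fiber
# path between the coasts is about 4,000 km.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_VELOCITY_FACTOR = 2 / 3  # assumed typical value for glass fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Return the lower bound on round-trip time, in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_PER_S * FIBER_VELOCITY_FACTOR
    one_way_seconds = distance_km / speed_in_fiber
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    print(f"{min_round_trip_ms(4_000):.1f} ms")
```

Under those assumptions the floor is roughly 40ms. Real routes aren’t straight lines and every router and switch along the way adds a little, which is how you end up in the 60-80ms range in practice.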

Obviously, our data in our own datacenters suffers from this problem too – only it’s inches, instead of thousands of miles, to our servers, so it’s almost negligible. But we do have clients all over the world, so the problem is still very real. Our friends Down Under, for example, have to wait much longer for their photos to start drawing than our friends at the Googleplex down the street. If we really wanted to solve that problem, we’d have to build or use a CDN (Content Distribution Network). So far, we haven’t wanted to.

Beth mentions how Bob Ippolito at Mochi Media got better performance in Taipei with CacheFly than with Amazon S3. To me, this seems sorta obvious. First, to my knowledge, S3 doesn’t have a datacenter in Asia at all, and second, they’re not a CDN. Let me say that again – they’re not a CDN. Amazon has their issues they need to overcome with S3, but dinging them for lower performance than a CDN is sorta silly. S3 doesn’t provide web search faster than Google either. See my point?

I’m sure Amazon has thought (or is thinking?) about extending S3 to offer CDN services, but I believe the way Amazon builds these things, it’d probably be a separate service that could be layered on top of S3. They’re into offering building blocks which you can mix & match, not complicated services that do too much. (To any would-be Amazon Web Services competitors reading this, the building block approach is the Right Way to do this.)

Beth’s article is right on the money with regards to data transfer costs, though. S3 currently has two sweet spots: small companies who can’t buy large bandwidth, and companies who need a lot of storage but not a lot of transfers. There are, of course, companies which need a lot of transfers but not much storage (CDNs are probably appropriate here), and companies which need a lot of transfers AND a lot of storage. SmugMug potentially falls into this latter category, but you can imagine someone like YouTube falling into it even more than we do. How they solve the different requirements of different companies will be interesting to watch.

Let me reiterate in case it’s not abundantly clear: I love S3. It’s saved us tons of money. I’m a normal, paying customer – not an Amazon shill. It has problems and growing pains, just like every single other online site or service you can name. It may not be right for you – but it’s certainly right for a ton of us.

I address the “speed of light” issue (and some ways of minimizing it) and the whole “sweet spot” pricing issue in my ETech talk (which I’m still working on). If there’s anything specific you’d like to see, be sure to let me know – I’ll be posting the slides here.

Amazon S3: What would you like to know?

Friday, February 2nd, 2007

As I mentioned in my article about performance issues with S3, I’m speaking on the subject at ETech this year. I’m planning on spending roughly half the time on the business ramifications and half on technical architecture. And I’ll be posting the slides or a PDF or something here after the presentation.

But I’d love some feedback about what you would like me to talk about so you can get the most out of my presentation and/or the information I put up here.

Leave a comment telling me what you’re most interested in about S3 and our implementation and I’ll re-prioritize based on your feedback.

Thanks!

UPDATE: Slides from ETech 2007 are up.

Amazon S3: Outages, slowdowns, and problems

Tuesday, January 30th, 2007

First of all, I’m giving a session on Amazon web services (with S3 being the main focus, with a little EC2 and other service love thrown in) at ETech this year. I’ll post a PDF or something of my slides here when I’m done, but if you’re really interested in this stuff, you might want to stop by. Wear some SmugMug gear and I’ll comp your account. :)

UPDATE: I’ve posted a call for topics you’re interested in hearing at ETech or in the resulting PDF. Let me know.

So there’s been some noise this month about S3 problems, and I’ve been getting requests about what we do when Amazon has problems and why our site is able to stay up when they do. I’m happy to answer as best I can, and I’d like to remind everyone that I’m not paid by Amazon – it’s the other way around. I pay them a lot of money, so I expect good service. :) That being said, I think they’re getting too much heat, and I’ll explain why.

First, let’s define the issues. During our history with Amazon S3 (since April of 2006), we’ve experienced four large problems. The first two were catastrophic outages – they involved core network switch failures and caused everything to die for 15-30 minutes. And by everything, I mean Amazon.com itself was offline, at least from my network view. (Due to DNS caching issues, even GSLB’d sites can look “down” to part of the world while remaining “up” to other parts. I don’t know if this was the case during these two times.) We’ve had core network switch failures here at SmugMug, too, and they’re almost impossible to prevent.

The other two were performance-related. Not outages, because the service still functioned, but massively slower than we were used to. In the first case, which happened right as the BusinessWeek cover article hit newsstands and during the Web 2.0 Summit, our customers were at our gates with pitchforks and torches. Our paying customers were affected and they could tell there was something wrong. Not good.

The second time, though, was in early January, and our customers had no idea. I emailed the S3 team to let them know we were seeing issues, flipped a switch in our software, and we were fine.

So what was the difference? We’ve been playing with using Amazon in a variety of different roles and scenarios at SmugMug. At first, we were just using them as a backup copy. That provided some great initial savings and a great deal of customer satisfaction as our customers became aware that their photos were safer than ever. As time went on and we grew more confident in Amazon’s ability to scale and keep their systems reliable, though, we moved Amazon into a more fundamental role at SmugMug and experimented with using them as primary storage. The week we started to experiment with that coincided with the first of the two performance issues, which shone a bright, glaring light on the downsides of using them in this way. We quickly shifted gears and are now quite happy with our current architecture, both from a cost view and a reliability view.

So what are we doing differently? Simple. Amazon serves as “cold storage” where everyone’s valuable photos go to live in safety. Our own storage clusters are now “hot storage” for photos that need to be served up fast and furious to the millions of unique visitors we get every day. That’s a bit of an oversimplification of our architecture, as you can imagine, but it’s mostly accurate. The end result is that performance problems with S3 are mostly buffered and offset by our local storage, and even outages are mostly handled properly, with a resync after the outage passes. For the curious, this architecture reduces our actual physical disk usage in our own datacenters by roughly 95%.
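Here’s a minimal sketch of that hot/cold idea, to show the shape of it rather than our actual code. Every name in it (`TieredStore`, `InMemoryColdStore`, `ColdStoreUnavailable`) is hypothetical, the cold store is an in-memory stand-in rather than a real S3 client, and it leaves out eviction from hot storage entirely.

```python
class ColdStoreUnavailable(Exception):
    """Raised when the remote (cold) store is down or too slow to use."""


class InMemoryColdStore:
    """Stand-in for a remote store such as S3; `available` simulates outages."""

    def __init__(self):
        self.objects = {}
        self.available = True

    def put(self, key, data):
        if not self.available:
            raise ColdStoreUnavailable(key)
        self.objects[key] = data

    def get(self, key):
        if not self.available:
            raise ColdStoreUnavailable(key)
        return self.objects[key]


class TieredStore:
    """Serve from local hot storage; keep the durable copy in cold storage."""

    def __init__(self, cold):
        self.cold = cold
        self.hot = {}          # stand-in for local storage clusters
        self.pending = set()   # keys written locally but not yet in cold storage

    def put(self, key, data):
        self.hot[key] = data
        try:
            self.cold.put(key, data)
        except ColdStoreUnavailable:
            self.pending.add(key)      # remember it and copy it up later
        else:
            self.pending.discard(key)

    def get(self, key):
        if key in self.hot:            # the fast path never touches the cold store
            return self.hot[key]
        data = self.cold.get(key)      # miss: fetch once, then keep it hot
        self.hot[key] = data
        return data

    def resync(self):
        """Retry pending uploads; return True once nothing is left to copy."""
        for key in list(self.pending):
            try:
                self.cold.put(key, self.hot[key])
            except ColdStoreUnavailable:
                return False
            self.pending.discard(key)
        return True
```

The point of the design is that a slow or unreachable cold store only affects cache misses and background copying, not the reads customers actually see.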

Further, we also have the ability to target specific Amazon S3 clusters. In January, we noticed that their West Coast cluster seemed to be performing more slowly than their East Coast cluster, even though we’re on the West Coast, so we toggled our primary endpoint to use the East Coast for a while. This is the switch I mentioned earlier that I flipped, and it worked out beautifully.
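I flipped that switch by hand, but the decision itself is simple enough to sketch. This is a hypothetical illustration, not our production code: the endpoint names, the latency samples, and the "twice as slow" threshold are all assumptions.

```python
from statistics import median


def pick_endpoint(samples_ms, preferred, slowdown_factor=2.0):
    """Choose an endpoint from recent latency samples.

    samples_ms maps an endpoint name to a list of recent latencies (ms).
    We stay on the preferred endpoint unless its median latency is more
    than `slowdown_factor` times the fastest endpoint's median.
    """
    medians = {name: median(values) for name, values in samples_ms.items() if values}
    if not medians:
        raise ValueError("no latency samples for any endpoint")

    fastest = min(medians, key=medians.get)
    if preferred not in medians:
        return fastest
    if medians[preferred] > slowdown_factor * medians[fastest]:
        return fastest
    return preferred


if __name__ == "__main__":
    # Hypothetical samples: the nearby cluster is having a bad day.
    recent = {"west": [400, 450, 500], "east": [90, 100, 110]}
    print(pick_endpoint(recent, preferred="west"))
```

The threshold keeps you from flapping between endpoints over small differences; you only move when the preferred cluster is clearly misbehaving.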

Now, though, I think we come to the real meat of the problem. Are we upset about Amazon’s issues? Do we regret using them? Are we looking elsewhere? Absolutely not, and here’s why:

I can’t think of a particular vendor or service we use that doesn’t have outages, problems, or crashes. From disk arrays to networking gear, everything has bad days. Further, I can’t think of a web site that doesn’t, either. It doesn’t matter if you’re GMail or eBay, you have outages and performance problems from time to time. I knew going into this that Amazon would have problems, and I built our software and our own internal architecture to accommodate occasional issues. This is the key to building good internet infrastructures anyway. Assume every piece of your architecture, including Amazon S3, will fail at some point. What will you do? What will your software do?

Amazon does need to get better about communicating with their customers. They need a page which shows the health of their systems, proactive notification of major issues, a 24/7 contact method, etc. I’m on their Developer Advisory Council, and believe me, they know about these issues. I’m sure they’re working on them.

To put things into perspective, we have vendors which we pay hundreds of thousands of dollars to each year that seem to be incapable of providing us with decent support. Amazon is not unique in terms of providing a great product but average support. If you ask nearly anyone in IT, I think you’ll find that’s far more common in our industry than it should be and not unique to Amazon in particular.

Finally, S3 is a new service and yet remarkably reliable. Since April 2006, they’ve been more reliable than our own internal systems, which I consider to be quite reliable. Nothing’s perfect, but they’re doing quite well so far for a brand-new service. Oh, and their services have also saved our butts a few times. I’ll try to write those up in the near future, too.

Other Amazon articles:

See you at ETech!

Amazon S3: Show me the money

Friday, November 10th, 2006

UPDATE 4/30/07: This post was written in November 2006, so these numbers are a little out of date. It’s now been 12 months and we’ve saved almost exactly $1M. You can see the most recent numbers, as of April 2007, in my ETech slides.

I still have some more Web 2.0 Summit stuff to write up if I get a few minutes today, but let me talk about Amazon’s S3 for a minute. At the conference, I was chatting with Michael Arrington of TechCrunch fame (who perfectly handled a blogosphere mini-explosion last week, I thought) and we got to talking about S3. He was impressed with how we were using it, but joked that our $500K saved number sounded like “complete bullsh*t”. I laughed along with him and assured him it was true, but on the way home I got to thinking that it IS a really big number to throw out there without details.

So here are the cold hard facts:

  • Our estimate, as you can see in BusinessWeek’s cover story, is that we’re saving $500K per year. We’ve been using S3 for almost 7 months so far (we launched it on or around April 14th), so for my $500K estimate to be in the right ballpark, we should be somewhere near $291K saved to date (well, we don’t grow linearly, so less than that … but let’s do easy math, shall we?).
  • We had roughly 64,000,000 photos when we launched S3. We now have close to 110,000,000 photos. Yes, that’s ~72% growth in 7 months.
  • To sustain our pre-S3 growth, we were buying roughly $40,000 per month in hard disks plus servers to attach them to. We’re not talking about EMC or other over-priced storage solutions. We’re talking about single processor commodity Pentium 4 servers attached to really cheap Apple Xserve RAID arrays. Not quite off-the-shelf IDE disks, but once you factor in the reliability and manageability, the TCO comes out to be in a similar ballpark (we’ve done it both ways).
    • If you’re doing the math at home, $40K may seem a little high until you realize how our architecture works: We use RAID-5, with hot spares, and we have two entirely separate storage clusters. That means we have to buy 1.4TB of raw disk to store an actual 500GB.
  • To sustain our current, Nov 2006 growth rate, we’d need to buy more like ~$80K per month. Let’s assume over the 7 months, it ramped from $40K to $80K linearly (it was actually more of a curve, but this makes the math easier). $40K + $46K + $53K + $60K + $66K + $73K + $80K = $418K
  • Our datacenter space, power and cooling costs for those arrays are ~$1.36/month for every $100 of storage. (~$544/month @ $40K, ramping to ~$1088/month @ $80K). $544 + $626 + $721 + $816 + $898 + $993 + $1088 = $5,686.
  • It’s cost us some manpower to move everything up to S3. So while I expect to save money on manpower in the long run, currently it’s probably break even – I don’t have to install, manage and maintain new hardware, but I’ve had to copy more than 100TB up to Amazon. (We’re not done copying old data up yet, either)
  • Total amount NOT spent over the last 7 months: $423,686
  • Total amount spent on S3: $84,255.25
  • Total savings: $339,430.75
  • That works out to $48,490 / month, which is $581,881 per year. Remember, though, our rate of growth is high, so over the remaining 5 months, the monthly savings will be even greater.
  • These are real, hard numbers after using S3 for 7 months, not our projections. They closely match (and are actually slightly better than) our projections.

So there you have it.
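If you’d rather check the arithmetic than take my word for it, here’s the same math as a short script. The monthly figures are the ones from the bullets above. The one thing I’m assuming for illustration is the RAID layout in `raw_tb_needed`: 7-drive sets with 5 data drives, 1 parity drive and 1 hot spare, which is one layout that reproduces the 1.4TB figure.

```python
# Figures from the bullets above (US dollars).
HARDWARE_PER_MONTH = [40_000, 46_000, 53_000, 60_000, 66_000, 73_000, 80_000]
FACILITIES_PER_MONTH = [544, 626, 721, 816, 898, 993, 1088]
S3_SPEND = 84_255.25
MONTHS = 7


def raw_tb_needed(usable_tb, clusters=2, drives_per_set=7, data_drives=5):
    """Raw disk to buy for a given amount of usable storage.

    Assumed layout (illustrative): RAID-5 sets of 7 drives, of which 5 hold
    data, 1 holds parity and 1 is a hot spare, duplicated across 2 clusters.
    """
    return usable_tb * clusters * drives_per_set / data_drives


def savings_summary():
    """Recompute the totals from the monthly figures."""
    not_spent = sum(HARDWARE_PER_MONTH) + sum(FACILITIES_PER_MONTH)
    saved = not_spent - S3_SPEND
    per_month = saved / MONTHS
    return {
        "not_spent": not_spent,
        "saved": saved,
        "per_month": per_month,
        "per_year": per_month * 12,
    }


if __name__ == "__main__":
    print(f"raw TB for 0.5 TB usable: {raw_tb_needed(0.5):.1f}")
    for name, value in savings_summary().items():
        print(f"{name}: ${value:,.2f}")
```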

But wait! It gets even better! Because of the stupid way the tax law operates in this country, I would actually have to pay taxes on the $423K I spent buying drives (yes, exactly like the money I spent was actually profit. Dumb.). So I’d have to pay an additional ~$135K in taxes. Technically, I’d get that back over the next 5 years, so I didn’t want to include it as “savings” but as you can imagine, the cash flow implications are huge. In a very real sense, the actual cash I conserved so far is about $474,000.

But wait! It gets even better! Amazon has been so reliable over the last 7 months (considerably more reliable than our own internal storage, which I consider to be quite reliable), that just last week we made S3 an even more fundamental part of our storage architecture. I’ll save the details for a future post, but the bottom line is that we’re actually going to start selling up to 90% of our hard drives on eBay or something. So costs I had previously assumed were sunk are actually about to be recouped. We should get many hundreds of thousands of dollars back in cash.

I expect our savings from Amazon S3 to be well over $1M in 2007, maybe as high as $2M.

Perhaps most important, though, is the difficult-to-quantify time, effort, and mental thought we’re saving. We get to spend both that money and all of our extra time and effort on providing a better customer experience and delivering better customer service. Storage was a necessary evil that’s now been nearly removed as a concern.

Want more? I have some other posts on the subject:

And I’ll continue to post with more hard details, including our technical architecture and some of our code, as well. And yes, we’re starting to consume other Amazon services like EC2.