Amazon S3 Issues: Load Balancers and MD5
Amazon’s S3 storage system suffered data corruption last week, which surfaced through the MD5 integrity checks some developers run on their files. After some investigation, Amazon confirmed the problem and identified the cause:
We’ve isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20. It was taken out of service at 11am PDT Sunday, 6/22. While it was in service it handled a small fraction of Amazon S3’s total requests in the US. Intermittently, under load, it was corrupting single bytes in the byte stream. … Based on our investigation with both internal and external customers, the small amount of traffic received by this particular load balancer, and the intermittent nature of the above issue on this one load balancer, this appears to have impacted a very small portion of PUTs during this time frame.
Thanks for the link.
One thing to clarify: it was the developers using MD5 who identified the data corruption. Put another way, developers who did not code MD5 checks would not have known that their transfers were being silently corrupted by the load balancer.
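To make the point concrete, here is a minimal sketch of the kind of check those developers were relying on. It makes no network calls: the payload and the helper names (content_md5, flip_one_byte, arrived_intact) are made up for illustration, and the byte flip simply stands in for the load balancer's single-byte corruption. The checksum is base64-encoded because that is the format the standard Content-MD5 HTTP header uses; as far as I know, S3 can verify that header on a PUT, but treat that as an assumption rather than something stated in Amazon's note.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the base64-encoded MD5 digest (the Content-MD5 header format)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def flip_one_byte(data: bytes, index: int) -> bytes:
    """Hypothetical helper: simulate a single corrupted byte in the stream."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0xFF  # invert every bit of one byte
    return bytes(corrupted)


def arrived_intact(received: bytes, expected_md5: str) -> bool:
    """Compare the checksum of what arrived with the one computed before sending."""
    return content_md5(received) == expected_md5


# Illustrative payload; any bytes would do.
payload = b"example object body" * 1000
checksum = content_md5(payload)       # computed by the sender before the PUT
damaged = flip_one_byte(payload, 1234)  # what a faulty hop might deliver

print(arrived_intact(payload, checksum))  # clean transfer
print(arrived_intact(damaged, checksum))  # corrupted transfer
```

The damaged copy is exactly the same length as the original, so a size comparison alone would not notice anything wrong; only the checksum comparison does.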