<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Brief Power Outage for Amazon Data Center</title>
	<atom:link href="http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/</link>
	<description>News and analysis about data centers, cloud computing, managed hosting and disaster recovery</description>
	<lastBuildDate>Mon, 13 Feb 2012 17:24:17 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<item>
		<title>By: vinod tomar</title>
		<link>http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/comment-page-1/#comment-11444</link>
		<dc:creator>vinod tomar</dc:creator>
		<pubDate>Mon, 01 Mar 2010 09:42:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.datacenterknowledge.com/?p=19526#comment-11444</guid>
		<description>Hi,
In the above case, how should we operate the parallel redundant system so that a cascade failure does not occur when one of the paths fails?
Thanks</description>
		<content:encoded><![CDATA[<p>Hi,<br />
In the above case, how should we operate the parallel redundant system so that a cascade failure does not occur when one of the paths fails?<br />
Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Reported cloud outages for Amazon, Google, Microsoft and Salesforce.com in 2008 and 2009 &#171; Muon Cloud</title>
		<link>http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/comment-page-1/#comment-10439</link>
		<dc:creator>Reported cloud outages for Amazon, Google, Microsoft and Salesforce.com in 2008 and 2009 &#171; Muon Cloud</dc:creator>
		<pubDate>Sun, 31 Jan 2010 15:50:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.datacenterknowledge.com/?p=19526#comment-10439</guid>
		<description>[...] datacenterknowledge [...]</description>
		<content:encoded><![CDATA[<p>[...] datacenterknowledge [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: LogicMonitor Blog &#187; Active/Active or N+1?</title>
		<link>http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/comment-page-1/#comment-9390</link>
		<dc:creator>LogicMonitor Blog &#187; Active/Active or N+1?</dc:creator>
		<pubDate>Mon, 21 Dec 2009 22:52:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.datacenterknowledge.com/?p=19526#comment-9390</guid>
		<description>[...] The risk with Active/Active is that load does not scale linearly.  If you have two systems running at 40% load, that does not mean that one will be able to handle the load of both, and run at 80%.  More likely you will run into an inflection point, where you will run into an unanticipated bottleneck &#8211; be it CPU, memory bandwidth, disk IO, or some system that is providing external API resources. It can even be the power system. If servers have redundant power supplies, and each PSU is attached to separate Power Distribution Units (PDUs), the critical load for each PDU is now 40% of the rating.  If one circuit fails, all load switches to the other PDU &#8211; and if that PDU is now asked to carry more than 80% of its rating, overload circuits will trip, leading to a total outage.  There is some speculation that a cascading failure of this type was behind the recent Amazon EC2 outage. [...]</description>
		<content:encoded><![CDATA[<p>[...] The risk with Active/Active is that load does not scale linearly.  If you have two systems running at 40% load, that does not mean that one will be able to handle the load of both, and run at 80%.  More likely you will run into an inflection point, where you will run into an unanticipated bottleneck &#8211; be it CPU, memory bandwidth, disk IO, or some system that is providing external API resources. It can even be the power system. If servers have redundant power supplies, and each PSU is attached to separate Power Distribution Units (PDUs), the critical load for each PDU is now 40% of the rating.  If one circuit fails, all load switches to the other PDU &#8211; and if that PDU is now asked to carry more than 80% of its rating, overload circuits will trip, leading to a total outage.  There is some speculation that a cascading failure of this type was behind the recent Amazon EC2 outage. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Julius Neudorfer</title>
		<link>http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/comment-page-1/#comment-9062</link>
		<dc:creator>Julius Neudorfer</dc:creator>
		<pubDate>Fri, 11 Dec 2009 20:12:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.datacenterknowledge.com/?p=19526#comment-9062</guid>
		<description>The underlying question is whether the &quot;redundant PDU&quot; actually failed because of an unknown defect, or whether this was simply a classic &quot;Cascade Failure&quot;.

This happens when the total load, which is usually split 50%-50% across the 2 paths, exceeds the maximum rating of a single power path, e.g. 120 kVA split across 2 x 100 kVA paths (each side sees only a 60% load [60 kVA], so it &quot;seems&quot; OK). When a path fails (for any reason), the entire load shifts to the remaining path, which overloads and drops offline.

Hence the classic Cascade Failure.

See 
http://www.eweek.com/c/a/IT-Infrastructure/How-to-Avoid-a-Redundant-Path-to-Power-Failure/</description>
		<content:encoded><![CDATA[<p>The underlying question is whether the &#8220;redundant PDU&#8221; actually failed because of an unknown defect, or whether this was simply a classic &#8220;Cascade Failure&#8221;.</p>
<p>This happens when the total load, which is usually split 50%-50% across the 2 paths, exceeds the maximum rating of a single power path, e.g. 120 kVA split across 2 x 100 kVA paths (each side sees only a 60% load [60 kVA], so it &#8220;seems&#8221; OK). When a path fails (for any reason), the entire load shifts to the remaining path, which overloads and drops offline.</p>
<p>Hence the classic Cascade Failure.</p>
<p>See<br />
<a href="http://www.eweek.com/c/a/IT-Infrastructure/How-to-Avoid-a-Redundant-Path-to-Power-Failure/" rel="nofollow">http://www.eweek.com/c/a/IT-Infrastructure/How-to-Avoid-a-Redundant-Path-to-Power-Failure/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff</title>
		<link>http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/comment-page-1/#comment-9050</link>
		<dc:creator>Jeff</dc:creator>
		<pubDate>Fri, 11 Dec 2009 14:56:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.datacenterknowledge.com/?p=19526#comment-9050</guid>
		<description>Apparently the future isn&#039;t here just yet.

Isn&#039;t the point of &quot;cloud computing&quot; to allow redundancy at the server level so that any component failure can be corrected seamlessly (or at least in near-realtime) because some other part of the cloud is still operating?  It seems to me that if EC2 isn&#039;t structured like this, then it is nothing more than managed hosting, the same old thing that&#039;s been popular for over a decade.

Wake me up when &quot;the cloud&quot; is an ether of compute resources that can seamlessly tolerate a loss of any number of nodes up to the capacity threshold.  Until then, &quot;what&#039;s old is new again&quot; is in full effect at EC2.</description>
		<content:encoded><![CDATA[<p>Apparently the future isn&#8217;t here just yet.</p>
<p>Isn&#8217;t the point of &#8220;cloud computing&#8221; to allow redundancy at the server level so that any component failure can be corrected seamlessly (or at least in near-realtime) because some other part of the cloud is still operating?  It seems to me that if EC2 isn&#8217;t structured like this, then it is nothing more than managed hosting, the same old thing that&#8217;s been popular for over a decade.</p>
<p>Wake me up when &#8220;the cloud&#8221; is an ether of compute resources that can seamlessly tolerate a loss of any number of nodes up to the capacity threshold.  Until then, &#8220;what&#8217;s old is new again&#8221; is in full effect at EC2.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ernie</title>
		<link>http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/comment-page-1/#comment-9043</link>
		<dc:creator>Ernie</dc:creator>
		<pubDate>Fri, 11 Dec 2009 13:08:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.datacenterknowledge.com/?p=19526#comment-9043</guid>
		<description>All too often, a data center becomes inadequate immediately after it is built. Data center engineers get complacent with their corner of the world as long as everything is operating correctly. I’m sure an autopsy will be performed in a conference room somewhere, discussing what could have or should have been.

Expect to hear statements such as “We could not have known the second path would fail” or “It wasn’t our fault that the second PDU went down.” I have to point my finger at Amazon’s maintenance program. I will bet the proverbial dime to a donut that all their UPS, PDU, air conditioning and generator maintenance is up to date. I will then double down by betting that they haven’t completed an electrical assessment of their facility since it was built.

With a data center life cycle that begins to reach its half-life six months after the facility is built, staying on top of the electrical infrastructure is paramount. Too many times I have seen the “My UPS is running fine, I don’t need to worry about anything else” attitude. Once you begin to move servers around, you have changed the physical characteristics of the data center. NFPA 70E says you have to perform a circuit breaker coordination study every 5 years or after any changes.

Things change daily in a data center, and those changes need to be tracked. Data center software such as Aperture does this. If you are not willing to invest the money in a program such as that, then you had better invest in a full-time staff that does nothing but monitor changes to your electrical infrastructure. I have preached the gospel of “Just because it is new does not mean it works” my entire life.

If a data center has not completed an electrical assessment, then that data center is a ticking time bomb. It is time to lose the “UPS is running fine” mentality and pay attention to the electrical infrastructure of each and every data center. Cloud computing will not become a skyscraper as long as the foundation is built on sand.</description>
		<content:encoded><![CDATA[<p>All too often, a data center becomes inadequate immediately after it is built. Data center engineers get complacent with their corner of the world as long as everything is operating correctly. I’m sure an autopsy will be performed in a conference room somewhere, discussing what could have or should have been.</p>
<p>Expect to hear statements such as “We could not have known the second path would fail” or “It wasn’t our fault that the second PDU went down.” I have to point my finger at Amazon’s maintenance program. I will bet the proverbial dime to a donut that all their UPS, PDU, air conditioning and generator maintenance is up to date. I will then double down by betting that they haven’t completed an electrical assessment of their facility since it was built.</p>
<p>With a data center life cycle that begins to reach its half-life six months after the facility is built, staying on top of the electrical infrastructure is paramount. Too many times I have seen the “My UPS is running fine, I don’t need to worry about anything else” attitude. Once you begin to move servers around, you have changed the physical characteristics of the data center. NFPA 70E says you have to perform a circuit breaker coordination study every 5 years or after any changes.</p>
<p>Things change daily in a data center, and those changes need to be tracked. Data center software such as Aperture does this. If you are not willing to invest the money in a program such as that, then you had better invest in a full-time staff that does nothing but monitor changes to your electrical infrastructure. I have preached the gospel of “Just because it is new does not mean it works” my entire life.</p>
<p>If a data center has not completed an electrical assessment, then that data center is a ticking time bomb. It is time to lose the “UPS is running fine” mentality and pay attention to the electrical infrastructure of each and every data center. Cloud computing will not become a skyscraper as long as the foundation is built on sand.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The data center, the cloud, and power failure. &#171; The Server Room</title>
		<link>http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/comment-page-1/#comment-9024</link>
		<dc:creator>The data center, the cloud, and power failure. &#171; The Server Room</dc:creator>
		<pubDate>Fri, 11 Dec 2009 04:18:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.datacenterknowledge.com/?p=19526#comment-9024</guid>
		<description>[...] Today we got the story about Amazon&#8217;s outage of part of its EC2 cloud. In this post, I&#8217;ll examine what happened and what you can do to avoid the same [...]</description>
		<content:encoded><![CDATA[<p>[...] Today we got the story about Amazon&#8217;s outage of part of its EC2 cloud. In this post, I&#8217;ll examine what happened and what you can do to avoid the same [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

