<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: High uptime equates to low mean time to recovery</title>
	<atom:link href="http://saasinterrupted.com/2009/12/01/high-uptime-equates-to-low-mean-time-to-recovery/feed/" rel="self" type="application/rss+xml" />
	<link>http://saasinterrupted.com/2009/12/01/high-uptime-equates-to-low-mean-time-to-recovery/</link>
	<description>-- A blog by Ashish Soni.</description>
	<lastBuildDate>Wed, 03 Nov 2010 23:57:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Ash</title>
		<link>http://saasinterrupted.com/2009/12/01/high-uptime-equates-to-low-mean-time-to-recovery/#comment-10</link>
		<dc:creator><![CDATA[Ash]]></dc:creator>
		<pubDate>Tue, 15 Dec 2009 20:50:58 +0000</pubDate>
		<guid isPermaLink="false">http://saasinterrupted.com/?p=13#comment-10</guid>
		<description><![CDATA[Thanks for the comments, Patrick.
MTBF starts to lose its value very quickly once one manages a service that is successful and therefore needs to be highly available. Take the following example:
1) Let&#039;s say you have reliable servers, along with the software on them, with an MTBF of 5 years.  (Therefore a server will crash, on average, after ~1800 days of operation.)
2) And let&#039;s say that you are managing 250 such servers.
3) This would mean that you will have, on average, one server crash every week! (250*7 is approximately 1800 operational server days.)  You just don&#039;t know which server, and you had better be ready to handle this crash and hope that it does not cause downtime.

If one focused only on MTBF and got servers that were twice as reliable in the example above, one would still have a server crash, and a possible outage, every other week.

This leads us back to the original point that for highly available systems it is MTTR that matters.  MTBF just proves that failures will occur.  If you have an MTTR of 0, it does not matter much whether the MTBF is 1 year or 10 years.  The failure does not interrupt the service, and in the end that is all that matters.

The only minor exception I would make to the above is for components where the MTTR is not 0.  In that case one could try to ensure that the MTBF is as high as possible for those nonzero-MTTR components.
Even so, I would contend that the time spent on getting a higher-MTBF component would be better spent on reducing the MTTR of that component to 0.

Until the MTBF gets to infinity for a component, a 0 MTTR approach will always win out.]]></description>
		<content:encoded><![CDATA[<p>Thanks for the comments, Patrick.<br />
MTBF starts to lose its value very quickly once one manages a service that is successful and therefore needs to be highly available. Take the following example:<br />
1) Let&#8217;s say you have reliable servers, along with the software on them, with an MTBF of 5 years.  (Therefore a server will crash, on average, after ~1800 days of operation.)<br />
2) And let&#8217;s say that you are managing 250 such servers.<br />
3) This would mean that you will have, on average, one server crash every week! (250*7 is approximately 1800 operational server days.)  You just don&#8217;t know which server, and you had better be ready to handle this crash and hope that it does not cause downtime.</p>
<p>If one focused only on MTBF and got servers that were twice as reliable in the example above, one would still have a server crash, and a possible outage, every other week.</p>
<p>This leads us back to the original point that for highly available systems it is MTTR that matters.  MTBF just proves that failures will occur.  If you have an MTTR of 0, it does not matter much whether the MTBF is 1 year or 10 years.  The failure does not interrupt the service, and in the end that is all that matters.</p>
<p>The only minor exception I would make to the above is for components where the MTTR is not 0.  In that case one could try to ensure that the MTBF is as high as possible for those nonzero-MTTR components.<br />
Even so, I would contend that the time spent on getting a higher-MTBF component would be better spent on reducing the MTTR of that component to 0.</p>
<p>Until the MTBF gets to infinity for a component, a 0 MTTR approach will always win out.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick</title>
		<link>http://saasinterrupted.com/2009/12/01/high-uptime-equates-to-low-mean-time-to-recovery/#comment-9</link>
		<dc:creator><![CDATA[Patrick]]></dc:creator>
		<pubDate>Tue, 15 Dec 2009 05:31:22 +0000</pubDate>
		<guid isPermaLink="false">http://saasinterrupted.com/?p=13#comment-9</guid>
		<description><![CDATA[This is just wrong.  Low MTTR is part of the plan, but the notion that MTBF doesn&#039;t matter is nonsense.  MTBF, MTTR, and a solid SLA are all parts of an equation, and it&#039;s my opinion that you improve availability from &quot;miserable&quot; to &quot;pretty respectable&quot; by starting with a focus on reducing the incidence of easily prevented failures (which is accomplished by an emphasis on MTBF).  Eventually you may get close to the metal and reach a theoretical limit on failure incidence.  At that point the only way to squeeze higher uptime out of a system is to reduce the time you spend in the breaks that you aren&#039;t able to prevent.  Until you&#039;re nearing that point, though, it&#039;s definitely not a mistake to look at MTBF, and it may be a mistake to ignore it.]]></description>
		<content:encoded><![CDATA[<p>This is just wrong.  Low MTTR is part of the plan, but the notion that MTBF doesn&#8217;t matter is nonsense.  MTBF, MTTR, and a solid SLA are all parts of an equation, and it&#8217;s my opinion that you improve availability from &#8220;miserable&#8221; to &#8220;pretty respectable&#8221; by starting with a focus on reducing the incidence of easily prevented failures (which is accomplished by an emphasis on MTBF).  Eventually you may get close to the metal and reach a theoretical limit on failure incidence.  At that point the only way to squeeze higher uptime out of a system is to reduce the time you spend in the breaks that you aren&#8217;t able to prevent.  Until you&#8217;re nearing that point, though, it&#8217;s definitely not a mistake to look at MTBF, and it may be a mistake to ignore it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brk</title>
		<link>http://saasinterrupted.com/2009/12/01/high-uptime-equates-to-low-mean-time-to-recovery/#comment-8</link>
		<dc:creator><![CDATA[brk]]></dc:creator>
		<pubDate>Tue, 01 Dec 2009 14:50:22 +0000</pubDate>
		<guid isPermaLink="false">http://saasinterrupted.com/?p=13#comment-8</guid>
		<description><![CDATA[SLAs are usually measured in terms of availability over a set time (1 month or 1 year).  You don&#039;t &quot;roll over&quot; unused SLA outages, so your outage time after 5 years would be 8 hours, not 40.

If this is a major concern, then you should be placing servers in locations that can be served from multiple carriers over divergent paths.]]></description>
		<content:encoded><![CDATA[<p>SLAs are usually measured in terms of availability over a set time (1 month or 1 year).  You don&#8217;t &#8220;roll over&#8221; unused SLA outages, so your outage time after 5 years would be 8 hours, not 40.</p>
<p>If this is a major concern, then you should be placing servers in locations that can be served from multiple carriers over divergent paths.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ash</title>
		<link>http://saasinterrupted.com/2009/12/01/high-uptime-equates-to-low-mean-time-to-recovery/#comment-7</link>
		<dc:creator><![CDATA[Ash]]></dc:creator>
		<pubDate>Tue, 01 Dec 2009 14:13:32 +0000</pubDate>
		<guid isPermaLink="false">http://saasinterrupted.com/?p=13#comment-7</guid>
		<description><![CDATA[I would agree that MTBF can be useful in comparing like hardware components.  
If the component is a part of a High Availability Architecture then the solution with the lower MTTR would win.
I think you made my point exactly: MTTR is generally not considered when making hardware purchasing decisions, and it should be - especially for HA.]]></description>
		<content:encoded><![CDATA[<p>I would agree that MTBF can be useful in comparing like hardware components.<br />
If the component is a part of a High Availability Architecture then the solution with the lower MTTR would win.<br />
I think you made my point exactly: MTTR is generally not considered when making hardware purchasing decisions, and it should be &#8211; especially for HA.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PHP Developer</title>
		<link>http://saasinterrupted.com/2009/12/01/high-uptime-equates-to-low-mean-time-to-recovery/#comment-6</link>
		<dc:creator><![CDATA[PHP Developer]]></dc:creator>
		<pubDate>Tue, 01 Dec 2009 12:50:32 +0000</pubDate>
		<guid isPermaLink="false">http://saasinterrupted.com/?p=13#comment-6</guid>
		<description><![CDATA[Would it not be better to accept both metrics but apply weights accordingly? MTBF is very useful in making hardware purchase decisions.]]></description>
		<content:encoded><![CDATA[<p>Would it not be better to accept both metrics but apply weights accordingly? MTBF is very useful in making hardware purchase decisions.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

