I came across this in my wanderings. It is a paper by some Microsoft guys analysing their phone-home error reporting from a million PCs. The interest is that these are home and private machines, rather than corporate servers or desktops that don't report in. There has been plenty of research into server failures, but this is the first I have seen covering such a large number of private PCs.

Obviously there are limitations and assumptions, but they seem quite clearly documented. Also, the data was collected for Microsoft's own purposes, so it is biased towards operating system crashes rather than precise hardware issues.


Some not unexpected results:

1. Things fail early on or when they get old.
2. Once something has failed, it is massively more likely to fail again.
3. Underclocked machines are more reliable than stock settings.
4. Overclocked machines are massively more likely to fail than the others.
5. Big-name OEMs' products (world top 20 by sales) are less likely to fail than "white box" products.
6. Machines with less RAM are more likely to have an HDD failure.

And a couple that I found rather strange:

7. OEM laptops are less likely to fail than their desktops.
8. Machines with more RAM are more likely to show CPU errors.

#5 I can understand: the big OEMs are very conscious of support costs, since they carry a lot of overheads on top of the actual repair element.

I would suggest that this relates to #7 as well, since laptops are far more expensive to fix due to the component costs. On top of that, if you have a failed desktop you might just repair it yourself rather than wait the 2~6 weeks or so that returning it to the manufacturer would involve. That would cost the OEM nothing.

I can imagine where their quality assurance effort is concentrated.

#8 is a bit weird, and the researchers admitted they didn't have an explanation.

I suspect that this is a statistical artifact of some sort, and the relationship is not direct.

The most reliable devices were from the big OEMs, who will not put more RAM than necessary into a box. They sell the empty slots as "room for expansion".

Build-to-order and build-it-yourself people will probably stick in as much RAM as they can. Overclockers almost certainly would, and would probably overclock the RAM as well. Overall, these are likely to be high-risk situations.

In general, the more RAM you have, the more sticks you have, which increases the probability of a RAM failure. But as I have suggested, these are also most likely high-risk machines anyway, and that would greatly increase the probability of a CPU failure.
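
To show what I mean by an indirect relationship, here is a rough sketch in Python. Every number in it is invented purely for illustration and none of it comes from the paper. It assumes that CPU failure risk depends only on whether a machine is overclocked, and that overclockers are simply more likely to fit lots of RAM. The function and constant names are my own.

```python
# All probabilities below are made-up illustration values, not data from the paper.
P_OVERCLOCKED = 0.10                      # share of machines that are overclocked
P_BIG_RAM = {True: 0.80, False: 0.20}     # chance of lots of RAM, keyed by overclocked
P_CPU_FAIL = {True: 0.05, False: 0.005}   # CPU failure risk, keyed by overclocked only


def cpu_failure_rate(big_ram):
    """CPU failure rate seen for a given RAM size when overclocking is not recorded."""
    weight_total = 0.0
    fail_total = 0.0
    for overclocked in (True, False):
        p_oc = P_OVERCLOCKED if overclocked else 1 - P_OVERCLOCKED
        p_ram = P_BIG_RAM[overclocked] if big_ram else 1 - P_BIG_RAM[overclocked]
        weight = p_oc * p_ram             # share of all machines in this group
        weight_total += weight
        fail_total += weight * P_CPU_FAIL[overclocked]
    return fail_total / weight_total


def any_module_fails(p_single, modules):
    """Chance that at least one of several independent RAM modules fails."""
    return 1 - (1 - p_single) ** modules


if __name__ == "__main__":
    print("CPU failure rate, lots of RAM:", round(cpu_failure_rate(True), 4))
    print("CPU failure rate, little RAM: ", round(cpu_failure_rate(False), 4))
    print("RAM failure, 1 vs 4 modules:  ",
          round(any_module_fails(0.01, 1), 4), round(any_module_fails(0.01, 4), 4))
```

With these made-up inputs the big-RAM machines show roughly three times the CPU failure rate of the small-RAM ones, even though RAM size has no effect on the CPU anywhere in the model. The second function is just the "more sticks, more chances" arithmetic, assuming each stick fails independently.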

Sure, I buy unlocked and clockable stuff for myself, because I know that it is cherry picked to support clocking. In that case it should be well above normal reliability at stock settings.

Incidentally, these figures all relate to Windows and represent recoverable "bluescreen" events; otherwise the machines wouldn't have been able to phone home.