A processor's biggest fan
'We watched in horror as smoke clouds rose from the overheating core ...'
Back in the bad old days of the Pentium, there was one item we needed to keep in stock - processor fans.
Every nine months or so, they'd go sticky and start droning, causing all sorts of odd effects, especially on SMP motherboards running NT. Things crashed for no apparent reason, although changing the fan usually cured the problem.
That's why my standard advice at the time was to remove the cover before turning off the power. That way you would see if a processor fan was stalling.
You'd think manufacturers would have learnt from this. Intel has - you can run a Pentium 4 quite happily without a fan or heat sink. The processor just slows down until it is thermally happy.
Not so with AMD's offerings. It appears that there is no thermal protection in an AMD processor - in one test, with the heat sink removed, the chip hit more than 370°C within a matter of seconds.
I can only quote Tom's Hardware Guide: "A split second after the heat sink had been taken off the Palomino-Athlon, the system crashed. We then watched in horror as smoke clouds rose from the overheating core. The temperature measurement assured us of what we had feared. No semiconductor survives 300°C/580°F. Palomino was dead."
You might wonder if this really matters. I think it does. When deciding on server designs, you want as much reliability and resilience as possible. You need to know that there are no significant weak points in the chain, and that you can take appropriate action to cover yourself against problem areas.
Having 'soft' failure modes is obviously a good thing because it helps protect you when something goes wrong. As a failure mode, I'd rather my server slowed down and bleated in error logs than caught fire.
It could be argued that the likelihood of the fan stopping and the heat sink falling off is minimal. I wish I could agree. Certainly, I would be surprised if a well-secured heat sink fell off an Athlon. But it would not be unheard of for the fan to fail.
What appropriate actions should you take? It is not unreasonable to check the mounting of the fan and heat sink on an Athlon machine. This is especially true for a desktop tower machine where the heat sink might be mounted vertically.
Simply being aware of the problem, and performing a grown-up risk analysis, is the best course of action. AMD obviously thinks there is no significant risk - you may or may not agree.