October 08, 2005

Raid Woes...... stas? warez? 31die?

My PC has regular failures of the onboard nVidia nvRaid RAID system.

Several months ago I had the problem intermittently, perhaps once in six weeks. Recently the time between RAID failures (nvRaid saying "raid degraded") has fallen to as little as one day.

I'm not sure what the problem is. Stupid nVidia/MSI provide zero troubleshooting support, and in fact, as someone points out, the documentation for their Windows-based RAID support software doesn't even mention the "array degraded" state.

I upgraded the BIOS to the latest version. This didn't help.

A few weeks ago I noticed that the PC seemed fairly hot. I wondered if the whole thing was overheating? The MSI temperature utility doesn't work. It regularly says the system temperature is 0 degrees C. The CPU ranges as high as 56C, but I don't think that is unusual. As I felt around, though, I could feel that the hard drive enclosure was damn hot, hot enough that it was quite uncomfortable to hold my fingers against the hard drive. Oops... did my hard drives critically overheat?

My investigations showed that the Nexus Breeze case has pretty poor airflow. The intake port is nearly choked off. So I raised the case a few inches off the ground so that it can inhale air a bit easier. I also removed the plates and the noise-blocking foam inserts covering the hard drives on the front of the computer. I bought a cheap indoor/outdoor thermometer and began monitoring the temperature of the hard drive enclosure. I also stopped letting the system run 24/7 like I had for the past many months. It runs little more than twelve hours daily now.

The highest recorded temperature has been 43.8C, or about 111F. I don't know what the error range of this cheap thermometer is, perhaps 10%, giving me temperatures as high as roughly 120F. Is that too hot for a hard drive? I'm not sure. I checked the Seagate documentation and it says the operating temperature range is 0-60C and that actual drive case temperature should not exceed 69C (156F).
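To double-check the arithmetic above, here is a rough sketch in Python. The 10% thermometer error is just my guess, and I'm assuming it applies to the Celsius reading (applying it to the Fahrenheit number instead lands a couple of degrees higher). The constant names are made up for this example; only the 43.8C reading and the 69C Seagate case limit come from the numbers above.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


# Values taken from the post
MEASURED_C = 43.8          # highest reading on the cheap thermometer
SEAGATE_MAX_CASE_C = 69.0  # drive case limit quoted from the Seagate docs

# Assumption: thermometer could read up to 10% low (a guess, not a spec)
ASSUMED_ERROR = 0.10

worst_case_c = MEASURED_C * (1 + ASSUMED_ERROR)
headroom_c = SEAGATE_MAX_CASE_C - worst_case_c

print(f"Measured:   {MEASURED_C:.1f}C = {c_to_f(MEASURED_C):.1f}F")
print(f"Worst case: {worst_case_c:.1f}C = {c_to_f(worst_case_c):.1f}F")
print(f"Case limit: {SEAGATE_MAX_CASE_C:.1f}C = {c_to_f(SEAGATE_MAX_CASE_C):.1f}F")
print(f"Headroom below the case limit: {headroom_c:.1f}C")
```

Even with the pessimistic error guess, the worst-case reading stays well under the 69C case limit, which is why the current temperatures look OK.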

So current temperatures appear to be OK. But possibly the drives were suffocated and overheated in the past, and now they are unstable? Did they fail for other reasons, and then get used so heavily while the RAID tried to repair them that they overheated? Or is this all absolutely unrelated to temperature?

I would like to run one of the low-level diagnostic programs provided by Seagate to analyze these Seagate 300GB SATA drives (7200 RPM, 8MB cache, NCQ), but the programs won't function because the drives are hidden behind the nVidia RAID controller. To test a drive I will need to remove it, stick it in an external drive enclosure on another computer, and then run the diagnostic from there. It's looking like this is becoming unavoidable. Maybe one or both of them truly is failing. Ugh.

If I do find that the drives are unstable and dying, then I need to replace them. I keep hearing about "ghosting" programs. Is there some way I can duplicate these drives (remembering that they hold my OS, etc.)? I hate hate hate hate hate hate hate reinstalling a PC.

It almost makes me not want to use the RAID system at all, since it prevents me from running hard drive diagnostic software to see how the drives are performing anyway. Ugh. Bite me.




Even the fucking external hard drive is not working... http://forum.msi.com.tw/index.php?topic=88355.0

Posted by Nils Blutig at October 8, 2005 11:42 AM