Where is my GPU thermal management in Windows Vista?
So, our topic today is: how on earth did our system allow the graphics card to reach such extreme temperatures? Where is the kill switch? More precisely, why did the graphics card driver NOT tell the system to shut down? Is thermal management (monitoring and throttling) not working as it should under Microsoft's latest operating system, Windows Vista? We think two companies have some serious answering to do: nVidia and Microsoft, two big fish of the industry.
In nVidia's defense, they were very concerned about the issue once we reported it to them, but they are on Thanksgiving holidays right now in the USA. That meant we couldn't talk directly to the nVidia driver team, although other nVidia offices around the world are still active. After exchanging several phone calls and emails with nVidia on the issue, the response that shocked us most was this one, from one of our contacts at the nVidia Taiwan office who works in a technical marketing capacity:
In regards to the issue you've found, I think it may be an individual case. I tried testing a 7900GS board in my lab this afternoon with Vista running and our drivers posted on the net, but did not see a similar occurrence. It worked fine.
And my response:
Just another question, when you said "It worked fine" - please clarify, did you remove ALL cooling and let the graphics card run for an extended amount of time (and how long…)?
And the last response from nVidia:
I did not remove all cooling to let it run because this is not the normal sku that graphics cards using our GPUs are shipped. I ran the system with benchmarks as well as let the system sit for a while for a total of around 10 hours. But of course, this is with the normal cooling on the card.
Truthfully speaking, I find it somewhat amazing that a GPU can get to temperatures so hot to melt copper. Considering that the melting point of copper is close to 1100C, I would think the whole board would've caught on fire before copper would melt.
As for the temperature monitoring, since it notes that the feature will be available in a separate downloadable utility, I'm assuming that it would be best if you download that utility. But as for how our drivers are programmed and how our hardware is designed, I am not at liberty to discuss this information. It would really be best if you can wait until we get a more concrete response from our counterparts in the US.
Let's start from the beginning. I was using a water cooling setup, and the pump doesn't have water flow protection. So when the water stopped pumping, the pump could not detect that fundamental fault and therefore could not shut down the system. Yes, we are also in talks with that company about why their pump doesn't include this feature. With the water no longer flowing, temperatures started to increase rapidly (even with the GPU sitting idle in Vista, although the Aero interface probably results in higher idle operating temperatures). In Windows XP, nVidia's ForceWare graphics card drivers have GPU monitoring (full thermal management) enabled, but ForceWare 96.85 for Vista (the latest publicly available on the nVidia website) lacks many features. nVidia, in their own release notes, claim these to be "Not NVIDIA Issues":
This section lists issues that are not due to the NVIDIA driver as well as features that are not meant to be supported by the NVIDIA driver for Windows Vista.
That's interesting to say the least! It would seem that nVidia have this BETA driver available on their website but are effectively trying to distance themselves from any responsibility for unforeseen issues which may or may not occur. First of all, that type of action is shocking, and the fact that the driver is in BETA and not "officially supported" means nothing. The driver is available and listed on nVidia's public website as WHQL (Microsoft test it and certify it for public consumption - that is a $10,000 USD exercise with each new driver), and hence driver-related issues ARE their responsibility because they are making it available to users. nVidia have a duty of care to the computer users who use graphics cards implementing their GPUs - any judge or lawyer will tell you this, and it is the first thing you learn in school in legal studies.
GPU Temperature Monitoring
Temperature monitoring is no longer supported in the default GPU driver control panel. This feature will be available in a separate downloadable utility.
This is definitely one for the books! From our research so far, because the current public Vista drivers for GeForce graphics cards are unable to monitor GPU temperatures in their control panel, that suggests they are also unable to operate the usual internal thermal management systems as they do in Windows XP. If our cooling failure had occurred in Windows XP, the driver would have told the graphics card to crash: in a game running at the higher 3D clock speeds, the game would crash to desktop. If sitting idle in Windows, the driver would have told Windows to crash, and usually the result would be a system power down. This gives you a chance to fix the cooling issue and avoid the death of your graphics card, and possibly other components if things got bad enough.
We also used the exact same 7900GS graphics card under Windows XP before installing Vista, so we can confirm it does have a thermal diode and thermal management does work under the older OS.
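To make the behaviour we expected concrete, here is a rough Python sketch of the kind of decision a thermal watchdog makes, based only on what we observed under Windows XP. It is not nVidia's actual driver logic, which we have no access to. The threshold values, the `Action` names and the `thermal_action` function are all our own assumptions for illustration.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"          # temperature is fine, do nothing
    THROTTLE = "throttle"  # reduce clock speeds to shed heat
    ABORT_3D = "abort_3d"  # kill the 3D workload (game crashes to desktop)
    SHUTDOWN = "shutdown"  # bring the whole system down


# Hypothetical thresholds in degrees Celsius, chosen for illustration only.
THROTTLE_C = 95.0
CRITICAL_C = 115.0


def thermal_action(temp_c: float, in_3d_mode: bool) -> Action:
    """Pick a protective action from the thermal diode reading."""
    if temp_c >= CRITICAL_C:
        # Under load, dropping the 3D workload sheds heat quickly.
        # At idle there is nothing left to drop, so power down instead.
        return Action.ABORT_3D if in_3d_mode else Action.SHUTDOWN
    if temp_c >= THROTTLE_C:
        return Action.THROTTLE
    return Action.NONE


if __name__ == "__main__":
    # Simulated idle-desktop readings after a pump failure.
    for reading in (60.0, 90.0, 100.0, 120.0):
        print(reading, thermal_action(reading, in_3d_mode=False).value)
```

The point of the sketch is the last branch: even with the card sitting idle on the desktop, a critical reading should still end in a shutdown. That is the step that did not happen on our Vista system.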