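One machine crashed every time a large batch of data was imported for it to process. Syslog eventually showed that overheating was the cause.
A normal log looks like this: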
Apr 25 09:43:12 www sensord: Chip: acpitz-virtual-0
Apr 25 09:43:12 www sensord: Adapter: Virtual device
Apr 25 09:43:12 www sensord: temp1: 40.0 C
Apr 25 09:43:12 www sensord: Chip: k8temp-pci-00c3
Apr 25 09:43:12 www sensord: Adapter: PCI adapter
Apr 25 09:43:12 www sensord: Core0 Temp: 31.0 C
Apr 25 09:43:12 www sensord: Core1 Temp: 33.0 C
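Spotting the overheating by eye means scrolling through a lot of syslog. Here is a minimal sketch that pulls the temperature readings out of sensord lines like the ones above and reports anything at or above a limit. The function name `hot_readings` and the 70 C default are my own assumptions, and the pattern only covers the line format shown here.

```python
import re

# Matches sensord readings such as "Core0 Temp: 31.0 C" or "temp1: 40.0 C".
# The optional "[pid]" part is an assumption; the log above has none.
READING = re.compile(
    r"sensord(?:\[\d+\])?: (?P<label>[^:]+): (?P<temp>-?\d+(?:\.\d+)?) C\s*$"
)


def hot_readings(lines, limit_c=70.0):
    """Return (label, temperature) pairs at or above limit_c."""
    hot = []
    for line in lines:
        match = READING.search(line)
        if match and float(match.group("temp")) >= limit_c:
            hot.append((match.group("label"), float(match.group("temp"))))
    return hot


sample = [
    "Apr 25 09:43:12 www sensord: Chip: k8temp-pci-00c3",
    "Apr 25 09:43:12 www sensord: Adapter: PCI adapter",
    "Apr 25 09:43:12 www sensord: Core0 Temp: 31.0 C",
    "Apr 25 09:43:12 www sensord: Core1 Temp: 33.0 C",
]
# A low limit is used here only so the sample produces a hit.
print(hot_readings(sample, limit_c=32.0))
```

Feeding it the real syslog file instead of `sample` is just a matter of passing an open file object, since it iterates line by line.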
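At the time of the crash the CPU temperature had reached 80°C. Luckily it crashed, otherwise this could have cost a lot more money. XD
I had never seen this kind of temperature monitoring on my own machines and wanted to install it too, so I looked up which package provides it (lm-sensors).
Installing and configuring sensors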
- apt-get install lm-sensors
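- sensors-detect # probe for hardware monitoring chips and list the kernel modules they need
- modprobe <module> # load each module that sensors-detect suggested (k8temp, w83627ehf, ...)
- pwmconfig # optional: tests which fans can be PWM-controlled; only needed for fan speed control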
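Once the driver modules are loaded, the kernel usually exposes the readings under sysfs as well, which gives a quick way to check that the setup worked. This is a rough sketch, assuming the layout `/sys/class/hwmon/hwmonN/tempM_input` with values in millidegrees Celsius. Some older kernels keep these files under a `device/` subdirectory, which this sketch does not handle. The function name `read_hwmon_temps` is my own.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return (chip, input_name, degrees_celsius) for each readable temp input."""
    readings = []
    root = Path(root)
    if not root.is_dir():
        return readings  # no hwmon class at all: nothing to report
    for chip_dir in sorted(root.glob("hwmon*")):
        try:
            chip = (chip_dir / "name").read_text().strip()
        except (OSError, ValueError):
            chip = chip_dir.name  # fall back to the directory name
        for temp_file in sorted(chip_dir.glob("temp*_input")):
            try:
                millidegrees = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # input exists but cannot be read or parsed; skip it
            readings.append((chip, temp_file.name, millidegrees / 1000.0))
    return readings


for chip, name, celsius in read_hwmon_temps():
    print(f"{chip} {name}: {celsius:.1f} C")
```

If this prints nothing, either no driver is loaded yet or the board has no chip the kernel recognises.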
Using sensors
$ sensors # run sensors to see the machine's status (mine looks pretty bad)
w83627ehf-isa-0290
Adapter: ISA adapter
VCore: +1.34 V (min = +0.00 V, max = +1.74 V)
+12V: +12.04 V (min = +4.91 V, max = +5.02 V) ALARM
AVCC: +3.18 V (min = +2.00 V, max = +3.26 V)
3VCC: +3.20 V (min = +2.03 V, max = +0.70 V) ALARM
in4: +1.60 V (min = +0.98 V, max = +0.47 V) ALARM
in5: +1.60 V (min = +1.49 V, max = +1.49 V) ALARM
+5V: +0.10 V (min = +3.15 V, max = +3.23 V) ALARM
VSB: +3.15 V (min = +1.52 V, max = +4.08 V)
VBAT: +3.15 V (min = +4.06 V, max = +0.98 V) ALARM
in9: +0.03 V (min = +1.06 V, max = +2.02 V) ALARM
Case Fan: 0 RPM (min = 57 RPM, div = 128) ALARM
CPU Fan: 3276 RPM (min = 2360 RPM, div = 4)
Aux Fan: 0 RPM (min = 55 RPM, div = 128) ALARM
fan5: 0 RPM (min = 6490 RPM, div = 16) ALARM
Sys Temp: +43.0°C (high = -79.0°C, hyst = +123.0°C) sensor = thermistor
CPU Temp: +40.5°C (high = +80.0°C, hyst = +75.0°C) sensor = diode
AUX Temp: +47.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
cpu0_vid: +0.000 V
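Looking closer, several of the ALARM rows above (3VCC, in4, VBAT) have a min that is higher than the max, which hints that the limits were never configured rather than that the power supply is dying. That is only my reading of the output. The sketch below separates the two cases for voltage rows. The function name `classify_voltage_rows` and the three status labels are my own, and the pattern only covers the row format printed above.

```python
import re

NUM = r"[+-]?\d+(?:\.\d+)?"
# Matches rows such as "VCore: +1.34 V (min = +0.00 V, max = +1.74 V)".
ROW = re.compile(
    rf"^(?P<label>[^:]+):\s+(?P<value>{NUM}) V\s+"
    rf"\(min = (?P<min>{NUM}) V, max = (?P<max>{NUM}) V\)"
)


def classify_voltage_rows(text):
    """Map each voltage label to 'ok', 'out of range' or 'bad limits'."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # fans, temperatures and headers are ignored here
        value, low, high = (float(match.group(k)) for k in ("value", "min", "max"))
        if low >= high:
            status = "bad limits"  # the limits themselves make no sense
        elif low <= value <= high:
            status = "ok"
        else:
            status = "out of range"
        result[match.group("label")] = status
    return result


sample = """\
VCore: +1.34 V (min = +0.00 V, max = +1.74 V)
3VCC: +3.20 V (min = +2.03 V, max = +0.70 V) ALARM
+5V: +0.10 V (min = +3.15 V, max = +3.23 V) ALARM
"""
print(classify_voltage_rows(sample))
```

A row marked 'out of range' can still be a false alarm: the +12V row above reads 12.04 V against limits of about 5 V, so the limits look wrong even though min is below max.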
$ sensors # this is another machine; it looks much more reassuring
acpitz-virtual-0
Adapter: Virtual device
temp1: +40.0°C (crit = +74.0°C)

k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp: +29.0°C
Core1 Temp: +30.0°C
Also, older machines have no sensor chip, so installing the program does nothing there. If you can't see it, just pretend everything is fine. XD
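Tsung, you might consider pairing this with rrdtool and putting together a simple web page.
That's how I measure the temperature in my own room...
Uh, I think I'll just stick with a thermometer.. Orz..
Besides, my main server... can't sense temperature at all.. :~~~ (it's a 4 to 5 year old machine)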
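For anyone who does want to try the rrdtool suggestion, here is a rough sketch of the two commands involved, built as argument lists so nothing is executed. The file name `temps.rrd`, the data source names, the 300 second step and the 0 to 120 range are all assumptions of mine; the `rrdtool graph` step for the web page is left out.

```python
def rrd_create_args(rrd_path, sources, step=300):
    """Build an `rrdtool create` command with one GAUGE data source per sensor."""
    args = ["rrdtool", "create", rrd_path, "--step", str(step)]
    for name in sources:
        # Heartbeat of two steps; readings outside 0..120 are stored as unknown.
        args.append(f"DS:{name}:GAUGE:{2 * step}:0:120")
    # Keep one day of unconsolidated samples (86400 seconds / step rows).
    args.append(f"RRA:AVERAGE:0.5:1:{86400 // step}")
    return args


def rrd_update_args(rrd_path, values):
    """Build an `rrdtool update` command; "N" means "timestamp is now"."""
    return ["rrdtool", "update", rrd_path, "N:" + ":".join(f"{v:.1f}" for v in values)]


print(rrd_create_args("temps.rrd", ["core0", "core1"]))
print(rrd_update_args("temps.rrd", [29.0, 30.0]))
```

Passing either list to `subprocess.run` from cron every five minutes would be one way to collect the data, provided the values are given in the same order as the data sources.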