I will ask two other members here for their valued input. No action is necessary from you.
@osprey
@GatorsFan
The temperatures reported in post #16 do seem odd, with the ranges running from -273.1C to 65261.8C, as pointed out by
@CaffeineAddict in post #19 and post #21.
Unfortunately, the Linux software does behave in that odd manner, as shown on the machine here too:
Code:
[root@min ~]# sensors
<snip>
nvme-pci-0200
Adapter: PCI adapter
Composite: +42.9°C (low = -0.1°C, high = +84.8°C)
(crit = +94.8°C)
Sensor 1: +42.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +52.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 8: +42.9°C (low = -273.1°C, high = +65261.8°C)
nouveau-pci-0100
Adapter: PCI adapter
GPU core: 912.00 mV (min = +0.80 V, max = +1.19 V)
temp1: +48.0°C (high = +95.0°C, hyst = +3.0°C)
(crit = +105.0°C, hyst = +5.0°C)
(emerg = +135.0°C, hyst = +5.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +33.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 4: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 8: +32.0°C (high = +80.0°C, crit = +100.0°C)
Core 12: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 16: +32.0°C (high = +80.0°C, crit = +100.0°C)
Core 20: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 24: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 25: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 26: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 27: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 28: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 29: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 30: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 31: +30.0°C (high = +80.0°C, crit = +100.0°C)
spd5118-i2c-0-51
Adapter: SMBus I801 adapter at efa0
temp1: +35.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
Looking at the output, though, it's clear that the temperatures of the GPU, the CPU cores and the SMBus device look quite reasonable and, prima facie, plausible on this well-running machine. That is despite the absurd-looking ranges shown for Sensors 1, 2 and 8, which are the same values as those shown in the output by
@hoanghieubrant in post #16.
This suggests that the software has a problem with the ranges for NVMe disks, but perhaps not with the temperature readings themselves.
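One plausible explanation for the odd limits, offered as my own assumption rather than something confirmed in this thread: NVMe drives report temperatures in whole Kelvin, and the two absurd values are what you get when the smallest and largest 16-bit numbers (0 and 65535, i.e. limits that were never set) are converted to Celsius. A minimal sketch of the arithmetic, with a function name chosen just for illustration:

```python
# Assumption: the odd limits are unset 16-bit Kelvin thresholds (0 and 65535)
# converted to Celsius. Integer millidegrees keep the arithmetic exact.
def kelvin_to_millicelsius(kelvin: int) -> int:
    """Convert whole Kelvin to thousandths of a degree Celsius."""
    return kelvin * 1000 - 273150


for raw in (0, 65535):
    print(raw, "K =", kelvin_to_millicelsius(raw) / 1000, "C")
```

The two results are -273.15 and 65261.85, which line up with the -273.1°C and +65261.8°C limits in the sensors output once cut to one decimal place. If that is what is happening, the limits are simply "not set" placeholders and say nothing about the health of the disk.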
@hoanghieubrant reports the temperatures rising, and comparing the outputs in post #16 and post #22 does show that rise:
From post #16:
Sensor 1 +80.8C
Sensor 2 +40.9C
....
Sensor 3 +40.9C
From post #22:
Code:
Temperature Sensor 1: 91 Celsius
Temperature Sensor 2: 70 Celsius
Temperature Sensor 3: 70 Celsius
Sensor 1 appears to have risen to a near-critical level of 91C, where critical on the machine here is 94.8C (see output above); the critical value for a different disk may of course differ. Sensor 1 on my machine appears to be sensing the NVMe disk. These disks have their own internal sensors, and although the adapter is mentioned in the output, it's not the adapter that is being measured.
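It's worth pointing out that smartctl and sensors report the same underlying temperature data, which comes from the disk's own internal sensors. As far as I know, sensors reads it through the kernel's hwmon interface in the /sys filesystem (/sys/class/hwmon/*), while smartctl queries the disk directly. For example on the machine here: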
Code:
[root@min ~]# smartctl -a /dev/nvme0n1 | grep -i Temper
Temperature: 43 Celsius
<snip>
Temperature Sensor 1: 43 Celsius
Temperature Sensor 2: 52 Celsius
Temperature Sensor 8: 43 Celsius
is outputting virtually the same values as:
Code:
[root@min ~]# sensors
<snip>
nvme-pci-0200
Adapter: PCI adapter
Composite: +42.9°C (low = -0.1°C, high = +84.8°C)
(crit = +94.8°C)
Sensor 1: +42.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +52.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 8: +42.9°C (low = -273.1°C, high = +65261.8°C)
The tiny differences can be explained by the slightly different times the programs were run, one after the other, or rounding by the software.
The data suggests that, on this particular machine, Sensor 1 is sensing the temperature of the NVMe disk itself: in the output from the sensors command, Composite and Sensor 1 both read 42.9C. What each Sensor measures may vary between disks and motherboards. One needs to read the hardware documentation to be certain.
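To see the raw numbers that sensors works from, the values can be read straight out of /sys/class/hwmon. A minimal sketch, assuming the usual hwmon layout (a name file per device, tempN_input files holding millidegrees Celsius, and optional tempN_label files); the function name read_hwmon_temps and its base parameter are my own, for illustration only:

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {(device name, sensor label): degrees C} for every readable sensor."""
    temps = {}
    for dev in sorted(Path(base).glob("hwmon*")):
        name_file = dev / "name"
        name = name_file.read_text().strip() if name_file.exists() else dev.name
        for inp in sorted(dev.glob("temp*_input")):
            # temp1_input pairs with temp1_label when the driver provides a label
            label_file = inp.with_name(inp.name.replace("_input", "_label"))
            if label_file.exists():
                label = label_file.read_text().strip()
            else:
                label = inp.name[: -len("_input")]
            try:
                temps[(name, label)] = int(inp.read_text().strip()) / 1000
            except (OSError, ValueError):
                continue  # some sensors refuse reads or return nothing
    return temps


if __name__ == "__main__":
    for (name, label), celsius in read_hwmon_temps().items():
        print(f"{name:12} {label:12} {celsius:6.1f} C")
```

Running this next to sensors and smartctl should show whether all three agree on a given machine, which is a quick way to check which Sensor tracks the Composite value.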
In the output of temperatures in post #16 from
@hoanghieubrant, the NVMe disk is shown at 38.0C, but none of the Sensors shows the same value, so it's difficult to surmise what Sensors 1, 2 and 3 are measuring there, unlike on the machine here.
At this point my suggestion would be to install the Linux distribution and configure the system to shut down automatically when a certain temperature threshold is reached, to prevent overheating. As far as I'm aware there is no built-in, direct command for setting an automatic shutdown temperature, so one has to achieve the functionality through scripting, using the temperature monitoring tools and system management commands. Some options for running such a script are a systemd unit written for the purpose, a cron job, or /etc/rc.local.
If such a script were created and run, and the Linux system never shut itself down, then there's probably no problem. Such a script would be a safe way to protect things and is possibly a reasonable test for this particular problem.
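A minimal sketch of such a script, in Python, under several assumptions of mine: the disk shows up under /sys/class/hwmon with the name "nvme", its tempN_input files hold millidegrees Celsius, and 90C is only a placeholder threshold that should be set somewhere below the disk's own critical value. The names THRESHOLD_C, POLL_SECONDS, hottest and should_shut_down are made up for the example.

```python
import subprocess
import sys
import time
from pathlib import Path

THRESHOLD_C = 90.0   # placeholder limit; choose one below the disk's crit value
POLL_SECONDS = 30    # placeholder polling interval


def hottest(device="nvme", base="/sys/class/hwmon"):
    """Return the highest reading in degrees C for the named hwmon device, or None."""
    readings = []
    for dev in Path(base).glob("hwmon*"):
        try:
            if (dev / "name").read_text().strip() != device:
                continue
        except OSError:
            continue
        for inp in dev.glob("temp*_input"):
            try:
                readings.append(int(inp.read_text().strip()) / 1000)
            except (OSError, ValueError):
                continue  # skip sensors that cannot be read
    return max(readings, default=None)


def should_shut_down(temp_c, threshold_c=THRESHOLD_C):
    """True only when a reading exists and has reached the threshold."""
    return temp_c is not None and temp_c >= threshold_c


def watch():
    """Poll until the threshold is reached, then halt the machine (needs root)."""
    while True:
        temp = hottest()
        if should_shut_down(temp):
            print(f"temperature {temp:.1f} C reached, shutting down", flush=True)
            subprocess.run(["shutdown", "-h", "now"])
            return
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    if "--watch" in sys.argv[1:]:
        watch()
    else:
        # Dry run: report the current reading and what the script would do.
        temp = hottest()
        print("hottest reading:", temp, "| would shut down:", should_shut_down(temp))
```

Run without arguments, it only prints the current reading, which is a safe way to check that it finds the disk. Run as root with --watch (from a systemd unit, for instance), it keeps polling and halts the machine once the threshold is reached.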