.. include:: /keyword.rst

=======
Thermal
=======

Thermal management is an important feature during device operation. It helps prevent
overheating, which can damage the device under certain circumstances, and it extends
device lifetime. The Linux kernel provides a thermal framework that allows users to
monitor temperature, adjust policy, and query the current configuration. |IOT-YOCTO|
follows the same scheme and provides a simple, consistent interface to developers. For
more information about the Linux thermal framework, please refer to the `kernel document `_.

The following sections describe the thermal management interface provided by |IOT-YOCTO|.

.. contents:: Sections
   :local:
   :depth: 1

Components in Thermal Framework
===============================

The framework consists of the following components:

Thermal Zone
------------

The thermal zone is the central place of thermal management. It reads the temperature,
compares it with thermal thresholds, and reacts according to different thermal conditions.

Thermal Sensor
--------------

The thermal sensor provides temperature sensing capability to a thermal zone.

Cooling Device
--------------

The cooling device provides heat dissipation capability to a thermal zone. There are two
types of cooling:

* Passive cooling regulates device performance, for example by lowering CPU or GPU
  frequencies, to keep the temperature within a controlled range.
* Active cooling uses external devices, such as a fan, to help remove heat.

The cooling device has a range of cooling states, which correspond to different levels of
heat dissipation. For example, the cooling states of a fan correspond to the different fan
speeds it supports. A cooling state is represented as an unsigned integer, where larger
numbers indicate greater heat dissipation.

Trip
----

A trip is a specific thermal point at which the framework should take action. There are
four types of trip points:

* Active: The trip point at which active cooling is enabled.
* Passive: The trip point at which passive cooling is enabled.
* Hot: The trip point at which a notification is sent to the underlying thermal driver.
* Critical: The trip point at which a notification is sent and a system shutdown is
  triggered.

Each trip has an associated temperature threshold: the framework takes action when the
temperature reaches that trip point. Trip point settings are specified in the device tree
and are normally read-only at runtime.

Governor
--------

For non-critical trips (trips that are neither hot nor critical), the thermal zone uses a
governor to decide how cooling states change. |IOT-YOCTO| currently supports two
governors: *step_wise* and *power_allocator*. Only the *step_wise* governor is covered in
this document.

`sysfs` Attributes
==================

The components described in the previous section are exported as `sysfs` attributes. Here
are the attributes of a thermal zone on |i350-EVK| (not all attributes are listed):

.. prompt:: bash # auto

   # ls -l /sys/class/thermal/thermal_zone0/
   total 0
   -r--r--r-- 1 root root 4096 Sep 20 10:44 available_policies
   lrwxrwxrwx 1 root root    0 Sep 20 10:44 cdev0 -> ../cooling_device0
   -r--r--r-- 1 root root 4096 Sep 20 10:44 cdev0_trip_point
   -rw-r--r-- 1 root root 4096 Sep 20 10:44 cdev0_weight
   --w------- 1 root root 4096 Sep 20 10:44 emul_temp
   -rw-r--r-- 1 root root 4096 Sep 20 10:44 mode
   -rw-r--r-- 1 root root 4096 Sep 20 10:44 policy
   -r--r--r-- 1 root root 4096 Sep 20 10:44 temp
   -rw-r--r-- 1 root root 4096 Sep 20 10:44 trip_point_0_hyst
   -r--r--r-- 1 root root 4096 Sep 20 10:44 trip_point_0_temp
   -r--r--r-- 1 root root 4096 Sep 20 10:44 trip_point_0_type
   -rw-r--r-- 1 root root 4096 Sep 20 10:44 trip_point_1_hyst
   -r--r--r-- 1 root root 4096 Sep 20 10:44 trip_point_1_temp
   -r--r--r-- 1 root root 4096 Sep 20 10:44 trip_point_1_type
   -rw-r--r-- 1 root root 4096 Sep 20 10:44 trip_point_2_hyst
   -r--r--r-- 1 root root 4096 Sep 20 10:44 trip_point_2_temp
   -r--r--r-- 1 root root 4096 Sep 20 10:44 trip_point_2_type
   -r--r--r-- 1 root root 4096 Sep 20 10:44 type

.. note::

   The number of exported attributes may vary across platforms.

The type of the thermal zone can be read with:

.. prompt:: bash # auto

   # cat /sys/class/thermal/thermal_zone0/type
   cpu_thermal

The governor can be changed at runtime:

.. prompt:: bash # auto

   # cat /sys/class/thermal/thermal_zone0/available_policies
   power_allocator step_wise
   # echo step_wise > /sys/class/thermal/thermal_zone0/policy

The temperature threshold of each trip point can be read with:

.. prompt:: bash # auto

   # cat /sys/class/thermal/thermal_zone0/trip_point_0_temp
   # cat /sys/class/thermal/thermal_zone0/trip_point_1_temp
   # cat /sys/class/thermal/thermal_zone0/trip_point_2_temp

For engineering purposes, you can disable thermal throttling by raising all trip points to
a higher temperature. This requires the trip temperatures to be writable at runtime; in the
listing above, the ``trip_point_*_temp`` attributes are read-only. On mainline Linux
kernels, writable trip points are enabled with ``CONFIG_THERMAL_WRITABLE_TRIPS``.

.. prompt:: bash # auto

   # echo 115000 > /sys/class/thermal/thermal_zone0/trip_point_0_temp
   # echo 115000 > /sys/class/thermal/thermal_zone0/trip_point_1_temp
   # echo 115000 > /sys/class/thermal/thermal_zone0/trip_point_2_temp

.. warning::

   Disabling thermal throttling may cause the system to reboot due to CPU overheating.

The temperature of the thermal zone can be read with:

.. prompt:: bash # auto

   # cat /sys/class/thermal/thermal_zone0/temp
   20923

Note that the unit of temperature is millidegrees Celsius, so the value above is about
20.9 degrees Celsius.

Here are the attributes of a cooling device:

.. prompt:: bash # auto

   # ls -l /sys/class/thermal/cooling_device0/
   total 0
   -rw-r--r-- 1 root root 4096 Sep 20 10:56 cur_state
   -r--r--r-- 1 root root 4096 Sep 20 10:56 max_state
   drwxr-xr-x 2 root root    0 Sep 20 10:56 power
   -r--r--r-- 1 root root 4096 Sep 20 10:56 type

The attribute *max_state* is the highest cooling state the device supports (valid states
range from 0 to *max_state*), and *cur_state* indicates the current state of the device.
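
To see the thermal zone attributes described above at a glance, the type, current
temperature, and trip points of a zone can be printed together. The following is a minimal
sketch, not a tool shipped with |IOT-YOCTO|: the function names ``mc_to_c`` and
``show_thermal_zone`` are hypothetical, and the sketch assumes the attribute layout shown
above (``type``, ``temp``, ``trip_point_<N>_type``, ``trip_point_<N>_temp``) and that
``awk`` is available on the target.

.. code:: sh

   #!/bin/sh
   # Hypothetical helpers: summarize one thermal zone from its sysfs attributes.

   # Convert millidegrees Celsius to degrees Celsius with one decimal place.
   mc_to_c() {
       awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
   }

   # Usage: show_thermal_zone /sys/class/thermal/thermal_zone0
   show_thermal_zone() {
       zone=$1
       echo "$(basename "$zone"): $(cat "$zone/type"), $(mc_to_c "$(cat "$zone/temp")") C"
       for trip_type in "$zone"/trip_point_*_type; do
           # Skip when the glob matched nothing (zone without trip points).
           [ -e "$trip_type" ] || continue
           # trip_point_<N>_type -> trip_point_<N>_temp
           trip_temp=${trip_type%_type}_temp
           echo "  $(basename "${trip_type%_type}"): $(cat "$trip_type") at $(mc_to_c "$(cat "$trip_temp")") C"
       done
   }

For example, ``show_thermal_zone /sys/class/thermal/thermal_zone0`` prints one line for the
zone followed by one line per trip point, with all temperatures converted from millidegrees
to degrees Celsius.
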

Verification of Thermal Management
==================================

This section describes the commands needed to verify thermal management on the board,
using |i350-EVK| as an example. The following subsections assume that the *step_wise*
governor is used.

.. note::

   The temperature values used in the following steps may vary across platforms. Please
   consult the related documents for appropriate values.

Step 1: Set System to Performance Mode
--------------------------------------

Before verification, keep the CPU running at its highest frequency:

.. prompt:: bash # auto

   # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
   1308000
   # echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
   # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
   2001000

Step 2: Use Temperature Emulation
---------------------------------

The framework provides temperature emulation, which allows a user to verify thermal
management functionality without actually heating the device. For example, the following
command makes the thermal zone report an emulated temperature of 95 degrees Celsius:

.. prompt:: bash # auto

   # echo 95000 > /sys/class/thermal/thermal_zone0/emul_temp

According to the configuration of |i350-EVK|, passive cooling (lowering the CPU frequency
in this case) is enabled when the temperature reaches 105 degrees Celsius. This can be
verified with the following commands:

.. prompt:: bash # auto

   # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
   2001000
   # echo 105000 > /sys/class/thermal/thermal_zone0/emul_temp
   # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
   1917000

Each time the emulated temperature is raised a little further, the CPU frequency is
lowered to the next level. Conversely, once the temperature drops below the threshold
(105 degrees Celsius), the frequency increases step by step until it reaches the maximum.

To disable temperature emulation, run:

.. prompt:: bash # auto

   # echo 0 > /sys/class/thermal/thermal_zone0/emul_temp

Verification Script
-------------------

The verification steps described above can be automated with a script:

.. code:: sh

   #!/bin/sh
   # Sweep the emulated temperature of thermal_zone0 up and back down,
   # printing the CPU0 frequency at each step.

   TZ_DIR=/sys/class/thermal/thermal_zone0
   CPUFREQ_DIR=/sys/devices/system/cpu/cpu0/cpufreq

   read_freq() {
       cat "$CPUFREQ_DIR/scaling_cur_freq"
   }

   # The argument is in degrees Celsius; emul_temp expects millidegrees.
   set_emul_temp() {
       echo $(($1 * 1000)) > "$TZ_DIR/emul_temp"
   }

   # Save the current governor so that it can be restored at the end.
   ORIG_GOVERNOR=$(cat "$CPUFREQ_DIR/scaling_governor")

   echo "Setting performance mode"
   echo performance > "$CPUFREQ_DIR/scaling_governor"
   echo "Current freq: $(read_freq)"

   TEMP_MIN=90
   TEMP_MAX=116

   # Raise the temperature from 100 to TEMP_MAX degrees Celsius.
   TEMP=100
   while [ "$TEMP" -le "$TEMP_MAX" ]; do
       echo "Temperature: $TEMP"
       set_emul_temp "$TEMP"
       echo "Freq: $(read_freq)"
       sleep 2
       TEMP=$((TEMP + 1))
   done

   # Lower it again from TEMP_MAX - 1 down to TEMP_MIN degrees Celsius.
   TEMP=$((TEMP_MAX - 1))
   while [ "$TEMP" -ge "$TEMP_MIN" ]; do
       echo "Temperature: $TEMP"
       set_emul_temp "$TEMP"
       echo "Freq: $(read_freq)"
       sleep 2
       TEMP=$((TEMP - 1))
   done

   echo "Done"
   # Restore the original governor and disable temperature emulation.
   echo "$ORIG_GOVERNOR" > "$CPUFREQ_DIR/scaling_governor"
   set_emul_temp 0
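
The script above prints the frequencies but leaves it to the reader to judge the result.
A pass/fail check can be sketched as follows. This is a hypothetical helper, not part of
|IOT-YOCTO|: the variables ``THERMAL_ZONE``, ``CPUFREQ_DIR`` and ``EMUL_DELAY`` and the
functions ``freq_at_temp``, ``is_throttled`` and ``check_passive_cooling`` are illustrative
names. It assumes that the performance governor from Step 1 is active and that the
frequency change becomes visible within ``EMUL_DELAY`` seconds.

.. code:: sh

   #!/bin/sh
   # Hypothetical pass/fail check for passive cooling via temperature emulation.
   # Override these variables to check another thermal zone or CPU.
   THERMAL_ZONE=${THERMAL_ZONE:-/sys/class/thermal/thermal_zone0}
   CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}
   EMUL_DELAY=${EMUL_DELAY:-2}   # seconds to wait after changing emul_temp

   # Emulate a temperature (degrees Celsius), wait, then print the CPU frequency (kHz).
   freq_at_temp() {
       echo $(($1 * 1000)) > "$THERMAL_ZONE/emul_temp"
       sleep "$EMUL_DELAY"
       cat "$CPUFREQ_DIR/scaling_cur_freq"
   }

   # Succeed if the second frequency is strictly lower than the first.
   is_throttled() {
       [ "$2" -lt "$1" ]
   }

   # Usage: check_passive_cooling BELOW_TRIP_C AT_OR_ABOVE_TRIP_C
   check_passive_cooling() {
       cool_freq=$(freq_at_temp "$1")
       hot_freq=$(freq_at_temp "$2")
       # Turn temperature emulation off before reporting, on success or failure.
       echo 0 > "$THERMAL_ZONE/emul_temp"
       if is_throttled "$cool_freq" "$hot_freq"; then
           echo "PASS: $cool_freq kHz at $1 C, $hot_freq kHz at $2 C"
       else
           echo "FAIL: $cool_freq kHz at $1 C, $hot_freq kHz at $2 C"
           return 1
       fi
   }

For example, on |i350-EVK|, where passive cooling starts at 105 degrees Celsius,
``check_passive_cooling 95 105`` reports ``PASS`` and returns 0 if the frequency at 105
degrees is lower than at 95 degrees, as in Step 2; otherwise it reports ``FAIL`` and
returns 1.
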