[BPI-R64] Only 10% CPU speed already at 48 degrees Celsius, speed not increasing anymore

https://linuxize.com/post/bash-increment-decrement-variable/

cat loopt.sh
#!/bin/bash
# Pure CPU-bound busy loop: count from 0 to 1000000 in the shell itself.
i=0
until [ "$i" -gt 999999 ]
do
#  echo i: $i
  ((i=i+1))
done

Same result:

root@bpi-r64:~# time ./loopt.sh

real    0m32.162s
user    0m31.369s
sys     0m0.332s
root@bpi-r64:~# time ./loopt.sh

real    0m34.129s
user    0m33.398s
sys     0m0.256s
root@bpi-r64:~# time ./loopt.sh

real    0m55.154s
user    0m54.392s
sys     0m0.047s
root@bpi-r64:~# time ./loopt.sh

real    4m35.645s
user    4m6.002s
sys     0m9.492s

And the times only keep increasing, up to four and a half minutes, but never come back down to half a minute…

root@bpi-r64:~# cat /sys/class/thermal/thermal_zone0/temp
58900

It stays around 60 degrees and does not go any higher.

Try this one with a fully cooled-down BPI-R64:

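#!/bin/bash
# Same busy loop, but print the temperature (millicelsius) and the current
# CPU frequency (kHz) every 3000 iterations. cpuinfo_cur_freq needs root.
i=0
until [ "$i" -gt 999999 ]
do
#  echo i: $i
  ((i=i+1))
  if (( i % 3000 == 0 )); then
    printf 'TEMP: %s   FREQ: %s\n' \
      "$(cat /sys/class/thermal/thermal_zone0/temp)" \
      "$(cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq)"
  fi
done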

It already throttles down at 48 degrees, but then it never cools down below 48 again, so it never speeds up again…


Based on this:

https://elixir.bootlin.com/linux/v5.13-rc2/source/arch/arm64/boot/dts/mediatek/mt7622.dtsi#L146

I guess the passive trip is at 47°C, but otherwise it looks similar to the mt7623 (R2).

My script run starts at 47°C and then it throttles down from 1350000 to 119999, and I see it get slower. But I see some up-scaling too:

TEMP: 47300   FREQ: 437500
TEMP: 47300   FREQ: 437500
TEMP: 47100   FREQ: 119999
TEMP: 47200   FREQ: 437500
TEMP: 47100   FREQ: 437500
TEMP: 46900   FREQ: 119999
TEMP: 47400   FREQ: 119999
TEMP: 47200   FREQ: 437500
TEMP: 47800   FREQ: 437500
TEMP: 46500   FREQ: 119999
TEMP: 46800   FREQ: 119999
TEMP: 47200   FREQ: 437500
TEMP: 47700   FREQ: 119999
TEMP: 47400   FREQ: 119999

After this point it stays at 119999.

Indeed, I see the same. But even when idle, the CPU does not cool down anymore, so it does not run like it did at the start until you leave it unplugged for 15 minutes:

TEMP: 46900   FREQ: 1350000
TEMP: 47100   FREQ: 1025000
TEMP: 47100   FREQ: 1025000
TEMP: 47200   FREQ: 1262500

I think throttling down to 10 percent at 48 degrees is too early.
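To put a number on the "10 percent", here is a minimal bash sketch that expresses the current CPU frequency as a percentage of the maximum. It assumes the standard cpufreq sysfs files `cpuinfo_cur_freq` and `cpuinfo_max_freq` of `policy0`; `freq_pct` and `show_freq_pct` are hypothetical helper names.

```shell
#!/bin/bash
# freq_pct CUR MAX: print CUR as a whole-number percentage of MAX (both in kHz).
# Integer arithmetic, so the result is rounded down.
freq_pct() {
  local cur=$1 max=$2
  echo $(( cur * 100 / max ))
}

# show_freq_pct [POLICY_DIR]: read the current and maximum frequency of a
# cpufreq policy (default: policy0) and print both plus the percentage.
# Reading cpuinfo_cur_freq usually requires root.
show_freq_pct() {
  local dir=${1:-/sys/devices/system/cpu/cpufreq/policy0}
  local cur max
  cur=$(cat "$dir/cpuinfo_cur_freq") || return 1
  max=$(cat "$dir/cpuinfo_max_freq") || return 1
  echo "FREQ: $cur / $max kHz = $(freq_pct "$cur" "$max")%"
}
```

With the numbers from the run above, `freq_pct 119999 1350000` prints 8, so the throttled CPU runs at just under 9 percent of 1.35 GHz. On the board itself, `show_freq_pct` is meant to be run as root.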

The R2 (mt7623) has the same threshold (and currently runs at 52°C):

https://elixir.bootlin.com/linux/v5.13-rc2/source/arch/arm/boot/dts/mt7623.dtsi#L163

$ echo "TEMP:" $(cat /sys/class/thermal/thermal_zone0/temp) "Freq:" $(sudo cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq)
TEMP: 52596 Freq: 1040000

It seems that we can read only one temperature and one frequency, right (not one per core)?

Looks like there is only one frequency and one temperature.
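One way to double-check this is to enumerate every thermal zone and every cpufreq policy the kernel exposes. Below is a minimal bash sketch using the standard sysfs attributes `type`, `temp` and `related_cpus`. The function names are hypothetical, and the optional root argument (default `/sys`) only exists so the functions can be tried against a copy of the tree.

```shell
#!/bin/bash
# list_thermal_zones [SYSFS_ROOT]: print type and temperature of every thermal zone.
list_thermal_zones() {
  local root=${1:-/sys} z
  for z in "$root"/class/thermal/thermal_zone*; do
    [ -d "$z" ] || continue   # glob matched nothing
    echo "$(basename "$z"): type=$(cat "$z/type") temp=$(cat "$z/temp")"
  done
}

# list_cpufreq_policies [SYSFS_ROOT]: print which CPUs share each cpufreq policy.
list_cpufreq_policies() {
  local root=${1:-/sys} p
  for p in "$root"/devices/system/cpu/cpufreq/policy*; do
    [ -d "$p" ] || continue   # glob matched nothing
    echo "$(basename "$p"): cpus=$(cat "$p/related_cpus")"
  done
}
```

If `list_cpufreq_policies` prints a single policy covering both cores and `list_thermal_zones` prints a single zone, then there really is only one frequency and one temperature to read.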

When the R64 is doing a CPU-intensive job, we can easily let the temperature rise to 67, with a maximum of 87 degrees. At least, this is according to the dtsi. Maybe the manual has more information on this… It actually does, but let’s stay away from the 125 degrees :slight_smile:

Ok, I found something.

The cpu-passive map is set up exactly the same as the cpu-hot map. So we already take the drastic thermal actions meant for a hot CPU at the passive trip point.

Apply this patch to have the BPI-R64 regulate its temperature at 67 degrees.

diff -Naur a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi	2021-05-12 11:29:16.000000000 +0200
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi	2021-05-15 15:15:07.214758959 +0200
@@ -170,8 +170,8 @@
 			cooling-maps {
 				map0 {
 					trip = <&cpu_passive>;
-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					cooling-device = <&cpu0 0 0>,
+							 <&cpu1 0 0>;
 				};
 
 				map1 {

With this patch you can check if it regulates nicely. Here it does :slight_smile:
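To check what the running kernel actually uses, here is a minimal bash sketch that prints the trip points of a thermal zone from the standard sysfs attributes `trip_point_N_type`, `trip_point_N_temp` and, where present, `trip_point_N_hyst` (temperatures in millicelsius). `show_trips` is a hypothetical helper name; by default it looks at `thermal_zone0`.

```shell
#!/bin/bash
# show_trips [ZONE_DIR]: print every trip point of a thermal zone.
show_trips() {
  local zone=${1:-/sys/class/thermal/thermal_zone0} t n hyst
  for t in "$zone"/trip_point_*_temp; do
    [ -f "$t" ] || continue    # glob matched nothing
    n=${t##*/trip_point_}      # e.g. "0_temp"
    n=${n%_temp}               # e.g. "0"
    hyst="-"                   # hysteresis file is optional
    if [ -f "$zone/trip_point_${n}_hyst" ]; then
      hyst=$(cat "$zone/trip_point_${n}_hyst")
    fi
    echo "trip $n: type=$(cat "$zone/trip_point_${n}_type") temp=$(cat "$t") hyst=$hyst"
  done
}
```

Running it alongside the TEMP/FREQ loop shows which trip point the temperature is hovering around while the frequency changes.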

But it should actually be OK to have no frequency limits at all until 87 degrees. Maybe a fan cooling device could be activated by map1, but we leave the CPU alone. So we could apply this patch instead:

diff -Naur a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi	2021-05-12 11:29:16.000000000 +0200
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi	2021-05-15 15:15:07.214758959 +0200
@@ -170,14 +170,14 @@
 			cooling-maps {
 				map0 {
 					trip = <&cpu_passive>;
-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					cooling-device = <&cpu0 0 0>,
+							 <&cpu1 0 0>;
 				};
 
 				map1 {
 					trip = <&cpu_active>;
-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					cooling-device = <&cpu0 0 0>,
+							 <&cpu1 0 0>;
 				};
 
 				map2 {

IMHO you completely disable the 2 thermal zones here…

Maybe there is another way, e.g. wait x for idle and then step down or similar, or just increase the temperature to a more real-world value.

It seems there is only THERMAL_NO_LIMIT: https://elixir.bootlin.com/linux/latest/source/include/dt-bindings/thermal/thermal.h#L13

The values are 0-7, because the CPU has 8 operating states defined.

THERMAL_NO_LIMIT in the first position equals 0; THERMAL_NO_LIMIT in the second position equals 7 in this case.

Setting THERMAL_NO_LIMIT THERMAL_NO_LIMIT, thus 0 7, then implies that all operating states may be used in order to stay at the trip temperature defined in the corresponding map.
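This resolution can be sketched in bash. The sketch is modelled on the range check the kernel applies when it binds a cooling device to a trip (as far as I can tell, in `thermal_zone_bind_cooling_device()`), and it assumes THERMAL_NO_LIMIT arrives as the 32-bit value 0xffffffff. `resolve_cooling_range` and `NO_LIMIT` are hypothetical names, not kernel API.

```shell
#!/bin/bash
# Assumption: THERMAL_NO_LIMIT as stored in a 32-bit device-tree cell.
NO_LIMIT=$(( 0xffffffff ))

# resolve_cooling_range LOWER UPPER MAX_STATE
# Turn a cooling-map entry <&cpuN LOWER UPPER> into a concrete state range:
# NO_LIMIT as lower means state 0, NO_LIMIT as upper means MAX_STATE.
resolve_cooling_range() {
  local lower=$1 upper=$2 max_state=$3
  if (( lower == NO_LIMIT )); then lower=0; fi
  if (( upper == NO_LIMIT )); then upper=$max_state; fi
  # Reject ranges that are inverted or exceed the device's states.
  if (( lower > upper || upper > max_state )); then
    echo "invalid range" >&2
    return 1
  fi
  echo "$lower $upper"
}
```

With 8 operating states (max_state 7), `resolve_cooling_range "$NO_LIMIT" "$NO_LIMIT" 7` prints `0 7`, `resolve_cooling_range "$NO_LIMIT" 1 7` prints `0 1`, and `0 0` stays `0 0`, i.e. that map can never lower the frequency.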

The lightest throttling values to use would be 0 1, which equals THERMAL_NO_LIMIT 1.

But then the CPU would already stop running at its highest frequency of 1.35 GHz at 48 degrees Celsius. This does not make sense.

In fact, right now we effectively have just one thermal zone, because the active and hot zones are never reached when the passive zone is set up like this.

Changing the temperature of the passive trip would ultimately do the same as the patches, but then the name would be wrong, because a CPU at 65 degrees or so is not passive anymore…

Here is a good example of multiple maps. It is used for fan control; only one map (the one with the highest temperature) is used for throttling the CPU frequency.

	trips {
		cpu_alert0: cpu-alert0 {
			temperature = <90000>; /* millicelsius */
			hysteresis = <2000>; /* millicelsius */
			type = "active";
		};
		cpu_alert1: cpu-alert1 {
			temperature = <100000>; /* millicelsius */
			hysteresis = <2000>; /* millicelsius */
			type = "passive";
		};
		cpu_crit: cpu-crit {
			temperature = <125000>; /* millicelsius */
			hysteresis = <2000>; /* millicelsius */
			type = "critical";
		};
	};

	cooling-maps {
		map0 {
			trip = <&cpu_alert0>;
			cooling-device = <&fan0 THERMAL_NO_LIMIT 4>;
		};
		map1 {
			trip = <&cpu_alert1>;
			cooling-device = <&fan0 5 THERMAL_NO_LIMIT>, <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};

From: https://www.kernel.org/doc/Documentation/devicetree/bindings/thermal/thermal.txt

Just add a middle map and, since we do not have a fan, remove the fan. Then it is the setup we have… Only limit the CPU frequency in the highest thermal map.

Why this? The header file defines it as ~0, so all bits are set to one… I don’t know whether it is a byte (ff), a uint (255) or an int (-1), but it is not 0, regardless of its position.

Somewhere in the code…
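A device-tree cell is 32 bits wide, so ~0 ends up in the blob as 0xffffffff: neither a single byte nor 0. Here is a small bash sketch of that arithmetic. Bash itself computes with 64-bit signed integers, so the value is masked to 32 bits explicitly. `no_limit_cell` and `as_signed32` are hypothetical helper names.

```shell
#!/bin/bash
# no_limit_cell: print ~0 the way it ends up in a 32-bit device-tree cell.
no_limit_cell() {
  # In bash ~0 is -1 (64-bit signed); keep only the low 32 bits.
  printf '0x%x\n' $(( ~0 & 0xffffffff ))
}

# as_signed32 VALUE: reinterpret a 32-bit cell value as a signed 32-bit int.
as_signed32() {
  local v=$1
  echo $(( v >= 0x80000000 ? v - 0x100000000 : v ))
}
```

`no_limit_cell` prints `0xffffffff` (4294967295 unsigned), and `as_signed32 0xffffffff` prints `-1`, so the same bits read as "int (-1)" when interpreted as signed. Whether it then means state 0 or the highest state still depends on its position in the cooling-device entry.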

Take a look at the example above. For the fan it works like this, if the fan has 10 states, for example…

map0: cooling-device = <&fan0 0 4>;

So in map0 the states 0 to 4 are used (less cooling)

And later

map1: cooling-device = <&fan0 5 9>;

In map1 the states 5 to 9 are used (more cooling)

In our case the CPU itself is defined as the cooling device… so there is a step-down, but no specific step (range)… I guess this is handled by the driver anyhow.
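What a step-down means in frequency can be sketched in bash. The sketch assumes the cpufreq cooling device numbers its states from the highest frequency (state 0) downwards, one state per available frequency, which fits the 8 operating states mentioned above. `state_to_freq` is a hypothetical helper name.

```shell
#!/bin/bash
# state_to_freq STATE FREQ...: map a CPU cooling state to a frequency.
# Assumption: state 0 is the highest frequency and each higher state steps
# one entry down the table, so the frequencies are sorted descending first.
state_to_freq() {
  local state=$1; shift
  (( $# > 0 )) || { echo "no frequencies given" >&2; return 1; }
  local -a sorted
  mapfile -t sorted < <(printf '%s\n' "$@" | sort -rn)
  if (( state < 0 || state >= ${#sorted[@]} )); then
    echo "state out of range" >&2
    return 1
  fi
  echo "${sorted[state]}"
}
```

On the board it could be fed the list from `/sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies`, e.g. `state_to_freq 1 $(cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies)`, and compared with `cur_state` and `max_state` of the matching `/sys/class/thermal/cooling_device*` entry.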

Building the kernel now on the BPI-R64 with 2 cores… A pretty good stress test…

10 minutes into the build now, and there is no need to throttle the frequency at all yet:

root@bpi-r64:~# echo "TEMP:" $(cat /sys/class/thermal/thermal_zone0/temp) "   FREQ:" $(cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq)
TEMP: 79500    FREQ: 1350000

After about 1¼ hours, it finished building the complete BPI-R64 kernel. It ran at 100% CPU frequency the whole time, and the temperature slowly crept up to 85 degrees Celsius. No thermal measures were taken, as the threshold is set at 87 degrees Celsius. Last measurements before completion:

TEMP: 84600   FREQ: 1350000
TEMP: 85100   FREQ: 1350000
TEMP: 85100   FREQ: 1350000
TEMP: 84800   FREQ: 1350000
TEMP: 85100   FREQ: 1350000

I do have to say this CPU runs much cooler than the CPU in the Linksys WRT routers. Even with heatsinks, those quickly went above 100 degrees.

So the mt7622 runs fine without heatsinks, if the thermal zone is configured correctly.

Would you post it to mainline as an RFC, since you can explain the behaviour better?

Maybe as an RFC first, and finally with Cc: stable

Where did you find the max core temp?

Sure… Where should I post it?

Create the patch with git format-patch, run it through scripts/checkpatch.pl, get the recipients (only lists + maintainers + reviewers) with scripts/get_maintainer.pl, and send it via git send-email.
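The recipient step can be scripted. Here is a minimal bash sketch that filters the output of `scripts/get_maintainer.pl` down to maintainers, reviewers and mailing lists. It assumes the default output format of one `Name <address> (role:section)` entry per line; `filter_recipients` is a hypothetical name, and the role keywords are assumptions worth checking against the actual output.

```shell
#!/bin/bash
# filter_recipients: read get_maintainer.pl output on stdin, keep only
# maintainers, reviewers and mailing lists, and strip the "(role)" suffix.
filter_recipients() {
  grep -E '\((maintainer|reviewer|open list|moderated list)' | sed 's/ (.*)$//'
}
```

`./scripts/get_maintainer.pl 0001-*.patch | filter_recipients` then gives the addresses to pass to `git send-email` via `--to` and `--cc`.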

I have written it down here (but the comments are only in German): https://www.fw-web.de/dokuwiki/doku.php?id=programming:git:start#git_format-patch

Sent. Let’s see if they apply the patch to the kernel.


At least there is now a discussion open about “what is the right way”.

Have you tried Daniel’s suggestion (passive to 70/75, active to 80/85)?