[BPI-R64] Only 10% CPU speed at just 48 degrees Celsius, speed does not increase anymore

When BPI R64 is running on image 2020-12-20-ubuntu-18.04.3-bpi-r64-5.4-sd-emmc

[email protected] ~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)

real    3m55.269s
user    3m45.925s
sys     0m0.038s

This is a MediaTek MT7622,1.35GHZ 64-bit dual-core ARM Cortex-A53

I am migrating from Linksys WRT1900ACS, running kernel 5.10 and ubuntu focal. There it runs like this:

[email protected]:~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)

real	0m15.485s
user	0m15.470s
sys	0m0.000s

This is on a 88F6820-A0 C160 @1.6GHz Armada 385 32-bit dual-core.

I would expect a difference, mainly since the R64 is running without heatsinks, but the BPI-R64 performs so much slower!

What is going wrong here?

5.13.0-rc2-bpi-r64

[email protected]:~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)        
                                                                                
real    0m18.389s                                                               
user    0m18.286s                                                               
sys     0m0.005s                                                                
[email protected]:~#
[email protected]:~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)

real    0m20.904s
user    0m20.509s
sys     0m0.115s
[email protected]:~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)

real    1m35.995s
user    1m29.354s
sys     0m1.947s
[email protected]:~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)

real    3m55.452s
user    3m31.812s
sys     0m7.552s

Indeed it can run fast… But if you keep doing it, it slows down. It seems to be a thermal issue…

indeed…but sometimes the displayed time seems to be wrong:

[email protected]:~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)        
                                                                                
real    65671m59.329s     #first run after bootup, but runs max 5min
user    0m18.466s                                                               
sys     0m0.001s                                                                
[email protected]:~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)        
                                                                                
real    0m18.389s                                                               
user    0m18.286s                                                               
sys     0m0.005s                                                                
[email protected]:~# time $(i=0; while (( i < 999999 )); do (( i ++ )); done)        
                                                                                
real    3m36.067s                                                               
user    3m29.296s                                                               
sys     0m0.018s                                                                
[email protected]:~#

It does really take 4 minutes…

Once the CPU has throttled down because of temperature, it does not seem to be able to throttle back up. Not even after minutes of idling.

had run multiple tests after that…all 3-4min

after running for some time i measure 53°C on the SoC and 50°C around it, the switch is at 36°C. now powering off, waiting a bit and then measuring the temp more often

after poweroff with the power cord disconnected (so a real power cycle) for ~5min i booted again and the first 2 cycles take >3min, temp goes to max 45°C, so it does not look like a power throttle to me

are you sure the syntax is right? i tried to print $i on each iteration and it does not print anything

time $(i=0; while (( i < 999999 )); do echo $i;(( i ++ )); done) 

wonder why you do not use $i inside the while, but i guess this is because the output is captured by $() for time, so maybe it waits for completion before printing anything. without the outer capture i see the echos…while running i see no slowdown…of course it is slower than without printing, but i want to see if there is any hang at some point
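
To time the loop without the `$( )` capture, a brace group can be timed directly, because `time` is a bash keyword. This is only a minimal sketch: the function name `count_to` and the smaller default count are made up for illustration (pass 999999 as the first argument to match the runs above).

```shell
#!/bin/bash
# count_to N: increment a counter N times and print the final value.
# The name is hypothetical; it only illustrates the pattern.
count_to() {
  local limit=$1 i=0
  while (( i < limit )); do
    (( i += 1 ))
  done
  echo "$i"
}

# "time" is a bash keyword, so it can time a brace group directly.
# Output from inside the group is not captured, it goes to the terminal.
time { count_to "${1:-100000}"; }
```

With this form an `echo $i` inside the loop would show up while the loop is still running, which makes a stall visible.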

st=$( date '+%s' );i=0; while [ $i -lt 9999 ]; do echo $i;i=$(( $i + 1 )); done;et=$( date '+%s' );tt=$(($et-$st));echo $tt" seconds"

https://linuxize.com/post/bash-increment-decrement-variable/

cat loopt.sh
#!/bin/bash
i=0
until [ $i -gt 999999 ]
do
#  echo i: $i
  ((i=i+1))
done

Same result:

[email protected]:~# time ./loopt.sh                                                 
                                                                                
real    0m32.162s                                                               
user    0m31.369s                                                               
sys     0m0.332s                                                                
[email protected]:~# time ./loopt.sh                                                 
                                                                                
real    0m34.129s                                                               
user    0m33.398s                                                               
sys     0m0.256s                                                                
[email protected]:~# time ./loopt.sh                                                 
                                                                                
real    0m55.154s                                                               
user    0m54.392s                                                               
sys     0m0.047s                                                                
[email protected]:~# time ./loopt.sh                                                 
                                                                                
real    4m35.645s                                                               
user    4m6.002s                                                                
sys     0m9.492s

The times only increase, up to four and a half minutes, and never come back down to half a minute…

[email protected]:~# cat /sys/class/thermal/thermal_zone0/temp                       
58900                                                     

Stays around 60, does not go any higher anymore

Try this one with a really cooled down BPI-R64

#!/bin/bash
i=0
until [ $i -gt 999999 ]
do
#  echo i: $i
  ((i=i+1))
  if [ $(( i % 3000 )) -eq 0 ]; then
    echo -n "TEMP:" $(cat /sys/class/thermal/thermal_zone0/temp)
    echo "   FREQ:" $(cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq)
  fi
done

Already at 48 degrees it throttles down, but then never cools down below 48 anymore, so never speeds up anymore…

based on this:

https://elixir.bootlin.com/linux/v5.13-rc2/source/arch/arm64/boot/dts/mediatek/mt7622.dtsi#L146

i guess till 47°C CPU is passive, but else it looks similar to mt7623(r2)
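
The trip points the running kernel actually uses can be read back from sysfs, which avoids guessing from the dtsi. A minimal sketch, assuming the standard thermal sysfs attributes `trip_point_N_type`, `trip_point_N_temp` and `trip_point_N_hyst` are exposed for the zone; the function name `show_trips` is made up:

```shell
#!/bin/bash
# show_trips [ZONE_DIR]: print type, temperature and hysteresis of each trip point.
# Values are in millidegrees Celsius, like thermal_zone0/temp.
show_trips() {
  local zone=${1:-/sys/class/thermal/thermal_zone0} f n hyst
  for f in "$zone"/trip_point_*_type; do
    [ -e "$f" ] || continue            # glob did not match: no trip points here
    n=${f##*/trip_point_}
    n=${n%_type}
    hyst=$(cat "$zone/trip_point_${n}_hyst" 2>/dev/null || echo "n/a")
    echo "trip $n: type=$(cat "$f") temp=$(cat "$zone/trip_point_${n}_temp") hyst=$hyst"
  done
  return 0
}

show_trips "$@"
```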

my script run starts at 47°C and then it throttles down from 1350000 to 119999, and then i see it getting slower. but i see upscaling too:

TEMP: 47300   FREQ: 437500
TEMP: 47300   FREQ: 437500
TEMP: 47100   FREQ: 119999
TEMP: 47200   FREQ: 437500
TEMP: 47100   FREQ: 437500
TEMP: 46900   FREQ: 119999
TEMP: 47400   FREQ: 119999
TEMP: 47200   FREQ: 437500
TEMP: 47800   FREQ: 437500
TEMP: 46500   FREQ: 119999
TEMP: 46800   FREQ: 119999
TEMP: 47200   FREQ: 437500
TEMP: 47700   FREQ: 119999
TEMP: 47400   FREQ: 119999

after this point it stays at 119999

Indeed I see the same. But even in idle the CPU does not cool down anymore, so it does not run like it did at the start, until you leave it unplugged for 15 minutes:

TEMP: 46900   FREQ: 1350000
TEMP: 47100   FREQ: 1025000
TEMP: 47100   FREQ: 1025000
TEMP: 47200   FREQ: 1262500

I think throttling down to 10 percent at 48 is too early.

r2 (mt7623) has same threshold (and currently runs at 52°C)

https://elixir.bootlin.com/linux/v5.13-rc2/source/arch/arm/boot/dts/mt7623.dtsi#L163

$ echo "TEMP:" $(cat /sys/class/thermal/thermal_zone0/temp) "Freq:" $(sudo cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq)
TEMP: 52596 Freq: 1040000

it seems that we can read only 1 temp and only one freq, right (not for each core)?

Looks like only one freq and temp.
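
To double-check how many sensors and frequency domains there are, all thermal zones and cpufreq policies can be listed in one go. A rough sketch, with a made-up function name `list_sensors`; it reads `scaling_cur_freq` instead of `cpuinfo_cur_freq` because that one is normally readable without root:

```shell
#!/bin/bash
# list_sensors [SYSFS_ROOT]: print every thermal zone and every cpufreq policy.
list_sensors() {
  local root=${1:-/sys} z p
  for z in "$root"/class/thermal/thermal_zone*; do
    [ -r "$z/temp" ] || continue
    echo "${z##*/}: type=$(cat "$z/type" 2>/dev/null) temp=$(cat "$z/temp")"
  done
  for p in "$root"/devices/system/cpu/cpufreq/policy*; do
    [ -r "$p/scaling_cur_freq" ] || continue
    # affected_cpus lists the cores that share this policy's clock
    echo "${p##*/}: cpus=$(cat "$p/affected_cpus" 2>/dev/null) freq=$(cat "$p/scaling_cur_freq")"
  done
  return 0
}

list_sensors "$@"
```

If only `policy0` shows up and its `cpus` field lists both cores, the two cores share one clock and there is nothing per-core to read.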

When the R64 is doing some CPU-intensive job, we can easily let the temperature rise to 67, with a maximum of 87 degrees. At least this is according to the dtsi. Maybe the manual has more information on this… It actually does, but let's stay away from the 125 degrees :)

Ok, I found something.

Cpu-passive setting is exactly the same as cpu-hot setting. So we are taking drastic hot cpu thermal actions already with a passive cpu.

Apply this patch to have the BPI-R64 regulate its temperature at 67 degrees.

diff -Naur a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi	2021-05-12 11:29:16.000000000 +0200
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi	2021-05-15 15:15:07.214758959 +0200
@@ -170,8 +170,8 @@
 			cooling-maps {
 				map0 {
 					trip = <&cpu_passive>;
-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					cooling-device = <&cpu0 0 0>,
+							 <&cpu1 0 0>;
 				};
 
 				map1 {

With this patch you can check if it regulates nicely. Here it does :)

But it should actually be ok to not have any frequency limits until 87 degrees. Maybe a fan cooling-device could be activated by map1, but we leave the cpu alone. Now we could apply this patch:

diff -Naur a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi	2021-05-12 11:29:16.000000000 +0200
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi	2021-05-15 15:15:07.214758959 +0200
@@ -170,14 +170,14 @@
 			cooling-maps {
 				map0 {
 					trip = <&cpu_passive>;
-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					cooling-device = <&cpu0 0 0>,
+							 <&cpu1 0 0>;
 				};
 
 				map1 {
 					trip = <&cpu_active>;
-					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
-							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					cooling-device = <&cpu0 0 0>,
+							 <&cpu1 0 0>;
 				};
 
 				map2 {

Imho you disable the 2 thermal zones here completely…

maybe there is another way, e.g. wait x for idle and then step down or similar, or just increase the temp to a more real-world value.

Seems like there is only THERMAL_NO_LIMIT https://elixir.bootlin.com/linux/latest/source/include/dt-bindings/thermal/thermal.h#L13

Values are 0-7, because the cpu has 8 operating states defined.

THERMAL_NO_LIMIT in the first position equals zero; THERMAL_NO_LIMIT in the second position equals 7 in this case.

Setting THERMAL_NO_LIMIT THERMAL_NO_LIMIT, thus 0 7, then would imply that all operating states are to be used, in order to stay on the trip temperature defined in the corresponding map.
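
The way I understand it, the limits are resolved when the cooling device is bound to the trip: a lower limit of THERMAL_NO_LIMIT becomes 0 and an upper limit of THERMAL_NO_LIMIT becomes the device's highest state. A small sketch of that rule, not the kernel code itself; `NO_LIMIT` is just a placeholder value and `resolve_limits` is a made-up name:

```shell
#!/bin/bash
# Sketch of how a <&dev lower upper> cooling-map entry is resolved.
# NO_LIMIT stands in for THERMAL_NO_LIMIT (placeholder value, illustration only).
NO_LIMIT=-1

# resolve_limits LOWER UPPER MAX_STATE: print the effective "lower upper" range.
resolve_limits() {
  local lower=$1 upper=$2 max_state=$3
  if [ "$lower" = "$NO_LIMIT" ]; then lower=0; fi
  if [ "$upper" = "$NO_LIMIT" ]; then upper=$max_state; fi
  echo "$lower $upper"
}

resolve_limits "$NO_LIMIT" "$NO_LIMIT" 7   # cpu, 8 operating states: prints "0 7"
resolve_limits 0 0 7                       # the patched entry:       prints "0 0"
resolve_limits "$NO_LIMIT" 4 9             # fan with 10 states:      prints "0 4"
resolve_limits 5 "$NO_LIMIT" 9             # same fan, hotter map:    prints "5 9"
```

So `<&cpu0 0 0>` pins the CPU at state 0, which means the map can never throttle it.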

The lightest throttling values to use would be 0 1 which equals THERMAL_NO_LIMIT 1

But then the CPU would already be kept from running at its highest frequency of 1.35 GHz at 48 degrees Celsius. This does not make sense.

In fact, now we also just have one thermal zone, because the active and hot zones are never reached, when the passive zone is setup like this.

Changing the temperature on the passive trip, would ultimately do the same thing as the patches, but now the name would be wrong, because a cpu at 65 degrees or so, is not passive anymore…

Here is an example of a good setup with multiple maps. It is used for fan control. Only one (the one with the highest temp) will be used for throttling the CPU frequency.

	trips {
		cpu_alert0: cpu-alert0 {
			temperature = <90000>; /* millicelsius */
			hysteresis = <2000>; /* millicelsius */
			type = "active";
		};
		cpu_alert1: cpu-alert1 {
			temperature = <100000>; /* millicelsius */
			hysteresis = <2000>; /* millicelsius */
			type = "passive";
		};
		cpu_crit: cpu-crit {
			temperature = <125000>; /* millicelsius */
			hysteresis = <2000>; /* millicelsius */
			type = "critical";
		};
	};

	cooling-maps {
		map0 {
			trip = <&cpu_alert0>;
			cooling-device = <&fan0 THERMAL_NO_LIMIT 4>;
		};
		map1 {
			trip = <&cpu_alert1>;
			cooling-device = <&fan0 5 THERMAL_NO_LIMIT>, <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};

From: https://www.kernel.org/doc/Documentation/devicetree/bindings/thermal/thermal.txt

Just add a middle map and, since we do not have a fan, remove the fan. Then it is the setup we have… Only limit cpu freq at highest thermal map.

Why this? The header file defines it as ~0, so negated = all bits set to one…don't know if it is a byte (ff), uint (255) or int (-1), but it is not 0, independent of position

Somewhere in the code…

Take a look at the above example, for the fan it is, if the fan has 10 states for example…

map0: cooling-device = <&fan0 0 4>;

So in map0 the states 0 to 4 are used (less cooling)

And later

map1: cooling-device = <&fan0 5 9>;

In map1 the states 5 to 9 are used (more cooling)

In our case the cpu itself is defined as cooling device…so a stepdown,but no specific step (range)…i guess this is handled by driver anyhow
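
Which step the CPU cooling device is currently on can be watched from sysfs while the test loop runs. A minimal sketch, assuming the usual `type`, `cur_state` and `max_state` attributes under `/sys/class/thermal/cooling_device*`; the function name `show_cooling_devices` is made up:

```shell
#!/bin/bash
# show_cooling_devices [THERMAL_DIR]: print current and maximum state of each cooling device.
# cur_state 0 means no throttling; max_state is the deepest throttling step.
show_cooling_devices() {
  local base=${1:-/sys/class/thermal} d
  for d in "$base"/cooling_device*; do
    [ -r "$d/cur_state" ] || continue
    echo "${d##*/}: type=$(cat "$d/type") cur_state=$(cat "$d/cur_state") max_state=$(cat "$d/max_state")"
  done
  return 0
}

show_cooling_devices "$@"
```

If `cur_state` sits at `max_state` while the temperature hovers around the passive trip, the map is using the whole range, which would fit the slowdown seen above.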

Building the kernel now on the BPI-R64 with 2 cores… A pretty good stress test…

10 minutes building now, and there is no need to throttle the frequency at all yet:

[email protected]:~# echo "TEMP:" $(cat /sys/class/thermal/thermal_zone0/temp) "   FREQ:" $(cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq)
TEMP: 79500    FREQ: 1350000

After about 1 and 1/4 hours, the complete BPI-R64 kernel finished building. It ran at 100% CPU frequency the whole time and the temperature slowly crept up to 85 degrees Celsius. No thermal measures were taken, as the limit is set at 87 degrees Celsius. Last measurements before completion:

TEMP: 84600   FREQ: 1350000
TEMP: 85100   FREQ: 1350000
TEMP: 85100   FREQ: 1350000
TEMP: 84800   FREQ: 1350000
TEMP: 85100   FREQ: 1350000

I do have to say this CPU runs much cooler than the CPU on the Linksys WRT routers. Even with heatsinks they quickly went above 100 degrees.

So the mt7622 is running fine without heatsinks, if thermal zone is configured correctly.