R3 - thermal sensor went crazy after the kernel update

Hi all,

After updating the kernel to 6.4, the CPU thermal sensor reports -274 °C, while on the previous kernel (the stock [BPI-R3] Debian Bullseye image kernel) it was fine.

bpi-r3 ~ # cat /sys/class/thermal/thermal_zone0/temp  
-274000

bpi-r3 ~ # uname -r
6.4.0-bpi-r3-main
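
For readers unfamiliar with the sysfs format: the value is in millidegrees Celsius, so -274000 means -274 °C, which is below absolute zero (-273.15 °C) and therefore cannot be a real measurement. As far as I know, the kernel thermal core uses exactly this value as its "invalid temperature" placeholder. A minimal sketch of a sanity check follows; `check_temp` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: classify a raw thermal_zone reading given in millidegrees Celsius.
# check_temp is a hypothetical helper, not part of any existing package.
check_temp() {
    raw=$1
    # Readings at or below absolute zero (-273150 m°C) cannot be real.
    if [ "$raw" -le -273150 ]; then
        echo "invalid"
        return 1
    fi
    # Split off the sign so the fractional part prints correctly.
    sign=""
    if [ "$raw" -lt 0 ]; then
        sign="-"
        raw=$(( -raw ))
    fi
    printf '%s%d.%03d C\n' "$sign" $(( raw / 1000 )) $(( raw % 1000 ))
}
```

On the board it could be called as `check_temp "$(cat /sys/class/thermal/thermal_zone0/temp)"`.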


I used the default config with some network-related options added.

Has anyone else seen similar behavior?

Thank you!

Can you revert to the old kernel to check for a hardware issue?

And if it works, please try the kernels in between…I will try to find some time to check with 6.5-rc on my device.

bpi-r3 ~ # uname -r
6.1.0-bpi-r3-main
bpi-r3 ~ # cat /sys/class/thermal/thermal_zone0/temp  
45395
bpi-r3 ~ #

This is the original prebuilt kernel from bpi-r3_sdmmc_bullseye.img.gz.

Sure, it will take some time

6.5.0-rc7-next-20230823 shows the same behaviour on my device:

root@bpi-r3:~# cat /sys/class/thermal/thermal_zone0/temp                                                                     
-274000

back to 6.4:

bpi-r3 ~ # uname -r
6.4.0-bpi-r3-main
bpi-r3 ~ # cat /sys/class/thermal/thermal_zone0/temp  
-274000

Another issue with the 6.4 kernel: after a reboot, U-Boot is unable to access the SD card:

F0: 102B 0000
FA: 1040 0000
FA: 1040 0000 [0200]
F9: 103F 0000
F3: 1001 0000 [0200]
F3: 1001 0000
F6: 300C 0028
F5: 0000 0000
V0: 0000 0000 [0001]
00: 0000 0000
BP: 2400 0041 [0000]
G0: 1190 0000
EC: 0000 0000 [3000]
T0: 0000 01B7 [010F]
Jump to BL

NOTICE:  BL2: v2.6(release):v2.6-897-gc30a1caf8274 sdmmc
NOTICE:  BL2: Built : 09:39:42, Dec 16 2022
NOTICE:  WDT: disabled
NOTICE:  CPU: MT7986 (2000MHz)
NOTICE:  EMI: Using DDR4 settings
NOTICE:  EMI: Detected DRAM size: 2048MB
NOTICE:  EMI: complex R/W mem test passed
NOTICE:  BL2: Booting BL31
NOTICE:  BL31: v2.6(release):v2.6-897-gc30a1caf8274 sdmmc
NOTICE:  BL31: Built : 09:39:44, Dec 16 2022


U-Boot 2022.10-00039-g2d6f3964e104-dirty (Dec 16 2022 - 09:35:59 +0100)

CPU:   MediaTek MT7986
Model: mt7986-rfb
DRAM:  2 GiB
Core:  44 devices, 20 uclasses, devicetree: separate
MMC:   mmc@11230000: 0
Loading Environment from MMC... unable to read ssr
*** ERROR: Can't read GPT header ***
find_valid_gpt: *** ERROR: Invalid GPT ***
*** ERROR: Can't read GPT Entries ***
find_valid_gpt: *** ERROR: Invalid Backup GPT ***
*** ERROR: Can't read GPT header ***
.....
(the messages above repeat about 20 times, then the boot menu appears)
.....
** No partition table - mmc 0 **
Couldn't find partition mmc 0:5
## Error: "initrd" not defined
mmc - MMC sub system

Usage:
mmc info - display info of the current MMC device
mmc read addr blk# cnt
mmc write addr blk# cnt
mmc erase blk# cnt
mmc rescan [mode]
mmc part - lists available partition on current mmc device
mmc dev [dev] [part] [mode] - show or set current mmc device [partition] and set mode
  - the required speed mode is passed as the index from the following list
    [MMC_LEGACY, MMC_HS, SD_HS, MMC_HS_52, MMC_DDR_52, UHS_SDR12, UHS_SDR25,
    UHS_SDR50, UHS_DDR50, UHS_SDR104, MMC_HS_200, MMC_HS_400, MMC_HS_400_ES]
mmc list - lists available devices
mmc wp [PART] - power on write protect boot partitions
  arguments:
   PART - [0|1]
       : 0 - first boot partition, 1 - second boot partition
         if not assigned, write protect all boot partitions
mmc hwpartition <USER> <GP> <MODE> - does hardware partitioning
  arguments (sizes in 512-byte blocks):
   USER - <user> <enh> <start> <cnt> <wrrel> <{on|off}>
        : sets user data area attributes
   GP - <{gp1|gp2|gp3|gp4}> <cnt> <enh> <wrrel> <{on|off}>
        : general purpose partition
   MODE - <{check|set|complete}>
        : mode, complete set partitioning completed
  WARNING: Partitioning is a write-once setting once it is set to complete.
  Power cycling is required to initialize partitions after set to complete.
mmc setdsr <value> - set DSR register value

sd available
jedec_spi_nor spi_nor@0: unrecognized JEDEC id bytes: 00, ef, aa
Failed to initialize SPI flash at 0:0 (error 0)
NAND available
sd nand
fit=bpi-r3.itb
** No partition table - mmc 0 **
Couldn't find partition mmc 0:5
Can't set block device
BPI-R3>

6.1 always reboots correctly. The first boot after a cold start is always fine, with any kernel.

uboot -> linux-6.1 -> reboot -> uboot -> linux-6.4 -> reboot -> uboot -> linux-6.4 .... always works until the next cold boot

uboot -> linux-6.4 -> reboot -> uboot error as described

I’ll try to update U-Boot and test again.

P.S. U-Boot is the version from the stock image.

UPD:

BPI-R3> mmc info
Device: mmc@11230000
Manufacturer ID: 12
OEM: 3456
Name: SD
Bus Speed: 50000000
Mode: SD High Speed (50MHz)
Rd Block Len: 512
SD version 3.0
High Capacity: Yes
Capacity: 58.2 GiB
Bus Width: 1-bit
Erase Group Size: 512 Bytes
BPI-R3> mmc rescan
unable to read ssr
unable to read ssr
BPI-R3>

cold boot:

BPI-R3> mmc info
Device: mmc@11230000
Manufacturer ID: 12
OEM: 3456
Name: SD
Bus Speed: 25000000
Mode: MMC legacy
Rd Block Len: 512
SD version 3.0
High Capacity: Yes
Capacity: 58.2 GiB
Bus Width: 4-bit
Erase Group Size: 512 Bytes

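The bus width is different: 1-bit in the failing state after a warm reboot, 4-bit after a cold boot.

UPD: sometimes 6.4 reboots correctly.

6.1 -> warm reboot -> 6.4 -> warm reboot -> … I tested it with five 6.4 reboots in a row.

This time the output is slightly different, but the beginning is the same: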

** No partition table - mmc 0 **
Couldn't find partition mmc 0:5
Can't set block device
BPI-R3>  
BPI-R3> mmc rescan
mmc_init: -95, time 2315
BPI-R3> mmc info
unable to read ssr
unable to read ssr
Device: mmc@11230000
Manufacturer ID: 12
OEM: 3456
Name: SD
Bus Speed: 25000000
Mode: MMC legacy
Rd Block Len: 512
SD version 3.0
High Capacity: Yes
Capacity: 58.2 GiB
Bus Width: 4-bit
Erase Group Size: 512 Bytes
BPI-R3> mmc rescan  
unable to read ssr
unable to read ssr

After the reset command, the kernel was able to boot from SD.

4-bit is correct. I wonder about the 1-bit bus, but this is unrelated to the kernel. I have had similar experiences in the past where the SD card was sometimes not recognized, but I had no idea what the cause might be. Maybe try updating ATF/U-Boot to a newer version (mtk-atf-r4 + 2023-07-bpi in my uboot repo).

Sometimes the SD card cannot be read on cold boot either; a reset in U-Boot solves this.

I also get the “unable to read ssr” message, but only once, and boot works.

With 6.3.0-rc6 the temperature reading works for me, so the problem was introduced between that version and 6.4.

I made a diff and log (combined in one file, so not usable as a patch, only for manual inspection):

thermal_6.3rc6-6.4.diff (11,3 KB)

I reverted “Add temperature constraints to validate read” and now get 0, so something here is involved, but that change should only target v1, whereas mt7986 is v3.

The breaking commit is “thermal/drivers/mediatek: Control buffer enablement tweaks”.

After reverting this and fixing some build issues, I can read the temperature again with 6.5-rc7:

root@bpi-r3:~# uname -a
Linux bpi-r3 6.5.0-rc7-next-20230824-bpi-r3-2g5 #6 SMP Thu Aug 31 10:59:43 CEST 2023 aarch64 GNU/Linux
root@bpi-r3:~# cat /sys/class/thermal/thermal_zone0/temp 
33013

See the changes in https://github.com/frank-w/BPI-Router-Linux/tree/6.5-2g5

I sent a regression report to the mailing list and to the authors/maintainers of the breaking patch:

https://lkml.org/lkml/2023/8/31/265

I had the same on the R64: there are some cards U-Boot can’t handle. You may find that another type of SD card works fine… That is why I started booting from ATF directly into the kernel.

For the thermal issue:

It looks like only the additional member fields were missing for mt7986. If I add the same ones used for mt7622, I get a temperature (previously the values were fixed in the code; now the read does not happen because they are missing).

--- a/drivers/thermal/mediatek/auxadc_thermal.c
+++ b/drivers/thermal/mediatek/auxadc_thermal.c
@@ -690,6 +690,9 @@ static const struct mtk_thermal_data mt7986_thermal_data = {
        .adcpnp = mt7986_adcpnp,
        .sensor_mux_values = mt7986_mux_values,
        .version = MTK_THERMAL_V3,
+       .apmixed_buffer_ctl_reg = APMIXED_SYS_TS_CON1,
+       .apmixed_buffer_ctl_mask = GENMASK(31, 6) | BIT(3),
+       .apmixed_buffer_ctl_set = BIT(0),
 };
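
To illustrate what the three added fields express, here is a small sketch of the bit arithmetic, assuming the driver applies them as a read-modify-write of the form `(reg & mask) | set` on the register named by `apmixed_buffer_ctl_reg`. That usage is an assumption on my part, and `genmask`, `bit` and `apply_buffer_ctl` are hypothetical helper names that only mirror the kernel macros for 32-bit values.

```shell
#!/bin/sh
# Sketch only: mirrors GENMASK(h, l) and BIT(n) from the kernel for 32-bit values.
genmask() { echo $(( ((1 << ($1 - $2 + 1)) - 1) << $2 )); }
bit() { echo $(( 1 << $1 )); }

# Apply the mask/set pair from the patch to a raw register value.
# Assumed semantics: keep only the bits in the mask, then OR in the set bits.
apply_buffer_ctl() {
    reg=$1
    mask=$(( $(genmask 31 6) | $(bit 3) ))   # 0xffffffc8
    set_bits=$(bit 0)                        # 0x00000001
    printf '0x%08x\n' $(( ($reg & $mask) | $set_bits ))
}
```

For example, `apply_buffer_ctl 0x36` clears bits 1, 2, 4 and 5 and sets bit 0, giving 0x00000001. Under the same assumption, leaving the fields unset would mean no register is named at all, which would fit the missing reads described above.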

@DeadMeat can you confirm this?

For the MMC issue: I ran into it again, and it looks the same as yours: 1-bit width and no access to the MMC, but the size is read correctly. I guess this is some kind of clock issue.

On this card I have an older U-Boot installed (2023.01, so without the r4 patches).

After some time the 4-bit bus shows up in mmc info, and some time after that reads work.

@hackpascal any idea here?

I’ve applied the patch to 6.4-main. Will update after testing

UPD:

It works

bpi-r3 ~ # uname -r
6.4.0-bpi-r3-main
bpi-r3 ~ # cat /sys/class/thermal/thermal_zone0/temp  
48113

Thanks Frank :)

OK, I sent the official patch to the mailing list:

https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/