Frequent freeze-ups of M2U


(Pavel Zykin) #1

Hello, I got my M2U a few days ago and it freezes constantly, once every 1-20 min. The device stops responding spontaneously without any error in either the log or the serial console, and the green LED (heartbeat) stays stuck on or off indefinitely after the freeze. The SoC temperature (/sys/devices/virtual/thermal/thermal_zone0/temp) is around 70 degrees at full load and less than 60 at no load. I’ve tried booting from different SD cards and from the on-board eMMC, lowering the CPU frequency down to 480 MHz (/sys/devices/system/cpu/cpufreq/), disabling all cores except core0, switching off the NIC and WLAN, and lowering the DRAM frequency down to 192 MHz (/sys/devices/1c62000.dramfreq/devfreq/dramfreq/target_freq), all with no success. I use the recommended 5V2A power adapter (IRN-050200U) and measure 5.24 V at the board socket; no additional peripherals are connected except a USB keyboard. Freezes happen with both the Debian and Ubuntu images. Please advise what else could be wrong.


#2

If you boot up and let the board idle, does it also freeze after some time? I have the board here, so I can check whether it happens for me too. Please give some more details.


(Pavel Zykin) #3

Thank you for the reply. Plain Debian 2017-01-20-debian-8-jessie-lite-beta3-bpi-m2u.img.zip booted from the SD card freezes at least once in 10 hours at idle. Ubuntu Mate 2016-11-29-ubuntu-16.04.1-mate-desktop-preview3-bpi-m2u-sd-emmc.img.zip booted from the on-board eMMC freezes about once every 3-5 hours at idle. Under load it freezes within 10-15 minutes. With the “Linux bpi-iot-ros-ai 3.10.65-BPI-M2U-Kernel #5” kernel I once saw a kernel “Oops: 5 [#1] SMP ARM”, which led to a kernel panic in swapper, “Kernel panic - not syncing: Fatal exception in interrupt”, but most lock-ups do not produce any output on the console.

I use a SiC (silicon carbide) ceramic heat sink on the CPU: 56 degrees at idle, up to 70-75 under load. The only really hot part of the PCB is some small components near the AXP221S, so I don’t think it is due to overheating. I wonder if there are any boot-time memory tests, such as memtest86 or a DDR stress test tool, available for this board. Memtester under Linux failed to complete a full 1.7G RAM test because the board froze during the test. I just want to test the RAM to be sure the problem is not RAM related.

The board version is BPI-M2_Ultra v1.0, serial number 100100525.


#4

Did you compile memtester or install it from a package? Can you tell me which memtester you used so I can give it a try? Otherwise I will wait for the 10 hrs at idle and see what happens.

Edit: And under what load? Please specify so I can test with the same load.


(Pavel Zykin) #5

I’ve installed memtester from ubuntu repository (memtester version 4.3.0 (32-bit)). For load tests I’ve used boinc-client with 4 running setiathome_8.02 jobs.

I tried running memtester on only 100M with ‘sudo memtester 100M 50’. This time it froze after 47 cycles, 45 of which completed without any errors and 3 of which showed strange errors in different tests:

Loop 5/50: Bit Flip : testing 30FAILURE: 0xff7ffff7 != 0xfffffff7 at offset 0x000d91b0.
Loop 32/50: Walking Ones : testing 45FAILURE: 0xff7bffff != 0xfffbffff at offset 0x02041370.
Loop 35/50: Bit Flip : testing 30FAILURE: 0xff7ffff7 != 0xfffffff7 at offset 0x000d91b0.

It looks like a memory timing or undervoltage condition, but the voltages seem OK.

According to /sys/bus/platform/devices/axp22-regulator/regulator/regulator.*/:

axp22_dldo1   3300000
axp22_dldo2   3300000
axp22_dldo3   3300000
axp22_dldo4   2500000
axp22_eldo1   2800000
axp22_eldo2   1500000
axp22_eldo3   1200000
axp22_ldoio0  3300000
axp22_ldoio1  1800000
axp22_dc1sw   3300000
axp22_dcdc1   3300000
axp22_dc5ldo  1100000
axp22_dcdc2   1160000
axp22_dcdc3   1100000
axp22_dcdc4   1100000
axp22_dcdc5   1500000
axp22_rtc     3000000
axp22_aldo1   2800000
axp22_aldo2   2500000
axp22_aldo3   3000000
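For anyone who wants to collect the same regulator readings on their own board, here is a minimal Python sketch. It assumes each regulator.* directory exposes the standard Linux regulator sysfs attributes `name` and `microvolts` (values in microvolts, so 3300000 means 3.3 V); the helper name `read_regulators` is made up for illustration, and the directory layout on a given kernel may differ.

```python
from pathlib import Path


def read_regulators(base):
    """Return {regulator name: microvolts} for each regulator.* dir under base.

    Assumes the standard sysfs attributes `name` and `microvolts`;
    regulators that lack them or cannot be read are skipped.
    """
    result = {}
    for reg_dir in sorted(Path(base).glob("regulator.*")):
        name_file = reg_dir / "name"
        uv_file = reg_dir / "microvolts"
        if not (name_file.is_file() and uv_file.is_file()):
            continue
        try:
            name = name_file.read_text().strip()
            result[name] = int(uv_file.read_text().strip())
        except (OSError, ValueError):
            # Some regulators report an error instead of a voltage.
            continue
    return result


if __name__ == "__main__":
    # Path taken from the post above; adjust if your kernel differs.
    base = "/sys/bus/platform/devices/axp22-regulator/regulator"
    for name, uv in read_regulators(base).items():
        print(f"{name:14s} {uv / 1e6:.3f} V")
```

Run on the board itself, this prints one line per regulator in volts, which makes it easier to compare the values before and after a change.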


#6

I will try with my customized image and report back, still using the BPI fex timing.


#7

I’ve had a few freezes too with my M2 Ultra. Been using Ubuntu mate and adding additional apps. Thought it was just me trying to do too many things at once, but with lock ups and samba problems, this is not looking good. I hope this is software related and not hardware related. I have the pi3 but it’s missing Gig NIC and SATA connection for my needs.

If I had bought this device from a local shop, I would have returned it by now.


(Pavel Zykin) #8

Seems like some random and rare glitches with memory happening at different addresses. Results of 100 runs of memtester:

pagesize is 4096
pagesizemask is 0xfffff000
want 100MB (104857600 bytes)
got 100MB (104857600 bytes), trying mlock …locked.

Block Sequential : FAILURE: 0xa0a0a0a0 != 0xa020a0a0 at offset 0x025f41ac.
Block Sequential : FAILURE: 0x99199999 != 0x99999999 at offset 0x026e9530.
Solid Bits : FAILURE: 0xff7fffff != 0xffffffff at offset 0x016ddfb0.
Bit Flip : FAILURE: 0xff7ffff7 != 0xfffffff7 at offset 0x010390b0.
Bit Flip : FAILURE: 0xffdfffff != 0xff5fffff at offset 0x003db12c.
Bit Flip : FAILURE: 0xff6fffff != 0xffefffff at offset 0x0301a830.
Bit Flip : FAILURE: 0xfe7fffff != 0xfeffffff at offset 0x002abeb0.
Bit Flip : FAILURE: 0xfffff7ff != 0xff7ff7ff at offset 0x00c5152c.
Checkerboard : FAILURE: 0xaaaaaaaa != 0xaa2aaaaa at offset 0x01e91b2c.
Bit Spread : FAILURE: 0x00200000 != 0x00a00000 at offset 0x01a6af30.
Bit Spread : FAILURE: 0x00200000 != 0x00a00000 at offset 0x0099adf0.
Walking Ones : FAILURE: 0xfffffffb != 0xff7ffffb at offset 0x0051242c.
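One way to look for a pattern in failures like these is to XOR the two values of each pair and see which bits differ. The sketch below does that in Python; the function names are made up for illustration, and the regular expression only assumes the "FAILURE: 0x… != 0x… at offset 0x…" shape seen in the output above. Applied to the twelve failures listed in this post, every pair differs in exactly one bit, bit 23, even though the offsets and tests vary. What that means electrically is not something the thread establishes.

```python
import re
from collections import Counter

# Matches the "FAILURE: 0x... != 0x... at offset 0x..." part of a memtester line.
FAILURE_RE = re.compile(
    r"FAILURE:\s*(0x[0-9a-fA-F]+)\s*!=\s*(0x[0-9a-fA-F]+)\s*at offset\s*(0x[0-9a-fA-F]+)"
)


def flipped_bits(log_text):
    """Return [(offset, [differing bit positions])] for each FAILURE entry."""
    results = []
    for left, right, offset in FAILURE_RE.findall(log_text):
        diff = int(left, 16) ^ int(right, 16)
        bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
        results.append((int(offset, 16), bits))
    return results


def bit_histogram(log_text):
    """Count how often each bit position differs across all failures."""
    return Counter(bit for _, bits in flipped_bits(log_text) for bit in bits)


# Three of the failure lines from the post above.
sample = """
Block Sequential :FAILURE: 0xa0a0a0a0 != 0xa020a0a0 at offset 0x025f41ac.
Bit Flip :FAILURE: 0xffdfffff != 0xff5fffff at offset 0x003db12c.
Bit Spread : FAILURE: 0x00200000 != 0x00a00000 at offset 0x0099adf0.
"""

print(bit_histogram(sample))  # Counter({23: 3})
```

Feeding a whole saved memtester log to `bit_histogram` gives a quick view of whether the errors cluster on particular bits or are spread across the word.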


#9

Here is my result:

You should disable DPMS completely.

Update: This is my customized image with BSP settings, DPMS disabled, screensaver disabled, cma=512


(Pavel Zykin) #10

Disabled DPMS, changed cma from 256 to 512. Still the same memory problem.


#11

Try the dan-and ( https://github.com/dan-and/BPI-M2U-bsp ) kernel 3.10.104 and see if it fixes your issue.

What gcc are you using to build the kernel?


(Pavel Zykin) #12

I use gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4). I’ve tried the dan-and kernel and the same memory issue still persists.


#13

I don’t have stability issues on my board, or at least they have not shown up yet. One thing I have noticed, though, is that my Guvcview app sometimes gets weird memory corruption at startup (maybe I introduced a bug, still investigating). I don’t see this happening on the M64, which makes me wonder whether the DRAM timings are really correct or need some adjustment.

No other application has shown any memory issue.

I have done some minor kernel modifications on my customized build but nothing relevant. Using arm-linux-gnueabihf-gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

One important piece of missing info that may help you: I don’t have WiFi/BT enabled and loaded on my own image.


(Tomáš Janovský) #14

Same problem after several days. It freezes every 3-4 days. (No BT/WiFi) using 2016-11-29-debian-8-jessie-lite-beta2-bpi-m2u-sd-emmc.img.zip

memtester 100M 100

Loop 3/100:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : testing 63FAILURE: 0x3f3f3f3f != 0x3f162f05 at offset 0x02c385d4.

It looks like BPI M2U is shit :(


#15

Come on, hopefully a kernel update will fix this, right? I bought this board so I could have a 24/7 NAS box. The Raspberry Pi 3 is rock solid but has a slow network connection :(

Do people here think a kernel fix will sort this out or is it a hardware issue?


(Pavel Zykin) #16

It seems that a small voltage tweak could resolve this issue. I’ve increased dcdc2_vol, dcdc3_vol, dcdc4_vol and dc5ldo_vol and set the DRAM frequency to 480 MHz. Memtester no longer shows any errors, and the board has been stable running memtester + cpuburn for 1 day. I am still looking for the minimal voltage changes that would make it possible to run the DRAM at full speed.

!DISCLAIMER! Changing voltages could physically damage your device and will probably void your warranty. Be extra careful when changing any voltages or frequencies, and do it at your own risk. This change does lead to some rise in chip temperature.

Voltage      default   working
dcdc2_vol    1001160   1001280
dcdc3_vol    1001100   1001150
dcdc4_vol    1100      1160
dc5ldo_vol   1100      1200

dram_clk = 480


(Tomáš Janovský) #17

Hi, I have got a new BPI M2U and memtester works fine on it (we will see over the next several days), so I think that our problematic M2Us are defective. :( So what can we do about that? It was bought on AliExpress. I hope to get my money back.


(Pavel Zykin) #18

It has been working with the DRAM frequency up to 648 MHz with those voltage adjustments for 2 days now, with no signs of memory corruption. Both BT and WiFi are on, and the load is memtester + cpuburn + iperf (GbE + WiFi) + dd continuously reading the eMMC + BT continuously reading from a phone. It seems that this issue is closed.


(Robin Kriebel) #19

Hello, I’m new here. Could you please write here where and how to set these values? Thank you very much. Sorry for my English.


(Pavel Zykin) #20

Hi! First of all, you should check that your board has the same memory corruption issue. Get any system image, like 2017-01-20-debian-8-jessie-lite-beta3-bpi-m2u.img.zip from this post: “BPI-M2 Ultra new image:ubuntu-16.04.1-mate-desktop-preview3-bpi-m2u-sd-emmc.img 2016-11-29”. Install it as usual, install the memtester package ("sudo apt install memtester"), check the memory with "sudo memtester 1000M 10" and confirm that you really do get failed tests.

!DISCLAIMER! Changing voltages could physically damage your device and will probably void your warranty. Be extra careful when changing any voltages or frequencies, and do it at your own risk. This change does lead to some rise in chip temperature.

If you do have memory corruption, you can compile a new kernel (for instance the dan-and kernel 3.10.104, https://github.com/dan-and/BPI-M2U-bsp , suggested by avaf in post #11) with a corrected sys_config.fex for the desired configuration. For 1080p HDMI video output it would be /trunk/target/allwinner/azalea-m2ultra/configs/BPI_M2U_1080P/sys_config.fex. You should change the [power_sply] and [dram_para] sections:

[power_sply]
dcdc1_vol                  = 1003300
dcdc2_vol                  = 1001280
dcdc3_vol                  = 1001150
dcdc4_vol                  = 1160
;dcdc5_vol                  = 1500
aldo1_vol                  = 2800
aldo2_vol                  = 1001500
aldo3_vol                  = 1003000
dc1sw_vol                  = 3000
dc5ldo_vol                 = 1200
dldo1_vol                  = 3300
dldo2_vol                  = 3300
dldo3_vol                  = 3300
dldo4_vol                  = 2500
eldo1_vol                  = 2800
eldo2_vol                  = 1500
eldo3_vol                  = 1200
gpio0_vol                  = 3300
gpio1_vol                  = 1800

[dram_para]
dram_clk      = 480

This kernel can be cross-compiled for the M2U (ARM) from any regular x86 Ubuntu or other Linux installation (more info here: https://github.com/dan-and/BPI-M2U-bsp/blob/master/REQUIREMENTS.md ). Just run ./build.sh at the BSP source root, choose 1, then 6, and you’ll have the compiled images in ./SD. Copy them to your M2U Linux installation and use bpi-bootsel to apply them to your system.
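If you would rather script the sys_config.fex change than edit it by hand, here is a minimal Python sketch. It assumes the file is plain text made of [section] headers and key = value lines, with ; marking a commented-out line, as in the excerpt above. The function name `set_fex_values` is hypothetical and not part of the BSP; check the result with a diff before building.

```python
import re


def set_fex_values(text, section, updates):
    """Return text with `key = value` lines inside [section] set per updates.

    Commented-out lines (starting with ';') and other sections are left
    untouched. Raises KeyError if a requested key is not in the section.
    """
    pending = dict(updates)
    current = None
    out = []
    for line in text.splitlines():
        header = re.match(r"\s*\[([^\]]+)\]", line)
        if header:
            current = header.group(1)
        elif current == section:
            m = re.match(r"(\s*)([A-Za-z0-9_]+)(\s*=\s*)(.*)", line)
            if m and m.group(2) in pending:
                # Keep the original indentation and spacing around '='.
                new_value = pending.pop(m.group(2))
                line = f"{m.group(1)}{m.group(2)}{m.group(3)}{new_value}"
        out.append(line)
    if pending:
        raise KeyError(f"not found in [{section}]: {sorted(pending)}")
    return "\n".join(out) + "\n"


# Default values as reported earlier in the thread (post #16).
sample = """[power_sply]
dcdc2_vol                  = 1001160
dcdc4_vol                  = 1100
;dcdc5_vol                  = 1500
"""

patched = set_fex_values(sample, "power_sply",
                         {"dcdc2_vol": 1001280, "dcdc4_vol": 1160})
print(patched)
```

Raising an error on a missing key is deliberate: a typo in a regulator name should stop the script rather than silently leave the voltage unchanged.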