Boot halts at bringing up secondary CPUs

I have a BPI R4 4GB PoE (which I soldered myself about 6 months ago) that I bought in December 2024. I had been using it with the OpenWrt images, but for this test I used the official PoE 4GB SD card image. Everything seems normal to me until the last line:

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd090]
[    0.000000] Linux version 5.4.260 ([email protected]) (gcc version 8.4.0 (OpenWrt GCC 8.4.0 unknown)) #0 SMP Fri Jan 19 02:26:09 2024
[    0.000000] Machine model: Bananapi BPI-R4
[    0.000000] earlycon: uart8250 at MMIO32 0x0000000011000000 (options '')
[    0.000000] printk: bootconsole [uart8250] enabled
[    0.000000] On node 0 totalpages: 1045952
[    0.000000]   DMA32 zone: 12288 pages used for memmap
[    0.000000]   DMA32 zone: 0 pages reserved
[    0.000000]   DMA32 zone: 783808 pages, LIFO batch:63
[    0.000000]   Normal zone: 4096 pages used for memmap
[    0.000000]   Normal zone: 262144 pages, LIFO batch:63
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.0
[    0.000000] percpu: Embedded 20 pages/cpu s44376 r8192 d29352 u81920
[    0.000000] pcpu-alloc: s44376 r8192 d29352 u81920 alloc=20*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] CPU features: kernel page table isolation disabled by kernel configuration
[    0.000000] CPU features: detected: Spectre-BHB
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1029568
[    0.000000] Kernel command line: console=ttyS0,115200n1 loglevel=8                       earlycon=uart8250,mmio32,0x11000000                           2
[    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0xfb7ed000-0xff7ed000] (64MB)
[    0.000000] Memory: 4025132K/4183808K available (8382K kernel code, 600K rwdata, 2460K rodata, 512K init, 295K bss, 158676K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     CONFIG_RCU_FANOUT set to non-default value of 32.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[    0.000000] GICv3: 416 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] GICv3: Distributor has no Range Selector support
[    0.000000] GICv3: 16 PPIs implemented
[    0.000000] GICv3: no VLPI support, no direct LPI support
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x000000000c080000
[    0.000000] arch_timer: cp15 timer(s) running at 13.00MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x2ff89eacb, max_idle_ns: 440795202429 ns
[    0.000002] sched_clock: 56 bits at 13MHz, resolution 76ns, wraps every 4398046511101ns
[    0.008280] Calibrating delay loop (skipped), value calculated using timer frequency.. 26.00 BogoMIPS (lpj=52000)
[    0.018676] pid_max: default: 32768 minimum: 301
[    0.023405] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.030900] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.039448] ASID allocator initialised with 65536 entries
[    0.044959] rcu: Hierarchical SRCU implementation.
[    0.050059] smp: Bringing up secondary CPUs ...

My system crashed sometime yesterday or so (I wasn’t home for a day). Does this mean my CPU is fried? I can’t get into anything except the web recovery UI, which I don’t think is helpful here.

Something similar also happens with the OpenWrt images, but there it halts at “Starting Kernel…”.

If anyone knows something, would be great to know if I need to contact the vendor and pray.

A quirk of stock OpenWrt is that it boots into recovery mode after a crash during normal boot. It will keep booting into recovery until you (inspect and) clean up the files in /sys/fs/pstore.

The idea behind this design is to give you a chance to inspect what might have caused the crash.
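
For anyone who can still reach a shell on the device, here is a minimal sketch of the inspect-then-clean step. It assumes Python is installed there (a stock image may not include it, in which case reading and removing the files by hand does the same job) and that it runs as root. The function names are made up for illustration; only the /sys/fs/pstore path comes from the post above.

```python
from pathlib import Path


def list_pstore_records(pstore_dir="/sys/fs/pstore"):
    """Return the record files found in pstore_dir, sorted by name."""
    root = Path(pstore_dir)
    if not root.is_dir():
        # pstore not mounted or not available on this system
        return []
    return sorted(p for p in root.iterdir() if p.is_file())


def clear_pstore_records(pstore_dir="/sys/fs/pstore"):
    """Print each record so it can be inspected, then delete it.

    Returns the names of the files that were removed.
    """
    removed = []
    for record in list_pstore_records(pstore_dir):
        print(f"--- {record.name} ---")
        print(record.read_text(errors="replace"))
        record.unlink()
        removed.append(record.name)
    return removed
```

Calling `list_pstore_records()` first is the safe option, since it only lists the files. `clear_pstore_records()` deletes them after printing, so save the output somewhere if the crash log might matter later.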

Thank you for the insight, but that shouldn’t affect the Sinovoip version of OpenWrt, should it? In any case, I can’t go anywhere and clear anything, as the system doesn’t boot and I can’t really access the affected filesystem, which is on the eMMC. The other images also do not start, so I don’t think that’s the problem here.

Sadly, I didn’t see the last thing the router did while it was still operational, as I did not expect a failure of this magnitude.

Have you tried a recent release of stock OpenWrt on an SD card? Worth a try if you haven’t. The Sinovoip software is too old, and not many people have experience with it. Booting into recovery after a crash is also the default behaviour of the MediaTek SDK (which was/is based on OpenWrt but is not exactly the same).

I’ve seen two or three cases of sudden death of an R4 on this forum in the past month. None got to the bottom of the cause. Two were likely caused by electrostatic discharge. Of those two, one was confirmed to be consistently stuck in the recovery mode of stock OpenWrt. The others didn’t provide much meaningful diagnosis.

It seems to me that sudden death of an R4 in winter is not a very rare event if the board is not handled properly…

It was running 25.12.0-rc01 when it crashed, and I also tried rc05, but as I said, both halt at “Starting Kernel…”. Just now I also tried 24.10.5, which should be the latest, and it does the same.

I checked my boot log:

[    0.004384] smp: Bringing up secondary CPUs ...
[    0.004602] Detected VIPT I-cache on CPU1
[    0.004639] GICv3: CPU1: found redistributor 1 region 0:0x000000000c0a0000
[    0.004658] CPU1: Booted secondary processor 0x0000000001 [0x411fd090]
[    0.004928] Detected VIPT I-cache on CPU2
[    0.004949] GICv3: CPU2: found redistributor 2 region 0:0x000000000c0c0000
[    0.004960] CPU2: Booted secondary processor 0x0000000002 [0x411fd090]
[    0.005197] Detected VIPT I-cache on CPU3
[    0.005217] GICv3: CPU3: found redistributor 3 region 0:0x000000000c0e0000
[    0.005227] CPU3: Booted secondary processor 0x0000000003 [0x411fd090]
[    0.005263] smp: Brought up 1 node, 4 CPUs
[    0.005267] SMP: Total of 4 processors activated.
[    0.005268] CPU: All CPU(s) started at EL2
[    0.005271] CPU features: detected: 32-bit EL0 Support
[    0.005273] CPU features: detected: CRC32 instructions
[    0.005301] alternatives: applying system-wide alternatives
[    0.005378] CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching

The above is what should follow the point where your R4 got stuck.

I’m not sure whether you were unlucky enough to end up with a good CPU0 and three bad cores. Unfortunately, it seems likely that some hardware is broken.
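
If you want to compare captured serial logs without reading them line by line, something like this rough sketch would do. It only looks for the message formats that appear in the two logs in this thread, and the function name and result keys are made up for illustration.

```python
import re

# Patterns taken from the log lines quoted in this thread.
BOOTED = re.compile(r"CPU(\d+): Booted secondary processor")
BROUGHT_UP = re.compile(r"smp: Brought up \d+ nodes?, (\d+) CPUs")


def summarize_smp_bringup(log_text):
    """Report how far secondary CPU bring-up got in a kernel boot log."""
    done = BROUGHT_UP.search(log_text)
    return {
        # True once the kernel announced it is starting the other cores
        "bringup_started": "smp: Bringing up secondary CPUs" in log_text,
        # CPU numbers that reported "Booted secondary processor"
        "secondary_cpus_booted": sorted(
            int(m.group(1)) for m in BOOTED.finditer(log_text)
        ),
        # Total from the "Brought up" summary line, or None if never reached
        "total_cpus": int(done.group(1)) if done else None,
    }
```

A log that ends right after “Bringing up secondary CPUs” gives an empty list and `None` for the total, which is the situation in the first post. This only summarizes what the kernel printed; it cannot tell whether the cause is the cores themselves, power, or something else.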

And is the log before that point the same? Thank you for sharing. I already suspected a failure, but this seems to confirm it, especially if the log before that point really is the same. Bummer.

Not exactly the same, because I’m on kernel 6.12.71.

Interestingly, your CPU0 can run the ROM and boot the kernel up to the point of bringing up the other three cores. Weird but interesting.

Just a wild guess. Try a different power supply and/or socket.

Sadly, I already tried that. I first used PoE, then the power supply it shipped with. I am now in contact with the seller and hope to get a refund for this.