[BPI-R4] Zyxel PMG3000-D20B SFP module not detected

Bringing this thread back to life. I suspect lots of PON or XGS-PON SFP modules are going to suffer from the exact same issue. I’m not keen to physically bridge the mosfet. Are there any software-based hacks, patches, or boot parameters?

Quote: "OK, I have managed to at least make this module work. The issue, from my observation: it has a really long boot-up sequence, and only AFTER that sequence does it pull the MOD_DEF0 pin low.

The problem: While that pin is not pulled low, the circuit of the BPI-R4 board itself does not turn on the power supply for the SFP module. There’s a mosfet that prevents it."

It cannot be circumvented by software. The logic is implemented in the circuit: the software sees the module for a split second, and then it vanishes, never to come back. The BPI-R3 had no such circuitry, and neither does any other piece of SFP hardware I tested. The stock BPI-R4 is the only device in which the module does not work. And given that the circuit diagram for the R4 has a place drawn in for a bridge in place of the mosfet, I strongly assume this was accounted for at least to some degree. There are no ill effects from replacing the mosfet with a bridge, other than the status LED being permanently lit.
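For anyone who wants the deadlock spelled out, here is a toy model in Python of the behaviour described above. It is only a sketch under assumptions: the function name `simulate`, the 5 s boot time and the 1 ms time step are made up for illustration, and the real circuit is analogue rather than a loop.

```python
def simulate(bridged: bool, boot_ms: int = 5000, run_ms: int = 10000) -> bool:
    """Return True if the module ends up detected (MOD_DEF0 pulled low)."""
    booted_ms = 0          # how long the module has been powered without interruption
    mod_def0_low = False   # the module only pulls this low once it has finished booting
    for _ in range(run_ms):
        # Stock board: the mosfet only passes power while MOD_DEF0 is low.
        # Bridged board: power is always on.
        powered = bridged or mod_def0_low
        if powered:
            booted_ms += 1
        else:
            booted_ms = 0  # losing power restarts the boot sequence
        mod_def0_low = booted_ms >= boot_ms
    return mod_def0_low


print("stock mosfet:", simulate(bridged=False))
print("bridged:     ", simulate(bridged=True))
```

In this model the stock board never gets anywhere, because power waits for MOD_DEF0 and MOD_DEF0 waits for power, while the bridged board detects the module once the (assumed) boot time has passed.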

Sorry for such a silly question: Once I remove the MOSFET and solder as shown, might there be issues with other SFP(+) modules?

I highly doubt it. The BPI-R4 is the first device I have ever seen that employs such an odd setup. The R3 for example simply does not have any such setup, and just permanently powers the modules. As does every Switch or Media-Converter I could get my hands on (meaning, the GPON module booted up fine in them).


I’ve tried today…

… unfortunately, no luck.

All that’s changed is that I now get sfp sfp1/2: module removed when I plug the module in.

Any idea where I messed up? (OS is OpenWrt, kernel 6.6.72)

The LED does now shine perma-bright (it was only dimly lit before):

I’ll have a 2.5G ethernet SFP module in two weeks or so, then I can at least test if the ports themselves still work.

If the LED is not permanently lit, something went wrong with the solder mod, because it lighting up permanently is a direct result of the module’s power circuit now being permanently powered. Given it’s lit up in the picture, I guess it’s just a typo?

When the LED is brightly lit all the time, power is being supplied fine, and the issue with the module lies elsewhere.

Sorry, that was a typo, yes. It is perma-lit. Only the left one though regardless of which sfp port is used.

Quote: "the issue with the module lies elsewhere."

Come to think of it, when I plugged in the module the first time, there did seem to be a short bright flash. But I had already assembled the case and played it off as my fatigue/imagination.

Do you maybe have a working non-OpenWrt image that you could please share? So I can test this further. (A tutorial to build one myself would be great, too.)

edit: oh. the first result on google is literally your gentoo tutorial XD

Only one being lit makes sense, given you only soldered out the mosfet for one of the two ports. The module will also only work in that port.

Sure, I was mistakenly under the impression that the right LED would dimly light up permanently when its port operates the module.

But I checked again, the pulse (~4s) is all the same, just limited to unplugging. There is/was no light during plugging in or “operation.”

What’s really got me confused is the sfp sfp1: module removed during plugging in. There’s nothing else in the system logs. But you had that, too.

Anyway, I just ordered myself a cheap Intel 82599EN PCIe SFP+ adapter card, that’ll allow me to test without relying on the R4.

edit: Another silly question: The module ought to show up regardless of being connected to an actual fiber network, right?

If you have the exact same model, yes. It’ll show up without being hooked up to an actual fiber network. The LED being dimly lit is also the result of the module itself pulling the def0 pin high, but not as “strongly” as normal, so a little bit of power is let through by the mosfet, resulting in the dim LED. The initial short bright flash is also a result of the module controlling that in software. It boots up and, very early in the boot sequence (milliseconds into it), stops pulling the pin low, effectively disconnecting itself from power. Why that results in a flicker of it powering down, the pin going back low, and it booting up again… I’m not sure.

Uff, turns out it’s an issue with OpenWrt seeming to have broken something in 24.10 (incl. snapshot @ 6.6.74 now).

I have no clue what (just that it definitely isn’t the renaming of eth{1,2} to sfp-{l,w}an), but with Sinovoip’s 20240620 build (21.02 @ 5.4.271) the module does show properly in dmesg & ethtool -m eth2 returns everything.

I’m not surprised, OpenWrt’s BPI-R4 support so far is horribly broken. Wifi is likewise nonfunctional in snapshot. Though while there’s this custom build that ‘fixes’ the latter through mtk 24.10 patches (which also return eth{1,2}), the module is still AWOL in it (No such device).

edit: Or is there anybody who has it still working with 24.10+?

Try this https://www.mediafire.com/file/mdlemqrst8zwupc/openwrt-mediatek-filogic-bananapi_bpi-r4-sdcard.img.gz/file

This is today’s build with the MTK patches from rmandrad/openwrt (bananapi4 branch on GitHub). I added the argon theme and incredible package.

Thanks, but it still doesn’t show in dmesg or elsewhere. ethtool -m sfp-wan even shows netlink error: No such device. (i2cdump is not present.)

What’s curious since soldering (regardless of image used) is just how hot the device gets. I guess once my 2.5G RJ45 module is here and both SFPs actually process data, I can use the R4 as heater.

The GPON module gets insanely hot, mine is usually sitting at around 70°. It’s just what happens when you try to cool 3 W in such a tiny form factor.

Yeah, I just didn’t expect that to happen even when it’s disconnected/idling on standby. Makes me wonder how full 48-SFP-port switches handle it, but I suppose it’s but another reason why datacenters are insanely loud.

Anyway, for quirks, do you happen to know whether the format needs to be…

SFP_QUIRK_F("ZYXEL", "PMG3000-D20B", sfp_fixup_ignore_tx_fault),

… or (given the ethtool -m output, which after a quick google usually doesn’t contain these _s)…

SFP_QUIRK_F("ZYXEL___________", "PMG3000-D20B____", sfp_fixup_ignore_tx_fault),

…?
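On the underscores: a small Python sketch of why the two spellings differ. It assumes, from my reading of the ethtool source and not from anything confirmed in this thread, that `ethtool -m` drops trailing spaces and prints every non-printable byte as `_`. The two 16-byte vendor fields are hypothetical; which padding byte this module really uses is unknown. The helper names are made up.

```python
def ethtool_ascii(field: bytes) -> str:
    """Render an EEPROM text field the way `ethtool -m` appears to:
    trailing spaces dropped, non-printable bytes shown as '_'."""
    trimmed = field.rstrip(b" ")
    return "".join(chr(b) if 32 <= b <= 126 else "_" for b in trimmed)


def space_trimmed(field: bytes) -> bytes:
    """What a comparison that only ignores trailing spaces would see."""
    return field.rstrip(b" ")


# Hypothetical 16-byte vendor fields (assumption, not read from the module).
space_padded = b"ZYXEL".ljust(16, b" ")
nul_padded = b"ZYXEL".ljust(16, b"\x00")

print(ethtool_ascii(space_padded))              # ZYXEL
print(ethtool_ascii(nul_padded))                # ZYXEL___________
print(space_trimmed(space_padded) == b"ZYXEL")  # True
print(space_trimmed(nul_padded) == b"ZYXEL")    # False
```

If that assumption holds, the underscores are a display artefact for padding bytes rather than literal `_` characters, so a quirk string containing real underscores would not equal the raw field either. Treat this as a hint for what to check in a raw EEPROM dump, not as a statement about how the kernel matches quirks.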

With the solder mod in place, the module needs zero kernel quirks. It will only report itself as present, by pulling def0 low, once it’s done booting, and will then just work. I’m using it with the unmodified 6.12 kernel from Frank’s GitHub.

I guess I may have to wait until OpenWrt('s mediatek target) finally pulls 6.12.

Others were suggesting modifying OpenWrt’s mt7988a-bananapi-bpi-r4.dtsi for another module’s vanishing, but it doesn’t add anything to the soldering.

One more thing: Could you please tell the oldest kernel version that worked for you unmodified? (In particular, whether that included 6.6.)

edit: Oh great, the bulk of R4 upstreamed fixes will only make it into 6.14. But they do apply fine on top of 6.12. But even then, frank-w’s repo still seems widely diverged from it… leaving one to guess how much did not actually yet get upstreamed.

I don’t remember anymore. Whatever was the latest kernel on Frank’s GitHub when I made this thread.

Thank you for the info, still.

I’ve now built OpenWrt from the final main commit that allowed kernel 6.1(.89) for mediatek, r0+26156-fb2475e6bd, and the first 6.6 one afterwards.

In the latter, the module is gone, with general SFP issues apparently known when the kernel was updated. In the former, it’s very much still present. ethtool -m eth2 (this is before the renaming to sfp-wan) reports its full info.

Mostly: The module seems to be in a kind of reset loop? I’m unsure what’s the cause (the module itself, my soldering, or the kernel/a lack of fan activity indicating overheating?).

Occasionally ethtool (patched to 6.11) will break off its readout halfway through and instead go

[…]
	Vendor SN                                 : S23267120070___
	Date code                                 : 150525

netlink error: Operation timed out
Failed to read Page A2h.
root@OpenWrt:~#  ethtool -m eth2
netlink error: Operation timed out
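To put a number on "occasionally", something like the following Python sketch could poll the module and count failed reads. Assumptions: `ethtool` is on the PATH, it exits non-zero when the read fails (I have not verified that for the timeout case), and the names `read_eeprom` and `failure_rate` are made up here.

```python
import subprocess
import time
from typing import Callable


def read_eeprom(iface: str) -> bool:
    """Return True if `ethtool -m <iface>` exits successfully."""
    try:
        result = subprocess.run(
            ["ethtool", "-m", iface],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ethtool missing, not executable, or hung
    return result.returncode == 0


def failure_rate(iface: str, attempts: int = 100, delay_s: float = 0.5,
                 reader: Callable[[str], bool] = read_eeprom) -> float:
    """Read the module EEPROM `attempts` times and return the failed fraction."""
    failures = 0
    for _ in range(attempts):
        if not reader(iface):
            failures += 1
        time.sleep(delay_s)
    return failures / attempts
```

On the device this would be called as `failure_rate("eth2")` (or `"sfp-wan"` after the rename). The `reader` parameter only exists so the loop can be tried out without a module attached.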

There’s something very wrong with these readouts, anyway (** my mark):

	Laser bias current                        : **0.000 mA**
	Laser output power                        : 2.0000 mW / 3.01 dBm
	Receiver signal average optical power     : 0.0000 mW / -inf dBm
	Module temperature                        : **10.85 degrees C / 51.53 degrees F**
	Module voltage                            : 3.3000 V
	Alarm/warning flags implemented           : Yes
	Laser bias current high alarm             : Off
	Laser bias current low alarm              : Off
	Laser bias current high warning           : Off
	Laser bias current low warning            : **On**
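A rough Python sketch of the sanity check I am doing by eye on these readouts. The sample text is the output above with my ** marks removed. The two rules and the 20 degree threshold are my own assumptions, not anything from a spec, and the function names are made up.

```python
import re

SAMPLE = """\
Laser bias current                        : 0.000 mA
Laser output power                        : 2.0000 mW / 3.01 dBm
Receiver signal average optical power     : 0.0000 mW / -inf dBm
Module temperature                        : 10.85 degrees C / 51.53 degrees F
Module voltage                            : 3.3000 V
"""


def parse(output: str) -> dict[str, float]:
    """Map each 'name : value' line to the first number in its value."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.partition(" : ")
        number = re.search(r"-?\d+(?:\.\d+)?", value)
        if sep and number:
            values[name.strip()] = float(number.group())
    return values


def suspicious(v: dict[str, float]) -> list[str]:
    """Return human-readable notes for readings that contradict each other."""
    notes = []
    if v.get("Laser output power", 0) > 0 and v.get("Laser bias current", 0) == 0:
        notes.append("TX power reported with zero bias current")
    # Arbitrary threshold (assumption): this module is reported to run hot.
    if v.get("Module temperature", 99) < 20:
        notes.append("temperature implausibly low for a powered module")
    return notes


for note in suspicious(parse(SAMPLE)):
    print(note)
```

Both rules fire on the sample, which matches the two values I marked.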

edit: On Sinovoip’s 20240620 build, there’s no such issue. I can hammer ethtool -m by the hundreds. But every minute or so it does briefly fail with Module EEPROM data: No such device or address. I’ve also seen this in dmesg:

[  459.323626] sfp sfp@0: SM: enter present:up:wait_los event tx_fault
[  459.329941] sfp sfp@0: module transmit fault indicated
[  459.335084] sfp sfp@0: SM: exit present:up:tx_fault
[  459.443629] sfp sfp@0: SM: enter present:up:tx_fault event tx_clear
[  459.449894] sfp sfp@0: SM: exit present:up:tx_fault
[  460.355164] sfp sfp@0: SM: enter present:up:tx_fault event timeout
[  460.361412] sfp sfp@0: SM: exit present:up:reinit
[  460.675168] sfp sfp@0: SM: enter present:up:reinit event timeout
[  460.681177] sfp sfp@0: module transmit fault recovered
[  460.686313] sfp sfp@0: SM: exit present:up:wait_los
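To see how long each fault lasts, the "indicated" and "recovered" lines can be paired up. A minimal Python sketch, assuming the dmesg format stays exactly as in the excerpt above; the name `fault_durations` is made up.

```python
import re

LOG = """\
[  459.323626] sfp sfp@0: SM: enter present:up:wait_los event tx_fault
[  459.329941] sfp sfp@0: module transmit fault indicated
[  459.335084] sfp sfp@0: SM: exit present:up:tx_fault
[  459.443629] sfp sfp@0: SM: enter present:up:tx_fault event tx_clear
[  459.449894] sfp sfp@0: SM: exit present:up:tx_fault
[  460.355164] sfp sfp@0: SM: enter present:up:tx_fault event timeout
[  460.361412] sfp sfp@0: SM: exit present:up:reinit
[  460.675168] sfp sfp@0: SM: enter present:up:reinit event timeout
[  460.681177] sfp sfp@0: module transmit fault recovered
[  460.686313] sfp sfp@0: SM: exit present:up:wait_los
"""

LINE = re.compile(
    r"\[\s*(\d+\.\d+)\] sfp (\S+): module transmit fault (indicated|recovered)"
)


def fault_durations(log: str) -> list[float]:
    """Return the seconds between each 'indicated' and the next 'recovered' per device."""
    durations = []
    started = {}  # device name -> timestamp of the open fault
    for match in LINE.finditer(log):
        ts, dev, what = float(match.group(1)), match.group(2), match.group(3)
        if what == "indicated":
            started[dev] = ts
        elif dev in started:
            durations.append(round(ts - started.pop(dev), 6))
    return durations


print(fault_durations(LOG))
```

For the excerpt above this gives a single fault of roughly 1.35 s. Fed with a longer `dmesg` capture, it would show whether the faults really recur about once a minute.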

Dunno how the module will work with this once fiber actually blesses this building.

Anyway, if anybody has this module (on any device) running fine on OpenWrt 24.10/snapshot (kernel 6.6), please do raise your hands.