Huawei OptiXstar S800E XGSPON SFP+ ONU not detected

Maybe you need a quirk/fixup similar to the rollball delaying the probe

Setting sfp->module_t_wait (Maybe sfp->phy_t_retry too)

maybe you can set a boot delay for openwrt.

fw_setenv bootdelay 60

if it works shorten delay to 45 or 30 until point of failure.

fw_setenv is, imho, U-Boot configuration. It sets how long the U-Boot boot menu is shown, not how long the Linux kernel waits for the SFP.

isn’t it the same? both give the sfp module enough time to boot up upon power up.

frank-w and glassdoor, thank you both for your suggestions:

I’ve just tried fw_setenv bootdelay 60; however, even with the additional wait, the EEPROM is still unavailable:

[   68.232653] sfp sfp1: failed to read EEPROM: -ENXIO

I intend to try the suggestion from frank-w to temporarily change sfp->module_t_wait to a much higher default value. If I understand correctly, the key difference between this and changing bootdelay is that sfp_module_tx_enable(sfp); should first pull the tx-disable pin low before waiting. In the event that the S800E’s processor reset is directly tied to the tx-disable pin, this should allow the cpu to start up and properly emulate an EEPROM.

I currently do not have an OpenWrt build environment ready, so it might take me a while to test this and report back. (Is there a faster/easier way to do this without recompiling? Maybe swapping the gpios in the devicetree?)

You could try if your kernel has something like this, add that to the fixup, or add the fixup:

ericwoud, thank you for your suggestion. I’ll look into it.

Small update on a failed experiment: I removed R106 which connects GPIO70 to SFP1_TX_Disable. My assumption is that tx-disable will be permanently pulled to GND via R101, and therefore any installed SFP module will always be enabled regardless of the cpu/driver state. However this did not work, and I am still getting the same EEPROM error.

Do you still have the “slow to respond” error? Have you increased the bootup time via a quirk and verified that it was called (e.g. with a debug message in the function)?

Afair there was a way to tag the EEPROM as broken (imho only for checksum errors), but here it seems the EEPROM cannot be accessed at all.

just curious, did you just rip off the mosfet and resistor with pliers?

I have a build with the following quirk patch applied if you want to test.

SFP_QUIRK_F("HUAWEI", "S800E", sfp_fixup_long_startup),

frank-w: I have not tried rebuilding the kernel to insert the delay in sfp.c. Unfortunately my OpenWrt build environment is unavailable right now so I can’t try the fix that you suggested yet. I tried the resistor removal hack as it was a faster and easier alternative to check if tx-disable was causing any issues.

Adding this as a quirk will be difficult because the EEPROM is unavailable. The sfp_lookup_quirk function expects the EEPROM to at least provide the vendor_name and vendor_pn which isn’t available here with the S800E.

If I had to hack this delay in temporarily, I would probably call sfp_fixup_long_startup(sfp); after the quirk lookup.


glassdoor: Those components were desoldered. I am concerned that ripping the parts off might also tear off the underlying traces.

On your original post on whether a software based approach is possible, I would cautiously suggest modifying the devicetree so that the pin at moddef0 is not used by sfp1. You might then be able to use pin 82 as a gpio output and manually drive it low so that mosfet Q12 can deliver power to the module.

Edit: bear in mind that the sfp driver only starts probing for the module when the configured moddef0 pin is grounded, so you’ll also have to figure out how to tell the kernel to start probing for the inserted sfp module

Thanks for the offer on your custom build. I am not sure if that will work as the current quirk implementation expects the EEPROM to at least work (discussed in this same post in my reply to frank-w)

deleting line 44 will mean that sfp1 module will always be assumed to be present. https://www.kernel.org/doc/Documentation/devicetree/bindings/net/sff%2Csfp.txt

How do you manually drive pin 82 low from the devicetree? How do you change it from input to output? That is what I am really interested in. Any suggestions?

Edit: from cat /sys/kernel/debug/gpio

gpiochip0: GPIOs 512-595, parent: platform/1001f000.pinctrl, pinctrl_moore:
 gpio-512 (                    |tx-disable          ) out lo 
 gpio-513 (                    |tx-fault            ) in  lo IRQ 
 gpio-514 (                    |los                 ) in  lo IRQ 
 gpio-515 (                    |rate-select0        ) in  hi ACTIVE LOW
 gpio-517 (                    |reset               ) out hi ACTIVE LOW
 gpio-524 (                    |cd                  ) in  lo IRQ ACTIVE LOW
 gpio-526 (                    |WPS                 ) in  hi IRQ ACTIVE LOW
 gpio-533 (                    |rate-select0        ) in  lo ACTIVE LOW
 gpio-566 (                    |los                 ) in  lo IRQ 
 gpio-575 (                    |blue:wps            ) out lo 
 gpio-581 (                    |tx-fault            ) in  lo IRQ 
 gpio-582 (                    |tx-disable          ) out lo 
 gpio-591 (                    |green:status        ) out hi 
 gpio-594 (                    |mod-def0            ) in  lo IRQ ACTIVE LOW
 gpio-595 (                    |mod-def0            ) in  lo IRQ ACTIVE LOW

glassdoor: Yes you’re right about that, I added an edit shortly after posting that you might have to find a way to get the sfp driver to probe after a certain period since the moddef0 pin would be driven as an output and would not be able to tell if a module is actually inserted into the cage.

The guide at https://openwrt.org/docs/techref/hardware/port.gpio might be able to steer you in the right direction. I originally tried this approach but I could not export the gpio as it was already in use by the sfp driver. That was when I decided to go for a hardware mod to bypass the mosfet.

In your logs, gpio-594 ( |mod-def0 ) in lo IRQ ACTIVE LOW appears to be the moddef0 gpio for your sfp1.

Edit: I tried manually driving the pin on my sfp2 (since my sfp1 was already bypassed). The sfp driver was first unloaded to free the pins. When the pin was driven low (“0”), I could see the green power LED light up, so at least that seems to work physically. In your case, you will want to replace 595 with 594 for sfp1.

root@OpenWrt:~# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 512-595, parent: platform/1001f000.pinctrl, pinctrl_moore:
 gpio-512 (                    |tx-disable          ) in  lo
 gpio-513 (                    |tx-fault            ) in  hi IRQ
 gpio-514 (                    |los                 ) in  hi IRQ
 gpio-515 (                    |rate-select0        ) in  hi ACTIVE LOW
 gpio-517 (                    |reset               ) out hi ACTIVE LOW
 gpio-524 (                    |cd                  ) in  lo IRQ ACTIVE LOW
 gpio-526 (                    |WPS                 ) in  hi IRQ ACTIVE LOW
 gpio-533 (                    |rate-select0        ) in  lo ACTIVE LOW
 gpio-566 (                    |los                 ) in  hi IRQ
 gpio-575 (                    |blue:wps            ) out lo
 gpio-581 (                    |tx-fault            ) in  hi IRQ
 gpio-582 (                    |tx-disable          ) in  hi
 gpio-591 (                    |green:status        ) out hi
 gpio-594 (                    |mod-def0            ) in  hi IRQ ACTIVE LOW
 gpio-595 (                    |mod-def0            ) in  hi IRQ ACTIVE LOW
root@OpenWrt:~# rmmod sfp.ko
root@OpenWrt:~# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 512-595, parent: platform/1001f000.pinctrl, pinctrl_moore:
 gpio-517 (                    |reset               ) out hi ACTIVE LOW
 gpio-524 (                    |cd                  ) in  lo IRQ ACTIVE LOW
 gpio-526 (                    |WPS                 ) in  hi IRQ ACTIVE LOW
 gpio-575 (                    |blue:wps            ) out lo
 gpio-591 (                    |green:status        ) out hi

root@OpenWrt:~# echo "595" > /sys/class/gpio/export
root@OpenWrt:~# echo "out" > /sys/class/gpio/gpio595/direction
root@OpenWrt:~# echo "1" > /sys/class/gpio/gpio595/value
root@OpenWrt:~# echo "0" > /sys/class/gpio/gpio595/value
root@OpenWrt:~#
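
The export/direction/value sequence above can be wrapped in a small helper. This is only a sketch: the function name gpio_drive_low and the GPIO_ROOT override are made up for illustration (the override just lets the function be tried against a scratch directory), and it assumes the legacy /sys/class/gpio interface is available and that the sfp driver has already been unloaded, since the export fails while the driver still holds the pin.

```shell
#!/bin/sh
# Drive a GPIO line low through the legacy sysfs interface.
# GPIO_ROOT is a hypothetical override for testing; leave it unset on a
# real system so the default /sys/class/gpio is used.
gpio_drive_low() {
    num="$1"
    root="${GPIO_ROOT:-/sys/class/gpio}"

    # Export the line only if it is not exported yet.
    if [ ! -d "$root/gpio$num" ]; then
        echo "$num" > "$root/export" || return 1
    fi
    # If the directory still does not exist, the export did not work
    # (for example because a driver still owns the pin).
    [ -d "$root/gpio$num" ] || return 1

    echo out > "$root/gpio$num/direction" || return 1
    echo 0 > "$root/gpio$num/value" || return 1
}
```

On the board this would be called as gpio_drive_low 595 for sfp2 or gpio_drive_low 594 for sfp1, using the numbers from the debug listing above.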

I can confirm that freeing the sfp1 mod-def0 pin allows it to be configured as an output and pulled low. The S800E now gets power. Also, removing the mod-def0 gpio definition from the device tree seems OK, as other SFP modules still work.

However S800E still fails to work (previously nothing shows for sfp1 as it was not getting powered)

sfp sfp1: please wait, module slow to respond

…further down the bootlog

sfp sfp1: failed to read EEPROM: -ENXIO

The above happens with or without the quirk patch below. It seems your suspicion is right: the EEPROM read fails, and as such the quirk is never applied.

SFP_QUIRK_F("HUAWEI", "S800E", sfp_fixup_long_startup),

Ideas?

I think we are now facing the same issue where the EEPROM just isn’t available. Both frank-w and ericwoud have suggested adding a delay during the module initialization. If you’re keen to try that, can you modify sfp.c to add this snippet after the quirk fixup?

In sfp_sm_mod_probe in linux/drivers/net/phy/sfp.c (torvalds/linux on GitHub, commit 059dd502b263d8a4e2a84809cf1068d6a3905e6f):

	if (sfp->quirk && sfp->quirk->fixup)
		sfp->quirk->fixup(sfp);

+	dev_warn(sfp->dev, "hack: applying slow gpon\n");
+	sfp->module_t_start_up = T_START_UP_BAD_GPON;

	sfp->state_hw_mask &= ~sfp->state_ignore_mask;
	mutex_unlock(&sfp->st_mutex);

This should force the quirk to be applied regardless of whether the EEPROM is available. If it is applied, you should hopefully see the “hack:…” message in your dmesg.

(I’m also surprised that your sfp still works without a moddef0 gpio. No idea how that is happening but it’s good that there’s power and modules are still getting probed)

Slow Sunday here, so…

I removed the lines below, in addition to the moddef0 gpio, from the device tree:

		tx-disable-gpios = <&pio 70 GPIO_ACTIVE_HIGH>;
		tx-fault-gpios = <&pio 69 GPIO_ACTIVE_HIGH>;

With the above, and with moddef0 manually driven low, when I boot up with a DAC cable in sfp1, the module shows up and the EEPROM of the DAC cable is read correctly.

sfp sfp1: No tx_disable pin: SFP modules will always be emitting.

At this point, if I swap out the DAC cable for the S800E, it just works without further kernel messages/errors. The link is up and everything works as it should. But ethtool -m eth2 shows nothing, so the EEPROM is not probed/loaded even though the module works.

ethtool -m eth2
netlink error: No such device or address
ethtool eth2
Settings for eth2:
	Supported ports: [  ]
	Supported link modes:   2500baseX/Full
	                        1000baseX/Full
	                        10000baseCR/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10000baseCR/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: on
	Port: Other
	PHYAD: 0
	Transceiver: internal
        Current message level: 0x000000ff (255)
                               drv probe link timer ifdown ifup rx_err tx_err
	Link detected: yes
cat /sys/kernel/debug/sfp1/state
Module state: present
Module probe attempts: 0 0
Device state: up
Main state: link_up
Fault recovery remaining retries: 5
PHY probe remaining retries: 12
Signalling rate: 10313 kBd
Rate select threshold: 0 kBd
moddef0: 1
rx_los: 0
tx_fault: 0
tx_disable: 0
rs0: 0
rs1: 0

However, if I do a reboot, the same error comes back.

sfp sfp1: please wait, module slow to respond

…further down the bootlog

sfp sfp1: failed to read EEPROM: -ENXIO
cat /sys/kernel/debug/sfp1/state
Module state: error
Module probe attempts: 10 12
Device state: up
Main state: down
Fault recovery remaining retries: 0
PHY probe remaining retries: 0
Signalling rate: 10313 kBd
Rate select threshold: 0 kBd
moddef0: 1
rx_los: 0
tx_fault: 0
tx_disable: 1
rs0: 0
rs1: 0

I will patch a build with a default startup delay as suggested and find time/opportunity to test. The S800E XGSPON is my main fiber connection on another setup; the bpi-r4 is more of a test/lab device.

You have tx disable set to 1, so the other side does not get link and so you get none too
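
To compare the working and failing states quickly, a single field can be pulled out of the debugfs dump. A minimal sketch, assuming the "name: value" layout shown in the state output above; the function name sfp_state_field is hypothetical.

```shell
#!/bin/sh
# Print the value of one "name: value" line from an sfp debugfs state file.
# $1 = field name exactly as printed (e.g. tx_disable), $2 = state file.
sfp_state_field() {
    # Split each line on ": " and print the value of the first match.
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }' "$2"
}
```

For example, sfp_state_field tx_disable /sys/kernel/debug/sfp1/state would print 1 for the failing state above and 0 for the working hot-swap state.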

applied your suggested default slow gpon patch…

During boot up, for sfp2 the EEPROM is displayed and then the kernel warning “hack: applying slow gpon” is shown (which means the patch is indeed applied). But for sfp1 it’s still the same “please wait, module slow to respond”, then further down “failed to read EEPROM: -ENXIO”. sfp1 does not display the slow gpon warning, meaning the EEPROM has to be read first.

so no go on that front.

The patch set “Allow slow to initialise GPON modules to work” gives a bit more detail on the boot process and on reading the EEPROM of the SFP: https://lore.kernel.org/all/[email protected]/T/

I tried increasing the slow probe interval from 5 s to 10 s and the retries from 12 to 50.

#define T_PROBE_RETRY_INIT     msecs_to_jiffies(100)
#define R_PROBE_RETRY_INIT     10
#define T_PROBE_RETRY_SLOW     msecs_to_jiffies(5000)
#define R_PROBE_RETRY_SLOW     12

This is what is referred to in the probe attempts 10 and 12.

cat /sys/kernel/debug/sfp1/state
Module state: error
Module probe attempts: 10 12

But even with the increased probe time and retries, it still failed to read the EEPROM. I can actually see the probe attempts number increasing to 50 before it fails.

The issue now seems to me to be the actual probing of the EEPROM (wrong address? wrong EEPROM response?). Maybe we are looking in the wrong place? Maybe an i2c issue on sfp1?
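
For a sense of scale, the total time the driver keeps retrying can be estimated from those four constants. This is a rough upper bound only: it assumes every counted retry is followed by a full wait, which may be off by one wait per phase depending on how the state machine counts. The function name probe_window_s is made up for this sketch.

```shell
#!/bin/sh
# Rough upper bound, in seconds, on how long the EEPROM probe keeps retrying.
# $1 = fast retries, $2 = fast interval (ms),
# $3 = slow retries, $4 = slow interval (ms)
probe_window_s() {
    echo $(( ($1 * $2 + $3 * $4) / 1000 ))
}

probe_window_s 10 100 12 5000    # stock values          -> 61
probe_window_s 10 100 50 10000   # 50 retries at 10 s    -> 501
```

So the modified build waits on the order of eight minutes for the EEPROM, which makes a slow boot of the module an unlikely explanation on its own.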

So just by removing the moddef0 gpio from the device tree and manually pulling the pin low, I was able to provide power to the S800E. For testing, if the system is booted with a DAC cable in sfp1 and everything is detected correctly, quickly hot swapping to the S800E works. But a reboot brings back the same “failed to read EEPROM” error.

ideas?

just installed i2c-tools to troubleshoot

I can detect and dump i2c eeprom from a DAC cable connected on SFP1

i2cdetect -y 3
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: 50 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: UU -- -- -- -- -- -- -- 

i2cdump -y 3 0x50
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 03 04 23 01 00 00 04 41 84 80 d5 00 67 00 00 00    ??#?..?A???.g...
10: 00 00 01 00 55 62 69 71 75 69 74 69 20 49 6e 63    ..?.Ubiquiti Inc
20: 2e 20 20 20 00 24 5a 4c 44 41 43 2d 53 46 50 31    .   .$ZLDAC-SFP1

However, I get nothing with an S800E in the same port.

i2cdetect -y 3
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: UU -- -- -- -- -- -- --

i2cdump -y 3 0x50
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX    XXXXXXXXXXXXXXXX
10: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX    XXXXXXXXXXXXXXXX
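
The comparison between the two scans can be scripted so it is easy to repeat after each boot or hot swap. A small sketch that reads i2cdetect -y output on stdin and reports whether a given address answered; the function name i2c_addr_present is hypothetical, and it relies only on the table layout shown above, where a responding address is printed as its hex value.

```shell
#!/bin/sh
# Succeed (exit 0) if the i2cdetect table on stdin shows a device
# responding at the given two-digit hex address, e.g. 50.
i2c_addr_present() {
    awk -v a="$1" '
        # Only look at table rows ("00:" .. "70:"), not the header line.
        /^[0-7]0:/ { for (i = 2; i <= NF; i++) if ($i == a) found = 1 }
        END { exit !found }
    '
}
```

Usage would be along the lines of: i2cdetect -y 3 | i2c_addr_present 50 && echo "EEPROM answers at 0x50". Note that an address claimed by a kernel driver is printed as UU and is deliberately not counted as a match here.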

Any ideas what’s happening here, and what the next steps for troubleshooting are?

You need a circuit diagram.

Some sfp’s use a real eeprom at 0x50, so try to locate one on the module. Likely it is not there.

Otherwise the module needs to be powered first, if it is not already.

Maybe removing R106 was not such a great idea…

Is the module still functional?

I did not remove R106. As stated above, I removed the moddef0 gpio from the sfp1 device tree node, then manually changed the pin direction to out and pulled it low. This was the manual way to provide power to the SFP. Details are a few scrolls up. Yes, the module is fully functional if I hot swap, but it fails when I do a reboot. The S800E is providing a 10G fiber connection on another setup as my primary internet connection.

my next step is to telnet or ssh into the S800E and see what’s up with it.