BPI-R3 SFP Module compatibility

Found patch here: net: sfp: add quirk for DFP-34X-2C2 GPON ONU SFP - Patchwork

Forgot to mention that new versions need a version increasing…you get it when using format-patch with -v flag (-v2 creates version 2 where patches named v2-000*.patch),sorry for telling this.

Also noted that you have a indent beginning with line 2 of message and a typo in “enabled”. Have you make sure your message has max 75 chars per line (checkpatch should report this)?

I guess you will need v3. Don’t worry,nobody is perfect…i made patches with more than 8 versions :slight_smile:

Else once patch is out,wait for comments and fix the problems told you,if unsure ask how you should fix it (but after thinking about it)

Hi Frank, the 75 chars is ok. I saw the typo ater the patch was sent. I think the name of the patch is different from the one posted today. Is it the warning related to the [PATCH] instead of [PATCH net-next] in the subject? In that case what is needed to do? Edit the patch file changing typo and adding net-next and sending? or restart from recreation the patch with git format-patch ?

You should always change the commit and recreate the atch as described above,but first wait for responses of your v2 patch…i would like to know if the comment line is ok or should be dropped

For length…if your commits headline is below 75chars it is ok…the [Patch …] Is dropped when patch is applied…it will also contain the version when you add it…it makes it easier to see changes and drop the old versions from list

As you got response from russel,you can prepare changes he suggested…but i woult wait at least 24h for others to comment

You can set the tree (net-next here) by providing --subject-prefix (iirc,maybe do “git format-patch --help”)

If you post new version please add a version log after the tags (below signed-off-by atm) separated by — line

Something like this:

Signed-off-by: Sergio Palumbo <...>
---
Changes:
V3:
- fix identation of commit message
- add target tree (net-next)
- describe 1000 vs 2500baseX switch
V2:
- add commit message and SoB
- add subsystem prefix
---
 drivers/net/phy/sfp.c | 3

Hi Frank, I got comments from Russel and now is clear changes I have to do. Unfortunately there are now 2 new variables:

  1. I discovered that I have two different versions of the modle DFP-34X-2C2 one with vedor ID “ODI” and another with vendor ID “OEM” so I already compiled a new version of the OWRT for Banana PI R3 with the patch to accomodate both vendor IDs and is correctly working with the 2 modules.
  2. the net-next is closed due to the cycle and Russel told that this it should be the correct tree where to submit the patch. As far as the point 1. Do I need to create a completely new Patch with V1? or do I have to create a V2 includiing also the second? As far as pooint 2 better to wait the net-next is opened again or ppost the new patch in the net tree? When sending a patch with a new version will the patchwork create a new record? Thanks Sergio

1 quirk per patch, you can create a series, but imho it is easier to send them both alone (new one as v1, the old as v3 when changed the things suggested) as they do not depend on each other, right?

no as russel told you, new features are net-next (this is a special handling for the networking subsystem as it is very large subsystem with much patches not clearly fixes or feature-updates).

yes, wait for net-next to be opened…it should be only ~1 week (Monday in 1 week, 2 weeks after release of 6.7 final where the current next-trees are merged into torvalds as 6.8-rc1).

but you can test how to get the net-next-flag in (afair git format-patch --subject-prefix), or do you have already?

Frank, Daniel, can you please help me in the discussion with net-dev group and Russell who does not want to accept the patch for the two modules because he is saying that it has to be tested in case it is used with an linux host working only at 1000base-X and the quirk installed. I tested with Banana PI R3 without the patch and module showing up to linux at 1000base-X only and setting speed at 1000 Mb. I tested with Banana PI R3 with the patch and module showing up at both 1000base-X and 2500base-X setting speed at 2500 Mb I think this is same situation of Huawei MA5671A and Fiberstone GPON-ONU-34-20BI. This two modules require “sfp_quirk_2500basex” and “sfp_fixup_ignore_tx_fault” quirks, while DFP-34X-2C2 only requires “sfp_quirk_2500basex” Is there any possibility that after the quirk there will be any situation where there can be problems? I’m really confused, but I’m sure the patch is working and I tested in all situation I could. Thanks for your help. Sergio

If i understood correctly the module now works in both modes, before it was only 1000base-X, right?

https://patchwork.kernel.org/project/netdevbpf/patch/AS1PR03MB8189AD85CEB6E139F27307D3827F2@AS1PR03MB8189.eurprd03.prod.outlook.com/#25704971

russel still want to avoid that this quirk will be changed another time so it should include all modes the SFP supports. i do not have a ONT SFP and not know much about them, so except the commit-Message itself i cannot help much…when posting to mainline please always refer to mainline code, not openwrt.

for commit-message you still need repost because of the unwanted indentation. Then write which modes the SFP supports from datasheet and maybe which modes you have tested. Together with the main problem (sfp is not working because it needs mode xyz not set in eeprom) in front imho it should be OK.

only one problem i read out of russels comments:

it will cause these modules to regress when they are in the manufacturer default state when used with a host that supports both 1000base-X and 2500base-X

so maybe you now force the SFP to 2500Base-X so it does not work with 1000Base-X which is a regression for other users, i’m not sure, if i understand it correctly and if you can add multiple modes.

Hi Frank, I tested on Banana PI R3 host capable of both 1000-X and 2500-X without the quirk the system was working at 1000-X with both settings LAN_SDS_MODE=1 (1000-X) and LAN_SDS_MODE=6 (2500-X). The module was not showing up at 2500-X. After the quirk the module started showing up at 2500-X. The system was working at 2500-X with both settings LAN_SDS_MODE=1 (1000-X) and LAN_SDS_MODE=6 (2500-X). Being BPI R3 working at both 1000-X and 2500-X after the quirk is always working at 2500-X independently by the settings in the module. I do not have a machine with an sfp working at maximum 1000-X to test if after the quirk the module can work at 1000-X I tried to change the speed usign ethtool but when trying to change speed I get an error message, However we had a lot of messages on this with Russell and at the end he asked me:

Hi Sergio,

I did ask for the kernel messages from a specific scenario:

- host that supports 1000base-X and 2500base-X with your quirk
- SFP inserted with LAN_SDS_MODE=1

What I expet to see in the kernel messages is that the system will
use 2500base-X, and a failure.

You claim that the kernel will link at 1000base-X. There is no
mechanism in the kernel for this to happen, and I believe that
if you look at the kernel messages, this will prove my point.
I asked for it with a kernel that has  asked for it with a kernel that has #define DEBUG in phylink.c, but I see no debug messages from phylink in your quoted output. 

Unfortunately I do not know how to do the test with a kernel that has #define DEBUG in phylink.c. I asked help on how to do it in openwrt and started saying openwrt different from the main etc… Do you think I can do the test with this debug? Any idea on how to test an host having 1000-X only? I’m quite sure that the other 2 SFP GPON ONT already quirked are having same behavior of mine, but cannot be sure. Any comment is welcome.

Have you tried adding the define to phylink.c in your openwrt code? Afair it needs to be before the includes as one of them enables the dev_dbg.

For 1000baseX only host,maybe you can change mtk mac driver to drop 2500baseX mode on mt7986 temporarily :stuck_out_tongue:

OK now copilig with #define DEBUG on top in

/build_dir/toolchain-aarch64_cortex-a53__gcc-112.30_musl/linux-5.15.137/drivers/net/phy/phylink.c

Let’s see what will happen.

So the real question, if the module should work at 2500basex and 1000basex (somehow it sorts out which to use), then why doesn’t it connect at 2500basex without quirk? What exactly happens / is set or cleared in the quirk?

Before there was someone with a module here on the forum, that switched between the 2 interface modes, until it found out which to use. How does your module work?

Also keep in mind, that the R3 does not support inband auto-negotiation at 2500basex, but it does at 1000basex. However, phylink still does not know that the pcs/mac doesn’t support it at 2500basex, so has it enabled. This causes problems.

Since these modules connect as optical modules, the autoneg set with ethtool only applies to the autoneg between module and mac. So inband an can be disabled with ethtool.

I have been working on a phylink patch to handle this, but it also isn’t going to be accepted upstream. This case is much more complex and my simple hacky patch will not be good enough for all hardware.

If at all this is the problem here…

Hello Eric, I think the problem is that the EEPROM does not provide to linux the correct info and the module without the quirk is showing up at 1000basex only:

root@OpenWrt:~# ethtool eth1
Settings for eth1:100
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseX/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  1000baseX/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  1000baseX/Full
        Link partner advertised pause frame use: Symmetric Receive-only link modes:
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Current message level: 0x000000ff (255)
                               drv probe link timer ifdown ifup rx_err tx_err
        Link detected: yes

with the quirk the module is showing up at both 2500basex and 1000basex

root@OpenWrt:~# ethtool eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   2500baseX/Full
                                1000baseX/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  2500baseX/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 2500Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Current message level: 0x000000ff (255)
                               drv probe link timer ifdown ifup rx_err tx_err
        Link detected: yes

After the quirk the module always connecting at 2500basex even if I try to set the module at 1000X

Tried to decrease speed to 1000 by using ethtool:

root@OpenWrt:~# ethtool -s eth1 speed 1000
netlink error: link settings update failed
netlink error: Invalid argument

it seems not to be possible.

I am not very familiar with ont’s, but you also have settings in the ont. What it reports as ‘eeprom’ is not so static on these things.

What is exactly in the quirk that you point it to?

Which code and what does it set/clear?

The quirk is not a areal patch to the codeand I do not know how it works at low level. There is a place in drivers/net/phy/sfp.c file where you can declare vendor and part number of the module and to force 2500base-X. There is a list of the modules and the possible rules to be applied and the patch is consisting in adding to the list the above mentoned paramters for a new module. I did it and the module is working, but it seems regression is needed in order to besure that who is currently using the module without the quirk is not suffering problems when the quirk is applied. Hope this clarifies.

Which source do you use? There are many different versions. And what is the code in sfp_quirk_2500basex() in this source?

This is the surce in sfp.c in dev-next:

static void sfp_quirk_2500basex(const struct sfp_eeprom_id *id,
				unsigned long *modes,
				unsigned long *interfaces)
{
	linkmode_set_bit(ETHTOOL_LINK_MODE_2500baseX_Full_BIT, modes);
	__set_bit(PHY_INTERFACE_MODE_2500BASEX, interfaces);
}

This is the code used for the quirk in BPI R3:

static void sfp_quirk_2500basex(const struct sfp_eeprom_id *id,
				unsigned long *modes)
{
	linkmode_set_bit(ETHTOOL_LINK_MODE_2500baseX_Full_BIT, modes);
}

They seems to be same code

Then doesn’t the ont have a config option to set it to reporting 2500 speed capability?

That is the only thing the quirk does.

FIBER_MODEoption forced 1g?

Carefully may lock you out…

The module has a config option to be used for forcing at 2500, but still the EEPROM is not reporting the rigth speed and, as far as I know, Linux is not negotiating 2500. on the contrary using the quirk, I do not know why the modle is running at 2500 even if the module is configured for 1000. This is it. I suppose the quirk forces 2500 even if the EEPROM is not reporting the 2500 speed. I do not think it is only for fiber. The quirk is also used tor some copper sfp not running in FIBER MODE.