[BPIR64] PCIE Port0 link down

I am trying to use my new MT7921 M.2 card on the BPI-R64. I have it mounted on an mPCIe to M.2 adapter. In my very old laptop the card is detected, although I have not tested anything beyond detection there. On the R64 the card is not detected. I’ve even tried extending the poll timeout from 100ms to 5000ms here:

/* 100ms timeout value should be enough for Gen1/2 training */
	err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_V2, val,
				 !!(val & PCIE_PORT_LINKUP_V2), 20,
				 100 * USEC_PER_MSEC);

But even waiting 5 seconds, the Port0 link does not come up.
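To show why a longer timeout cannot help when the link-up bit never gets set, here is a simplified user-space model of that polling loop. It is only a sketch, not the kernel's readl_poll_timeout implementation: fake_port, fake_read_status, poll_link_up and the LINKUP_BIT position are made up for illustration, and the clock is simulated rather than real.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define USEC_PER_MSEC 1000UL
#define LINKUP_BIT    (1u << 0)	/* hypothetical bit position, model only */

/* Hypothetical stand-in for a port whose link may come up after a delay. */
struct fake_port {
	uint64_t now_us;	/* simulated clock */
	uint64_t link_up_at_us;	/* UINT64_MAX: the link never trains */
};

/* Simulated status register read. */
static uint32_t fake_read_status(const struct fake_port *p)
{
	return p->now_us >= p->link_up_at_us ? LINKUP_BIT : 0;
}

/*
 * Poll the status every sleep_us until LINKUP_BIT is set or timeout_us
 * has elapsed. Returns 0 on link up, -ETIMEDOUT otherwise.
 */
static int poll_link_up(struct fake_port *p, uint64_t sleep_us,
			uint64_t timeout_us)
{
	uint64_t deadline = p->now_us + timeout_us;

	assert(sleep_us > 0);
	for (;;) {
		if (fake_read_status(p) & LINKUP_BIT)
			return 0;
		if (p->now_us >= deadline)
			return -ETIMEDOUT;
		p->now_us += sleep_us;	/* simulated sleep */
	}
}
```

In this model, a port that trains after 50ms passes with the original 100ms timeout, while a port that never trains times out with 100ms and with 5000ms alike. That matches the roughly 5 second gap before each "link down" line in the log.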

[    0.983923] mtk-pcie 1a143000.pcie: host bridge /pcie@1a143000 ranges:
[    0.990755] mtk-pcie 1a143000.pcie: Parsing ranges property...
[    0.996680] mtk-pcie 1a143000.pcie:      MEM 0x0020000000..0x0027ffffff -> 0x0020000000
[    1.005242] mtk-pcie 1a143000.pcie: Port0 Executing startup!!! (added myself)
[    6.122597] mtk-pcie 1a143000.pcie: Port0 link down
[    6.127754] mtk-pcie 1a143000.pcie: PCI host bridge to bus 0000:00
[    6.133990] pci_bus 0000:00: root bus resource [bus 00-ff]
[    6.139485] pci_bus 0000:00: root bus resource [mem 0x20000000-0x27ffffff]
[    6.146368] pci_bus 0000:00: scanning bus
[    6.151900] pci_bus 0000:00: fixups for bus
[    6.156109] pci_bus 0000:00: bus scan returning with max=00
[    6.162123] mtk-pcie 1a145000.pcie: host bridge /pcie@1a145000 ranges:
[    6.168702] mtk-pcie 1a145000.pcie: Parsing ranges property...
[    6.174548] mtk-pcie 1a145000.pcie:      MEM 0x0028000000..0x002fffffff -> 0x0028000000
[    6.182927] mtk-pcie 1a145000.pcie: Port1 Executing startup!!!
[   11.296227] mtk-pcie 1a145000.pcie: Port1 link down
[   11.301360] mtk-pcie 1a145000.pcie: PCI host bridge to bus 0001:00
[   11.307564] pci_bus 0001:00: root bus resource [bus 00-ff]
[   11.313058] pci_bus 0001:00: root bus resource [mem 0x28000000-0x2fffffff]
[   11.319939] pci_bus 0001:00: scanning bus
[   11.325461] pci_bus 0001:00: fixups for bus
[   11.329665] pci_bus 0001:00: bus scan returning with max=00

The card is in the CN25 slot. I’ve even tried the wifi card from my old laptop in the R64, with exactly the same result, link down.

I have soldered an LED (with a current-limiting resistor, of course) onto the 3.3V of the adapter, so I have checked that the 3.3V switches on just before the (now visible) 5 second wait. (It is also switched on briefly when I connect power.)

I have tried other images, because I suspected that not running u-boot could cause a difference in some register somewhere, but without success. I tried the latest Debian image from Frank, and the earliest v4.4.92 Debian version I could find on the wiki page. All the same: Port0 link down.

I’ve looked in the schematics, the PERST signal is being used to switch on the 3.3V supply. I wonder if every card will like this? This should be the signal indicating to the card that supply and clock are stable right?

Does anyone have any suggestions? Please be a little patient, I don’t always have time to try something new immediately.

Edit:

I’ve checked that neither the M.2 adapter nor the old wifi card from the laptop (AR9285) has any of the 5V pins connected. On both, those pins are not connected to anything.

Also, u-boot on Frank’s latest Debian image finds nothing when I type pci enum.

My guess is because the card processor is not running on the always active 5v, that the card never sees the rising edge of PERST and therefore does not initiate link training? But I am no expert on this topic.

@frank-w: Is the short described here: [BPI-R64] PCIe issues a short between IN 3.3V and OUT 3.3V ?

If so, I think I could take the wire that now runs from the adapter’s 3.3V to my LED and connect it to the 3.3V on the GPIO header instead, thereby creating the same short between IN and OUT. The card should then be supplied with 3.3V all the time and should be able to see the rising edge on the PERST pin.

AFAIK it shorts R220 between VIN and PERST0N, not IN to OUT (of the switch chip).

Yes, it does short in-to-out.

Do note that one side effect of the short is losing the ability to soft-boot. On a soft-boot the system hangs at PCI initialization, for the very reason that the card can’t be shut down any more. The board boots as normal on a power cycle.

Thanks for the info. The card should reset when the PERST_N pin goes low again as the driver starts up the port, also at a reboot. I am curious what is going wrong there.

It is a strange design anyway. If a card is powered only by 3.3V, and the 3.3V gets switched on by PERST_N, then by the time the card’s processor starts running it would always be too late to see the rising edge of PERST_N. I am no PCIe expert, but this definitely sounds wrong to me.

Well, I applied 3.3V from the gpio to the card, via the adapter.

Sadly still link down… On all 3 images mentioned earlier.

Why is the card not recognised by the BPI-R64?

Is the card (still) working on another board?

Yes, it is still recognised in my old laptop.

I’m looking at pci-mediatek.c, trying to find out what goes on and perhaps how to debug it. I am trying to find out which register to read for an error status. It looks like it should be at 0x5B0?

I find that, of the PCIe V2 registers, the following do not match the datasheet at all. The others do seem to match.

/* PCIe V2 per-port registers */

#define PCIE_CONF_VEND_ID	0x100
#define PCIE_CONF_DEVICE_ID	0x102
#define PCIE_CONF_CLASS_ID	0x106

But they do get written to just like the other registers:

writel(0, port->base + REGISTER);

From the same port->base.

According to the datasheet, VEND_ID should be at 0x4F0, and 0x100 is undocumented.

This makes it very hard to figure out what is going wrong, if indeed this is not a bug…

((VEND_ID and DEVICE_ID are set up before waiting for the link to come up, so I assume they are needed for successful link training.))

Edit: Found that PCIE_MSI_VECTOR is used differently, so I’ve removed that one from the list. This leaves VEND_ID and DEVICE_ID, which according to the datasheet are at 0x4F0, but it seems they are also at 0x100. So this checks out OK.

Anyway, I guess I wrote too soon. That part is not a bug.

Guess now I should try and see if I can get a status from the 0x5B0 register.

Too bad the 0x804 register is not documented. It has link status up/down and what else?

Also, the gen3 driver adds the following to the pci_setup() routine:

	/*
	 * The controller may have been left out of reset by the bootloader
	 * so make sure that we get a clean start by asserting resets here.
	 */
	reset_control_assert(pcie->phy_reset);
	reset_control_assert(pcie->mac_reset);
	usleep_range(10, 20);

It would not hurt to try this too… Add a delay between here:

	reset_control_assert(port->reset);
	reset_control_deassert(port->reset);

OK, here is the resulting status:

[    0.983523] mtk-pcie 1a143000.pcie: host bridge /pcie@1a143000 ranges:
[    0.990357] mtk-pcie 1a143000.pcie: Parsing ranges property...
[    0.996280] mtk-pcie 1a143000.pcie:      MEM 0x0020000000..0x0027ffffff -> 0x0020000000
[    1.114361] mtk-pcie 1a143000.pcie: Port0 Executing startup try 3!!!
[    1.227670] mtk-pcie 1a143000.pcie: REGISTER 0x5B0 = 0x8108 !!!
[    1.433632] mtk-pcie 1a143000.pcie: REGISTER 0x5B0 = 0x8121 !!!
[    1.439564] mtk-pcie 1a143000.pcie: Port0 link down
[    1.444692] mtk-pcie 1a143000.pcie: PCI host bridge to bus 0000:00
[    1.450888] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.456382] pci_bus 0000:00: root bus resource [mem 0x20000000-0x27ffffff]
[    1.463264] pci_bus 0000:00: scanning bus
[    1.468755] pci_bus 0000:00: fixups for bus
[    1.472953] pci_bus 0000:00: bus scan returning with max=00
[    1.478972] mtk-pcie 1a145000.pcie: host bridge /pcie@1a145000 ranges:
[    1.485554] mtk-pcie 1a145000.pcie: Parsing ranges property...
[    1.491400] mtk-pcie 1a145000.pcie:      MEM 0x0028000000..0x002fffffff -> 0x0028000000
[    1.607657] mtk-pcie 1a145000.pcie: Port1 Executing startup try 3!!!
[    1.719631] mtk-pcie 1a145000.pcie: REGISTER 0x5B0 = 0x8108 !!!
[    1.925590] mtk-pcie 1a145000.pcie: REGISTER 0x5B0 = 0x8108 !!!
[    1.931520] mtk-pcie 1a145000.pcie: Port1 link down
[    1.936600] mtk-pcie 1a145000.pcie: PCI host bridge to bus 0001:00
[    1.942794] pci_bus 0001:00: root bus resource [bus 00-ff]
[    1.948288] pci_bus 0001:00: root bus resource [mem 0x28000000-0x2fffffff]
[    1.955172] pci_bus 0001:00: scanning bus
[    1.960633] pci_bus 0001:00: fixups for bus
[    1.964827] pci_bus 0001:00: bus scan returning with max=00

At least I can now see that the LTSSM is doing something with the card. On Port0 there is a status change and on Port1 there is not, because there is no card in Port1. So far this seems to work.

The changed status seems to indicate:

LTSSM state: 001: Polling

Bit 3 changed to 0: Non-Electrical Idle

Bit 5 changed to 1: Elastic buffer overflow at least once
PIPE: RxStatus[2:0]=101

Now figure out what this means…
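As a cross-check on which bits actually moved between the two Port0 readings, the values can simply be XOR-ed. The sketch below is plain user-space C, not driver code, and the helper names (changed_bits, bit_value, print_changes) are made up for illustration. It only reports which bits differ and makes no assumption about what the fields mean; that still has to come from the datasheet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bits that differ between two snapshots of the same register. */
static uint32_t changed_bits(uint32_t before, uint32_t after)
{
	return before ^ after;
}

/* Value (0 or 1) of a single bit in a register snapshot. */
static unsigned int bit_value(uint32_t reg, unsigned int bit)
{
	assert(bit < 32);
	return (reg >> bit) & 1u;
}

/* Print every bit that changed, with its old and new value. */
static void print_changes(uint32_t before, uint32_t after)
{
	uint32_t diff = changed_bits(before, after);

	for (unsigned int bit = 0; bit < 32; bit++)
		if (diff & (UINT32_C(1) << bit))
			printf("bit %u: %u -> %u\n", bit,
			       bit_value(before, bit), bit_value(after, bit));
}
```

Calling print_changes(0x8108, 0x8121) with the two Port0 readings reports bit 0 going 0 -> 1, bit 3 going 1 -> 0 and bit 5 going 0 -> 1. Bits 3 and 5 agree with the list above; bit 0 changes as well.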

Adding extra delay after reset_control_assert(port->reset) does not help at all.

Maybe this is why MTK suggested the capacitors? Do you have them in or not?

Since they use large electrolytic capacitors, I assume they are meant to increase the capacitance of C264, on the 3.3V supply going to the card? That would be to flatten the high current spikes.

I could add one on the adapter if this is the case. The adapter actually already has 4 small capacitors between 3.3V and GND. I’ll measure the total capacitance.

Sadly, the reference documentation also has no description of the PCI subsys registers at 0x1a140000, the port shared registers. Nothing at all. The driver is writing to register 0 (PCIE_SYS_CFG_V2).

Ok, I have added a 100uF capacitor on the m.2 to mpcie adapter, to stabilise the 3.3V, in combination with the 3.3V short of the SY6280 switch. Charging the capacitor will put that switch in overload anyway.

But again, link down. Status register 0x5B0 shows the same value as without the capacitor.

Anyway, that was a solution for the current limit, which occurred only when the card was actually being used.

This seems to be a completely different problem.

I’ve also checked the 0x5B0 register with my old AR9285 wifi card inserted. Then it does not even get as far as changing the register. Before and after the poll wait for link up it reads the same value, the same as for an empty PCIe slot.

Just to make sure nothing is broken on my r64, I’ve tried the old wifi card (AR9285) in my other r64, also link down.

Is there any way to force the controller to choose the lowest speed only?