[BPI-R64] PCIe issues

Hi Jasmin, thank you for the patches. I have applied them to the new 5.4-pcie branch.

Can anybody who has issues test it (maybe with a Coral card)?

I hope this patch does not break the BPI-R2 (my 5.4 branch is for both boards).

Hi Frank,

thanks for applying the patches… I am curious whether they work in more setups and with more boards.

However, two PCIe issues I discovered still remain:

a) Compex WLE1216V5-20 is not recognized at all:

mtk-pcie 1a140000.pcie: Port0 link down

I have already tried increasing the training time in mtk-pcie, but it did not help. I also used k_gbl_speed_sup (see MT7622) to restrict the link speed, which did not help either. Perhaps it is related to this:

I am now trying to patch mtk-pcie accordingly…

b) Exar XR17V35X serial controller (connected via an mPCIe -> PCIe x1 adapter) cannot receive data over the serial line. This serial controller card (including the adapter) works perfectly with two other boards I have tested (same kernel version). The card uses MMIO to map an 8250-compatible register layout into memory. When data is received over the serial line, an interrupt is fired (both INTx and MSI work with the patches above), and the receive function of the serial driver is called correctly. It then reads the MMIO BAR to determine the reason for the interrupt. If data was received over the serial line, a corresponding bit should be set in MMIO to signal this as the reason for the interrupt, but this bit is NOT set in MMIO space.

So I guess that the incoming MMIO transfer from the serial card to memory does not work properly, but “lspci -vv” does not show SERR or PERR:

 01:00.0 Serial controller: Exar Corp. Device 8358 (rev 03) (prog-if 02 [16550])
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 140
    Region 0: Memory at 20000000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
            Address: 0000000040a350c0  Data: 0000
    Capabilities: [78] Power Management version 3
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
            Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [80] Express (v2) Endpoint, MSI 01
            DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                    ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                    RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                    MaxPayload 128 bytes, MaxReadReq 512 bytes
            DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
            LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                    ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                    ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                     Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                     Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                     EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Virtual Channel
            Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
            Arb:    Fixed- WRR32- WRR64- WRR128-
            Ctrl:   ArbSelect=Fixed
            Status: InProgress-
            VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                    Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                    Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                    Status: NegoPending- InProgress-
    Kernel driver in use: exar_serial
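The interrupt-reason check described before the lspci output can be sketched in plain C. This is only a minimal userspace illustration of how a 16550-compatible receive path typically decides whether data is waiting, not the actual exar_serial code. The bit masks follow the standard 16550 register layout; the `uart_regs` snapshot struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard 16550 bit definitions. */
#define UART_IIR_NO_INT 0x01 /* IIR bit 0 set: no interrupt pending */
#define UART_LSR_DR     0x01 /* LSR bit 0: received data ready */

/* Hypothetical snapshot of the two registers a receive handler reads via MMIO. */
struct uart_regs {
    uint8_t iir; /* Interrupt Identification Register */
    uint8_t lsr; /* Line Status Register */
};

/* True if the UART itself reports a pending interrupt. */
bool uart_irq_pending(const struct uart_regs *r)
{
    assert(r);
    return (r->iir & UART_IIR_NO_INT) == 0;
}

/* True if the UART reports at least one received byte waiting. */
bool uart_rx_ready(const struct uart_regs *r)
{
    assert(r);
    return (r->lsr & UART_LSR_DR) != 0;
}

/*
 * The symptom from this thread: the interrupt handler ran, but the
 * value read back over MMIO does not show any received data.
 */
bool rx_irq_without_data(bool irq_fired, const struct uart_regs *r)
{
    return irq_fired && !uart_rx_ready(r);
}
```

If `rx_irq_without_data()` is true on every interrupt while characters are arriving on the line, the interrupt path works and the MMIO read path is the suspect, which matches the hypothesis above.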

After receiving some characters, the PCIE_INT_STATUS register at 0x1A143424 (see MT7622 manual) reads:

[ 1032.661583] mtk_pcie_intr_handler: PCIE_INT_STATUS reg 0x1A143424 val 0x10800000
[ 1032.668885] mtk_pcie_intr_handler: mpci_cfg_soft=0, pcie_rc_l2_wake=0, legacy_pm_chg=0, ltr_en=1, ltr_msg=0, cpu_active=0, obff=0, obff_idle=0
[ 1032.668889] mtk_pcie_intr_handler: msi=1, aer_event=0, pm_hp_event=0, serr=0, intd=0, intc=0, intb=0, inta=0
[ 1032.682006] mtk_pcie_intr_handler: UNUSED=0, b2p_discard=0, b2p_rerr=0, p2b_wr_win2=0, p2b_wr_win1=0, p2b_wr_win0=0, p2b_rerr=0, p2b_werr=0
[ 1032.692259] mtk_pcie_intr_handler: UNUSED=0, rdma_berr=0, rdma_perr=0, rdma_end=0, UNUSED=0, wdma_berr=0, wdma_perr=0, wdma_end=0
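The value 0x10800000 can be decoded by hand as a cross-check. The sketch below assumes the bit positions implied by the field order in the log above (first field = bit 31, so ltr_en = bit 28, msi = bit 23, INTA..INTD = bits 16..19). These positions are inferred from the log, not copied from the MT7622 manual, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions inferred from the log's field order (assumption). */
#define INT_STATUS_LTR_EN     (1u << 28)
#define INT_STATUS_MSI        (1u << 23)
#define INT_STATUS_AER        (1u << 22)
#define INT_STATUS_SERR       (1u << 20)
#define INT_STATUS_INTX_SHIFT 16
#define INT_STATUS_INTX_MASK  (0xfu << INT_STATUS_INTX_SHIFT)

struct int_status {
    bool ltr_en;
    bool msi;
    bool aer_event;
    bool serr;
    unsigned intx; /* bit 0 = INTA ... bit 3 = INTD */
};

/* Split a raw PCIE_INT_STATUS value into the fields of interest. */
void decode_int_status(uint32_t val, struct int_status *out)
{
    assert(out);
    out->ltr_en    = (val & INT_STATUS_LTR_EN) != 0;
    out->msi       = (val & INT_STATUS_MSI) != 0;
    out->aer_event = (val & INT_STATUS_AER) != 0;
    out->serr      = (val & INT_STATUS_SERR) != 0;
    out->intx      = (val & INT_STATUS_INTX_MASK) >> INT_STATUS_INTX_SHIFT;
}
```

Under these assumptions, 0x10800000 has only bits 28 and 23 set, which agrees with the log: the MSI was signalled and no error bit is raised.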

Furthermore I have looked into the REG_ES_* registers (see MT7622 manual):

[ 1032.705028] mtk_pcie_intr_handler: reg 0x10a355a0 val 0x0
[ 1032.723442] mtk_pcie_intr_handler: reg 0x10a355a4 val 0x0
[ 1032.729960] mtk_pcie_intr_handler: reg 0x10a355a8 val 0x0
[ 1032.736478] mtk_pcie_intr_handler: reg 0x10a355ac val 0x0
[ 1032.742996] mtk_pcie_intr_handler: reg 0x10a355b0 val 0x8f02
[ 1032.749774] mtk_pcie_intr_handler: reg 0x10a355b4 val 0x2
[ 1032.756293] mtk_pcie_intr_handler: reg 0x10a355b8 val 0x0
[ 1032.762811] mtk_pcie_intr_handler: reg 0x10a355bc val 0x0
[ 1032.769328] mtk_pcie_intr_handler: reg 0x10a355c0 val 0x0
[ 1032.775846] mtk_pcie_intr_handler: reg 0x10a355c4 val 0x0
[ 1032.782365] mtk_pcie_intr_handler: reg 0x10a355c8 val 0x1
[ 1032.788883] mtk_pcie_intr_handler: reg 0x10a355cc val 0x0
[ 1032.795401] mtk_pcie_intr_handler: reg 0x10a355d0 val 0x104
[ 1032.802092] mtk_pcie_intr_handler: reg 0x10a355d4 val 0xc22fff7

I do not know why the bit that signals that data was received over the serial line is not transferred to MMIO. As already mentioned, it works perfectly with other boards/SBCs and the same kernel (and the same arm64 architecture).

What I found out is that if the serial driver “ignores” this signalling bit and reads data from MMIO again and again (interrupts fire and the loop runs all the time), sometimes (and with a latency of several seconds) a few single characters get through!

My currently “best” hypothesis is that the data does not come through the PCIe-to-AHB/AXI transfer into memory, but perhaps only for certain transfer types. Any help/ideas are appreciated :)

Thanks Jasmin

Which PCIe slot do you use? The port shared with SATA (CN8) has a hardware limitation (missing capacitors on the TX pins), so only Gen1 cards are recognized there.

Good to know :) I have tried both ports, with the same result. I think the reset algorithm (as mentioned at the link in my posting above) is the likely cause. I hope this can be solved by changing mtk_pcie_startup_port_v2 in pcie-mediatek.c accordingly. https://bugzilla.kernel.org/show_bug.cgi?id=84821#c48

So at least there is something that can be tried out :)

Jasmin

My Compex WLE900VX was recognized with split PCIe ports in CN25, and I was able to open an AP on it.

Btw, did you apply this patch? https://github.com/frank-w/BPI-R2-4.14/commit/9a89cdc0a9ac454bbb52ac2b359efdd80a064902

The issue was caused by the class code of the Google Coral PCIe card being zero, so the Linux PCIe framework did not assign resources after the scan.

The workaround is to remove "class == PCI_CLASS_NOT_DEFINED" in setup-bus.c, thanks.

static void __dev_sort_resources(struct pci_dev *dev,
                                 struct list_head *head)
{
        u16 class = dev->class >> 8;

        /* Don't touch classless devices or host bridges or ioapics.  */
        if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
                return;

PS: please refer to page 8 of the PDF below for more information about the class ID: https://pcisig.com/sites/default/files/files/PCI_Code-ID_r_1_11__v24_Jan_2019.pdf

You linked the mutex issue… do you mean that one, or the BAR0-unassigned issue happening on Coral cards?

I linked to the wrong issue. I meant that the BAR0-unassigned issue was caused by the class ID being 0. It is an invalid value, thanks.

OK, if it is driver code, the card should not work with other PCIe controllers either, or those controllers allow classless devices…

Does anyone here have this Google card running on another PCIe controller, and if so, which one?

As the workaround is in the generic framework, and I guess the driver should work with other controllers (the class may be set from any value read), it looks like an assignment issue.

I wonder why the class is shifted into a variable before it is compared with PCI_CLASS_NOT_DEFINED… maybe the class bits are not in the right position?
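On the shift: `dev->class` holds 24 bits (base class, subclass, programming interface), while constants such as PCI_CLASS_NOT_DEFINED are 16-bit base class plus subclass values, so the lowest byte is dropped before comparing. The following userspace sketch mirrors the check quoted from setup-bus.c. The two constants use the values from the kernel's pci_ids.h; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_CLASS_NOT_DEFINED 0x0000
#define PCI_CLASS_BRIDGE_HOST 0x0600

/*
 * dev->class layout: bits 23:16 base class, bits 15:8 subclass,
 * bits 7:0 programming interface. Shifting right by 8 drops prog-if.
 */
uint16_t pci_class_subclass(uint32_t class24)
{
    assert(class24 <= 0xffffffu);
    return (uint16_t)(class24 >> 8);
}

/* Same condition as the early return in __dev_sort_resources(). */
bool skipped_by_sort_resources(uint32_t class24)
{
    uint16_t class = pci_class_subclass(class24);

    return class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST;
}
```

A device that reports class 0x0000ff therefore ends up as 0x0000 after the shift and is skipped, while a PCI-to-PCI bridge reporting 0x060400 becomes 0x0604 and is processed normally.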

Just to link issue thread…

I think we can use an x86 PC and the lspci command to check the class ID. Maybe most Google cards have a valid ID and only a few cards have this issue, thanks.

https://diego.assencio.com/?index=649b7a71b35fc7ad41e03b6d0e825f07

I have applied the PCI interrupt patch and the PCI controller splitting DTS patch from Frank. I have tested 10 WLE900VX cards, but not all of them work with the R64.

  1. 3 WLE900VX cards work perfectly with the R64.
  2. 3 WLE900VX cards cannot be detected by the R64.
  3. 4 WLE900VX cards sometimes work and sometimes cannot be detected by the R64.

I also think it is a RESET signal sequence problem. I tried sleeping 1000 ms before ending the reset, but that does not fix all of the problems.

static int mtk_pcie_startup_port_v2(struct mtk_pcie_port *port)
{
        struct mtk_pcie *pcie = port->pcie;
        struct resource *mem = &pcie->mem;
        const struct mtk_pcie_soc *soc = port->pcie->soc;
        u32 val;
        size_t size;
        int err;

        /* MT7622 platforms need to enable LTSSM and ASPM from PCIe subsys */
        if (pcie->base) {
                val = readl(pcie->base + PCIE_SYS_CFG_V2);
                val |= PCIE_CSR_LTSSM_EN(port->slot) |
                       PCIE_CSR_ASPM_L1_EN(port->slot);
                writel(val, pcie->base + PCIE_SYS_CFG_V2);
        }

        /* Assert all reset signals */
        writel(0, port->base + PCIE_RST_CTRL);

        /*
         * Enable PCIe link down reset, if link status changed from link up to
         * link down, this will reset MAC control registers and configuration
         * space.
         */
        writel(PCIE_LINKDOWN_RST_EN, port->base + PCIE_RST_CTRL);

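        /*
         * Workaround under test: keep port 0 in reset for an extra
         * 1000 ms before de-asserting, to give QCA988X cards more time.
         */
        if (port->slot == 0) {
                dev_info(pcie->dev, "pcie port0: sleeping 1000 ms before reset de-assert (QCA988X)\n");
                msleep(1000);
        }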

        /* De-assert PHY, PE, PIPE, MAC and configuration reset */
        val = readl(port->base + PCIE_RST_CTRL);
        val |= PCIE_PHY_RSTB | PCIE_PERSTB | PCIE_PIPE_SRSTB |
               PCIE_MAC_SRSTB | PCIE_CRSTB;
        writel(val, port->base + PCIE_RST_CTRL);

        /* Set up vendor ID and class code */
        if (soc->need_fix_class_id) {
                val = PCI_VENDOR_ID_MEDIATEK;
                writew(val, port->base + PCIE_CONF_VEND_ID);

                val = PCI_CLASS_BRIDGE_PCI;
                writew(val, port->base + PCIE_CONF_CLASS_ID);
        }

        /* 100ms timeout value should be enough for Gen1/2 training */
        err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_V2, val,
                                 !!(val & PCIE_PORT_LINKUP_V2), 20,
                                 100 * USEC_PER_MSEC);
        if (err)
                return -ETIMEDOUT;

        /* Set INTx mask */
        val = readl(port->base + PCIE_INT_MASK);
        val &= ~INTX_MASK;
        writel(val, port->base + PCIE_INT_MASK);

        if (IS_ENABLED(CONFIG_PCI_MSI))
                mtk_pcie_enable_msi(port);

        /* Set AHB to PCIe translation windows */
        size = mem->end - mem->start;
        val = lower_32_bits(mem->start) | AHB2PCIE_SIZE(fls(size));
        writel(val, port->base + PCIE_AHB_TRANS_BASE0_L);

        val = upper_32_bits(mem->start);
        writel(val, port->base + PCIE_AHB_TRANS_BASE0_H);

        /* Set PCIe to AXI translation memory space.*/
        val = fls(0xffffffff) | WIN_ENABLE;
        writel(val, port->base + PCIE_AXI_WINDOW0);

        return 0;
}
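To put the link training wait in the function above into numbers: it re-reads the link status roughly every 20 us for at most 100 ms, so the link gets on the order of 5000 polls to come up after the resets are de-asserted. The userspace sketch below only illustrates that budget with a simulated register. `poll_budget`, `poll_until_set` and `fake_link` are hypothetical names, and the loop counts polls instead of measuring time as the kernel's `readl_poll_timeout` does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*reg_read_fn)(void *ctx);

/* Upper bound on the number of polls that fit into the timeout. */
unsigned poll_budget(unsigned sleep_us, unsigned timeout_us)
{
    assert(sleep_us > 0);
    return timeout_us / sleep_us;
}

/*
 * Poll until (value & mask) is non-zero or max_polls reads were done.
 * Returns 0 on success and -1 when the budget is exhausted.
 */
int poll_until_set(reg_read_fn read_reg, void *ctx, uint32_t mask,
                   unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_reg(ctx) & mask)
            return 0;
    }
    return -1;
}

/* Simulated link status register: reports link-up after N reads. */
struct fake_link {
    unsigned reads_until_up;
    uint32_t up_bit;
};

uint32_t fake_link_read(void *ctx)
{
    struct fake_link *link = ctx;

    if (link->reads_until_up == 0)
        return link->up_bit;
    link->reads_until_up--;
    return 0;
}
```

In this model a card that needs more than the polling budget after reset de-assert is reported as link down, no matter how long the port was held in reset before. Whether that is what happens with the failing WLE900VX cards is not confirmed anywhere in this thread.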

I will try your way, I hope it can support the WLE900VX.

Please try only the front slot (CN25), because the slot behind it (CN8) has a hardware limitation.

Yes, all tests were done with the CN25 PCIe slot.

Hi frank-w,

I have updated the setup-bus.c file and removed the check class == PCI_CLASS_NOT_DEFINED. I now get a resource collision error, as follows:

pi@bpi-iot-ros-ai:~$ dmesg | grep pci
[    1.494119] mtk-pcie 1a143000.pcie: host bridge /pcie@1a143000 ranges:
[    1.505642] mtk-pcie 1a143000.pcie: Parsing ranges property...
[    1.515992] mtk-pcie 1a143000.pcie:   MEM 0x20000000..0x2fffffff -> 0x20000000
[    1.556439] mtk-pcie 1a143000.pcie: PCI host bridge to bus 0000:00
[    1.568448] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.573946] pci_bus 0000:00: root bus resource [mem 0x20000000-0x2fffffff]
[    1.580841] pci_bus 0000:00: scanning bus
[    1.584897] pci 0000:00:00.0: [14c3:3258] type 01 class 0x060400
[    1.590957] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x1ffffffff 64bit pref]
[    1.599829] pci_bus 0000:00: fixups for bus
[    1.604022] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[    1.610729] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.618747] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[    1.625563] pci_bus 0000:01: scanning bus
[    1.629683] pci 0000:01:00.0: [1ac1:089a] type 00 class 0x0000ff
[    1.635984] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit pref]
[    1.643318] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x000fffff 64bit pref]
[    1.651524] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 4.000 Gb/s with 5 GT/s x1 link)
[    1.666709] pci_bus 0000:01: fixups for bus
[    1.670896] pci_bus 0000:01: bus scan returning with max=01
[    1.676472] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.683093] pci_bus 0000:00: bus scan returning with max=01
[    1.688684] pci 0000:00:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[    1.696347] pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[    1.704355] pci 0000:00:00.0: BAR 8: assigned [mem 0x20000000-0x201fffff]
[    1.711151] pci 0000:01:00.0: BAR 2: assigned [mem 0x20000000-0x200fffff 64bit pref]
[    1.718983] pci 0000:01:00.0: BAR 0: assigned [mem 0x20100000-0x20103fff 64bit pref]
[    1.726814] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.731787] pci 0000:00:00.0:   bridge window [mem 0x20000000-0x201fffff]
[    1.738888] mtk-pcie 1a145000.pcie: host bridge /pcie@1a145000 ranges:
[    1.745425] mtk-pcie 1a145000.pcie: Parsing ranges property...
[    1.751269] mtk-pcie 1a145000.pcie:   MEM 0x28000000..0x2fffffff -> 0x28000000
[    1.758508] mtk-pcie 1a145000.pcie: resource collision: [mem 0x28000000-0x2fffffff] conflicts with pcie@1a143000 [mem 0x20000000-0x2fffffff]
[    1.771141] mtk-pcie: probe of 1a145000.pcie failed with error -16
[    8.381659]  pci_enable_resources+0x68/0x18c
[    8.381665]  pcibios_enable_device+0xc/0x14
[    8.381670]  do_pci_enable_device+0x50/0xd8
[    8.381675]  pci_enable_device_flags+0x100/0x15c
[    8.381680]  pci_enable_device+0x10/0x18
[    8.381684]  pci_enable_bridge+0x50/0x78
[    8.381689]  pci_enable_device_flags+0x98/0x15c
[    8.381694]  pci_enable_device+0x10/0x18
[    8.381703]  apex_pci_probe+0x38/0x468 [apex]
[    8.381709]  pci_device_probe+0xa0/0x144
[    8.381755]  __pci_register_driver+0x40/0x48
[    8.381811] pci 0000:00:00.0: Assigned....BAR 8 [mem 0x20000000-0x201fffff]........
[    8.381817] pci 0000:00:00.0: Assigned and claimed....BAR 8 [mem 0x20000000-0x201fffff]........
[    8.381823] pci 0000:00:00.0: enabling device (0000 -> 0002)
[    8.381837] pci 0000:00:00.0: enabling bus mastering
[    8.381880]  pci_enable_resources+0x68/0x18c
[    8.381884]  pcibios_enable_device+0xc/0x14
[    8.381889]  do_pci_enable_device+0x50/0xd8
[    8.381893]  pci_enable_device_flags+0x100/0x15c
[    8.381898]  pci_enable_device+0x10/0x18
[    8.381906]  apex_pci_probe+0x38/0x468 [apex]
[    8.381911]  pci_device_probe+0xa0/0x144
[    8.381953]  __pci_register_driver+0x40/0x48

As a result of this I am unable to run Inference on Google Coral.

How can we resolve this collision error?

I am working on the following repo

In this tree the ranges property is wrong (left over from tests). Please use 5.4-main, or change the ranges property to the values from that tree.
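The collision itself follows directly from the two MEM windows in the dmesg output: 0x20000000..0x2fffffff for pcie@1a143000 fully contains 0x28000000..0x2fffffff for pcie@1a145000. A minimal sketch of the overlap test, with a hypothetical `mem_range` type and inclusive end addresses as printed in the log:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory window with inclusive end address, as printed by the kernel. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Two inclusive ranges overlap if each one starts before the other ends. */
bool ranges_overlap(const struct mem_range *a, const struct mem_range *b)
{
    assert(a->start <= a->end && b->start <= b->end);
    return a->start <= b->end && b->start <= a->end;
}
```

With the values from the log this returns true. If the first window ended at 0x27ffffff instead, the two windows would no longer overlap; that end address is only an illustration, and the correct values are whatever the ranges property in 5.4-main specifies.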

Hi frank-w,

I have compared the branches 5.4-main and 5.4-dsa with respect to the PCIe sources. The memory map range is the same. The PCI splitting is also the same.

I am sure 5.4-main will give me the same memory-not-claimed issue. Additionally, if I apply the changes to detect PCIe devices with class code 0000, it should give me the same resource collision error.

What is the benefit of using 5.4-main?

I thought there was a difference…

5.4-main will be updated. 5.4-dsa was only a testing branch and will be deleted…

Otherwise I have no idea, because there I only apply patches I get from MTK/others.

Regarding MiniPCIe card issue:

I was able to glean enough from this thread to put together a patch that works, at least with WLE900VX modules. Thanks to @frank-w (for the PCIe split patches) and @bourne_hlm (for identifying the RESET signal sequence problem, although I find that a 3000 ms sleep works better than 1000 ms). I did not try the patches from @jasmin though. I have attached a consolidated patch to help whoever is hitting this problem. This patch is on top of a clean, latest OpenWrt 4.19.101. With this patch, I see the WLE900VX consistently recognized and the ath10k driver coming up successfully. Phew!!!

patch (7.7 KB)

Spoke too soon. Even with a 3000 ms sleep, the miniPCIe card is sometimes detected and sometimes not. Are there any other things I can try?