NVIDIA: SAUCE: PCI: Use downstream bridges for distributing resources 24.04_linux-nvidia-adv-6.11-next #46

Open
wants to merge 1 commit into
base: 24.04_linux-nvidia-adv-6.11-next

@clsotog clsotog commented Jan 14, 2025

NVBUG: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4868471
BugLink: https://bugs.launchpad.net/bugs/2094821

Systems with a BF3 switch hit an error where some BARs are not assigned.

Lore discussion: https://lore.kernel.org/all/[email protected]/
Without the patch, the kernel parameter pci=realloc is needed to work around the issue.
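
To confirm whether the workaround is active on a given boot, the kernel command line can be inspected. The following is a minimal sketch, not part of the patch: the helper name `has_pci_realloc` and the sample command lines are made up for illustration. It assumes that `pci=` takes a comma-separated option list and that `realloc` is equivalent to `realloc=on`.

```python
def has_pci_realloc(cmdline: str) -> bool:
    """Return True if a kernel command line enables PCI bridge reallocation."""
    for token in cmdline.split():
        # "pci=" carries a comma-separated option list, e.g. pci=nocrs,realloc
        if token.startswith("pci="):
            options = token[len("pci="):].split(",")
            if any(opt in ("realloc", "realloc=on") for opt in options):
                return True
    return False


# Hypothetical command lines, for illustration only.
print(has_pci_realloc("BOOT_IMAGE=/vmlinuz root=/dev/sda1 pci=realloc"))
print(has_pci_realloc("BOOT_IMAGE=/vmlinuz root=/dev/sda1 quiet"))
```

On a running system the string to pass in would be the contents of /proc/cmdline.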

Patch needed for Tech Preview kernel.

BugLink: https://bugs.launchpad.net/bugs/2094821

Commit 7180c1d ("PCI: Distribute available resources for root
buses, too") breaks BAR assignment on some devices:
[   10.021193] pci 0006:03:00.0: BAR 0 [mem 0x6300c0000000-0x6300c1ffffff 64bit pref]: assigned
[   10.029880] pci 0006:03:00.1: BAR 0 [mem 0x6300c2000000-0x6300c3ffffff 64bit pref]: assigned
[   10.038561] pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: can't assign; no space
[   10.047191] pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: failed to assign
[   10.055285] pci 0006:03:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
[   10.064180] pci 0006:03:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: failed to assign
[   10.072543] pci 0006:03:00.1: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
[   10.081437] pci 0006:03:00.1: VF BAR 0 [mem size 0x02000000 64bit pref]: failed to assign
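
For triage, the failing BARs can be pulled out of a saved boot log. This is a rough sketch that only matches the message format shown in the excerpt above; the function name `failed_bars` and the regular expression are illustrative, and other kernel versions may word these lines differently.

```python
import re

# Matches lines such as:
# [   10.047191] pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: failed to assign
FAILED_BAR = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<bar>(?:VF )?BAR \d+) \[mem size (?P<size>0x[0-9a-f]+)[^\]]*\]: failed to assign"
)


def failed_bars(log_text):
    """Return (device, BAR name, size in bytes) for every failed assignment."""
    return [
        (m["dev"], m["bar"], int(m["size"], 16))
        for m in FAILED_BAR.finditer(log_text)
    ]


# Lines copied from the excerpt above.
sample = """\
[   10.038561] pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: can't assign; no space
[   10.047191] pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: failed to assign
[   10.064180] pci 0006:03:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: failed to assign
[   10.081437] pci 0006:03:00.1: VF BAR 0 [mem size 0x02000000 64bit pref]: failed to assign
"""

for dev, bar, size in failed_bars(sample):
    print(f"{dev} {bar}: {size // (1 << 20)} MiB unassigned")
```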

The apertures of domain 0006 before the commit:
6300c0000000-63ffffffffff : PCI Bus 0006:00
  6300c0000000-6300c9ffffff : PCI Bus 0006:01
    6300c0000000-6300c9ffffff : PCI Bus 0006:02
      6300c0000000-6300c8ffffff : PCI Bus 0006:03
        6300c0000000-6300c1ffffff : 0006:03:00.0
          6300c0000000-6300c1ffffff : mlx5_core
        6300c2000000-6300c3ffffff : 0006:03:00.1
          6300c2000000-6300c3ffffff : mlx5_core
        6300c4000000-6300c47fffff : 0006:03:00.2
        6300c4800000-6300c67fffff : 0006:03:00.0
        6300c6800000-6300c87fffff : 0006:03:00.1
      6300c9000000-6300c9bfffff : PCI Bus 0006:04
        6300c9000000-6300c9bfffff : PCI Bus 0006:05
          6300c9000000-6300c91fffff : PCI Bus 0006:06
          6300c9200000-6300c93fffff : PCI Bus 0006:07
          6300c9400000-6300c95fffff : PCI Bus 0006:08
          6300c9600000-6300c97fffff : PCI Bus 0006:09

After the commit:
6300c0000000-63ffffffffff : PCI Bus 0006:00
  6300c0000000-6300c9ffffff : PCI Bus 0006:01
    6300c0000000-6300c9ffffff : PCI Bus 0006:02
      6300c0000000-6300c43fffff : PCI Bus 0006:03
        6300c0000000-6300c1ffffff : 0006:03:00.0
          6300c0000000-6300c1ffffff : mlx5_core
        6300c2000000-6300c3ffffff : 0006:03:00.1
          6300c2000000-6300c3ffffff : mlx5_core
      6300c4400000-6300c4dfffff : PCI Bus 0006:04
        6300c4400000-6300c4dfffff : PCI Bus 0006:05
          6300c4400000-6300c45fffff : PCI Bus 0006:06
          6300c4600000-6300c47fffff : PCI Bus 0006:07
          6300c4800000-6300c49fffff : PCI Bus 0006:08
          6300c4a00000-6300c4bfffff : PCI Bus 0006:09

We can see that the window of 0006:03 is shrunk too much and 0006:04
takes over the space that 0006:03:00.2 needs.
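
The shortfall can be checked with simple arithmetic on the addresses in the two listings and the sizes in the dmesg excerpt. This is only a back-of-the-envelope sketch: it sums the requested sizes and ignores alignment and any other constraints the kernel applies.

```python
MIB = 1 << 20


def span(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


# Window of PCI Bus 0006:03, copied from the two listings above.
before = span(0x6300c0000000, 0x6300c8ffffff)
after = span(0x6300c0000000, 0x6300c43fffff)

# Sizes requested under bus 0006:03: BAR 0 of 03:00.0 and 03:00.1,
# BAR 0 of 03:00.2, then VF BAR 0 of 03:00.0 and 03:00.1.
requested = [0x02000000, 0x02000000, 0x00800000, 0x02000000, 0x02000000]
needed = sum(requested)

print(f"before: {before // MIB} MiB, after: {after // MIB} MiB, "
      f"needed: {needed // MIB} MiB")

# Space left once the two BARs that were still assigned are placed.
left = after - 2 * 0x02000000
print(f"left after 03:00.0 and 03:00.1: {left // MIB} MiB")
```

Under these assumptions the old window covers the sum of the requests, while the new window has less than the 8 MiB that BAR 0 of 0006:03:00.2 asks for once the first two BARs are placed.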

The offending commit distributes the upstream bridge's resources
multiple times to every downstream bridge, which makes the aperture
smaller than desired because the calculation of io_per_b, mmio_per_b
and mmio_pref_per_b becomes incorrect.

Instead, distribute the downstream bridges' own resources to resolve
the issue.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219540
Cc: Carol Soto <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Chris Chiu <[email protected]>
Cc: Mika Westerberg <[email protected]>
Tested-by: Chia-Lin Kao (AceLan) <[email protected]>
Fixes: 7180c1d ("PCI: Distribute available resources for root buses, too")
Signed-off-by: Kai-Heng Feng <[email protected]>
[backported from https://lore.kernel.org/all/[email protected]/]
Signed-off-by: Carol L Soto <[email protected]>
@jamieNguyenNVIDIA

Acked-by: Jamie Nguyen [email protected]


@khfeng khfeng left a comment


Acked-by: Kai-Heng Feng [email protected]
