6.18: PCIe GPU testing (AMD and Intel Xe) #7113
base: rpi-6.18.y
|
Also just to keep the breadcrumbs going:
|
d9f2960 to 4a4ea03
2ef2bea to b17e55e
|
Branch rebased to keep it relevant. I've just noted https://lore.kernel.org/dri-devel/[email protected]/ and I don't see it having been merged, so at some point it would be useful to test it, as the thread appears to have stalled. |
d003d86 to dff72a6
b17e55e to 6bf3935
|
Branch rebased. |
|
I guess you don't want noise and user testing here. But just as a thumbs up, I've installed this kernel on an rpi5 with an OCuLink-connected RX 6800 XT, and the combo seems rock-solid from a user perspective. I've run multiple games and benchmarks, as well as Jeff's llama examples in both chat and web-server mode. Not a single hiccup. The card even seems to go into suspend mode and recover on use. |
|
With the old PR on 6.17.x, I was able to get the AI Pro R9700 working. With this PR on 6.18.x, I get: Maybe the AMD driver in 6.18 has a new thing where it forces a BAR resize, and that's currently failing? The only thing I could find that's remotely related is this patch. Or could be related to this post: [REGRESSION] amdgpu fails to load external RX 580 since PCI: Allow relaxed bridge window tail sizing. The RX 7900 XT is loading fine, however. |
|
@pigong I haven't really the time to dig into these. @geerlingguy There were other patches around for resizing the BAR and releasing the old allocation before creating the new one. I don't know if those would help in this situation. They seem to be referenced in #6621 |
|
@6by9 - With a fresh rebuild of the entire OS, I was able to get the R9700 working again, not sure why it wasn't before... but that previous install was about a month old at this point, and I generally rebuild things weekly, oops :) It's nice to have just one or two commands to run, and not have to wait for a kernel recompile! |
aa8e7a1 to e132677
Only CMD38 with Arg=0x1 (Discard) is supported when in CQ mode, so turn it off before issuing a non-discard erase op. Signed-off-by: Jonathan Bell <[email protected]>
Recent Integral cards end up with corrupt sectors after a flash erase. This covers sizes for the A2 range, which can't be differentiated from the A1 range which might not have the same issue. Signed-off-by: Jonathan Bell <[email protected]>
Newer versions of the DesignWare I2C block support the detection of stuck signals, and a mechanism to recover from them. Add the required software support to the driver. This change was prompted by the observation that reading a single byte from register 0 of a VEML7700 seems to cause it to issue an ACK too early, and the controller to complain about losing arbitration. There is a suspicion that this may be a more widespread problem, but at least this patch prevents the bus from locking up. See: raspberrypi#6057 Signed-off-by: Phil Elwell <[email protected]>
In the absence of a value in Device Tree, set the SDA hold time to half the SCL low time. Signed-off-by: Phil Elwell <[email protected]>
This code: for_each_sg(sgl, sg, sg_len, i) num_sgs += DIV_ROUND_UP(sg_dma_len(sg), axi_block_len); determines how many hw_desc are allocated. If sg_dma_len(sg)=0 we don't allocate for this sgl. However in the next loop, we will increment loop for this case, and loop gets higher than num_sgs and we trample memory. Signed-off-by: Dom Cobley <[email protected]>
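The mismatch described in this commit message can be modelled in user space. The following is only a sketch under stated assumptions: `count_descs` mirrors the quoted counting loop, while `descs_used_unguarded` and `descs_used_guarded` are hypothetical stand-ins for the driver's second loop (the real loop body is not shown in the commit message). It illustrates how a zero-length segment can consume a descriptor that was never counted.

```c
#include <stddef.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Counting pass, as quoted: a zero-length segment contributes 0 descriptors. */
static size_t count_descs(const size_t *seg_len, size_t nsegs, size_t block_len)
{
    size_t num = 0;

    for (size_t i = 0; i < nsegs; i++)
        num += DIV_ROUND_UP(seg_len[i], block_len);
    return num;
}

/*
 * Hypothetical fill pass that always consumes at least one descriptor
 * per segment, so a zero-length segment still advances the index.
 */
static size_t descs_used_unguarded(const size_t *seg_len, size_t nsegs,
                                   size_t block_len)
{
    size_t used = 0;

    for (size_t i = 0; i < nsegs; i++) {
        size_t len = seg_len[i];

        do {
            size_t chunk = len < block_len ? len : block_len;

            used++;          /* one descriptor consumed */
            len -= chunk;
        } while (len > 0);
    }
    return used;
}

/* Same fill pass, but skipping zero-length segments (one possible guard). */
static size_t descs_used_guarded(const size_t *seg_len, size_t nsegs,
                                 size_t block_len)
{
    size_t used = 0;

    for (size_t i = 0; i < nsegs; i++) {
        size_t len = seg_len[i];

        if (len == 0)
            continue;
        do {
            size_t chunk = len < block_len ? len : block_len;

            used++;
            len -= chunk;
        } while (len > 0);
    }
    return used;
}
```

With segment lengths `{4096, 0, 100}` and a block length of 4096, the counting pass allocates two descriptors while the unguarded fill pass would touch three, which is the out-of-bounds write the message describes.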
"rotation" is listed as a standard property of panels in panel-common.yaml, therefore it would be logical to process that from within the core code should a panel driver not implement the get_orientation hook. Call of_drm_get_panel_orientation from drm_connector_set_orientation_from_panel to get that information. This removes the need for any boiler-plate in panel drivers for calling drm_connector_set_orientation_from_panel or drm_connector_set_panel_orientation. Signed-off-by: Dave Stevenson <[email protected]>
The autodetection of resolution/timing by the TC358762 can lead to the display being shifted by a pixel or two. Program the TC358762 with the requested mode timing so that it can reproduce it accurately. Signed-off-by: Dave Stevenson <[email protected]>
Reverts 8a4b2fc ("drm/bridge: tc358762: Split register programming from pre-enable to enable") as we want the config commands sent before video starts. Signed-off-by: Dave Stevenson <[email protected]>
Having accepted the upstream change to add the persist_gpio_outputs parameter, make it true by default. See: raspberrypi#6117 Signed-off-by: Phil Elwell <[email protected]>
Even when configured to use only gpiod CS lines, the DW SPI controller still expects a bit to be set in the SER register, otherwise transfers stall. For the csgpiod case, nominate bit 0 for the job. See: raspberrypi#6159 Signed-off-by: Phil Elwell <[email protected]>
Remove the arbitrary 1 MiB transfer limit from imx500_spi_write() as the 1 MiB chunk limitation for firmware / network transfers is already upheld by imx500_state_transition(). This allows for larger input tensors to be injected via imx500_inject_input_tensor() (where there is no 1 MiB chunk limitation). Increase the number of poll attempts whilst waiting for the injection handshake register to be TRANS_COMP to allow more time for the checksum of larger input tensors to be calculated. Alter error messages in imx500_inject_input_tensor() so that imx500_spi_write() and imx500_injection_wait_transfer_complete() errors may be more easily differentiated. Signed-off-by: Richard Oliver <[email protected]>
…rror Connect PLL_AUDIO_SEC to CLK_AUDIO_OUT, which had been commented out to avoid interference with I2S: we expect them never to be enabled at the same time. Work around a rounding error that occurs when the desired rate is exactly the max but not exactly achievable by the PLL. Signed-off-by: Nick Hollinghurst <[email protected]>
PLL dividers are registered using the clk_hw in the clk_divider member of rp1_clk_desc, rather than the direct clk_hw member. In order for parent location to work, parent declarations must link to &<clock>.div.hw, not &<clock>.hw. Signed-off-by: Phil Elwell <[email protected]>
In fact the register field has 6 bits, but we only ever set it to unity. Due to a typo we were setting it to BIT(1) == 2, causing PLLs to run at half the desired rate. Signed-off-by: Nick Hollinghurst <[email protected]>
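The arithmetic behind this typo is small enough to write out. This is a sketch only: `DEMO_FIELD_MASK` and `demo_pll_rate` are hypothetical names, and treating the 6-bit field as a plain divider is an assumption chosen to match the "half the desired rate" symptom in the message, not a description of the actual RP1 register.

```c
#define BIT(n) (1UL << (n))

/* Hypothetical 6-bit field, as the commit message describes its width. */
#define DEMO_FIELD_MASK 0x3fUL

/*
 * Assumption: the field acts as a divider on the incoming rate.
 * Returns 0 for a field value of 0 to avoid dividing by zero.
 */
static unsigned long demo_pll_rate(unsigned long in_rate, unsigned long field)
{
    field &= DEMO_FIELD_MASK;
    return field ? in_rate / field : 0;
}
```

Under that assumption, `BIT(0)` is 1 (unity) and leaves the rate unchanged, while `BIT(1)` is 2 and halves it.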
The determine_rate member of clk_ops returns the rate to the caller by modifying the pass-by-reference req structure. Its actual return value is a status code. Signed-off-by: Phil Elwell <[email protected]>
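The calling convention described here can be sketched without kernel headers. `struct demo_rate_request` and `demo_determine_rate` are simplified, hypothetical stand-ins for `struct clk_rate_request` and a driver's `determine_rate` callback. The integer-divider rounding is an assumption made purely for illustration.

```c
#include <errno.h>

/* Simplified stand-in for the kernel's struct clk_rate_request. */
struct demo_rate_request {
    unsigned long rate;             /* in: requested rate, out: achievable rate */
    unsigned long best_parent_rate; /* rate of the parent clock */
};

/*
 * Returns a status code (0 or a negative errno). The achievable rate is
 * reported by writing it back into req->rate, not via the return value.
 */
static int demo_determine_rate(struct demo_rate_request *req)
{
    unsigned long div;

    if (req->rate == 0 || req->best_parent_rate == 0)
        return -EINVAL;

    /* Round the divider up so the result never exceeds the request. */
    div = (req->best_parent_rate + req->rate - 1) / req->rate;
    if (div == 0)
        div = 1;

    req->rate = req->best_parent_rate / div;
    return 0;
}
```

A caller therefore checks the return value for errors and then reads `req->rate`. Treating the return value as the rate would interpret a successful `0` as "0 Hz".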
|
@6by9 - FYI @mariobalanica has a set of 5 patches for nouveau for Nvidia cards; I haven't had time to take a look yet, but something else interesting to note! https://github.com/mariobalanica/arm-pcie-gpu-patches/tree/nvidia-wip/linux/6.17 |
Various PCIe controllers on ARM64 platforms don't support cache snooping, which leads to numerous issues when attempting to use PCIe graphics cards. Switching ttm_prot_from_caching to return pgprot_dmacoherent for ttm_cached pages solves the issue, albeit with a performance hit. There is a second check in ttm_prot_from_caching that also needs updating. Signed-off-by: Yang Bo <[email protected]> Signed-off-by: Dave Stevenson <[email protected]>
Also includes SND_HDA_* modules for audio on AMD GPUs. Signed-off-by: Dave Stevenson <[email protected]>
Taken from https://github.com/chimera-linux/cports/blob/master/main/linux-stable/patches/xe-nonx86.patch Signed-off-by: Dave Stevenson <[email protected]>
Signed-off-by: Dave Stevenson <[email protected]>
Signed-off-by: Dave Stevenson <[email protected]>
40759d3 to 0deed8f
|
Branch rebased to keep it vaguely current. I'd seen that mariobalanica had some patches before, but had registered them as being for NVidia's proprietary drivers and not nouveau. I'll grab those patches and see whether they work with my old GT710 (too old to be supported by the latest proprietary drivers - I did try). |
|
Erm, I've just got kmstest producing a sensible output on a GT710! The framebuffer emulation is messed up - that's odd as XR24 (RGBX8888) works fine through kmstest. kmscube doesn't render correctly. It reports renderer "NV106", OpenGL ES 3.2, but flashes a vaguely recognisable variant of what kmscube should be producing as if the stride is wrong. Frame rate of 5.4fps @ 1080p, or 1.4fps @ 4k30. The kernel log has messages -22 being -EINVAL. That falls out from
There is a log line Start X and run Seeing as this PR is still being batted around, I've pushed those patches to the branch so others can give it a try without having to rebuild the kernel. |
Allow kernel/user space code to perform unaligned accesses to memory regions that do not normally support them (e.g. device mappings) by trapping alignment faults on common load/store instructions and breaking up the offending accesses into naturally aligned ones. Signed-off-by: Mario Bălănică <[email protected]>
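The idea of breaking an unaligned access into naturally aligned ones can be shown as pure address arithmetic. This is a sketch, not the patch's algorithm: `split_access` is a hypothetical helper, and the 8-byte maximum chunk size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split an access of `len` bytes at `addr` into naturally aligned chunks
 * of 8, 4, 2 or 1 bytes. Chunk sizes are written to `sizes` (up to `max`
 * entries). Returns the total number of chunks needed.
 */
static size_t split_access(uintptr_t addr, size_t len, size_t *sizes, size_t max)
{
    size_t n = 0;

    while (len > 0) {
        size_t chunk = 8;

        /* Shrink until the chunk fits and addr is aligned to its size. */
        while (chunk > len || (addr & (chunk - 1)) != 0)
            chunk >>= 1;

        if (n < max)
            sizes[n] = chunk;
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, an 8-byte access at address `0x1003` splits into chunks of 1, 4, 2 and 1 bytes, whereas the same access at `0x1000` needs a single 8-byte chunk.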
PCIe device drivers may map MMIO space as Normal non-cacheable, for the purpose of enabling write combining or unaligned accesses. On many platforms (e.g. Ampere Altra, RK35xx), the PCIe interface cannot support unaligned outbound transactions. This may lead to data corruption, for instance, when a regular memcpy is performed by an application on a GPU's VRAM BAR. Add an option to force all software that maps PCIe MMIO space as Normal non-cacheable memory to use Device-nGnRE instead. If the strict alignment is not met, the CPU will raise alignment faults that can be further handled by the kernel by enabling CONFIG_ARM64_ALIGNMENT_FIXUPS. Signed-off-by: Mario Bălănică <[email protected]>
672b98e to 5652cd2
5652cd2 to 317c5c0
…itectures Avoid the warning "no previous prototype for 'range_is_pci'" when building for ARM or other architectures. Signed-off-by: Dave Stevenson <[email protected]>
PCIe GPU device drivers may use normal cached mappings for DMA memory. This requires the PCIe interface to be coherent with the CPU caches, which is not supported by many Arm platforms (e.g. RK35xx), leading to data corruption on inbound transactions. Add an option to force write-combined mappings instead (Normal non-cacheable on Arm). Note that this is just a band-aid to keep the patch small. The TTM allocator should frankly not be concerned with hardware limitations and always pass the requested caching type (a driver could still use cached memory and perform its own cache maintenance). A proper solution would be for GPU drivers to check whether the device supports coherency and request the appropriate caching type. The drm_arch_can_wc_memory() helper also needs to be reworked or possibly even dropped. Signed-off-by: Mario Bălănică <[email protected]>
Root port BARs waste MMIO space, preventing large devices like GPUs from assigning their BARs. Signed-off-by: Mario Bălănică <[email protected]>
Signed-off-by: Mario Bălănică <[email protected]>
Signed-off-by: Dave Stevenson <[email protected]>
317c5c0 to 392fdfd
Well that puts the kibosh on using this particular card for labwc or other current window manager, same as the |
PR to create CI artifacts supporting AMD and Intel Xe GPUs.