summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'drm-xe-fixes-2024-11-08' of ↵Dave Airlie2024-11-0812-41/+54
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix ccs_mode setting for Xe2 and later (Balasubramani) - Synchronize ccs_mode setting with client creation (Balasubramani) - Apply scheduling WA for LNL in additional places as needed (Nirmoy) - Fix leak and lock handling in error paths of xe_exec ioctl (Matthew Brost) - Fix GGTT allocation leak leading to eventual crash in SR-IOV (Michal Wajdeczko) - Move run_ticks update out of job handling to avoid synchronization with reader (Lucas) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4ffcebtluaaaohquxfyf5babpihmtscxwad3jjmt5nggwh2xpm@ztw67ucywttg
| * drm/xe: Stop accumulating LRC timestamp on job_freeLucas De Marchi2024-11-062-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The exec queue timestamp is only really useful when it's being queried through the fdinfo. There's no need to update it so often, on every job_free. Tracing a simple app like vkcube running shows an update rate of ~ 120Hz. In case of discrete, the BO is on vram, creating a lot of pcie transactions. The update on job_free() is used to cover a gap: if exec queue is created and destroyed rapidly, before a new query, the timestamp still needs to be accumulated and accounted for in the xef. Initial implementation in commit 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") couldn't do it on the exec_queue_fini since the xef could be gone at that point. However since commit ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") the xef is refcounted and the exec queue always holds a reference, making this safe now. Improve the fix in commit 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") by reducing the frequency in which the update is needed. Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 83db047d9425d9a649f01573797558eff0f632e1) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * drm/xe/pf: Fix potential GGTT allocation leakMichal Wajdeczko2024-11-061-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In unlikely event that we fail during sending the new VF GGTT configuration to the GuC, we will free only the GGTT node data struct but will miss to release the actual GGTT allocation. This will later lead to list corruption, GGTT space leak and finally risking crash when unloading the driver: [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-EIO) [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT [ ] list_add corruption. next->prev should be prev (ffff88813cfcd628), but was 0000000000000000. (next=ffff88813cfe2028). [ ] RIP: 0010:__list_add_valid_or_report+0x6b/0xb0 [ ] Call Trace: [ ] drm_mm_insert_node_in_range+0x2c0/0x4e0 [ ] xe_ggtt_node_insert+0x46/0x70 [xe] [ ] pf_provision_vf_ggtt+0x7f5/0xa70 [xe] [ ] xe_gt_sriov_pf_config_set_ggtt+0x5e/0x770 [xe] [ ] ggtt_set+0x4b/0x70 [xe] [ ] simple_attr_write_xsigned.constprop.0.isra.0+0xb0/0x110 [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-ENOSPC) [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT [ ] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b7b: 0000 [#1] PREEMPT SMP NOPTI [ ] RIP: 0010:drm_mm_remove_node+0x1b7/0x390 [ ] Call Trace: [ ] <TASK> [ ] ? die_addr+0x2e/0x80 [ ] ? exc_general_protection+0x1a1/0x3e0 [ ] ? asm_exc_general_protection+0x22/0x30 [ ] ? drm_mm_remove_node+0x1b7/0x390 [ ] ggtt_node_remove+0xa5/0xf0 [xe] [ ] xe_ggtt_node_remove+0x35/0x70 [xe] [ ] xe_ttm_bo_destroy+0x123/0x220 [xe] [ ] intel_user_framebuffer_destroy+0x44/0x70 [xe] [ ] intel_plane_destroy_state+0x3b/0xc0 [xe] [ ] drm_atomic_state_default_clear+0x1cd/0x2f0 [ ] intel_atomic_state_clear+0x9/0x20 [xe] [ ] __drm_atomic_state_free+0x1d/0xb0 Fix that by using pf_release_ggtt() on the error path, which now works regardless if the node has GGTT allocation or not. Fixes: 34e804220f69 ("drm/xe: Make xe_ggtt_node struct independent") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104144901.1903-1-michal.wajdeczko@intel.com (cherry picked from commit 43b1dd2b550f0861ce80fbfffd5881b1b26272b1) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTLMatthew Brost2024-11-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Upon failure all locks need to be dropped before returning to the user. Fixes: 58480c1c912f ("drm/xe: Skip VMAs pin when requesting signal to the last XE_EXEC") Cc: <stable@vger.kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105043524.4062774-3-matthew.brost@intel.com (cherry picked from commit 7d1a4258e602ffdce529f56686925034c1b3b095) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * drm/xe: Fix possible exec queue leak in exec IOCTLMatthew Brost2024-11-061-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a couple of places after an exec queue is looked up the exec IOCTL returns on input errors without dropping the exec queue ref. Fix this ensuring the exec queue ref is dropped on input error. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105043524.4062774-2-matthew.brost@intel.com (cherry picked from commit 07064a200b40ac2195cb6b7b779897d9377e5e6f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * drm/xe/guc/tlb: Flush g2h worker in case of tlb timeoutNirmoy Das2024-11-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flush the g2h worker explicitly if TLB timeout happens which is observed on LNL and that points to the recent scheduling issue with E-cores on LNL. This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is E core scheduling fix. v2: Add platform check(Himal) v3: Remove gfx platform check as the issue related to cpu platform(John) Use the common WA macro(John) and print when the flush resolves timeout(Matt B) v4: Remove the resolves log and do the flush before taking pending_lock(Matt A) Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: stable@vger.kernel.org # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687 Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-3-nirmoy.das@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit e1f6fa55664a0eeb0a641f497e1adfcf6672e995) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * drm/xe/ufence: Flush xe ordered_wq in case of ufence timeoutNirmoy Das2024-11-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flush xe ordered_wq in case of ufence timeout which is observed on LNL and that points to recent scheduling issue with E-cores. This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is a E-core scheduling fix for LNL. v2: Add platform check(Himal) s/__flush_workqueue/flush_workqueue(Jani) v3: Remove gfx platform check as the issue related to cpu platform(John) v4: Use the Common macro(John) and print when the flush resolves timeout(Matt B) Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: stable@vger.kernel.org # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2754 Suggested-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-2-nirmoy.das@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 38c4c8722bd74452280951edc44c23de47612001) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * drm/xe: Move LNL scheduling WA to xe_device.hNirmoy Das2024-11-042-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move LNL scheduling WA to xe_device.h so this can be used in other places without needing keep the same comment about removal of this WA in the future. The WA, which flushes work or workqueues, is now wrapped in macros and can be reused wherever needed. Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> cc: stable@vger.kernel.org # v6.11+ Suggested-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-1-nirmoy.das@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit cbe006a6492c01a0058912ae15d473f4c149896c) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * drm/xe: Use the filelist from drm for ccs_mode changeBalasubramani Vivekanandan2024-11-043-23/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop the exclusive client count tracking and use the filelist from the drm to track the active clients. This also ensures the clients created internally by the driver won't block changing the ccs mode. Fixes: ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241008073628.377433-3-balasubramani.vivekanandan@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 1c35f1ed1fe3c649f8c16214d0d3dd828b5265d9) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * drm/xe: Set mask bits for CCS_MODE registerBalasubramani Vivekanandan2024-11-042-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CCS_MODE register requires setting mask bits from Xe2+ platforms. Set the mask bits unconditionally, as those bits are unused for older platforms. Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Cc: stable@vger.kernel.org # v6.11+ Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241008073628.377433-2-balasubramani.vivekanandan@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 23ea2c7572d4735ef66beb1e4feb8ae510b78247) [ Fix conflict with mmio refactors ] Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
* | Merge tag 'drm-misc-fixes-2024-11-08' of ↵Dave Airlie2024-11-089-5/+92
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: imagination: - Track PVR context per file - Break ref-counting cycle panel-orientation-quirks: - Fix matching Lenovo Yoga Tab 3 X90F panthor: - Lock VM array - Be strict about I/O mapping flags Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20241108085058.GA37468@linux.fritz.box
| * | drm/panthor: Be stricter about IO mapping flagsJann Horn2024-11-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current panthor_device_mmap_io() implementation has two issues: 1. For mapping DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET, panthor_device_mmap_io() bails if VM_WRITE is set, but does not clear VM_MAYWRITE. That means userspace can use mprotect() to make the mapping writable later on. This is a classic Linux driver gotcha. I don't think this actually has any impact in practice: When the GPU is powered, writes to the FLUSH_ID seem to be ignored; and when the GPU is not powered, the dummy_latest_flush page provided by the driver is deliberately designed to not do any flushes, so the only thing writing to the dummy_latest_flush could achieve would be to make *more* flushes happen. 2. panthor_device_mmap_io() does not block MAP_PRIVATE mappings (which are mappings without the VM_SHARED flag). MAP_PRIVATE in combination with VM_MAYWRITE indicates that the VMA has copy-on-write semantics, which for VM_PFNMAP are semi-supported but fairly cursed. In particular, in such a mapping, the driver can only install PTEs during mmap() by calling remap_pfn_range() (because remap_pfn_range() wants to **store the physical address of the mapped physical memory into the vm_pgoff of the VMA**); installing PTEs later on with a fault handler (as panthor does) is not supported in private mappings, and so if you try to fault in such a mapping, vmf_insert_pfn_prot() splats when it hits a BUG() check. Fix it by clearing the VM_MAYWRITE flag (userspace writing to the FLUSH_ID doesn't make sense) and requiring VM_SHARED (copy-on-write semantics for the FLUSH_ID don't make sense). Reproducers for both scenarios are in the notes of my patch on the mailing list; I tested that these bugs exist on a Rock 5B machine. Note that I only compile-tested the patch, I haven't tested it; I don't have a working kernel build setup for the test machine yet. Please test it before applying it. Cc: stable@vger.kernel.org Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105-panthor-flush-page-fixes-v1-1-829aaf37db93@google.com
| * | drm/panthor: Lock XArray when getting entries for the VMLiviu Dudau2024-11-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to commit cac075706f29 ("drm/panthor: Fix race when converting group handle to group object") we need to use the XArray's internal locking when retrieving a vm pointer from there. v2: Removed part of the patch that was trying to protect fetching the heap pointer from XArray, as that operation is protected by the @pool->lock. Fixes: 647810ec2476 ("drm/panthor: Add the MMU/VM logical block") Reported-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241106185806.389089-1-liviu.dudau@arm.com
| * | drm: panel-orientation-quirks: Make Lenovo Yoga Tab 3 X90F DMI match less strictHans de Goede2024-11-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it turns out that the 2G version has a DMI product name of "CHERRYVIEW D1 PLATFORM" where as the 4G version has "CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version check are unique enough that the product-name check is not necessary. Drop the product-name check so that the existing DMI match for the 4G RAM version also matches the 2G RAM version. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240825132131.6643-1-hdegoede@redhat.com
| * | Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann2024-11-0519-71/+258
| |\| | | | | | | | | | | | | | | | Backmerging to get the latest fixes from v6.12-rc6. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
| * | drm/imagination: Break an object reference loopBrendan King2024-11-044-4/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When remaining resources are being cleaned up on driver close, outstanding VM mappings may result in resources being leaked, due to an object reference loop, as shown below, with each object (or set of objects) referencing the object below it: PVR GEM Object GPU scheduler "finished" fence GPU scheduler “scheduled” fence PVR driver “done” fence PVR Context PVR VM Context PVR VM Mappings PVR GEM Object The reference that the PVR VM Context has on the VM mappings is a soft one, in the sense that the freeing of outstanding VM mappings is done as part of VM context destruction; no reference counts are involved, as is the case for all the other references in the loop. To break the reference loop during cleanup, free the outstanding VM mappings before destroying the PVR Context associated with the VM context. Signed-off-by: Brendan King <brendan.king@imgtec.com> Signed-off-by: Matt Coster <matt.coster@imgtec.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/8a25924f-1bb7-4d9a-a346-58e871dfb1d1@imgtec.com
| * | drm/imagination: Add a per-file PVR context listBrendan King2024-11-044-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a linked list of VM contexts which is needed for the next patch to be able to correctly track VM contexts for destruction on file close. It is only safe for VM contexts to be removed from the list and destroyed when not in interrupt context. Signed-off-by: Brendan King <brendan.king@imgtec.com> Signed-off-by: Matt Coster <matt.coster@imgtec.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/e57128ea-f0ce-4e93-a9d4-3f033a8b06fa@imgtec.com
* | | drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()Alex Deucher2024-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid a possible buffer overflow if size is larger than 4K. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f5d873f5825b40d886d03bd2aede91d4cf002434) Cc: stable@vger.kernel.org
* | | drm/amdgpu: Adjust debugfs eviction and IB access permissionsAlex Deucher2024-11-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Users should not be able to run these. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7ba9395430f611cfc101b1c2687732baafa239d5) Cc: stable@vger.kernel.org
* | | drm/amdgpu: Adjust debugfs register access permissionsAlex Deucher2024-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Regular users shouldn't have read access. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c0cfd2e652553d607b910be47d0cc5a7f3a78641) Cc: stable@vger.kernel.org
* | | drm/amdgpu: Fix DPX valid mode check on GC 9.4.3Lijo Lazar2024-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For DPX mode, the number of memory partitions supported should be less than or equal to 2. Fixes: 1589c82a1085 ("drm/amdgpu: Check memory ranges for valid xcp mode") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 990c4f580742de7bb78fa57420ffd182fc3ab4cd) Cc: stable@vger.kernel.org
* | | drm/amd/pm: correct the workload settingKenneth Feng2024-11-0412-36/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct the workload setting in order not to mix the setting with the end user. Update the workload mask accordingly. v2: changes as below: 1. the end user can not erase the workload from driver except default workload. 2. always shows the real highest priority workoad to the end user. 3. the real workload mask is combined with driver workload mask and end user workload mask. v3: apply this to the other ASICs as well. v4: simplify the code v5: refine the code based on the review comments. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8cc438be5d49b8326b2fcade0bdb7e6a97df9e0b) Cc: stable@vger.kernel.org # 6.11.x
* | | drm/amd/pm: always pick the pptable from IFWIKenneth Feng2024-11-041-64/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | always pick the pptable from IFWI on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 136ce12bd5907388cb4e9aa63ee5c9c8c441640b) Cc: stable@vger.kernel.org # 6.11.x
* | | drm/amdgpu: prevent NULL pointer dereference if ATIF is not supportedAntonio Quartulli2024-11-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: c9b7c809b89f ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91c9e221fe2553edf2db71627d8453f083de87a1) Cc: stable@vger.kernel.org
* | | drm/amd/display: parse umc_info or vram_info based on ASICAurabindo Pillai2024-11-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An upstream bug report suggests that there are production dGPUs that are older than DCN401 but still have a umc_info in VBIOS tables with the same version as expected for a DCN401 product. Hence, reading this tables should be guarded with a version check. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3678 Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2551b4a321a68134360b860113dd460133e856e5) Fixes: 00c391102abc ("drm/amd/display: Add misc DC changes for DCN401") Cc: stable@vger.kernel.org # 6.11.x
* | | drm/amd/display: Fix brightness level not retained over rebootTom Chung2024-11-041-0/+15
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [Why] During boot up and resume the DC layer will reset the panel brightness to fix a flicker issue. It will cause the dm->actual_brightness is not the current panel brightness level. (the dm->brightness is the correct panel level) [How] Set the backlight level after do the set mode. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization") Reported-by: Mark Herbert <mark.herbert42@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3655 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7875afafba84817b791be6d2282b836695146060) Cc: stable@vger.kernel.org
* | Merge tag 'drm-xe-fixes-2024-10-31' of ↵Dave Airlie2024-11-015-35/+72
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix missing HPD interrupt enabling, bringing one PM refactor with it (Imre / Maarten) - Workaround LNL GGTT invalidation not being visible to GuC (Matthew Brost) - Avoid getting jobs stuck without a protecting timeout (Matthew Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/tsbftadm7owyizzdaqnqu7u4tqggxgeqeztlfvmj5fryxlfomi@5m5bfv2zvzmw
| * | drm/xe: Don't short circuit TDR on jobs not startedMatthew Brost2024-10-311-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Short circuiting TDR on jobs not started is an optimization which is not required. On LNL we are facing an issue where jobs do not get scheduled by the GuC if it misses a GGTT page update. When this occurs let the TDR fire, toggle the scheduling which may get the job unstuck, and print a warning message. If the TDR fires twice on job that hasn't started, timeout the job. v2: - Add warning message (Paulo) - Add fixes tag (Paulo) - Timeout job which hasn't started after TDR firing twice v3: - Include local change v4: - Short circuit check_timeout on job not started - use warn level rather than notice (Paulo) Fixes: 7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out") Cc: stable@vger.kernel.org Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241025214330.2010521-2-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 35d25a4a0012e690ef0cc4c5440231176db595cc) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * | drm/xe: Add mmio read before GGTT invalidateMatthew Brost2024-10-311-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On LNL without a mmio read before a GGTT invalidate the GuC can incorrectly read the GGTT scratch page upon next access leading to jobs not getting scheduled. A mmio read before a GGTT invalidate seems to fix this. Since a GGTT invalidate is not a hot code path, blindly do a mmio read before each GGTT invalidate. Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: stable@vger.kernel.org Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3164 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023221200.1797832-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 5a710196883e0ac019ac6df2a6d79c16ad3c32fa) [ Fix conflict with mmio vs gt argument ] Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * | drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resumeImre Deak2024-10-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Atm the display HPD interrupts that got disabled during runtime suspend, are re-enabled only if d3cold is enabled. Fix things by also re-enabling the interrupts if d3cold is disabled. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241009194358.1321200-5-imre.deak@intel.com (cherry picked from commit bbc4a30de095f0349d3c278500345a1b620d495e) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * | drm/xe/display: Separate the d3cold and non-d3cold runtime PM handlingImre Deak2024-10-291-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For clarity separate the d3cold and non-d3cold runtime PM handling. The only change in behavior is disabling polling later during runtime resume. This shouldn't make a difference, since the poll disabling is handled from a work, which could run at any point wrt. the runtime resume handler. The work will also require a runtime PM reference, syncing it with the resume handler. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241009194358.1321200-4-imre.deak@intel.com (cherry picked from commit a4de6beb83fc5adee788518350247c629568901e) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
| * | drm/xe: Remove runtime argument from display s/r functionsMaarten Lankhorst2024-10-293-28/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous change ensures that pm_suspend is only called when suspending or resuming. This ensures no further bugs like those in the previous commit. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240905150052.174895-3-maarten.lankhorst@linux.intel.com (cherry picked from commit f90491d4b64e302e940133103d3d9908e70e454f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
* | | Merge tag 'mediatek-drm-fixes-20241028' of ↵Dave Airlie2024-10-3111-32/+179
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes Mediatek DRM Fixes - 20241028 1. Fix degradation problem of alpha blending 2. Fix color format MACROs in OVL 3. Fix get efuse issue for MT8188 DPTX 4. Fix potential NULL dereference in mtk_crtc_destroy() 5. Correct dpi power-domains property 6. Add split subschema property constraints Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241028135846.3570-1-chunkuang.hu@kernel.org
| * | | drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy()Dan Carpenter2024-10-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In mtk_crtc_create(), if the call to mbox_request_channel() fails then we set the "mtk_crtc->cmdq_client.chan" pointer to NULL. In that situation, we do not call cmdq_pkt_create(). During the cleanup, we need to check if the "mtk_crtc->cmdq_client.chan" is NULL first before calling cmdq_pkt_destroy(). Calling cmdq_pkt_destroy() is unnecessary if we didn't call cmdq_pkt_create() and it will result in a NULL pointer dereference. Fixes: 7627122fd1c0 ("drm/mediatek: Add cmdq_handle in mtk_crtc") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/cc537bd6-837f-4c85-a37b-1a007e268310@stanley.mountain/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
| * | | drm/mediatek: Fix get efuse issue for MT8188 DPTXLiankun Yang2024-10-231-1/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update efuse data for MT8188 displayport. The DP monitor can not display when DUT connected to USB-c to DP dongle. Analysis view is invalid DP efuse data. Fixes: 350c3fe907fb ("drm/mediatek: dp: Add support MT8188 dp/edp function") Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Liankun Yang <liankun.yang@mediatek.com> Reviewed-by: Fei Shao <fshao@chromium.org> Tested-by: Fei Shao <fshao@chromium.org> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20240923132521.22785-1-liankun.yang@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
| * | | drm/mediatek: Fix color format MACROs in OVLHsin-Te Yuan2024-10-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 9f428b95ac89 ("drm/mediatek: Add new color format MACROs in OVL"), some new color formats are defined in the MACROs to make the switch statement more concise. That commit was intended to be a no-op cleanup. However, there are typos in these formats MACROs, which cause the return value to be incorrect. Fix the typos to ensure the return value remains unchanged. Fixes: 9f428b95ac89 ("drm/mediatek: Add new color format MACROs in OVL") Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Hsin-Te Yuan <yuanhsinte@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241016-color-v3-1-e0f5f44a72d8@chromium.org/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
| * | | drm/mediatek: Add blend_modes to mtk_plane_init() for different SoCsJason-JH.Lin2024-10-2210-10/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since some SoCs support premultiplied pixel formats but some do not, the blend_modes parameter is added to mtk_plane_init(), which is obtained from the mtk_ddp_comp_get_blend_modes function implemented in different blending supported components. The blending supported components can use driver data to set the blend mode capabilities for different SoCs. Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241009034646.13143-6-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
| * | | drm/mediatek: ovl: Add blend_modes to driver dataJason-JH.Lin2024-10-221-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OVL_CON_CLRFMT_MAN is a configuration for extending color format settings of DISP_REG_OVL_CON(n). It will change some of the original color format settings. Take the settings of (3 << 12) for example. - If OVL_CON_CLRFMT_MAN = 0 means OVL_CON_CLRFMT_RGBA8888. - If OVL_CON_CLRFMT_MAN = 1 means OVL_CON_CLRFMT_PARGB8888. Since previous SoCs did not support OVL_CON_CLRFMT_MAN, this means that the SoC does not support the premultiplied color format. It will break the original color format setting of MT8173. Therefore, the blend_modes is added to the driver data and then mtk_ovl_fmt_convert() will check the blend_modes to see if pre-multiplied is supported in the current platform. If it is not supported, use coverage mode to set it to the supported color formats to solve the degradation problem. Fixes: a3f7f7ef4bfe ("drm/mediatek: Support "Pre-multiplied" blending in OVL") Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241009034646.13143-5-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
| * | | drm/mediatek: ovl: Remove the color format comment for ovl_fmt_convert()Jason-JH.Lin2024-10-221-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we changed MACROs to be consistent with DRM input color format naming, the comment for ovl_fmt_conver() is no longer needed. Fixes: 9f428b95ac89 ("drm/mediatek: Add new color format MACROs in OVL") Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241009034646.13143-4-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
| * | | drm/mediatek: ovl: Refine ignore_pixel_alpha comment and placementJason-JH.Lin2024-10-221-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refine the comment for ignore_pixel_alpha flag and move it to if(state->fb) statement to make it less conditional. Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241009034646.13143-3-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
| * | | drm/mediatek: ovl: Fix XRGB format breakage for blend_modes unsupported SoCsJason-JH.Lin2024-10-221-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OVL_CON_AEN is for alpha blending enable. For the SoC that is supported the blend_modes, OVL_CON_AEN will always enabled to use constant alpha and then use the ignore_pixel_alpha bit to do the alpha blending for XRGB8888 format. Note that ignore pixel alpha bit is not supported if the SoC is not supported the blend_modes. So it will break the original setting of XRGB8888 format for the blend_modes unsupported SoCs, such as MT8173. To fix the downgrade issue, enable alpha blending only when a valid blend_mode or has_alpha is set. Fixes: bc46eb5d5d77 ("drm/mediatek: Support DRM plane alpha in OVL") Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241009034646.13143-2-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
* | | | Merge tag 'amd-drm-fixes-6.12-2024-10-31' of ↵Dave Airlie2024-10-313-4/+7
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.12-2024-10-31: amdgpu: - DCN 3.5 fix - Vangogh SMU KASAN fix - SMU 13 profile reporting fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241031151539.3523633-1-alexander.deucher@amd.com
| * | | drm/amdgpu/smu13: fix profile reportingAlex Deucher2024-10-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following 3 commits landed in parallel: commit d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") commit 7a1613e47e65 ("drm/amdgpu/smu13: always apply the powersave optimization") commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D") While everything is set correctly, this caused the profile to be reported incorrectly because both the powersave and fullscreen3d bits were set in the mask and when the driver prints the profile, it looks for the first bit set. Fixes: d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ecfe9b237687a55d596fff0650ccc8cc455edd3f) Cc: stable@vger.kernel.org
| * | | drm/amd/pm: Vangogh: Fix kernel memory out of bounds writeTvrtko Ursulin2024-10-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows: [ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu] [ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067 ... [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544 [ 33.861816] Tainted: [W]=WARN [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 33.861822] Call Trace: [ 33.861826] <TASK> [ 33.861829] dump_stack_lvl+0x66/0x90 [ 33.861838] print_report+0xce/0x620 [ 33.861853] kasan_report+0xda/0x110 [ 33.862794] kasan_check_range+0xfd/0x1a0 [ 33.862799] __asan_memset+0x23/0x40 [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.867135] dev_attr_show+0x43/0xc0 [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0 [ 33.867155] seq_read_iter+0x3f8/0x1140 [ 33.867173] vfs_read+0x76c/0xc50 [ 33.867198] ksys_read+0xfb/0x1d0 [ 33.867214] do_syscall_64+0x90/0x160 ... [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s: [ 33.867358] kasan_save_stack+0x33/0x50 [ 33.867364] kasan_save_track+0x17/0x60 [ 33.867367] __kasan_kmalloc+0x87/0x90 [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu] [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu] [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu] [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu] [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu] [ 33.869608] local_pci_probe+0xda/0x180 [ 33.869614] pci_device_probe+0x43f/0x6b0 Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets the 168 large block. Root cause appears that when GPU metrics tables for v2_4 parts were added it was not considered to enlarge the table to fit. The fix in this patch is rather "brute force" and perhaps later should be done in a smarter way, by extracting and consolidating the part version to size logic to a common helper, instead of brute forcing the largest possible allocation. Nevertheless, for now this works and fixes the out of bounds write. v2: * Drop impossible v3_0 case. (Mario) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Evan Quan <evan.quan@amd.com> Cc: Wenyou Yang <WenYou.Yang@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703) Cc: stable@vger.kernel.org # v6.6+
| * | | Revert "drm/amd/display: update DML2 policy ↵Ovidiu Bunea2024-10-281-0/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable@vger.kernel.org
* | | drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic()Jinjie Ruan2024-10-311-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modprobe drm_hdmi_state_helper_test and then rmmod it, the following memory leak occurs. The `mode` allocated in drm_mode_duplicate() called by drm_display_mode_from_cea_vic() is not freed, which cause the memory leak: unreferenced object 0xffffff80ccd18100 (size 128): comm "kunit_try_catch", pid 1851, jiffies 4295059695 hex dump (first 32 bytes): 57 62 00 00 80 02 90 02 f0 02 20 03 00 00 e0 01 Wb........ ..... ea 01 ec 01 0d 02 00 00 0a 00 00 00 00 00 00 00 ................ backtrace (crc c2f1aa95): [<000000000f10b11b>] kmemleak_alloc+0x34/0x40 [<000000001cd4cf73>] __kmalloc_cache_noprof+0x26c/0x2f4 [<00000000f1f3cffa>] drm_mode_duplicate+0x44/0x19c [<000000008cbeef13>] drm_display_mode_from_cea_vic+0x88/0x98 [<0000000019daaacf>] 0xffffffedc11ae69c [<000000000aad0f85>] kunit_try_run_case+0x13c/0x3ac [<00000000a9210bac>] kunit_generic_run_threadfn_adapter+0x80/0xec [<000000000a0b2e9e>] kthread+0x2e8/0x374 [<00000000bd668858>] ret_from_fork+0x10/0x20 ...... Free `mode` by using drm_kunit_display_mode_from_cea_vic() to fix it. Cc: stable@vger.kernel.org Fixes: 4af70f19e559 ("drm/tests: Add RGB Quantization tests") Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030023504.530425-4-ruanjinjie@huawei.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
* | | drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic()Jinjie Ruan2024-10-311-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modprobe drm_connector_test and then rmmod drm_connector_test, the following memory leak occurs. The `mode` allocated in drm_mode_duplicate() called by drm_display_mode_from_cea_vic() is not freed, which cause the memory leak: unreferenced object 0xffffff80cb0ee400 (size 128): comm "kunit_try_catch", pid 1948, jiffies 4294950339 hex dump (first 32 bytes): 14 44 02 00 80 07 d8 07 04 08 98 08 00 00 38 04 .D............8. 3c 04 41 04 65 04 00 00 05 00 00 00 00 00 00 00 <.A.e........... backtrace (crc 90e9585c): [<00000000ec42e3d7>] kmemleak_alloc+0x34/0x40 [<00000000d0ef055a>] __kmalloc_cache_noprof+0x26c/0x2f4 [<00000000c2062161>] drm_mode_duplicate+0x44/0x19c [<00000000f96c74aa>] drm_display_mode_from_cea_vic+0x88/0x98 [<00000000d8f2c8b4>] 0xffffffdc982a4868 [<000000005d164dbc>] kunit_try_run_case+0x13c/0x3ac [<000000006fb23398>] kunit_generic_run_threadfn_adapter+0x80/0xec [<000000006ea56ca0>] kthread+0x2e8/0x374 [<000000000676063f>] ret_from_fork+0x10/0x20 ...... Free `mode` by using drm_kunit_display_mode_from_cea_vic() to fix it. Cc: stable@vger.kernel.org Fixes: abb6f74973e2 ("drm/tests: Add HDMI TDMS character rate tests") Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030023504.530425-3-ruanjinjie@huawei.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
* | | drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()Jinjie Ruan2024-10-311-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Maxime suggested, add a new helper drm_kunit_display_mode_from_cea_vic(), it can replace the direct call of drm_display_mode_from_cea_vic(), and it will help solving the `mode` memory leaks. Acked-by: Maxime Ripard <mripard@kernel.org> Suggested-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030023504.530425-2-ruanjinjie@huawei.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
* | | drm/panthor: Report group as timedout when we fail to properly suspendBoris Brezillon2024-10-301-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we don't do that, the group is considered usable by userspace, but all further GROUP_SUBMIT will fail with -EINVAL. Changes in v3: - Add R-bs Changes in v2: - New patch Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029152912.270346-3-boris.brezillon@collabora.com
* | | drm/panthor: Fail job creation when the group is deadBoris Brezillon2024-10-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can use GROUP_SUBMIT errors as a trigger to check the group state and recreate the group if it became unusable. Make sure we report an error when the group became unusable. Changes in v3: - None Changes in v2: - Add R-bs Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029152912.270346-2-boris.brezillon@collabora.com