summaryrefslogtreecommitdiffstats
path: root/drivers/gpu (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'drm-fixes-5.4-2019-10-23' of ↵Dave Airlie2019-10-258-46/+92
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.4-2019-10-23: amdgpu: - Fix suspend/resume issue related to multi-media engines - Fix memory leak in user ptr code related to hmm conversion - Fix possible VM faults when allocating page table memory - Fix error handling in bo list ioctl Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024031809.3155-1-alexander.deucher@amd.com
| * drm/amdgpu/vce: fix allocation size in enc ring testAlex Deucher2019-10-172-5/+16
| | | | | | | | | | | | | | | | | | | | | | We need to allocate a large enough buffer for the feedback buffer, otherwise the IB test can overwrite other memory. Reviewed-by: James Zhu <James.Zhu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| * drm/amdgpu: fix error handling in amdgpu_bo_list_createChristian König2019-10-171-1/+6
| | | | | | | | | | | | | | | | We need to drop normal and userptr BOs separately. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| * drm/amdgpu: fix potential VM faultsChristian König2019-10-171-1/+2
| | | | | | | | | | | | | | | | | | When we allocate new page tables under memory pressure we should not evict old ones. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| * drm/amdgpu: user pages array memory leak fixPhilip Yang2019-10-171-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | user_pages array should always be freed after validation regardless if user pages are changed after bo is created because with HMM change parse bo always allocate user pages array to get user pages for userptr bo. v2: remove unused local variable and amend commit v3: add back get user pages in gem_userptr_ioctl, to detect application bug where an userptr VMA is not ananymous memory and reject it. Bugzilla: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1844962 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Joe Barnett <thejoe@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.3
| * drm/amdgpu/vcn: fix allocation size in enc ring testAlex Deucher2019-10-171-12/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to allocate a large enough buffer for the session info, otherwise the IB test can overwrite other memory. - Session info is 128K according to mesa - Use the same session info for create and destroy Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241 Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| * drm/amdgpu/uvd7: fix allocation size in enc ring test (v2)Alex Deucher2019-10-171-11/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to allocate a large enough buffer for the session info, otherwise the IB test can overwrite other memory. v2: - session info is 128K according to mesa - use the same session info for create and destroy Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241 Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| * drm/amdgpu/uvd6: fix allocation size in enc ring test (v2)Alex Deucher2019-10-171-10/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to allocate a large enough buffer for the session info, otherwise the IB test can overwrite other memory. v2: - session info is 128K according to mesa - use the same session info for create and destroy Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241 Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
* | Merge tag 'drm-misc-fixes-2019-10-23' of ↵Dave Airlie2019-10-252-3/+4
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Two fixes for komeda, one for typos and one to prevent an hardware issue when flushing inactive pipes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191023112643.evpp6f23mpjwdsn4@gilmour
| * drm/komeda: Fix typos in komeda_splitter_validateMihail Atanassov2019-10-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Fix both the string and the struct member being printed. Changes since v1: - Now with a bonus grammar fix, too. Fixes: 264b9436d23b ("drm/komeda: Enable writeback split support") Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com> Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190930122231.33029-1-mihail.atanassov@arm.com
| * drm/komeda: Don't flush inactive pipesMihail Atanassov2019-10-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | HW doesn't allow flushing inactive pipes and raises an MERR interrupt if you try to do so. Stop triggering the MERR interrupt in the middle of a commit by calling drm_atomic_helper_commit_planes with the ACTIVE_ONLY flag. Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com> Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010102950.56253-1-mihail.atanassov@arm.com
* | Merge tag 'drm-misc-fixes-2019-10-17' of ↵Dave Airlie2019-10-177-21/+33
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/drm/drm-misc into drm-fixes -dma-resv: Change shared_count to post-increment to fix lima crash (Qiang) -ttm: A couple fixes related to lifetime and restore prefault behavior (Christian & Thomas) -panfrost: Fill in missing feature reg values and fix stoppedjob timeouts (Steven) Cc: Qiang Yu <yuq825@gmail.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Christian König <christian.koenig@amd.com> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Dave Airlie <airlied@redhat.com> From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20191017203419.GA142909@art_vandelay
| * drm/panfrost: Handle resetting on timeout betterSteven Price2019-10-151-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Panfrost uses multiple schedulers (one for each slot, so 2 in reality), and on a timeout has to stop all the schedulers to safely perform a reset. However more than one scheduler can trigger a timeout at the same time. This race condition results in jobs being freed while they are still in use. When stopping other slots use cancel_delayed_work_sync() to ensure that any timeout started for that slot has completed. Also use mutex_trylock() to obtain reset_lock. This means that only one thread attempts the reset, the other threads will simply complete without doing anything (the first thread will wait for this in the call to cancel_delayed_work_sync()). While we're here and since the function is already dependent on sched_job not being NULL, let's remove the unnecessary checks. Fixes: aa20236784ab ("drm/panfrost: Prevent concurrent resets") Tested-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Steven Price <steven.price@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191009094456.9704-1-steven.price@arm.com
| * drm/panfrost: Add missing GPU feature registersSteven Price2019-10-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | Three feature registers were declared but never actually read from the GPU. Add THREAD_MAX_THREADS, THREAD_MAX_WORKGROUP_SIZE and THREAD_MAX_BARRIER_SIZE so that the complete set are available. Fixes: 4bced8bea094 ("drm/panfrost: Export all GPU feature registers") Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191014151515.13839-1-steven.price@arm.com
| * drm/ttm: fix handling in ttm_bo_add_mem_to_lruChristian König2019-10-141-2/+3
| | | | | | | | | | | | | | | | | | We should not add the BO to the swap LRU when the new mem is fixed and the TTM object about to be destroyed. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Link: https://patchwork.freedesktop.org/patch/335246/
| * drm/ttm: Restore ttm prefaultingThomas Hellstrom2019-10-141-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4daa4fba3a38 ("gpu: drm: ttm: Adding new return type vm_fault_t") broke TTM prefaulting. Since vmf_insert_mixed() typically always returns VM_FAULT_NOPAGE, prefaulting stops after the second PTE. Restore (almost) the original behaviour. Unfortunately we can no longer with the new vm_fault_t return type determine whether a prefaulting PTE insertion hit an already populated PTE, and terminate the insertion loop. Instead we continue with the pre-determined number of prefaults. Fixes: 4daa4fba3a38 ("gpu: drm: ttm: Adding new return type vm_fault_t") Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/330387/
| * drm/ttm: fix busy reference in ttm_mem_evict_firstChristian König2019-10-141-2/+2
| | | | | | | | | | | | | | | | | | The busy BO might actually be already deleted, so grab only a list reference. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Thomas Hellström <thellstrom@vmware.com> Link: https://patchwork.freedesktop.org/patch/332877/
| * drm/msm/dsi: Implement reset correctlyJeffrey Hugo2019-10-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On msm8998, vblank timeouts are observed because the DSI controller is not reset properly, which ends up stalling the MDP. This is because the reset logic is not correct per the hardware documentation. The documentation states that after asserting reset, software should wait some time (no indication of how long), or poll the status register until it returns 0 before deasserting reset. wmb() is insufficient for this purpose since it just ensures ordering, not timing between writes. Since asserting and deasserting reset occurs on the same register, ordering is already guaranteed by the architecture, making the wmb extraneous. Since we would define a timeout for polling the status register to avoid a possible infinite loop, lets just use a static delay of 20 ms, since 16.666 ms is the time available to process one frame at 60 fps. Fixes: a689554ba6ed ("drm/msm: Initial add DSI connector support") Cc: Hai Li <hali@codeaurora.org> Cc: Rob Clark <robdclark@gmail.com> Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com> Reviewed-by: Sean Paul <sean@poorly.run> [seanpaul renamed RESET_DELAY to DSI_RESET_TOGGLE_DELAY_MS] Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191011133939.16551-1-jeffrey.l.hugo@gmail.com
| * drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50Kai-Heng Feng2019-10-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | Another panel that needs 6BPC quirk. BugLink: https://bugs.launchpad.net/bugs/1819968 Cc: <stable@vger.kernel.org> # v4.8+ Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190402033037.21877-1-kai.heng.feng@canonical.com
| * drm/tiny: Kconfig: Remove always-y THERMAL dep. from TINYDRM_REPAPERUlf Magnusson2019-10-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [cherry-picked to drm-misc-fixes: drm-misc-next commit dfef959803c7] Commit 554b3529fe01 ("thermal/drivers/core: Remove the module Kconfig's option") changed the type of THERMAL from tristate to bool, so THERMAL || !THERMAL is now always y. Remove the redundant dependency. Discovered through Kconfiglib detecting a dependency loop. The C tools simplify the expression to y before running dependency loop detection, and so don't see it. Changing the type of THERMAL back to tristate makes the C tools detect the same loop. Not sure if running dep. loop detection after simplification can be called a bug. Fixing this nit unbreaks Kconfiglib on the kernel at least. Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190927174218.GA32085@huvuddator
* | Merge tag 'drm-fixes-5.4-2019-10-16' of ↵Dave Airlie2019-10-176-46/+38
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.4-2019-10-16: amdgpu: - Powerplay fix for SMU7 parts - Bail earlier when cik/si support is not set to 1 - Fix an SDMA issue on navi radeon: - revert a PPC fix which broken x86 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191017022443.3853-1-alexander.deucher@amd.com
| * | drm/amdgpu/sdma5: fix mask value of POLL_REGMEM packet for pipe syncXiaojie Yuan2019-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sdma will hang once sequence number to be polled reaches 0x1000_0000 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| * | drm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1Hans de Goede2019-10-122-35/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bail from the pci_driver probe function instead of from the drm_driver load function. This avoid /dev/dri/card0 temporarily getting registered and then unregistered again, sending unwanted add / remove udev events to userspace. Specifically this avoids triggering the (userspace) bug fixed by this plymouth merge-request: https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59 Note that despite that being a userspace bug, not sending unnecessary udev events is a good idea in general. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490 Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| * | Revert "drm/radeon: Fix EEH during kexec"Alex Deucher2019-10-121-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 6f7fe9a93e6c09bf988c5059403f5f88e17e21e6. This breaks some boards. Maybe just enable this on PPC for now? Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205147 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| * | drm/amdgpu/powerplay: fix typo in mvdd table setupAlex Deucher2019-10-092-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Polaris and vegam use count for the value rather than level. This looks like a copy paste typo from when the code was adapted from previous asics. I'm not sure that the SMU actually uses this value, so I don't know that it actually is a bug per se. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=108609 Reported-by: Robert Strube <rstrube@gmail.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* | | drm/i915: Fixup preempt-to-busy vs resubmission of a virtual requestChris Wilson2019-10-161-6/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As preempt-to-busy leaves the request on the HW as the resubmission is processed, that request may complete in the background and even cause a second virtual request to enter queue. This second virtual request breaks our "single request in the virtual pipeline" assumptions. Furthermore, as the virtual request may be completed and retired, we lose the reference the virtual engine assumes is held. Normally, just removing the request from the scheduler queue removes it from the engine, but the virtual engine keeps track of its singleton request via its ve->request. This pointer needs protecting with a reference. v2: Drop unnecessary motion of rq->engine = owner Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-1-chris@chris-wilson.co.uk (cherry picked from commit b647c7df01b75761b4c0b1cb6f4841088c0b1121) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
* | | drm/i915/userptr: Never allow userptr into the mappable GGTTChris Wilson2019-10-165-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to invalidate userptr objects which also happen to be pulled into GGTT mmaps. That is when we unbind the userptr object (on mmu invalidation), we revoke all CPU mmaps, which may then recurse into mmu invalidation. We looked for ways of breaking the cycle, but the revocation on invalidation is required and cannot be avoided. The only solution we could see was to not allow such GGTT bindings of userptr objects in the first place. In practice, no one really wants to use a GGTT mmapping of a CPU pointer... Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got a genuine blip from CI: <4>[ 246.793958] ====================================================== <4>[ 246.793972] WARNING: possible circular locking dependency detected <4>[ 246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G U <4>[ 246.794003] ------------------------------------------------------ <4>[ 246.794017] kswapd0/145 is trying to acquire lock: <4>[ 246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915] <4>[ 246.794250] but task is already holding lock: <4>[ 246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0 <4>[ 246.794291] which lock already depends on the new lock. <4>[ 246.794307] the existing dependency chain (in reverse order) is: <4>[ 246.794322] -> #3 (&anon_vma->rwsem){++++}: <4>[ 246.794344] down_write+0x33/0x70 <4>[ 246.794357] __vma_adjust+0x3d9/0x7b0 <4>[ 246.794370] __split_vma+0x16a/0x180 <4>[ 246.794385] mprotect_fixup+0x2a5/0x320 <4>[ 246.794399] do_mprotect_pkey+0x208/0x2e0 <4>[ 246.794413] __x64_sys_mprotect+0x16/0x20 <4>[ 246.794429] do_syscall_64+0x55/0x1c0 <4>[ 246.794443] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794456] -> #2 (&mapping->i_mmap_rwsem){++++}: <4>[ 246.794478] down_write+0x33/0x70 <4>[ 246.794493] unmap_mapping_pages+0x48/0x130 <4>[ 246.794519] i915_vma_revoke_mmap+0x81/0x1b0 [i915] <4>[ 246.794519] i915_vma_unbind+0x11d/0x4a0 [i915] <4>[ 246.794519] i915_vma_destroy+0x31/0x300 [i915] <4>[ 246.794519] __i915_gem_free_objects+0xb8/0x4b0 [i915] <4>[ 246.794519] drm_file_free.part.0+0x1e6/0x290 <4>[ 246.794519] drm_release+0xa6/0xe0 <4>[ 246.794519] __fput+0xc2/0x250 <4>[ 246.794519] task_work_run+0x82/0xb0 <4>[ 246.794519] do_exit+0x35b/0xdb0 <4>[ 246.794519] do_group_exit+0x34/0xb0 <4>[ 246.794519] __x64_sys_exit_group+0xf/0x10 <4>[ 246.794519] do_syscall_64+0x55/0x1c0 <4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794519] -> #1 (&vm->mutex){+.+.}: <4>[ 246.794519] i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915] <4>[ 246.794519] i915_address_space_init+0x9f/0x160 [i915] <4>[ 246.794519] i915_ggtt_init_hw+0x55/0x170 [i915] <4>[ 246.794519] i915_driver_probe+0xc9f/0x1620 [i915] <4>[ 246.794519] i915_pci_probe+0x43/0x1b0 [i915] <4>[ 246.794519] pci_device_probe+0x9e/0x120 <4>[ 246.794519] really_probe+0xea/0x3d0 <4>[ 246.794519] driver_probe_device+0x10b/0x120 <4>[ 246.794519] device_driver_attach+0x4a/0x50 <4>[ 246.794519] __driver_attach+0x97/0x130 <4>[ 246.794519] bus_for_each_dev+0x74/0xc0 <4>[ 246.794519] bus_add_driver+0x13f/0x210 <4>[ 246.794519] driver_register+0x56/0xe0 <4>[ 246.794519] do_one_initcall+0x58/0x300 <4>[ 246.794519] do_init_module+0x56/0x1f6 <4>[ 246.794519] load_module+0x25bd/0x2a40 <4>[ 246.794519] __se_sys_finit_module+0xd3/0xf0 <4>[ 246.794519] do_syscall_64+0x55/0x1c0 <4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794519] -> #0 (&dev->struct_mutex/1){+.+.}: <4>[ 246.794519] __lock_acquire+0x15d8/0x1e90 <4>[ 246.794519] lock_acquire+0xa6/0x1c0 <4>[ 246.794519] __mutex_lock+0x9d/0x9b0 <4>[ 246.794519] userptr_mn_invalidate_range_start+0x18f/0x220 [i915] <4>[ 246.794519] __mmu_notifier_invalidate_range_start+0x85/0x110 <4>[ 246.794519] try_to_unmap_one+0x76b/0x860 <4>[ 246.794519] rmap_walk_anon+0x104/0x280 <4>[ 246.794519] try_to_unmap+0xc0/0xf0 <4>[ 246.794519] shrink_page_list+0x561/0xc10 <4>[ 246.794519] shrink_inactive_list+0x220/0x440 <4>[ 246.794519] shrink_node_memcg+0x36e/0x740 <4>[ 246.794519] shrink_node+0xcb/0x490 <4>[ 246.794519] balance_pgdat+0x241/0x580 <4>[ 246.794519] kswapd+0x16c/0x530 <4>[ 246.794519] kthread+0x119/0x130 <4>[ 246.794519] ret_from_fork+0x24/0x50 <4>[ 246.794519] other info that might help us debug this: <4>[ 246.794519] Chain exists of: &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem <4>[ 246.794519] Possible unsafe locking scenario: <4>[ 246.794519] CPU0 CPU1 <4>[ 246.794519] ---- ---- <4>[ 246.794519] lock(&anon_vma->rwsem); <4>[ 246.794519] lock(&mapping->i_mmap_rwsem); <4>[ 246.794519] lock(&anon_vma->rwsem); <4>[ 246.794519] lock(&dev->struct_mutex/1); <4>[ 246.794519] *** DEADLOCK *** v2: Say no to mmap_ioctl Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190928082546.3473-1-chris@chris-wilson.co.uk (cherry picked from commit a4311745bba9763e3c965643d4531bd5765b0513) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
* | | drm/i915: Favor last VBT child device with conflicting AUX ch/DDC pinVille Syrjälä2019-10-161-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first come first served apporoach to handling the VBT child device AUX ch conflicts has backfired. We have machines in the wild where the VBT specifies both port A eDP and port E DP (in that order) with port E being the real one. So let's try to flip the preference around and let the last child device win once again. Cc: stable@vger.kernel.org Cc: Jani Nikula <jani.nikula@intel.com> Tested-by: Masami Ichikawa <masami256@gmail.com> Tested-by: Torsten <freedesktop201910@liggy.de> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111966 Fixes: 36a0f92020dc ("drm/i915/bios: make child device order the priority order") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191011202030.8829-1-ville.syrjala@linux.intel.com Acked-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 41e35ffb380bde1379e4030bb5b2ac824d5139cf) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
* | | drm/i915/execlists: Refactor -EIO markup of hung requestsChris Wilson2019-10-161-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull setting -EIO on the hung requests into its own utility function. Having allowed ourselves to short-circuit submission of completed requests, we can now do the mark_eio() prior to submission and avoid some redundant operations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-4-chris@chris-wilson.co.uk (cherry picked from commit 0d7cf7bc15e75bf79f2f65d61d19f896609f816a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
* | | Merge tag 'for-linus-5.4-rc3-tag' of ↵Linus Torvalds2019-10-121-10/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - correct panic handling when running as a Xen guest - cleanup the Xen grant driver to remove printing a pointer being always NULL - remove a soon to be wrong call of of_dma_configure() * tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Stop abusing DT of_dma_configure API xen/grant-table: remove unnecessary printing x86/xen: Return from panic notifier
| * | | xen: Stop abusing DT of_dma_configure APIRob Herring2019-10-101-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the removed comments say, these aren't DT based devices. of_dma_configure() is going to stop allowing a NULL DT node and calling it will no longer work. The comment is also now out of date as of commit 9ab91e7c5c51 ("arm64: default to the direct mapping in get_arch_dma_ops"). Direct mapping is now the default rather than dma_dummy_ops. According to Stefano and Oleksandr, the only other part needed is setting the DMA masks and there's no reason to restrict the masks to 32-bits. So set the masks to 64 bits. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: xen-devel@lists.xenproject.org Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
* | | | Merge tag 'drm-intel-fixes-2019-10-10' of ↵Dave Airlie2019-10-1117-82/+188
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix CML display by adding a missing ID. - Drop redundant list_del_init - Only enqueue already completed requests to avoid races - Fixup preempt-to-busy vs reset of a virtual request - Protect peeking at execlists->active - execlists->active is serialised by the tasklet drm-intel-next-fixes-2019-09-19: - Extend old HSW workaround to fix some GPU hangs on Haswell GT2 - Fix return error code on GEM mmap. - White list a chicken bit register for push constants legacy mode on Mesa - Fix resume issue related to GGTT restore - Remove incorrect BUG_ON on execlist's schedule-out - Fix unrecoverable GPU hangs with Vulkan compute workloads on SKL drm-intel-next-fixes-2019-09-26: - Fix concurrence on cases where requests where getting retired at same time as resubmitted to HW - Fix gen9 display resolutions by setting the right max plane width - Fix GPU hang on preemption - Mark contents as dirty on a write fault. This was breaking cursor sprite with dumb buffers. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010143039.GA15313@intel.com
| * | | | drm/i915/gt: execlists->active is serialised by the taskletChris Wilson2019-10-093-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The active/pending execlists is no longer protected by the engine->active.lock, but is serialised by the tasklet instead. Update the locking around the debug and stats to follow suit. v2: local_bh_disable() to prevent recursing into the tasklet in case we trigger a softirq (Tvrtko) Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191009160906.16195-1-chris@chris-wilson.co.uk (cherry picked from commit c36eebd9ba5d70b84e1e7408ccc7632566f285c4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915/execlists: Protect peeking at execlists->activeChris Wilson2019-10-091-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we dropped the engine->active.lock serialisation from around process_csb(), direct submission can run concurrently to the interrupt handler. As such execlists->active may be advanced as we dequeue, dropping the reference to the request. We need to employ our RCU request protection to ensure that the request is not freed too early. Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191009100955.21477-1-chris@chris-wilson.co.uk (cherry picked from commit c949ae431467764277cdd88d7c26ff963a9db40a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Fixup preempt-to-busy vs reset of a virtual requestChris Wilson2019-10-092-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the nature of preempt-to-busy the execlists active tracking and the schedule queue may become temporarily desync'ed (between resubmission to HW and its ack from HW). This means that we may have unwound a request and passed it back to the virtual engine, but it is still inflight on the HW and may even result in a GPU hang. If we detect that GPU hang and try to reset, the hanging request->engine will no longer match the current engine, which means that the request is not on the execlists active list and we should not try to find an older incomplete request. Given that we have deduced this must be a request on a virtual engine, it is the single active request in the context and so must be guilty (as the context is still inflight, it is prevented from being executed on another engine as we process the reset). Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk (cherry picked from commit cb2377a919bbe8107af269c5a31a8d5cfb27d867) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Only enqueue already completed requestsChris Wilson2019-10-093-38/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we are asked to submit a completed request, just move it onto the active-list without modifying it's payload. If we try to emit the modified payload of a completed request, we risk racing with the ring->head update during retirement which may advance the head past our breadcrumb and so we generate a warning for the emission being behind the RING_HEAD. v2: Commentary for the sneaky, shared responsibility between functions. v3: Spelling mistakes and bonus assertion Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk (cherry picked from commit c0bb487dc19fc45dbeede7dcf8f513df51a3cd33) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915/execlists: Drop redundant list_del_init(&rq->sched.link)Chris Wilson2019-10-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since amalgamating the queued and active lists in commit 422d7df4f090 ("drm/i915: Replace engine->timeline with a plain list"), performing a i915_request_submit() will remove the request from the execlists priority queue. References: 422d7df4f090 ("drm/i915: Replace engine->timeline with a plain list") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-2-chris@chris-wilson.co.uk (cherry picked from commit 3231f8c01121ee1febfd82398ee22f7ff9dc5d76) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915/cml: Add second PCH ID for CMPMatt Roper2019-10-092-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CMP PCH ID we have in the driver is correct for the CML-U machines we have in our CI system, but the CML-S and CML-H CI machines appear to use a different PCH ID, leading our driver to detect no PCH for them. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> References: 729ae330a0f2e2 ("drm/i915/cml: Introduce Comet Lake PCH") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111461 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190916233251.387-1-matthew.d.roper@intel.com Fixes: 729ae330a0f2e2 ("drm/i915/cml: Introduce Comet Lake PCH") (cherry picked from commit 8698ba53cd7173c32320ebbef4d389d41ebb5780) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Mark contents as dirty on a write faultChris Wilson2019-10-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since dropping the set-to-gtt-domain in commit a679f58d0510 ("drm/i915: Flush pages on acquisition"), we no longer mark the contents as dirty on a write fault. This has the issue of us then not marking the pages as dirty on releasing the buffer, which means the contents are not written out to the swap device (should we ever pick that buffer as a victim). Notably, this is visible in the dumb buffer interface used for cursors. Having updated the cursor contents via mmap, and swapped away, if the shrinker should evict the old cursor, upon next reuse, the cursor would be invisible. E.g. echo 80 > /proc/sys/kernel/sysrq ; echo f > /proc/sysrq-trigger Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111541 Fixes: a679f58d0510 ("drm/i915: Flush pages on acquisition") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190920121821.7223-1-chris@chris-wilson.co.uk (cherry picked from commit 5028851cdfdf78dc22eacbc44a0ab0b3f599ee4a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Prevent bonded requests from overtaking each other on preemptionChris Wilson2019-10-071-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Force bonded requests to run on distinct engines so that they cannot be shuffled onto the same engine where timeslicing will reverse the order. A bonded request will often wait on a semaphore signaled by its master, creating an implicit dependency -- if we ignore that implicit dependency and allow the bonded request to run on the same engine and before its master, we will cause a GPU hang. [Whether it will hang the GPU is debatable, we should keep on timeslicing and each timeslice should be "accidentally" counted as forward progress, in which case it should run but at one-half to one-third speed.] We can prevent this inversion by restricting which engines we allow ourselves to jump to upon preemption, i.e. baking in the arrangement established at first execution. (We should also consider capturing the implicit dependency using i915_sched_add_dependency(), but first we need to think about the constraints that requires on the execution/retirement ordering.) Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing") References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding") Testcase: igt/gem_exec_balancer/bonded-slice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk (cherry picked from commit e2144503bf3b22275dd33cef2880e1cb5fb200c5) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Bump skl+ max plane width to 5k for linear/x-tiledVille Syrjälä2019-10-071-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The officially validated plane width limit is 4k on skl+, however we already had people using 5k displays before we started to enforce the limit. Also it seems Windows allows 5k resolutions as well (though not sure if they do it with one plane or two). According to hw folks 5k should work with the possible exception of the following features: - Ytile (already limited to 4k) - FP16 (already limited to 4k) - render compression (already limited to 4k) - KVMR sprite and cursor (don't care) - horizontal panning (need to verify this) - pipe and plane scaling (need to verify this) So apart from last two items on that list we are already fine. We should really verify what happens with those last two items but I don't have a 5k display on hand atm so it'll have to wait. In the meantime let's just bump the limit back up to 5k since several users have already been using it without apparent issues. At least we'll be no worse off than we were prior to lowering the limits. Cc: stable@vger.kernel.org Cc: Sean Paul <sean@poorly.run> Cc: José Roberto de Souza <jose.souza@intel.com> Tested-by: Leho Kraav <leho@kraav.com> Fixes: 372b9ffb5799 ("drm/i915: Fix skl+ max plane width") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111501 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190905135044.2001-1-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Sean Paul <sean@poorly.run> (cherry picked from commit bed34ef544f9ab37ab349c04cf4142282c4dcf5d) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Verify the engine after acquiring the active.lockChris Wilson2019-10-071-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using virtual engines, the rq->engine is not stable until we hold the engine->active.lock (as the virtual engine may be exchanged with the sibling). Since commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") we may retire a request concurrently with resubmitting it to HW, we need to be extra careful to verify we are holding the correct lock for the request's active list. This is similar to the issue we saw with rescheduling the virtual requests, see sched_lock_engine(). Or else: <4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610). <4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915] <4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G U 5.3.0-CI-CI_DRM_6898+ #1 <4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019 <4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70 <4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8 <4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082 <4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104 <4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001 <4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10 <4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8 <4> [876.736168] FS: 0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000 <4> [876.736169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0 <4> [876.736171] PKRU: 55555554 <4> [876.736172] Call Trace: <4> [876.736226] __i915_request_submit+0x152/0x370 [i915] <4> [876.736263] __execlists_submission_tasklet+0x6da/0x1f50 [i915] <4> [876.736293] ? execlists_submission_tasklet+0x29/0x50 [i915] <4> [876.736321] execlists_submission_tasklet+0x34/0x50 [i915] <4> [876.736325] tasklet_action_common.isra.5+0x47/0xb0 <4> [876.736328] __do_softirq+0xd8/0x4ae <4> [876.736332] ? smpboot_thread_fn+0x23/0x280 <4> [876.736334] ? smpboot_thread_fn+0x6b/0x280 <4> [876.736336] run_ksoftirqd+0x2b/0x50 <4> [876.736338] smpboot_thread_fn+0x1d3/0x280 <4> [876.736341] ? sort_range+0x20/0x20 <4> [876.736343] kthread+0x119/0x130 <4> [876.736345] ? kthread_park+0xa0/0xa0 <4> [876.736347] ret_from_fork+0x24/0x50 <4> [876.736353] irq event stamp: 2290145 <4> [876.736356] hardirqs last enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500 <4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50 <4> [876.736360] softirqs last enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae <4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50 <4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]--- <4> [876.736406] ------------[ cut here ]------------ <4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730 <4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90 <4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915] <4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G U W 5.3.0-CI-CI_DRM_6898+ #1 <4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019 <4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90 <4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3 <4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086 <4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002 <4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001 <4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8 <4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001 <4> [876.736447] FS: 00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000 <4> [876.736448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0 <4> [876.736450] PKRU: 55555554 <4> [876.736451] Call Trace: <4> [876.736488] i915_request_retire+0x224/0x8e0 [i915] <4> [876.736521] i915_request_create+0x4b/0x1b0 [i915] <4> [876.736550] nop_virtual_engine+0x230/0x4d0 [i915] Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190918145453.8800-1-chris@chris-wilson.co.uk (cherry picked from commit 37fa0de3c137d5f54f7e64f53495c9d501d42a4d) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Extend Haswell GT1 PSMI workaround to allChris Wilson2019-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A few times in CI, we have detected a GPU hang on our Haswell GT2 systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a was first introducted, it was applied to all Haswell, but later on we found an erratum that supposedly restricted the issue to GT1 and so constrained it only be applied on GT1. That may have been a mistake... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111692 Fixes: 167bc759e823 ("drm/i915: Restrict PSMI context load w/a to Haswell GT1") References: 2c550183476d ("drm/i915: Disable PSMI sleep messages on all rings around context switches") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190917194746.26710-1-chris@chris-wilson.co.uk (cherry picked from commit 56c05de6bd773b96deca379370965c49042b5fbf) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Don't mix srcu tag and negative error codesChris Wilson2019-10-073-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While srcu may use an integer tag, it does not exclude potential error codes and so may overlap with our own use of -EINTR. Use a separate outparam to store the tag, and report the error code separately. Fixes: 2caffbf11762 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190912160834.30601-1-chris@chris-wilson.co.uk (cherry picked from commit eebab60f224fcfd560957715d08c31564d8672ed) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Whitelist COMMON_SLICE_CHICKEN2Kenneth Graunke2019-10-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows userspace to use "legacy" mode for push constants, where they are committed at 3DPRIMITIVE or flush time, rather than being committed at 3DSTATE_BINDING_TABLE_POINTERS_XS time. Gen6-8 and Gen11 both use the "legacy" behavior - only Gen9 works in the "new" way. Conflating push constants with binding tables is painful for userspace, we would like to be able to avoid doing so. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190911014801.26821-1-kenneth@whitecape.org (cherry picked from commit 0606259e3b3a1220a0f04a92a1654a3f674f47ee) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915: Perform GGTT restore much earlier during resumeChris Wilson2019-10-073-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As soon as we re-enable the various functions within the HW, they may go off and read data via a GGTT offset. Hence, if we have not yet restored the GGTT PTE before then, they may read and even *write* random locations in memory. Detected by DMAR faults during resume. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Martin Peres <martin.peres@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190909110011.8958-4-chris@chris-wilson.co.uk (cherry picked from commit cec5ca08e36fd18d2939b98055346b3b06f56c6c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
| * | | | drm/i915/execlists: Remove incorrect BUG_ON for schedule-outChris Wilson2019-10-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we may unwind incomplete requests (for preemption) prior to processing the CSB and the schedule-out events, we may update rq->engine (resetting it to point back to the parent virtual engine) prior to calling execlists_schedule_out(), invalidating the assertion that the request still points to the inflight engine. (The likelihood of this is increased if the CSB interrupt processing is pushed to the ksoftirqd for being too slow and direct submission overtakes it.) Tvrtko summarised it as: "So unwind from direct submission resets rq->engine and races with process_csb from the tasklet which notices request has actually completed." Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190907105046.19934-1-chris@chris-wilson.co.uk (cherry picked from commit d810583fc2fcf139cc766eb2303500b2d9cf064d) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
* | | | | Merge tag 'drm-fixes-5.4-2019-10-09' of ↵Dave Airlie2019-10-111-7/+7
|\ \ \ \ \ | | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.4-2019-10-09: amdgpu: - fix memory leak in bo_list ioctl error path Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010031023.23359-1-alexander.deucher@amd.com
| * | | | drm/amdgpu: fix memory leakNirmoy Das2019-10-091-7/+7
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cleanup error handling code and make sure temporary info array with the handles are freed by amdgpu_bo_list_put() on idr_replace()'s failure. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* | | | Merge tag 'drm-misc-fixes-2019-10-10' of ↵Dave Airlie2019-10-116-7/+39
|\ \ \ \ | |/ / / |/| | / | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull (less than what git shortlog provides): - SPI Aliases fixes for panels - One fix for the tc358767 bridge dealing with visual artifacts Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191010105137.j6juxht5dsobgxph@gilmour