| Commit message (Collapse) | Author | Files | Lines |
|
The whole approach wasn't thought through till the end.
We already had a reset lock like this in the past and it caused the same problems like this one.
Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.
This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Support Sienna Cichlid mgpu fan boost enablement.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Support Navi1X mgpu fan boost enablement.
V2: rich the comment and correct the revision id check
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable mgpu fan boost feature on swSMU routines.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Cover the implementation details from outside(of power). Also preparing
for expanding this to swSMU.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
when function arcturus_get_smu_metrics_data() call failed,
it will cause the variable "throttler_status" isn't initialized before use.
warning:
powerplay/arcturus_ppt.c:2268:24: warning: ‘throttler_status’ may be used uninitialized in this function [-Wmaybe-uninitialized]
2268 | if (throttler_status & logging_label[throttler_idx].feature_mask) {
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
gfxoff is temporarily disabled for navy_flounder,
since at present the feature has broken some basic
amdgpu test.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
To fit the latest SMU firmware.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Instead of having one copy in each ASIC.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
To make the setting same as Arcturus/Navi1x/Sienna_Cichlid.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add more function pointers to amdgpu_mmhub_funcs. ASIC specific
implementation of most mmhub functions are called from a general
function pointer, instead of calling different function for
different ASIC. Simplify the code by deleting duplicate functions
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes: c030f2e4166c3f55 ("drm/amdgpu: add amdgpu_ras.c to support ras (v2)")
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[ 584.110304] ============================================
[ 584.110590] WARNING: possible recursive locking detected
[ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE
[ 584.111164] --------------------------------------------
[ 584.111456] kworker/38:1/553 is trying to acquire lock:
[ 584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.112112]
but task is already holding lock:
[ 584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.113068]
other info that might help us debug this:
[ 584.113689] Possible unsafe locking scenario:
[ 584.114350] CPU0
[ 584.114685] ----
[ 584.115014] lock(&adev->reset_sem);
[ 584.115349] lock(&adev->reset_sem);
[ 584.115678]
*** DEADLOCK ***
[ 584.116624] May be due to missing lock nesting notation
[ 584.117284] 4 locks held by kworker/38:1/553:
[ 584.117616] #0: ffff9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630
[ 584.117967] #1: ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630
[ 584.118358] #2: ffffffffc1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu]
[ 584.118786] #3: ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.119222]
stack backtrace:
[ 584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G OE 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
[ 584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[ 584.121638] Call Trace:
[ 584.122050] dump_stack+0x98/0xd5
[ 584.122499] __lock_acquire+0x1139/0x16e0
[ 584.122931] ? trace_hardirqs_on+0x3b/0xf0
[ 584.123358] ? cancel_delayed_work+0xa6/0xc0
[ 584.123771] lock_acquire+0xb8/0x1c0
[ 584.124197] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.124599] down_write+0x49/0x120
[ 584.125032] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.125472] amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[ 584.125910] ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
[ 584.126367] amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
[ 584.126789] process_one_work+0x29e/0x630
[ 584.127208] worker_thread+0x3c/0x3f0
[ 584.127621] ? __kthread_parkme+0x61/0x90
[ 584.128014] kthread+0x12f/0x150
[ 584.128402] ? process_one_work+0x630/0x630
[ 584.128790] ? kthread_park+0x90/0x90
[ 584.129174] ret_from_fork+0x3a/0x50
Each adev has owned lock_class_key to avoid false positive
recursive locking.
v2:
1. register adev->lock_key into lockdep, otherwise lockdep will
report the below warning
[ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
[ 1216.705924] ------------[ cut here ]------------
[ 1216.705972] DEBUG_LOCKS_WARN_ON(1)
[ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210
v3:
change to use down_write_nest_lock to annotate the false dead-lock
warning.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
After amdgpu driver loading successfully, we can use
RAP debugfs interface <debugfs_dir>/dri/xxx/rap_test
to trigger RAP test.
Currently only L0 validate test is supported.
v2: refine amdgpu_rap.h
Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Guchun Chen <Guchun.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable the RAP TA loading path and add RAP test
trigger interface.
v2: fix potential mem leak issue
Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Guchun Chen <Guchun.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|