diff options
author | Christophe Lombard <clombard@linux.vnet.ibm.com> | 2016-04-22 15:39:22 +0200 |
---|---|---|
committer | Michael Ellerman <mpe@ellerman.id.au> | 2016-05-11 13:54:11 +0200 |
commit | 266eab8f32cc43b688c2e9aaab63c2565a3998c2 (patch) | |
tree | 3170b67c1bdd3484fa934695beb50fcea4dc3921 /drivers/misc/cxl/cxl.h | |
parent | cxl: Add kernel API to allow a context to operate with relocate disabled (diff) | |
download | linux-266eab8f32cc43b688c2e9aaab63c2565a3998c2.tar.xz linux-266eab8f32cc43b688c2e9aaab63c2565a3998c2.zip |
cxl: Check periodically the coherent platform function's state
In the PowerVM environment, the PHYP CoherentAccel component manages
the state of the Coherent Accelerator Processor Interface adapter and
virtualizes CAPI resources, handles CAPP, PSL, PSL Slice errors - and
interrupts - and provides a new set of hcalls for the OS APIs to utilize
Accelerator Function Unit (AFU).
During the course of operation, a coherent platform function can
encounter errors. Some possible reason for errors are:
• Hardware recoverable and unrecoverable errors
• Transient and over-threshold correctable errors
PHYP implements its own state model for the coherent platform function.
The state of the AFU is available through a hcall.
The current implementation of the cxl driver, for the PowerVM
environment, checks this state of the AFU only when an action is
requested - open a device, ioctl command, memory map, attach/detach a
process - from an external driver - cxlflash, libcxl. If an error is
detected the cxl driver handles the error according the content of the
Power Architecture Platform Requirements document.
But in case of low-level troubles (or error injection), the PHYP
component may reset the card and change the AFU state. The PHYP
interface doesn't provide any way to be notified when that happens thus
implies that the cxl driver:
• cannot handle immediatly the state change of the AFU.
• cannot notify other drivers (cxlflash, ...)
The purpose of this patch is to wake up the cpu periodically to check
the current state of each AFU and to see if we need to enter an error
recovery path.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diffstat (limited to 'drivers/misc/cxl/cxl.h')
-rw-r--r-- | drivers/misc/cxl/cxl.h | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index 9dd3a0e7917f..d23a3a59a712 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -366,11 +366,13 @@ struct cxl_afu_native { }; struct cxl_afu_guest { + struct cxl_afu *parent; u64 handle; phys_addr_t p2n_phys; u64 p2n_size; int max_ints; - struct mutex recovery_lock; + bool handle_err; + struct delayed_work work_err; int previous_state; }; |