summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/intel_ringbuffer.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* drm/i915/execlists: Delay writing to ELSP until HW has processed the ↵Michel Thierry2017-11-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | previous write The hardware needs some time to process the information received in the ExecList Submission Port, and expects us to not write anything more until it has 'acknowledged' this new submission by sending an IDLE_ACTIVE or PREEMPTED CSB event. If we do not follow this, the driver could write new data into the ELSP before HW had finishing fetching the previous one, putting us in 'undefined behaviour' space. This seems to be the problem causing the spurious PREEMPTED & COMPLETE events after a COMPLETE like the one below: [] vcs0: sw rd pointer = 2, hw wr pointer = 0, current 'head' = 3. [] vcs0: Execlist CSB[0]: 0x00000018 _ 0x00000007 [] vcs0: Execlist CSB[1]: 0x00000001 _ 0x00000000 [] vcs0: Execlist CSB[2]: 0x00000018 _ 0x00000007 <<< COMPLETE [] vcs0: Execlist CSB[3]: 0x00000012 _ 0x00000007 <<< PREEMPTED & COMPLETE [] vcs0: Execlist CSB[4]: 0x00008002 _ 0x00000006 [] vcs0: Execlist CSB[5]: 0x00000014 _ 0x00000006 The ELSP writes that lead to this CSB sequence show that the HW hadn't started executing the previous execlist (the one with only ctx 0x6) by the time the new one was submitted; this is a bit more clear in the data show in the EXECLIST_STATUS register at the time of the ELSP write. [] vcs0: ELSP[0] = 0x0_0 [execlist1] - status_reg = 0x0_302 [] vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302 [] vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308 [] vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308 Note that having to wait for this ack does not disable lite-restores, although it may reduce their numbers. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102035 Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/<20171118003038.7935-1-michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-4-chris@chris-wilson.co.uk Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* drm/i915: Update execlists tasklet namingSagar Arun Kamble2017-11-161-2/+2
| | | | | | | | | | | | | | | | | | | | intel_lrc_irq_handler and i915_guc_irq_handler are HW submission related tasklet functions. Name them with "submission_tasklet" suffix and remove intel/i915 prefix as they are static. Also rename irq_tasklet as just tasklet for clarity. v2: s/_bh/_tasklet (Chris) Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-2-git-send-email-sagar.a.kamble@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* drm/i915: Make request's wait-for-space explicitChris Wilson2017-11-151-0/+1
| | | | | | | | | | | | | | | At the start of building a request, we would wait for roughly enough space to fit the average request (to reduce the likelihood of having to wait and abort partway through request construction). To achieve we would try to begin a 0-length command packet, this just adds extra confusion so make the wait-for-space explicit, as in the next patch we want to move it from the backend to the i915_gem_request_alloc() so it can ensure that the wait-for-space is the first operation in building a new request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171115151204.8105-2-chris@chris-wilson.co.uk
* drm/i915: Stop caching the "golden" renderstateChris Wilson2017-11-101-2/+0
| | | | | | | | | | As we now record the default HW state and so only emit the "golden" renderstate once to prepare the HW, there is no advantage in keeping the renderstate batch around as it will never be used again. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-8-chris@chris-wilson.co.uk
* drm/i915: Record the default hw state after reset upon loadChris Wilson2017-11-101-0/+2
| | | | | | | | | | | | | | | | | | | Take a copy of the HW state after a reset upon module loading by executing a context switch from a blank context to the kernel context, thus saving the default hw state over the blank context image. We can then use the default hw state to initialise any future context, ensuring that each starts with the default view of hw state. v2: Unmap our default state from the GTT after stealing it from the context. This should stop us from accidentally overwriting it via the GTT (and frees up some precious GTT space). Testcase: igt/gem_ctx_isolation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk
* drm/i915: Define an engine class enum for the uABITvrtko Ursulin2017-11-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | We want to be able to report back to userspace details about an engine's class, and in return for userspace to be able to request actions regarding certain classes of engines. To isolate the uABI from any variations between hw generations, we define an abstract class for the engines and internally map onto the hw. v2: Remove MAX from the uABI; keep it internal if we need it, but don't let userspace make the mistake of using it themselves. v3: s/OTHER/INVALID/ The use of OTHER is ill-defined, so remove it from the uABI as any future new type of engine can define a class to suit it. But keep a reserved value for an invalid class, so that we can always unambiguously express when something doesn't belong to the classification. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v2 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-1-chris@chris-wilson.co.uk
* drm/i915/guc: Preemption! With GuCMichał Winiarski2017-10-261-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pretty similar to what we have on execlists. We're reusing most of the GEM code, however, due to GuC quirks we need a couple of extra bits. Preemption is implemented as GuC action, and actions can be pretty slow. Because of that, we're using a mutex to serialize them. Since we're requesting preemption from the tasklet, the task of creating a workitem and wrapping it in GuC action is delegated to a worker. To distinguish that preemption has finished, we're using additional piece of HWSP, and since we're not getting context switch interrupts, we're also adding a user interrupt. The fact that our special preempt context has completed unfortunately doesn't mean that we're ready to submit new work. We also need to wait for GuC to finish its own processing. v2: Don't compile out the wait for GuC, handle workqueue flush on reset, no need for ordered workqueue, put on a reviewer hat when looking at my own patches (Chris) Move struct work around in intel_guc, move user interruput outside of conditional (Michał) Keep ring around rather than chase though intel_context v3: Extract WA for flushing ggtt writes to a helper (Chris) Keep work_struct in intel_guc rather than engine (Michał) Use ordered workqueue for inject_preempt worker to avoid GuC quirks. v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele) Drop stray newlines, use container_of for intel_guc in worker, check for presence of workqueue when flushing it, rather than enable_guc_submission modparam, reorder preempt postprocessing (Chris) v5: Make wq NULL after destroying it v6: Swap struct guc_preempt_work members (Michał) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com
* drm/i915: Add information needed to track engine preempt stateMichał Winiarski2017-10-261-0/+7
| | | | | | | | | | | | | | | | | | We shouldn't inspect ELSP context status (or any other bits depending on specific submission backend) when using GuC submission. Let's use another piece of HWSP for preempt context, to write its bit of information, meaning that preemption has finished, and hardware is now idle. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-9-michal.winiarski@intel.com
* drm/i915: Extract "emit write" part of emit breadcrumb functionsMichał Winiarski2017-10-261-0/+38
| | | | | | | | | | | | | | | | | Let's separate the "emit" part from touching any internal structures, this way we can have a generic "emit coherent GGTT write" function. We would like to reuse this functionality for emitting HWSP write, to confirm that preempt-to-idle has finished. v2: Reorder args to match emit_pipe_control, s/render/rcs (Chris) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-8-michal.winiarski@intel.com
* drm/i915/guc: Always enable the breadcrumbs irqChris Wilson2017-10-251-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The execlists emulation on top of the GuC (used for scheduling and preemption) depends on the MI_USER_INTERRUPT for its notifications and tasklet action. As we always employ the irq, there is no advantage in ever disabling it while we are using the GuC, so allow us to arm the breadcrumb irq when enabling GuC submission and disarm upon disabling. The impact should be lessened by the delayed irq disabling we do (we only disable after receiving an interrupt for which no one was wanting), but allowing guc to explicitly manage the irq in relation to itself is simpler and prevents an issue with losing an interrupt for preemption as it is not coupled to an active request. Internally, we add a reference counter (breadcrumbs.irq_enabled) as a simple mechanism to allow GuC to keep the breadcrumb irq enabled. To improve upon always enabling the irq while guc is selected, we need to hook into the parking facility of intel_engines so that we only enable the breadcrumbs while the GT is active (one step better would be to individually park/unpark each engine). In effect, this means that we keep the breadcrumb irq always enabled for the entire duration the guc is busy, whereas before we would try to switch it off whenever we idled for more than interrupt with no associated waiters. The difference *should* be negligible in practice! v2: Stop abusing fence signaling (and its auxiliary data structures) to enable the breadcrumbs irqs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com>, Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>, Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-3-chris@chris-wilson.co.uk
* drm/i915: Add a hook for making the engines idle (parking) and unparkingChris Wilson2017-10-251-1/+6
| | | | | | | | | | | | | | In the next patch, we will want to install a callback when the engines (GT as a whole) become idle and similarly when they first become busy. To enable that callback, first rename intel_engines_mark_idle() to intel_engines_park() and provide the companion intel_engines_unpark(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk
* drm/i915: Use same test for eviction and submitting kernel contextChris Wilson2017-10-251-0/+2
| | | | | | | | | | | | | | | | During evict, we wish to idle the GPU if we see that the GGTT is full. However, our test for idle in i915_gem_evict_something() and in i915_gem_switch_to_kernel_context() do not match leading to disappointment - we never believe that we are idle and keep trying to flush the GGTT ad infinitum. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103438 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171024220855.30155-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
* drm/i915: Filter out spurious execlists context-switch interruptsChris Wilson2017-10-241-2/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back in commit a4b2b01523a8 ("drm/i915: Don't mark an execlists context-switch when idle") we noticed the presence of late context-switch interrupts. We were able to filter those out by looking at whether the ELSP remained active, but in commit beecec901790 ("drm/i915/execlists: Preemption!") that became problematic as we now anticipate receiving a context-switch event for preemption while ELSP may be empty. To restore the spurious interrupt suppression, add a counter for the expected number of pending context-switches and skip if we do not need to handle this interrupt to make forward progress. v2: Don't forget to switch on for preempt. v3: Reduce the counter to a on/off boolean tracker. Declare the HW as active when we first submit, and idle after the final completion event (with which we confirm the HW says it is idle), and track each source of activity separately. With a finite number of sources, it should aide us in debugging which gets stuck. Fixes: beecec901790 ("drm/i915/execlists: Preemption!") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-3-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
* drm/i915: Make i915_engine_info pretty printer to standaloneChris Wilson2017-10-091-0/+4
| | | | | | | | | | | | We can use drm_printer to hide the differences between printk and seq_printf, and so make the i915_engine_info pretty printer able to be called from different contexts and not just debugfs. For instance, I want to use the pretty printer to debug kselftests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171009110301.21705-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
* drm/i915/execlists: Preemption!Chris Wilson2017-10-041-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we write to ELSP, it triggers a context preemption at the earliest arbitration point (3DPRIMITIVE, some PIPECONTROLs, a few other operations and the explicit MI_ARB_CHECK). If this is to the same context, it triggers a LITE_RESTORE where the RING_TAIL is merely updated (used currently to chain requests from the same context together, avoiding bubbles). However, if it is to a different context, a full context-switch is performed and it will start to execute the new context saving the image of the old for later execution. Previously we avoided preemption by only submitting a new context when the old was idle. But now we wish embrace it, and if the new request has a higher priority than the currently executing request, we write to the ELSP regardless, thus triggering preemption, but we tell the GPU to switch to our special preemption context (not the target). In the context-switch interrupt handler, we know that the previous contexts have finished execution and so can unwind all the incomplete requests and compute the new highest priority request to execute. It would be feasible to avoid the switch-to-idle intermediate by programming the ELSP with the target context. The difficulty is in tracking which request that should be whilst maintaining the dependency change, the error comes in with coalesced requests. As we only track the most recent request and its priority, we may run into the issue of being tricked in preempting a high priority request that was followed by a low priority request from the same context (e.g. for PI); worse still that earlier request may be our own dependency and the order then broken by preemption. By injecting the switch-to-idle and then recomputing the priority queue, we avoid the issue with tracking in-flight coalesced requests. Having tried the preempt-to-busy approach, and failed to find a way around the coalesced priority issue, Michal's original proposal to inject an idle context (based on handling GuC preemption) succeeds. The current heuristic for deciding when to preempt are only if the new request is of higher priority, and has the privileged priority of greater than 0. Note that the scheduler remains unfair! v2: Disable for gen8 (bdw/bsw) as we need additional w/a for GPGPU. Since, the feature is now conditional and not always available when we have a scheduler, make it known via the HAS_SCHEDULER GETPARAM (now a capability mask). v3: Stylistic tweaks. v4: Appease Joonas with a snippet of kerneldoc, only to fuel to fire of the preempt vs preempting debate. Suggested-by: Michal Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-8-chris@chris-wilson.co.uk
* drm/i915: Make execlist port count variableMika Kuoppala2017-09-251-4/+17
| | | | | | | | | | | | | | As we emulate execlists on top of the GuC workqueue, it is not restricted to just 2 ports and we can increase that number arbitrarily to trade-off queue depth (i.e. scheduling latency) against pipeline bubbles. v2: rebase. better commit msg (Chris) v3: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-5-mika.kuoppala@intel.com
* drm/i915: Add execlist_port_completeMika Kuoppala2017-09-251-1/+13
| | | | | | | | | | | | | When first execlist entry is processed, we move the port (contents). Introduce function for this as execlist and guc use this common operation. v2: rebase. s/GEM_DEBUG_BUG/GEM_BUG (Chris) v3: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-4-mika.kuoppala@intel.com
* drm/i915: Make own struct for execlist itemsMika Kuoppala2017-09-251-21/+79
| | | | | | | | | | | | | | | | | | | | Engine's execlist related items have been increasing to a point where a separate struct is warranted. Carve execlist specific items to a dedicated struct to add clarity. v2: add kerneldoc and fix whitespace (Joonas, Chris) v3: csb_mmio changes, rebase v4: s/\b(el|execlist)\b/execlists/ (Joonas) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (v3) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-1-mika.kuoppala@intel.com
* drm/i915: Cancel all ready but queued requests when wedgingChris Wilson2017-09-181-0/+8
| | | | | | | | | | | | | | | | | | | | When wedging the hw, we want to mark all in-flight requests as -EIO. This is made slightly more complex by execlists who store the ready but not yet submitted-to-hw requests on a private queue (an rbtree priolist). Call into execlists to cancel not only the ELSP tracking for the submitted requests, but also the queue of unsubmitted requests. v2: Move the majority of engine_set_wedged to the backends (both legacy ringbuffer and execlists handling their own lists). Reported-by: Michał Winiarski <michal.winiarski@intel.com> Testcase: igt/gem_eio/in-flight-contexts Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
* drm/i915/execlists: Read the context-status HEAD from the HWSPChris Wilson2017-09-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The engine also provides a mirror of the CSB write pointer in the HWSP, but not of our read pointer. To take advantage of this we need to remember where we read up to on the last interrupt and continue off from there. This poses a problem following a reset, as we don't know where the hw will start writing from, and due to the use of power contexts we cannot perform that query during the reset itself. So we continue the current modus operandi of delaying the first read of the context-status read/write pointers until after the first interrupt. With this we should now have eliminated all uncached mmio reads in handling the context-status interrupt, though we still have the uncached mmio writes for submitting new work, and many uncached mmio reads in the global interrupt handler itself. Still a step in the right direction towards reducing our resubmit latency, although it appears lost in the noise! v2: Cannonlake moved the CSB write index v3: Include the sw/hwsp state in debugfs/i915_engine_info v4: Also revert to using CSB mmio for GVT-g v5: Prevent the compiler reloading tail (Mika) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
* drm/i915/execlists: Read the context-status buffer from the HWSPChris Wilson2017-09-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The engine provides a mirror of the CSB in the HWSP. If we use the cacheable reads from the HWSP, we can shave off a few mmio reads per context-switch interrupt (which are quite frequent!). Just removing a couple of mmio is not enough to actually reduce any latency, but a small reduction in overall cpu usage. Much appreciation for Ben dropping the bombshell that the CSB was in the HWSP and for Michel in digging out the details. v2: Don't be lazy, add the defines for the indices. v3: Include the HWSP in debugfs/i915_engine_info v4: Check for GVT-g, it currently depends on intercepting CSB mmio v5: Fixup GVT-g mmio path v6: Disable HWSP if VT-d is active as the iommu adds unpredictable memory latency. (Mika) v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio read and we want to stop the compiler from issuing a later (v.slow) reload. Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
* drm/i915: Disable MI_STORE_DATA_IMM for i915g/i915gmChris Wilson2017-09-061-11/+1
| | | | | | | | | | | | | | The early gen3 machines (i915g/Grantsdale and i915gm/Alviso) share a lot of characteristics in their MI/GTT blocks with gen2, and in particular can only use physical addresses in MI_STORE_DATA_IMM. This makes it incompatible with our usage, so include those two machines in the blacklist to prevent usage. v2: Make it easy for gcc and rewrite it as a switch to save some space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20170906152859.5304-1-chris@chris-wilson.co.uk
* drm/i915: Don't use MI_STORE_DWORD_IMM on Sandybridge/vcsChris Wilson2017-08-181-0/+12
| | | | | | | | | | | | | MI_STORE_DWORD_IMM just doesn't work on the video decode engine under Sandybridge, so refrain from using it. Then switch the selftests over to using the now common test prior to using MI_STORE_DWORD_IMM. Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+ Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
* drm/i915: Look for active requests earlier in the reset pathMichel Thierry2017-06-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | And store the active request so that we only search for it once. v2: Check for request completion inside _prepare_engine, don't use ECANCELED, remove unnecessary null checks (Chris). v3: Capture active requests during reset_prepare and store it the engine hangcheck obj. v4: Rename commit, change i915_gem_reset_request to just confirm the active_request is still incomplete, instead of engine_stalled (Chris). v5: With style; pass the active request to gem_reset_engine, keep single return in reset_prepare_engine (Chris). v6: Moved before reset-engine code appears (Chris) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5) Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-2-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-3-chris@chris-wilson.co.uk
* drm/i915: Split execlist priority queue into rbtree + linked listChris Wilson2017-05-171-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the requests at the same priority are executed in FIFO order. They do not need to be stored in the rbtree themselves, as they are a simple list within a level. If we move the requests at one priority into a list, we can then reduce the rbtree to the set of priorities. This should keep the height of the rbtree small, as the number of active priorities can not exceed the number of active requests and should be typically only a few. Currently, we have ~2k possible different priority levels, that may increase to allow even more fine grained selection. Allocating those in advance seems a waste (and may be impossible), so we opt for allocating upon first use, and freeing after its requests are depleted. To avoid the possibility of an allocation failure causing us to lose a request, we preallocate the default priority (0) and bump any request to that priority if we fail to allocate it the appropriate plist. Having a request (that is ready to run, so not leading to corruption) execute out-of-order is better than leaking the request (and its dependency tree) entirely. There should be a benefit to reducing execlists_dequeue() to principally using a simple list (and reducing the frequency of both rbtree iteration and balancing on erase) but for typical workloads, request coalescing should be small enough that we don't notice any change. The main gain is from improving PI calls to schedule, and the explicit list within a level should make request unwinding simpler (we just need to insert at the head of the list rather than the tail and not have to make the rbtree search more complicated). v2: Avoid use-after-free when deleting a depleted priolist v3: Michał found the solution to handling the allocation failure gracefully. If we disable all priority scheduling following the allocation failure, those requests will be executed in fifo and we will ensure that this request and its dependencies are in strict fifo (even when it doesn't realise it is only a single list). Normal scheduling is restored once we know the device is idle, until the next failure! Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
* drm/i915/execlists: Pack the count into the low bits of the port.requestChris Wilson2017-05-171-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187) function old new delta execlists_submit_ports 262 471 +209 port_assign.isra - 136 +136 capture 6344 6359 +15 reset_common_ring 438 452 +14 execlists_submit_request 228 238 +10 gen8_init_common_ring 334 341 +7 intel_engine_is_idle 106 105 -1 i915_engine_info 2314 2290 -24 __i915_gem_set_wedged_BKL 485 411 -74 intel_lrc_irq_handler 1789 1604 -185 execlists_update_context 294 - -294 The most important change there is the improve to the intel_lrc_irq_handler and excclist_submit_ports (net improvement since execlists_update_context is now inlined). v2: Use the port_api() for guc as well (even though currently we do not pack any counters in there, yet) and hide all port->request_count inside the helpers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
* drm/i915: Micro-optimise hotpath through intel_ring_begin()Chris Wilson2017-05-041-1/+2
| | | | | | | | | | | | | | Typically, there is space available within the ring and if not we have to wait (by definition a slow path). Rearrange the code to reduce the number of branches and stack size for the hotpath, accomodating a slight growth for the wait. v2: Fix the new assert that packets are not larger than the actual ring. v3: Make the parameters unsigned as well to make usage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-3-chris@chris-wilson.co.uk
* drm/i915: Report the ring->space from intel_ring_update_space()Chris Wilson2017-05-041-1/+1
| | | | | | | | | | Some callers immediately want to know the current ring->space after calling intel_ring_update_space(), which we can freely provide via the return parameter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-2-chris@chris-wilson.co.uk
* drm/i915: Avoid the branch in computing intel_ring_space()Chris Wilson2017-05-041-14/+22
| | | | | | | | | | | | Exploit the power-of-two ring size to compute the space across the wraparound using a mask rather than a if. Convert to unsigned integers so the operation is well defined. References: https://bugs.freedesktop.org/show_bug.cgi?id=99671 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-1-chris@chris-wilson.co.uk
* drm/i915: Use engine->context_pin() to report the intel_ringChris Wilson2017-05-041-2/+2
| | | | | | | | | | | | | | | | | | | Since unifying ringbuffer/execlist submission to use engine->pin_context, we ensure that the intel_ring is available before we start constructing the request. We can therefore move the assignment of the request->ring to the central i915_gem_request_alloc() and not require it in every engine->request_alloc() callback. Another small step towards simplification (of the core, but at a cost of handling error pointers in less important callers of engine->pin_context). v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code generation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
* drm/i915: Sanitize engine context sizesJoonas Lahtinen2017-04-281-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Pre-calculate engine context size based on engine class and device generation and store it in the engine instance. v2: - Squash and get rid of hw_context_size (Chris) v3: - Move after MMIO init for probing on Gen7 and 8 (Chris) - Retained rounding (Tvrtko) v4: - Rebase for deferred legacy context allocation Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: intel-gvt-dev@lists.freedesktop.org Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
* drm/i915: Skip waking the signaler when enabling before request submissionChris Wilson2017-04-261-1/+2
| | | | | | | | | | | | | | If we are enabling the breadcrumbs signaling prior to submitting the request, we know that we cannot have missed the interrupt and can therefore skip immediately waking the signaler to check. This reduces a significant chunk of the __i915_gem_request_submit() overhead for inter-engine synchronisation, for example in gem_exec_whisper. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170426080659.28771-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
* drm/i915: Differentiate between sw write location into ring and last hw readChris Wilson2017-04-251-2/+17
| | | | | | | | | | | | | | | | | | | | We need to keep track of the last location we ask the hw to read up to (RING_TAIL) separately from our last write location into the ring, so that in the event of a GPU reset we do not tell the HW to proceed into a partially written request (which can happen if that request is waiting for an external signal before being executed). v2: Refactor intel_ring_reset() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Testcase: igt/gem_exec_fence/await-hang Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Fixes: d55ac5bf97c6 ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
* drm/i915: Rename intel_engine_cs.exec_id to uabi_idChris Wilson2017-04-111-1/+1
| | | | | | | | | | | | We want to refer to the index of the engine consistently throughout the userspace ABI. We already have such an index through the execbuffer engine specifier, that needs to be able to refer to each engine specifically, so rename it the index to uabi_id to reflect its generality beyond execbuf. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170411124306.15448-1-chris@chris-wilson.co.uk
* drm/i915: Generate the engine name based on the instance numberOscar Mateo2017-04-111-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Not really needed, but makes the next change a little bit more compact. v2: - Use zero-based numbering for engine names: xcs0, xcs1.. xcsN (Tvrtko, Chris) - Make sure the mock engine name is null-terminated (Tvrtko, Chris) v3: Because I'm stupid (Chris) v4: Verify engine name wasn't truncated (Michal) v5: - Kill the warning in mock engine (Chris) Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1491834873-9345-4-git-send-email-oscar.mateo@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* drm/i915: Use the same vfunc for BSD2 ring initOscar Mateo2017-04-111-1/+0
| | | | | | | | | | | | | | | | | | | | | | If we needed to do something different for the init functions, we could always look at the engine instance to make the distinction. But, in any case, the two functions are virtually identical already (please notice that BSD2_RING is only used from gen8 onwards). With this, the init functions depends excusively on the engine class (a fact that we will use soon). v2: Commit message Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1491834873-9345-3-git-send-email-oscar.mateo@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* drm/i915: Classify the engines in class + instanceDaniele Ceraolo Spurio2017-04-111-0/+4
| | | | | | | | | | | | | | | | | | | | In such a way that vcs and vcs2 are just two different instances (0 and 1) of the same engine class (VIDEO_DECODE_CLASS). v2: Align the instance types (Tvrtko) v3: Don't use enums for bspec-defined stuff (Michal) Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1491834873-9345-2-git-send-email-oscar.mateo@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* drm/i915: intel_ring.engine is unusedChris Wilson2017-04-031-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Or rather it is used only by intel_ring_pin() to extract the drm_i915_private which we can easily pass in. As this is a relatively rare operation, save the space in the struct, and as such it is even break even in the extra code for passing around the parameter: add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0) function old new delta intel_init_ring_buffer 906 918 +12 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261794, chg +0.00% v2: Reorder intel_init_ring_buffer to keep the ring setup together: add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6) function old new delta intel_init_ring_buffer 906 912 +6 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261788, chg -0.00% Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk
* drm/i915: Refactor tests for validity of RING_TAILChris Wilson2017-03-271-0/+11
| | | | | | | | | | | | | | Whilst I like having the assertions clearly visible in the code, they are quite repetitious! As we find new limits we want to incorporate into the set of assertions, it make sense to refactor them to a common routine. v2: Add a guc holdout. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
* drm/i915/execlists: Wrap tail pointer after reset tweakingChris Wilson2017-03-271-2/+8
| | | | | | | | | | | | | | | | | | If the request->wa_tail is 0 (because it landed exactly on the end of the ringbuffer), when we reconstruct request->tail following a reset we fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is never able to catch up with RING_TAIL and the GPU spins endlessly. If the ring contains a couple of breadcrumbs, even our hangcheck is unable to catch the busy-looping as the ACTHD and seqno continually advance. v2: Move the wrap into a common intel_ring_wrap(). Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
* drm/i915: Use BIT() for computing the engine's flagChris Wilson2017-03-271-2/+2
| | | | | | | | | Since the engine's flag is just the bit of its id, use BIT(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170324163540.31981-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
* drm/i915: Remove unused intel_flush_status_page()Chris Wilson2017-03-271-8/+0
| | | | | | | | | | | | | intel_flush_status_page() is defunct since commit f8dd2934c4ec ("drm/i915: Remove BXT incoherent seqno write workaround"), time to remove it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170324163540.31981-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
* drm/i915: Fixup intel_write_status_page() for old CPUs without clflushChris Wilson2017-03-271-7/+15
| | | | | | | | | | | | | | | Not all of our target platforms have clflush. For those without, just assume the status page is sufficiently coherent that we do not need our paranoia. Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: 14a6bbf9e535 ("drm/i915: Replace irq_seqno_barrier on hws write with a clflush") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170324163540.31981-1-chris@chris-wilson.co.uk Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
* drm/i915: Remove intel_ring.last_retired_headChris Wilson2017-03-211-10/+0
| | | | | | | | | | Storing the position of the breadcrumb of the last retired request as a separate last_retired_head is superfluous as we always copy that into head prior to recalculation of the intel_ring.space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk
* drm/i915: Move engine->submit_request selection to a vfuncChris Wilson2017-03-161-0/+4
| | | | | | | | | | | | | | | | | It turns out that we may want to restore the original engine->submit_request (and engine->schedule) callbacks from more than just the guc <-> execlists transition. Move this to a vfunc so we can have a common interface. v2: Move initial selection to intel_engines_init_common(), repaint vfunc with engine->set_default_submission (and a similar colour for the helper). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-2-chris@chris-wilson.co.uk
* drm/i915: make context status notifier head be per engineChangbin Du2017-03-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GVTg has introduced the context status notifier to schedule the GVTg workload. At that time, the notifier is bound to GVTg context only, so GVTg is not aware of host workloads. Now we are going to improve GVTg's guest workload scheduler policy, and add Guc emulation support for new Gen graphics. Both these two features require acknowledgment for all contexts running on hardware. (But will not alter host workload.) So here try to make some change. The change is simple: 1. Move the context status notifier head from i915_gem_context to intel_engine_cs. Which means there is a notifier head per engine instead of per context. Execlist driver still call notifier for each context sched-in/out events of current engine. 2. At GVTg side, it binds a notifier_block for each physical engine at GVTg initialization period. Then GVTg can hear all context status events. In this patch, GVTg do nothing for host context event, but later will add a function there. But in any case, the notifier callback is a noop if this is no active vGPU. Since intel_gvt_init() is called at early initialization stage and require the status notifier head has been initiated, I initiate it in intel_engine_setup(). v2: remove a redundant newline. (chris) Fixes: 3c7ba6359d70 ("drm/i915: Introduce execlist context status change notification") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100232 Signed-off-by: Changbin Du <changbin.du@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170313024711.28591-1-changbin.du@intel.com Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* drm/i915: Replace irq_seqno_barrier on hws write with a clflushChris Wilson2017-03-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When manually overwriting the HWS, rather than assume irq_seqno_barrier does the right thing, we can explicitly flush the cacheline instead. This avoids us calling the engine->irq_seqno_barrier() from an illegal context: [ 1472.651797] BUG: scheduling while atomic: migration/0/11/0x00000002 [ 1472.651807] Modules linked in: ctr ccm arc4 snd_hda_codec_hdmi bnep rfcomm iwldvm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel mac80211 snd_hda_codec snd_hda_core snd_pcm dm_multipath snd_hwdep intel_powerclamp coretemp snd_seq_midi crct10dif_pclmul snd_seq_midi_event crc32_pclmul iwlwifi ghash_clmulni_intel btusb snd_rawmidi btrtl aesni_intel btbcm aes_x86_64 crypto_simd btintel cryptd glue_helper bluetooth snd_seq cfg80211 snd_timer snd_seq_device intel_ips binfmt_misc snd mei_me soundcore mei dm_mirror dm_region_hash dm_log i915 intel_gtt i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea prime_numbers e1000e drm ahci libahci [ 1472.651897] CPU: 0 PID: 11 Comm: migration/0 Tainted: G U 4.11.0-rc1+ #203 [ 1472.651899] Hardware name: LENOVO 514328U/514328U, BIOS 6QET44WW (1.14 ) 04/20/2010 [ 1472.651900] Call Trace: [ 1472.651913] dump_stack+0x63/0x90 [ 1472.651922] __schedule_bug+0x5d/0x6b [ 1472.651930] __schedule+0x46a/0x5f0 [ 1472.651934] schedule+0x38/0x90 [ 1472.651938] schedule_hrtimeout_range_clock+0x85/0x110 [ 1472.651945] ? hrtimer_init+0x10/0x10 [ 1472.651949] schedule_hrtimeout_range+0xe/0x10 [ 1472.651952] usleep_range+0x4d/0x60 [ 1472.652037] gen5_seqno_barrier+0x13/0x20 [i915] [ 1472.652101] intel_engine_init_global_seqno+0xd7/0x160 [i915] [ 1472.652160] __i915_gem_set_wedged_BKL+0xa0/0x180 [i915] [ 1472.652166] multi_cpu_stop+0xbb/0xe0 [ 1472.652170] ? cpu_stop_queue_work+0x90/0x90 [ 1472.652174] cpu_stopper_thread+0x82/0x110 [ 1472.652179] smpboot_thread_fn+0x137/0x190 [ 1472.652184] kthread+0xf7/0x130 [ 1472.652187] ? sort_range+0x20/0x20 [ 1472.652191] ? kthread_park+0x90/0x90 [ 1472.652195] ret_from_fork+0x2c/0x40 Testcase: igt/gem_eio #ilk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170314111452.9375-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
* drm/i915: Don't use enums for hardware engine idMichal Wajdeczko2017-03-031-16/+16
| | | | | | | | | | | | | | Generally we are using macros for any hardware identifiers as these may change between Gens. Do the same with hardware engine ids. v2: move hw engine defs to i915_reg.h (Chris) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170301202615.118632-1-michal.wajdeczko@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* drm/i915: Split breadcrumbs spinlock into twoChris Wilson2017-03-031-3/+5
| | | | | | | | | | | | | | As we now take the breadcrumbs spinlock within the interrupt handler, we wish to minimise its hold time. During the interrupt we do not care about the state of the full rbtree, only that of the first element, so we can guard that with a separate lock. v2: Rename first_wait to irq_wait to make it clearer that it is guarded by irq_lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170303190824.1330-1-chris@chris-wilson.co.uk
* drm/i915: Generalise wait for execlists to be idleChris Wilson2017-03-031-0/+1
| | | | | | | | | The code to check for execlists completion is generic, so move it to intel_engine_cs.c, where we can reuse the new intel_engine_is_idle(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>