| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
In order to prepare to per-arch implementations of preempt_count move
the required bits into an asm-generic header and use this for all
archs.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-h5j0c1r3e3fk015m30h8f1zx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to combine the preemption and need_resched test we need to
fold the need_resched information into the preempt_count value.
Since the NEED_RESCHED flag is set across CPUs this needs to be an
atomic operation, however we very much want to avoid making
preempt_count atomic, therefore we keep the existing TIF_NEED_RESCHED
infrastructure in place but at 3 sites test it and fold its value into
preempt_count; namely:
- resched_task() when setting TIF_NEED_RESCHED on the current task
- scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
remote task it follows it up with a reschedule IPI
and we can modify the cpu local preempt_count from
there.
- cpu_idle_loop() for when resched_task() found tsk_is_polling().
We use an inverted bitmask to indicate need_resched so that a 0 means
both need_resched and !atomic.
Also remove the barrier() in preempt_enable() between
preempt_enable_no_resched() and preempt_check_resched() to avoid
having to reload the preemption value and allow the compiler to use
the flags of the previuos decrement. I couldn't come up with any sane
reason for this barrier() to be there as preempt_enable_no_resched()
already has a barrier() before doing the decrement.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-7a7m5qqbn5pmwnd4wko9u6da@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the single preempt_count() 'function' that's an lvalue with
two proper functions:
preempt_count() - returns the preempt_count value as rvalue
preempt_count_set() - Allows setting the preempt-count value
Also provide preempt_count_ptr() as a convenience wrapper to implement
all modifying operations.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-orxrbycjozopqfhb4dxdkdvb@git.kernel.org
[ Fixed build failure. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.
The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).
Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).
Also switch the idle routines to using tif_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: lenb@kernel.org
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/n/tip-nc03imb0etuefmzybzj7sprf@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Preemption semantics are going to change which mandate a change.
All DRM usage sites are already broken and will not be affected (much)
by this change. DRM people are aware and will remove the last few
stragglers.
For now, leave an empty stub that generates a warning, once all users
are gone we can remove this.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: airlied@linux.ie
Cc: daniel.vetter@ffwll.ch
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-qfc1el2zvhxiyut4ai99ij4n@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch builds on patch 2 and periodically decays that max value to
do idle balancing per sched domain by approximately 1% per second. Also
decay the rq's max_idle_balance_cost value.
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379096813-3032-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this patch, we keep track of the max cost we spend doing idle load balancing
for each sched domain. If the avg time the CPU remains idle is less then the
time we have already spent on idle balancing + the max cost of idle balancing
in the sched domain, then we don't continue to attempt the balance. We also
keep a per rq variable, max_idle_balance_cost, which keeps track of the max
time spent on newidle load balances throughout all its domains so that we can
determine the avg_idle's max value.
By using the max, we avoid overrunning the average. This further reduces the
chance we attempt balancing when the CPU is not idle for longer than the cost
to balance.
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379096813-3032-3-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two small fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix UAPI export of PERF_EVENT_IOC_ID
perf/x86/intel: Fix Silvermont offcore masks
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Without the following patch I have problems compiling code using
the new PERF_EVENT_IOC_ID ioctl(). It looks like u64 was used
instead of __u64
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1309171450380.11444@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull KVM fixes from Gleb Natapov.
* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: set "blocked by NMI" flag if EPT violation happens during IRET from NMI
kvm: free resources after canceling async_pf
KVM: nEPT: reset PDPTR register cache on nested vmentry emulation
KVM: mmu: allow page tables to be in read-only slots
KVM: x86 emulator: emulate RETF imm
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Page tables in a read-only memory slot will currently cause a triple
fault because the page walker uses gfn_to_hva and it fails on such a slot.
OVMF uses such a page table; however, real hardware seems to be fine with
that as long as the accessed/dirty bits are set. Save whether the slot
is readonly, and later check it when updating the accessed and dirty bits.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\ \ \
| |/ /
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
"Fixes for CVE-2013-2897, CVE-2013-2895, CVE-2013-2897, CVE-2013-2894,
CVE-2013-2893, CVE-2013-2891, CVE-2013-2890, CVE-2013-2889.
All the bugs are triggerable only by specially crafted evil-on-purpose
HW devices. Fixes by Kees Cook and Benjamin Tissoires"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: lenovo-tpkbd: fix leak if tpkbd_probe_tp fails
HID: multitouch: validate indexes details
HID: logitech-dj: validate output report details
HID: validate feature and input report details
HID: lenovo-tpkbd: validate output report details
HID: LG: validate HID output report details
HID: steelseries: validate output report details
HID: sony: validate HID output report details
HID: zeroplus: validate output report details
HID: provide a helper for validating hid reports
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Many drivers need to validate the characteristics of their HID report
during initialization to avoid misusing the reports. This adds a common
helper to perform validation of the report exisitng, the field existing,
and the expected number of values within the field.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer code update from Thomas Gleixner:
- armada SoC clocksource overhaul with a trivial merge conflict
- Minor improvements to various SoC clocksource drivers
* 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: armada-370-xp: Add detailed clock requirements in devicetree binding
clocksource: armada-370-xp: Get reference fixed-clock by name
clocksource: armada-370-xp: Replace WARN_ON with BUG_ON
clocksource: armada-370-xp: Fix device-tree binding
clocksource: armada-370-xp: Introduce new compatibles
clocksource: armada-370-xp: Use CLOCKSOURCE_OF_DECLARE
clocksource: armada-370-xp: Simplify TIMER_CTRL register access
clocksource: armada-370-xp: Use BIT()
ARM: timer-sp: Set dynamic irq affinity
ARM: nomadik: add dynamic irq flag to the timer
clocksource: sh_cmt: 32-bit control register support
clocksource: em_sti: Convert to devm_* managed helpers
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This is almost cosmetic: we achieve a bit of consistency with
other clocksource drivers by using the CLOCKSOURCE_OF_DECLARE
macro for the boilerplate code.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull misc SCSI driver updates from James Bottomley:
"This patch set is a set of driver updates (megaraid_sas, fnic, lpfc,
ufs, hpsa) we also have a couple of bug fixes (sd out of bounds and
ibmvfc error handling) and the first round of esas2r checker fixes and
finally the much anticipated big endian additions for megaraid_sas"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (47 commits)
[SCSI] fnic: fnic Driver Tuneables Exposed through CLI
[SCSI] fnic: Kernel panic while running sh/nosh with max lun cfg
[SCSI] fnic: Hitting BUG_ON(io_req->abts_done) in fnic_rport_exch_reset
[SCSI] fnic: Remove QUEUE_FULL handling code
[SCSI] fnic: On system with >1.1TB RAM, VIC fails multipath after boot up
[SCSI] fnic: FC stat param seconds_since_last_reset not getting updated
[SCSI] sd: Fix potential out-of-bounds access
[SCSI] lpfc 8.3.42: Update lpfc version to driver version 8.3.42
[SCSI] lpfc 8.3.42: Fixed issue of task management commands having a fixed timeout
[SCSI] lpfc 8.3.42: Fixed inconsistent spin lock usage.
[SCSI] lpfc 8.3.42: Fix driver's abort loop functionality to skip IOs already getting aborted
[SCSI] lpfc 8.3.42: Fixed failure to allocate SCSI buffer on PPC64 platform for SLI4 devices
[SCSI] lpfc 8.3.42: Fix WARN_ON when driver unloads
[SCSI] lpfc 8.3.42: Avoided making pci bar ioremap call during dual-chute WQ/RQ pci bar selection
[SCSI] lpfc 8.3.42: Fixed driver iocbq structure's iocb_flag field running out of space
[SCSI] lpfc 8.3.42: Fix crash on driver load due to cpu affinity logic
[SCSI] lpfc 8.3.42: Fixed logging format of setting driver sysfs attributes hard to interpret
[SCSI] lpfc 8.3.42: Fixed back to back RSCNs discovery failure.
[SCSI] lpfc 8.3.42: Fixed race condition between BSG I/O dispatch and timeout handling
[SCSI] lpfc 8.3.42: Fixed function mode field defined too small for not recognizing dual-chute mode
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds the PCI ID's for HP Smart Array Gen9 controllers. Please
consider this patch for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull SLAB update from Pekka Enberg:
"Nothing terribly exciting here apart from Christoph's kmalloc
unification patches that brings sl[aou]b implementations closer to
each other"
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
slab: Use correct GFP_DMA constant
slub: remove verify_mem_not_deleted()
mm/sl[aou]b: Move kmallocXXX functions to common code
mm, slab_common: add 'unlikely' to size check of kmalloc_slab()
mm/slub.c: beautify code for removing redundancy 'break' statement.
slub: Remove unnecessary page NULL check
slub: don't use cpu partial pages on UP
mm/slub: beautify code for 80 column limitation and tab alignment
mm/slub: remove 'per_cpu' which is useless variable
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
On Thu, 5 Sep 2013, kbuild test robot wrote:
> >> include/linux/slab.h:433:53: sparse: restricted gfp_t degrades to integer
> 429 static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
> 430 {
> 431 #ifndef CONFIG_SLOB
> 432 if (__builtin_constant_p(size) &&
> > 433 size <= KMALLOC_MAX_CACHE_SIZE && !(flags & SLAB_CACHE_DMA)) {
> 434 int i = kmalloc_index(size);
> 435
flags is of type gfp_t and not a slab internal flag. Therefore
use GFP_DMA.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
I do not see any user for this code in the tree.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The kmalloc* functions of all slab allcoators are similar now so
lets move them into slab.h. This requires some function naming changes
in slob.
As a results of this patch there is a common set of functions for
all allocators. Also means that kmalloc_large() is now available
in general to perform large order allocations that go directly
via the page allocator. kmalloc_large() can be substituted if
kmalloc() throws warnings because of too large allocations.
kmalloc_large() has exactly the same semantics as kmalloc but
can only used for allocations > PAGE_SIZE.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input update from Dmitry Torokhov:
"The only change is David Hermann's new EVIOCREVOKE evdev ioctl that
allows safely passing file descriptors to input devices to session
processes and later being able to stop delivery of events through
these fds so that inactive sessions will no longer receive user input
that does not belong to them"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: evdev - add EVIOCREVOKE ioctl
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
If we have multiple sessions on a system, we normally don't want
background sessions to read input events. Otherwise, it could capture
passwords and more entered by the user on the foreground session. This is
a real world problem as the recent XMir development showed:
http://mjg59.dreamwidth.org/27327.html
We currently rely on sessions to release input devices when being
deactivated. This relies on trust across sessions. But that's not given on
usual systems. We therefore need a way to control which processes have
access to input devices.
With VTs the kernel simply routed them through the active /dev/ttyX. This
is not possible with evdev devices, though. Moreover, we want to avoid
routing input-devices through some dispatcher-daemon in userspace (which
would add some latency).
This patch introduces EVIOCREVOKE. If called on an evdev fd, this revokes
device-access irrecoverably for that *single* open-file. Hence, once you
call EVIOCREVOKE on any dup()ed fd, all fds for that open-file will be
rather useless now (but still valid compared to close()!). This allows us
to pass fds directly to session-processes from a trusted source. The
source keeps a dup()ed fd and revokes access once the session-process is
no longer active.
Compared to the EVIOCMUTE proposal, we can avoid the CAP_SYS_ADMIN
restriction now as there is no way to revive the fd again. Hence, a user
is free to call EVIOCREVOKE themself to kill the fd.
Additionally, this ioctl allows multi-layer access-control (again compared
to EVIOCMUTE which was limited to one layer via CAP_SYS_ADMIN). A middle
layer can simply request a new open-file from the layer above and pass it
to the layer below. Now each layer can call EVIOCREVOKE on the fds to
revoke access for all layers below, at the expense of one fd per layer.
There's already ongoing experimental user-space work which demonstrates
how it can be used:
http://lists.freedesktop.org/archives/systemd-devel/2013-August/012897.html
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback fix from Wu Fengguang:
"A trivial writeback fix"
* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Do not sort b_io list only because of block device inode
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
It is very likely that block device inode will be part of BDI dirty list
as well. However it doesn't make sence to sort inodes on the b_io list
just because of this inode (as it contains buffers all over the device
anyway). So save some CPU cycles which is valuable since we hold relatively
contented wb->list_lock.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Pull aio changes from Ben LaHaise:
"First off, sorry for this pull request being late in the merge window.
Al had raised a couple of concerns about 2 items in the series below.
I addressed the first issue (the race introduced by Gu's use of
mm_populate()), but he has not provided any further details on how he
wants to rework the anon_inode.c changes (which were sent out months
ago but have yet to be commented on).
The bulk of the changes have been sitting in the -next tree for a few
months, with all the issues raised being addressed"
* git://git.kvack.org/~bcrl/aio-next: (22 commits)
aio: rcu_read_lock protection for new rcu_dereference calls
aio: fix race in ring buffer page lookup introduced by page migration support
aio: fix rcu sparse warnings introduced by ioctx table lookup patch
aio: remove unnecessary debugging from aio_free_ring()
aio: table lookup: verify ctx pointer
staging/lustre: kiocb->ki_left is removed
aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
aio: convert the ioctx list to table lookup v3
aio: double aio_max_nr in calculations
aio: Kill ki_dtor
aio: Kill ki_users
aio: Kill unneeded kiocb members
aio: Kill aio_rw_vect_retry()
aio: Don't use ctx->tail unnecessarily
aio: io_cancel() no longer returns the io_event
aio: percpu ioctx refcount
aio: percpu reqs_available
aio: reqs_active -> reqs_available
aio: fix build when migration is disabled
...
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
On Wed, Jun 12, 2013 at 11:14:40AM -0700, Kent Overstreet wrote:
> On Mon, Apr 15, 2013 at 02:40:55PM +0300, Octavian Purdila wrote:
> > When using a large number of threads performing AIO operations the
> > IOCTX list may get a significant number of entries which will cause
> > significant overhead. For example, when running this fio script:
> >
> > rw=randrw; size=256k ;directory=/mnt/fio; ioengine=libaio; iodepth=1
> > blocksize=1024; numjobs=512; thread; loops=100
> >
> > on an EXT2 filesystem mounted on top of a ramdisk we can observe up to
> > 30% CPU time spent by lookup_ioctx:
> >
> > 32.51% [guest.kernel] [g] lookup_ioctx
> > 9.19% [guest.kernel] [g] __lock_acquire.isra.28
> > 4.40% [guest.kernel] [g] lock_release
> > 4.19% [guest.kernel] [g] sched_clock_local
> > 3.86% [guest.kernel] [g] local_clock
> > 3.68% [guest.kernel] [g] native_sched_clock
> > 3.08% [guest.kernel] [g] sched_clock_cpu
> > 2.64% [guest.kernel] [g] lock_release_holdtime.part.11
> > 2.60% [guest.kernel] [g] memcpy
> > 2.33% [guest.kernel] [g] lock_acquired
> > 2.25% [guest.kernel] [g] lock_acquire
> > 1.84% [guest.kernel] [g] do_io_submit
> >
> > This patchs converts the ioctx list to a radix tree. For a performance
> > comparison the above FIO script was run on a 2 sockets 8 core
> > machine. This are the results (average and %rsd of 10 runs) for the
> > original list based implementation and for the radix tree based
> > implementation:
> >
> > cores 1 2 4 8 16 32
> > list 109376 ms 69119 ms 35682 ms 22671 ms 19724 ms 16408 ms
> > %rsd 0.69% 1.15% 1.17% 1.21% 1.71% 1.43%
> > radix 73651 ms 41748 ms 23028 ms 16766 ms 15232 ms 13787 ms
> > %rsd 1.19% 0.98% 0.69% 1.13% 0.72% 0.75%
> > % of radix
> > relative 66.12% 65.59% 66.63% 72.31% 77.26% 83.66%
> > to list
> >
> > To consider the impact of the patch on the typical case of having
> > only one ctx per process the following FIO script was run:
> >
> > rw=randrw; size=100m ;directory=/mnt/fio; ioengine=libaio; iodepth=1
> > blocksize=1024; numjobs=1; thread; loops=100
> >
> > on the same system and the results are the following:
> >
> > list 58892 ms
> > %rsd 0.91%
> > radix 59404 ms
> > %rsd 0.81%
> > % of radix
> > relative 100.87%
> > to list
>
> So, I was just doing some benchmarking/profiling to get ready to send
> out the aio patches I've got for 3.11 - and it looks like your patch is
> causing a ~1.5% throughput regression in my testing :/
... <snip>
I've got an alternate approach for fixing this wart in lookup_ioctx()...
Instead of using an rbtree, just use the reserved id in the ring buffer
header to index an array pointing the ioctx. It's not finished yet, and
it needs to be tidied up, but is most of the way there.
-ben
--
"Thought is the essence of where you are now."
--
kmo> And, a rework of Ben's code, but this was entirely his idea
kmo> -Kent
bcrl> And fix the code to use the right mm_struct in kill_ioctx(), actually
free memory.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
sock_aio_dtor() is dead code - and stuff that does need to do cleanup
can simply do it before calling aio_complete().
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
The kiocb refcount is only needed for cancellation - to ensure a kiocb
isn't freed while a ki_cancel callback is running. But if we restrict
ki_cancel callbacks to not block (which they currently don't), we can
simply drop the refcount.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
The old aio retry infrastucture needed to save the various arguments to
to aio operations. But with the retry infrastructure gone, we can trim
struct kiocb quite a bit.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This code doesn't serve any purpose anymore, since the aio retry
infrastructure has been removed.
This change should be safe because aio_read/write are also used for
synchronous IO, and called from do_sync_read()/do_sync_write() - and
there's no looping done in the sync case (the read and write syscalls).
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Originally, io_event() was documented to return the io_event if
cancellation succeeded - the io_event wouldn't be delivered via the ring
buffer like it normally would.
But this isn't what the implementation was actually doing; the only
driver implementing cancellation, the usb gadget code, never returned an
io_event in its cancel function. And aio_complete() was recently changed
to no longer suppress event delivery if the kiocb had been cancelled.
This gets rid of the unused io_event argument to kiocb_cancel() and
kiocb->ki_cancel(), and changes io_cancel() to return -EINPROGRESS if
kiocb->ki_cancel() returned success.
Also tweak the refcounting in kiocb_cancel() to make more sense.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
As the aio job will pin the ring pages, that will lead to mem migrated
failed. In order to fix this problem we use an anon inode to manage the aio ring
pages, and setup the migratepage callback in the anon inode's address space, so
that when mem migrating the aio ring pages will be moved to other mem node safely.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
| | |_|/ / / / /
| |/| | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Introduce a new lib function anon_inode_getfile_private(), it creates a new file
instance by hooking it up to an anonymous inode, and a dentry that describe the
"class" of the file, similar to anon_inode_getfile(), but each file holds a
single inode. Furthermore, anyone who wants to create a private anon file will
benefit from this change.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"Lots of activity again this round for I/O performance optimizations
(per-cpu IDA pre-allocation for vhost + iscsi/target), and the
addition of new fabric independent features to target-core
(COMPARE_AND_WRITE + EXTENDED_COPY).
The main highlights include:
- Support for iscsi-target login multiplexing across individual
network portals
- Generic Per-cpu IDA logic (kent + akpm + clameter)
- Conversion of vhost to use per-cpu IDA pre-allocation for
descriptors, SGLs and userspace page pointer list
- Conversion of iscsi-target + iser-target to use per-cpu IDA
pre-allocation for descriptors
- Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
emulation for virtual backend drivers
- Add support for generic EXTENDED_COPY (CopyOffload) emulation for
virtual backend drivers.
- Add support for fast memory registration mode to iser-target (Vu)
The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
particular significance, which make us the first and only open source
target to support the full set of VAAI primitives.
Currently Linux clients are lacking upstream support to actually
utilize these primitives. However, with server side support now in
place for folks like MKP + ZAB working on the client, this logic once
reserved for the highest end of storage arrays, can now be run in VMs
on their laptops"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
target/iscsi: Bump versions to v4.1.0
target: Update copyright ownership/year information to 2013
iscsi-target: Bump default TCP listen backlog to 256
target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
iscsi-target; Bump default CmdSN Depth to 64
iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
target: remove unused including <linux/version.h>
iser-target: introduce fast memory registration mode (FRWR)
iser-target: generalize rdma memory registration and cleanup
iser-target: move rdma wr processing to a shared function
target: Enable global EXTENDED_COPY setup/release
target: Add Third Party Copy (3PC) bit in INQUIRY response
target: Enable EXTENDED_COPY setup in spc_parse_cdb
target: Add support for EXTENDED_COPY copy offload emulation
target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
target: Add global device list for EXTENDED_COPY
target: Make helpers non static for EXTENDED_COPY command setup
target: Make spc_parse_naa_6h_vendor_specific non static
...
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This patch adds the Third Party Copy (3PC) bit to signal support
for EXTENDED_COPY within standard inquiry response data.
Also add emulate_3pc device attribute in configfs (enabled by default)
to allow the exposure of this bit to be disabled, if necessary.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This patch adds support for EXTENDED_COPY emulation from SPC-3, that
enables full copy offload target support within both a single virtual
backend device, and across multiple virtual backend devices. It also
functions independent of target fabric, and supports copy offload
across multiple target fabric ports.
This implemenation supports both EXTENDED_COPY PUSH and PULL models
of operation, so the actual CDB may be received on either source or
desination logical unit.
For Target Descriptors, it currently supports the NAA IEEE Registered
Extended designator (type 0xe4), which allows the reference of target
ports to occur independent of fabric type using EVPD 0x83 WWNs.
For Segment Descriptors, it currently supports copy from block to
block (0x02) mode.
It also honors any present SCSI reservations of the destination target
port. Note that only Supports No List Identifier (SNLID=1) mode is
supported.
Also included is basic RECEIVE_COPY_RESULTS with service action type
OPERATING PARAMETERS (0x03) required for SNLID=1 operation.
v3 changes:
- Fix incorrect return type in target_do_receive_copy_results()
(Fengguang)
v2 changes:
- Use target_alloc_sgl() instead of transport_generic_get_mem()
- Convert debug output to use pr_debug()
- Convert target_xcopy_parse_target_descriptors() NAA IEEN WWN
dump to use 0x%16phN format specification
- Drop unnecessary xcopy_pt_cmd->xpt_passthrough_wsem, and
associated usage in xcopy_pt_write_pending() and
target_xcopy_issue_pt_cmd()
- Add check for unsupported EXTENDED_COPY(LID4) service action
bits in target_do_xcopy()
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
EXTENDED_COPY needs to be able to search a global list of devices
based on NAA WWN device identifiers, so add a simple g_device_list
protected by g_device_mutex.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Both target_alloc_sgl() and transport_generic_map_mem_to_cmd() are
required by EXTENDED_COPY logic when setting up internally dispatched
command descriptors, so go ahead and make both of these non static.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
target_core_fabric.h
Reversing the dma_data_direction for pci_map_sg() friends is useful
for other drivers, so move it from tcm_qla2xxx into inline code
within target_core_fabric.h.
Also drop internal usage of equivlient in tcm_qla2xxx fabric code.
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This patch adds support for COMPARE_AND_WRITE emulation on a per block
basis. This logic is used as an atomic test and set primative currently
used by VMWare ESX VAAI for performing array side locking of individual
VMFS extent ownership.
This includes the COMPARE_AND_WRITE CDB parsing within sbc_parse_cdb(),
and does the majority of the work within the compare_and_write_callback()
to perform the verify instance user data comparision, and subsequent
write instance user data I/O submission upon a successfull comparision.
The synchronization is enforced by se_device->caw_sem, that is obtained
before the initial READ I/O submission in sbc_compare_and_write(). The
mutex is then released upon MISCOMPARE in compare_and_write_callback(),
or upon WRITE instance user-data completion in compare_and_write_post().
The implementation currently assumes a single logical block (NoLB=1).
v4 changes:
- Explicitly clear cmd->transport_complete_callback for two failure
cases in sbc_compare_and_write() in order to avoid double unlock
of ->caw_sem in compare_and_write_callback() (Dan Carpenter)
v3 changes:
- Convert se_device->caw_mutex to ->caw_sem
v2 changes:
- Set SCF_COMPARE_AND_WRITE and cmd->execute_cmd() to
sbc_compare_and_write() during setup in sbc_parse_cdb()
- Use sbc_compare_and_write() for initial READ submission with
DMA_FROM_DEVICE
- Reset cmd->execute_cmd() to sbc_execute_rw() for write instance
user-data in compare_and_write_callback()
- Drop SCF_BIDI command flag usage
- Set TRANSPORT_PROCESSING + transport_state flags before write
instance submission, and convert to __target_execute_cmd()
- Prevent sbc_get_size() from being being called twice to
generate incorrect size in sbc_parse_cdb()
- Enforce se_device->caw_mutex synchronization between initial
READ I/O submission, and final WRITE I/O completion.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This patch adds the MAXIMUM COMPARE AND WRITE LENGTH bit, currently
hardcoded to a single logical block (NoLB=1) within the Block Limits
VPD in spc_emulate_evpd_b0().
Also add emulate_caw device attribute in configfs (enabled by default)
to allow the exposure of this bit to be disabled, if necessary.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Required by COMPARE_AND_WRITE for write instance user-data
submission, in order to bypass target_execute_cmd() checks.
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
After COMPARE_AND_WRITE completes it's comparision, the WRITE
payload SGLs head expect to be updated to point from the verify
instance of user data, to the write instance of user data.
So for this special case, add transport_reset_sgl_orig() usage
within transport_free_pages() and add se_cmd->t_data_[sg,nents]_orig
members to save the original assignments.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
COMPARE_AND_WRITE expects to be able to send down a DMA_FROM_DEVICE
to obtain the necessary READ payload for comparision against the
first half of the WRITE payload containing the verify user data.
Currently virtual backends expect to internally reference SGLs,
SGL nents, and data_direction, so change IBLOCK, FILEIO and RD
sbc_ops->execute_rw() to accept this values as function parameters.
Also add default sbc_execute_rw() handler for the typical case for
cmd->execute_rw() submission using cmd->t_data_sg, cmd->t_data_nents,
and cmd->data_direction).
v2 Changes:
- Add SCF_COMPARE_AND_WRITE command flag
- Use sbc_execute_rw() for normal cmd->execute_rw() submission
with expected se_cmd members.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This patch adds TCM_MISCOMPARE_VERIFY (ASC=0x1d, ASCQ=0x00) sense
handling to transport_send_check_condition_and_sense(), which is
required for a COMPARE_AND_WRITE comparision failure.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This patch adds a sense_reason_t return to ->transport_complete_callback(),
and updates target_complete_ok_work() to invoke the call if necessary to
transport_send_check_condition_and_sense() during the failure case.
Also update xdreadwrite_callback() to use this return value.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
|