| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Martin reported that his test system would not boot with
current git, it oopsed with this:
BUG: unable to handle kernel paging request at ffff88046c6c9e80
IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm
rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon
CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246
Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
Workqueue: events_unbound async_run_entry_fn
task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000
RIP: 0010:[<ffffffff812971e0>] [<ffffffff812971e0>]
blk_queue_start_tag+0x90/0x150
RSP: 0018:ffff880273d03a58 EFLAGS: 00010092
RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6
RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d
RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410
R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0
R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e
FS: 0000000000000000(0000) GS:ffff880277b00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0
Stack:
ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000
ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62
ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8
Call Trace:
[<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510
[<ffffffff81293167>] __blk_run_queue+0x37/0x50
[<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130
[<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0
[<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0
[<ffffffff813bba35>] scsi_execute+0xe5/0x180
[<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110
[<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod]
[<ffffffff81160000>] ? __unmap_hugepage_range+0x200/0x2f0
[<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod]
[<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod]
[<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140
[<ffffffff8106a1c5>] process_one_work+0x175/0x410
[<ffffffff8106b373>] worker_thread+0x123/0x400
[<ffffffff8106b250>] ? manage_workers+0x160/0x160
[<ffffffff8107104e>] kthread+0xce/0xf0
[<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0
[<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89
df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89
58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d
RIP [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
RSP <ffff880273d03a58>
CR2: ffff88046c6c9e80
Martin bisected and found this to be the problem patch;
commit 6d113398dcf4dfcd9787a4ead738b186f7b7ff0f
Author: Jan Kara <jack@suse.cz>
Date: Mon Feb 24 16:39:54 2014 +0100
block: Stop abusing rq->csd.list in blk-softirq
and the problem was immediately apparent. The patch states that
it is safe to reuse queuelist at completion time, since it is
no longer used. However, that is not true if a device is using
block enabled tagging. If that is the case, then the queuelist
is reused to keep track of busy tags. If a device also ended
up using softirq completions, we'd reuse ->queuelist for the
IPI handling while block tagging was still using it. Boom.
Fix this by adding a new ipi_list list head, and share the
memory used with the request hash table. The hash table is
never used after the request is moved to the dispatch list,
which happens long before any potential completion of the
request. Add a new request bit for this, so we don't have
cases that check rq->hash while it could potentially have
been reused for the IPI completion.
Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
cmd_flags in struct request is now 64 bits wide but the scsi_execute
functions truncated arguments passed to int leading to errors. Make sure
the flags parameters are u64.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jens Axboe <axboe@fb.com>
CC: Jan Kara <jack@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
| |
We'd occasionally attempt to generate protection information for flushes
and other requests with a zero payload. Make sure we only attempt to
enable integrity for reads and writes.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Commit bf36f9cfa6d3d caused a regression by effectively reverting Nic's
fix from 5837c80e870b that ensures we traverse the full bio_vec list
upon completion.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 4550dd6c6b062 introduced for_each_bvec() which iterates over each
bvec attached to a bio or bip. However, the macro fails to check bi_size
before dereferencing which can lead to crashes while counting/mapping
integrity scatterlist segments.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Metric tons of high speed spew is not helpful when things go pear shaped.
systemd lost its mind, forgot how to stop services it insists on being
sole manager of, massive printk() flood ensued, box eventually died.
[16206.684000] loop: Write error at byte offset 11412291584, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.
[16206.684000] loop: Write error at byte offset 13155434496, length 4096.
[16206.684000] loop: Write error at byte offset 13155438592, length 4096.
[16206.684000] loop: Write error at byte offset 13155442688, length 4096.
[16206.684000] loop: Write error at byte offset 13960736768, length 4096.
[16206.684000] loop: Write error at byte offset 14229172224, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.
[16206.684000] loop: Write error at byte offset 14766043136, length 4096.
[16206.684000] loop: Write error at byte offset 15034478592, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a CPU is unplugged, we move the blk_mq_ctx request entries
to the current queue. The current code forgets to remap the
blk_mq_hw_ctx before marking the software context pending,
which breaks if old-cpu and new-cpu don't map to the same
hardware queue.
Additionally, if we mark entries as pending in the new
hardware queue, then make sure we schedule it for running.
Otherwise request could be sitting there until someone else
queues IO for that hardware queue.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I got a bug report yesterday from Laszlo Ersek <lersek@xxxxxxxxxx>, in
which he states that his kvm instance fails to suspend. He Laszlo
bisected it down to this commit:
commit 1cf7e9c68fe84248174e998922b39e508375e7c1
Author: Jens Axboe <axboe@xxxxxxxxx>
Date: Fri Nov 1 10:52:52 2013 -0600
virtio_blk: blk-mq support
where virtio-blk is converted to use the blk-mq infrastructure. After
digging a bit, it became clear that the issue was with the queue drain.
blk-mq tracks queue usage in a percpu counter, which is incremented on
request alloc and decremented when the request is freed. The initial
hunt was for an inconsistency in blk-mq, but everything seemed fine. In
fact, the counter only returned crazy values when suspend was in
progress. When a CPU is unplugged, the percpu counters merges that CPU
state with the general state. blk-mq takes care to register a hotcpu
notifier with the appropriate priority, so we know it runs after the
percpu counter notifier. However, the percpu counter notifier only
merges the state when the CPU is fully gone. This leaves a state
transition where the CPU going away is no longer in the online mask, yet
it still holds private values. This means that in this state,
percpu_counter_sum() returns invalid results, and the suspend then hangs
waiting for abs(dead-cpu-value) requests to complete which of course
will never happen.
Fix this by clearing the state earlier, so we never have a case where
the CPU isn't in online mask but still holds private state. This bug has
been there since forever, I guess we don't have a lot of users where
percpu counters needs to be reliable during the suspend cycle.
Reported-by: <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull arch/tile updates from Chris Metcalf:
"These fix a few stray build issues seen in linux-next, and also add
the minimal required support for perf to tilegx"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: remove unused variable 'devcap'
tile: Fix vDSO compilation issue with allyesconfig
perf tools: Allow building for tile
tile/perf: Support perf_events on tilegx and tilepro
tile: Enable NMIs on return from handle_nmi() without errors
tile: Add support for handling PMC hardware
tile: don't use __get_cpu_var() with structure-typed arguments
tile: avoid overflow in ns2cycles
|
| |
| |
| |
| |
| |
| |
| | |
Commit 503275bf37 removed the use of the variable but not
the variable itself.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
make allyesconfig give the following build error on tile:
tilegx-linux-gcc: error: arch/tile/kernel/vdso/vgettimeofday32.o: No such file or directory
tilegx-linux-objcopy: 'arch/tile/kernel/vdso/vdso32.so.dbg': No such file or directory
In case with CONFIG_MODVERSIONS, cmd_cc_o_c generate .tmp_<file>.o from
<file>.c only. Fix it by execute rule_cc_o_c instead.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kerry Sheh <ksheh@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Tested by building perf:
- Cross-compiled for tile on x86_64
- Built natively on tile
Signed-off-by: Zhigang Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
| |
| |
| |
| |
| |
| |
| | |
Add perf support for tile architecture.
Signed-off-by: Zhigang Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
NMI interrupts mask ALL interrupts before calling the handler,
so we need to unmask NMIs according to the value handle_nmi() returns.
If it returns zero, the NMIs should be re-enabled; if it returns
a non-zero error, the NMIs should be disabled.
Signed-off-by: Zhigang Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
| |
| |
| |
| |
| |
| |
| | |
The PMC module is used by perf_events, oprofile and watchdogs.
Signed-off-by: Zhigang Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
| |
| |
| |
| |
| |
| | |
This no longer works with the new per-cpu infrastructure.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In commit 4cecf6d401a ("sched, x86: Avoid unnecessary overflow in
sched_clock") and in recent patch "clocksource: avoid unnecessary
overflow in cyclecounter_cyc2ns()" https://lkml.org/lkml/2014/3/4/17,
the mult-shift approach is replaced by 2 steps to avoid storing a large,
intermediate value that could overflow.
arch/tile/kernel/time.c has a similar pattern in cycles2ns, and this
copies the same pattern in this function
CC: John Stultz <johnstul@us.ibm.com>
CC: Mike Galbraith <bitbucket@online.de>
CC: Salman Qazi <sqazi@google.com>
Signed-off-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper changes from Mike Snitzer:
- Fix dm-cache corruption caused by discard_block_size > cache_block_size
- Fix a lock-inversion detected by LOCKDEP in dm-cache
- Fix a dangling bio bug in the dm-thinp target's process_deferred_bios
error path
- Fix corruption due to non-atomic transaction commit which allowed a
metadata superblock to be written before all other metadata was
successfully written -- this is common to all targets that use the
persistent-data library's transaction manager (dm-thinp, dm-cache and
dm-era).
- Various small cleanups in the DM core
- Add the dm-era target which is useful for keeping track of which
blocks were written within a user defined period of time called an
'era'. Use cases include tracking changed blocks for backup
software, and partially invalidating the contents of a cache to
restore cache coherency after rolling back a vendor snapshot.
- Improve the on-disk layout of multithreaded writes to the
dm-thin-pool by splitting the pool's deferred bio list to be a
per-thin device list and then sorting that list using an rb_tree.
The subsequent read throughput of the data written via multiple
threads improved by ~70%.
- Simplify the multipath target's handling of queuing IO by pushing
requests back to the request queue rather than queueing the IO
internally.
* tag 'dm-3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (24 commits)
dm cache: fix a lock-inversion
dm thin: sort the per thin deferred bios using an rb_tree
dm thin: use per thin device deferred bio lists
dm thin: simplify pool_is_congested
dm thin: fix dangling bio in process_deferred_bios error path
dm mpath: print more useful warnings in multipath_message()
dm-mpath: do not activate failed paths
dm mpath: remove extra nesting in map function
dm mpath: remove map_io()
dm mpath: reduce memory pressure when requeuing
dm mpath: remove process_queued_ios()
dm mpath: push back requests instead of queueing
dm table: add dm_table_run_md_queue_async
dm mpath: do not call pg_init when it is already running
dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbind
dm: stop using bi_private
dm: remove dm_get_mapinfo
dm: make dm_table_alloc_md_mempools static
dm: take care to copy the space map roots before locking the superblock
dm transaction manager: fix corruption due to non-atomic transaction commit
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When suspending a cache the policy is walked and the individual policy
hints written to the metadata via sync_metadata(). This led to this
lock order:
policy->lock
cache_metadata->root_lock
When loading the cache target the policy is populated while the metadata
lock is held:
cache_metadata->root_lock
policy->lock
Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
ensuring the cache_metadata root_lock is held whilst all the hints are
written, rather than being repeatedly locked while policy->lock is held
(as was the case with each callout that policy_walk_mappings() made to
the old save_hint() method).
Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
locking correctness") build option. However, it is not clear how the
LOCKDEP reported paths can lead to a deadlock since the two paths,
suspending a target and loading a target, never occur at the same time.
But that doesn't mean the same lock-inversion couldn't have occurred
elsewhere.
Reported-by: Marian Csontos <mcsontos@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A thin-pool will allocate blocks using FIFO order for all thin devices
which share the thin-pool. Because of this simplistic allocation the
thin-pool's space can become fragmented quite easily; especially when
multiple threads are requesting blocks in parallel.
Sort each thin device's deferred_bio_list based on logical sector to
help reduce fragmentation of the thin-pool's ondisk layout.
The following tables illustrate the realized gains/potential offered by
sorting each thin device's deferred_bio_list. An "io size"-sized random
read of the device would result in "seeks/io" fragments being read, with
an average "distance/seek" between each fragment.
Data was written to a single thin device using multiple threads via
iozone (8 threads, 64K for both the block_size and io_size).
unsorted:
io size seeks/io distance/seek
--------------------------------------
4k 0.000 0b
16k 0.013 11m
64k 0.065 11m
256k 0.274 10m
1m 1.109 10m
4m 4.411 10m
16m 17.097 11m
64m 60.055 13m
256m 148.798 25m
1g 809.929 21m
sorted:
io size seeks/io distance/seek
--------------------------------------
4k 0.000 0b
16k 0.000 1g
64k 0.001 1g
256k 0.003 1g
1m 0.011 1g
4m 0.045 1g
16m 0.181 1g
64m 0.747 1011m
256m 3.299 1g
1g 14.373 1g
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The thin-pool previously only had a single deferred_bios list that would
collect bios for all thin devices in the pool. Split this per-pool
deferred_bios list out to per-thin deferred_bios_list -- doing so
enables increased parallelism when processing deferred bios. And now
that each thin device has it's own deferred_bios_list we can sort all
bios in the list using logical sector. The requeue code in error
handling path is also cleaner as a side-effect.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The pool is congested if the pool is in PM_OUT_OF_DATA_SPACE mode. This
is more explicit/clear/efficient than inferring whether or not the pool
is congested by checking if retry_on_resume_list is empty.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If unable to ensure_next_mapping() we must add the current bio, which
was removed from the @bios list via bio_list_pop, back to the
deferred_bios list before all the remaining @bios.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The warning message "Unrecognised multipath message received" is
displayed in two different situations in multipath_message(): when the
number of arguments passed is invalid and when the string passed in
argv[0] is not recognized.
Make it easier to identify where the problem is by making these warnings
more specific with additional context for each case.
Signed-off-by: Jose Castillo <jcastillo@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
activate_path() is run without a lock, so the path might be
set to failed before activate_path() had a chance to run.
This patch add a check for ->active in activate_path() to
avoid unnecessary overhead by calling functions which are known
to be failing.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Return early for case when no path exists, and when the
pathgroup isn't ready. This eliminates the need for
extra nesting for the the common case.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
multipath_map() is now just a wrapper around map_io(), so we
can rename map_io() to multipath_map().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When multipath needs to requeue I/O in the block layer the per-request
context shouldn't be allocated, as it will be freed immediately
afterwards anyway. Avoiding this memory allocation will reduce memory
pressure during requeuing.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
process_queued_ios() has served 3 functions:
1) select pg and pgpath if none is selected
2) start pg_init if requested
3) dispatch queued IOs when pg is ready
Basically, a call to queue_work(process_queued_ios) can be replaced by
dm_table_run_md_queue_async(), which runs request queue and ends up
calling map_io(), which does 1), 2) and 3).
Exception is when !pg_ready() (which means either pg_init is running or
requested), then multipath_busy() prevents map_io() being called from
request_fn.
If pg_init is running, it should be ok as long as pg_init_done() does
the right thing when pg_init is completed, I.e.: restart pg_init if
!pg_ready() or call dm_table_run_md_queue_async() to kick map_io().
If pg_init is requested, we have to make sure the request is detected
and pg_init will be started. pg_init is requested in 3 places:
a) __choose_pgpath() in map_io()
b) __choose_pgpath() in multipath_ioctl()
c) pg_init retry in pg_init_done()
a) is ok because map_io() calls __pg_init_all_paths(), which does 2).
b) needs a call to __pg_init_all_paths(), which does 2).
c) needs a call to __pg_init_all_paths(), which does 2).
So this patch removes process_queued_ios() and ensures that
__pg_init_all_paths() is called at the appropriate locations.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There is no reason why multipath needs to queue requests internally for
queue_if_no_path or pg_init; we should rather push them back onto the
request queue.
And while we're at it we can simplify the conditional statement in
map_io() to make it easier to read.
Since mpath no longer does internal queuing of I/O the table info no
longer emits the internal queue_size. Instead it displays 1 if queuing
is being used or 0 if it is not.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Introduce dm_table_run_md_queue_async() to run the request_queue of the
mapped_device associated with a request-based DM table.
Also add dm_md_get_queue() wrapper to extract the request_queue from a
mapped_device.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch moves condition checks as a preparation of following
patches and has no effect on behaviour.
process_queued_ios() is the only caller of __pg_init_all_paths()
and 2 condition checks are moved from outside to inside without
side effects.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Replace rcu_assign_pointer(p, NULL) with RCU_INIT_POINTER(p, NULL).
The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure. And in the
case of the NULL pointer, there is no structure to initialize. So,
rcu_assign_pointer(p, NULL) can be safely converted to
RCU_INIT_POINTER(p, NULL).
Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Device mapper uses the bio structure's bi_private field as a pointer
to dm_target_io or dm_rq_clone_bio_info. But a bio structure is
embedded in the dm_target_io and dm_rq_clone_bio_info structures, so the
pointer to the structure that contains the bio can be found with the
container_of() macro.
Remove the use of bi_private and use container_of() instead.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Remove dm_get_mapinfo() because no target uses it. Targets can allocate
per-bio data using ti->per_bio_data_size, this is much more flexible
than union map_info.
Leave union map_info only for the request-based multipath target's use.
Also delete the unused "unsigned long long ll" field of union map_info.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make the function dm_table_alloc_md_mempools static because it is not
called from another file.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In theory copying the space map root can fail, but in practice it never
does because we're careful to check what size buffer is needed.
But make certain we're able to copy the space map roots before
locking the superblock.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # drop dm-era and dm-cache changes as needed
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The persistent-data library used by dm-thin, dm-cache, etc is
transactional. If anything goes wrong, such as an io error when writing
new metadata or a power failure, then we roll back to the last
transaction.
Atomicity when committing a transaction is achieved by:
a) Never overwriting data from the previous transaction.
b) Writing the superblock last, after all other metadata has hit the
disk.
This commit and the following commit ("dm: take care to copy the space
map roots before locking the superblock") fix a bug associated with (b).
When committing it was possible for the superblock to still be written
in spite of an io error occurring during the preceeding metadata flush.
With these commits we're careful not to take the write lock out on the
superblock until after the metadata flush has completed.
Change the transaction manager's semantics for dm_tm_commit() to assume
all data has been flushed _before_ the single superblock that is passed
in.
As a prerequisite, split the block manager's block unlocking and
flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now
the unlocking must be done separately.
This issue was discovered by forcing io errors at the crucial time
using dm-flakey.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Discard block size not being equal to cache block size causes data
corruption by erroneously avoiding migrations in issue_copy() because
the discard state is being cleared for a group of cache blocks when it
should not.
Completely remove all code that enabled a distinction between the
cache block size and discard block size.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the discard block size is larger than the cache block size we will
not properly quiesce IO to a region that is about to be discarded. This
results in a race between a cache migration where no copy is needed, and
a write to an adjacent cache block that's within the same large discard
block.
Workaround this by limiting the discard_block_size to cache_block_size.
Also limit the max_discard_sectors to cache_block_size.
A more comprehensive fix that introduces range locking support in the
bio_prison and proper quiescing of a discard range that spans multiple
cache blocks is already in development.
Reported-by: Morgan Mears <Morgan.Mears@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This change offers a big performance boost for dm-era.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
dm-era is a target that behaves similar to the linear target. In
addition it keeps track of which blocks were written within a user
defined period of time called an 'era'. Each era target instance
maintains the current era as a monotonically increasing 32-bit
counter.
Use cases include tracking changed blocks for backup software, and
partially invalidating the contents of a cache to restore cache
coherency after rolling back a vendor snapshot.
dm-era is primarily expected to be paired with the dm-cache target.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU upates from Joerg Roedel:
"This time a few more updates queued up.
- Rework VT-d code to support ACPI devices
- Improvements for memory and PCI hotplug support in the VT-d driver
- Device-tree support for OMAP IOMMU
- Convert OMAP IOMMU to use devm_* interfaces
- Fixed PASID support for AMD IOMMU
- Other random cleanups and fixes for OMAP, ARM-SMMU and SHMOBILE
IOMMU
Most of the changes are in the VT-d driver because some rework was
necessary for better hotplug and ACPI device support"
* tag 'iommu-updates-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (75 commits)
iommu/vt-d: Fix error handling in ANDD processing
iommu/vt-d: returning free pointer in get_domain_for_dev()
iommu/vt-d: Only call dmar_acpi_dev_scope_init() if DRHD units present
iommu/vt-d: Check for NULL pointer in dmar_acpi_dev_scope_init()
iommu/amd: Fix logic to determine and checking max PASID
iommu/vt-d: Include ACPI devices in iommu=pt
iommu/vt-d: Finally enable translation for non-PCI devices
iommu/vt-d: Remove to_pci_dev() in intel_map_page()
iommu/vt-d: Remove pdev from intel_iommu_attach_device()
iommu/vt-d: Remove pdev from iommu_no_mapping()
iommu/vt-d: Make domain_add_dev_info() take struct device
iommu/vt-d: Make domain_remove_one_dev_info() take struct device
iommu/vt-d: Rename 'hwdev' variables to 'dev' now that that's the norm
iommu/vt-d: Remove some pointless to_pci_dev() calls
iommu/vt-d: Make get_valid_domain_for_dev() take struct device
iommu/vt-d: Make iommu_should_identity_map() take struct device
iommu/vt-d: Handle RMRRs for non-PCI devices
iommu/vt-d: Make get_domain_for_dev() take struct device
iommu/vt-d: Make domain_context_mapp{ed,ing}() take struct device
iommu/vt-d: Make device_to_iommu() cope with non-PCI devices
...
|
| | \ \ | |
| | \ \ | |
| | \ \ | |
| | \ \ | |
| | \ \ | |
| | \ \ | |
| | \ \ | |
| | \ \ | |
| |\ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
'arm/shmobile' and 'x86/vt-d' into next
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
If we failed to find an ACPI device to correspond to an ANDD record, we
would fail to increment our pointer and would just process the same record
over and over again, with predictable results.
Turn it from a while() loop into a for() loop to let the 'continue' in
the error paths work correctly.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
If we hit this error condition then we want to return a NULL pointer and
not a freed variable.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
As pointed out by Jörg and fixed in commit 11f1a7768 ("iommu/vt-d: Check
for NULL pointer in dmar_acpi_dev_scope_init(), this code path can
bizarrely get exercised even on AMD IOMMU systems with IRQ remapping
enabled.
In addition to the defensive check for NULL which Jörg added, let's also
just avoid calling the function at all if there aren't an Intel IOMMU
units in the system.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
When ir_dev_scope_init() is called via a rootfs initcall it
will check for irq_remapping_enabled before it calls
(indirectly) into dmar_acpi_dev_scope_init() which uses the
dmar_tbl pointer without any checks.
The AMD IOMMU driver also sets the irq_remapping_enabled
flag which causes the dmar_acpi_dev_scope_init() function to
be called on systems with AMD IOMMU hardware too, causing a
boot-time kernel crash.
Signed-off-by: Joerg Roedel <joro@8bytes.org>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|