author | Shaohua Li <shli@fb.com> | 2017-02-28 22:00:20 +0100 |
---|---|---|
committer | Shaohua Li <shli@fb.com> | 2017-03-09 18:02:42 +0100 |
commit | 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 (patch) | |
tree | 509d571e2f2cccce0beda5401a18c47e6505526b /kernel/membarrier.c | |
parent | md: don't impose the MD_SB_DISKS limit on arrays without metadata. | |
md/raid1/10: fix potential deadlock
Neil Brown pointed out a potential deadlock in raid 10 code with
bio_split/chain. The raid1 code could have the same issue, but recent
barrier rework makes it less likely to happen. The deadlock happens in
the following sequence:
1. generic_make_request(bio), this will set current->bio_list
2. raid10_make_request will split bio to bio1 and bio2
3. __make_request(bio1), wait_barrier, add underlying disk bio to
current->bio_list
4. __make_request(bio2), wait_barrier
If raise_barrier happens between 3 & 4, since wait_barrier runs at 3,
raise_barrier waits for IO completion from 3. And since raise_barrier
sets the barrier, 4 waits for raise_barrier. But IO from 3 can't be
dispatched because raid10_make_request() hasn't finished yet.
The solution is to adjust the IO ordering. Quoting Neil:
"
It is much safer to:
    if (need to split) {
        split = bio_split(bio, ...)
        bio_chain(...)
        make_request_fn(split);
        generic_make_request(bio);
    } else
        make_request_fn(mddev, bio);
This way we first process the initial section of the bio (in 'split')
which will queue some requests to the underlying devices. These
requests will be queued in generic_make_request.
Then we queue the remainder of the bio, which will be added to the end
of the generic_make_request queue.
Then we return.
generic_make_request() will pop the lower-level device requests off the
queue and handle them first. Then it will process the remainder
of the original bio once the first section has been fully processed.
"
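The difference between the two orderings can be sketched with a toy
user-space model. This is not the kernel code: struct toy_bio, struct sim,
simulate() and the other names are made up for illustration, a plain FIFO
stands in for current->bio_list, and wait_barrier is reduced to a recorded
event. The trace records "wN" when section N takes the barrier and "DN"
when the disk bio for section N is dispatched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 32
#define MAX_QUEUE 8

enum kind { RAID_BIO, DISK_BIO };

struct toy_bio {
    enum kind kind;
    int section;     /* section of the original bio: 1 or 2 (0 = unsplit original) */
    int needs_split; /* only meaningful for RAID_BIO */
};

struct sim {
    struct toy_bio queue[MAX_QUEUE]; /* stands in for current->bio_list (FIFO) */
    size_t head, tail;
    char trace[MAX_EVENTS * 2 + 1];  /* two characters per event, NUL-terminated */
    size_t len;
    int split_first; /* 1: ordering from the quote, 0: both sections handled inline */
};

static void record(struct sim *s, char what, int section)
{
    assert(s->len + 2 < sizeof(s->trace));
    s->trace[s->len++] = what;
    s->trace[s->len++] = (char)('0' + section);
    s->trace[s->len] = '\0';
}

static void enqueue(struct sim *s, struct toy_bio b)
{
    assert(s->tail < MAX_QUEUE);
    s->queue[s->tail++] = b;
}

/* Models __make_request(): take the barrier, then queue the disk bio. */
static void handle_section(struct sim *s, int section)
{
    record(s, 'w', section);
    enqueue(s, (struct toy_bio){ DISK_BIO, section, 0 });
}

static void raid_make_request(struct sim *s, struct toy_bio b)
{
    if (!b.needs_split) {
        handle_section(s, b.section);
    } else if (s->split_first) {
        handle_section(s, 1);                           /* make_request_fn(split) */
        enqueue(s, (struct toy_bio){ RAID_BIO, 2, 0 }); /* requeue the remainder */
    } else {
        handle_section(s, 1); /* old ordering: both sections in one call */
        handle_section(s, 2);
    }
}

/* Models the generic_make_request() loop draining the per-task list. */
static const char *simulate(struct sim *s, int split_first)
{
    *s = (struct sim){ .split_first = split_first };
    enqueue(s, (struct toy_bio){ RAID_BIO, 0, 1 });
    while (s->head < s->tail) {
        struct toy_bio b = s->queue[s->head++];
        if (b.kind == DISK_BIO)
            record(s, 'D', b.section); /* dispatched to the underlying disk */
        else
            raid_make_request(s, b);
    }
    return s->trace;
}

/* True if section 1's disk bio was dispatched before section 2 took the barrier. */
static int first_io_dispatched_before_second_wait(const char *trace)
{
    const char *d1 = strstr(trace, "D1");
    const char *w2 = strstr(trace, "w2");
    return d1 && w2 && d1 < w2;
}
```

In this model, simulate(&s, 0) yields the trace "w1w2D1D2": section 2
takes the barrier while the IO from section 1 is still sitting in the
queue, which is the window in which raise_barrier can deadlock. With
simulate(&s, 1) the trace is "w1D1w2D2", so the first section's IO is
already dispatched by the time the remainder waits on the barrier.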
Note, this only happens in the read path. In the write path, the bio is
flushed to the underlying disks either by blk flush (from schedule) or
offloaded to raid1/10d. It's queued in current->bio_list.
Cc: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org (v3.14+, only the raid10 part)
Suggested-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>