author | Shaohua Li <shli@fb.com> | 2015-08-13 23:31:59 +0200 |
---|---|---|
committer | NeilBrown <neilb@suse.com> | 2015-10-24 08:16:19 +0200 |
commit | f6bed0ef0a808164f51197de062e0450ce6c1f96 (patch) | |
tree | 9fc2ec276f40ef36eaa14e295f543595472b56a9 /drivers/md/raid5.c | |
parent | raid5: add a new state for stripe log handling (diff) | |
raid5: add basic stripe log
This introduces a simple log for raid5. Data/parity writes to the raid
array are first written to the log, then written to the raid array disks.
If a crash happens, we can recover data from the log. This can speed up
raid resync and fix the write hole issue.
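The log-first ordering described above can be modeled in a few lines of userspace C. This is only a sketch under assumptions: `toy_log`, `toy_stripe`, `toy_log_write_stripe` and `toy_run_io` are hypothetical names, not kernel symbols. The return convention (0 means the log took the stripe, so the caller skips the direct write) mirrors the `r5l_write_stripe(conf->log, sh) == 0` check in the diff below; the idea that a stripe passes through the I/O path a second time once it has been logged is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the log and a stripe; not the kernel types. */
struct toy_log {
	int enabled;	/* nonzero if a log device is configured */
	int queued;	/* number of stripes handed to the log */
};

struct toy_stripe {
	int logged;	/* stripe contents have been queued to the log */
	int on_array;	/* stripe contents have been written to the array */
};

/*
 * Returns 0 if the stripe was queued to the log (the caller must not
 * write to the array yet), nonzero if the caller should write directly.
 */
static int toy_log_write_stripe(struct toy_log *log, struct toy_stripe *sh)
{
	assert(sh != NULL);
	if (log == NULL || !log->enabled)
		return -1;	/* no log: fall through to the array */
	if (sh->logged)
		return -1;	/* already in the log: array write may proceed */
	sh->logged = 1;
	log->queued++;
	return 0;
}

/* Models the hook added to ops_run_io(): log first, array second. */
static void toy_run_io(struct toy_log *log, struct toy_stripe *sh)
{
	if (toy_log_write_stripe(log, sh) == 0)
		return;
	sh->on_array = 1;
}
```

In this model the first call to `toy_run_io()` only queues the stripe to the log, and a second call performs the array write. Without a log, the array write happens on the first call.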
The log structure is pretty simple. Data and meta data are stored in
block units, which are generally 4k. There is only one type of meta data
block. A meta data block can track 3 types of data: stripe data, stripe
parity and flush block. The MD superblock points to the last valid
meta data block. Each meta data block has a checksum and a sequence
number, so recovery can scan the log correctly. We store a checksum of
the stripe data/parity in the meta data block, so meta data and stripe
data/parity can be written to the log disk together. Otherwise, the meta
data write must wait until the stripe data/parity write has finished.
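To illustrate how a checksum plus a sequence number lets recovery decide whether a meta data block is valid, here is a minimal userspace sketch. Everything in it is an assumption made for illustration: the header layout, the magic value, and the helper names are hypothetical, and FNV-1a is used only as a stand-in because the message does not say which checksum the log uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_BLOCK_SIZE 4096u		/* "4k generally", per the message */
#define LOG_META_MAGIC 0x52354c47u	/* hypothetical magic value */

/* Hypothetical meta data block header; 16 bytes, no padding. */
struct log_meta_header {
	uint32_t magic;
	uint32_t checksum;	/* covers the whole block with this field zeroed */
	uint64_t seq;		/* increases with every meta data block written */
};

/* FNV-1a, a stand-in checksum chosen only to keep the sketch short. */
static uint32_t log_checksum(const uint8_t *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Stamp magic, sequence number and checksum into a meta data block. */
static void log_meta_seal(uint8_t *block, uint64_t seq)
{
	struct log_meta_header hdr;

	assert(block != NULL);
	hdr.magic = LOG_META_MAGIC;
	hdr.seq = seq;
	hdr.checksum = 0;
	memcpy(block, &hdr, sizeof(hdr));
	hdr.checksum = log_checksum(block, LOG_BLOCK_SIZE);
	memcpy(block, &hdr, sizeof(hdr));
}

/* Returns 1 if the block is the expected next valid block, else 0. */
static int log_meta_valid(const uint8_t *block, uint64_t expected_seq)
{
	uint8_t copy[LOG_BLOCK_SIZE];
	struct log_meta_header hdr;
	uint32_t stored;

	memcpy(copy, block, LOG_BLOCK_SIZE);
	memcpy(&hdr, copy, sizeof(hdr));
	if (hdr.magic != LOG_META_MAGIC || hdr.seq != expected_seq)
		return 0;
	stored = hdr.checksum;
	hdr.checksum = 0;
	memcpy(copy, &hdr, sizeof(hdr));
	return log_checksum(copy, LOG_BLOCK_SIZE) == stored;
}
```

A recovery scan in this model would walk the log, calling `log_meta_valid()` with the next expected sequence number, and stop at the first block that fails. A stale block left over from an earlier pass through the log fails the sequence check, and a torn write fails the checksum check.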
For stripe data, the meta data block records the stripe data sector and
size. Currently the size is always 4k. This meta data record could be
made simpler if we only fixed the write hole (e.g., we could record the
data of a stripe's different disks together), but this format can be
extended to support caching in the future, which must record the data
address/size.
For stripe parity, the meta data block records the stripe sector. Its
size should be 4k (for raid5) or 8k (for raid6). We always store the P
parity first. This format should work for caching too.
A flush block indicates that a stripe is on the raid array disks. Fixing
the write hole doesn't need this type of meta data; it is for the caching
extension.
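The three record types described above can be summarized in a small sketch. The enum, the struct and the helper names are hypothetical and are not the on-disk format. The sizes follow the text: 4k for a data record, 4k or 8k for a parity record with P stored first, and nothing for a flush record. That a flush record carries no payload bytes in the log is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_BLOCK_SIZE 4096u	/* "4k generally", per the message */

/* Hypothetical names for the three kinds of record a meta block tracks. */
enum log_payload_type {
	LOG_PAYLOAD_DATA,	/* one block of stripe data */
	LOG_PAYLOAD_PARITY,	/* P, plus Q for raid6 */
	LOG_PAYLOAD_FLUSH,	/* stripe has reached the raid array disks */
};

/* Hypothetical in-memory view of one record inside a meta data block. */
struct log_payload {
	enum log_payload_type type;
	uint64_t sector;	/* data sector (DATA) or stripe sector (others) */
	uint32_t size;		/* payload bytes that follow in the log */
	uint32_t checksum[2];	/* one per payload block: P first, then Q */
};

/* Payload bytes stored in the log for a record of the given type. */
static uint32_t log_payload_bytes(enum log_payload_type type, int has_q)
{
	switch (type) {
	case LOG_PAYLOAD_DATA:
		return LOG_BLOCK_SIZE;			/* always 4k for now */
	case LOG_PAYLOAD_PARITY:
		return has_q ? 2 * LOG_BLOCK_SIZE	/* raid6: P then Q */
			     : LOG_BLOCK_SIZE;		/* raid5: P only */
	case LOG_PAYLOAD_FLUSH:
		return 0;				/* assumed: no payload */
	}
	assert(0 && "unknown payload type");
	return 0;
}

/* Byte offset of P or Q inside a parity payload; P always comes first. */
static uint32_t log_parity_offset(int is_q)
{
	return is_q ? LOG_BLOCK_SIZE : 0;
}
```

Recording an explicit sector and size for each data record costs a little space compared with grouping a stripe's blocks together, but, as the message notes, it is what a future caching extension would need.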
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Diffstat (limited to 'drivers/md/raid5.c')
-rw-r--r-- | drivers/md/raid5.c | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4b789f1f4550..64a256538ff7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -895,6 +895,8 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 
 	might_sleep();
 
+	if (r5l_write_stripe(conf->log, sh) == 0)
+		return;
 	for (i = disks; i--; ) {
 		int rw;
 		int replace_only = 0;
@@ -3495,6 +3497,7 @@ returnbi:
 		WARN_ON(test_bit(R5_SkipCopy, &dev->flags));
 		WARN_ON(dev->page != dev->orig_page);
 	}
+
 	if (!discard_pending &&
 	    test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags)) {
 		clear_bit(R5_Discard, &sh->dev[sh->pd_idx].flags);
@@ -5745,6 +5748,7 @@ static int handle_active_stripes(struct r5conf *conf, int group,
 
 	for (i = 0; i < batch_size; i++)
 		handle_stripe(batch[i]);
+	r5l_write_stripe_run(conf->log);
 
 	cond_resched();
 