| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
"if" statements of checkpoint updates have too many responsibilties.
This results in unclear code flow and duplicated code.
Refactor checkpoint update code and simplify "if" statements.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mdmon assumes that kernel marks checkpoint every 1/16 of the volume size
and that the checkpoints are equal in size. This is not true, kernel may
mark checkpoints more frequently depending on several factors, including
sync speed. This results in checkpoints reported by mdadm --examine
falling behind the one reported by kernel.
Remove hardcoded checkpoint interval checking.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
String "none" is used many times throughout the code.
Replace "none" strings with predefined macro.
Add str_is_none() for comparing strings with "none".
Replace str(n)cmp calls with function.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
sysfs_get_str() usages have inconsistant buffer size.
This results in wild buffer declarations and redundant memory usage.
Define maximum buffer size for sysfs strings.
Replace wild sysfs string buffer sizes for globaly defined value.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not all struct superswitch implement a get_bad_blocks() function,
yet mdmon seems to call it without checking for NULL and thus
occasionally segfaults in the test 10ddf-geometry.
Fix this by checking for NULL before calling it.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up to this date signal() was used which implementation could vary [1].
Sigaction() call is preferred. This commit introduces replacement
from signal() to sigaction() by the use of signal_s() wrapper.
Also remove redundant signal.h header includes.
[1] https://man7.org/linux/man-pages/man2/signal.2.html
Signed-off-by: Lukasz Florczak <lukasz.florczak@linux.intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently if a md raid0/linear array gets one or more members removed while
being mounted, kernel keeps showing state 'clean' in the 'array_state'
sysfs attribute. Despite udev signaling the member device is gone, 'mdadm'
cannot issue the STOP_ARRAY ioctl successfully, given the array is mounted.
Nothing else hints that something is wrong (except that the removed devices
don't show properly in the output of mdadm 'detail' command). There is no
other property to be checked, and if user is not performing reads/writes
to the array, even kernel log is quiet and doesn't give a clue about the
missing member.
This patch is the mdadm counterpart of kernel new array state 'broken'.
The 'broken' state mimics the state 'clean' in every aspect, being useful
only to distinguish if an array has some member missing. All necessary
paths in mdadm were changed to deal with 'broken' state, and in case the
tool runs in a kernel that is not updated, it'll work normally, i.e., it
doesn't require the 'broken' state in order to work.
Also, this patch changes the way the array state is showed in the 'detail'
command (for raid0/linear only) - now it takes the 'array_state' sysfs
attribute into account instead of only rely in the MD_SB_CLEAN flag.
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
|
|
|
|
|
|
| |
Logical oprators never belong at the beginning of a line.
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an update of acknowledged bad blocks file is notified, read entire
bad block list from sysfs file and compare it against local list of bad
blocks. If any obsolete entries are found, remove them from metadata.
As mdmon cannot perform any memory allocation, new superswitch method
get_bad_blocks is expected to return a list of bad blocks in metadata
without allocating memory. It's up to metadata handler to allocate all
required memory in advance.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If md has changed the state to 'blocked' and metadata handler supports
bad blocks, try process them first. If metadata handler has successfully
stored bad block, acknowledge it to md via 'badblocks' sysfs file. If
metadata handler has failed to store the new bad block (ie. lack of
space), remove bad block support for a disk by writing "-external_bbl"
to state sysfs file. If all bad blocks have been acknowledged, request
to unblock the array.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Acked-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Open 'badblocks' and 'unacknowledged_bad_blocks' sysfs files for each
disk in the array. Add them to the list of files observed by monitor.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bad block support has incremented sysfs disk state reported by kernel
("external_bbl") so it became longer than 20 bytes. It causes reshape to
fail as it reads truncated entry from sysfs.
Increase buffer so it can accommodate the string including all state
values currently implemented in kernel at the same time.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a case of successful completion of a resync (in the last step)
- read_and_act sometimes still reads sync_action as "resync"
but sync_completed already is set to component_size.
When this race occurs, sync operation is
marked as finished, but last_checkpoint is
overwritten with sync_completed. It will cause next
sync operation (ie. reshape) to be reported as complete immediately
after start - mdmon will write successful completion of the reshape
to metadata. This patch sets last_checkpoint to 0 once the sync
is completed to stop it happening.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
make dprintf() print program name and __func__, so that
this messaging is consistent.
Also remove all __func__ messages from pr_err(). We shouldn't
leak that internal data in error message.
If we really want function name there, we new pr_XXX might
be wanted.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reverts 24a216bf:
"Monitor: Don't write metadata in inactive array state".
While it's true that writing meta data is usually not necessary
in readonly state, there is one important exception: if a
disk goes faulty, we want to record that, even if the array is
inactive.
We might as well just revert 24a216bf, because with the recently
submitted patch
"Monitor: don't set arrays dirty after transition to read-only"
those meta data writes that really annoying (for a clean, readonly,
healthy array during startup) are gone anyway.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reverts commit 4867e068. Setting arrays dirty after
transition from inactive to anything else causes unnecessary
meta data writes and may wreak trouble unnecessarily when
a disk was missing during assembly but the array was never
written to.
The reason for 4867e068 was a special situation during reshape
from RAID0 to RAID4. I ran all IMSM test cases with it reverted
and found no regressions, so I believe the reshape logic for
IMSM works fine in mdadm 3.3 also without this.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
| |
Correctly print out wake reason if it was a signal. Previous code
would print misleading select events (pselect(2) man page says the
fdsets become undefined in case of error).
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
| |
read_and_act() currently prints a debug message only very late.
Print the status seen by mdmon right away, to track mdmon's
actions more closely. Add a time stamp to observe long delays
between read_and_act calls, e.g. caused by meta data writes.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel docs state that meta data is never written in states
clear, inactive, suspended, readonly, and read_auto.
Why should this be different for containers?
We need to write metadata when the array is disabled, though.
Tested with the DDF (10*) and IMSM (9*) tests, works.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
| |
Failure to read array_state can only mean the array has been
deleted by the kernel; it is not an indication that the array
is dirty.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When arrays are stopped, sysfs attributes may be deleted by
the kernel, and attempts to read these attributes will fail.
Setting resync_start to 0 is wrong in this case, because it
may make is_resync_complete() erroneously return
FALSE for a clean array. It is better to leave resync_start
untouched (the previously read value for this array).
Otherwise set_array_state() will pass thewrong state information
to the metadata handler, which will write it to disk, and at
the next restart an unnecessary recovery is started for the
array.
It is also possible that resync_start is actually *not* deleted
yet when read_and_act is running, and an apparently valid
value of "0" is read from it, with the same effect as described
above. This happens if the kernel has already called md_clean()
on the array (setting recovery_cp = 0), but the delayed removal
of "resync_start" hasn't happened yet. Therefore, in "clear"
state, "resync_start" shouldn't be read at all.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It makes no sense to listen for events on files that have
been deleted. This happens when arrays are stopped and the
kernel removes the associated sysfs structures.
Calling pselect() on the deleted attributes may cause a storm
of wake events.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
| |
If a 'remove' fails there is no certainty that another event will
happen soon, so make sure we retry soon anyway.
Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
| |
Rather than just a number, use a named flag.
This makes the code easier to understand and allows room for returning
more flags later.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we see a failed device, we both unblock and remove it (after
updating the metadata).
However it might not be blocked as there can be a delay between
unblocking and the device being free to be removed.
If this happens the clearing of 'blocked' succeeds so md sends a sysfs
notification and mdmon checks again and tries to clear 'blocked'
again.
Thus it enters a busy-loop until the 'remove' succeeds.
To avoid this, only try to unblock if the device was blocked.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Description of the bug:
Sometimes mdmon crashes after changing RAID level from 1 to 0 (takeover).
Cause of the bug:
The managemon marks an active_array for removal from monitoring
by assigning a->container to NULL value (in the "manage_member" function).
Sometimes (during stress test) it happens right when the monitor
is in the "read_and_act" function and a->container pointer is in use.
This causes the monitor crashes.
Solution:
The active array has to be marked for removal in another way
than setting NULL pointer when it can be in use.
A new field "to_remove" was added to the "active_array" structure.
It is used in the managemon to mark a container to remove
(instead of the old assigment: a->container = NULL)
and monitor checks it to determine if the array should be removed.
The field "to_remove" should be checked in some other places
to avoid managing of the array which is going to be removed.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If we have an array with two failed disks and the array is in degraded
state (now it is possible only for raid10 with 2 degraded mirrors) and
we have two spare devices in the container, recovery process should be
triggered on booth failed disks. It does not.
Recovery is triggered only for first failed disk.
Second failed disk remains unchanged although the spare drive exists
in the container and is ready to recovery.
Root cause:
mdmon does not check if the array is degraded after recovery of first
drive is completed.
Resolution:
Check if current number of disks in the array equals target number of disks.
If not, trigger degradation check and then recovery process.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tracking RAID0 arrays doesn't really work. There is no need,
and there are some sysfs files which won't exist when the array
appears and then won't be opened when the level is changed.
So simply ignore RAID0 and LINEAR arrays - don't add them when they
appear and if an array we are monitoring turns into one of these,
discard it promptly.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
| |
If mdmon is shutting down because there are no devices
left to look at, then don't wait 5 seconds for an O_EXCL open,
and that can block progress of --grow.
Only wait for O_EXCL if we received a signal.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|\ |
|
| |
| |
| |
| |
| |
| | |
These should be open or closed together.
Signed-off-by: NeilBrown <neilb@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If we can't read actual disk state, it shoud be initiated
to 0.
Overwise it may be out of date value resulting false action
later in code (e.g. set disk to improper state).
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When reshape is finished monitor has to set last checkpoint
to the array end to allow metatdata for reshape finalization.
Metadata has to know if reshape is finished or it is broken
On reshape finish metadata finalization is required.
When reshape is broken, metadata must remain as is to allow
for reshape restart from checkpoint.
This can be resolved based on reshape_position sysfs entry.
When it is equal to 'none', it means that md finishes work.
In such situation move checkpoint to the end of array.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If we can't read actual disk state, it shoud be initiated
to 0.
Overwise it may be out of date value resulting false action
later in code (e.g. set disk to improper state).
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
md signals reshape completion (whole area or parts) by setting
sync_completed to 0. This causes in set_array_state() to rollback
metadata changes (super-intel.c:4977. To avoid this do not allow for
set last_checkpoint to 0 if reshape is finished.
This was also root cause of my previous fix for finalization reshape
that I agreed earlier is not necessary,
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| | |
When raid0 array is takeovered to raid4 for reshape it should be possible to detect
that array for reshape is monitored now for metadata update.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For level migration support it is necessary to allow mdmon to react for level changes.
It has to have ability to change configuration of active array,
and for array level change to raid0 finish array monitoring.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We need to allow metadata to handle progress of reshape,
completion, and abort-before-start.
Include all those in ->set_array_state()
Signed-off-by: NeilBrown <neilb@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When mdadm starts a reshape, it might add some devices to the array
first. mdmon needs to notice the reshape starting and check for any
new devices. If there are any they need to be provided to be
monitored.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|/
|
|
|
|
|
|
|
|
| |
Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its device
and do not close it before exit(). The period between open and release of
handle is too long and md is not able stop device. Releasing handle before
exit does not block md.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When sync_action is idle mdmon takes the latest value of md/resync_start
or md/<dev>/recovery_start to record the resync/rebuild checkpoint in
the metadata. However, now that mdmon is reading sync_completed there
is no longer a need to wait for, or force an idle event to take a
checkpoint.
Simply update the forward progress of ->last_checkpoint at every wakeup
event and force it to be recorded at least every 1/16th array-size
interval. It may be recorded more frequently if a ->set_array_state()
event occurs.
This also cleans up some confusion in handling the dual-rebuild case.
If more than one spare has been activated the kernel starts the rebuild
at the lowest recovery offset, so we do not need to worry about
min_recovery_start().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel updates and notifies md/sync_completed when it is time to
take a checkpoint. When this occurs (at 1/16 array size intervals)
write 'idle' to md/sync_action to have the current recovery position
updated in recovery_start and resync_start.
Requires the metadata handler to reset ->last_checkpoint when it has
determined that recovery has ended.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we don't "mdadm --takeover" until /var/run is writable
there is no need to continually try to create files in there.
So only create these files at startup and fail if they cannot be
made. This means that to start an array with externally managed
metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be
writable. To 'takeover' from a previous mdmon instance, /var/run
must be writable.
This means we don't need to worry about SIGHUP (which was once used to
tell us it was time to create .pid) and SIGALRM.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
| |
Replace occurrences of ~0ULL to make it clear we are talking about maximal
resync/recovery position.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
|
|
|
|
| |
Prepare the code to handle saving a recovery checkpoint.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
We don't need to sprinkle reads of this attribute all over the place,
just once at the entry of read_and_act(). Also, the mdinfo structure
for the array already has a 'resync_start' member, so just reuse that.
Finally, rename get_resync_start() to read_resync_start to make it
consistent with the other sysfs accessors in monitor.c.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
|
|
|
|
| |
Also removed 'paper' addresses.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From 2.6.30, /proc/mounts and various /sys files will
probably always returns 'readable' to select, so we will need
to wait on POLLPRI to get the 'new data is available' signal.
When using select, this corresponds to an 'exception', so
adjust calls to select accordingly.
In one case we sometimes wait on a socket and sometime on
/proc/mounts, so we need to test which.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
|
|
|
|
|
|
|
| |
Starting with 2.6.30 the md/resync_start attribute will no longer return
a non-sensical number when resync is complete, instead it now returns
'none'.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|