diff options
author | David Teigland <teigland@redhat.com> | 2007-01-24 17:21:33 +0100 |
---|---|---|
committer | Steven Whitehouse <swhiteho@redhat.com> | 2007-02-05 19:37:50 +0100 |
commit | b790c3b7c38aae28c497bb363a6fe72f7c96568f (patch) | |
tree | e4dda1c5c775b4ae164a997f1216147066de31e3 /fs/gfs2/ops_inode.c | |
parent | [DLM] saved dlm message can be dropped (diff) | |
download | linux-b790c3b7c38aae28c497bb363a6fe72f7c96568f.tar.xz linux-b790c3b7c38aae28c497bb363a6fe72f7c96568f.zip |
[DLM] can miss clearing resend flag
A long, complicated sequence of events, beginning with the RESEND flag not
being cleared on an lkb, can result in an unlock never completing.
- lkb on waiters list for remote lookup
- the remote node is both the dir node and the master node, so
it optimizes the lookup into a request and sends a request
reply back
- the request reply is saved on the requestqueue to be processed
after recovery
- recovery runs dlm_recover_waiters_pre() which sets RESEND flag
so the lookup will be resent after recovery
- end of recovery: process_requestqueue takes saved request reply
which removes the lkb off the waitesr list, _without_ clearing
the RESEND flag
- end of recovery: dlm_recover_waiters_post() doesn't do anything
with the now completed lookup lkb (would usually clear RESEND)
- later, the node unmounts, unlocks this lkb that still has RESEND
flag set
- the lkb is on the waiters list again, now for unlock, when recovery
occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
set, doesn't do anything since the master still exists
- end of recovery: dlm_recover_waiters_post() takes this lkb off
the waiters list because it has the RESEND flag set, then reports
an error because unlocks are never supposed to be handled in
recover_waiters_post().
- later, the unlock reply is received, doesn't find the lkb on
the waiters list because recover_waiters_post() has wrongly
removed it.
- the unlock operation has been lost, and we're left with a
stray granted lock
- unmount spins waiting for the unlock to complete
The visible evidence of this problem will be a node where gfs umount is
spinning, the dlm waiters list will be empty, and the dlm locks list will
show a granted lock.
The fix is simply to clear the RESEND flag when taking an lkb off the
waiters list.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Diffstat (limited to 'fs/gfs2/ops_inode.c')
0 files changed, 0 insertions, 0 deletions