diff options
author | Guoqing Jiang <gqjiang@suse.com> | 2015-07-10 11:01:15 +0200 |
---|---|---|
committer | NeilBrown <neilb@suse.com> | 2015-08-31 19:41:41 +0200 |
commit | 66099bb0ee6c20f91ace3fa5f82202fbceb67d8e (patch) | |
tree | 9b5be424509589322b12a9a6d1571e0b32111731 /drivers/md/md-cluster.c | |
parent | md-cluster: transfer the resync ownership to another node (diff) | |
download | linux-66099bb0ee6c20f91ace3fa5f82202fbceb67d8e.tar.xz linux-66099bb0ee6c20f91ace3fa5f82202fbceb67d8e.zip |
md-cluster: fix deadlock issue on message lock
There is problem with previous communication mechanism, and we got below
deadlock scenario with cluster which has 3 nodes.
Sender Receiver Receiver
token(EX)
message(EX)
writes message
downconverts message(CR)
requests ack(EX)
get message(CR) gets message(CR)
reads message reads message
requests EX on message requests EX on message
To fix this problem, we do the following changes:
1. the sender downconverts MESSAGE to CW rather than CR.
2. and the receiver request PR lock not EX lock on message.
And in case we failed to down-convert EX to CW on message, it is better to
unlock message otherthan still hold the lock.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Lidong Zhong <ldzhong@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Diffstat (limited to 'drivers/md/md-cluster.c')
-rw-r--r-- | drivers/md/md-cluster.c | 14 |
1 files changed, 7 insertions, 7 deletions
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c index 47199addae04..85b7836fb4b5 100644 --- a/drivers/md/md-cluster.c +++ b/drivers/md/md-cluster.c @@ -488,8 +488,8 @@ static void recv_daemon(struct md_thread *thread) /*release CR on ack_lockres*/ dlm_unlock_sync(ack_lockres); - /*up-convert to EX on message_lockres*/ - dlm_lock_sync(message_lockres, DLM_LOCK_EX); + /*up-convert to PR on message_lockres*/ + dlm_lock_sync(message_lockres, DLM_LOCK_PR); /*get CR on ack_lockres again*/ dlm_lock_sync(ack_lockres, DLM_LOCK_CR); /*release CR on message_lockres*/ @@ -522,7 +522,7 @@ static void unlock_comm(struct md_cluster_info *cinfo) * The function: * 1. Grabs the message lockresource in EX mode * 2. Copies the message to the message LVB - * 3. Downconverts message lockresource to CR + * 3. Downconverts message lockresource to CW * 4. Upconverts ack lock resource from CR to EX. This forces the BAST on other nodes * and the other nodes read the message. The thread will wait here until all other * nodes have released ack lock resource. @@ -543,12 +543,12 @@ static int __sendmsg(struct md_cluster_info *cinfo, struct cluster_msg *cmsg) memcpy(cinfo->message_lockres->lksb.sb_lvbptr, (void *)cmsg, sizeof(struct cluster_msg)); - /*down-convert EX to CR on Message*/ - error = dlm_lock_sync(cinfo->message_lockres, DLM_LOCK_CR); + /*down-convert EX to CW on Message*/ + error = dlm_lock_sync(cinfo->message_lockres, DLM_LOCK_CW); if (error) { - pr_err("md-cluster: failed to convert EX to CR on MESSAGE(%d)\n", + pr_err("md-cluster: failed to convert EX to CW on MESSAGE(%d)\n", error); - goto failed_message; + goto failed_ack; } /*up-convert CR to EX on Ack*/ |