ocfs2: Reconnect after idle time out.

Currently, o2net connects to a node on hb_up and disconnects on hb_down and on net timeout.

Disconnecting on net timeout is fine, but o2net should then attempt to reconnect. This is because nodes sometimes get overloaded enough that the network connection breaks but the disk hb does not. In that situation, we either fence (unnecessarily) or wait for the node's disk hb to die (and sometimes hang in the process).

So in this updated scheme, when the network disconnects, we keep attempting to reconnect until we succeed or we get a disk hb down event. If the other node is really dead, we will eventually get a node down event. If not, we should be able to connect again and continue.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
author: Tao Ma <tao.ma@oracle.com> 2008-03-05 08:50:12 +0100
committer: Mark Fasheh <mfasheh@suse.com> 2008-04-18 17:56:10 +0200
commit: 5cc3bf2786f63cceb191c3c02ddd83c6f38a7d64 (patch)
tree: a9d7f6fa7d251cff67d6b177835ff1f43d23ab2d /fs/ocfs2/cluster/tcp_internal.h
parent: ocfs2/dlm: Cleanup lockres print (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/fs/ocfs2/cluster/tcp_internal.h b/fs/ocfs2/cluster/tcp_internal.h
index d25b9af28500..b4c5586f46ea 100644
--- a/fs/ocfs2/cluster/tcp_internal.h
+++ b/fs/ocfs2/cluster/tcp_internal.h
@@ -95,6 +95,8 @@ struct o2net_node {
 	unsigned			nn_sc_valid:1;
 	/* if this is set tx just returns it */
 	int				nn_persistent_error;
+	/* It is only set to 1 after the idle time out. */
+	atomic_t			nn_timeout;
 
 	/* threads waiting for an sc to arrive wait on the wq for generation
 	 * to increase.  it is increased when a connecting socket succeeds