path: root/fs/ceph/messenger.c
Commit message | Author | Age | Files | Lines
* ceph: avoid reopening osd connections when address hasn't changed | Sage Weil | 2010-03-23 | 1 | -0/+8
  We get a fault callback on _every_ tcp connection fault. Normally, we want to reopen the connection when that happens. If the address we have is bad, however, and connection attempts always result in a connection refused or similar error, explicitly closing and reopening the msgr connection just prevents the messenger's backoff logic from kicking in. The result can be a console full of

  [ 3974.417106] ceph: osd11 10.3.14.138:6800 connection failed
  [ 3974.423295] ceph: osd11 10.3.14.138:6800 connection failed
  [ 3974.429709] ceph: osd11 10.3.14.138:6800 connection failed

  Instead, if we get a fault, and have outstanding requests, but the osd address hasn't changed and the connection never successfully connected in the first place, do nothing to the osd connection. The messenger layer will back off and retry periodically, because we never connected and thus the lossy bit is not set.

  Instead, touch each request's r_stamp so that handle_timeout can tell the request is still alive and kicking.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix connection fault con_work reentrancy problem | Sage Weil | 2010-03-23 | 1 | -2/+0
  The messenger fault was clearing the BUSY bit, for reasons unclear. This made it possible for the con->ops->fault function to reopen the connection, and requeue work in the workqueue--even though the current thread was already in con_work.

  This avoids a problem where the client busy loops with connection failures on an unreachable OSD, but doesn't address the root cause of that problem.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix authenticator timeout | Sage Weil | 2010-03-21 | 1 | -8/+1
  We were failing to reconnect to services due to an old authenticator, even though we had the new ticket. We weren't properly retrying the connect handshake, because we were calling an old/incorrect helper that left in_base_pos incorrect. The result was a failure to reconnect to the OSD or MDS (with an authentication error) if the MDS restarted after the service had been up a few hours (long enough for the original authenticator to be invalid). This was only a problem if the AUTH_X authentication was enabled.

  Now that the 'negotiate' and 'connect' stages are fully separated, we should use the prepare_read_connect() helper instead, and remove the obsolete one.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: reset front len on return to msgpool; BUG on mismatched front iov | Sage Weil | 2010-03-02 | 1 | -0/+2
  Reset msg front len when a message is returned to the pool: the caller may have changed it.

  BUG if we try to send a message with a hdr.front_len that doesn't match the front iov.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: reset bits on connection close | Sage Weil | 2010-03-02 | 1 | -0/+3
  Clear LOSSYTX bit, so that if/when we reconnect, said reconnect will retry on failure.

  Clear _PENDING bits too, to avoid polluting subsequent connection state.

  Drop unused REGISTERED bit.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix connection fault STANDBY check | Sage Weil | 2010-02-25 | 1 | -18/+13
  Move any out_sent messages to out_queue _before_ checking if out_queue is empty and going to STANDBY, or else we may drop something that was never acked.

  And clean up the code a bit (less goto).

  Signed-off-by: Sage Weil <sage@newdream.net>
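The ordering described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: the `struct msg` and `struct conn` types, the singly linked lists, and the `standby` flag are simplified stand-ins assumed for illustration. Only the rule itself (requeue unacked messages first, then test for an empty queue) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified message and connection types. */
struct msg {
    int id;
    struct msg *next;
};

struct conn {
    struct msg *out_sent;   /* sent but not yet acked */
    struct msg *out_queue;  /* waiting to be sent */
    bool standby;
};

/* Put unacked messages back at the head of out_queue, preserving order. */
static void requeue_sent(struct conn *c)
{
    struct msg *tail = c->out_sent;

    if (!tail)
        return;
    while (tail->next)
        tail = tail->next;
    tail->next = c->out_queue;
    c->out_queue = c->out_sent;
    c->out_sent = NULL;
}

/* On a fault, requeue first and only then decide whether to go idle. */
static void conn_fault(struct conn *c)
{
    requeue_sent(c);
    c->standby = (c->out_queue == NULL);
}
```

If the emptiness check ran before `requeue_sent()`, a connection whose only pending work was unacked messages would go to standby and those messages would never be resent.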
* ceph: invalidate_authorizer without con->mutex held | Sage Weil | 2010-02-25 | 1 | -8/+9
  This fixes lock ABBA inversion, as the ->invalidate_authorizer() op may need to take a lock (or even call back into the messenger).

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix up unexpected message handling | Sage Weil | 2010-02-23 | 1 | -2/+3
  Fix skipping of unexpected message types from osd, mon. Clean up pr_info and debug output.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: cancel delayed work when closing connection | Sage Weil | 2010-02-17 | 1 | -2/+5
  This ensures that if/when we reopen the connection, we can requeue work on the connection immediately, without waiting for an old timer to expire. Queue new delayed work inside con->mutex to avoid any race.

  This fixes problems with clients failing to reconnect to the MDS due to the client_reconnect message arriving too late (due to waiting for an old delayed work timeout to expire).

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: allow connection to be reopened by fault callback | Sage Weil | 2010-02-17 | 1 | -1/+1
  Fix the messenger to allow a ceph_con_open() during the fault callback. Previously the work wasn't getting queued on the connection because the fault path avoids requeued work (normally spurious). Loop on reopening by checking for the OPENING state bit.

  This fixes OSD reconnects when a TCP connection drops.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix msgr to keep sent messages until acked | Sage Weil | 2010-02-14 | 1 | -2/+2
  The test was backwards from commit b3d1dbbd: keep the message if the connection _isn't_ lossy. This allows the client to continue when the TCP connection drops for some reason (network glitch) but both ends survive.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: allow renewal of auth credentials | Sage Weil | 2010-02-11 | 1 | -0/+9
  Add infrastructure to allow the mon_client to periodically renew its auth credentials. Also add a messenger callback that will force such a renewal if a peer rejects our authenticator.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: include type in ceph_entity_addr, filepath | Sage Weil | 2010-01-29 | 1 | -0/+1
  Include a type/version in ceph_entity_addr and filepath. Include extra byte in filepath encoding as necessary.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: keep reserved replies on the request structure | Yehuda Sadeh | 2010-01-25 | 1 | -10/+10
  This handles all data preallocation and revocation in the same place, so there is no need for a special case for the reserved pages.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: alloc message data pages and check if tid exists | Yehuda Sadeh | 2010-01-25 | 1 | -31/+2
  This is now done in the same callback that is also responsible for allocating the 'front' part of the message. If we get a message that we don't have a corresponding tid for, mark it for skipping.

  Move the mutex unlock/lock from the osd alloc_msg callback to the calling function in the messenger.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: refactor messages data section allocation | Yehuda Sadeh | 2010-01-25 | 1 | -28/+39
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: allocate middle of message before stating to read | Yehuda Sadeh | 2010-01-25 | 1 | -60/+82
  Both the front and middle parts of the message are now allocated in ceph_alloc_msg().

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: remove unused erank field | Sage Weil | 2010-01-14 | 1 | -11/+8
  The ceph_entity_addr erank field is obsolete; remove it. Get rid of trivial addr comparison helpers while we're at it.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: support ceph_pagelist for message payload | Sage Weil | 2009-12-23 | 1 | -4/+20
  The ceph_pagelist is a simple list of whole pages, strung together via their lru list_head. It facilitates encoding to a "buffer" of unknown size. Allow its use in place of the ceph_msg page vector.

  This will be used to fix the huge buffer preallocation woes of MDS reconnection.

  Signed-off-by: Sage Weil <sage@newdream.net>
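To make the "buffer of unknown size" idea concrete, here is a minimal user-space sketch of appending bytes to a list of fixed-size pages. It is an illustration only: the `pagelist_*` names, the `PAGE_SZ` constant, and the use of `malloc` with a plain `next` pointer are assumptions, whereas the kernel structure links real pages through their lru `list_head`.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096  /* assumed page size for this sketch */

struct page_node {
    struct page_node *next;
    unsigned char data[PAGE_SZ];
};

struct pagelist {
    struct page_node *head, *tail;
    size_t length;  /* total bytes stored */
    size_t room;    /* bytes still free in the tail page */
};

static void pagelist_init(struct pagelist *pl)
{
    pl->head = pl->tail = NULL;
    pl->length = 0;
    pl->room = 0;
}

/* Append len bytes, adding whole pages as needed. Returns 0 or -1 on OOM. */
static int pagelist_append(struct pagelist *pl, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        if (pl->room == 0) {
            struct page_node *n = malloc(sizeof(*n));

            if (!n)
                return -1;
            n->next = NULL;
            if (pl->tail)
                pl->tail->next = n;
            else
                pl->head = n;
            pl->tail = n;
            pl->room = PAGE_SZ;
        }
        size_t chunk = len < pl->room ? len : pl->room;

        memcpy(pl->tail->data + (PAGE_SZ - pl->room), p, chunk);
        p += chunk;
        len -= chunk;
        pl->room -= chunk;
        pl->length += chunk;
    }
    return 0;
}

/* Free every page and reset the list to empty. */
static void pagelist_release(struct pagelist *pl)
{
    while (pl->head) {
        struct page_node *n = pl->head;

        pl->head = n->next;
        free(n);
    }
    pagelist_init(pl);
}
```

The caller never needs to know the final encoded size up front, which is the property the commit relies on to avoid large preallocations.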
* ceph: add feature bits to connection handshake (protocol change) | Sage Weil | 2009-12-23 | 1 | -10/+37
  Define supported and required feature set. Fail connection if the server requires features we do not support (TAG_FEATURES), or if the server does not support features we require.

  Signed-off-by: Sage Weil <sage@newdream.net>
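The two failure conditions in this commit reduce to a pair of bitmask tests. The sketch below is a simplified illustration, not the handshake code: the `check_features` function, the `feat_result` enum, and the idea of seeing both sides' masks in one place are assumptions made for clarity. In the real protocol each side checks its own half and the server signals a mismatch with TAG_FEATURES.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum feat_result {
    FEAT_OK,
    FEAT_SERVER_REQUIRES_MORE,   /* server requires bits we do not support */
    FEAT_SERVER_LACKS_REQUIRED   /* server does not support bits we require */
};

static enum feat_result check_features(uint64_t our_supported,
                                       uint64_t our_required,
                                       uint64_t srv_supported,
                                       uint64_t srv_required)
{
    /* Any bit the server requires that is missing from our supported set. */
    if (srv_required & ~our_supported)
        return FEAT_SERVER_REQUIRES_MORE;
    /* Any bit we require that is missing from the server's supported set. */
    if (our_required & ~srv_supported)
        return FEAT_SERVER_LACKS_REQUIRED;
    return FEAT_OK;
}
```

Either non-OK result corresponds to failing the connection, as the commit message describes.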
* ceph: control access to page vector for incoming data | Sage Weil | 2009-12-23 | 1 | -0/+29
  When we issue an OSD read, we specify a vector of pages that the data is to be read into. The request may be sent multiple times, to multiple OSDs, if the osdmap changes, which means we can get more than one reply.

  Only read data into the page vector if the reply is coming from the OSD we last sent the request to. Keep track of which connection is using the vector by taking a reference. If another connection was already using the vector before and a new reply comes in on the right connection, revoke the pages from the other connection.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use connection mutex to protect read and write stages | Sage Weil | 2009-12-23 | 1 | -18/+31
  Use a single mutex (previously out_mutex) to protect both read and write activity from concurrent ceph_con_* calls. Drop the mutex when doing callbacks to avoid nested locking (the callback may need to call something like ceph_con_close).

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: remove unaccessible code | Yehuda Sadeh | 2009-12-22 | 1 | -4/+0
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: plug leak of incoming message during connection fault/close | Sage Weil | 2009-12-22 | 1 | -2/+8
  If we explicitly close a connection, or there is a socket error, we need to drop any partially received message.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: hex dump corrupt server data to KERN_DEBUG | Sage Weil | 2009-12-22 | 1 | -0/+20
  Also, print fsid using standard format, NOT hex dump.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: don't save sent messages on lossy connections | Sage Weil | 2009-12-22 | 1 | -3/+7
  For lossy connections we drop all state on socket errors, so there is no reason to keep sent ceph_msg's around.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: detect lossy state of connection | Sage Weil | 2009-12-22 | 1 | -2/+4
  The server indicates whether a connection is lossy; set our LOSSYTX bit appropriately. Do not set lossy bit on outgoing connections.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: plug msg leak in con_fault | Sage Weil | 2009-12-22 | 1 | -2/+7
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: carry explicit msg reference for currently sending message | Sage Weil | 2009-12-22 | 1 | -4/+18
  Carry a ceph_msg reference for connection->out_msg. This will allow us to make out_sent optional.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use kref for ceph_msg | Sage Weil | 2009-12-08 | 1 | -29/+18
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: simplify ceph_buffer interface | Sage Weil | 2009-12-07 | 1 | -1/+1
  We never allocate the ceph_buffer and buffer separately, so use a single constructor.

  Disallow put on NULL buffer; make the caller check.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: reset msgr backoff during open, not after successful handshake | Sage Weil | 2009-11-21 | 1 | -2/+1
  Reset the backoff delay when we reopen the connection, so that the delays for any initial connection problems are reasonable. We were resetting only after a successful handshake, which was of limited utility.

  Signed-off-by: Sage Weil <sage@newdream.net>
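A small sketch shows why the reset point matters. The commit only states that the delay is reset on open; the doubling policy, the `BASE_DELAY` and `MAX_DELAY` constants, and the `conn_open`/`conn_fault` names below are assumptions chosen to illustrate a typical exponential backoff, not a copy of the messenger code.

```c
/* Hypothetical backoff parameters, in arbitrary time units. */
#define BASE_DELAY 1UL
#define MAX_DELAY  64UL

struct conn {
    unsigned long delay;  /* 0 means "no backoff in progress" */
};

/* Opening (or reopening) the connection clears any accumulated backoff. */
static void conn_open(struct conn *c)
{
    c->delay = 0;
}

/* Each fault starts at the base delay, then doubles up to the cap. */
static unsigned long conn_fault(struct conn *c)
{
    if (c->delay == 0)
        c->delay = BASE_DELAY;
    else if (c->delay < MAX_DELAY)
        c->delay *= 2;
    return c->delay;
}
```

With the reset in `conn_open()`, a freshly reopened connection begins again at the base delay even if no handshake has ever succeeded; if the reset happened only after a successful handshake, a connection that kept failing would carry its old, long delay across reopens.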
* ceph: negotiate authentication protocol; implement AUTH_NONE protocol | Sage Weil | 2009-11-19 | 1 | -2/+52
  When we open a monitor session, we send an initial AUTH message listing the auth protocols we support, our entity name, and (possibly) a previously assigned global_id. The monitor chooses a protocol and responds with an initial message.

  Initially implement AUTH_NONE, a dummy protocol that provides no security, but works within the new framework. It generates 'authorizers' that are used when connecting to (mds, osd) services that simply state our entity name and global_id.

  This is a wire protocol change.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: remove unnecessary ceph_con_shutdown | Sage Weil | 2009-11-18 | 1 | -12/+1
  We require that ceph_con_close be called before we drop the connection, so this is unneeded. Just BUG if con->sock != NULL.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: separate banner and connect during handshake into distinct stages | Sage Weil | 2009-11-10 | 1 | -42/+75
  We need to make sure we only swab the address during the banner once. So break process_banner out of process_connect, and clean up the surrounding code so that these are distinct phases of the handshake.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: convert port endianness | Sage Weil | 2009-11-05 | 1 | -2/+2
  The port is informational only, but we should make it correct.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use fixed endian encoding for ceph_entity_addr | Sage Weil | 2009-11-04 | 1 | -5/+18
  We exchange struct ceph_entity_addr over the wire and store it on disk. The sockaddr_storage.ss_family field, however, is host endianness. So, fix ss_family endianness to big endian when sending/receiving over the wire.

  Signed-off-by: Sage Weil <sage@newdream.net>
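The conversion can be sketched in user space with the standard `htons`/`ntohs` calls. The helper names `addr_family_to_wire` and `addr_family_from_wire` are hypothetical, and the sketch assumes a 16-bit `sa_family_t` as on Linux; the kernel patch uses its own byte-order helpers.

```c
#include <arpa/inet.h>
#include <sys/socket.h>

/* Assumes sa_family_t is 16 bits wide, as on Linux. */

/* Convert ss_family to big endian before the address goes on the wire. */
static void addr_family_to_wire(struct sockaddr_storage *ss)
{
    ss->ss_family = htons(ss->ss_family);
}

/* Convert ss_family back to host order after receiving an address. */
static void addr_family_from_wire(struct sockaddr_storage *ss)
{
    ss->ss_family = ntohs(ss->ss_family);
}
```

On a big-endian host both helpers are no-ops, while on a little-endian host they swap the two bytes, so the on-wire and on-disk representation is the same everywhere. Each address must be converted exactly once in each direction, which is the concern the later "separate banner and connect" commit addresses.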
* ceph: update to mon client protocol v15 | Sage Weil | 2009-10-10 | 1 | -1/+1
  The mon request headers now include session_mon information that must be properly initialized.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: messenger library | Sage Weil | 2009-10-06 | 1 | -0/+2019
  A generic message passing library is used to communicate with all other components in the Ceph file system. The messenger library provides ordered, reliable delivery of messages between two nodes in the system.

  This implementation is based on TCP.

  Signed-off-by: Sage Weil <sage@newdream.net>