summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* 9p: Improve debug supportEric Van Hensbergen2008-10-173-29/+72
| | | | | | | | | | | | | The new debug support lacks some of the information that the previous fcprint code provided -- this patch focuses on better presentation of debug data along with more helpful debug along error paths. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: eliminate depricated conv functionsEric Van Hensbergen2008-10-176-1160/+37
| | | | | | | | | | | | | Remove depricated conv functions which have been replaced with new protocol routines. This patch also reworks the one instance of the file-system code which directly calls conversion routines (to accomplish unpacking dirreads). Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: rework client code to use new protocol support functionsEric Van Hensbergen2008-10-1712-544/+611
| | | | | | | | | | Now that the new protocol functions are in place, this patch switches the client code to using the new support code. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: remove unnecessary tag field from p9_req_t structureEric Van Hensbergen2008-10-172-3/+1
| | | | | | | | | | | This removes the vestigial tag field from the p9_req_t structure. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: remove 9p fcall debug printsEric Van Hensbergen2008-10-175-386/+0
| | | | | | | | | | | | | | | One of the current debug options allows users to get a verbose dump of fcalls. This isn't really necessary as correctly parsed protocol frames can be printed as part of the code in the client functions. The consolidated printfcalls structure would require new entries to be added for every extension. This patch removes the debug print methods and their use. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: add new protocol support codeEric Van Hensbergen2008-10-174-1/+496
| | | | | | | | | | | | | | | | | This adds a new protocol processing support code based on Anthony Liguori's 9p library code. This code performs protocol marshalling/unmarshalling using printf like strings to represent protocol elements. It is my intent to use them to replace the current functions in conv.c as well as the p9_create_* functions. This should make the client implementation much more clear, and also make it much easier to add new protocol extensions by limiting the number of places in which changes need to be made. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: encapsulate version functionEric Van Hensbergen2008-10-171-30/+44
| | | | | | | | | | | | | | Alsmot all 9P client wire functions have their own (set of) functions. Tversion is an exception as its encapsulated into the client_create code. This patch moves the protocol specifics of this to a function to match the rest of the code. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: move dirread to fs layerEric Van Hensbergen2008-10-173-124/+38
| | | | | | | | | | | | | | | | | Currently reading a directory is implemented in the client code. This function is not actually a wire operation, but a meta operation which calls read operations and processes the results. This patch moves this functionality to the fs layer and calls component wire operations instead of constructing their packets. This provides a cleaner separation and will help when we reorganize the client functions and protocol processing methods. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: adjust 9p vfs write operationEric Van Hensbergen2008-10-171-7/+29
| | | | | | | | | | | | | Currently, the 9p net wire operation ensures that all data is sent by sending multiple packets if the data requested is larger than the msize. This is better handled in the vfs code so that we can simplify wire operations to being concerned with only putting data onto and taking data off of the wire. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: move readn meta-function from client to fs layerEric Van Hensbergen2008-10-176-36/+57
| | | | | | | | | | | | | | | There are a couple of methods in the client code which aren't actually wire operations. To keep things organized cleaner, these operations are being moved to the fs layer. This patch moves the readn meta-function (which executes multiple wire reads until a buffer is full) to the fs layer. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: consolidate read/write functionsEric Van Hensbergen2008-10-173-130/+34
| | | | | | | | | | | | | | | Currently there are two separate versions of read and write. One for dealing with user buffers and the other for dealing with kernel buffers. There is a tremendous amount of code duplication in the otherwise identical versions of these functions. This patch adds an additional user buffer parameter to read and write and conditionalizes handling of the buffer on whether the kernel buffer or the user buffer is populated. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: drop broken unused error path from p9_conn_create()Tejun Heo2008-10-171-16/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Post p9_fd_poll() error path which checks m->poll_waddr[i] for PTR_ERR value has the following problems. * It's completely unused. Error value is set iff NULL @wait_address has been specified to p9_pollwait() which is guaranteed not to happen. * It dereferences @m after deallocating it (introduced by 571ffeaf and spotted by Raja R Harinath. * It returned the wrong value on error. It should return poll_waddr[i] but it returnes poll_waddr (introduced by 571ffeaf). * p9_mux_poll_stop() doesn't handle PTR_ERR value. It will try to operate on the PTR_ERR value as if it's a normal pointer and cause oops. As the error path is bogus in the first place, there's no reason to hold onto it. Kill it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Cc: Raja R Harinath <harinath@hurrynot.org>
* 9p: make rpc code common and rework flush codeEric Van Hensbergen2008-10-175-313/+316
| | | | | | | | | | | | | This code moves the rpc function to the common client base, reorganizes the flush code to be more simple and stable, and makes the necessary adjustments to the underlying transports to adapt to the new structure. This reduces the overall amount of code duplication between the transports and should make adding new transports more straightforward. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: use the rcall structure passed in the request in trans_fd read_workEric Van Hensbergen2008-10-171-63/+66
| | | | | | | | | This patch reworks the read_work function to enable it to directly use a passed in rcall structure. This should help allow us to remove unnecessary copies in the future. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: apply common request code to trans_fdEric Van Hensbergen2008-10-174-172/+126
| | | | | | | Apply the now common p9_req_t structure to the fd transport. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: apply common tagpool handling to trans_fdEric Van Hensbergen2008-10-171-34/+10
| | | | | | | Simplify trans_fd by using new common client tagpool structure. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: move request management to client codeEric Van Hensbergen2008-10-173-127/+238
| | | | | | | | | | The virtio transport uses a simplified request management system that I want to use for all transports. This patch adapts and moves the exisiting code for managing requests to the client common code. Later patches will apply these mechanisms to the other transports. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: eliminate callback complexityEric Van Hensbergen2008-10-171-83/+66
| | | | | | | | | | The current trans_fd rpc mechanisms use a dynamic callback mechanism which introduces a lot of complexity which only accomodates a single special case. This patch removes much of that complexity in favor of a simple exception mechanism to deal with flushes. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: consolidate mux_rpc and request structureEric Van Hensbergen2008-10-171-46/+22
| | | | | | | | | | Currently, trans_fd has two structures (p9_req and p9_mux-rpc) which contain mostly duplicate data. This patch consolidates these two structures and removes p9_mux_rpc. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: remove unnecessary prototypesEric Van Hensbergen2008-10-172-548/+495
| | | | | | | | | | | Cleanup files by reordering functions in order to remove need for unnecessary function prototypes. There are no code changes here, just functions being moved around and prototypes being eliminated. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: remove duplicate client stateEric Van Hensbergen2008-10-171-15/+11
| | | | | | | | Now that we are passing client state into the transport modules, remove duplicate state which is present in transport private structures. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p: consolidate transport structureEric Van Hensbergen2008-10-177-207/+146
| | | | | | | | | | | | | Right now there is a transport module structure which provides per-transport type functions and data and a transport structure which contains per-instance public data as well as function pointers to instance specific functions. This patch moves public transport visible instance data to the client structure (which in some cases had duplicate data) and consolidates the functions into the transport module structure. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 9p-trans_fd: use single pollerTejun Heo2008-10-171-166/+86
| | | | | | | | | | | | | | | | | | | | | | | | trans_fd used pool of upto 100 pollers to monitor the r/w fds. The approach makes sense in userspace back when the only available interfaces were poll(2) and select(2). As each event monitor - trigger - handling iteration took O(n) where `n' is the number of watched fds, it makes sense to spread them to many pollers such that the `n' can be divided by the number of pollers. However, this doesn't make any sense in kernel because persistent edge triggered event monitoring is how the whole thing is implemented in the kernel in the first place. This patch converts trans_fd to use single poller which watches all the fds instead of the poll of pollers approach. All the fds are registered for monitoring on creation and only the fds with pending events are scanned when something happens much like how epoll is implemented. This change makes trans_fd fd monitoring more efficient and simpler. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* {pci,pnp} quirks.c: don't use deprecated print_fn_descriptor_symbol()Linus Torvalds2008-10-172-4/+2
| | | | | | | | | | | I dunno how this missed Bjorn and his quest to use %pF in commit c80cfb0406c01bb5da91bfe30f5cb1fd96831138 ("vsprintf: use new vsprintf symbolic function pointer format"), but it did. So use %pF in the two remaining places that still tried to print out function pointers by hand. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2008-10-1726-500/+955
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (53 commits) NFS: Fix a resolution problem with nfs_inode->cache_change_attribute NFS: Fix the resolution problem with nfs_inode_attrs_need_update() NFS: Changes to inode->i_nlinks must set the NFS_INO_INVALID_ATTR flag RPC/RDMA: ensure connection attempt is complete before signalling. RPC/RDMA: correct the reconnect timer backoff RPC/RDMA: optionally emit useful transport info upon connect/disconnect. RPC/RDMA: reformat a debug printk to keep lines together. RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls. RPC/RDMA: fix connect/reconnect resource leak. RPC/RDMA: return a consistent error, when connect fails. RPC/RDMA: adhere to protocol for unpadded client trailing write chunks. RPC/RDMA: avoid an oops due to disconnect racing with async upcalls. RPC/RDMA: maintain the RPC task bytes-sent statistic. RPC/RDMA: suppress retransmit on RPC/RDMA clients. RPC/RDMA: fix connection IRD/ORD setting RPC/RDMA: support FRMR client memory registration. RPC/RDMA: check selected memory registration mode at runtime. RPC/RDMA: add data types and new FRMR memory registration enum. RPC/RDMA: refactor the inline memory registration code. NFS: fix nfs_parse_ip_address() corner case ...
| * Merge branch 'next'Trond Myklebust2008-10-1526-500/+955
| |\
| | * NFS: Fix a resolution problem with nfs_inode->cache_change_attributeTrond Myklebust2008-10-152-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cache_change_attribute is used to decide whether or not a directory has changed, in which case we may need to look it up again. Again, the use of 'jiffies' leads to an issue of resolution. Once again, the fix is to change nfs_inode->cache_change_attribute, and just make it a simple counter. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFS: Fix the resolution problem with nfs_inode_attrs_need_update()Trond Myklebust2008-10-154-18/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It appears that 'jiffies' timestamps do not have high enough resolution for nfs_inode_attrs_need_update(). One problem is that a GETATTR can be launched within < 1 jiffy of the last operation that updated the attribute. Another problem is that RPC calls can take < 1 jiffy to execute. We can fix this by switching the variables to use a simple global counter that gets incremented every time we start another GETATTR call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFS: Changes to inode->i_nlinks must set the NFS_INO_INVALID_ATTR flagTrond Myklebust2008-10-151-0/+3
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: ensure connection attempt is complete before signalling.Tom Talpey2008-10-101-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | The RPC/RDMA connection logic could return early from reconnection attempts, leading to additional spurious retries. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: correct the reconnect timer backoffTom Talpey2008-10-101-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RPC/RDMA code had a constant 5-second reconnect backoff, and always performed it, even when re-establishing a connection to a server after the RPC layer closed it due to being idle. Make it an geometric backoff (up to 30 seconds), and don't delay idle reconnect. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: optionally emit useful transport info upon connect/disconnect.Tom Talpey2008-10-102-1/+22
| | | | | | | | | | | | | | | Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: reformat a debug printk to keep lines together.Tom Talpey2008-10-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | The send marshaling code split a particular dprintk across two lines, which makes it hard to extract from logfiles. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls.Tom Talpey2008-10-103-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add defensive timeouts to wait_for_completion() calls in RDMA address resolution, and make them interruptible. Fix the timeout units to milliseconds (formerly jiffies) and move to private header. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: fix connect/reconnect resource leak.Tom Talpey2008-10-101-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | The RPC/RDMA code can leak RDMA connection manager endpoints in certain error cases on connect. Don't signal unwanted events, and be certain to destroy any allocated qp. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: return a consistent error, when connect fails.Tom Talpey2008-10-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xprt_connect call path does not expect such errors as ECONNREFUSED to be returned from failed transport connection attempts, otherwise it translates them to EIO and signals fatal errors. For example, mount.nfs prints simply "internal error". Translate all such errors to ENOTCONN from RPC/RDMA to match sockets behavior. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.Tom Talpey2008-10-103-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RPC/RDMA protocol allows clients and servers to avoid RDMA operations for data which is purely the result of XDR padding. On the client, automatically insert the necessary padding for such server replies, and optionally don't marshal such chunks. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: avoid an oops due to disconnect racing with async upcalls.Tom Talpey2008-10-101-11/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | RDMA disconnects yield an upcall from the RDMA connection manager, which can race with rpc transport close, e.g. on ^C of a mount. Ensure any rdma cm_id and qp are fully destroyed before continuing. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: maintain the RPC task bytes-sent statistic.Tom Talpey2008-10-101-0/+1
| | | | | | | | | | | | | | | Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: suppress retransmit on RPC/RDMA clients.Tom Talpey2008-10-103-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | An RPC/RDMA client cannot retransmit on an unbroken connection, doing so violates its flow control with the server. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: fix connection IRD/ORD settingTom Tucker2008-10-101-37/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This logic sets the connection parameter that configures the local device and informs the remote peer how many concurrent incoming RDMA_READ requests are supported. The original logic didn't really do what was intended for two reasons: - The max number supported by the device is typically smaller than any one factor in the calculation used, and - The field in the connection parameter structure where the value is stored is a u8 and always overflows for the default settings. So what really happens is the value requested for responder resources is the left over 8 bits from the "desired value". If the desired value happened to be a multiple of 256, the result was zero and it wouldn't connect at all. Given the above and the fact that max_requests is almost always larger than the max responder resources supported by the adapter, this patch simplifies this logic and simply requests the max supported by the device, subject to a reasonable limit. This bug was found by Jim Schutt at Sandia. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: support FRMR client memory registration.Tom Talpey2008-10-102-6/+167
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Configure, detect and use "fastreg" support from IB/iWARP verbs layer to perform RPC/RDMA memory registration. Make FRMR the default memreg mode (will fall back if not supported by the selected RDMA adapter). This allows full and optimal operation over the cxgb3 adapter, and others. Signed-off-by: Tom Talpey <talpey@netapp.com> Acked-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: check selected memory registration mode at runtime.Tom Talpey2008-10-101-15/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At transport creation, check for, and use, any local dma lkey. Then, check that the selected memory registration mode is in fact supported by the RDMA adapter selected for the mount. Fall back to best alternative if not. Signed-off-by: Tom Talpey <talpey@netapp.com> Acked-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: add data types and new FRMR memory registration enum.Tom Talpey2008-10-102-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | Internal RPC/RDMA structure updates in preparation for FRMR support. Signed-off-by: Tom Talpey <talpey@netapp.com> Acked-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * RPC/RDMA: refactor the inline memory registration code.Tom Talpey2008-10-101-158/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor the memory registration and deregistration routines. This saves stack space, makes the code more readable and prepares to add the new FRMR registration methods. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFS: fix nfs_parse_ip_address() corner caseChuck Lever2008-10-101-11/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bruce observed that nfs_parse_ip_address() will successfully parse an IPv6 address that looks like this: "::1%" A scope delimiter is present, but there is no scope ID following it. This is harmless, as it would simply set the scope ID to zero. However, in some cases we would like to flag this as an improperly formed address. We are now also careful to reject addresses where garbage follows the address (up to the length of the string), instead of ignoring the non-address characters; and where the scope ID is nonsense (not a valid device name, but also not numeric). Before, both of these cases would result in a harmless zero scope ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFS: Cleanup nfs_set_portJ. Bruce Fields2008-10-101-10/+9
| | | | | | | | | | | | | | | Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFS: Fix attribute updatesTrond Myklebust2008-10-091-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression seen when running the Connectathon testsuite against an ext3 filesystem. The reason was that the inode was constantly being marked as 'just updated' by the jiffy wraparound test. This again meant that newer GETATTR calls were failing to pass the nfs_inode_attrs_need_update() test unless the changes caused a ctime update on the server, since they were perceived as having been started before the latest inode update. Given that nfs_inode_attrs_need_update() already checks for wraparound of nfsi->last_updated, we can drop the buggy "protection" in nfs_update_inode(). Also make a slight micro-optimisation of nfs_inode_attrs_need_update(): we are more often going to see time_after(fattr->time_start, nfsi->last_updated) be true, rather than seeing an update of ctime/size, so put that test first to ensure that we optimise away the ctime/size tests. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFS: Save padding bytes in struct nfs4_setclientidTrond Myklebust2008-10-081-1/+1
| | | | | | | | | | | | | | | | | | | | | Peter Staubach suggested reducing NFS4_SETCLIENTID_NAMELEN by one byte so as to avoid 7 bytes of unnecessary padding. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * sunrpc: fix oops in rpc_create when the mount namespace is unsharedCedric Le Goater2008-10-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a system with nfs mounts, if a task unshares its mount namespace, a oops can occur when the system is rebooted if the task is the last to unreference the nfs mount. It will try to create a rpc request using utsname() which has been invalidated by free_nsproxy(). The patch fixes the issue by using the global init_utsname() which is always valid. the capability of identifying rpc clients per uts namespace stills needs some extra work so this should not be a problem. BUG: unable to handle kernel NULL pointer dereference at 00000004 IP: [<c024c9ab>] rpc_create+0x332/0x42f Oops: 0000 [#1] DEBUG_PAGEALLOC Pid: 1857, comm: uts-oops Not tainted (2.6.27-rc5-00319-g7686ad5 #4) EIP: 0060:[<c024c9ab>] EFLAGS: 00210287 CPU: 0 EIP is at rpc_create+0x332/0x42f EAX: 00000000 EBX: df26adf0 ECX: c0251887 EDX: 00000001 ESI: df26ae58 EDI: c02f293c EBP: dda0fc9c ESP: dda0fc2c DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 Process uts-oops (pid: 1857, ti=dda0e000 task=dd9a0778 task.ti=dda0e000) Stack: c0104532 dda0fffc dda0fcac dda0e000 dda0e000 dd93b7f0 00000009 c02f2880 df26aefc dda0fc68 c01096b7 00000000 c0266ee0 c039a070 c039a070 dda0fc74 c012ca67 c039a064 dda0fc8c c012cb20 c03daf74 00000011 00000000 c0275c90 Call Trace: [<c0104532>] ? dump_trace+0xc2/0xe2 [<c01096b7>] ? save_stack_trace+0x1c/0x3a [<c012ca67>] ? save_trace+0x37/0x8c [<c012cb20>] ? add_lock_to_list+0x64/0x96 [<c0256fc4>] ? rpcb_register_call+0x62/0xbb [<c02570c8>] ? rpcb_register+0xab/0xb3 [<c0252f4d>] ? svc_register+0xb4/0x128 [<c0253114>] ? svc_destroy+0xec/0x103 [<c02531b2>] ? svc_exit_thread+0x87/0x8d [<c01a75cd>] ? lockd_down+0x61/0x81 [<c01a577b>] ? nlmclnt_done+0xd/0xf [<c01941fe>] ? nfs_destroy_server+0x14/0x16 [<c0194328>] ? nfs_free_server+0x4c/0xaa [<c019a066>] ? nfs_kill_super+0x23/0x27 [<c0158585>] ? deactivate_super+0x3f/0x51 [<c01695d1>] ? mntput_no_expire+0x95/0xb4 [<c016965b>] ? release_mounts+0x6b/0x7a [<c01696cc>] ? __put_mnt_ns+0x62/0x70 [<c0127501>] ? free_nsproxy+0x25/0x80 [<c012759a>] ? switch_task_namespaces+0x3e/0x43 [<c01275a9>] ? exit_task_namespaces+0xa/0xc [<c0117fed>] ? do_exit+0x4fd/0x666 [<c01181b3>] ? do_group_exit+0x5d/0x83 [<c011fa8c>] ? get_signal_to_deliver+0x2c8/0x2e0 [<c0102630>] ? do_notify_resume+0x69/0x700 [<c011d85a>] ? do_sigaction+0x134/0x145 [<c0127205>] ? hrtimer_nanosleep+0x8f/0xce [<c0126d1a>] ? hrtimer_wakeup+0x0/0x1c [<c0103488>] ? work_notifysig+0x13/0x1b ======================= Code: 70 20 68 cb c1 2c c0 e8 75 4e 01 00 8b 83 ac 00 00 00 59 3d 00 f0 ff ff 5f 77 63 eb 57 a1 00 80 2d c0 8b 80 a8 02 00 00 8d 73 68 <8b> 40 04 83 c0 45 e8 41 46 f7 ff ba 20 00 00 00 83 f8 21 0f 4c EIP: [<c024c9ab>] rpc_create+0x332/0x42f SS:ESP 0068:dda0fc2c Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>