| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull more documentation updates from Jonathan Corbet:
"A few late-arriving fixes, plus two more significant changes that were
*almost* ready at the beginning of the merge window:
- A new document on debugging techniques from Sebastian Fricke
- A clarification on MODULE_LICENSE terms meant to head off the sort
of confusion that led to the recent Tuxedo Computers mess"
* tag 'docs-6.13-2' of git://git.lwn.net/linux:
docs: Add debugging guide for the media subsystem
docs: Add debugging section to process
docs/licensing: Clarify wording about "GPL" and "Proprietary"
docs: core-api/gfp_mask-from-fs-io: indicate that vmalloc supports GFP_NOFS/GFP_NOIO
Documentation: kernel-doc: enumerate identifier *type*s
Documentation: pwrseq: Fix trivial misspellings
Documentation: filesystems: update filename extensions
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Provide a guide for developers on how to debug code with a focus on the
media subsystem. This document aims to provide a rough overview over the
possibilities and a rational to help choosing the right tool for the
given circumstances.
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241028-media_docs_improve_v3-v3-2-edf5c5b3746f@collabora.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This idea was formed after noticing that new developers experience
certain difficulty to navigate within the multitude of different
debugging options in the Kernel and while there often is good
documentation for the tools, the developer has to know first that they
exist and where to find them.
Add a general debugging section to the Kernel documentation, as an
easily locatable entry point to other documentation and as a general
guideline for the topic.
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241028-media_docs_improve_v3-v3-1-edf5c5b3746f@collabora.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are currently some doubts about out-of-tree kernel modules licensed
under GPLv3 and if they are supposed to be able to use symbols exported
using EXPORT_SYMBOL_GPL.
Clarify that "Proprietary" means anything non-GPL2 even though the
license might be an open source license. Also disambiguate "GPL
compatible" to "GPLv2 compatible".
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241115103842.585207-2-ukleinek@kernel.org
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
GFP_NOFS/GFP_NOIO
After the commit 451769ebb7e79 ("mm/vmalloc: alloc GFP_NO{FS,IO} for
vmalloc") in v5.17 it is now safe to use GFP_NOFS/GFP_NOIO flags
in [k]vmalloc, let's reflect it in documentation.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241119093922.567138-1-ptikhomirov@virtuozzo.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Explain that a kernel-doc :identifiers: line can refer to a struct,
union, enum, or typedef as well as functions.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241119203201.110953-1-rdunlap@infradead.org
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use proper spelling for 'discrete'. When at it, capitalize 'Linux',
which is common practice in the documentation.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241120-pwrseq-doc-trivial-fixes-v1-1-19a70f4dd156@gmail.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Update references to most txt files to rst files.
Update one reference to an md file to a rst file.
Update one file path to its current location.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ian Kent <raven@themaw.net>
Cc: autofs@vger.kernel.org
Cc: Alexander Aring <aahringo@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: gfs2@lists.linux.dev
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Theodore Y. Ts'o <tytso@mit.edu>
Cc: fsverity@lists.linux.dev
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: ocfs2-devel@lists.linux.dev
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241120055246.158368-1-rdunlap@infradead.org
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull ecryptfs mount api conversion from Christian Brauner:
"Convert ecryptfs to the new mount api"
* tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ecryptfs: Fix spelling mistake "validationg" -> "validating"
ecryptfs: Convert ecryptfs to use the new mount API
ecryptfs: Factor out mount option validation
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There is a spelling mistake in an error message literal string. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20241108112509.109891-1-colin.i.king@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Eric Sandeen <sandeen@redhat.com> says:
This is lightly tested with the kernel tests present in ecryptfs-utils,
but it could certainly use a bit more testing and review, particularly
with invalid mount option sets.
This one is a little unique compared to other filesystems in that I
allocate both an fs context and the *sbi in .init_fs_context; the *sbi
is long-lived, and the context is only present during the initial mount.
Allocating sbi with the filesystem context means we can set options
into it directly, rather than needing to do it after parsing. And it's
particularly simple to do it this way given that there is no remount.
* patches from https://lore.kernel.org/r/20241028143359.605061-1-sandeen@redhat.com:
ecryptfs: Convert ecryptfs to use the new mount API
ecryptfs: Factor out mount option validation
Link: https://lore.kernel.org/r/20241028143359.605061-1-sandeen@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Convert ecryptfs to the new mount API.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/20241028143359.605061-3-sandeen@redhat.com
Acked-by: Tyler Hicks <code@tyhicks.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Under the new mount API, mount options are parsed one at a time.
Any validation that examines multiple options must be done after parsing
is complete, so factor out a ecryptfs_validate_options() which can be
called separately.
To facilitate this, temporarily move the local variables that tracked
whether various options have been set in the parsing function, into the
ecryptfs_mount_crypt_stat structure so that they can be examined later.
These will be moved to a more ephemeral struct in the mount api conversion
patch to follow.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/20241028143359.605061-2-sandeen@redhat.com
Acked-by: Tyler Hicks <code@tyhicks.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs exportfs updates from Christian Brauner:
"This contains work to bring NFS connectable file handles to userspace
servers.
The name_to_handle_at() system call is extended to encode connectable
file handles. Such file handles can be resolved to an open file with a
connected path. So far userspace NFS servers couldn't make use of this
functionality even though the kernel does already support it. This is
achieved by introducing a new flag for name_to_handle_at().
Similarly, the open_by_handle_at() system call is tought to understand
connectable file handles explicitly created via name_to_handle_at()"
* tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: open_by_handle_at() support for decoding "explicit connectable" file handles
fs: name_to_handle_at() support for "explicit connectable" file handles
fs: prepare for "explicit connectable" file handles
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Amir Goldstein <amir73il@gmail.com> says:
These patches bring the NFS connectable file handles feature to
userspace servers.
They rely on Christian's and Aleksa's changes recently merged to v6.12.
The API I chose for encoding conenctable file handles is pretty
conventional (AT_HANDLE_CONNECTABLE).
open_by_handle_at(2) does not have AT_ flags argument, but also, I find
it more useful API that encoding a connectable file handle can mandate
the resolving of a connected fd, without having to opt-in for a
connected fd independently.
I chose to implemnent this by using upper bits in the handle type field
It may be that out-of-tree filesystems return a handle type with upper
bits set, but AFAIK, no in-tree filesystem does that.
I added some warnings just in case we encouter that.
I have written an fstest [1] and a man page draft [2] for the feature.
[1] https://github.com/amir73il/xfstests/commits/connectable-fh/
[2] https://github.com/amir73il/man-pages/commits/connectable-fh/
* patches from https://lore.kernel.org/r/20241011090023.655623-1-amir73il@gmail.com:
fs: open_by_handle_at() support for decoding "explicit connectable" file handles
fs: name_to_handle_at() support for "explicit connectable" file handles
fs: prepare for "explicit connectable" file handles
Link: https://lore.kernel.org/r/20241011090023.655623-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Teach open_by_handle_at(2) about the type format of "explicit connectable"
file handles that were created using the AT_HANDLE_CONNECTABLE flag to
name_to_handle_at(2).
When decoding an "explicit connectable" file handles, name_to_handle_at(2)
should fail if it cannot open a "connected" fd with known path, which is
accessible (to capable user) from mount fd path.
Note that this does not check if the path is accessible to the calling
user, just that it is accessible wrt the mount namesapce, so if there
is no "connected" alias, or if parts of the path are hidden in the
mount namespace, open_by_handle_at(2) will return -ESTALE.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20241011090023.655623-4-amir73il@gmail.com
Fixes: 570df4e9c23f ("ceph: snapshot nfs re-export")
Acked-by:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
nfsd encodes "connectable" file handles for the subtree_check feature,
which can be resolved to an open file with a connected path.
So far, userspace nfs server could not make use of this functionality.
Introduce a new flag AT_HANDLE_CONNECTABLE to name_to_handle_at(2).
When used, the encoded file handle is "explicitly connectable".
The "explicitly connectable" file handle sets bits in the high 16bit of
the handle_type field, so open_by_handle_at(2) will know that it needs
to open a file with a connected path.
old kernels will now recognize the handle_type with high bits set,
so "explicitly connectable" file handles cannot be decoded by
open_by_handle_at(2) on old kernels.
The flag AT_HANDLE_CONNECTABLE is not allowed together with either
AT_HANDLE_FID or AT_EMPTY_PATH.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20241011090023.655623-3-amir73il@gmail.com
Fixes: 570df4e9c23f ("ceph: snapshot nfs re-export")
Acked-by:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We would like to use the high 16bit of the handle_type field to encode
file handle traits, such as "connectable".
In preparation for this change, make sure that filesystems do not return
a handle_type value with upper bits set and that the open_by_handle_at(2)
syscall rejects these handle types.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20241011090023.655623-2-amir73il@gmail.com
Fixes: 570df4e9c23f ("ceph: snapshot nfs re-export")
Acked-by:
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pid_namespace rust bindings from Christian Brauner:
"This contains my Rust bindings for pid namespaces needed for various
rust drivers. Here's a description of the basic C semantics and how
they are mapped to Rust.
The pid namespace of a task doesn't ever change once the task is
alive. A unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID)
will not have an effect on the calling task's pid namespace. It will
only effect the pid namespace of children created by the calling task.
This invariant guarantees that after having acquired a reference to a
task's pid namespace it will remain unchanged.
When a task has exited and been reaped release_task() will be called.
This will set the pid namespace of the task to NULL. So retrieving the
pid namespace of a task that is dead will return NULL. Note, that
neither holding the RCU lock nor holding a reference count to the task
will prevent release_task() from being called.
In order to retrieve the pid namespace of a task the
task_active_pid_ns() function can be used. There are two cases to
consider:
(1) retrieving the pid namespace of the current task
(2) retrieving the pid namespace of a non-current task
From system call context retrieving the pid namespace for case (1) is
always safe and requires neither RCU locking nor a reference count to
be held. Retrieving the pid namespace after release_task() for current
will return NULL but no codepath like that is exposed to Rust.
Retrieving the pid namespace from system call context for (2) requires
RCU protection. Accessing a pid namespace outside of RCU protection
requires a reference count that must've been acquired while holding
the RCU lock. Note that accessing a non-current task means NULL can be
returned as the non-current task could have already passed through
release_task().
To retrieve (1) the current_pid_ns!() macro should be used. It ensures
that the returned pid namespace cannot outlive the calling scope. The
associated current_pid_ns() function should not be called directly as
it could be abused to created an unbounded lifetime for the pid
namespace. The current_pid_ns!() macro allows Rust to handle the
common case of accessing current's pid namespace without RCU
protection and without having to acquire a reference count.
For (2) the task_get_pid_ns() method must be used. This will always
acquire a reference on the pid namespace and will return an Option to
force the caller to explicitly handle the case where pid namespace is
None. Something that tends to be forgotten when doing the equivalent
operation in C.
Missing RCU primitives make it difficult to perform operations that
are otherwise safe without holding a reference count as long as RCU
protection is guaranteed. But it is not important currently. But we do
want it in the future.
Note that for (2) the required RCU protection around calling
task_active_pid_ns() synchronizes against putting the last reference
of the associated struct pid of task->thread_pid. The struct pid
stored in that field is used to retrieve the pid namespace of the
caller. When release_task() is called task->thread_pid will be NULLed
and put_pid() on said struct pid will be delayed in free_pid() via
call_rcu() allowing everyone with an RCU protected access to the
struct pid acquired from task->thread_pid to finish"
* tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
rust: add PidNamespace
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
The `PidNamespace` of a `Task` doesn't ever change once the `Task` is
alive. A `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)`
will not have an effect on the calling `Task`'s pid namespace. It will
only effect the pid namespace of children created by the calling `Task`.
This invariant guarantees that after having acquired a reference to a
`Task`'s pid namespace it will remain unchanged.
When a task has exited and been reaped `release_task()` will be called.
This will set the `PidNamespace` of the task to `NULL`. So retrieving
the `PidNamespace` of a task that is dead will return `NULL`. Note, that
neither holding the RCU lock nor holding a referencing count to the
`Task` will prevent `release_task()` being called.
In order to retrieve the `PidNamespace` of a `Task` the
`task_active_pid_ns()` function can be used. There are two cases to
consider:
(1) retrieving the `PidNamespace` of the `current` task (2) retrieving
the `PidNamespace` of a non-`current` task
From system call context retrieving the `PidNamespace` for case (1) is
always safe and requires neither RCU locking nor a reference count to be
held. Retrieving the `PidNamespace` after `release_task()` for current
will return `NULL` but no codepath like that is exposed to Rust.
Retrieving the `PidNamespace` from system call context for (2) requires
RCU protection. Accessing `PidNamespace` outside of RCU protection
requires a reference count that must've been acquired while holding the
RCU lock. Note that accessing a non-`current` task means `NULL` can be
returned as the non-`current` task could have already passed through
`release_task()`.
To retrieve (1) the `current_pid_ns!()` macro should be used which
ensure that the returned `PidNamespace` cannot outlive the calling
scope. The associated `current_pid_ns()` function should not be called
directly as it could be abused to created an unbounded lifetime for
`PidNamespace`. The `current_pid_ns!()` macro allows Rust to handle the
common case of accessing `current`'s `PidNamespace` without RCU
protection and without having to acquire a reference count.
For (2) the `task_get_pid_ns()` method must be used. This will always
acquire a reference on `PidNamespace` and will return an `Option` to
force the caller to explicitly handle the case where `PidNamespace` is
`None`, something that tends to be forgotten when doing the equivalent
operation in `C`. Missing RCU primitives make it difficult to perform
operations that are otherwise safe without holding a reference count as
long as RCU protection is guaranteed. But it is not important currently.
But we do want it in the future.
Note for (2) the required RCU protection around calling
`task_active_pid_ns()` synchronizes against putting the last reference
of the associated `struct pid` of `task->thread_pid`. The `struct pid`
stored in that field is used to retrieve the `PidNamespace` of the
caller. When `release_task()` is called `task->thread_pid` will be
`NULL`ed and `put_pid()` on said `struct pid` will be delayed in
`free_pid()` via `call_rcu()` allowing everyone with an RCU protected
access to the `struct pid` acquired from `task->thread_pid` to finish.
Link: https://lore.kernel.org/r/20241002-brauner-rust-pid_namespace-v5-1-a90e70d44fde@kernel.org
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull nfsd updates from Chuck Lever:
"Jeff Layton contributed a scalability improvement to NFSD's NFSv4
backchannel session implementation. This improvement is intended to
increase the rate at which NFSD can safely recall NFSv4 delegations
from clients, to avoid the need to revoke them. Revoking requires a
slow state recovery process.
A wide variety of bug fixes and other incremental improvements make up
the bulk of commits in this series. As always I am grateful to the
NFSD contributors, reviewers, testers, and bug reporters who
participated during this cycle"
* tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits)
nfsd: allow for up to 32 callback session slots
nfs_common: must not hold RCU while calling nfsd_file_put_local
nfsd: get rid of include ../internal.h
nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur
NFSD: Add nfsd4_copy time-to-live
NFSD: Add a laundromat reaper for async copy state
NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations
NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD
NFSD: Free async copy information in nfsd4_cb_offload_release()
NFSD: Fix nfsd4_shutdown_copy()
NFSD: Add a tracepoint to record canceled async COPY operations
nfsd: make nfsd4_session->se_flags a bool
nfsd: remove nfsd4_session->se_bchannel
nfsd: make use of warning provided by refcount_t
nfsd: Don't fail OP_SETCLIENTID when there are too many clients.
svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init()
xdrgen: Remove program_stat_to_errno() call sites
xdrgen: Update the files included in client-side source code
xdrgen: Remove check for "nfs_ok" in C templates
xdrgen: Remove tracepoint call site
...
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
nfsd currently only uses a single slot in the callback channel, which is
proving to be a bottleneck in some cases. Widen the callback channel to
a max of 32 slots (subject to the client's target_maxreqs value).
Change the cb_holds_slot boolean to an integer that tracks the current
slot number (with -1 meaning "unassigned"). Move the callback slot
tracking info into the session. Add a new u32 that acts as a bitmap to
track which slots are in use, and a u32 to track the latest callback
target_slotid that the client reports. To protect the new fields, add
a new per-session spinlock (the se_lock). Fix nfsd41_cb_get_slot to always
search for the lowest slotid (using ffs()).
Finally, convert the session->se_cb_seq_nr field into an array of
ints and add the necessary handling to ensure that the seqids get
reset when the slot table grows after shrinking.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Move holding the RCU from nfs_to_nfsd_file_put_local to
nfs_to_nfsd_net_put. It is the call to nfs_to->nfsd_serv_put that
requires the RCU anyway (the puts for nfsd_file and netns were
combined to avoid an extra indirect reference but that
micro-optimization isn't possible now).
This fixes xfstests generic/013 and it triggering:
"Voluntary context switch within RCU read-side critical section!"
[ 143.545738] Call Trace:
[ 143.546206] <TASK>
[ 143.546625] ? show_regs+0x6d/0x80
[ 143.547267] ? __warn+0x91/0x140
[ 143.547951] ? rcu_note_context_switch+0x496/0x5d0
[ 143.548856] ? report_bug+0x193/0x1a0
[ 143.549557] ? handle_bug+0x63/0xa0
[ 143.550214] ? exc_invalid_op+0x1d/0x80
[ 143.550938] ? asm_exc_invalid_op+0x1f/0x30
[ 143.551736] ? rcu_note_context_switch+0x496/0x5d0
[ 143.552634] ? wakeup_preempt+0x62/0x70
[ 143.553358] __schedule+0xaa/0x1380
[ 143.554025] ? _raw_spin_unlock_irqrestore+0x12/0x40
[ 143.554958] ? try_to_wake_up+0x1fe/0x6b0
[ 143.555715] ? wake_up_process+0x19/0x20
[ 143.556452] schedule+0x2e/0x120
[ 143.557066] schedule_preempt_disabled+0x19/0x30
[ 143.557933] rwsem_down_read_slowpath+0x24d/0x4a0
[ 143.558818] ? xfs_efi_item_format+0x50/0xc0 [xfs]
[ 143.559894] down_read+0x4e/0xb0
[ 143.560519] xlog_cil_commit+0x1b2/0xbc0 [xfs]
[ 143.561460] ? _raw_spin_unlock+0x12/0x30
[ 143.562212] ? xfs_inode_item_precommit+0xc7/0x220 [xfs]
[ 143.563309] ? xfs_trans_run_precommits+0x69/0xd0 [xfs]
[ 143.564394] __xfs_trans_commit+0xb5/0x330 [xfs]
[ 143.565367] xfs_trans_roll+0x48/0xc0 [xfs]
[ 143.566262] xfs_defer_trans_roll+0x57/0x100 [xfs]
[ 143.567278] xfs_defer_finish_noroll+0x27a/0x490 [xfs]
[ 143.568342] xfs_defer_finish+0x1a/0x80 [xfs]
[ 143.569267] xfs_bunmapi_range+0x4d/0xb0 [xfs]
[ 143.570208] xfs_itruncate_extents_flags+0x13d/0x230 [xfs]
[ 143.571353] xfs_free_eofblocks+0x12e/0x190 [xfs]
[ 143.572359] xfs_file_release+0x12d/0x140 [xfs]
[ 143.573324] __fput+0xe8/0x2d0
[ 143.573922] __fput_sync+0x1d/0x30
[ 143.574574] nfsd_filp_close+0x33/0x60 [nfsd]
[ 143.575430] nfsd_file_free+0x96/0x150 [nfsd]
[ 143.576274] nfsd_file_put+0xf7/0x1a0 [nfsd]
[ 143.577104] nfsd_file_put_local+0x18/0x30 [nfsd]
[ 143.578070] nfs_close_local_fh+0x101/0x110 [nfs_localio]
[ 143.579079] __put_nfs_open_context+0xc9/0x180 [nfs]
[ 143.580031] nfs_file_clear_open_context+0x4a/0x60 [nfs]
[ 143.581038] nfs_file_release+0x3e/0x60 [nfs]
[ 143.581879] __fput+0xe8/0x2d0
[ 143.582464] __fput_sync+0x1d/0x30
[ 143.583108] __x64_sys_close+0x41/0x80
[ 143.583823] x64_sys_call+0x189a/0x20d0
[ 143.584552] do_syscall_64+0x64/0x170
[ 143.585240] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 143.586185] RIP: 0033:0x7f3c5153efd7
Fixes: 65f2a5c36635 ("nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
added back in 2015 for the sake of vfs_clone_file_range(),
which is in linux/fs.h these days
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The action force umount(umount -f) will attempt to kill all rpc_task even
umount operation may ultimately fail if some files remain open.
Consequently, if an action attempts to open a file, it can potentially
send two rpc_task to nfs server.
NFS CLIENT
thread1 thread2
open("file")
...
nfs4_do_open
_nfs4_do_open
_nfs4_open_and_get_state
_nfs4_proc_open
nfs4_run_open_task
/* rpc_task1 */
rpc_run_task
rpc_wait_for_completion_task
umount -f
nfs_umount_begin
rpc_killall_tasks
rpc_signal_task
rpc_task1 been wakeup
and return -512
_nfs4_do_open // while loop
...
nfs4_run_open_task
/* rpc_task2 */
rpc_run_task
rpc_wait_for_completion_task
While processing an open request, nfsd will first attempt to find or
allocate an nfs4_openowner. If it finds an nfs4_openowner that is not
marked as NFS4_OO_CONFIRMED, this nfs4_openowner will released. Since
two rpc_task can attempt to open the same file simultaneously from the
client to server, and because two instances of nfsd can run
concurrently, this situation can lead to lots of memory leak.
Additionally, when we echo 0 to /proc/fs/nfsd/threads, warning will be
triggered.
NFS SERVER
nfsd1 nfsd2 echo 0 > /proc/fs/nfsd/threads
nfsd4_open
nfsd4_process_open1
find_or_alloc_open_stateowner
// alloc oo1, stateid1
nfsd4_open
nfsd4_process_open1
find_or_alloc_open_stateowner
// find oo1, without NFS4_OO_CONFIRMED
release_openowner
unhash_openowner_locked
list_del_init(&oo->oo_perclient)
// cannot find this oo
// from client, LEAK!!!
alloc_stateowner // alloc oo2
nfsd4_process_open2
init_open_stateid
// associate oo1
// with stateid1, stateid1 LEAK!!!
nfs4_get_vfs_file
// alloc nfsd_file1 and nfsd_file_mark1
// all LEAK!!!
nfsd4_process_open2
...
write_threads
...
nfsd_destroy_serv
nfsd_shutdown_net
nfs4_state_shutdown_net
nfs4_state_destroy_net
destroy_client
__destroy_client
// won't find oo1!!!
nfsd_shutdown_generic
nfsd_file_cache_shutdown
kmem_cache_destroy
for nfsd_file_slab
and nfsd_file_mark_slab
// bark since nfsd_file1
// and nfsd_file_mark1
// still alive
=======================================================================
BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on
__kmem_cache_shutdown()
-----------------------------------------------------------------------
Slab 0xffd4000004438a80 objects=34 used=1 fp=0xff11000110e2ad28
flags=0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
CPU: 4 UID: 0 PID: 757 Comm: sh Not tainted 6.12.0-rc6+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
slab_err+0xb0/0xf0
__kmem_cache_shutdown+0x15c/0x310
kmem_cache_destroy+0x66/0x160
nfsd_file_cache_shutdown+0xac/0x210 [nfsd]
nfsd_destroy_serv+0x251/0x2a0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1ae/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Disabling lock debugging due to kernel taint
Object 0xff11000110e2ac38 @offset=3128
Allocated in nfsd_file_do_acquire+0x20f/0xa30 [nfsd] age=1635 cpu=3
pid=800
nfsd_file_do_acquire+0x20f/0xa30 [nfsd]
nfsd_file_acquire_opened+0x5f/0x90 [nfsd]
nfs4_get_vfs_file+0x4c9/0x570 [nfsd]
nfsd4_process_open2+0x713/0x1070 [nfsd]
nfsd4_open+0x74b/0x8b0 [nfsd]
nfsd4_proc_compound+0x70b/0xc20 [nfsd]
nfsd_dispatch+0x1b4/0x3a0 [nfsd]
svc_process_common+0x5b8/0xc50 [sunrpc]
svc_process+0x2ab/0x3b0 [sunrpc]
svc_handle_xprt+0x681/0xa20 [sunrpc]
nfsd+0x183/0x220 [nfsd]
kthread+0x199/0x1e0
ret_from_fork+0x31/0x60
ret_from_fork_asm+0x1a/0x30
Add nfs4_openowner_unhashed to help found unhashed nfs4_openowner, and
break nfsd4_open process to fix this problem.
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Keep async copy state alive for a few lease cycles after the copy
completes so that OFFLOAD_STATUS returns something meaningful.
This means that NFSD's client shutdown processing needs to purge
any of this state that happens to be waiting to die.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
RFC 7862 Section 4.8 states:
> A copy offload stateid will be valid until either (A) the client
> or server restarts or (B) the client returns the resource by
> issuing an OFFLOAD_CANCEL operation or the client replies to a
> CB_OFFLOAD operation.
Instead of releasing async copy state when the CB_OFFLOAD callback
completes, now let it live until the next laundromat run after the
callback completes.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Currently __destroy_client() consults the nfs4_client's async_copies
list to determine whether there are ongoing async COPY operations.
However, NFSD now keeps copy state in that list even when the
async copy has completed, to enable OFFLOAD_STATUS to find the
COPY results for a while after the COPY has completed.
DESTROY_CLIENTID should not be blocked if the client's async_copies
list contains state for only completed copy operations.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
RFC 7862 permits callback services to respond to CB_OFFLOAD with
NFS4ERR_DELAY. Currently NFSD drops the CB_OFFLOAD in that case.
To improve the reliability of COPY offload, NFSD should rather send
another CB_OFFLOAD completion notification.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
RFC 7862 Section 4.8 states:
> A copy offload stateid will be valid until either (A) the client
> or server restarts or (B) the client returns the resource by
> issuing an OFFLOAD_CANCEL operation or the client replies to a
> CB_OFFLOAD operation.
Currently, NFSD purges the metadata for an async COPY operation as
soon as the CB_OFFLOAD callback has been sent. It does not wait even
for the client's CB_OFFLOAD response, as the paragraph above
suggests that it should.
This makes the OFFLOAD_STATUS operation ineffective during the
window between the completion of an asynchronous COPY and the
server's receipt of the corresponding CB_OFFLOAD response. This is
important if, for example, the client responds with NFS4ERR_DELAY,
or the transport is lost before the server receives the response. A
client might use OFFLOAD_STATUS to query the server about the still
pending asynchronous COPY, but NFSD will respond to OFFLOAD_STATUS
as if it had never heard of the presented copy stateid.
This patch starts to address this issue by extending the lifetime of
struct nfsd4_copy at least until the server has seen the client's
CB_OFFLOAD response, or the CB_OFFLOAD has timed out.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
nfsd4_shutdown_copy() is just this:
while ((copy = nfsd4_get_copy(clp)) != NULL)
nfsd4_stop_copy(copy);
nfsd4_get_copy() bumps @copy's reference count, preventing
nfsd4_stop_copy() from releasing @copy.
A while loop like this usually works by removing the first element
of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy()
alters the async_copies list.
Best I can tell, then, is that nfsd4_shutdown_copy() continues to
loop until other threads manage to remove all the items from this
list. The spinning loop blocks shutdown until these items are gone.
Possibly the reason we haven't seen this issue in the field is
because client_has_state() prevents __destroy_client() from calling
nfsd4_shutdown_copy() if there are any items on this list. In a
subsequent patch I plan to remove that restriction.
Fixes: e0639dc5805a ("NFSD introduce async copy feature")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
While this holds the flags from the CREATE_SESSION request, nothing
ever consults them. The only flag used is NFS4_SESSION_DEAD. Make it a
simple bool instead.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This field is written and is never consulted again. Remove it.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
refcount_t, by design, checks for unwanted situations and provides
warnings. It is rarely useful to have explicit warnings with refcount
usage.
In this case we have an explicit warning if a refcount_t reaches zero
when decremented. Simply using refcount_dec() will provide a similar
warning and also mark the refcount_t as saturated to avoid any possible
use-after-free.
This patch drops the warning and uses refcount_dec() instead of
refcount_dec_and_test().
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is
memory allocation failure. Putting a hard limit on the number of
clients is not really helpful as it will either happen too early and
prevent clients that the server can easily handle, or too late and
allow clients when the server is swamped.
The calculated limit is still useful for expiring courtesy clients where
there are "too many" clients, but it shouldn't prevent the creation of
active clients.
Testing of lots of clients against small-mem servers reports repeated
NFS4ERR_DELAY responses which doesn't seem helpful. There may have been
reports of similar problems in production use.
Also remove an outdated comment - we do use a slab cache.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
There's issue as follows:
RPC: Registered rdma transport module.
RPC: Registered rdma backchannel transport module.
RPC: Unregistered rdma transport module.
RPC: Unregistered rdma backchannel transport module.
BUG: unable to handle page fault for address: fffffbfff80c609a
PGD 123fee067 P4D 123fee067 PUD 123fea067 PMD 10c624067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
RIP: 0010:percpu_counter_destroy_many+0xf7/0x2a0
Call Trace:
<TASK>
__die+0x1f/0x70
page_fault_oops+0x2cd/0x860
spurious_kernel_fault+0x36/0x450
do_kern_addr_fault+0xca/0x100
exc_page_fault+0x128/0x150
asm_exc_page_fault+0x26/0x30
percpu_counter_destroy_many+0xf7/0x2a0
mmdrop+0x209/0x350
finish_task_switch.isra.0+0x481/0x840
schedule_tail+0xe/0xd0
ret_from_fork+0x23/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
If register_sysctl() return NULL, then svc_rdma_proc_cleanup() will not
destroy the percpu counters which init in svc_rdma_proc_init().
If CONFIG_HOTPLUG_CPU is enabled, residual nodes may be in the
'percpu_counters' list. The above issue may occur once the module is
removed. If the CONFIG_HOTPLUG_CPU configuration is not enabled, memory
leakage occurs.
To solve above issue just destroy all percpu counters when
register_sysctl() return NULL.
Fixes: 1e7e55731628 ("svcrdma: Restore read and write stats")
Fixes: 22df5a22462e ("svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter")
Fixes: df971cd853c0 ("svcrdma: Convert rdma_stat_recv to a per-CPU counter")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Refactor: Translating an on-the-wire value to a local host errno is
architecturally a job for the proc function, not the XDR decoder.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In particular, client-side source code needs the definition of
"struct rpc_procinfo" and does not want header files that pull
in "struct svc_rqst". Otherwise, the source does not compile.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Obviously, "nfs_ok" is defined only for NFS protocols. Other XDR
protocols won't know "nfs_ok" from Adam.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This tracepoint was a "note to self" and is not operational. It is
added only to client-side code, which so far we haven't needed. It
will cause immediate breakage once we start generating client code,
though, so remove it now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The last reference for `cache_head` can be reduced to zero in `c_show`
and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently,
`svc_export_put` and `expkey_put` will be invoked, leading to two
issues:
1. The `svc_export_put` will directly free ex_uuid. However,
`e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can
trigger a use-after-free issue, shown below.
==================================================================
BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd]
Read of size 1 at addr ff11000010fdc120 by task cat/870
CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
print_address_description.constprop.0+0x2c/0x3a0
print_report+0xb9/0x280
kasan_report+0xae/0xe0
svc_export_show+0x362/0x430 [nfsd]
c_show+0x161/0x390 [sunrpc]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
proc_reg_read+0xe1/0x140
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Allocated by task 830:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc_node_track_caller_noprof+0x1bc/0x400
kmemdup_noprof+0x22/0x50
svc_export_parse+0x8a9/0xb80 [nfsd]
cache_do_downcall+0x71/0xa0 [sunrpc]
cache_write_procfs+0x8e/0xd0 [sunrpc]
proc_reg_write+0xe1/0x140
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 868:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x37/0x50
kfree+0xf3/0x3e0
svc_export_put+0x87/0xb0 [nfsd]
cache_purge+0x17f/0x1f0 [sunrpc]
nfsd_destroy_serv+0x226/0x2d0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`.
However, `svc_export_put`/`expkey_put` will call path_put, which
subsequently triggers a sleeping operation due to the following
`dput`.
=============================
WARNING: suspicious RCU usage
5.10.0-dirty #141 Not tainted
-----------------------------
...
Call Trace:
dump_stack+0x9a/0xd0
___might_sleep+0x231/0x240
dput+0x39/0x600
path_put+0x1b/0x30
svc_export_put+0x17/0x80
e_show+0x1c9/0x200
seq_read_iter+0x63f/0x7c0
seq_read+0x226/0x2d0
vfs_read+0x113/0x2c0
ksys_read+0xc9/0x170
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Fix these issues by using `rcu_work` to help release
`svc_expkey`/`svc_export`. This approach allows for an asynchronous
context to invoke `path_put` and also facilitates the freeing of
`uuid/exp/key` after an RCU grace period.
Fixes: 9ceddd9da134 ("knfsd: Allow lockless lookups of the exports")
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The function `c_show` was called with protection from RCU. This only
ensures that `cp` will not be freed. Therefore, the reference count for
`cp` can drop to zero, which will trigger a refcount use-after-free
warning when `cache_get` is called. To resolve this issue, use
`cache_get_rcu` to ensure that `cp` remains active.
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 822 at lib/refcount.c:25
refcount_warn_saturate+0xb1/0x120
CPU: 7 UID: 0 PID: 822 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb1/0x120
Call Trace:
<TASK>
c_show+0x2fc/0x380 [sunrpc]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
proc_reg_read+0xe1/0x140
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The function `e_show` was called with protection from RCU. This only
ensures that `exp` will not be freed. Therefore, the reference count for
`exp` can drop to zero, which will trigger a refcount use-after-free
warning when `exp_get` is called. To resolve this issue, use
`cache_get_rcu` to ensure that `exp` remains active.
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 3 PID: 819 at lib/refcount.c:25
refcount_warn_saturate+0xb1/0x120
CPU: 3 UID: 0 PID: 819 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb1/0x120
...
Call Trace:
<TASK>
e_show+0x20b/0x230 [nfsd]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: bf18f163e89c ("NFSD: Using exp_get for export getting")
Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Since commit 75c7940d2a86 ("lockd: set missing fl_flags field when
retrieving args"), nlmsvc_retrieve_args() initializes the flc_flags
field. svcxdr_decode_lock() no longer needs to do this.
This clean up removes one dependency on the nlm_lock:fl field. No
behavior change is expected.
Analysis:
svcxdr_decode_lock() is called by:
nlm4svc_decode_testargs()
nlm4svc_decode_lockargs()
nlm4svc_decode_cancargs()
nlm4svc_decode_unlockargs()
nlm4svc_decode_testargs() is used by:
- NLMPROC4_TEST and NLMPROC4_TEST_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_GRANTED and NLMPROC4_GRANTED_MSG, which don't pass the
lock's file_lock to the generic lock API
nlm4svc_decode_lockargs() is used by:
- NLMPROC4_LOCK and NLM4PROC4_LOCK_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_UNLOCK and NLM4PROC4_UNLOCK_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_NM_LOCK, which calls nlmsvc_retrieve_args()
nlm4svc_decode_cancargs() is used by:
- NLMPROC4_CANCEL and NLMPROC4_CANCEL_MSG, which call nlmsvc_retrieve_args()
nlm4svc_decode_unlockargs() is used by:
- NLMPROC4_UNLOCK and NLMPROC4_UNLOCK_MSG, which call nlmsvc_retrieve_args()
All callers except GRANTED/GRANTED_MSG eventually call
nlmsvc_retrieve_args() before using nlm_lock::fl.c.flc_flags. Thus
this change is safe.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The nlm_cookie parameter has been unused since commit 09802fd2a8ca
("lockd: rip out deferred lock handling from testlock codepath").
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Clean up.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Since commit 103cc1fafee4 ("SUNRPC: Parametrize how much of argsize
should be zeroed") (and possibly long before that, even) all of the
memory underlying rqstp->rq_argp is zeroed already. There's no need
for the memset() in nlm4svc_decode_shareargs().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Clean up: Looks like the last usage of this typedef was removed by
commit 026fec7e7c47 ("sunrpc: properly type pc_decode callbacks") in
2017.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
It's only current caller already length-checks the string, but let's
be safe.
Fixes: 0964a3d3f1aa ("[PATCH] knfsd: nfsd4 reboot dirname fix")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|