summaryrefslogtreecommitdiffstats
path: root/fs/locks.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* treewide: mark stuff as __ro_after_initAlexey Dobriyan2023-10-181-2/+2
| | | | | | | | | | | | | __read_mostly predates __ro_after_init. Many variables which are marked __read_mostly should have been __ro_after_init from day 1. Also, mark some stuff as "const" and "__init" while I'm at it. [akpm@linux-foundation.org: revert sysctl_nr_open_min, sysctl_nr_open_max changes due to arm warning] [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/4f6bb9c0-abba-4ee4-a7aa-89265e886817@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Merge tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds2023-09-011-7/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Chuck Lever: "I'm thrilled to announce that the Linux in-kernel NFS server now offers NFSv4 write delegations. A write delegation enables a client to cache data and metadata for a single file more aggressively, reducing network round trips and server workload. Many thanks to Dai Ngo for contributing this facility, and to Jeff Layton and Neil Brown for reviewing and testing it. This release also sees the removal of all support for DES- and triple-DES-based Kerberos encryption types in the kernel's SunRPC implementation. These encryption types have been deprecated by the Internet community for years and are considered insecure. This change affects both the in-kernel NFS client and server. The server's UDP and TCP socket transports have now fully adopted David Howells' new bio_vec iterator so that no more than one sendmsg() call is needed to transmit each RPC message. In particular, this helps kTLS optimize record boundaries when sending RPC-with-TLS replies, and it takes the server a baby step closer to handling file I/O via folios. We've begun work on overhauling the SunRPC thread scheduler to remove a costly linked-list walk when looking for an idle RPC service thread to wake. The pre-requisites are included in this release. Thanks to Neil Brown for his ongoing work on this improvement" * tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits) Documentation: Add missing documentation for EXPORT_OP flags SUNRPC: Remove unused declaration rpc_modcount() SUNRPC: Remove unused declarations NFSD: da_addr_body field missing in some GETDEVICEINFO replies SUNRPC: Remove return value of svc_pool_wake_idle_thread() SUNRPC: make rqst_should_sleep() idempotent() SUNRPC: Clean up svc_set_num_threads SUNRPC: Count ingress RPC messages per svc_pool SUNRPC: Deduplicate thread wake-up code SUNRPC: Move trace_svc_xprt_enqueue SUNRPC: Add enum svc_auth_status SUNRPC: change svc_xprt::xpt_flags bits to enum SUNRPC: change svc_rqst::rq_flags bits to enum SUNRPC: change svc_pool::sp_flags bits to enum SUNRPC: change cache_head.flags bits to enum SUNRPC: remove timeout arg from svc_recv() SUNRPC: change svc_recv() to return void. SUNRPC: call svc_process() from svc_recv(). nfsd: separate nfsd_last_thread() from nfsd_put() nfsd: Simplify code around svc_exit_thread() call in nfsd() ...
| * locks: allow support for write delegationDai Ngo2023-08-291-7/+0
| | | | | | | | | | | | | | | | | | | | Remove the check for F_WRLCK in generic_add_lease to allow file_lock to be used for write delegation. First consumer is NFSD. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | Merge tag 'filelock-v6.6' of ↵Linus Torvalds2023-08-281-5/+22
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: - new functionality for F_OFD_GETLK: requesting a type of F_UNLCK will find info about whatever lock happens to be first in the given range, regardless of type. - an OFD lock selftest - bugfix involving a UAF in a tracepoint - comment typo fix * tag 'filelock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock fs/locks: Fix typo selftests: add OFD lock tests fs/locks: F_UNLCK extension for F_OFD_GETLK
| * | locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lockWill Shiu2023-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As following backtrace, the struct file_lock request , in posix_lock_inode is free before ftrace function using. Replace the ftrace function ahead free flow could fix the use-after-free issue. [name:report&]=============================================== BUG:KASAN: use-after-free in trace_event_raw_event_filelock_lock+0x80/0x12c [name:report&]Read at addr f6ffff8025622620 by task NativeThread/16753 [name:report_hw_tags&]Pointer tag: [f6], memory tag: [fe] [name:report&] BT: Hardware name: MT6897 (DT) Call trace: dump_backtrace+0xf8/0x148 show_stack+0x18/0x24 dump_stack_lvl+0x60/0x7c print_report+0x2c8/0xa08 kasan_report+0xb0/0x120 __do_kernel_fault+0xc8/0x248 do_bad_area+0x30/0xdc do_tag_check_fault+0x1c/0x30 do_mem_abort+0x58/0xbc el1_abort+0x3c/0x5c el1h_64_sync_handler+0x54/0x90 el1h_64_sync+0x68/0x6c trace_event_raw_event_filelock_lock+0x80/0x12c posix_lock_inode+0xd0c/0xd60 do_lock_file_wait+0xb8/0x190 fcntl_setlk+0x2d8/0x440 ... [name:report&] [name:report&]Allocated by task 16752: ... slab_post_alloc_hook+0x74/0x340 kmem_cache_alloc+0x1b0/0x2f0 posix_lock_inode+0xb0/0xd60 ... [name:report&] [name:report&]Freed by task 16752: ... kmem_cache_free+0x274/0x5b0 locks_dispose_list+0x3c/0x148 posix_lock_inode+0xc40/0xd60 do_lock_file_wait+0xb8/0x190 fcntl_setlk+0x2d8/0x440 do_fcntl+0x150/0xc18 ... Signed-off-by: Will Shiu <Will.Shiu@mediatek.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
| * | fs/locks: Fix typoJakub Wilk2023-08-241-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: Jeff Layton <jlayton@kernel.org>
| * | fs/locks: F_UNLCK extension for F_OFD_GETLKStas Sergeev2023-06-271-3/+20
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently F_UNLCK with F_OFD_GETLK returns -EINVAL. This patch changes it such that specifying F_UNLCK returns information only about OFD locks that are owned by the given file description. Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Shuah Khan <shuah@kernel.org> Cc: linux-kselftest@vger.kernel.org Cc: linux-api@vger.kernel.org Signed-off-by: Stas Sergeev <stsp2@yandex.ru> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* / fs: Pass argument to fcntl_setlease as intLuca Vizzarro2023-07-101-10/+10
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | The interface for fcntl expects the argument passed for the command F_SETLEASE to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: David Laight <David.Laight@ACULAB.com> Cc: Mark Rutland <Mark.Rutland@arm.com> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-nfs@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com> Message-Id: <20230414152459.816046-3-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
* filelocks: use mount idmapping for setlease permission checkSeth Forshee2023-03-091-1/+2
| | | | | | | | | | | | | | A user should be allowed to take out a lease via an idmapped mount if the fsuid matches the mapped uid of the inode. generic_setlease() is checking the unmapped inode uid, causing these operations to be denied. Fix this by comparing against the mapped inode uid instead of the unmapped uid. Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP") Cc: stable@vger.kernel.org Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* fs/locks: Remove redundant assignment to cmdJiapeng Chong2023-03-091-1/+0
| | | | | | | | | | | | Variable 'cmd' set but not used. fs/locks.c:2428:3: warning: Value stored to 'cmd' is never read. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4439 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* Merge tag 'rcu.2023.02.10a' of ↵Linus Torvalds2023-02-211-25/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably: - Throttling callback invocation based on the number of callbacks that are now ready to invoke instead of on the total number of callbacks - Several patches that suppress false-positive boot-time diagnostics, for example, due to lockdep not yet being initialized - Make expedited RCU CPU stall warnings dump stacks of any tasks that are blocking the stalled grace period. (Normal RCU CPU stall warnings have done this for many years) - Lazy-callback fixes to avoid delays during boot, suspend, and resume. (Note that lazy callbacks must be explicitly enabled, so this should not (yet) affect production use cases) - Make kfree_rcu() and friends take advantage of polled grace periods, thus reducing memory footprint by almost two orders of magnitude, admittedly on a microbenchmark This also begins the transition from kfree_rcu(p) to kfree_rcu_mightsleep(p). This transition was motivated by bugs where kfree_rcu(p), which can block, was typed instead of the intended kfree_rcu(p, rh) - SRCU updates, perhaps most notably fixing a bug that causes SRCU to fail when booted on a system with a non-zero boot CPU. This surprising situation actually happens for kdump kernels on the powerpc architecture This also adds an srcu_down_read() and srcu_up_read(), which act like srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side critical section to be handed off from one task to another - Clean up the now-useless SRCU Kconfig option There are a few more commits that are not yet acked or pulled into maintainer trees, and these will be in a pull request for a later merge window - RCU-tasks updates, perhaps most notably these fixes: - A strange interaction between PID-namespace unshare and the RCU-tasks grace period that results in a low-probability but very real hang - A race between an RCU tasks rude grace period on a single-CPU system and CPU-hotplug addition of the second CPU that can result in a too-short grace period - A race between shrinking RCU tasks down to a single callback list and queuing a new callback to some other CPU, but where that queuing is delayed for more than an RCU grace period. This can result in that callback being stranded on the non-boot CPU - Torture-test updates and fixes - Torture-test scripting updates and fixes - Provide additional RCU CPU stall-warning information in kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute timeout limit for expedited RCU CPU stall warnings * tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() kernel/notifier: Remove CONFIG_SRCU init: Remove "select SRCU" fs/quota: Remove "select SRCU" fs/notify: Remove "select SRCU" fs/btrfs: Remove "select SRCU" fs: Remove CONFIG_SRCU drivers/pci/controller: Remove "select SRCU" drivers/net: Remove "select SRCU" drivers/md: Remove "select SRCU" drivers/hwtracing/stm: Remove "select SRCU" drivers/dax: Remove "select SRCU" drivers/base: Remove CONFIG_SRCU rcu: Disable laziness if lazy-tracking says so rcu: Track laziness during boot and suspend rcu: Remove redundant call to rcu_boost_kthread_setaffinity() rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts rcu: Align the output of RCU CPU stall warning messages rcu: Add RCU stall diagnosis information sched: Add helper nr_context_switches_cpu() ...
| * fs: Remove CONFIG_SRCUPaul E. McKenney2023-02-031-25/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in conditional compilation based on CONFIG_SRCU. Therefore, remove the #ifdef and throw away the #else clause. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de>
* | fs: remove locks_inodeJeff Layton2023-01-111-14/+14
| | | | | | | | | | | | | | | | | | | | | | locks_inode was turned into a wrapper around file_inode in de2a4a501e71 (Partially revert "locks: fix file locking on overlayfs"). Finish replacing locks_inode invocations everywhere with file_inode. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* | filelock: move file locking definitions to separate header fileJeff Layton2023-01-111-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | The file locking definitions have lived in fs.h since the dawn of time, but they are only used by a small subset of the source files that include it. Move the file locking definitions to a new header file, and add the appropriate #include directives to the source files that need them. By doing this we trim down fs.h a bit and limit the amount of rebuilding that has to be done when we make changes to the file locking APIs. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Steve French <stfrench@microsoft.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* Add process name and pid to locks warningAndi Kleen2022-11-301-1/+1
| | | | | | | | | | | | It's fairly useless to complain about using an obsolete feature without telling the user which process used it. My Fedora desktop randomly drops this message, but I would really need this patch to figure out what triggers is. [ jlayton: print pid as well as process name ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* filelock: add a new locks_inode_context accessor functionJeff Layton2022-11-301-12/+12
| | | | | | | | | | | | | | There are a number of places in the kernel that are accessing the inode->i_flctx field without smp_load_acquire. This is required to ensure that the caller doesn't see a partially-initialized structure. Add a new accessor function for it to make this clear and convert all of the relevant accesses in locks.c to use it. Also, convert locks_free_lock_context to use the helper as well instead of just doing a "bare" assignment. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* filelock: new helper: vfs_inode_has_locksJeff Layton2022-11-301-0/+23
| | | | | | | | | | | | | | | Ceph has a need to know whether a particular inode has any locks set on it. It's currently tracking that by a num_locks field in its filp->private_data, but that's problematic as it tries to decrement this field when releasing locks and that can race with the file being torn down. Add a new vfs_inode_has_locks helper that just returns whether any locks are currently held on the inode. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* filelock: WARN_ON_ONCE when ->fl_file and filp don't matchJeff Layton2022-11-171-0/+3
| | | | | | | | | | | | | | | | | vfs_lock_file, vfs_test_lock and vfs_cancel_lock all take both a struct file argument and a file_lock. The file_lock has a fl_file field in it howevever and it _must_ match the file passed in. While most of the locks.c routines use the separately-passed file argument, some filesystems rely on fl_file being filled out correctly. I'm working on a patch series to remove the redundant argument from these routines, but for now, let's ensure that the callers always set this properly by issuing a WARN_ON_ONCE if they ever don't match. Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* locks: Fix dropped call to ->fl_release_private()David Howells2022-08-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to commit 4149be7bda7e, sys_flock() would allocate the file_lock struct it was going to use to pass parameters, call ->flock() and then call locks_free_lock() to get rid of it - which had the side effect of calling locks_release_private() and thus ->fl_release_private(). With commit 4149be7bda7e, however, this is no longer the case: the struct is now allocated on the stack, and locks_free_lock() is no longer called - and thus any remaining private data doesn't get cleaned up either. This causes afs flock to cause oops. Kasan catches this as a UAF by the list_del_init() in afs_fl_release_private() for the file_lock record produced by afs_fl_copy_lock() as the original record didn't get delisted. It can be reproduced using the generic/504 xfstest. Fix this by reinstating the locks_release_private() call in sys_flock(). I'm not sure if this would affect any other filesystems. If not, then the release could be done in afs_flock() instead. Changes ======= ver #2) - Don't need to call ->fl_release_private() after calling the security hook, only after calling ->flock(). Fixes: 4149be7bda7e ("fs/lock: Don't allocate file_lock in flock_make_lock().") cc: Chuck Lever <chuck.lever@oracle.com> cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/166075758809.3532462.13307935588777587536.stgit@warthog.procyon.org.uk/ # v1 Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* fs/lock: Rearrange ops in flock syscall.Kuniyuki Iwashima2022-07-181-24/+19
| | | | | | | | | | | | | | | | | | | | The previous patch added flock_translate_cmd() in flock syscall. The test and the other one for LOCK_MAND do not depend on struct fd and are cheaper, so we can put them at the top and defer fdget() after that. Also, we can remove the unlock variable and use type instead. While at it, we fix this checkpatch error. CHECK: spaces preferred around that '|' (ctx:VxV) #45: FILE: fs/locks.c:2099: + if (type != F_UNLCK && !(f.file->f_mode & (FMODE_READ|FMODE_WRITE))) ^ Finally, we can move the can_sleep part just before we use it. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* fs/lock: Don't allocate file_lock in flock_make_lock().Kuniyuki Iwashima2022-07-181-31/+15
| | | | | | | | | | | | | | | | | | | Two functions, flock syscall and locks_remove_flock(), call flock_make_lock(). It allocates struct file_lock from slab cache if its argument fl is NULL. When we call flock syscall, we pass NULL to allocate memory for struct file_lock. However, we always free it at the end by locks_free_lock(). We need not allocate it and instead should use a local variable as locks_remove_flock() does. Also, the validation for flock_translate_cmd() is not necessary for locks_remove_flock(). So we move the part to flock syscall and make flock_make_lock() return nothing. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* fs/lock: add 2 callbacks to lock_manager_operations to resolve conflictDai Ngo2022-05-191-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add 2 new callbacks, lm_lock_expirable and lm_expire_lock, to lock_manager_operations to allow the lock manager to take appropriate action to resolve the lock conflict if possible. A new field, lm_mod_owner, is also added to lock_manager_operations. The lm_mod_owner is used by the fs/lock code to make sure the lock manager module such as nfsd, is not freed while lock conflict is being resolved. lm_lock_expirable checks and returns true to indicate that the lock conflict can be resolved else return false. This callback must be called with the flc_lock held so it can not block. lm_expire_lock is called to resolve the lock conflict if the returned value from lm_lock_expirable is true. This callback is called without the flc_lock held since it's allowed to block. Upon returning from this callback, the lock conflict should be resolved and the caller is expected to restart the conflict check from the beginnning of the list. Lock manager, such as NFSv4 courteous server, uses this callback to resolve conflict by destroying lock owner, or the NFSv4 courtesy client (client that has expired but allowed to maintains its states) that owns the lock. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
* fs/lock: add helper locks_owner_has_blockers to check for blockersDai Ngo2022-05-191-0/+28
| | | | | | | | | | Add helper locks_owner_has_blockers to check if there is any blockers for a given lockowner. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
* fs: move locking sysctls where they are usedLuis Chamberlain2022-01-221-2/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. The locking fs sysctls are only used on fs/locks.c, so move them there. Link: https://lkml.kernel.org/r/20211129205548.605569-7-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* locks: remove changelog commentsJ. Bruce Fields2021-10-191-110/+4
| | | | | | | | | | | | This is only of historical interest, and anyone interested in the history can dig out an old version of locks.c from from git. Triggered by the observation that it references the now-removed Documentation/filesystems/mandatory-locking.rst. Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* locks: remove LOCK_MAND flock lock supportJeff Layton2021-09-101-25/+22
| | | | | | | | | | | | | | | As best I can tell, the logic for these has been broken for a long time (at least before the move to git), such that they never conflict with anything. Also, nothing checks for these flags and prevented opens or read/write behavior on the files. They don't seem to do anything. Given that, we can rip these symbols out of the kernel, and just make flock(2) return 0 when LOCK_MAND is set in order to preserve existing behavior. Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* Revert "memcg: enable accounting for file lock caches"Linus Torvalds2021-09-071-4/+2
| | | | | | | | | | | | | | | | | | | | | This reverts commit 0f12156dff2862ac54235fc72703f18770769042. The kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting. People already have suggestions on how to do that, but it's "future work". So revert it for now. Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'akpm' (patches from Andrew)Linus Torvalds2021-09-031-2/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...
| * memcg: enable accounting for file lock cachesVasily Averin2021-09-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User can create file locks for each open file and force kernel to allocate small but long-living objects per each open file. It makes sense to account for these objects to limit the host's memory consumption from inside the memcg-limited container. Link: https://lkml.kernel.org/r/b009f4c7-f0ab-c0ec-8e83-918f47d677da@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roman Gushchin <guro@fb.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yutian Yang <nglaive@gmail.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs: remove mandatory file locking supportJeff Layton2021-08-231-116/+1
|/ | | | | | | | | | | | | | | | | | We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it off in Fedora and RHEL8. Several other distros have followed suit. I've heard of one problem in all that time: Someone migrated from an older distro that supported "-o mand" to one that didn't, and the host had a fstab entry with "mand" in it which broke on reboot. They didn't actually _use_ mandatory locking so they just removed the mount option and moved on. This patch rips out mandatory locking support wholesale from the kernel, along with the Kconfig option and the Documentation file. It also changes the mount code to ignore the "mand" mount option instead of erroring out, and to throw a big, ugly warning. Signed-off-by: Jeff Layton <jlayton@kernel.org>
* Merge tag 'nfsd-5.13-1' of ↵Linus Torvalds2021-05-051-0/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, including a fix to grant read delegations for files open for writing" * tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix null pointer dereference in svc_rqst_free() SUNRPC: fix ternary sign expansion bug in tracing nfsd: Fix fall-through warnings for Clang nfsd: grant read delegations to clients holding writes nfsd: reshuffle some code nfsd: track filehandle aliasing in nfs4_files nfsd: hash nfs4_files by inode number nfsd: ensure new clients break delegations nfsd: removed unused argument in nfsd_startup_generic() nfsd: remove unused function svcrdma: Pass a useful error code to the send_err tracepoint svcrdma: Rename goto labels in svc_rdma_sendto() svcrdma: Don't leak send_ctxt on Send errors
| * nfsd: grant read delegations to clients holding writesJ. Bruce Fields2021-04-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | It's OK to grant a read delegation to a client that holds a write, as long as it's the only client holding the write. We originally tried to do this in commit 94415b06eb8a ("nfsd4: a client's own opens needn't prevent delegations"), which had to be reverted in commit 6ee65a773096 ("Revert "nfsd4: a client's own opens needn't prevent delegations""). Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | Merge tag 'locks-v5.13' of ↵Linus Torvalds2021-04-261-10/+56
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "When we reworked the blocked locks into a tree structure instead of a flat list a few releases ago, we lost the ability to see all of the file locks in /proc/locks. Luo's patch fixes it to dump out all of the blocked locks instead, which restores the full output. This changes the format of /proc/locks as the blocked locks are shown at multiple levels of indentation now, but lslocks (the only common program I've ID'ed that scrapes this info) seems to be OK with that. Tian also contributed a small patch to remove a useless assignment" * tag 'locks-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs/locks: remove useless assignment in fcntl_getlk fs/locks: print full locks information
| * fs/locks: remove useless assignment in fcntl_getlkTian Tao2021-04-131-1/+0
| | | | | | | | | | | | | | Function parameter 'cmd' is rewritten with unused value at locks.c Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
| * fs/locks: print full locks informationLuo Longjun2021-03-111-9/+56
| | | | | | | | | | | | | | | | | | | | | | | | Commit fd7732e033e3 ("fs/locks: create a tree of dependent requests.") has put blocked locks into a tree. So, with a for loop, we can't check all locks information. To solve this problem, we should traverse the tree. Signed-off-by: Luo Longjun <luolongjun@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* | Revert "nfsd4: a client's own opens needn't prevent delegations"J. Bruce Fields2021-03-091-3/+0
|/ | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 94415b06eb8aed13481646026dc995f04a3a534a. That commit claimed to allow a client to get a read delegation when it was the only writer. Actually it allowed a client to get a read delegation when *any* client has a write open! The main problem is that it's depending on nfs4_clnt_odstate structures that are actually only maintained for pnfs exports. This causes clients to miss writes performed by other clients, even when there have been intervening closes and opens, violating close-to-open cache consistency. We can do this a different way, but first we should just revert this. I've added pynfs 4.1 test DELEG19 to test for this, as I should have done originally! Cc: stable@vger.kernel.org Reported-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* Merge branch 'exec-for-v5.11' of ↵Linus Torvalds2020-12-161-6/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve updates from Eric Biederman: "This set of changes ultimately fixes the interaction of posix file lock and exec. Fundamentally most of the change is just moving where unshare_files is called during exec, and tweaking the users of files_struct so that the count of files_struct is not unnecessarily played with. Along the way fcheck and related helpers were renamed to more accurately reflect what they do. There were also many other small changes that fell out, as this is the first time in a long time much of this code has been touched. Benchmarks haven't turned up any practical issues but Al Viro has observed a possibility for a lot of pounding on task_lock. So I have some changes in progress to convert put_files_struct to always rcu free files_struct. That wasn't ready for the merge window so that will have to wait until next time" * 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) exec: Move io_uring_task_cancel after the point of no return coredump: Document coredump code exclusively used by cell spufs file: Remove get_files_struct file: Rename __close_fd_get_file close_fd_get_file file: Replace ksys_close with close_fd file: Rename __close_fd to close_fd and remove the files parameter file: Merge __alloc_fd into alloc_fd file: In f_dupfd read RLIMIT_NOFILE once. file: Merge __fd_install into fd_install proc/fd: In fdinfo seq_show don't use get_files_struct bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu file: Implement task_lookup_next_fd_rcu kcmp: In get_file_raw_ptr use task_lookup_fd_rcu proc/fd: In tid_fd_mode use task_lookup_fd_rcu file: Implement task_lookup_fd_rcu file: Rename fcheck lookup_fd_rcu file: Replace fcheck_files with files_lookup_fd_rcu file: Factor files_lookup_fd_locked out of fcheck_files file: Rename __fcheck_files to files_lookup_fd_raw ...
| * file: Factor files_lookup_fd_locked out of fcheck_filesEric W. Biederman2020-12-101-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To make it easy to tell where files->file_lock protection is being used when looking up a file create files_lookup_fd_locked. Only allow this function to be called with the file_lock held. Update the callers of fcheck and fcheck_files that are called with the files->file_lock held to call files_lookup_fd_locked instead. Hopefully this makes it easier to quickly understand what is going on. The need for better names became apparent in the last round of discussion of this set of changes[1]. [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com Link: https://lkml.kernel.org/r/20201120231441.29911-8-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | locks: fix a typo at a kernel-doc markupMauro Carvalho Chehab2020-10-261-1/+1
| | | | | | | | | | | | | | locks_delete_lock -> locks_delete_block Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* | locks: Fix UBSAN undefined behaviour in flock64_to_posix_lockLuo Meng2020-10-261-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | When the sum of fl->fl_start and l->l_len overflows, UBSAN shows the following warning: UBSAN: Undefined behaviour in fs/locks.c:482:29 signed integer overflow: 2 + 9223372036854775806 cannot be represented in type 'long long int' Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe4/0x14e lib/dump_stack.c:118 ubsan_epilogue+0xe/0x81 lib/ubsan.c:161 handle_overflow+0x193/0x1e2 lib/ubsan.c:192 flock64_to_posix_lock fs/locks.c:482 [inline] flock_to_posix_lock+0x595/0x690 fs/locks.c:515 fcntl_setlk+0xf3/0xa90 fs/locks.c:2262 do_fcntl+0x456/0xf60 fs/fcntl.c:387 __do_sys_fcntl fs/fcntl.c:483 [inline] __se_sys_fcntl fs/fcntl.c:468 [inline] __x64_sys_fcntl+0x12d/0x180 fs/fcntl.c:468 do_syscall_64+0xc8/0x5a0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix it by parenthesizing 'l->l_len - 1'. Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva2020-08-241-3/+3
| | | | | | | | | | Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
* Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds2020-08-091-0/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS server updates from Chuck Lever: "Highlights: - Support for user extended attributes on NFS (RFC 8276) - Further reduce unnecessary NFSv4 delegation recalls Notable fixes: - Fix recent krb5p regression - Address a few resource leaks and a rare NULL dereference Other: - De-duplicate RPC/RDMA error handling and other utility functions - Replace storage and display of kernel memory addresses by tracepoints" * tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits) svcrdma: CM event handler clean up svcrdma: Remove transport reference counting svcrdma: Fix another Receive buffer leak SUNRPC: Refresh the show_rqstp_flags() macro nfsd: netns.h: delete a duplicated word SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()") nfsd: avoid a NULL dereference in __cld_pipe_upcall() nfsd4: a client's own opens needn't prevent delegations nfsd: Use seq_putc() in two functions svcrdma: Display chunk completion ID when posting a rw_ctxt svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send() svcrdma: Introduce Send completion IDs svcrdma: Record Receive completion ID in svc_rdma_decode_rqst svcrdma: Introduce Receive completion IDs svcrdma: Introduce infrastructure to support completion IDs svcrdma: Add common XDR encoders for RDMA and Read segments svcrdma: Add common XDR decoders for RDMA and Read segments SUNRPC: Add helpers for decoding list discriminators symbolically svcrdma: Remove declarations for functions long removed svcrdma: Clean up trace_svcrdma_send_failed() tracepoint ...
| * nfsd4: a client's own opens needn't prevent delegationsJ. Bruce Fields2020-07-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We recently fixed lease breaking so that a client's actions won't break its own delegations. But we still have an unnecessary self-conflict when granting delegations: a client's own write opens will prevent us from handing out a read delegation even when no other client has the file open for write. Fix that by turning off the checks for conflicting opens under vfs_setlease, and instead performing those checks in the nfsd code. We don't depend much on locks here: instead we acquire the delegation, then check for conflicts, and drop the delegation again if we find any. The check beforehand is an optimization of sorts, just to avoid acquiring the delegation unnecessarily. There's a race where the first check could cause us to deny the delegation when we could have granted it. But, that's OK, delegation grants are optional (and probably not even a good idea in that case). Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | Merge tag 'filelock-v5.9-1' of ↵Linus Torvalds2020-08-031-0/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking fix from Jeff Layton: "Just a single, one-line patch to fix an inefficiency in the posix locking code that can lead to it doing more wakeups than necessary" * tag 'filelock-v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: add locks_move_blocks in posix_lock_inode
| * locks: add locks_move_blocks in posix_lock_inodeyangerkun2020-06-021-0/+1
| | | | | | | | | | | | | | | | We forget to call locks_move_blocks in posix_lock_inode when try to process same owner and different types. Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
* | Merge tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2020-06-111-0/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Highlights: - Keep nfsd clients from unnecessarily breaking their own delegations. Note this requires a small kthreadd addition. The result is Tejun Heo's suggestion (see link), and he was OK with this going through my tree. - Patch nfsd/clients/ to display filenames, and to fix byte-order when displaying stateid's. - fix a module loading/unloading bug, from Neil Brown. - A big series from Chuck Lever with RPC/RDMA and tracing improvements, and lay some groundwork for RPC-over-TLS" Link: https://lore.kernel.org/r/1588348912-24781-1-git-send-email-bfields@redhat.com * tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linux: (49 commits) sunrpc: use kmemdup_nul() in gssp_stringify() nfsd: safer handling of corrupted c_type nfsd4: make drc_slab global, not per-net SUNRPC: Remove unreachable error condition in rpcb_getport_async() nfsd: Fix svc_xprt refcnt leak when setup callback client failed sunrpc: clean up properly in gss_mech_unregister() sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations. sunrpc: check that domain table is empty at module unload. NFSD: Fix improperly-formatted Doxygen comments NFSD: Squash an annoying compiler warning SUNRPC: Clean up request deferral tracepoints NFSD: Add tracepoints for monitoring NFSD callbacks NFSD: Add tracepoints to the NFSD state management code NFSD: Add tracepoints to NFSD's duplicate reply cache SUNRPC: svc_show_status() macro should have enum definitions SUNRPC: Restructure svc_udp_recvfrom() SUNRPC: Refactor svc_recvfrom() SUNRPC: Clean up svc_release_skb() functions SUNRPC: Refactor recvfrom path dealing with incomplete TCP receives SUNRPC: Replace dprintk() call sites in TCP receive path ...
| * | nfsd: clients don't need to break their own delegationsJ. Bruce Fields2020-05-091-0/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently revoke read delegations on any write open or any operation that modifies file data or metadata (including rename, link, and unlink). But if the delegation in question is the only read delegation and is held by the client performing the operation, that's not really necessary. It's not always possible to prevent this in the NFSv4.0 case, because there's not always a way to determine which client an NFSv4.0 delegation came from. (In theory we could try to guess this from the transport layer, e.g., by assuming all traffic on a given TCP connection comes from the same client. But that's not really correct.) In the NFSv4.1 case the session layer always tells us the client. This patch should remove such self-conflicts in all cases where we can reliably determine the client from the compound. To do that we need to track "who" is performing a given (possibly lease-breaking) file operation. We're doing that by storing the information in the svc_rqst and using kthread_data() to map the current task back to a svc_rqst. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | Merge branch 'proc-linus' of ↵Linus Torvalds2020-06-041-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull proc updates from Eric Biederman: "This has four sets of changes: - modernize proc to support multiple private instances - ensure we see the exit of each process tid exactly - remove has_group_leader_pid - use pids not tasks in posix-cpu-timers lookup Alexey updated proc so each mount of proc uses a new superblock. This allows people to actually use mount options with proc with no fear of messing up another mount of proc. Given the kernel's internal mounts of proc for things like uml this was a real problem, and resulted in Android's hidepid mount options being ignored and introducing security issues. The rest of the changes are small cleanups and fixes that came out of my work to allow this change to proc. In essence it is swapping the pids in de_thread during exec which removes a special case the code had to handle. Then updating the code to stop handling that special case" * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: proc_pid_ns takes super_block as an argument remove the no longer needed pid_alive() check in __task_pid_nr_ns() posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type posix-cpu-timers: Extend rcu_read_lock removing task_struct references signal: Remove has_group_leader_pid exec: Remove BUG_ON(has_group_leader_pid) posix-cpu-timer: Unify the now redundant code in lookup_task posix-cpu-timer: Tidy up group_leader logic in lookup_task proc: Ensure we see the exit of each process tid exactly once rculist: Add hlists_swap_heads_rcu proc: Use PIDTYPE_TGID in next_tgid Use proc_pid_ns() to get pid_namespace from the proc superblock proc: use named enums for better readability proc: use human-readable values for hidepid docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior proc: add option to mount only a pids subset proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option proc: allow to mount many instances of proc in one pid namespace proc: rename struct proc_fs_info to proc_fs_opts
| * | proc: proc_pid_ns takes super_block as an argumentAlexey Gladkov2020-05-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot found that touch /proc/testfile causes NULL pointer dereference at tomoyo_get_local_path() because inode of the dentry is NULL. Before c59f415a7cb6, Tomoyo received pid_ns from proc's s_fs_info directly. Since proc_pid_ns() can only work with inode, using it in the tomoyo_get_local_path() was wrong. To avoid creating more functions for getting proc_ns, change the argument type of the proc_pid_ns() function. Then, Tomoyo can use the existing super_block to get pid_ns. Link: https://lkml.kernel.org/r/0000000000002f0c7505a5b0e04c@google.com Link: https://lkml.kernel.org/r/20200518180738.2939611-1-gladkov.alexey@gmail.com Reported-by: syzbot+c1af344512918c61362c@syzkaller.appspotmail.com Fixes: c59f415a7cb6 ("Use proc_pid_ns() to get pid_namespace from the proc superblock") Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| * | Use proc_pid_ns() to get pid_namespace from the proc superblockAlexey Gladkov2020-04-241-2/+2
| |/ | | | | | | | | | | | | | | | | | | | | To get pid_namespace from the procfs superblock should be used a special helper. This will avoid errors when s_fs_info will change the type. Link: https://lore.kernel.org/lkml/20200423200316.164518-3-gladkov.alexey@gmail.com/ Link: https://lore.kernel.org/lkml/20200423112858.95820-1-gladkov.alexey@gmail.com/ Link: https://lore.kernel.org/lkml/06B50A1C-406F-4057-BFA8-3A7729EA7469@lca.pw/ Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>