summaryrefslogtreecommitdiffstats
path: root/fs/overlayfs/file.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* ovl: convert ovl_real_fdget() callers to ovl_real_file()Amir Goldstein2024-11-151-84/+59
| | | | | | | | | | | Stop using struct fd to return a real file from ovl_real_fdget(), because we no longer return a temporary file object and the callers always get a borrowed file reference. Rename the helper to ovl_real_file(), return a borrowed reference of the real file that is referenced from the overlayfs file or an error. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()Amir Goldstein2024-11-151-25/+34
| | | | | | | | | | | Stop using struct fd to return a real file from ovl_real_fdget_path(), because we no longer return a temporary file object and the callers always get a borrowed file reference. Rename the helper to ovl_real_file_path(), return a borrowed reference of the real file that is referenced from the overlayfs file or an error. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: store upper real file in ovl_file structAmir Goldstein2024-11-151-8/+41
| | | | | | | | | | | When an overlayfs file is opened as lower and then the file is copied up, every operation on the overlayfs open file will open a temporary backing file to the upper dentry and close it at the end of the operation. Store the upper real file along side the original (lower) real file in ovl_file instead of opening a temporary upper file on every operation. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: allocate a container struct ovl_file for ovl private contextAmir Goldstein2024-11-151-6/+34
| | | | | | | Instead of using ->private_data to point at realfile directly, so that we can add more context per ovl open file. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: do not open non-data lower file for fsyncAmir Goldstein2024-11-151-27/+31
| | | | | | | | | | | | | | | ovl_fsync() with !datasync opens a backing file from the top most dentry in the stack, checks if this dentry is non-upper and skips the fsync. In case of an overlay dentry stack with lower data and lower metadata above it, but without an upper metadata above it, the backing file is opened from the top most lower metadata dentry and never used. Refactor the helper ovl_real_fdget_meta() into ovl_real_fdget_path() and open code the checks for non-upper inode in ovl_fsync(), so in that case we can avoid the unneeded backing file open. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: use wrapper ovl_revert_creds()Vinicius Costa Gomes2024-11-111-7/+7
| | | | | | | | | Introduce ovl_revert_creds() wrapper of revert_creds() to match callers of ovl_override_creds(). Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* backing-file: clean up the APIMiklos Szeredi2024-11-111-9/+13
| | | | | | | | | | | | - Pass iocb to ctx->end_write() instead of file + pos - Get rid of ctx->user_file, which is redundant most of the time - Instead pass iocb to backing_file_splice_read and backing_file_splice_write Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* fs: pass offset and result to backing_file end_write() callbackAmir Goldstein2024-10-161-2/+7
| | | | | | | | | This is needed for extending fuse inode size after fuse passthrough write. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-fsdevel/CAJfpegs=cvZ_NYy6Q_D42XhYS=Sjj5poM1b5TzXzOVvX=R36aA@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix file leak in ovl_real_fdget_meta()Amir Goldstein2024-09-271-1/+1
| | | | | | | | | | ovl_open_realfile() is wrongly called twice after conversion to new struct fd. Fixes: 88a2f6468d01 ("struct fd: representation change") Reported-by: syzbot+d9efec94dcbfa0de1c07@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* struct fd: representation changeAl Viro2024-08-131-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want the compiler to see that fdput() on empty instance is a no-op. The emptiness check is that file reference is NULL, while fdput() is "fput() if FDPUT_FPUT is present in flags". The reason why fdput() on empty instance is a no-op is something compiler can't see - it's that we never generate instances with NULL file reference combined with non-zero flags. It's not that hard to deal with - the real primitives behind fdget() et.al. are returning an unsigned long value, unpacked by (inlined) __to_fd() into the current struct file * + int. The lower bits are used to store flags, while the rest encodes the pointer. Linus suggested that keeping this unsigned long around with the extractions done by inlined accessors should generate a sane code and that turns out to be the case. Namely, turning struct fd into a struct-wrapped unsinged long, with fd_empty(f) => unlikely(f.word == 0) fd_file(f) => (struct file *)(f.word & ~3) fdput(f) => if (f.word & 1) fput(fd_file(f)) ends up with compiler doing the right thing. The cost is the patch footprint, of course - we need to switch f.file to fd_file(f) all over the tree, and it's not doable with simple search and replace; there are false positives, etc. Note that the sole member of that structure is an opaque unsigned long - all accesses should be done via wrappers and I don't want to use a name that would invite manual casts to file pointers, etc. The value of that member is equal either to (unsigned long)p | flags, p being an address of some struct file instance, or to 0 for an empty fd. For now the new predicate (fd_empty(f)) has no users; all the existing checks have form (!fd_file(f)). We will convert to fd_empty() use later; here we only define it (and tell the compiler that it's unlikely to return true). This commit only deals with representation change; there will be followups. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* introduce fd_file(), convert all accessors to it.Al Viro2024-08-131-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ovl: implement tmpfileMiklos Szeredi2024-05-021-3/+0
| | | | | | | | | | | | | | | Combine inode creation with opening a file. There are six separate objects that are being set up: the backing inode, dentry and file, and the overlay inode, dentry and file. Cleanup in case of an error is a bit of a challenge and is difficult to test, so careful review is needed. All tmpfile testcases except generic/509 now run/pass, and no regressions are observed with full xfstests. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
* fs: factor out backing_file_mmap() helperAmir Goldstein2023-12-231-17/+6
| | | | | | | | Assert that the file object is allocated in a backing_file container so that file_user_path() could be used to display the user path and not the backing file's path in /proc/<pid>/maps. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* fs: factor out backing_file_splice_{read,write}() helpersAmir Goldstein2023-12-231-20/+13
| | | | | | | | There is not much in those helpers, but it makes sense to have them logically next to the backing_file_{read,write}_iter() helpers as they may grow more common logic in the future. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* fs: factor out backing_file_{read,write}_iter() helpersAmir Goldstein2023-12-231-175/+13
| | | | | | | | | | Overlayfs submits files io to backing files on other filesystems. Factor out some common helpers to perform io to backing files, into fs/backing-file.c. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/r/CAJfpeguhmZbjP3JLqtUy0AdWaHOkAPWeP827BBWwRFEAUgnUcQ@mail.gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* fs: prepare for stackable filesystems backing file helpersAmir Goldstein2023-12-231-0/+1
| | | | | | | | | | | | | | | In preparation for factoring out some backing file io helpers from overlayfs, move backing_file_open() into a new file fs/backing-file.c and header. Add a MAINTAINERS entry for stackable filesystems and add a Kconfig FS_STACK which stackable filesystems need to select. For now, the backing_file struct, the backing_file alloc/free functions and the backing_file_real_path() accessor remain internal to file_table.c. We may change that in the future. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* fs: move kiocb_start_write() into vfs_iocb_iter_write()Amir Goldstein2023-11-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | In vfs code, sb_start_write() is usually called after the permission hook in rw_verify_area(). vfs_iocb_iter_write() is an exception to this rule, where kiocb_start_write() is called by its callers. Move kiocb_start_write() from the callers into vfs_iocb_iter_write() after the rw_verify_area() checks, to make them "start-write-safe". The semantics of vfs_iocb_iter_write() is changed, so that the caller is responsible for calling kiocb_end_write() on completion only if async iocb was queued. The completion handlers of both callers were adapted to this semantic change. This is needed for fanotify "pre content" events. Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-14-amir73il@gmail.com Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
* fs: move file_start_write() into vfs_iter_write()Amir Goldstein2023-11-241-2/+0
| | | | | | | | | | | | | | | | | All the callers of vfs_iter_write() call file_start_write() just before calling vfs_iter_write() except for target_core_file's fd_do_rw(). Move file_start_write() from the callers into vfs_iter_write(). fd_do_rw() calls vfs_iter_write() with a non-regular file, so file_start_write() is a no-op. This is needed for fanotify "pre content" events. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-11-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
* ovl: add helper ovl_file_modified()Amir Goldstein2023-10-301-7/+11
| | | | | | | A simple wrapper for updating ovl inode size/mtime, to conform with ovl_file_accessed(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: punt write aio completion to workqueueAmir Goldstein2023-10-301-1/+41
| | | | | | | | | | | | | | | We want to protect concurrent updates of ovl inode size and mtime (i.e. ovl_copyattr()) from aio completion context. Punt write aio completion to a workqueue so that we can protect ovl_copyattr() with a spinlock. Export sb_init_dio_done_wq(), so that overlayfs can use its own dio workqueue to punt aio completions. Suggested-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/8620dfd3-372d-4ae0-aa3f-2fe97dda1bca@kernel.dk/ Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: propagate IOCB_APPEND flag on writes to realfileAmir Goldstein2023-10-301-1/+1
| | | | | | | | | | | | | | If ovl file is opened O_APPEND, the underlying realfile is also opened O_APPEND, so it makes sense to propagate the IOCB_APPEND flags on sync writes to realfile, just as we do with aio writes. Effectively, because sync ovl writes are protected by inode lock, this change only makes a difference if the realfile is written to (size extending writes) from underneath overlayfs. The behavior in this case is undefined, so it is ok if we change the behavior (to fail the ovl IOCB_APPEND write). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: use simpler function to convert iocb to rw flagsAmir Goldstein2023-10-301-17/+11
| | | | | | | | | | | | Overlayfs implements its own function to translate iocb flags into rw flags, so that they can be passed into another vfs call. With commit ce71bfea207b4 ("fs: align IOCB_* flags with RWF_* flags") Jens created a 1:1 matching between the iocb flags and rw flags, simplifying the conversion. Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* overlayfs: convert to new timestamp accessorsJeff Layton2023-10-181-3/+6
| | | | | | | | Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-58-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
* ovl: fix file reference leak when submitting aioAmir Goldstein2023-10-021-2/+0
| | | | | | | | | | | Commit 724768a39374 ("ovl: fix incorrect fdput() on aio completion") took a refcount on real file before submitting aio, but forgot to avoid clearing FDPUT_FPUT from real.flags stack variable. This can result in a file reference leak. Fixes: 724768a39374 ("ovl: fix incorrect fdput() on aio completion") Reported-by: Gil Lev <contact@levgil.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* Merge tag 'v6.6-rc4.vfs.fixes' of ↵Linus Torvalds2023-09-261-0/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains the usual miscellaneous fixes and cleanups for vfs and individual fses: Fixes: - Revert ki_pos on error from buffered writes for direct io fallback - Add missing documentation for block device and superblock handling for changes merged this cycle - Fix reiserfs flexible array usage - Ensure that overlayfs sets ctime when setting mtime and atime - Disable deferred caller completions with overlayfs writes until proper support exists Cleanups: - Remove duplicate initialization in pipe code - Annotate aio kioctx_table with __counted_by" * tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: overlayfs: set ctime when setting mtime and atime ntfs3: put resources during ntfs_fill_super() ovl: disable IOCB_DIO_CALLER_COMP porting: document superblock as block device holder porting: document new block device opening order fs/pipe: remove duplicate "offset" initializer fs-writeback: do not requeue a clean inode having skipped pages aio: Annotate struct kioctx_table with __counted_by direct_write_fallback(): on error revert the ->ki_pos update from buffered write reiserfs: Replace 1-element array with C99 style flex-array
| * ovl: disable IOCB_DIO_CALLER_COMPJens Axboe2023-09-251-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | overlayfs copies the kiocb flags when it sets up a new kiocb to handle a write, but it doesn't properly support dealing with the deferred caller completions of the kiocb. This means it doesn't get the final write completion value, and hence will complete the write with '0' as the result. We could support the caller completions in overlayfs, but for now let's just disable them in the generated write kiocb. Reported-by: Zorro Lang <zlang@redhat.com> Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/ Fixes: 8c052fb3002e ("iomap: support IOCB_DIO_CALLER_COMP") Signed-off-by: Jens Axboe <axboe@kernel.dk> Message-Id: <71897125-e570-46ce-946a-d4729725e28f@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | ovl: fix incorrect fdput() on aio completionAmir Goldstein2023-09-041-6/+3
|/ | | | | | | | | | | | | | | ovl_{read,write}_iter() always call fdput(real) to put one or zero refcounts of the real file, but for aio, whether it was submitted or not, ovl_aio_put() also calls fdput(), which is not balanced. This is only a problem in the less common case when FDPUT_FPUT flag is set. To fix the problem use get_file() to take file refcount and use fput() instead of fdput() in ovl_aio_put(). Fixes: 2406a307ac7d ("ovl: implement async IO routines") Cc: <stable@vger.kernel.org> # v5.6 Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* Merge tag 'ovl-update-6.6' of ↵Linus Torvalds2023-08-301-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - add verification feature needed by composefs (Alexander Larsson) - improve integration of overlayfs and fanotify (Amir Goldstein) - fortify some overlayfs code (Andrea Righi) * tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: validate superblock in OVL_FS() ovl: make consistent use of OVL_FS() ovl: Kconfig: introduce CONFIG_OVERLAY_FS_DEBUG ovl: auto generate uuid for new overlay filesystems ovl: store persistent uuid/fsid with uuid=on ovl: add support for unique fsid per instance ovl: support encoding non-decodable file handles ovl: Handle verity during copy-up ovl: Validate verity xattr when resolving lowerdata ovl: Add versioned header for overlay.metacopy xattr ovl: Add framework for verity support
| * ovl: Validate verity xattr when resolving lowerdataAlexander Larsson2023-08-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new digest field in the metacopy xattr is used during lookup to record whether the header contained a digest in the OVL_HAS_DIGEST flags. When accessing file data the first time, if OVL_HAS_DIGEST is set, we reload the metadata and check that the source lowerdata inode matches the specified digest in it (according to the enabled verity options). If the verity check passes we store this info in the inode flags as OVL_VERIFIED_DIGEST, so that we can avoid doing it again if the inode remains in memory. The verification is done in ovl_maybe_validate_verity() which needs to be called in the same places as ovl_maybe_lookup_lowerdata(), so there is a new ovl_verify_lowerdata() helper that calls these in the right order, and all current callers of ovl_maybe_lookup_lowerdata() are changed to call it instead. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* | Merge tag 'v6.6-vfs.misc' of ↵Linus Torvalds2023-08-281-8/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual filesystems. Features: - Block mode changes on symlinks and rectify our broken semantics - Report file modifications via fsnotify() for splice - Allow specifying an explicit timeout for the "rootwait" kernel command line option. This allows to timeout and reboot instead of always waiting indefinitely for the root device to show up - Use synchronous fput for the close system call Cleanups: - Get rid of open-coded lockdep workarounds for async io submitters and replace it all with a single consolidated helper - Simplify epoll allocation helper - Convert simple_write_begin and simple_write_end to use a folio - Convert page_cache_pipe_buf_confirm() to use a folio - Simplify __range_close to avoid pointless locking - Disable per-cpu buffer head cache for isolated cpus - Port ecryptfs to kmap_local_page() api - Remove redundant initialization of pointer buf in pipe code - Unexport the d_genocide() function which is only used within core vfs - Replace printk(KERN_ERR) and WARN_ON() with WARN() Fixes: - Fix various kernel-doc issues - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE - Fix a mainly theoretical issue in devpts - Check the return value of __getblk() in reiserfs - Fix a racy assert in i_readcount_dec - Fix integer conversion issues in various functions - Fix LSM security context handling during automounts that prevented NFS superblock sharing" * tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) cachefiles: use kiocb_{start,end}_write() helpers ovl: use kiocb_{start,end}_write() helpers aio: use kiocb_{start,end}_write() helpers io_uring: use kiocb_{start,end}_write() helpers fs: create kiocb_{start,end}_write() helpers fs: add kerneldoc to file_{start,end}_write() helpers io_uring: rename kiocb_end_write() local helper splice: Convert page_cache_pipe_buf_confirm() to use a folio libfs: Convert simple_write_begin and simple_write_end to use a folio fs/dcache: Replace printk and WARN_ON by WARN fs/pipe: remove redundant initialization of pointer buf fs: Fix kernel-doc warnings devpts: Fix kernel-doc warnings doc: idmappings: fix an error and rephrase a paragraph init: Add support for rootwait timeout parameter vfs: fix up the assert in i_readcount_dec fs: Fix one kernel-doc comment docs: filesystems: idmappings: clarify from where idmappings are taken fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing ...
| * | ovl: use kiocb_{start,end}_write() helpersAmir Goldstein2023-08-211-8/+2
| |/ | | | | | | | | | | | | | | | | | | | | Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Message-Id: <20230817141337.1025891-7-amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
* / overlayfs: convert to ctime accessor functionsJeff Layton2023-07-241-2/+5
|/ | | | | | | | | | | | In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-64-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
* Merge tag 'ovl-update-6.5' of ↵Linus Torvalds2023-06-291-2/+19
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs update from Amir Goldstein: - fix two NULL pointer deref bugs (Zhihao Cheng) - add support for "data-only" lower layers destined to be used by composefs - port overlayfs to the new mount api (Christian Brauner) * tag 'ovl-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: (26 commits) ovl: add Amir as co-maintainer ovl: reserve ability to reconfigure mount options with new mount api ovl: modify layer parameter parsing ovl: port to new mount api ovl: factor out ovl_parse_options() helper ovl: store enum redirect_mode in config instead of a string ovl: pass ovl_fs to xino helpers ovl: clarify ovl_get_root() semantics ovl: negate the ofs->share_whiteout boolean ovl: check type and offset of struct vfsmount in ovl_entry ovl: implement lazy lookup of lowerdata in data-only layers ovl: prepare for lazy lookup of lowerdata inode ovl: prepare to store lowerdata redirect for lazy lowerdata lookup ovl: implement lookup in data-only layers ovl: introduce data-only lower layers ovl: remove unneeded goto instructions ovl: deduplicate lowerdata and lowerstack[] ovl: deduplicate lowerpath and lowerstack[] ovl: move ovl_entry into ovl_inode ovl: factor out ovl_free_entry() and ovl_stack_*() helpers ...
| * ovl: implement lazy lookup of lowerdata in data-only layersAmir Goldstein2023-06-191-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Defer lookup of lowerdata in the data-only layers to first data access or before copy up. We perform lowerdata lookup before copy up even if copy up is metadata only copy up. We can further optimize this lookup later if needed. We do best effort lazy lookup of lowerdata for d_real_inode(), because this interface does not expect errors. The only current in-tree caller of d_real_inode() is trace_uprobe and this caller is likely going to be followed reading from the file, before placing uprobes on offset within the file, so lowerdata should be available when setting the uprobe. Tested-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * ovl: prepare for lazy lookup of lowerdata inodeAmir Goldstein2023-06-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the code handle the case of numlower > 1 and missing lowerdata dentry gracefully. Missing lowerdata dentry is an indication for lazy lookup of lowerdata and in that case the lowerdata_redirect path is stored in ovl_inode. Following commits will defer lookup and perform the lazy lookup on access. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds2023-06-261-1/+22
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull splice updates from Jens Axboe: "This kills off ITER_PIPE to avoid a race between truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory corruption. Instead, we either use (a) filemap_splice_read(), which invokes the buffered file reading code and splices from the pagecache into the pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads into it and then pushes the filled pages into the pipe; or (c) handle it in filesystem-specific code. Summary: - Rename direct_splice_read() to copy_splice_read() - Simplify the calculations for the number of pages to be reclaimed in copy_splice_read() - Turn do_splice_to() into a helper, vfs_splice_read(), so that it can be used by overlayfs and coda to perform the checks on the lower fs - Make vfs_splice_read() jump to copy_splice_read() to handle direct-I/O and DAX - Provide shmem with its own splice_read to handle non-existent pages in the pagecache. We don't want a ->read_folio() as we don't want to populate holes, but filemap_get_pages() requires it - Provide overlayfs with its own splice_read to call down to a lower layer as overlayfs doesn't provide ->read_folio() - Provide coda with its own splice_read to call down to a lower layer as coda doesn't provide ->read_folio() - Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs and random files as they just copy to the output buffer and don't splice pages - Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3, ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation - Make cifs use filemap_splice_read() - Replace pointers to generic_file_splice_read() with pointers to filemap_splice_read() as DIO and DAX are handled in the caller; filesystems can still provide their own alternate ->splice_read() op - Remove generic_file_splice_read() - Remove ITER_PIPE and its paraphernalia as generic_file_splice_read was the only user" * tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
| * | overlayfs: Implement splice-readDavid Howells2023-05-241-1/+22
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement splice-read for overlayfs by passing the request down a layer rather than going through generic_file_splice_read() which is going to be changed to assume that ->read_folio() is present on buffered files. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christian Brauner <brauner@kernel.org> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Amir Goldstein <amir73il@gmail.com> cc: linux-unionfs@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230522135018.2742245-11-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | ovl: enable fsnotify events on underlying real filesAmir Goldstein2023-06-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlayfs creates the real underlying files with fake f_path, whose f_inode is on the underlying fs and f_path on overlayfs. Those real files were open with FMODE_NONOTIFY, because fsnotify code was not prapared to handle fsnotify hooks on files with fake path correctly and fanotify would report unexpected event->fd with fake overlayfs path, when the underlying fs was being watched. Teach fsnotify to handle events on the real files, and do not set real files to FMODE_NONOTIFY to allow operations on real file (e.g. open, access, modify, close) to generate async and permission events. Because fsnotify does not have notifications on address space operations, we do not need to worry about ->vm_file not reporting events to a watched overlayfs when users are accessing a mapped overlayfs file. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Message-Id: <20230615112229.2143178-6-amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | fs: use backing_file container for internal files with "fake" f_pathAmir Goldstein2023-06-191-2/+2
|/ | | | | | | | | | | | | | Overlayfs uses open_with_fake_path() to allocate internal kernel files, with a "fake" path - whose f_path is not on the same fs as f_inode. Allocate a container struct backing_file for those internal files, that is used to hold the "fake" ovl path along with the real path. backing_file_real_path() can be used to access the stored real path. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Message-Id: <20230615112229.2143178-5-amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
* fs: port inode_owner_or_capable() to mnt_idmapChristian Brauner2023-01-191-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* fs: port ->permission() to pass mnt_idmapChristian Brauner2023-01-191-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* Merge tag 'ovl-update-6.2' of ↵Linus Torvalds2022-12-131-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: - Fix a couple of bugs found by syzbot - Don't ingore some open flags set by fcntl(F_SETFL) - Fix failure to create a hard link in certain cases - Use type safe helpers for some mnt_userns transformations - Improve performance of mount - Misc cleanups * tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: Kconfig: Fix spelling mistake "undelying" -> "underlying" ovl: use inode instead of dentry where possible ovl: Add comment on upperredirect reassignment ovl: use plain list filler in indexdir and workdir cleanup ovl: do not reconnect upper index records in ovl_indexdir_cleanup() ovl: fix comment typos ovl: port to vfs{g,u}id_t and associated helpers ovl: Use ovl mounter's fsuid and fsgid in ovl_link() ovl: Use "buf" flexible array for memcpy() destination ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags ovl: fix use inode directly in rcu-walk mode
| * ovl: fix comment typosJiangshan Yi2022-12-081-1/+1
| | | | | | | | | | | | | | | | Fix two typos. Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flagsAl Viro2022-11-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | ovl_change_flags() is an open-coded variant of fs/fcntl.c:setfl() and it got missed by commit 164f4064ca81 ("keep iocb_flags() result cached in struct file"); the same change applies there. Reported-by: Pierre Labastie <pierre.labastie@neuf.fr> Fixes: 164f4064ca81 ("keep iocb_flags() result cached in struct file") Cc: <stable@vger.kernel.org> # v6.0 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216738 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | ovl: remove privs in ovl_fallocate()Amir Goldstein2022-10-181-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | Underlying fs doesn't remove privs because fallocate is called with privileged mounter credentials. This fixes some failure in fstests generic/683..687. Fixes: aab8848cee5e ("ovl: add ovl_fallocate()") Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | ovl: remove privs in ovl_copyfile()Amir Goldstein2022-10-181-2/+14
|/ | | | | | | | | | | | Underlying fs doesn't remove privs because copy_range/remap_range are called with privileged mounter credentials. This fixes some failures in fstest generic/673. Fixes: 8ede205541ff ("ovl: add reflink/copyfile/dedup support") Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* overlayfs: constify pathAl Viro2022-09-011-1/+1
| | | | | Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'ovl-update-5.19' of ↵Linus Torvalds2022-05-301-19/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: - Support idmapped layers in overlayfs (Christian Brauner) - Add a fix to exportfs that is relevant to open_by_handle_at(2) as well - Introduce new lookup helpers that allow passing mnt_userns into inode_permission() * tag 'ovl-update-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: support idmapped layers ovl: handle idmappings in ovl_xattr_{g,s}et() ovl: handle idmappings in layer open helpers ovl: handle idmappings in ovl_permission() ovl: use ovl_copy_{real,upper}attr() wrappers ovl: store lower path in ovl_inode ovl: handle idmappings for layer lookup ovl: handle idmappings for layer fileattrs ovl: use ovl_path_getxattr() wrapper ovl: use ovl_lookup_upper() wrapper ovl: use ovl_do_notify_change() wrapper ovl: pass layer mnt to ovl_open_realfile() ovl: pass ofs to setattr operations ovl: handle idmappings in creation operations ovl: add ovl_upper_mnt_userns() wrapper ovl: pass ofs to creation operations ovl: use wrappers to all vfs_*xattr() calls exportfs: support idmapped mounts fs: add two trivial lookup helpers
| * ovl: handle idmappings in layer open helpersChristian Brauner2022-04-281-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | In earlier patches we already passed down the relevant upper or lower path to ovl_open_realfile(). Now let the open helpers actually take the idmapping of the relevant mount into account when checking permissions. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * ovl: use ovl_copy_{real,upper}attr() wrappersChristian Brauner2022-04-281-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When copying inode attributes from the upper or lower layer to ovl inodes we need to take the upper or lower layer's mount's idmapping into account. In a lot of places we call ovl_copyattr() only on upper inodes and in some we call it on either upper or lower inodes. Split this into two separate helpers. The first one should only be called on upper inodes and is thus called ovl_copy_upperattr(). The second one can be called on upper or lower inodes. We add ovl_copy_realattr() for this task. The new helper makes use of the previously added ovl_i_path_real() helper. This is needed to support idmapped base layers with overlay. When overlay copies the inode information from an upper or lower layer to the relevant overlay inode it will apply the idmapping of the upper or lower layer when doing so. The ovl inode ownership will thus always correctly reflect the ownership of the idmapped upper or lower layer. All idmapping helpers are nops when no idmapped base layers are used. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>