diff options
author | Jeff Layton <jlayton@kernel.org> | 2023-08-07 21:38:40 +0200 |
---|---|---|
committer | Christian Brauner <brauner@kernel.org> | 2023-08-11 09:04:57 +0200 |
commit | ffb6cf19e06334062744b7e3493f71e500964f8e (patch) | |
tree | 77da82b86bbb1418f0903b6abd4e280615d80df9 /fs/stat.c | |
parent | fs: drop the timespec64 argument from update_time (diff) | |
download | linux-ffb6cf19e06334062744b7e3493f71e500964f8e.tar.xz linux-ffb6cf19e06334062744b7e3493f71e500964f8e.zip |
fs: add infrastructure for multigrain timestamps
The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot metadata updates, down to around 1 per jiffy,
even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide to invalidate the cache. Even with NFSv4, a lot of
exported filesystems don't properly support a change attribute and are
subject to the same problems with timestamp granularity. Other
applications have similar issues with timestamps (e.g backup
applications).
If we were to always use fine-grained timestamps, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
What we need is a way to only use fine-grained timestamps when they are
being actively queried.
POSIX generally mandates that when the the mtime changes, the ctime must
also change. The kernel always stores normalized ctime values, so only
the first 30 bits of the tv_nsec field are ever used.
Use the 31st bit of the ctime tv_nsec field to indicate that something
has queried the inode for the mtime or ctime. When this flag is set,
on the next mtime or ctime update, the kernel will fetch a fine-grained
timestamp instead of the usual coarse-grained one.
Filesytems can opt into this behavior by setting the FS_MGTIME flag in
the fstype. Filesystems that don't set this flag will continue to use
coarse-grained timestamps.
Later patches will convert individual filesystems to use the new
infrastructure.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230807-mgctime-v7-9-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Diffstat (limited to 'fs/stat.c')
-rw-r--r-- | fs/stat.c | 41 |
1 files changed, 39 insertions, 2 deletions
diff --git a/fs/stat.c b/fs/stat.c index 7644e5997035..136711ae72fb 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -27,6 +27,37 @@ #include "mount.h" /** + * fill_mg_cmtime - Fill in the mtime and ctime and flag ctime as QUERIED + * @stat: where to store the resulting values + * @request_mask: STATX_* values requested + * @inode: inode from which to grab the c/mtime + * + * Given @inode, grab the ctime and mtime out if it and store the result + * in @stat. When fetching the value, flag it as queried so the next write + * will use a fine-grained timestamp. + */ +void fill_mg_cmtime(struct kstat *stat, u32 request_mask, struct inode *inode) +{ + atomic_long_t *pnsec = (atomic_long_t *)&inode->__i_ctime.tv_nsec; + + /* If neither time was requested, then don't report them */ + if (!(request_mask & (STATX_CTIME|STATX_MTIME))) { + stat->result_mask &= ~(STATX_CTIME|STATX_MTIME); + return; + } + + stat->mtime = inode->i_mtime; + stat->ctime.tv_sec = inode->__i_ctime.tv_sec; + /* + * Atomically set the QUERIED flag and fetch the new value with + * the flag masked off. + */ + stat->ctime.tv_nsec = atomic_long_fetch_or(I_CTIME_QUERIED, pnsec) & + ~I_CTIME_QUERIED; +} +EXPORT_SYMBOL(fill_mg_cmtime); + +/** * generic_fillattr - Fill in the basic attributes from the inode struct * @idmap: idmap of the mount the inode was found from * @request_mask: statx request_mask @@ -58,8 +89,14 @@ void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask, stat->rdev = inode->i_rdev; stat->size = i_size_read(inode); stat->atime = inode->i_atime; - stat->mtime = inode->i_mtime; - stat->ctime = inode_get_ctime(inode); + + if (is_mgtime(inode)) { + fill_mg_cmtime(stat, request_mask, inode); + } else { + stat->mtime = inode->i_mtime; + stat->ctime = inode_get_ctime(inode); + } + stat->blksize = i_blocksize(inode); stat->blocks = inode->i_blocks; |