author | Theodore Ts'o <tytso@mit.edu> | 2015-02-02 06:37:00 +0100 |
---|---|---|
committer | Al Viro <viro@zeniv.linux.org.uk> | 2015-02-05 08:45:00 +0100 |
commit | 0ae45f63d4ef8d8eeec49c7d8b44a1775fff13e8 (patch) | |
tree | 660dbb014482092361eab263847fb906b5a9ec22 /fs/ext4 | |
parent | Linux 3.19-rc7 (diff) | |
vfs: add support for a lazytime mount option
Add a new mount option which enables a new "lazytime" mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated when (a) the inode needs to be updated for some non-time-related
change, (b) userspace calls fsync(), syncfs() or sync(), or
(c) an undeleted inode is about to be evicted from memory.
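The policy described above can be sketched as a small user-space toy model. This is only an illustration of the three write-out triggers, not the kernel implementation: every name here (`toy_inode`, `TOY_DIRTY_SYNC`, `TOY_DIRTY_TIME`, and the `toy_*` helpers) is hypothetical, and the model assumes that ordinary writeback only flushes inodes that carry a non-time change.

```c
#include <assert.h>

/* Hypothetical flags for this toy model; not the kernel's definitions. */
enum {
	TOY_DIRTY_SYNC = 1 << 0,	/* a non-time inode change is pending */
	TOY_DIRTY_TIME = 1 << 1,	/* only in-memory timestamps changed */
};

struct toy_inode {
	long mem_time;		/* timestamp held in the in-memory inode */
	long disk_time;		/* timestamp last written to "disk" */
	unsigned int state;	/* pending dirty flags */
	int disk_writes;	/* number of simulated on-disk inode writes */
};

/* Copy the in-memory inode to "disk" if anything at all is pending. */
static void toy_flush(struct toy_inode *ino)
{
	if (ino->state == 0)
		return;
	ino->disk_time = ino->mem_time;
	ino->state = 0;
	ino->disk_writes++;
}

/* Timestamp update in lazytime mode: touch memory only. */
static void toy_update_time(struct toy_inode *ino, long now)
{
	ino->mem_time = now;
	ino->state |= TOY_DIRTY_TIME;
}

/* Case (a): a non-time change makes the inode eligible for writeback. */
static void toy_change_metadata(struct toy_inode *ino)
{
	ino->state |= TOY_DIRTY_SYNC;
}

/* Ordinary writeback (toy assumption): skip inodes that are time-dirty only. */
static void toy_writeback(struct toy_inode *ino)
{
	if (ino->state & TOY_DIRTY_SYNC)
		toy_flush(ino);
}

/* Case (b): an explicit sync request flushes pending timestamps too. */
static void toy_fsync(struct toy_inode *ino)
{
	toy_flush(ino);
}

/* Case (c): eviction flushes whatever is still pending. */
static void toy_evict(struct toy_inode *ino)
{
	toy_flush(ino);
}

/* Demo: many timestamp updates cost a single simulated disk write. */
static int toy_demo(void)
{
	struct toy_inode ino = {0};

	for (long t = 1; t <= 1000; t++)
		toy_update_time(&ino, t);
	toy_writeback(&ino);		/* time-only dirtiness: nothing written */
	assert(ino.disk_writes == 0);
	toy_fsync(&ino);		/* case (b): timestamps reach "disk" */
	assert(ino.disk_time == 1000);
	return ino.disk_writes;
}
```

Calling `toy_demo()` from a `main()` exercises the fsync path; `toy_change_metadata()` followed by `toy_writeback()`, or `toy_evict()` alone, exercise cases (a) and (c) in the same way.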
This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests them via an fsync(2) call.
For workloads which feature a large number of random writes to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table. The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives. Even on conventional HDDs, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which badly hurt long-tail latencies. That matters a great
deal for web serving tiers, for example.
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Diffstat (limited to 'fs/ext4')
-rw-r--r-- | fs/ext4/inode.c | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5653fa42930b..628df5ba44a6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4840,11 +4840,17 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
  * If the inode is marked synchronous, we don't honour that here - doing
  * so would cause a commit on atime updates, which we don't bother doing.
  * We handle synchronous inodes at the highest possible level.
+ *
+ * If only the I_DIRTY_TIME flag is set, we can skip everything. If
+ * I_DIRTY_TIME and I_DIRTY_SYNC is set, the only inode fields we need
+ * to copy into the on-disk inode structure are the timestamp files.
  */
 void ext4_dirty_inode(struct inode *inode, int flags)
 {
 	handle_t *handle;
 
+	if (flags == I_DIRTY_TIME)
+		return;
 	handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
 	if (IS_ERR(handle))
 		goto out;
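
The early return in the hunk above compares `flags` for equality rather than testing a bit, so it fires only when the time flag is the sole flag passed in. A minimal stand-alone sketch of that distinction follows. The `DEMO_*` bit values and the `demo_*` helpers are placeholders invented for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration; the kernel's real values may differ. */
#define DEMO_I_DIRTY_SYNC	(1 << 0)
#define DEMO_I_DIRTY_TIME	(1 << 1)

/* Mirrors the patch's test: skip only when the time flag is the sole flag. */
static bool demo_can_skip(int flags)
{
	return flags == DEMO_I_DIRTY_TIME;
}

/* A mask test, by contrast, also matches when other flags are combined in. */
static bool demo_has_time_flag(int flags)
{
	return (flags & DEMO_I_DIRTY_TIME) != 0;
}

/* Self-check covering the three cases the code comment distinguishes. */
static void demo_selfcheck(void)
{
	int both = DEMO_I_DIRTY_TIME | DEMO_I_DIRTY_SYNC;

	assert(demo_can_skip(DEMO_I_DIRTY_TIME));	/* time only: skip */
	assert(!demo_can_skip(both));			/* time + sync: do the work */
	assert(!demo_can_skip(DEMO_I_DIRTY_SYNC));	/* sync only: do the work */
	assert(demo_has_time_flag(both));		/* a mask test would match here */
}
```

With a mask test in place of the equality test, the combined time-and-sync case would also return early, which is the case the comment says still needs the timestamps copied to disk.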