summaryrefslogtreecommitdiffstats
path: root/fs/overlayfs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* ovl: fix regression caused by exclusive upper/work dir protectionAmir Goldstein2017-10-052-8/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enforcing exclusive ownership on upper/work dirs caused a docker regression: https://github.com/moby/moby/issues/34672. Euan spotted the regression and pointed to the offending commit. Vivek has brought the regression to my attention and provided this reproducer: Terminal 1: mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ Terminal 2: unshare -m Terminal 1: umount merged mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ mount: /root/overlay-testing/merged: none already mounted or mount point busy To fix the regression, I replaced the error with an alarming warning. With index feature enabled, mount does fail, but logs a suggestion to override exclusive dir protection by disabling index. Note that index=off mount does take the inuse locks, so a concurrent index=off will issue the warning and a concurrent index=on mount will fail. Documentation was updated to reflect this change. Fixes: 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Euan Kemp <euank@euank.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix missing unlock_rename() in ovl_do_copy_up()Amir Goldstein2017-10-054-24/+22
| | | | | | | | | | Use the ovl_lock_rename_workdir() helper which requires unlock_rename() only on lock success. Fixes: ("fd210b7d67ee ovl: move copy up lock out") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix dentry leak in ovl_indexdir_cleanup()Amir Goldstein2017-10-051-2/+4
| | | | | | | | | | index dentry was not released when breaking out of the loop due to index verification error. Fixes: 415543d5c64f ("ovl: cleanup bad and stale index entries on mount") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix dput() of ERR_PTR in ovl_cleanup_index()Amir Goldstein2017-10-051-1/+4
| | | | | | | Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: fix error value printed in ovl_lookup_index()Amir Goldstein2017-10-051-0/+1
| | | | | | | Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* Merge branch 'work.mount' of ↵Linus Torvalds2017-09-151-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount flag updates from Al Viro: "Another chunk of fmount preparations from dhowells; only trivial conflicts for that part. It separates MS_... bits (very grotty mount(2) ABI) from the struct super_block ->s_flags (kernel-internal, only a small subset of MS_... stuff). This does *not* convert the filesystems to new constants; only the infrastructure is done here. The next step in that series is where the conflicts would be; that's the conversion of filesystems. It's purely mechanical and it's better done after the merge, so if you could run something like list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$') sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \ -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \ -e 's/\<MS_NODEV\>/SB_NODEV/g' \ -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \ -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \ -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \ -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \ -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \ -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \ -e 's/\<MS_SILENT\>/SB_SILENT/g' \ -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \ -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \ -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \ -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \ $list and commit it with something along the lines of 'convert filesystems away from use of MS_... constants' as commit message, it would save a quite a bit of headache next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Differentiate mount flags (MS_*) from internal superblock flags VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
| * VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)David Howells2017-07-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com>
* | mm: treewide: remove GFP_TEMPORARY allocation flagMichal Hocko2017-09-143-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's primary motivation was to allow users to tell that an allocation is short lived and so the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions. The current implementation of GFP_TEMPORARY is basically GFP_KERNEL | __GFP_RECLAIMABLE which in itself is tricky because basically none of the existing caller provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits. I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just motivates for cargo cult usage without any reasoning. I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent from confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replace all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention. I can see reasons we might want some gfp flag to reflect shorterm allocations but I propose starting from a clear semantic definition and only then add users with proper justification. This was been brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) its current users. The follow up discussion has revealed that opinions on what might be temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations. [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org [akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up] Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'overlayfs-linus' of ↵Linus Torvalds2017-09-136-56/+395
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This fixes d_ino correctness in readdir, which brings overlayfs on par with normal filesystems regarding inode number semantics, as long as all layers are on the same filesystem. There are also some bug fixes, one in particular (random ioctl's shouldn't be able to modify lower layers) that touches some vfs code, but of course no-op for non-overlay fs" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix false positive ESTALE on lookup ovl: don't allow writing ioctl on lower layer ovl: fix relatime for directories vfs: add flags to d_real() ovl: cleanup d_real for negative ovl: constant d_ino for non-merge dirs ovl: constant d_ino across copy up ovl: fix readdir error value ovl: check snprintf return
| * | ovl: fix false positive ESTALE on lookupAmir Goldstein2017-09-121-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin") verifies that the origin lower inode stored in the overlayfs inode matched the inode of a copy up origin dentry found by lookup. There is a false positive result in that check when lower fs does not support file handles and copy up origin cannot be followed by file handle at lookup time. The false negative happens when finding an overlay inode in cache on a copied up overlay dentry lookup. The overlay inode still 'remembers' the copy up origin inode, but the copy up origin dentry is not available for verification. Relax the check in case copy up origin dentry is not available. Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: fix relatime for directoriesMiklos Szeredi2017-09-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Need to treat non-regular overlayfs files the same as regular files when checking for an atime update. Add a d_real() flag to make it return the upper dentry for all file types. Reported-by: "zhangyi (F)" <yi.zhang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | vfs: add flags to d_real()Miklos Szeredi2017-09-041-2/+2
| | | | | | | | | | | | | | | | | | | | | Add a separate flags argument (in addition to the open flags) to control the behavior of d_real(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: cleanup d_real for negativeMiklos Szeredi2017-09-041-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | d_real() is never called with a negative dentry. So remove the d_is_negative() check (which would never trigger anyway, since d_is_reg() returns false for a negative dentry). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: constant d_ino for non-merge dirsMiklos Szeredi2017-07-275-45/+266
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impure directories are ones which contain objects with origins (i.e. those that have been copied up). These are relevant to readdir operation only because of the d_ino field, no other transformation is necessary. Also a directory can become impure between two getdents(2) calls. This patch creates a cache for impure directories. Unlike the cache for merged directories, this one only contains entries with origin and is not refcounted but has a its lifetime tied to that of the dentry. Similarly to the merged cache, the impure cache is invalidated based on a version number. This version number is incremented when an entry with origin is added or removed from the directory. If the cache is empty, then the impure xattr is removed from the directory. This patch also fixes up handling of d_ino for the ".." entry if the parent directory is merged. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: constant d_ino across copy upAmir Goldstein2017-07-271-1/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When all layers are on the same fs, and iterating a directory which may contain copy up entries, call vfs_getattr() on the overlay entries to make sure that d_ino will be consistent with st_ino from stat(2). There is an overhead of lookup per upper entry in readdir. The overhead is minimal if the iterated entries are already in dcache. It is also quite useful for the common case of 'ls -l' that readdir() pre populates the dcache with the listed entries, making the following stat() calls faster. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: fix readdir error valueMiklos Szeredi2017-07-271-1/+3
| | | | | | | | | | | | | | | | | | | | | actor's return value is taken as a bool (filled/not filled) so we need to return the error in the context. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: check snprintf returnMiklos Szeredi2017-07-271-0/+3
| | | | | | | | | | | | Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | | overlayfs, locking: Remove smp_mb__before_spinlock() usagePeter Zijlstra2017-08-101-2/+2
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | While we could replace the smp_mb__before_spinlock() with the new smp_mb__after_spinlock(), the normal pattern is to use smp_store_release() to publish an object that is used for lockless_dereference() -- and mirrors the regular rcu_assign_pointer() / rcu_dereference() patterns. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | ovl: check for bad and whiteout index on lookupAmir Goldstein2017-07-201-5/+17
| | | | | | | | | | | | | | | | | | | | | | Index should always be of the same file type as origin, except for the case of a whiteout index. A whiteout index should only exist if all lower aliases have been unlinked, which means that finding a lower origin on lookup whose index is a whiteout should be treated as a lookup error. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | ovl: do not cleanup directory and whiteout index entriesAmir Goldstein2017-07-202-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Directory index entries are going to be used for looking up redirected upper dirs by lower dir fh when decoding an overlay file handle of a merge dir. Whiteout index entries are going to be used as an indication that an exported overlay file handle should be treated as stale (i.e. after unlink of the overlay inode). We don't know the verification rules for directory and whiteout index entries, because they have not been implemented yet, so fail to mount overlay rw if those entries are found to avoid corrupting an index that was created by a newer kernel. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | ovl: fix xattr get and set with selinuxMiklos Szeredi2017-07-204-23/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which depends on dentry->d_inode, but d_inode is null and not initialized yet at this point resulting in an Oops. Fix by getting the upperdentry info from the inode directly in this case. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: 09d8b586731b ("ovl: move __upperdentry to ovl_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | ovl: remove unneeded check for IS_ERR()Amir Goldstein2017-07-131-4/+0
| | | | | | | | | | | | | | | | ovl_workdir_create() returns a valid index dentry or NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | ovl: fix origin verification of index dirAmir Goldstein2017-07-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | Commit 54fb347e836f ("ovl: verify index dir matches upper dir") introduced a new ovl_fh flag OVL_FH_FLAG_PATH_UPPER to indicate an upper file handle, but forgot to add the flag to the mask of valid flags, so index dir origin verification always discards existing origin and stores a new one. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | ovl: mark parent impure on ovl_link()Amir Goldstein2017-07-131-4/+18
| | | | | | | | | | | | | | | | | | | | When linking a file with copy up origin into a new parent, mark the new parent dir "impure". Fixes: ee1d6d37b6b8 ("ovl: mark upper dir with type origin entries "impure"") Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | ovl: fix random return value on mountAmir Goldstein2017-07-131-0/+1
|/ | | | | | | | | | | | | On failure to prepare_creds(), mount fails with a random return value, as err was last set to an integer cast of a valid lower mnt pointer or set to 0 if inodes index feature is enabled. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 3fe6e52f0626 ("ovl: override creds with the ones from ...") Cc: <stable@vger.kernel.org> # v4.7 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: mark parent impure and restore timestamp on ovl_link_up()Amir Goldstein2017-07-041-24/+33
| | | | Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* ovl: cleanup orphan index entriesAmir Goldstein2017-07-045-5/+77
| | | | | | | | | | | | | | | | | index entry should live only as long as there are upper or lower hardlinks. Cleanup orphan index entries on mount and when dropping the last overlay inode nlink. When about to cleanup or link up to orphan index and the index inode nlink > 1, admit that something went wrong and adjust overlay nlink to index inode nlink - 1 to prevent it from dropping below zero. This could happen when adding lower hardlinks underneath a mounted overlay and then trying to unlink them. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: persistent overlay inode nlink for indexed inodesAmir Goldstein2017-07-045-3/+204
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With inodes index enabled, an overlay inode nlink counts the union of upper and non-covered lower hardlinks. During the lifetime of a non-pure upper inode, the following nlink modifying operations can happen: 1. Lower hardlink copy up 2. Upper hardlink created, unlinked or renamed over 3. Lower hardlink whiteout or renamed over For the first, copy up case, the union nlink does not change, whether the operation succeeds or fails, but the upper inode nlink may change. Therefore, before copy up, we store the union nlink value relative to the lower inode nlink in the index inode xattr trusted.overlay.nlink. For the second, upper hardlink case, the union nlink should be incremented or decremented IFF the operation succeeds, aligned with nlink change of the upper inode. Therefore, before link/unlink/rename, we store the union nlink value relative to the upper inode nlink in the index inode. For the last, lower cover up case, we simplify things by preceding the whiteout or cover up with copy up. This makes sure that there is an index upper inode where the nlink xattr can be stored before the copied up upper entry is unlink. Return the overlay inode nlinks for indexed upper inodes on stat(2). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: implement index dir copy upAmir Goldstein2017-07-043-37/+103
| | | | | | | | | | | | | | | | | | | | | Implement a copy up method for non-dir objects using index dir to prevent breaking lower hardlinks on copy up. This method requires that the inodes index dir feature was enabled and that all underlying fs support file handle encoding/decoding. On the first lower hardlink copy up, upper file is created in index dir, named after the hex representation of the lower origin inode file handle. On the second lower hardlink copy up, upper file is found in index dir, by the same lower handle key. On either case, the upper indexed inode is then linked to the copy up upper path. The index entry remains linked for future lower hardlink copy up and for lower to upper inode map, that is needed for exporting overlayfs to NFS. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: move copy up lock outMiklos Szeredi2017-07-041-25/+13
| | | | | | | Move ovl_copy_up_start()/ovl_copy_up_end() out so that it's used for both tempfile and workdir copy ups. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: rearrange copy upMiklos Szeredi2017-07-041-36/+50
| | | | | | Split up and rearrange copy up functions to make them better readable. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: add flag for upper in ovl_entryMiklos Szeredi2017-07-047-2/+31
| | | | | | | | For rename, we need to ensure that an upper alias exists for hard links before attempting the operation. Introduce a flag in ovl_entry to track the state of the upper alias. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: use struct copy_up_ctx as function argumentMiklos Szeredi2017-07-041-82/+78
| | | | | | This cleans up functions with too many arguments. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: base tmpfile in workdir tooMiklos Szeredi2017-07-041-5/+3
| | | | Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: factor out ovl_copy_up_inode() helperAmir Goldstein2017-07-041-17/+29
| | | | | | | | Factor out helper for copying lower inode data and metadata to temp upper inode, that is common to copy up using O_TMPFILE and workdir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: extract helper to get temp file in copy upMiklos Szeredi2017-07-041-18/+41
| | | | Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: defer upper dir lock to tempfile linkAmir Goldstein2017-07-042-30/+38
| | | | | | | | On copy up of regular file using an O_TMPFILE, lock upper dir only before linking the tempfile in place. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: hash overlay non-dir inodes by copy up originMiklos Szeredi2017-07-043-9/+44
| | | | Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: cleanup bad and stale index entries on mountAmir Goldstein2017-07-045-10/+130
| | | | | | | | | | | | | | | | | | | Bad index entries are entries whose name does not match the origin file handle stored in trusted.overlay.origin xattr. Bad index entries could be a result of a system power off in the middle of copy up. Stale index entries are entries whose origin file handle is stale. Stale index entries could be a result of copying layers or removing lower entries while the overlay is not mounted. The case of copying layers should be detected earlier by the verification of upper root dir origin and index dir origin. Both bad and stale index entries are detected and removed on mount. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: lookup index entry for copy up originAmir Goldstein2017-07-043-2/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When inodes index feature is enabled, lookup in indexdir for the index entry of lower real inode or copy up origin inode. The index entry name is the hex representation of the lower inode file handle. If the index dentry in negative, then either no lower aliases have been copied up yet, or aliases have been copied up in older kernels and are not indexed. If the index dentry for a copy up origin inode is positive, but points to an inode different than the upper inode, then either the upper inode has been copied up and not indexed or it was indexed, but since then index dir was cleared. Either way, that index cannot be used to indentify the overlay inode. If a positive dentry that matches the upper inode was found, then it is safe to use the copy up origin st_ino for upper hardlinks, because all indexed upper hardlinks are represented by the same overlay inode as the copy up origin. Set the INDEX type flag on an indexed upper dentry. A non-upper dentry may also have a positive index from copy up of another lower hardlink. This situation will be handled by following patches. Index lookup is going to be used to prevent breaking hardlinks on copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: verify index dir matches upper dirAmir Goldstein2017-07-044-8/+27
| | | | | | | | | | | | | | | | | | An index dir contains persistent hardlinks to files in upper dir. Therefore, we must never mount an existing index dir with a differnt upper dir. Store the upper root dir file handle in index dir inode when index dir is created and verify the file handle before using an existing index dir on mount. Add an 'is_upper' flag to the overlay file handle encoding and set it when encoding the upper root file handle. This is not critical for index dir verification, but it is good practice towards a standard overlayfs file handle format for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: verify upper root dir matches lower root dirAmir Goldstein2017-07-044-15/+101
| | | | | | | | | | | When inodes index feature is enabled, verify that the file handle stored in upper root dir matches the lower root dir or fail to mount. If upper root dir has no stored file handle, encode and store the lower root dir file handle in overlay.origin xattr. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: introduce the inodes index dir featureAmir Goldstein2017-07-046-7/+108
| | | | | | | | | | | | | | | Create the index dir on mount. The index dir will contain hardlinks to upper inodes, named after the hex representation of their origin lower inodes. The index dir is going to be used to prevent breaking lower hardlinks on copy up and to implement overlayfs NFS export. Because the feature is not fully backward compat, enabling the feature is opt-in by config/module/mount option. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: generalize ovl_create_workdir()Amir Goldstein2017-07-041-16/+25
| | | | | | | | | | | | | | Pass in the subdir name to create and specify if subdir is persistent or if it should be cleaned up on every mount. Move fallback to readonly mount on failure to create dir and print of error message into the helper. This function is going to be used for creating the persistent 'index' dir under workbasedir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: relax same fs constrain for ovl_check_origin()Amir Goldstein2017-07-041-18/+24
| | | | | | | | For the case of all layers not on the same fs, try to decode the copy up origin file handle on any of the lower layers. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: get exclusive ownership on upper/work dirsAmir Goldstein2017-07-042-3/+29
| | | | | | | | | | | | | | | Bad things can happen if several concurrent overlay mounts try to use the same upperdir/workdir path. Try to get the 'inuse' advisory lock on upperdir and workdir. Fail mount if another overlay mount instance or another user holds the 'inuse' lock on these directories. Note that this provides no protection for concurrent overlay mount that use overlapping (i.e. descendant) upper/work dirs. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* vfs: introduce inode 'inuse' lockAmir Goldstein2017-07-042-0/+33
| | | | | | | | | | | | | | Added an i_state flag I_INUSE and helpers to set/clear/test the bit. The 'inuse' lock is an 'advisory' inode lock, that can be used to extend exclusive create protection beyond parent->i_mutex lock among cooperating users. This is going to be used by overlayfs to get exclusive ownership on upper and work dirs among overlayfs mounts. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: move cache and version to ovl_inodeMiklos Szeredi2017-07-043-17/+13
| | | | Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: use ovl_inode mutex to synchronize concurrent copy upAmir Goldstein2017-07-043-20/+11
| | | | | | | | | | | | Use the new ovl_inode mutex to synchonize concurrent copy up instead of the super block copy up workqueue. Moving the synchronization object from the overlay dentry to the overlay inode is needed for synchonizing concurrent copy up of lower hardlinks to the same upper inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* ovl: move impure to ovl_inodeMiklos Szeredi2017-07-046-17/+26
| | | | Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>