summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'upstream-5.2-rc2' of ↵Linus Torvalds2019-05-203-4/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBIFS fixes from Richard Weinberger: - build errors wrt xattrs - mismerge which lead to a wrong Kconfig ifdef - missing endianness conversion * tag 'upstream-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Convert xattr inum to host order ubifs: Use correct config name for encryption ubifs: Fix build error without CONFIG_UBIFS_FS_XATTR
| * ubifs: Convert xattr inum to host orderRichard Weinberger2019-05-151-1/+1
| | | | | | | | | | | | | | | | | | | | UBIFS stores inode numbers as LE64 integers. We have to convert them to host oder, otherwise BE hosts won't be able to use the integer correctly. Reported-by: kbuild test robot <lkp@intel.com> Fixes: 9ca2d7326444 ("ubifs: Limit number of xattrs per inode") Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Use correct config name for encryptionRichard Weinberger2019-05-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | CONFIG_UBIFS_FS_ENCRYPTION is gone, fscrypt is now controlled via CONFIG_FS_ENCRYPTION. This problem slipped into the tree because of a mis-merge on my side. Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: eea2c05d927b ("ubifs: Remove #ifdef around CONFIG_FS_ENCRYPTION") Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Fix build error without CONFIG_UBIFS_FS_XATTRYueHaibing2019-05-151-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix gcc build error while CONFIG_UBIFS_FS_XATTR is not set fs/ubifs/dir.o: In function `ubifs_unlink': dir.c:(.text+0x260): undefined reference to `ubifs_purge_xattrs' fs/ubifs/dir.o: In function `do_rename': dir.c:(.text+0x1edc): undefined reference to `ubifs_purge_xattrs' fs/ubifs/dir.o: In function `ubifs_rmdir': dir.c:(.text+0x2638): undefined reference to `ubifs_purge_xattrs' Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 9ca2d7326444 ("ubifs: Limit number of xattrs per inode") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2019-05-191-3/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge yet more updates from Andrew Morton: "A few final bits: - large changes to vmalloc, yielding large performance benefits - tweak the console-flush-on-panic code - a few fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: panic: add an option to replay all the printk message in buffer initramfs: don't free a non-existent initrd fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro mm/vmalloc.c: keep track of free blocks for vmap allocation
| * | fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going ↵Jiufei Xue2019-05-191-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into workqueue when umount synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb switch may not go to the workqueue after synchronize_rcu(). Thus previous scheduled switches was not finished even flushing the workqueue, which will cause a NULL pointer dereferenced followed below. VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day... BUG: unable to handle kernel NULL pointer dereference at 0000000000000278 evict+0xb3/0x180 iput+0x1b0/0x230 inode_switch_wbs_work_fn+0x3c0/0x6a0 worker_thread+0x4e/0x490 ? process_one_work+0x410/0x410 kthread+0xe6/0x100 ret_from_fork+0x39/0x50 Replace the synchronize_rcu() call with a rcu_barrier() to wait for all pending callbacks to finish. And inc isw_nr_in_flight after call_rcu() in inode_switch_wbs() to make more sense. Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Suggested-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'kbuild-v5.2-2' of ↵Linus Torvalds2019-05-193-5/+4
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - remove unneeded use of cc-option, cc-disable-warning, cc-ldoption - exclude tracked files from .gitignore - re-enable -Wint-in-bool-context warning - refactor samples/Makefile - stop building immediately if syncconfig fails - do not sprinkle error messages when $(CC) does not exist - move arch/alpha/defconfig to the configs subdirectory - remove crappy header search path manipulation - add comment lines to .config to clarify the end of menu blocks - check uniqueness of module names (adding new warnings intentionally) * tag 'kbuild-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (24 commits) kconfig: use 'else ifneq' for Makefile to improve readability kbuild: check uniqueness of module names kconfig: Terminate menu blocks with a comment in the generated config kbuild: add LICENSES to KBUILD_ALLDIRS kbuild: remove 'addtree' and 'flags' magic for header search paths treewide: prefix header search paths with $(srctree)/ media: prefix header search paths with $(srctree)/ media: remove unneeded header search paths alpha: move arch/alpha/defconfig to arch/alpha/configs/defconfig kbuild: terminate Kconfig when $(CC) or $(LD) is missing kbuild: turn auto.conf.cmd into a mandatory include file .gitignore: exclude .get_maintainer.ignore and .gitattributes kbuild: add all Clang-specific flags unconditionally kbuild: Don't try to add '-fcatch-undefined-behavior' flag kbuild: add some extra warning flags unconditionally kbuild: add -Wvla flag unconditionally arch: remove dangling asm-generic wrappers samples: guard sub-directories with CONFIG options kbuild: re-enable int-in-bool-context warning MAINTAINERS: kbuild: Add pattern for scripts/*vmlinux* ...
| * | | treewide: prefix header search paths with $(srctree)/Masahiro Yamada2019-05-183-5/+4
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the Kbuild core manipulates header search paths in a crazy way [1]. To fix this mess, I want all Makefiles to add explicit $(srctree)/ to the search paths in the srctree. Some Makefiles are already written in that way, but not all. The goal of this work is to make the notation consistent, and finally get rid of the gross hacks. Having whitespaces after -I does not matter since commit 48f6e3cf5bc6 ("kbuild: do not drop -I without parameter"). [1]: https://patchwork.kernel.org/patch/9632347/ Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* | | Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2019-05-1912-62/+102
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Some bug fixes, and an update to the URL's for the final version of Unicode 12.1.0" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid panic during forced reboot due to aborted journal ext4: fix block validity checks for journal inodes using indirect blocks unicode: update to Unicode 12.1.0 final unicode: add missing check for an error return from utf8lookup() ext4: fix miscellaneous sparse warnings ext4: unsigned int compared against zero ext4: fix use-after-free in dx_release() ext4: fix data corruption caused by overlapping unaligned and aligned IO jbd2: fix potential double free ext4: zero out the unused memory region in the extent tree block
| * | | ext4: avoid panic during forced reboot due to aborted journalJan Kara2019-05-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handling of aborted journal is a special code path different from standard ext4_error() one and it can call panic() as well. Commit 1dc1097ff60e ("ext4: avoid panic during forced reboot") forgot to update this path so fix that omission. Fixes: 1dc1097ff60e ("ext4: avoid panic during forced reboot") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 5.1
| * | | ext4: fix block validity checks for journal inodes using indirect blocksTheodore Ts'o2019-05-151-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") failed to add an exception for the journal inode in ext4_check_blockref(), which is the function used by ext4_get_branch() for indirect blocks. This caused attempts to read from the ext3-style journals to fail with: [ 848.968550] EXT4-fs error (device sdb7): ext4_get_branch:171: inode #8: block 30343695: comm jbd2/sdb7-8: invalid block Fix this by adding the missing exception check. Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | unicode: update to Unicode 12.1.0 finalTheodore Ts'o2019-05-121-21/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
| * | | unicode: add missing check for an error return from utf8lookup()Theodore Ts'o2019-05-121-0/+2
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
| * | | ext4: fix miscellaneous sparse warningsTheodore Ts'o2019-05-123-3/+3
| | | | | | | | | | | | | | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: unsigned int compared against zeroColin Ian King2019-05-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two cases where u32 variables n and err are being checked for less than zero error values, the checks is always false because the variables are not signed. Fix this by making the variables ints. Addresses-Coverity: ("Unsigned compared against 0") Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: fix use-after-free in dx_release()Sahitya Tummala2019-05-111-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The buffer_head (frames[0].bh) and it's corresping page can be potentially free'd once brelse() is done inside the for loop but before the for loop exits in dx_release(). It can be free'd in another context, when the page cache is flushed via drop_caches_sysctl_handler(). This results into below data abort when accessing info->indirect_levels in dx_release(). Unable to handle kernel paging request at virtual address ffffffc17ac3e01e Call trace: dx_release+0x70/0x90 ext4_htree_fill_tree+0x2d4/0x300 ext4_readdir+0x244/0x6f8 iterate_dir+0xbc/0x160 SyS_getdents64+0x94/0x174 Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org
| * | | ext4: fix data corruption caused by overlapping unaligned and aligned IOLukas Czerner2019-05-111-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unaligned AIO must be serialized because the zeroing of partial blocks of unaligned AIO can result in data corruption in case it's overlapping another in flight IO. Currently we wait for all unwritten extents before we submit unaligned AIO which protects data in case of unaligned AIO is following overlapping IO. However if a unaligned AIO is followed by overlapping aligned AIO we can still end up corrupting data. To fix this, we must make sure that the unaligned AIO is the only IO in flight by waiting for unwritten extents conversion not just before the IO submission, but right after it as well. This problem can be reproduced by xfstest generic/538 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
| * | | jbd2: fix potential double freeChengguang Xu2019-05-113-33/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When failing from creating cache jbd2_inode_cache, we will destroy the previously created cache jbd2_handle_cache twice. This patch fixes this by moving each cache initialization/destruction to its own separate, individual function. Signed-off-by: Chengguang Xu <cgxu519@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
| * | | ext4: zero out the unused memory region in the extent tree blockSriram Rajagopalan2019-05-111-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit zeroes out the unused memory region in the buffer_head corresponding to the extent metablock after writing the extent header and the corresponding extent node entries. This is done to prevent random uninitialized data from getting into the filesystem when the extent block is synced. This fixes CVE-2019-11833. Signed-off-by: Sriram Rajagopalan <sriramr@arista.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* | | | Merge tag '5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2019-05-198-51/+173
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "Minor cleanup and fixes, one for stable, four rdma (smbdirect) related. Also adds SEEK_HOLE support" * tag '5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: add support for SEEK_DATA and SEEK_HOLE Fixed https://bugzilla.kernel.org/show_bug.cgi?id=202935 allow write on the same file cifs: Allocate memory for all iovs in smb2_ioctl cifs: Don't match port on SMBDirect transport cifs:smbd Use the correct DMA direction when sending data cifs:smbd When reconnecting to server, call smbd_destroy() after all MIDs have been called cifs: use the right include for signal_pending() smb3: trivial cleanup to smb2ops.c cifs: cleanup smb2ops.c and normalize strings smb3: display session id in debug data
| * | | cifs: add support for SEEK_DATA and SEEK_HOLERonnie Sahlberg2019-05-163-0/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add llseek op for SEEK_DATA and SEEK_HOLE. Improves xfstests/285,286,436,445,448 and 490 Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | Fixed https://bugzilla.kernel.org/show_bug.cgi?id=202935 allow write on the ↵Kovtunenko Oleksandr2019-05-161-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | same file Copychunk allows source and target to be on the same file. For details on restrictions see MS-SMB2 3.3.5.15.6 Signed-off-by: Kovtunenko Oleksandr <alexander198961@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: Allocate memory for all iovs in smb2_ioctlLong Li2019-05-161-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An IOCTL uses up to 2 iovs. The 1st iov is the command itself, the 2nd iov is optional data for that command. The 1st iov is always allocated on the heap but the 2nd iov may point to a variable on the stack. This will trigger an error when passing the 2nd iov for RDMA I/O. Fix this by allocating a buffer for the 2nd iov. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
| * | | cifs: Don't match port on SMBDirect transportLong Li2019-05-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SMBDirect manages its own ports in the transport layer, there is no need to check the port to find a connection. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
| * | | cifs:smbd Use the correct DMA direction when sending dataLong Li2019-05-141-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sending data, use the DMA_TO_DEVICE to map buffers. Also log the number of requests in a compounding request from upper layer. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
| * | | cifs:smbd When reconnecting to server, call smbd_destroy() after all MIDs ↵Long Li2019-05-141-17/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | have been called commit 214bab448476 ("cifs: Call MID callback before destroying transport") assumes that the MID callback should not take srv_mutex, this may not always be true. SMB Direct requires the MID callback completed before calling transport so all pending memory registration can be freed. So restore the original calling sequence so TCP transport will use the same code, but moving smbd_destroy() after all MID has been called. fixes: 214bab448476 ("cifs: Call MID callback before destroying transport") Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | cifs: use the right include for signal_pending()Ronnie Sahlberg2019-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This header is actually where signal_pending is defined although either would work. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | smb3: trivial cleanup to smb2ops.cSteve French2019-05-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minor cleanup - e.g. missing \n at end of debug statement. Reported-by: Christoph Probst <kernel@probst.it> Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | cifs: cleanup smb2ops.c and normalize stringsChristoph Probst2019-05-091-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix checkpatch warnings/errors in smb2ops.c except "LONG_LINE". Add missing linebreaks, indentings, __func__. Remove void-returns, unneeded braces. Address warnings spotted by checkpatch. Add SPDX License Header. Add missing "\n" and capitalize first letter in some cifs_dbg() strings. Signed-off-by: Christoph Probst <kernel@probst.it> Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | smb3: display session id in debug dataSteve French2019-05-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Displaying the session id in /proc/fs/cifs/DebugData is needed in order to correlate Linux client information with network and server traces for many common support scenarios. Turned out to be very important for debugging. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
* | | | Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2019-05-171-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more vfs mount updates from Al Viro: "Propagation of new syscalls to other architectures + cosmetic change from Christian (fscontext didn't follow the convention for anon inode names)" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: uapi: Wire up the mount API syscalls on non-x86 arches [ver #2] uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2] uapi, fsopen: use square brackets around "fscontext" [ver #2]
| * | | | uapi, fsopen: use square brackets around "fscontext" [ver #2]Christian Brauner2019-05-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the name of the anon inode fd "[fscontext]" instead of "fscontext". This is minor but most core-kernel anon inode fds already carry square brackets around their name: [eventfd] [eventpoll] [fanotify] [io_uring] [pidfd] [signalfd] [timerfd] [userfaultfd] For the sake of consistency lets do the same for the fscontext anon inode fd that comes with the new mount api. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | Merge tag 'for-linus-20190516' of git://git.kernel.dk/linux-blockLinus Torvalds2019-05-171-57/+31
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "A small set of fixes for io_uring. This contains: - smp_rmb() cleanup for io_cqring_events() (Jackie) - io_cqring_wait() simplification (Jackie) - removal of dead 'ev_flags' passing (me) - SQ poll CPU affinity verification fix (me) - SQ poll wait fix (Roman) - SQE command prep cleanup and fix (Stefan)" * tag 'for-linus-20190516' of git://git.kernel.dk/linux-block: io_uring: use wait_event_interruptible for cq_wait conditional wait io_uring: adjust smp_rmb inside io_cqring_events io_uring: fix infinite wait in khread_park() on io_finish_async() io_uring: remove 'ev_flags' argument io_uring: fix failure to verify SQ_AFF cpu io_uring: fix race condition reading SQE data
| * | | | | io_uring: use wait_event_interruptible for cq_wait conditional waitJackie Liu2019-05-161-15/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous patch has ensured that io_cqring_events contain smp_rmb memory barriers, Now we can use wait_event_interruptible to keep the code simple. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | io_uring: adjust smp_rmb inside io_cqring_eventsJackie Liu2019-05-161-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever smp_rmb is required to use io_cqring_events, keep smp_rmb inside the function io_cqring_events. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | io_uring: fix infinite wait in khread_park() on io_finish_async()Roman Penyaev2019-05-161-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes couple of races which lead to infinite wait of park completion with the following backtraces: [20801.303319] Call Trace: [20801.303321] ? __schedule+0x284/0x650 [20801.303323] schedule+0x33/0xc0 [20801.303324] schedule_timeout+0x1bc/0x210 [20801.303326] ? schedule+0x3d/0xc0 [20801.303327] ? schedule_timeout+0x1bc/0x210 [20801.303329] ? preempt_count_add+0x79/0xb0 [20801.303330] wait_for_completion+0xa5/0x120 [20801.303331] ? wake_up_q+0x70/0x70 [20801.303333] kthread_park+0x48/0x80 [20801.303335] io_finish_async+0x2c/0x70 [20801.303336] io_ring_ctx_wait_and_kill+0x95/0x180 [20801.303338] io_uring_release+0x1c/0x20 [20801.303339] __fput+0xad/0x210 [20801.303341] task_work_run+0x8f/0xb0 [20801.303342] exit_to_usermode_loop+0xa0/0xb0 [20801.303343] do_syscall_64+0xe0/0x100 [20801.303349] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [20801.303380] Call Trace: [20801.303383] ? __schedule+0x284/0x650 [20801.303384] schedule+0x33/0xc0 [20801.303386] io_sq_thread+0x38a/0x410 [20801.303388] ? __switch_to_asm+0x40/0x70 [20801.303390] ? wait_woken+0x80/0x80 [20801.303392] ? _raw_spin_lock_irqsave+0x17/0x40 [20801.303394] ? io_submit_sqes+0x120/0x120 [20801.303395] kthread+0x112/0x130 [20801.303396] ? kthread_create_on_node+0x60/0x60 [20801.303398] ret_from_fork+0x35/0x40 o kthread_park() waits for park completion, so io_sq_thread() loop should check kthread_should_park() along with khread_should_stop(), otherwise if kthread_park() is called before prepare_to_wait() the following schedule() never returns: CPU#0 CPU#1 io_sq_thread_stop(): io_sq_thread(): while(!kthread_should_stop() && !ctx->sqo_stop) { ctx->sqo_stop = 1; kthread_park() prepare_to_wait(); if (kthread_should_stop() { } schedule(); <<< nobody checks park flag, <<< so schedule and never return o if the flag ctx->sqo_stop is observed by the io_sq_thread() loop it is quite possible, that kthread_should_park() check and the following kthread_parkme() is never called, because kthread_park() has not been yet called, but few moments later is is called and waits there for park completion, which never happens, because kthread has already exited: CPU#0 CPU#1 io_sq_thread_stop(): io_sq_thread(): ctx->sqo_stop = 1; while(!kthread_should_stop() && !ctx->sqo_stop) { <<< observe sqo_stop and exit the loop } if (kthread_should_park()) kthread_parkme(); <<< never called, since was <<< never parked kthread_park() <<< waits forever for park completion In the current patch we quit the loop by only kthread_should_park() check (kthread_park() is synchronous, so kthread_should_stop() is never observed), and we abandon ->sqo_stop flag, since it is racy. At the end of the io_sq_thread() we unconditionally call parmke(), since we've exited the loop by the park flag. Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | io_uring: remove 'ev_flags' argumentJens Axboe2019-05-151-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We always pass in 0 for the cqe flags argument, since the support for "this read hit page cache" hint was dropped. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | io_uring: fix failure to verify SQ_AFF cpuJens Axboe2019-05-151-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test case we have is rightfully failing with the current kernel: io_uring_setup(1, 0x7ffe2cafebe0), flags: IORING_SETUP_SQPOLL|IORING_SETUP_SQ_AFF, resv: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000, sq_thread_cpu: 4 expected -1, got 3 This is in a vm, and CPU3 is the last valid one, hence asking for 4 should fail the setup with -EINVAL, not succeed. The problem is that we're using array_index_nospec() with nr_cpu_ids as the index, hence we wrap and end up using CPU0 instead of CPU4. This makes the setup succeed where it should be failing. We don't need to use array_index_nospec() as we're not indexing any array with this. Instead just compare with nr_cpu_ids directly. This is fine as we're checking with cpu_online() afterwards. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | io_uring: fix race condition reading SQE dataStefan Bühler2019-05-131-15/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When punting to workers the SQE gets copied after the initial try. There is a race condition between reading SQE data for the initial try and copying it for punting it to the workers. For example io_rw_done calls kiocb->ki_complete even if it was prepared for IORING_OP_FSYNC (and would be NULL). The easiest solution for now is to alway prepare again in the worker. req->file is safe to prepare though as long as it is checked before use. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | Merge tag 'afs-fixes-b-20190516' of ↵Linus Torvalds2019-05-1720-1381/+1383
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS callback promise fixes from David Howells: "This series fixes a bunch of problems in callback promise handling, where a callback promise indicates a promise on the part of the server to notify the client in the event of some sort of change to a file or volume. In the event of a break, the client has to go and refetch the client status from the server and discard any cached permission information as the ACL might have changed. The problem in the current code is that changes made by other clients aren't always noticed, primarily because the file status information and the callback information aren't updated in the same critical section, even if these are carried in the same reply from an RPC operation, and so the AFS_VNODE_CB_PROMISED flag is unreliable. Arranging for them to be done in the same critical section during reply decoding is tricky because of the FS.InlineBulkStatus op - which has all the statuses in the reply arriving and then all the callbacks, so they have to be buffered. It simplifies things a lot to move the critical section out of the decode phase and do it after the RPC function returns. Also new inodes (either newly fetched or newly created) aren't properly managed against a callback break happening before we get the local inode up and running. Fix this by: - There's now a combined file status and callback record (struct afs_status_cb) to carry both plus some flags. - Each operation wrapper function allocates sufficient afs_status_cb records for all the vnodes it is interested in and passes them into RPC operations to be filled in from the reply. - The FileStatus and CallBack record decoders no longer apply the new/revised status and callback information to the inode/vnode at the point of decoding and instead store the information into the record from (2). - afs_vnode_commit_status() then revises the file status, detects deletion and notes callback information inside of a single critical section. It also checks the callback break counters and cancels the callback promise if they changed during the operation. [*] Note that "callback break counters" are counters of server events that cancel one or more callback promises that the client thinks it has. The client counts the events and compares the counters before and after an operation to see if the callback promise it thinks it just got evaporated before it got recorded under lock. - Volume and server callback break counters are passed into afs_iget() allowing callback breaks concurrent with inode set up to be detected and the callback promise thence to be cancelled. - AFS validation checks are now done under RCU conditions using a read lock on cb_lock. This requires vnode->cb_interest to be made RCU safe. - If the checks in (6) fail, the callback breaker is then called under write lock on the cb_lock - but only if the callback break counter didn't change from the value read before the checks were made. - Results from FS.InlineBulkStatus that correspond to inodes we currently have in memory are now used to update those inodes' status and callback information rather than being discarded. This requires those inodes to be looked up before the RPC op is made and all their callback break values saved. To aid in this, the following changes have also been made: - Don't pass the vnode into the reply delivery functions or the decoders. The vnode shouldn't be altered anywhere in those paths. The only exception, for the moment, is for the call done hook for file lock ops that wants access to both the vnode and the call - this can be fixed at a later time. - Get rid of the call->reply[] void* array and replace it with named and typed members. This avoids confusion since different ops were mapping different reply[] members to different things. - Fix an order-1 kmalloc allocation in afs_do_lookup() and replace it with kvcalloc(). - Always get the reply time. Since callback, lock and fileserver record expiry times are calculated for several RPCs, make this mandatory. - Call afs_pages_written_back() from the operation wrapper rather than from the delivery function. - Don't store the version and type from a callback promise in a reply as the information in them is of very limited use" * tag 'afs-fixes-b-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix application of the results of a inline bulk status fetch afs: Pass pre-fetch server and volume break counts into afs_iget5_set() afs: Fix unlink to handle YFS.RemoveFile2 better afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry afs: Make vnode->cb_interest RCU safe afs: Split afs_validate() so first part can be used under LOOKUP_RCU afs: Don't save callback version and type fields afs: Fix application of status and callback to be under same lock afs: Always get the reply time afs: Fix order-1 allocation in afs_do_lookup() afs: Get rid of afs_call::reply[] afs: Don't pass the vnode pointer through into the inline bulk status op
| * | | | | | afs: Fix application of the results of a inline bulk status fetchDavid Howells2019-05-162-7/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix afs_do_lookup() such that when it does an inline bulk status fetch op, it will update inodes that are already extant (something that afs_iget() doesn't do) and to cache permits for each inode created (thereby avoiding a follow up FS.FetchStatus call to determine this). Extant inodes need looking up in advance so that their cb_break counters before and after the operation can be compared. To this end, the inode pointers are cached so that they don't need looking up again after the op. Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Pass pre-fetch server and volume break counts into afs_iget5_set()David Howells2019-05-164-49/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the server and volume break counts from before the status fetch operation that queried the attributes of a file into afs_iget5_set() so that the new vnode's break counters can be initialised appropriately. This allows detection of a volume or server break that happened whilst we were fetching the status or setting up the vnode. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Fix unlink to handle YFS.RemoveFile2 betterDavid Howells2019-05-166-28/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of the status update for the target file that the YFS.RemoveFile2 RPC op returns to correctly update the vnode as to whether the file was actually deleted or just had nlink reduced. Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiryDavid Howells2019-05-161-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix afs_validate() to clear AFS_VNODE_CB_PROMISED on a vnode if we detect any condition that causes the callback promise to be broken implicitly, including server break (cb_s_break), volume break (cb_v_break) or callback expiry. Fixes: ae3b7361dc0e ("afs: Fix validation/callback interaction") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Make vnode->cb_interest RCU safeDavid Howells2019-05-167-59/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use RCU-based freeing for afs_cb_interest struct objects and use RCU on vnode->cb_interest. Use that change to allow afs_check_validity() to use read_seqbegin_or_lock() instead of read_seqlock_excl(). This also requires the caller of afs_check_validity() to hold the RCU read lock across the call. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Split afs_validate() so first part can be used under LOOKUP_RCUDavid Howells2019-05-162-13/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split afs_validate() so that the part that decides if the vnode is still valid can be used under LOOKUP_RCU conditions from afs_d_revalidate(). Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Don't save callback version and type fieldsDavid Howells2019-05-166-16/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't save callback version and type fields as the version is about the format of the callback information and the type is relative to the particular RPC call. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Fix application of status and callback to be under same lockDavid Howells2019-05-1612-1004/+921
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When applying the status and callback in the response of an operation, apply them in the same critical section so that there's no race between checking the callback state and checking status-dependent state (such as the data version). Fix this by: (1) Allocating a joint {status,callback} record (afs_status_cb) before calling the RPC function for each vnode for which the RPC reply contains a status or a status plus a callback. A flag is set in the record to indicate if a callback was actually received. (2) These records are passed into the RPC functions to be filled in. The afs_decode_status() and yfs_decode_status() functions are removed and the cb_lock is no longer taken. (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer update the vnode. (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update the vnode. (5) vnodes, expected data-version numbers and callback break counters (cb_break) no longer need to be passed to the reply delivery functions. Note that, for the moment, the file locking functions still need access to both the call and the vnode at the same time. (6) afs_vnode_commit_status() is now given the cb_break value and the expected data_version and the task of applying the status and the callback to the vnode are now done here. This is done under a single taking of vnode->cb_lock. (7) afs_pages_written_back() is now called by afs_store_data() rather than by the reply delivery function. afs_pages_written_back() has been moved to before the call point and is now given the first and last page numbers rather than a pointer to the call. (8) The indicator from YFS.RemoveFile2 as to whether the target file actually got removed (status.abort_code == VNOVNODE) rather than merely dropping a link is now checked in afs_unlink rather than in xdr_decode_YFSFetchStatus(). Supplementary fixes: (*) afs_cache_permit() now gets the caller_access mask from the afs_status_cb object rather than picking it out of the vnode's status record. afs_fetch_status() returns caller_access through its argument list for this purpose also. (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather than a read lock and now sets the callback inside the same critical section. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Always get the reply timeDavid Howells2019-05-165-16/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Always ask for the reply time from AF_RXRPC as it's used to calculate the callback expiry time and lock expiry times, so it's needed by most FS operations. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Fix order-1 allocation in afs_do_lookup()David Howells2019-05-165-50/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afs_do_lookup() will do an order-1 allocation to allocate status records if there are more than 39 vnodes to stat. Fix this by allocating an array of {status,callback} records for each vnode we want to examine using vmalloc() if larger than a page. This not only gets rid of the order-1 allocation, but makes it easier to grow beyond 50 records for YFS servers. It also allows us to move to {status,callback} tuples for other calls too and makes it easier to lock across the application of the status and the callback to the vnode. Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com>