summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2017-07-0322-68/+69
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ...
| * Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar2017-06-2415-58/+90
| |\ | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list namingIngo Molnar2017-06-206-20/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So I've noticed a number of instances where it was not obvious from the code whether ->task_list was for a wait-queue head or a wait-queue entry. Furthermore, there's a number of wait-queue users where the lists are not for 'tasks' but other entities (poll tables, etc.), in which case the 'task_list' name is actively confusing. To clear this all up, name the wait-queue head and entry list structure fields unambiguously: struct wait_queue_head::task_list => ::head struct wait_queue_entry::task_list => ::entry For example, this code: rqw->wait.task_list.next != &wait->task_list ... is was pretty unclear (to me) what it's doing, while now it's written this way: rqw->wait.head.next != &wait->entry ... which makes it pretty clear that we are iterating a list until we see the head. Other examples are: list_for_each_entry_safe(pos, next, &x->task_list, task_list) { list_for_each_entry(wq, &fence->wait.task_list, task_list) { ... where it's unclear (to me) what we are iterating, and during review it's hard to tell whether it's trying to walk a wait-queue entry (which would be a bug), while now it's written as: list_for_each_entry_safe(pos, next, &x->head, entry) { list_for_each_entry(wq, &fence->wait.head, entry) { Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into ↵Ingo Molnar2017-06-203-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <linux/wait_bit.h> The wait_bit*() types and APIs are mixed into wait.h, but they are a pretty orthogonal extension of wait-queues. Furthermore, only about 50 kernel files use these APIs, while over 1000 use the regular wait-queue functionality. So clean up the main wait.h by moving the wait-bit functionality out of it, into a separate .h and .c file: include/linux/wait_bit.h for types and APIs kernel/sched/wait_bit.c for the implementation Update all header dependencies. This reduces the size of wait.h rather significantly, by about 30%. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/wait: Standardize 'struct wait_bit_queue' wait-queue entry field nameIngo Molnar2017-06-204-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly name it as a wait-queue entry. Propagate it to a couple of usage sites where the wait-bit-queue internals are exposed. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/wait: Rename wait_queue_t => wait_queue_entry_tIngo Molnar2017-06-2016-35/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename: wait_queue_t => wait_queue_entry_t 'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'for-4.13/block' of git://git.kernel.dk/linux-blockLinus Torvalds2017-07-0346-260/+463
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block/IO updates from Jens Axboe: "This is the main pull request for the block layer for 4.13. Not a huge round in terms of features, but there's a lot of churn related to some core cleanups. Note this depends on the UUID tree pull request, that Christoph already sent out. This pull request contains: - A series from Christoph, unifying the error/stats codes in the block layer. We now use blk_status_t everywhere, instead of using different schemes for different places. - Also from Christoph, some cleanups around request allocation and IO scheduler interactions in blk-mq. - And yet another series from Christoph, cleaning up how we handle and do bounce buffering in the block layer. - A blk-mq debugfs series from Bart, further improving on the support we have for exporting internal information to aid debugging IO hangs or stalls. - Also from Bart, a series that cleans up the request initialization differences across types of devices. - A series from Goldwyn Rodrigues, allowing the block layer to return failure if we will block and the user asked for non-blocking. - Patch from Hannes for supporting setting loop devices block size to that of the underlying device. - Two series of patches from Javier, fixing various issues with lightnvm, particular around pblk. - A series from me, adding support for write hints. This comes with NVMe support as well, so applications can help guide data placement on flash to improve performance, latencies, and write amplification. - A series from Ming, improving and hardening blk-mq support for stopping/starting and quiescing hardware queues. - Two pull requests for NVMe updates. Nothing major on the feature side, but lots of cleanups and bug fixes. From the usual crew. - A series from Neil Brown, greatly improving the bio rescue set support. Most notably, this kills the bio rescue work queues, if we don't really need them. - Lots of other little bug fixes that are all over the place" * 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits) lightnvm: pblk: set line bitmap check under debug lightnvm: pblk: verify that cache read is still valid lightnvm: pblk: add initialization check lightnvm: pblk: remove target using async. I/Os lightnvm: pblk: use vmalloc for GC data buffer lightnvm: pblk: use right metadata buffer for recovery lightnvm: pblk: schedule if data is not ready lightnvm: pblk: remove unused return variable lightnvm: pblk: fix double-free on pblk init lightnvm: pblk: fix bad le64 assignations nvme: Makefile: remove dead build rule blk-mq: map all HWQ also in hyperthreaded system nvmet-rdma: register ib_client to not deadlock in device removal nvme_fc: fix error recovery on link down. nvmet_fc: fix crashes on bad opcodes nvme_fc: Fix crash when nvme controller connection fails. nvme_fc: replace ioabort msleep loop with completion nvme_fc: fix double calls to nvme_cleanup_cmd() nvme-fabrics: verify that a controller returns the correct NQN nvme: simplify nvme_dev_attrs_are_visible ...
| * | | fs/fcntl: use copy_to/from_user() for u64 typesJens Axboe2017-06-281-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some architectures (at least PPC) doesn't like get/put_user with 64-bit types on a 32-bit system. Use the variably sized copy to/from user variants instead. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: c75b1d9421f8 ("fs: add fcntl() interface for setting/getting write life time hints") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | btrfs: add support for passing in write hints for buffered writesJens Axboe2017-06-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | xfs: add support for passing in write hints for buffered writesJens Axboe2017-06-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | ext4: add support for passing in write hints for buffered writesJens Axboe2017-06-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | fs: add support for buffered writeback to pass down write hintsJens Axboe2017-06-272-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | fs: add O_DIRECT and aio support for sending down write life time hintsJens Axboe2017-06-274-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | fs: add fcntl() interface for setting/getting write life time hintsJens Axboe2017-06-273-0/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define a set of write life time hints: RWH_WRITE_LIFE_NOT_SET No hint information set RWH_WRITE_LIFE_NONE No hints about write life time RWH_WRITE_LIFE_SHORT Data written has a short life time RWH_WRITE_LIFE_MEDIUM Data written has a medium life time RWH_WRITE_LIFE_LONG Data written has a long life time RWH_WRITE_LIFE_EXTREME Data written has an extremely long life time The intent is for these values to be relative to each other, no absolute meaning should be attached to these flag names. Add an fcntl interface for querying these flags, and also for setting them as well: F_GET_RW_HINT Returns the read/write hint set on the underlying inode. F_SET_RW_HINT Set one of the above write hints on the underlying inode. F_GET_FILE_RW_HINT Returns the read/write hint set on the file descriptor. F_SET_FILE_RW_HINT Set one of the above write hints on the file descriptor. The user passes in a 64-bit pointer to get/set these values, and the interface returns 0/-1 on success/error. Sample program testing/implementing basic setting/getting of write hints is below. Add support for storing the write life time hint in the inode flags and in struct file as well, and pass them to the kiocb flags. If both a file and its corresponding inode has a write hint, then we use the one in the file, if available. The file hint can be used for sync/direct IO, for buffered writeback only the inode hint is available. This is in preparation for utilizing these hints in the block layer, to guide on-media data placement. /* * writehint.c: get or set an inode write hint */ #include <stdio.h> #include <fcntl.h> #include <stdlib.h> #include <unistd.h> #include <stdbool.h> #include <inttypes.h> #ifndef F_GET_RW_HINT #define F_LINUX_SPECIFIC_BASE 1024 #define F_GET_RW_HINT (F_LINUX_SPECIFIC_BASE + 11) #define F_SET_RW_HINT (F_LINUX_SPECIFIC_BASE + 12) #endif static char *str[] = { "RWF_WRITE_LIFE_NOT_SET", "RWH_WRITE_LIFE_NONE", "RWH_WRITE_LIFE_SHORT", "RWH_WRITE_LIFE_MEDIUM", "RWH_WRITE_LIFE_LONG", "RWH_WRITE_LIFE_EXTREME" }; int main(int argc, char *argv[]) { uint64_t hint; int fd, ret; if (argc < 2) { fprintf(stderr, "%s: file <hint>\n", argv[0]); return 1; } fd = open(argv[1], O_RDONLY); if (fd < 0) { perror("open"); return 2; } if (argc > 2) { hint = atoi(argv[2]); ret = fcntl(fd, F_SET_RW_HINT, &hint); if (ret < 0) { perror("fcntl: F_SET_RW_HINT"); return 4; } } ret = fcntl(fd, F_GET_RW_HINT, &hint); if (ret < 0) { perror("fcntl: F_GET_RW_HINT"); return 3; } printf("%s: hint %s\n", argv[1], str[hint]); close(fd); return 0; } Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | btrfs: use new block error codeDan Carpenter2017-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function is supposed to return blk_status_t error codes now but there was a stray -ENOMEM left behind. Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: Make most scsi_req_init() calls implicitBart Van Assche2017-06-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of explicitly calling scsi_req_init() after blk_get_request(), call that function from inside blk_get_request(). Add an .initialize_rq_fn() callback function to the block drivers that need it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn() because it is too small to keep it as a separate function. Keep the scsi_req_init() call in ide_prep_sense() because it follows a blk_rq_init() call. References: commit 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | btrfs: nowait aio supportGoldwyn Rodrigues2017-06-202-6/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return EAGAIN if any of the following checks fail + i_rwsem is not lockable + NODATACOW or PREALLOC is not set + Cannot nocow at the desired location + Writing beyond end of file which is not allocated Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | xfs: nowait aio supportGoldwyn Rodrigues2017-06-202-6/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable immediately. IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin if it needs allocation either due to file extension, writing to a hole, or COW or waiting for other DIOs to finish. Return -EAGAIN if we don't have extent list in memory. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | ext4: nowait aio supportGoldwyn Rodrigues2017-06-201-6/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return EAGAIN if any of the following checks fail for direct I/O: + i_rwsem is lockable + Writing beyond end of file (will trigger allocation) + Blocks are not allocated at the write location Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: return on congested block deviceGoldwyn Rodrigues2017-06-201-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new bio operation flag REQ_NOWAIT is introduced to identify bio's orignating from iocb with IOCB_NOWAIT. This flag indicates to return immediately if a request cannot be made instead of retrying. Stacked devices such as md (the ones with make_request_fn hooks) currently are not supported because it may block for housekeeping. For example, an md can have a part of the device suspended. For this reason, only request based devices are supported. In the future, this feature will be expanded to stacked devices by teaching them how to handle the REQ_NOWAIT flags. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | fs: Introduce IOMAP_NOWAITGoldwyn Rodrigues2017-06-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps. This is used by XFS in the XFS patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAITGoldwyn Rodrigues2017-06-201-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RWF_NOWAIT informs kernel to bail out if an AIO request will block for reasons such as file allocations, or a writeback triggered, or would block while allocating requests while performing direct I/O. RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags. FMODE_AIO_NOWAIT is a flag which identifies the file opened is capable of returning -EAGAIN if the AIO call will block. This must be set by supporting filesystems in the ->open() call. Filesystems xfs, btrfs and ext4 would be supported in the following patches. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | fs: Use RWF_* flags for AIO operationsGoldwyn Rodrigues2017-06-201-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | aio_rw_flags is introduced in struct iocb (using aio_reserved1) which will carry the RWF_* flags. We cannot use aio_flags because they are not checked for validity which may break existing applications. Note, the only place RWF_HIPRI comes in effect is dio_await_one(). All the rest of the locations, aio code return -EIOCBQUEUED before the checks for RWF_HIPRI. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | fs: Separate out kiocb flags setup based on RWF_* flagsGoldwyn Rodrigues2017-06-201-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also added RWF_SUPPORTED to encompass all flags. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | blk: replace bioset_create_nobvec() with a flags arg to bioset_create()NeilBrown2017-06-183-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into ↵Christoph Hellwig2017-06-1319-156/+51
| |\ \ \ | | | | | | | | | | | | | | | nvme-base
| * \ \ \ Merge tag 'v4.12-rc5' into for-4.13/blockJens Axboe2017-06-1251-273/+630
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've already got a few conflicts and upcoming work depends on some of the changes that have gone into mainline as regression fixes for this series. Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream trees to continue working on 4.13 changes. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | block: switch bios to blk_status_tChristoph Hellwig2017-06-0934-209/+220
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | | block_dev: propagate bio_iov_iter_get_pages error in __blkdev_direct_IOChristoph Hellwig2017-06-091-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once we move the block layer to its own status code we'll still want to propagate the bio_iov_iter_get_pages, so restructure __blkdev_direct_IO to take ret into account when returning the errno. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | | fs: simplify dio_bio_completeChristoph Hellwig2017-06-091-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only read bio->bi_error once in the common path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | | fs: remove the unused error argument to dio_end_io()Christoph Hellwig2017-06-092-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | | gfs2: remove the unused sd_log_error fieldChristoph Hellwig2017-06-092-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | | nfsd: Check queue type before submitting a SCSI requestBart Van Assche2017-06-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since using scsi_req() is only allowed against request queues for which struct scsi_request is the first member of their private request data, refuse to submit SCSI commands against a queue for which this is not the case. References: commit 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: J. Bruce Fields <bfields@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Omar Sandoval <osandov@fb.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | | | Merge tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuidLinus Torvalds2017-07-0319-156/+51
|\ \ \ \ \ \ | | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull uuid subsystem from Christoph Hellwig: "This is the new uuid subsystem, in which Amir, Andy and I have started consolidating our uuid/guid helpers and improving the types used for them. Note that various other subsystems have pulled in this tree, so I'd like it to go in early. UUID/GUID summary: - introduce the new uuid_t/guid_t types that are going to replace the somewhat confusing uuid_be/uuid_le types and make the terminology fit the various specs, as well as the userspace libuuid library. (me, based on a previous version from Amir) - consolidated generic uuid/guid helper functions lifted from XFS and libnvdimm (Amir and me) - conversions to the new types and helpers (Amir, Andy and me)" * tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid: (34 commits) ACPI: hns_dsaf_acpi_dsm_guid can be static mmc: sdhci-pci: make guid intel_dsm_guid static uuid: Take const on input of uuid_is_null() and guid_is_null() thermal: int340x_thermal: fix compile after the UUID API switch thermal: int340x_thermal: Switch to use new generic UUID API acpi: always include uuid.h ACPI: Switch to use generic guid_t in acpi_evaluate_dsm() ACPI / extlog: Switch to use new generic UUID API ACPI / bus: Switch to use new generic UUID API ACPI / APEI: Switch to use new generic UUID API acpi, nfit: Switch to use new generic UUID API MAINTAINERS: add uuid entry tmpfs: generate random sb->s_uuid scsi_debug: switch to uuid_t nvme: switch to uuid_t sysctl: switch to use uuid_t partitions/ldm: switch to use uuid_t overlayfs: use uuid_t instead of uuid_be fs: switch ->s_uuid to uuid_t ima/policy: switch to use uuid_t ...
| * | | | | overlayfs: use uuid_t instead of uuid_beChristoph Hellwig2017-06-052-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
| * | | | | fs: switch ->s_uuid to uuid_tChristoph Hellwig2017-06-058-27/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For some file systems we still memcpy into it, but in various places this already allows us to use the proper uuid helpers. More to come.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM) Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
| * | | | | xfs: use the common helper uuid_is_null()Amir Goldstein2017-06-056-65/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the common helper uuid_is_null() and remove the xfs specific helper uuid_is_nil(). The common helper does not check for the NULL pointer value as xfs helper did, but xfs code never calls the helper with a pointer that can be NULL. Conform comments and warning strings to use the term 'null uuid' instead of 'nil uuid', because this is the terminology used by lib/uuid.c and its users. It is also the terminology used in userspace by libuuid and xfsprogs. Signed-off-by: Amir Goldstein <amir73il@gmail.com> [hch: remove now unused uuid.[ch]] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
| * | | | | xfs: remove uuid_getnodeuniq and xfs_uu_tChristoph Hellwig2017-06-053-27/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Opencode uuid_getnodeuniq in the only caller, and directly decode the uuid_t representation instead of using a structure cast for it. Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | uuid: hoist helpers uuid_equal() and uuid_copy() from xfsChristoph Hellwig2017-06-052-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These helper are used to compare and copy two uuid_t type objects. Signed-off-by: Amir Goldstein <amir73il@gmail.com> [hch: also provide the respective guid_ versions] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
| * | | | | uuid: rename uuid typesChristoph Hellwig2017-06-051-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our "little endian" UUID really is a Wintel GUID, so rename it and its helpers such (guid_t). The big endian UUID is the only true one, so give it the name uuid_t. The uuid_le and uuid_be names are retained for now, but will hopefully go away soon. The exception to that are the _cmp helpers that will be replaced by better primitives ASAP and thus don't get the new names. Also the _to_bin helpers are named to match the better named uuid_parse routine in userspace. Also remove the existing typedef in XFS that's now been superceeded by the generic type name. Signed-off-by: Christoph Hellwig <hch@lst.de> [andy: also update the UUID_LE/UUID_BE macros including fallout] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | nfsd: namespace-prefix uuid_parseChristoph Hellwig2017-06-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | xfs: use uuid_be to implement the uuid_t typeChristoph Hellwig2017-06-052-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the generic Linux definition to implement our UUID type, this will allow using more generic infrastructure in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | | | | xfs: use uuid_copy() helper to abstract uuid_tAmir Goldstein2017-06-051-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | uuid_t definition is about to change. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | | | | uuid,afs: move struct uuid_v1 back into afsChristoph Hellwig2017-06-053-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This essentially is a partial revert of commit ff548773 ("afs: Move UUID struct to linux/uuid.h") and moves struct uuid_v1 back into fs/afs as struct afs_uuid. It however keeps it as big endian structure so that we can use the normal uuid generation helpers when casting to/from struct afs_uuid. The V1 uuid intrepretation in struct form isn't really useful to the rest of the kernel, and not really compatible to it either, so move it back to AFS instead of polluting the global uuid.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Howells <dhowells@redhat.com>
* | | | | | Merge branch 'overlayfs-linus' of ↵Linus Torvalds2017-06-301-14/+19
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix two bugs in copy-up code. One introduced in 4.11 and one in 4.12-rc" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: don't set origin on broken lower hardlink ovl: copy-up: don't unlock between lookup and link
| * | | | | | ovl: don't set origin on broken lower hardlinkMiklos Szeredi2017-06-281-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When copying up a file that has multiple hard links we need to break any association with the origin file. This makes copy-up be essentially an atomic replace. The new file has nothing to do with the old one (except having the same data and metadata initially), so don't set the overlay.origin attribute. We can relax this in the future when we are able to index upper object by origin. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
| * | | | | | ovl: copy-up: don't unlock between lookup and linkMiklos Szeredi2017-06-281-12/+12
| | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nothing prevents mischief on upper layer while we are busy copying up the data. Move the lookup right before the looked up dentry is actually used. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 01ad3eb8a073 ("ovl: concurrent copy up of regular files") Cc: <stable@vger.kernel.org> # v4.11
* | | | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2017-06-291-1/+4
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "Two fixes that should go into this release. One is an nvme regression fix from Keith, fixing a missing queue freeze if the controller is being reset. This causes the reset to hang. The other is a fix for a leak of the bio protection info, if smaller sized O_DIRECT is used. This fix should be more involved as we have other problematic paths in the kernel, but given as this isn't a regression in this series, we'll tackle those for 4.13" * 'for-linus' of git://git.kernel.dk/linux-block: block: provide bio_uninit() free freeing integrity/task associations nvme/pci: Fix stuck nvme reset
| * | | | | | block: provide bio_uninit() free freeing integrity/task associationsJens Axboe2017-06-281-1/+4
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wen reports significant memory leaks with DIF and O_DIRECT: "With nvme devive + T10 enabled, On a system it has 256GB and started logging /proc/meminfo & /proc/slabinfo for every minute and in an hour it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute leaking. /proc/meminfo | grep SUnreclaim... SUnreclaim: 6752128 kB SUnreclaim: 6874880 kB SUnreclaim: 7238080 kB .... SUnreclaim: 22307264 kB SUnreclaim: 22485888 kB SUnreclaim: 22720256 kB When testcases with T10 enabled call into __blkdev_direct_IO_simple, code doesn't free memory allocated by bio_integrity_alloc. The patch fixes the issue. HTX has been run with +60 hours without failure." Since __blkdev_direct_IO_simple() allocates the bio on the stack, it doesn't go through the regular bio free. This means that any ancillary data allocated with the bio through the stack is not freed. Hence, we can leak the integrity data associated with the bio, if the device is using DIF/DIX. Fix this by providing a bio_uninit() and export it, so that we can use it to free this data. Note that this is a minimal fix for this issue. Any current user of bio's that are allocated outside of bio_alloc_bioset() suffers from this issue, most notably some drivers. We will fix those in a more comprehensive patch for 4.13. This also means that the commit marked as being fixed by this isn't the real culprit, it's just the most obvious one out there. Fixes: 542ff7bf18c6 ("block: new direct I/O implementation") Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | Merge tag 'nfs-for-4.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2017-06-284-30/+29
|\ \ \ \ \ \ | |/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Bugfixes include: - stable fix for exclusive create if the server supports the umask attribute - trunking detection should handle ERESTARTSYS/EINTR - stable fix for a race in the LAYOUTGET function - stable fix to revert "nfs_rename() handle -ERESTARTSYS dentry left behind" - nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()" * tag 'nfs-for-4.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete() Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind" NFSv4.1: Fix a race in nfs4_proc_layoutget NFS: Trunking detection should handle ERESTARTSYS/EINTR NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask