summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* xfs: fix off-by-one error in xfs_rtalloc_query_rangeDarrick J. Wong2018-06-241-2/+2
| | | | | | | | | | | | | In commit 8ad560d2565e6 ("xfs: strengthen rtalloc query range checks") we strengthened the input parameter checks in the rtbitmap range query function, but introduced an off-by-one error in the process. The call to xfs_rtfind_forw deals with the high key being rextents, but we clamp the high key to rextents - 1. This causes the returned results to stop one block short of the end of the rtdev, which is incorrect. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: fix uninitialized field in rtbitmap fsmap backendDarrick J. Wong2018-06-241-2/+2
| | | | | | | | | | | | | | Initialize the extent count field of the high key so that when we use the high key to synthesize an 'unknown owner' record (i.e. used space record) at the end of the queried range we have a field with which to compute rm_blockcount. This is not strictly necessary because the synthesizer never uses the rm_blockcount field, but we can shut up the static code analysis anyway. Coverity-id: 1437358 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: recheck reflink state after grabbing ILOCK_SHARED for a writeDarrick J. Wong2018-06-241-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reflink iflag could have changed since the earlier unlocked check, so if we got ILOCK_SHARED for a write and but we're now a reflink inode we have to switch to ILOCK_EXCL and relock. This helps us avoid blowing lock assertions in things like generic/166: XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/xfs_reflink.c, line: 383 WARNING: CPU: 1 PID: 24707 at fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs] Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug] CPU: 1 PID: 24707 Comm: xfs_io Not tainted 4.18.0-rc1-djw #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:assfail+0x25/0x30 [xfs] Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 e8 ef 1b a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d fd ba 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9 RSP: 0018:ffffc90006423ad8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880030b65e80 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01b0447 RBP: ffffc90006423c10 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88003d43fc30 R11: f000000000000000 R12: ffff880077cda000 R13: 0000000000000000 R14: ffffc90006423c30 R15: ffffc90006423bf9 FS: 00007feba8986800(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000138ab58 CR3: 000000003d40a000 CR4: 00000000000006a0 Call Trace: xfs_reflink_allocate_cow+0x24c/0x3d0 [xfs] xfs_file_iomap_begin+0x6d2/0xeb0 [xfs] ? iomap_to_fiemap+0x80/0x80 iomap_apply+0x5e/0x130 iomap_dio_rw+0x2e0/0x400 ? iomap_to_fiemap+0x80/0x80 ? xfs_file_dio_aio_write+0x133/0x4a0 [xfs] xfs_file_dio_aio_write+0x133/0x4a0 [xfs] xfs_file_write_iter+0x7b/0xb0 [xfs] __vfs_write+0x16f/0x1f0 vfs_write+0xc8/0x1c0 ksys_pwrite64+0x74/0x90 do_syscall_64+0x56/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: don't allow insert-range to shift extents past the maximum offsetDarrick J. Wong2018-06-244-0/+34
| | | | | | | | | | | | | | | | | Zorro Lang reports that generic/485 blows an assert on a filesystem with 512 byte blocks. The test tries to fallocate a post-eof extent at the maximum file size and calls insert range to shift the extents right by two blocks. On a 512b block filesystem this causes startoff to overflow the 54-bit startoff field, leading to the assert. Therefore, always check the rightmost extent to see if it would overflow prior to invoking the insert range machinery. Reported-by: zlang@redhat.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200137 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: don't trip over negative free space in xfs_reserve_blocksDarrick J. Wong2018-06-241-1/+1
| | | | | | | | | | | | | | | | | If we somehow end up with a filesystem that has fewer free blocks than the blocks set aside to avoid ENOSPC deadlocks, it's possible that the free space calculation in xfs_reserve_blocks will spit out a negative number (because percpu_counter_sum returns s64). We fail to notice this negative number and set fdblks_delta to it. Now we increment fdblocks(!) and the unsigned type of m_resblks means that we end up setting a ridiculously huge m_resblks reservation. Avoid this comedy of errors by detecting the negative free space and returning -ENOSPC. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: allow empty transactions while frozenDarrick J. Wong2018-06-241-1/+6
| | | | | | | | | | | | In commit e89c041338ed6ef ("xfs: implement the GETFSMAP ioctl") we created the ability to obtain empty transactions. These transactions have no log or block reservations and therefore can't modify anything. Since they're also NO_WRITECOUNT they can run while the fs is frozen, so we don't need to WARN_ON about that usage. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: xfs_iflush_abort() can be called twice on cluster writeback failureDave Chinner2018-06-221-36/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a corrupt inode is detected during xfs_iflush_cluster, we can get a shutdown ASSERT failure like this: XFS (pmem1): Metadata corruption detected at xfs_symlink_shortform_verify+0x5c/0xa0, inode 0x86627 data fork XFS (pmem1): Unmount and run xfs_repair XFS (pmem1): xfs_do_force_shutdown(0x8) called from line 3372 of file fs/xfs/xfs_inode.c. Return address = ffffffff814f4116 XFS (pmem1): Corruption of in-memory data detected. Shutting down filesystem XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c. Return address = ffffffff814a8a88 XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c. Return address = ffffffff814a8ef9 XFS (pmem1): Please umount the filesystem and rectify the problem(s) XFS: Assertion failed: xfs_isiflocked(ip), file: fs/xfs/xfs_inode.h, line: 258 ..... Call Trace: xfs_iflush_abort+0x10a/0x110 xfs_iflush+0xf3/0x390 xfs_inode_item_push+0x126/0x1e0 xfsaild+0x2c5/0x890 kthread+0x11c/0x140 ret_from_fork+0x24/0x30 Essentially, xfs_iflush_abort() has been called twice on the original inode that that was flushed. This happens because the inode has been flushed to teh buffer successfully via xfs_iflush_int(), and so when another inode is detected as corrupt in xfs_iflush_cluster, the buffer is marked stale and EIO, and iodone callbacks are run on it. Running the iodone callbacks walks across the original inode and calls xfs_iflush_abort() on it. When xfs_iflush_cluster() returns to xfs_iflush(), it runs the error path for that function, and that calls xfs_iflush_abort() on the inode a second time, leading to the above assert failure as the inode is not flush locked anymore. This bug has been there a long time. The simple fix would be to just avoid calling xfs_iflush_abort() in xfs_iflush() if we've got a failure from xfs_iflush_cluster(). However, xfs_iflush_cluster() has magic delwri buffer handling that means it may or may not have run IO completion on the buffer, and hence sometimes we have to call xfs_iflush_abort() from xfs_iflush(), and sometimes we shouldn't. After reading through all the error paths and the delwri buffer code, it's clear that the error handling in xfs_iflush_cluster() is unnecessary. If the buffer is delwri, it leaves it on the delwri list so that when the delwri list is submitted it sees a shutdown fliesystem in xfs_buf_submit() and that marks the buffer stale, EIO and runs IO completion. i.e. exactly what xfs+iflush_cluster() does when it's not a delwri buffer. Further, marking a buffer stale clears the _XBF_DELWRI_Q flag on the buffer, which means when submission of the buffer occurs, it just skips over it and releases it. IOWs, the error handling in xfs_iflush_cluster doesn't need to care if the buffer is already on a the delwri queue or not - it just needs to mark the buffer stale, EIO and run completions. That means we can just use the easy fix for xfs_iflush() to avoid the double abort. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: More robust inode extent count validationDave Chinner2018-06-222-29/+50
| | | | | | | | | | | | | | | | When the inode is in extent format, it can't have more extents that fit in the inode fork. We don't currenty check this, and so this corruption goes unnoticed by the inode verifiers. This can lead to crashes operating on invalid in-memory structures. Attempts to access such a inode will now error out in the verifier rather than allowing modification operations to proceed. Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix a typedef, add some braces and breaks to shut up compiler warnings] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* xfs: simplify xfs_bmap_punch_delalloc_rangeChristoph Hellwig2018-06-221-53/+32
| | | | | | | | | | | | Instead of using xfs_bmapi_read to find delalloc extents and then punch them out using xfs_bunmapi, opencode the loop to iterate over the extents and call xfs_bmap_del_extent_delay directly. This both simplifies the code and reduces the number of extent tree lookups required. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* Linux 4.18-rc1v4.18-rc1Linus Torvalds2018-06-171-2/+2
|
* Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-blockLinus Torvalds2018-06-1616-275/+174
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "A collection of fixes that should go into -rc1. This contains: - bsg_open vs bsg_unregister race fix (Anatoliy) - NVMe pull request from Christoph, with fixes for regressions in this window, FC connect/reconnect path code unification, and a trace point addition. - timeout fix (Christoph) - remove a few unused functions (Christoph) - blk-mq tag_set reinit fix (Roman)" * tag 'for-linus-20180616' of git://git.kernel.dk/linux-block: bsg: fix race of bsg_open and bsg_unregister block: remov blk_queue_invalidate_tags nvme-fabrics: fix and refine state checks in __nvmf_check_ready nvme-fabrics: handle the admin-only case properly in nvmf_check_ready nvme-fabrics: refactor queue ready check blk-mq: remove blk_mq_tagset_iter nvme: remove nvme_reinit_tagset nvme-fc: fix nulling of queue data on reconnect nvme-fc: remove reinit_request routine blk-mq: don't time out requests again that are in the timeout handler nvme-fc: change controllers first connect to use reconnect path nvme: don't rely on the changed namespace list log nvmet: free smart-log buffer after use nvme-rdma: fix error flow during mapping request data nvme: add bio remapping tracepoint nvme: fix NULL pointer dereference in nvme_init_subsystem blk-mq: reinit q->tag_set_list entry only after grace period
| * bsg: fix race of bsg_open and bsg_unregisterAnatoliy Glagolev2018-06-151-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing implementation allows races between bsg_unregister and bsg_open paths. bsg_unregister and request_queue cleanup and deletion may start and complete right after bsg_get_device (in bsg_open path) retrieves bsg_class_device and releases the mutex. Then bsg_open path touches freed memory of bsg_class_device and request_queue. One possible fix is to hold the mutex all the way through bsg_get_device instead of releasing it after bsg_class_device retrieval. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-Off-By: Anatoliy Glagolev <glagolig@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: remov blk_queue_invalidate_tagsChristoph Hellwig2018-06-153-38/+1
| | | | | | | | | | | | | | | | This function is entirely unused, so remove it and the tag_queue_busy member of struct request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-linusJens Axboe2018-06-1511-224/+154
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe fixes from Christoph: "Fix various little regressions introduced in this merge window, plus a rework of the fibre channel connect and reconnect path to share the code instead of having separate sets of bugs. Last but not least a trivial trace point addition from Hannes." * 'nvme-4.18' of git://git.infradead.org/nvme: nvme-fabrics: fix and refine state checks in __nvmf_check_ready nvme-fabrics: handle the admin-only case properly in nvmf_check_ready nvme-fabrics: refactor queue ready check blk-mq: remove blk_mq_tagset_iter nvme: remove nvme_reinit_tagset nvme-fc: fix nulling of queue data on reconnect nvme-fc: remove reinit_request routine nvme-fc: change controllers first connect to use reconnect path nvme: don't rely on the changed namespace list log nvmet: free smart-log buffer after use nvme-rdma: fix error flow during mapping request data nvme: add bio remapping tracepoint nvme: fix NULL pointer dereference in nvme_init_subsystem
| | * nvme-fabrics: fix and refine state checks in __nvmf_check_readyChristoph Hellwig2018-06-151-20/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - make sure we only allow internally generates commands in any non-live state - only allow connect commands on non-live queues when actually in the new or connecting states - treat all other non-live, non-dead states the same as a default cach-all This fixes a regression where we could not shutdown a controller orderly as we didn't allow the internal generated Property Set command, and also ensures we don't accidentally let a Connect command through in the wrong state. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com>
| | * nvme-fabrics: handle the admin-only case properly in nvmf_check_readyChristoph Hellwig2018-06-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | In the ADMIN_ONLY state we don't have any I/O queues, but we should accept all admin commands without further checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: James Smart <james.smart@broadcom.com>
| | * nvme-fabrics: refactor queue ready checkChristoph Hellwig2018-06-155-50/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the is_connected check to the fibre channel transport, as it has no meaning for other transports. To facilitate this split out a new nvmf_fail_nonready_command helper that is called by the transport when it is asked to handle a command on a queue that is not ready. Also avoid a function call for the queue live fast path by inlining the check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com>
| | * blk-mq: remove blk_mq_tagset_iterChristoph Hellwig2018-06-142-31/+0
| | | | | | | | | | | | | | | | | | | | | Unused now that nvme stopped using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk>
| | * nvme: remove nvme_reinit_tagsetChristoph Hellwig2018-06-142-12/+0
| | | | | | | | | | | | | | | | | | | | | Unused now that all transports stopped using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk>
| | * nvme-fc: fix nulling of queue data on reconnectJames Smart2018-06-141-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reconnect path is calling the init routines to clear a queue structure. But the queue structure has state that perhaps needs to persist as long as the controller is live. Remove the nvme_fc_init_queue() calls on reconnect. The nvme_fc_free_queue() calls will clear state bits and reset any relevant queue state for a new connection. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fc: remove reinit_request routineJames Smart2018-06-141-20/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reinit_request routine is not necessary. Remove support for the op callback. As all that nvme_reinit_tagset() does is itterate and call the reinit routine, it too has no purpose. Remove the call. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fc: change controllers first connect to use reconnect pathJames Smart2018-06-141-57/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code follows the framework that has been in the transports from the beginning where initial link-side controller connect occurs as part of "creating the controller". Thus that first connect fully talks to the controller and obtains values that can then be used in for blk-mq setup, etc. It also means that everything about the controller is fully know before the "create controller" call returns. This has several weaknesses: - The initial create_ctrl call made by the cli will block for a long time as wire transactions are performed synchronously. This delay becomes longer if errors occur or connectivity is lost and retries need to be performed. - Code wise, it means there is a separate connect path for initial controller connect vs the (same) steps used in the reconnect path. - And as there's separate paths, it means there's separate error handling and retry logic. It also plays havoc with the NEW state (should transition out of it after successful initial connect) vs the RESETTING and CONNECTING (reconnect) states that want to be transitioned to on error. - As there's separate paths, to recover from errors and disruptions, it requires separate recovery/retry paths as well and can severely convolute the controller state. This patch reworks the fc transport to use the same connect paths for the initial connection as it uses for reconnect. This makes a single path for error recovery and handling. This patch: - Removes the driving of the initial connect and replaces it with a state transition to CONNECTING and initiating the reconnect thread. A dummy state transition of RESETTING had to be traversed as a direct transtion of NEW->CONNECTING is not allowed. Given that the controller is "new", the RESETTING transition is a simple no-op. Once in the reconnecting thread, the normal behaviors of ctrl_loss_tmo (max_retries * connect_delay) and dev_loss_tmo will apply before the controller is torn down. - Only if the state transitions couldn't be traversed and the reconnect thread not scheduled, will the controller be torn down while in create_ctrl. - The prior code used the controller state of NEW to indicate whether request queues had been initialized or not. For the admin queue, the request queue is always created, so there's no need to check a state. For IO queues, change to tracking whether a successful io request queue create has occurred (e.g. 1st successful connect). - The initial controller id is initialized to the dynamic controller id used in the initial connect message. It will be overwritten by the real controller id once the controller is connected on the wire. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme: don't rely on the changed namespace list logChristoph Hellwig2018-06-131-25/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't optimize our namespace rescan based on the changed namespace list log page as userspace might have changed the content through reading it. Suggested-by: Keith Busch <keith.busch@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@linux.intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
| | * nvmet: free smart-log buffer after useChaitanya Kulkarni2018-06-111-1/+3
| | | | | | | | | | | | | | | | | | | | | Free smart-log buffer allocated in the function after use. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-rdma: fix error flow during mapping request dataMax Gurtovoy2018-06-111-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | After dma mapping the sgl, we map the sgl to nvme sgl descriptor. In case of failure during the last mapping we never dma unmap the sgl. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme: add bio remapping tracepointHannes Reinecke2018-06-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Adding a tracepoint to trace bio remapping for native nvme multipath. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme: fix NULL pointer dereference in nvme_init_subsystemIsrael Rukshin2018-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using nvme-pci driver the nvmf_ctrl_options is NULL. There is no need to check for discovery_nqn flag at non-fabrics controller. Fixes: 181303d0 ("nvme-fabrics: allow duplicate connections to the discovery controller") Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | blk-mq: don't time out requests again that are in the timeout handlerChristoph Hellwig2018-06-142-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can currently call the timeout handler again on a request that has already been handed over to the timeout handler. Prevent that with a new flag. Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce") Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com> Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: reinit q->tag_set_list entry only after grace periodRoman Pen2018-06-111-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not allowed to reinit q->tag_set_list list entry while RCU grace period has not completed yet, otherwise the following soft lockup in blk_mq_sched_restart() happens: [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270] [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000 [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150 [ 1064.256510] Call Trace: [ 1064.256664] <IRQ> [ 1064.256824] blk_mq_free_request+0xea/0x100 [ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client] [ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client] [ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core] [ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client] [ 1064.257669] ib_create_qp+0x321/0x380 [ib_core] [ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core] [ 1064.258007] irq_poll_softirq+0xb7/0xe0 [ 1064.258165] __do_softirq+0x106/0x2a2 [ 1064.258328] irq_exit+0x92/0xa0 [ 1064.258509] do_IRQ+0x4a/0xd0 [ 1064.258660] common_interrupt+0x7a/0x7a [ 1064.258818] </IRQ> Meanwhile another context frees other queue but with the same set of shared tags: [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds. [ 1288.201833] bash D 0 5910 5820 0x00000000 [ 1288.202016] Call Trace: [ 1288.202315] schedule+0x32/0x80 [ 1288.202462] schedule_timeout+0x1e5/0x380 [ 1288.203838] wait_for_completion+0xb0/0x120 [ 1288.204137] __wait_rcu_gp+0x125/0x160 [ 1288.204287] synchronize_sched+0x6e/0x80 [ 1288.204770] blk_mq_free_queue+0x74/0xe0 [ 1288.204922] blk_cleanup_queue+0xc7/0x110 [ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client] [ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client] [ 1288.205548] kernfs_fop_write+0x109/0x180 [ 1288.206328] vfs_write+0xb3/0x1a0 [ 1288.206476] SyS_write+0x52/0xc0 [ 1288.206624] do_syscall_64+0x68/0x1d0 [ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 What happened is the following: 1. There are several MQ queues with shared tags. 2. One queue is about to be freed and now task is in blk_mq_del_queue_tag_set(). 3. Other CPU is in blk_mq_sched_restart() and loops over all queues in tag list in order to find hctx to restart. Because linked list entry was modified in blk_mq_del_queue_tag_set() without proper waiting for a grace period, blk_mq_sched_restart() never ends, spining in list_for_each_entry_rcu_rr(), thus soft lockup. Fix is simple: reinit list entry after an RCU grace period elapsed. Fixes: Fixes: 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list") Cc: stable@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-block@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimentalLinus Torvalds2018-06-16206-339/+372
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull documentation fixes from Mauro Carvalho Chehab: "This solves a series of broken links for files under Documentation, and improves a script meant to detect such broken links (see scripts/documentation-file-ref-check). The changes on this series are: - can.rst: fix a footnote reference; - crypto_engine.rst: Fix two parsing warnings; - Fix a lot of broken references to Documentation/*; - improve the scripts/documentation-file-ref-check script, in order to help detecting/fixing broken references, preventing false-positives. After this patch series, only 33 broken references to doc files are detected by scripts/documentation-file-ref-check" * tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits) fix a series of Documentation/ broken file name references Documentation: rstFlatTable.py: fix a broken reference ABI: sysfs-devices-system-cpu: remove a broken reference devicetree: fix a series of wrong file references devicetree: fix name of pinctrl-bindings.txt devicetree: fix some bindings file names MAINTAINERS: fix location of DT npcm files MAINTAINERS: fix location of some display DT bindings kernel-parameters.txt: fix pointers to sound parameters bindings: nvmem/zii: Fix location of nvmem.txt docs: Fix more broken references scripts/documentation-file-ref-check: check tools/*/Documentation scripts/documentation-file-ref-check: get rid of false-positives scripts/documentation-file-ref-check: hint: dash or underline scripts/documentation-file-ref-check: add a fix logic for DT scripts/documentation-file-ref-check: accept more wildcards at filenames scripts/documentation-file-ref-check: fix help message media: max2175: fix location of driver's companion documentation media: v4l: fix broken video4linux docs locations media: dvb: point to the location of the old README.dvb-usb file ...
| * | | fix a series of Documentation/ broken file name referencesMauro Carvalho Chehab2018-06-158-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As files move around, their previous links break. Fix the references for them. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | Documentation: rstFlatTable.py: fix a broken referenceMauro Carvalho Chehab2018-06-151-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old HOWTO was removed a long time ago. The flat table version is not metioned elsewhere, so just get rid of the text. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | ABI: sysfs-devices-system-cpu: remove a broken referenceMauro Carvalho Chehab2018-06-151-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file doesn't exist anymore: Documentation/cpu-freq/user-guide.txt As the ABI already points to Documentation/cpu-freq, just remove the broken link and the associated text. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | devicetree: fix a series of wrong file referencesMauro Carvalho Chehab2018-06-158-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As files got renamed, their references broke. Manually fix a series of broken refs at the DT bindings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | devicetree: fix name of pinctrl-bindings.txtMauro Carvalho Chehab2018-06-159-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename: pinctrl-binding.txt -> pinctrl-bindings.txt In order to match the current name of this file. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | devicetree: fix some bindings file namesMauro Carvalho Chehab2018-06-153-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were some file movements that changed the location for some DT bindings. Fix them with: scripts/documentation-file-ref-check --fix After manually checking if the new file makes sense. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | MAINTAINERS: fix location of DT npcm filesMauro Carvalho Chehab2018-06-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The specified locations are not right. Fix the wildcard logic to point to the correct directories. Without that, get-maintainer won't get things right: $ ./scripts/get_maintainer.pl --no-git-fallback --no-r --no-n --no-l -f Documentation/devicetree/bindings/arm/cpu-enable-method/nuvoton,npcm750-smp robh+dt@kernel.org (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS) mark.rutland@arm.com (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS) After the patch, it will properly point to NPCM arch maintainers: $ ./scripts/get_maintainer.pl --no-git-fallback --no-r --no-n --no-l -f Documentation/devicetree/bindings/arm/cpu-enable-method/nuvoton,npcm750-smp avifishman70@gmail.com (supporter:ARM/NUVOTON NPCM ARCHITECTURE) tmaimon77@gmail.com (supporter:ARM/NUVOTON NPCM ARCHITECTURE) robh+dt@kernel.org (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS) mark.rutland@arm.com (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS) Cc: Avi Fishman <avifishman70@gmail.com> Cc: Tomer Maimon <tmaimon77@gmail.com> Cc: Patrick Venture <venture@google.com> Cc: Nancy Yuen <yuenn@google.com> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | MAINTAINERS: fix location of some display DT bindingsMauro Carvalho Chehab2018-06-151-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Those files got a manufacturer's name prepended and were moved around. Adjust their references accordingly. Also, due those movements, Documentation/devicetree/bindings/video doesn't exist anymore. Cc: David Airlie <airlied@linux.ie> Cc: David Lechner <david@lechnology.com> Cc: Peter Senna Tschudin <peter.senna@collabora.com> Cc: Martin Donnelly <martin.donnelly@ge.com> Cc: Martyn Welch <martyn.welch@collabora.co.uk> Cc: Stefan Agner <stefan@agner.ch> Cc: Alison Wang <alison.wang@nxp.com> Cc: Eric Anholt <eric@anholt.net> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | kernel-parameters.txt: fix pointers to sound parametersMauro Carvalho Chehab2018-06-151-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The alsa parameters file was renamed to alsa-configuration.rst. With regards to OSS, it got retired as a hole by at changeset 727dede0ba8a ("sound: Retire OSS"). So, it doesn't make sense to keep mentioning it at kernel-parameters.txt. Fixes: 727dede0ba8a ("sound: Retire OSS") Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | bindings: nvmem/zii: Fix location of nvmem.txtMauro Carvalho Chehab2018-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The location pointed there is missing "bindings/" on its path. Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | docs: Fix more broken referencesMauro Carvalho Chehab2018-06-1523-41/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we move stuff around, some doc references are broken. Fix some of them via this script: ./scripts/documentation-file-ref-check --fix Manually checked that produced results are valid. Acked-by: Matthias Brugger <matthias.bgg@gmail.com> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | scripts/documentation-file-ref-check: check tools/*/DocumentationMauro Carvalho Chehab2018-06-151-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some files, like tools/memory-model/README has references to a Documentation file that is locale to it. Handle references that are relative to them too. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | scripts/documentation-file-ref-check: get rid of false-positivesMauro Carvalho Chehab2018-06-151-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the number of broken refs are smaller, improve the logic that gets rid of false-positives. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | scripts/documentation-file-ref-check: hint: dash or underlineMauro Carvalho Chehab2018-06-151-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes, people use dash instead of underline or vice-versa. Try to autocorrect it. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | scripts/documentation-file-ref-check: add a fix logic for DTMauro Carvalho Chehab2018-06-151-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several links broken due to DT file movements. Add a hint logic to seek for those changes. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | scripts/documentation-file-ref-check: accept more wildcards at filenamesMauro Carvalho Chehab2018-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | at MAINTAINERS, some filename paths use '?' and things like [7,9]. So, accept more wildcards, in order to avoid false-positives. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | scripts/documentation-file-ref-check: fix help messageMauro Carvalho Chehab2018-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The name of the --fix option was renamed, but it was not changed at the quick help message. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | media: max2175: fix location of driver's companion documentationMauro Carvalho Chehab2018-06-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a missing ".rst" at the doc's file name. Acked-by: Ramesh Shanmugasundaram <Ramesh.shanmugasundaram@bp.renesas.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | media: v4l: fix broken video4linux docs locationsMauro Carvalho Chehab2018-06-157-15/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several places pointing to old documentation files: Documentation/video4linux/API.html Documentation/video4linux/bttv/ Documentation/video4linux/cx2341x/fw-encoder-api.txt Documentation/video4linux/m5602.txt Documentation/video4linux/v4l2-framework.txt Documentation/video4linux/videobuf Documentation/video4linux/Zoran Make them point to the new location where available, removing otherwise. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
| * | | media: dvb: point to the location of the old README.dvb-usb fileMauro Carvalho Chehab2018-06-1545-45/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file got renamed, but the references still point to the old place. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>