summaryrefslogtreecommitdiffstats
path: root/drivers/block (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2019-03-101-4/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "Several fixes, most notably fix for virtio on swiotlb systems" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: silence an unused-variable warning virtio: hint if callbacks surprisingly might sleep virtio-ccw: wire up ->bus_name callback s390/virtio: handle find on invalid queue gracefully virtio-ccw: diag 500 may return a negative cookie virtio_balloon: remove the unnecessary 0-initialization virtio-balloon: improve update_balloon_size_func virtio-blk: Consider virtio_max_dma_size() for maximum segment size virtio: Introduce virtio_max_dma_size() dma: Introduce dma_max_mapping_size() swiotlb: Add is_swiotlb_active() function swiotlb: Introduce swiotlb_max_mapping_size()
| * virtio-blk: Consider virtio_max_dma_size() for maximum segment sizeJoerg Roedel2019-03-061-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | Segments can't be larger than the maximum DMA mapping size supported on the platform. Take that into account when setting the maximum segment size for a block device. Cc: stable@vger.kernel.org Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-blockLinus Torvalds2019-03-088-49/+52
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: "Not a huge amount of changes in this round, the biggest one is that we finally have Mings multi-page bvec support merged. Apart from that, this pull request contains: - Small series that avoids quiescing the queue for sysfs changes that match what we currently have (Aleksei) - Series of bcache fixes (via Coly) - Series of lightnvm fixes (via Mathias) - NVMe pull request from Christoph. Nothing major, just SPDX/license cleanups, RR mp policy (Hannes), and little fixes (Bart, Chaitanya). - BFQ series (Paolo) - Save blk-mq cpu -> hw queue mapping, removing a pointer indirection for the fast path (Jianchao) - fops->iopoll() added for async IO polling, this is a feature that the upcoming io_uring interface will use (Christoph, me) - Partition scan loop fixes (Dongli) - mtip32xx conversion from managed resource API (Christoph) - cdrom registration race fix (Guenter) - MD pull from Song, two minor fixes. - Various documentation fixes (Marcos) - Multi-page bvec feature. This brings a lot of nice improvements with it, like more efficient splitting, larger IOs can be supported without growing the bvec table size, and so on. (Ming) - Various little fixes to core and drivers" * tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits) block: fix updating bio's front segment size block: Replace function name in string with __func__ nbd: propagate genlmsg_reply return code floppy: remove set but not used variable 'q' null_blk: fix checking for REQ_FUA block: fix NULL pointer dereference in register_disk fs: fix guard_bio_eod to check for real EOD errors blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map block: optimize bvec iteration in bvec_iter_advance block: introduce mp_bvec_for_each_page() for iterating over page block: optimize blk_bio_segment_split for single-page bvec block: optimize __blk_segment_map_sg() for single-page bvec block: introduce bvec_nth_page() iomap: wire up the iopoll method block: add bio_set_polled() helper block: wire up block device iopoll method fs: add an iopoll method to struct file_operations loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() loop: do not print warn message if partition scan is successful block: bounce: make sure that bvec table is updated ...
| * | nbd: propagate genlmsg_reply return codeLi RongQing2019-02-281-2/+1
| | | | | | | | | | | | | | | | | | | | | genlmsg_reply can fail, so propagate its return code Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | floppy: remove set but not used variable 'q'YueHaibing2019-02-281-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes gcc '-Wunused-but-set-variable' warning: drivers/block/floppy.c: In function 'request_done': drivers/block/floppy.c:2233:24: warning: variable 'q' set but not used [-Wunused-but-set-variable] It's never used and can be removed. Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | null_blk: fix checking for REQ_FUAHeinz Mauelshagen2019-02-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | null_handle_bio() erroneously uses the bio_op macro which masks respective request flag bits including REQ_FUA out thus failing the check. Fix by checking bio->bi_opf directly. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()Dongli Zhang2019-02-221-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") moves blkdev_reread_part() out of the loop_ctl_mutex. However, GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result, __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and will not rescan the loop device to delete all partitions. Below are steps to reproduce the issue: step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100 step2 # losetup -P /dev/loop0 tmp.raw step3 # parted /dev/loop0 mklabel gpt step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1 step5 # losetup -d /dev/loop0 Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and there is below kernel warning message: [ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22) This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part(). Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | loop: do not print warn message if partition scan is successfulDongli Zhang2019-02-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Do not print warn message when the partition scan returns 0. Fixes: d57f3374ba48 ("loop: Move special partition reread handling in loop_clr_fd()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: kill BLK_MQ_F_SG_MERGEMing Lei2019-02-155-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: loop: pass multi-page bvec to iov_iterMing Lei2019-02-151-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iov_iter is implemented on bvec itererator helpers, so it is safe to pass multi-page bvec to it, and this way is much more efficient than passing one page in each bvec. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: kill QUEUE_FLAG_FLUSH_NQJens Axboe2019-02-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | We have various helpers for setting/clearing this flag, and also a helper to check if the queue supports queueable flushes or not. But nobody uses them anymore, kill it with fire. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | mtip32xx: ѕtop abusing the managed resource APIsChristoph Hellwig2019-01-311-21/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mtip32xx driver uses managed resources for DMA coherent memory and irqs, but then always pairs them with free calls anyway, making the resource tracking rather pointless. Given some DMA allocations are transient anyway, the irq freeing seems to require ordering vs other hardware access the best solution seems to be to stop using the managed resource API entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | lib/lzo: separate lzo-rle from lzoDave Rodgman2019-03-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To prevent any issues with persistent data, separate lzo-rle from lzo so that it is treated as a separate algorithm, and lzo is still available. Link: http://lkml.kernel.org/r/20190205155944.16007-3-dave.rodgman@arm.com Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com> Cc: Matt Sealey <matt.sealey@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <nitingupta910@gmail.com> Cc: Richard Purdie <rpurdie@openedhand.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Sonny Rao <sonnyrao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'driver-core-5.1-rc1' of ↵Linus Torvalds2019-03-061-26/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...
| * | | Merge 5.0-rc6 into driver-core-nextGreg Kroah-Hartman2019-02-111-2/+3
| |\| | | | | | | | | | | | | | | | | | | | | | We need the debugfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | block: rbd: convert to use BUS_ATTR_WO and ROGreg Kroah-Hartman2019-01-221-26/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are trying to get rid of BUS_ATTR() and the usage of that in rbd.c can be trivially converted to use BUS_ATTR_WO and RO, so use those macros instead. Cc: Sage Weil <sage@redhat.com> Cc: Alex Elder <elder@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual2019-03-061-2/+3
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | floppy: check_events callback should not return a negative numberYufen Yu2019-02-121-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | floppy_check_events() is supposed to return bit flags to say which events occured. We should return zero to say that no event flags are set. Only BIT(0) and BIT(1) are used in the caller. And .check_events interface also expect to return an unsigned int value. However, after commit a0c80efe5956, it may return -EINTR (-4u). Here, both BIT(0) and BIT(1) are cleared. So this patch shouldn't affect runtime, but it obviously is still worth fixing. Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: a0c80efe5956 ("floppy: fix lock_fdc() signal handling") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'for-linus-20190118' of git://git.kernel.dk/linux-blockLinus Torvalds2019-01-191-2/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - block size setting fixes for loop/nbd (Jan Kara) - md bio_alloc_mddev() cleanup (Marcos) - Ensure we don't lose the REQ_INTEGRITY flag (Ming) - Two NVMe fixes by way of Christoph: - Fix NVMe IRQ calculation (Ming) - Uninitialized variable in nvmet-tcp (Sagi) - BFQ comment fix (Paolo) - License cleanup for recently added blk-mq-debugfs-zoned (Thomas) * tag 'for-linus-20190118' of git://git.kernel.dk/linux-block: block: Cleanup license notice nvme-pci: fix nvme_setup_irqs() nvmet-tcp: fix uninitialized variable access block: don't lose track of REQ_INTEGRITY flag blockdev: Fix livelocks on loop device nbd: Use set_blocksize() to set device blocksize md: Make bio_alloc_mddev use bio_alloc_bioset block, bfq: fix comments on __bfq_deactivate_entity
| * nbd: Use set_blocksize() to set device blocksizeJan Kara2019-01-151-2/+3
| | | | | | | | | | | | | | | | | | | | NBD can update block device block size implicitely through bd_set_size(). Make it explicitely set blocksize with set_blocksize() as this behavior of bd_set_size() is going away. CC: Josef Bacik <jbacik@fb.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'for-linus-20190112' of git://git.kernel.dk/linux-blockLinus Torvalds2019-01-122-2/+34
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - NVMe pull request from Christoph, with little fixes all over the map - Loop caching fix for offset/bs change (Jaegeuk Kim) - Block documentation tweaks (Jeff, Jon, Weiping, John) - null_blk zoned tweak (John) - ahch mvebu suspend/resume support. Should have gone into the merge window, but there was some confusion on which tree had it. (Miquel) * tag 'for-linus-20190112' of git://git.kernel.dk/linux-block: (22 commits) ata: ahci: mvebu: request PHY suspend/resume for Armada 3700 ata: ahci: mvebu: add Armada 3700 initialization needed for S2RAM ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs ata: ahci: mvebu: remove stale comment ata: libahci_platform: comply to PHY framework loop: drop caches if offset or block_size are changed block: fix kerneldoc comment for blk_attempt_plug_merge() nvme: don't initlialize ctrl->cntlid twice nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN nvme: pad fake subsys NQN vid and ssvid with zeros nvme-multipath: zero out ANA log buffer nvme-fabrics: unset write/poll queues for discovery controllers nvme-tcp: don't ask if controller is fabrics nvme-tcp: remove dead code nvme-pci: fix out of bounds access in nvme_cqe_pending nvme-pci: rerun irq setup on IO queue init errors nvme-pci: use the same attributes when freeing host_mem_desc_bufs. nvme-pci: fix the wrong setting of nr_maps block: doc: add slice_idle_us to bfq documentation block: clarify documentation for blk_{start|finish}_plug ...
| * loop: drop caches if offset or block_size are changedJaegeuk Kim2019-01-101-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we don't drop caches used in old offset or block_size, we can get old data from new offset/block_size, which gives unexpected data to user. For example, Martijn found a loopback bug in the below scenario. 1) LOOP_SET_FD loads first two pages on loop file 2) LOOP_SET_STATUS64 changes the offset on the loop file 3) mount is failed due to the cached pages having wrong superblock Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Reported-by: Martijn Coenen <maco@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * null_blk: add zoned config support informationJohn Pittman2019-01-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the kernel is built without CONFIG_BLK_DEV_ZONED, a modprobe of the null_blk driver with zoned=1 fails with 'Invalid argument'. This can be confusing to users, prompting a search as to why the parameter is invalid. To assist in that search, add a bit more information to the failure, additionally adding to the documentation that CONFIG_BLK_DEV_ZONED is needed for zoned=1. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: John Pittman <jpittman@redhat.com> Added null_blk prefix to error message. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'remove-dma_zalloc_coherent-5.0' of ↵Linus Torvalds2019-01-121-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/users/hch/dma-mapping Pull dma_zalloc_coherent() removal from Christoph Hellwig: "We've always had a weird situation around dma_zalloc_coherent. To safely support mapping the allocations to userspace major architectures like x86 and arm have always zeroed allocations from dma_alloc_coherent, but a couple other architectures were missing that zeroing either always or in corner cases. Then later we grew anothe dma_zalloc_coherent interface to explicitly request zeroing, but that just added __GFP_ZERO to the allocation flags, which for some allocators that didn't end up using the page allocator ended up being a no-op and still not zeroing the allocations. So for this merge window I fixed up all remaining architectures to zero the memory in dma_alloc_coherent, and made dma_zalloc_coherent a no-op wrapper around dma_alloc_coherent, which fixes all of the above issues. dma_zalloc_coherent is now pointless and can go away, and Luis helped me writing a cocchinelle script and patch series to kill it, which I think we should apply now just after -rc1 to finally settle these issue" * tag 'remove-dma_zalloc_coherent-5.0' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: remove dma_zalloc_coherent() cross-tree: phase out dma_zalloc_coherent() on headers cross-tree: phase out dma_zalloc_coherent()
| * | cross-tree: phase out dma_zalloc_coherent()Luis Chamberlain2019-01-081-2/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superflous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>
* | Merge tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-clientLinus Torvalds2019-01-111-5/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "A patch to allow setting abort_on_full and a fix for an old "rbd unmap" edge case, marked for stable" * tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-client: rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set ceph: use vmf_error() in ceph_filemap_fault() libceph: allow setting abort_on_full for rbd
| * | rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is setIlya Dryomov2019-01-101-5/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a window between when RBD_DEV_FLAG_REMOVING is set and when the device is removed from rbd_dev_list. During this window, we set "already" and return 0. Returning 0 from write(2) can confuse userspace tools because 0 indicates that nothing was written. In particular, "rbd unmap" will retry the write multiple times a second: 10:28:05.463299 write(4, "0", 1) = 0 10:28:05.463509 write(4, "0", 1) = 0 10:28:05.463720 write(4, "0", 1) = 0 10:28:05.463942 write(4, "0", 1) = 0 10:28:05.464155 write(4, "0", 1) = 0 Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
* / zram: idle writeback fixes and cleanupMinchan Kim2019-01-092-26/+69
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch includes some fixes and cleanup for idle-page writeback. 1. writeback_limit interface Now writeback_limit interface is rather conusing. For example, once writeback limit budget is exausted, admin can see 0 from /sys/block/zramX/writeback_limit which is same semantic with disable writeback_limit at this moment. IOW, admin cannot tell that zero came from disable writeback limit or exausted writeback limit. To make the interface clear, let's sepatate enable of writeback limit to another knob - /sys/block/zram0/writeback_limit_enable * before: while true : # to re-enable writeback limit once previous one is used up echo 0 > /sys/block/zram0/writeback_limit echo $((200<<20)) > /sys/block/zram0/writeback_limit .. .. # used up the writeback limit budget * new # To enable writeback limit, from the beginning, admin should # enable it. echo $((200<<20)) > /sys/block/zram0/writeback_limit echo 1 > /sys/block/zram/0/writeback_limit_enable while true : echo $((200<<20)) > /sys/block/zram0/writeback_limit .. .. # used up the writeback limit budget It's much strightforward. 2. fix condition check idle/huge writeback mode check The mode in writeback_store is not bit opeartion any more so no need to use bit operations. Furthermore, current condition check is broken in that it does writeback every pages regardless of huge/idle. 3. clean up idle_store No need to use goto. [minchan@kernel.org: missed spin_lock_init] Link: http://lkml.kernel.org/r/20190103001601.GA255139@google.com Link: http://lkml.kernel.org/r/20181224033529.19450-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Suggested-by: John Dias <joaodias@google.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: John Dias <joaodias@google.com> Cc: Srinivas Paladugu <srnvs@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block: sunvdc: don't run hw queue synchronously from irq contextMing Lei2019-01-031-1/+1
| | | | | | | | | | | | | | | | vdc_blk_queue_start() may be called from irq context, so we can't run queue via blk_mq_start_hw_queues() since we never allow to run queue from irq context. Use blk_mq_start_stopped_hw_queues(q, true) to fix this issue. Fixes: fa182a1fa97dff56cd ("sunvdc: convert to blk-mq") Reported-by: Anatoly Pugachev <matorola@gmail.com> Tested-by: Anatoly Pugachev <matorola@gmail.com> Cc: Anatoly Pugachev <matorola@gmail.com> Cc: sparclinux@vger.kernel.org Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2019-01-031-2/+81
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio/vhost updates from Michael Tsirkin: "Features, fixes, cleanups: - discard in virtio blk - misc fixes and cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: correct the related warning message vhost: split structs into a separate header file virtio: remove deprecated VIRTIO_PCI_CONFIG() vhost/vsock: switch to a mutex for vhost_vsock_hash virtio_blk: add discard and write zeroes support
| * virtio_blk: add discard and write zeroes supportChangpeng Liu2018-12-201-2/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 88c85538, "virtio-blk: add discard and write zeroes features to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio block specification has been extended to add VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for discard and write zeroes in the virtio-blk driver when the device advertises the corresponding features, VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES. Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* | Merge tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-blockLinus Torvalds2019-01-0315-100/+437
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block updates from Jens Axboe: - Dead code removal for loop/sunvdc (Chengguang) - Mark BIDI support for bsg as deprecated, logging a single dmesg warning if anyone is actually using it (Christoph) - blkcg cleanup, killing a dead function and making the tryget_closest variant easier to read (Dennis) - Floppy fixes, one fixing a regression in swim3 (Finn) - lightnvm use-after-free fix (Gustavo) - gdrom leak fix (Wenwen) - a set of drbd updates (Lars, Luc, Nathan, Roland) * tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-block: (28 commits) block/swim3: Fix regression on PowerBook G3 block/swim3: Fix -EBUSY error when re-opening device after unmount block/swim3: Remove dead return statement block/amiflop: Don't log error message on invalid ioctl gdrom: fix a memory leak bug lightnvm: pblk: fix use-after-free bug block: sunvdc: remove redundant code block: loop: remove redundant code bsg: deprecate BIDI support in bsg blkcg: remove unused __blkg_release_rcu() blkcg: clean up blkg_tryget_closest() drbd: Change drbd_request_detach_interruptible's return type to int drbd: Avoid Clang warning about pointless switch statment drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire") drbd: skip spurious timeout (ping-timeo) when failing promote drbd: don't retry connection if peers do not agree on "authentication" settings drbd: fix print_st_err()'s prototype to match the definition drbd: avoid spurious self-outdating with concurrent disconnect / down drbd: do not block when adjusting "disk-options" while IO is frozen drbd: fix comment typos ...
| * | block/swim3: Fix regression on PowerBook G3Finn Thain2018-12-311-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of v4.20, the swim3 driver crashes when loaded on a PowerBook G3 (Wallstreet). MacIO PCI driver attached to Gatwick chipset MacIO PCI driver attached to Heathrow chipset swim3 0.00015000:floppy: [fd0] SWIM3 floppy controller in media bay 0.00013020:ch-a: ttyS0 at MMIO 0xf3013020 (irq = 16, base_baud = 230400) is a Z85c30 ESCC - Serial port 0.00013000:ch-b: ttyS1 at MMIO 0xf3013000 (irq = 17, base_baud = 230400) is a Z85c30 ESCC - Infrared port macio: fixed media-bay irq on gatwick macio: fixed left floppy irqs swim3 1.00015000:floppy: [fd1] Couldn't request interrupt Unable to handle kernel paging request for data at address 0x00000024 Faulting instruction address: 0xc02652f8 Oops: Kernel access of bad area, sig: 11 [#1] BE SMP NR_CPUS=2 PowerMac Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0 #2 NIP: c02652f8 LR: c026915c CTR: c0276d1c REGS: df43ba10 TRAP: 0300 Not tainted (4.20.0) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28228288 XER: 00000100 DAR: 00000024 DSISR: 40000000 GPR00: c026915c df43bac0 df439060 c0731524 df494700 00000000 c06e1c08 00000001 GPR08: 00000001 00000000 df5ff220 00001032 28228282 00000000 c0004ca4 00000000 GPR16: 00000000 00000000 00000000 c073144c dfffe064 c0731524 00000120 c0586108 GPR24: c073132c c073143c c073143c 00000000 c0731524 df67cd70 df494700 00000001 NIP [c02652f8] blk_mq_free_rqs+0x28/0xf8 LR [c026915c] blk_mq_sched_tags_teardown+0x58/0x84 Call Trace: [df43bac0] [c0045f50] flush_workqueue_prep_pwqs+0x178/0x1c4 (unreliable) [df43bae0] [c026915c] blk_mq_sched_tags_teardown+0x58/0x84 [df43bb00] [c02697f0] blk_mq_exit_sched+0x9c/0xb8 [df43bb20] [c0252794] elevator_exit+0x84/0xa4 [df43bb40] [c0256538] blk_exit_queue+0x30/0x50 [df43bb50] [c0256640] blk_cleanup_queue+0xe8/0x184 [df43bb70] [c034732c] swim3_attach+0x330/0x5f0 [df43bbb0] [c034fb24] macio_device_probe+0x58/0xec [df43bbd0] [c032ba88] really_probe+0x1e4/0x2f4 [df43bc00] [c032bd28] driver_probe_device+0x64/0x204 [df43bc20] [c0329ac4] bus_for_each_drv+0x60/0xac [df43bc50] [c032b824] __device_attach+0xe8/0x160 [df43bc80] [c032ab38] bus_probe_device+0xa0/0xbc [df43bca0] [c0327338] device_add+0x3d8/0x630 [df43bcf0] [c0350848] macio_add_one_device+0x444/0x48c [df43bd50] [c03509f8] macio_pci_add_devices+0x168/0x1bc [df43bd90] [c03500ec] macio_pci_probe+0xc0/0x10c [df43bda0] [c02ad884] pci_device_probe+0xd4/0x184 [df43bdd0] [c032ba88] really_probe+0x1e4/0x2f4 [df43be00] [c032bd28] driver_probe_device+0x64/0x204 [df43be20] [c032bfcc] __driver_attach+0x104/0x108 [df43be40] [c0329a00] bus_for_each_dev+0x64/0xb4 [df43be70] [c032add8] bus_add_driver+0x154/0x238 [df43be90] [c032ca24] driver_register+0x84/0x148 [df43bea0] [c0004aa0] do_one_initcall+0x40/0x188 [df43bf00] [c0690100] kernel_init_freeable+0x138/0x1d4 [df43bf30] [c0004cbc] kernel_init+0x18/0x10c [df43bf40] [c00121e4] ret_from_kernel_thread+0x14/0x1c Instruction dump: 5484d97e 4bfff4f4 9421ffe0 7c0802a6 bf410008 7c9e2378 90010024 8124005c 2f890000 419e0078 81230004 7c7c1b78 <81290024> 2f890000 419e0064 81440000 ---[ end trace 12025ab921a9784c ]--- Reverting commit 8ccb8cb1892b ("swim3: convert to blk-mq") resolves the problem. That commit added a struct blk_mq_tag_set to struct floppy_state and initialized it with a blk_mq_init_sq_queue() call. Unfortunately, there is a memset() in swim3_add_device() that subsequently clears the floppy_state struct. That means fs->tag_set->ops is a NULL pointer, and it gets dereferenced by blk_mq_free_rqs() which gets called in the request_irq() error path. Move the memset() to fix this bug. BTW, the request_irq() failure for the left mediabay floppy (fd1) is not a regression. I don't know why it happens. The right media bay floppy (fd0) works fine however. Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Fixes: 8ccb8cb1892b ("swim3: convert to blk-mq") Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/swim3: Fix -EBUSY error when re-opening device after unmountFinn Thain2018-12-311-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the block device is opened with FMODE_EXCL, ref_count is set to -1. This value doesn't get reset when the device is closed which means the device cannot be opened again. Fix this by checking for refcount <= 0 in the release method. Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/swim3: Remove dead return statementFinn Thain2018-12-311-1/+0
| | | | | | | | | | | | | | | | | | Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/amiflop: Don't log error message on invalid ioctlFinn Thain2018-12-311-2/+0
| | | | | | | | | | | | | | | | | | Cc: linux-m68k@lists.linux-m68k.org Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: sunvdc: remove redundant codeChengguang Xu2018-12-221-1/+0
| | | | | | | | | | | | | | | | | | | | | Code cleanup for removing redundant break in switch case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: loop: remove redundant codeChengguang Xu2018-12-221-1/+0
| | | | | | | | | | | | | | | | | | | | | Code cleanup for removing redundant break in switch case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: Change drbd_request_detach_interruptible's return type to intNathan Chancellor2018-12-202-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clang warns when an implicit conversion is done between enumerated types: drivers/block/drbd/drbd_state.c:708:8: warning: implicit conversion from enumeration type 'enum drbd_ret_code' to different enumeration type 'enum drbd_state_rv' [-Wenum-conversion] rv = ERR_INTR; ~ ^~~~~~~~ drbd_request_detach_interruptible's only call site is in the return statement of adm_detach, which returns an int. Change the return type of drbd_request_detach_interruptible to match, silencing Clang's warning. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")Lars Ellenberg2018-12-209-30/+251
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And also re-enable partial-zero-out + discard aligned. With the introduction of REQ_OP_WRITE_ZEROES, we started to use that for both WRITE_ZEROES and DISCARDS, hoping that WRITE_ZEROES would "do what we want", UNMAP if possible, zero-out the rest. The example scenario is some LVM "thin" backend. While an un-allocated block on dm-thin reads as zeroes, on a dm-thin with "skip_block_zeroing=true", after a partial block write allocated that block, that same block may well map "undefined old garbage" from the backends on LBAs that have not yet been written to. If we cannot distinguish between zero-out and discard on the receiving side, to avoid "undefined old garbage" to pop up randomly at later times on supposedly zero-initialized blocks, we'd need to map all discards to zero-out on the receiving side. But that would potentially do a full alloc on thinly provisioned backends, even when the expectation was to unmap/trim/discard/de-allocate. We need to distinguish on the protocol level, whether we need to guarantee zeroes (and thus use zero-out, potentially doing the mentioned full-alloc), or if we want to put the emphasis on discard, and only do a "best effort zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing only potential unaligned head and tail clippings to at least *try* to avoid "false positives" in an online-verify later), hoping that someone set skip_block_zeroing=false. For some discussion regarding this on dm-devel, see also https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html For backward compatibility, P_TRIM means zero-out, unless the DRBD_FF_WZEROES feature flag is agreed upon during handshake. To have upper layers even try to submit WRITE ZEROES requests, we need to announce "efficient zeroout" independently. We need to fixup max_write_zeroes_sectors after blk_queue_stack_limits(): if we can handle "zeroes" efficiently on the protocol, we want to do that, even if our backend does not announce max_write_zeroes_sectors itself. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: skip spurious timeout (ping-timeo) when failing promoteLars Ellenberg2018-12-201-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you try to promote a Secondary while connected to a Primary and allow-two-primaries is NOT set, we will wait for "ping-timeout" to give this node a chance to detect a dead primary, in case the cluster manager noticed faster than we did. But if we then are *still* connected to a Primary, we fail (after an additional timeout of ping-timout). This change skips the spurious second timeout. Most people won't notice really, since "ping-timeout" by default is half a second. But in some installations, ping-timeout may be 10 or 20 seconds or more, and spuriously delaying the error return becomes annoying. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: don't retry connection if peers do not agree on "authentication" settingsLars Ellenberg2018-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | emma: "Unexpected data packet AuthChallenge (0x0010)" ava: "expected AuthChallenge packet, received: ReportProtocol (0x000b)" "Authentication of peer failed, trying again." Pattern repeats. There is no point in retrying the handshake, if we expect to receive an AuthChallenge, but the peer is not even configured to expect or use a shared secret. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: fix print_st_err()'s prototype to match the definitionLuc Van Oostenryck2018-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | print_st_err() is defined with its 4th argument taking an 'enum drbd_state_rv' but its prototype use an int for it. Fix this by using 'enum drbd_state_rv' in the prototype too. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: avoid spurious self-outdating with concurrent disconnect / downLars Ellenberg2018-12-201-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If peers are "simultaneously" told to disconnect from each other, either explicitly, or implicitly by taking down the resource, with bad timing, one side may see its disconnect "fail" with a result of "state change failed by peer", and interpret this as "please oudate yourself". Try to catch this by checking for current connection status, and possibly retry as local-only state change instead. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: do not block when adjusting "disk-options" while IO is frozenLars Ellenberg2018-12-201-8/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "suspending" IO is overloaded. It can mean "do not allow new requests" (obviously), but it also may mean "must not complete pending IO", for example while the fencing handlers do their arbitration. When adjusting disk options, we suspend io (disallow new requests), then wait for the activity-log to become unused (drain all IO completions), and possibly replace it with a new activity log of different size. If the other "suspend IO" aspect is active, pending IO completions won't happen, and we would block forever (unkillable drbdsetup process). Fix this by skipping the activity log adjustment if the "al-extents" setting did not change. Also, in case it did change, fail early without blocking if it looks like we would block forever. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: fix comment typosLars Ellenberg2018-12-202-2/+2
| | | | | | | | | | | | | | | Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: reject attach of unsuitable uuids even if connectedLars Ellenberg2018-12-202-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multiple failure scenario: a) all good Connected Primary/Secondary UpToDate/UpToDate b) lose disk on Primary, Connected Primary/Secondary Diskless/UpToDate c) continue to write to the device, changes only make it to the Secondary storage. d) lose disk on Secondary, Connected Primary/Secondary Diskless/Diskless e) now try to re-attach on Primary This would have succeeded before, even though that is clearly the wrong data set to attach to (missing the modifications from c). Because we only compared our "effective" and the "to-be-attached" data generation uuid tags if (device->state.conn < C_CONNECTED). Fix: change that constraint to (device->state.pdsk != D_UP_TO_DATE) compare the uuids, and reject the attach. This patch also tries to improve the reverse scenario: first lose Secondary, then Primary disk, then try to attach the disk on Secondary. Before this patch, the attach on the Secondary succeeds, but since commit drbd: disconnect, if the wrong UUIDs are attached on a connected peer the Primary will notice unsuitable data, and drop the connection hard. Though unfortunately at a point in time during the handshake where we cannot easily abort the attach on the peer without more refactoring of the handshake. We now reject any attach to "unsuitable" uuids, as long as we can see a Primary role, unless we already have access to "good" data. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: attach on connected diskless peer must not shrink a consistent deviceLars Ellenberg2018-12-201-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | If we would reject a new handshake, if the peer had attached first, and then connected, we should force disconnect if the peer first connects, and only then attaches. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: fix confusing error message during attachLars Ellenberg2018-12-201-5/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we attach a (consistent) backing device, which knows about a last-agreed effective size, and that effective size is *larger* than the currently requested size, we refused to attach with ERR_DISK_TOO_SMALL Failure: (111) Low.dev. smaller than requested DRBD-dev. size. which is confusing to say the least. This patch changes the error code in that case to ERR_IMPLICIT_SHRINK Failure: (170) Implicit device shrinking not allowed. See kernel log. additional info from kernel: To-be-attached device has last effective > current size, and is consistent (9999 > 7777 sectors). Refusing to attach. It also allows to attach with an explicit size. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drbd: disconnect, if the wrong UUIDs are attached on a connected peerLars Ellenberg2018-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With "on-no-data-accessible suspend-io", DRBD requires the next attach or connect to be to the very same data generation uuid tag it lost last. If we first lost connection to the peer, then later lost connection to our own disk, we would usually refuse to re-connect to the peer, because it presents the wrong data set. However, if the peer first connects without a disk, and then attached its disk, we accepted that same wrong data set, which would be "unexpected" by any user of that DRBD and cause "undefined results" (read: very likely data corruption). The fix is to forcefully disconnect as soon as we notice that the peer attached to the "wrong" dataset. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>