summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* iomap: wire up the iopoll methodChristoph Hellwig2019-02-244-15/+32
| | | | | | | | | | | | | | Store the request queue the last bio was submitted to in the iocb private data in addition to the cookie so that we find the right block device. Also refactor the common direct I/O bio submission code into a nice little helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Modified to use bio_set_polled(). Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add bio_set_polled() helperJens Axboe2019-02-242-2/+16
| | | | | | | | | | | | | For the upcoming async polled IO, we can't sleep allocating requests. If we do, then we introduce a deadlock where the submitter already has async polled IO in-flight, but can't wait for them to complete since polled requests must be active found and reaped. Utilize the helper in the blockdev DIRECT_IO code. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: wire up block device iopoll methodChristoph Hellwig2019-02-241-1/+17
| | | | | | | | | | Just call blk_poll on the iocb cookie, we can derive the block device from the inode trivially. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* fs: add an iopoll method to struct file_operationsChristoph Hellwig2019-02-242-0/+5
| | | | | | | | | | | | | | This new methods is used to explicitly poll for I/O completion for an iocb. It must be called for any iocb submitted asynchronously (that is with a non-null ki_complete) which has the IOCB_HIPRI flag set. The method is assisted by a new ki_cookie field in struct iocb to store the polling cookie. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()Dongli Zhang2019-02-221-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") moves blkdev_reread_part() out of the loop_ctl_mutex. However, GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result, __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and will not rescan the loop device to delete all partitions. Below are steps to reproduce the issue: step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100 step2 # losetup -P /dev/loop0 tmp.raw step3 # parted /dev/loop0 mklabel gpt step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1 step5 # losetup -d /dev/loop0 Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and there is below kernel warning message: [ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22) This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part(). Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* loop: do not print warn message if partition scan is successfulDongli Zhang2019-02-221-2/+3
| | | | | | | | | Do not print warn message when the partition scan returns 0. Fixes: d57f3374ba48 ("loop: Move special partition reread handling in loop_clr_fd()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: bounce: make sure that bvec table is updatedMing Lei2019-02-211-2/+6
| | | | | | | | | | | | | | | | | | | Block bounce needs to allocate new page for doing IO, and the new page has to be updated to bvec table. Commit 6dc4f100c switches __blk_queue_bounce() to use the new bio_for_each_segment_all() interface. Unfortunately the new bio_for_each_segment_all() can't be used to update bvec table. This patch fixes this issue by retrieving bvec from the table directly, then the new allocated page can be updated to the bio. This way is safe because the cloned bio is single page bvec. Fixes: 6dc4f100c ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec") Cc: Christoph Hellwig <hch@lst.de> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'nvme-5.1' of git://git.infradead.org/nvme into for-5.1/blockJens Axboe2019-02-2129-289/+168
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe changes for 5.1 from Christoph * 'nvme-5.1' of git://git.infradead.org/nvme: (22 commits) nvme-rdma: use nr_phys_segments when map rq to sgl nvmet: convert to SPDX identifiers nvmet-rdma: convert to SPDX identifiers nvme-loop: convert to SPDX identifiers nvmet-fcloop: convert to SPDX identifiers nvmet-fc: convert to SPDX identifiers nvme: convert to SPDX identifiers nvme-pci: convert to SPDX identifiers nvme-lightnvm: convert to SPDX identifiers nvme-rdma: convert to SPDX identifiers nvme-fc: convert to SPDX identifiers nvme-fabrics: convert to SPDX identifiers nvme-tcp.h: fix SPDX header nvme_ioctl.h: remove duplicate GPL boilerplate nvme: return error from nvme_alloc_ns() nvme: avoid that deleting a controller triggers a circular locking complaint nvme: introduce a helper function for controller deletion nvme: unexport nvme_delete_ctrl_sync() nvme-pci: check kstrtoint() return value in queue_count_set() nvme-fabrics: document the poll function argument ...
| * nvme-rdma: use nr_phys_segments when map rq to sglChaitanya Kulkarni2019-02-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check if a command contains data to be mapped. This fixes the case where a struct request contains LBAs, but it has no payload, such as Write Zeroes support. Fixes: 6e02318eaea5 ("nvme: add support for the Write Zeroes command") Reported-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvmet: convert to SPDX identifiersChristoph Hellwig2019-02-207-63/+7
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvmet-rdma: convert to SPDX identifiersChristoph Hellwig2019-02-201-9/+1
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme-loop: convert to SPDX identifiersChristoph Hellwig2019-02-201-9/+1
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvmet-fcloop: convert to SPDX identifiersChristoph Hellwig2019-02-201-12/+1
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvmet-fc: convert to SPDX identifiersChristoph Hellwig2019-02-201-13/+1
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme: convert to SPDX identifiersChristoph Hellwig2019-02-206-46/+6
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme-pci: convert to SPDX identifiersChristoph Hellwig2019-02-201-9/+1
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme-lightnvm: convert to SPDX identifiersChristoph Hellwig2019-02-201-15/+1
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme-rdma: convert to SPDX identifiersChristoph Hellwig2019-02-202-18/+2
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme-fc: convert to SPDX identifiersChristoph Hellwig2019-02-203-35/+3
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme-fabrics: convert to SPDX identifiersChristoph Hellwig2019-02-202-18/+2
| | | | | | | | | | | | | | | | Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme-tcp.h: fix SPDX headerChristoph Hellwig2019-02-201-1/+1
| | | | | | | | | | | | | | For .h files we need to use /* */ style comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme_ioctl.h: remove duplicate GPL boilerplateChristoph Hellwig2019-02-202-18/+1
| | | | | | | | | | | | | | We already have a ЅPDX header, so no need to duplicate the information. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * nvme: return error from nvme_alloc_ns()Hannes Reinecke2019-02-201-10/+21
| | | | | | | | | | | | | | | | nvme_alloc_ns() might fail, so we should be returning an error code. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: avoid that deleting a controller triggers a circular locking complaintBart Van Assche2019-02-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rework nvme_delete_ctrl_sync() such that it does not have to wait for queued work. This patch avoids that test nvme/008 triggers the following complaint: WARNING: possible circular locking dependency detected 5.0.0-rc6-dbg+ #10 Not tainted ------------------------------------------------------ nvme/7918 is trying to acquire lock: 000000009a1a7b69 ((work_completion)(&ctrl->delete_work)){+.+.}, at: __flush_work+0x379/0x410 but task is already holding lock: 00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (kn->count#389){++++}: lock_acquire+0xc5/0x1e0 __kernfs_remove+0x42a/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x68/0xb0 device_del+0x24d/0x570 cdev_device_del+0x1a/0x50 nvme_delete_ctrl_work+0xbd/0xe0 process_one_work+0x4f1/0xa40 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 ((work_completion)(&ctrl->delete_work)){+.+.}: __lock_acquire+0x1323/0x17b0 lock_acquire+0xc5/0x1e0 __flush_work+0x399/0x410 flush_work+0x10/0x20 nvme_delete_ctrl_sync+0x65/0x70 nvme_sysfs_delete+0x4f/0x60 dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x186/0x240 __vfs_write+0xd7/0x430 vfs_write+0xfa/0x260 ksys_write+0xab/0x130 __x64_sys_write+0x43/0x50 do_syscall_64+0x71/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kn->count#389); lock((work_completion)(&ctrl->delete_work)); lock(kn->count#389); lock((work_completion)(&ctrl->delete_work)); *** DEADLOCK *** 3 locks held by nvme/7918: #0: 00000000e2223b44 (sb_writers#6){.+.+}, at: vfs_write+0x1eb/0x260 #1: 000000003404976f (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240 #2: 00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210 stack backtrace: CPU: 4 PID: 7918 Comm: nvme Not tainted 5.0.0-rc6-dbg+ #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xca print_circular_bug.isra.36.cold.54+0x173/0x1d5 check_prev_add.constprop.45+0x996/0x1110 __lock_acquire+0x1323/0x17b0 lock_acquire+0xc5/0x1e0 __flush_work+0x399/0x410 flush_work+0x10/0x20 nvme_delete_ctrl_sync+0x65/0x70 nvme_sysfs_delete+0x4f/0x60 dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x186/0x240 __vfs_write+0xd7/0x430 vfs_write+0xfa/0x260 ksys_write+0xab/0x130 __x64_sys_write+0x43/0x50 do_syscall_64+0x71/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: introduce a helper function for controller deletionBart Van Assche2019-02-201-4/+9
| | | | | | | | | | | | | | | | | | This patch does not change any functionality but makes the next patch in this series easier to read. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: unexport nvme_delete_ctrl_sync()Bart Van Assche2019-02-202-3/+1
| | | | | | | | | | | | | | | | | | | | Since nvme_delete_ctrl_sync() is not called from any other kernel module, unexport it. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-pci: check kstrtoint() return value in queue_count_set()Bart Van Assche2019-02-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | This patch avoids that the compiler complains about 'ret' being set but not being used when building with W=1. Fixes: 3b6592f70ad7 ("nvme: utilize two queue maps, one for reads and one for writes") # v5.0-rc1 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fabrics: document the poll function argumentBart Van Assche2019-02-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | This patch avoids that the kernel-doc tool reports a warning when building with W=1. Fixes: 26c682274e0a ("nvme-fabrics: allow nvmf_connect_io_queue to poll") # v5.0-rc1 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvmet: fix indentationBart Van Assche2019-02-201-1/+1
| | | | | | | | | | | | | | | | | | | | This patch avoids that smatch complains about inconsistent indentation. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") # v4.10 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-multipath: round-robin I/O policyHannes Reinecke2019-02-203-1/+100
|/ | | | | | | | | | | | | Implement a simple round-robin I/O policy for multipathing. Path selection is done in two rounds, first iterating across all optimized paths, and if that doesn't return any valid paths, iterate over all optimized and non-optimized paths. If no paths are found, use the existing algorithm. Also add a sysfs attribute 'iopolicy' to switch between the current NUMA-aware I/O policy and the 'round-robin' I/O policy. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* block: avoid to READ fields of null bioMing Lei2019-02-191-1/+3
| | | | | | | | | | | rq->bio can be NULL sometimes, such as flush request, so don't read bio->bi_seg_front_size until this 'bio' is checked as valid. Cc: Bart Van Assche <bvanassche@acm.org> Reported-by: Bart Van Assche <bvanassche@acm.org> Fixes: dcebd755926b0f39dd1e ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'v5.0-rc6' into for-5.1/blockJens Axboe2019-02-15566-2185/+5560
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull in 5.0-rc6 to avoid a dumb merge conflict with fs/iomap.c. This is needed since io_uring is now based on the block branch, to avoid a conflict between the multi-page bvecs and the bits of io_uring that touch the core block parts. * tag 'v5.0-rc6': (525 commits) Linux 5.0-rc6 x86/mm: Make set_pmd_at() paravirt aware MAINTAINERS: Update the ocores i2c bus driver maintainer, etc blk-mq: remove duplicated definition of blk_mq_freeze_queue Blk-iolatency: warn on negative inflight IO counter blk-iolatency: fix IO hang due to negative inflight counter MAINTAINERS: unify reference to xen-devel list x86/mm/cpa: Fix set_mce_nospec() futex: Handle early deadlock return correctly futex: Fix barrier comment net: dsa: b53: Fix for failure when irq is not defined in dt blktrace: Show requests without sector mips: cm: reprime error cause mips: loongson64: remove unreachable(), fix loongson_poweroff(). sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() geneve: should not call rt6_lookup() when ipv6 was disabled KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974) signal: Better detection of synchronous signals ...
| * Linux 5.0-rc6v5.0-rc6Linus Torvalds2019-02-101-1/+1
| |
| * Merge tag 'dmaengine-fix-5.0-rc6' of ↵Linus Torvalds2019-02-104-76/+53
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: - Fix in at_xdmac fr wrongful channel state - Fix for imx driver for wrong callback invocation - Fix to bcm driver for interrupt race & transaction abort. - Fix in dmatest to abort in mapping error * tag 'dmaengine-fix-5.0-rc6' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: dmatest: Abort test in case of mapping error dmaengine: bcm2835: Fix abort of transactions dmaengine: bcm2835: Fix interrupt race on RT dmaengine: imx-dma: fix wrong callback invoke dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
| | * dmaengine: dmatest: Abort test in case of mapping errorAndy Shevchenko2019-02-041-18/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of mapping error the DMA addresses are invalid and continuing will screw system memory or potentially something else. [ 222.480310] dmatest: dma0chan7-copy0: summary 1 tests, 3 failures 6 iops 349 KB/s (0) ... [ 240.912725] check: Corrupted low memory at 00000000c7c75ac9 (2940 phys) = 5656000000000000 [ 240.921998] check: Corrupted low memory at 000000005715a1cd (2948 phys) = 279f2aca5595ab2b [ 240.931280] check: Corrupted low memory at 000000002f4024c0 (2950 phys) = 5e5624f349e793cf ... Abort any test if mapping failed. Fixes: 4076e755dbec ("dmatest: convert to dmaengine_unmap_data") Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
| | * Merge branch 'fix/brcm' into fixesVinod Koul2019-02-041-45/+25
| | |\
| | | * dmaengine: bcm2835: Fix abort of transactionsLukas Wunner2019-02-041-32/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are multiple issues with bcm2835_dma_abort() (which is called on termination of a transaction): * The algorithm to abort the transaction first pauses the channel by clearing the ACTIVE flag in the CS register, then waits for the PAUSED flag to clear. Page 49 of the spec documents the latter as follows: "Indicates if the DMA is currently paused and not transferring data. This will occur if the active bit has been cleared [...]" https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf So the function is entering an infinite loop because it is waiting for PAUSED to clear which is always set due to the function having cleared the ACTIVE flag. The only thing that's saving it from itself is the upper bound of 10000 loop iterations. The code comment says that the intention is to "wait for any current AXI transfer to complete", so the author probably wanted to check the WAITING_FOR_OUTSTANDING_WRITES flag instead. Amend the function accordingly. * The CS register is only read at the beginning of the function. It needs to be read again after pausing the channel and before checking for outstanding writes, otherwise writes which were issued between the register read at the beginning of the function and pausing the channel may not be waited for. * The function seeks to abort the transfer by writing 0 to the NEXTCONBK register and setting the ABORT and ACTIVE flags. Thereby, the 0 in NEXTCONBK is sought to be loaded into the CONBLK_AD register. However experimentation has shown this approach to not work: The CONBLK_AD register remains the same as before and the CS register contains 0x00000030 (PAUSED | DREQ_STOPS_DMA). In other words, the control block is not aborted but merely paused and it will be resumed once the next DMA transaction is started. That is absolutely not the desired behavior. A simpler approach is to set the channel's RESET flag instead. This reliably zeroes the NEXTCONBK as well as the CS register. It requires less code and only a single MMIO write. This is also what popular user space DMA drivers do, e.g.: https://github.com/metachris/RPIO/blob/master/source/c_pwm/pwm.c Note that the spec is contradictory whether the NEXTCONBK register is writeable at all. On the one hand, page 41 claims: "The value loaded into the NEXTCONBK register can be overwritten so that the linked list of Control Block data structures can be dynamically altered. However it is only safe to do this when the DMA is paused." On the other hand, page 40 specifies: "Only three registers in each channel's register set are directly writeable (CS, CONBLK_AD and DEBUG). The other registers (TI, SOURCE_AD, DEST_AD, TXFR_LEN, STRIDE & NEXTCONBK), are automatically loaded from a Control Block data structure held in external memory." Fixes: 96286b576690 ("dmaengine: Add support for BCM2835") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.14+ Cc: Frank Pavlic <f.pavlic@kunbus.de> Cc: Martin Sperl <kernel@martin.sperl.org> Cc: Florian Meier <florian.meier@koalo.de> Cc: Clive Messer <clive.m.messer@gmail.com> Cc: Matthias Reichl <hias@horus.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: Florian Kauer <florian.kauer@koalo.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
| | | * dmaengine: bcm2835: Fix interrupt race on RTLukas Wunner2019-02-041-15/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If IRQ handlers are threaded (either because CONFIG_PREEMPT_RT_BASE is enabled or "threadirqs" was passed on the command line) and if system load is sufficiently high that wakeup latency of IRQ threads degrades, SPI DMA transactions on the BCM2835 occasionally break like this: ks8851 spi0.0: SPI transfer timed out bcm2835-dma 3f007000.dma: DMA transfer could not be terminated ks8851 spi0.0 eth2: ks8851_rdfifo: spi_sync() failed The root cause is an assumption made by the DMA driver which is documented in a code comment in bcm2835_dma_terminate_all(): /* * Stop DMA activity: we assume the callback will not be called * after bcm_dma_abort() returns (even if it does, it will see * c->desc is NULL and exit.) */ That assumption falls apart if the IRQ handler bcm2835_dma_callback() is threaded: A client may terminate a descriptor and issue a new one before the IRQ handler had a chance to run. In fact the IRQ handler may miss an *arbitrary* number of descriptors. The result is the following race condition: 1. A descriptor finishes, its interrupt is deferred to the IRQ thread. 2. A client calls dma_terminate_async() which sets channel->desc = NULL. 3. The client issues a new descriptor. Because channel->desc is NULL, bcm2835_dma_issue_pending() immediately starts the descriptor. 4. Finally the IRQ thread runs and writes BCM2835_DMA_INT to the CS register to acknowledge the interrupt. This clears the ACTIVE flag, so the newly issued descriptor is paused in the middle of the transaction. Because channel->desc is not NULL, the IRQ thread finalizes the descriptor and tries to start the next one. I see two possible solutions: The first is to call synchronize_irq() in bcm2835_dma_issue_pending() to wait until the IRQ thread has finished before issuing a new descriptor. The downside of this approach is unnecessary latency if clients desire rapidly terminating and re-issuing descriptors and don't have any use for an IRQ callback. (The SPI TX DMA channel is a case in point.) A better alternative is to make the IRQ thread recognize that it has missed descriptors and avoid finalizing the newly issued descriptor. So first of all, set the ACTIVE flag when acknowledging the interrupt. This keeps a newly issued descriptor running. If the descriptor was finished, the channel remains idle despite the ACTIVE flag being set. However the ACTIVE flag can then no longer be used to check whether the channel is idle, so instead check whether the register containing the current control block address is zero and finalize the current descriptor only if so. That way, there is no impact on latency and throughput if the client doesn't care for the interrupt: Only minimal additional overhead is introduced for non-cyclic descriptors as one further MMIO read is necessary per interrupt to check for idleness of the channel. Cyclic descriptors are sped up slightly by removing one MMIO write per interrupt. Fixes: 96286b576690 ("dmaengine: Add support for BCM2835") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.14+ Cc: Frank Pavlic <f.pavlic@kunbus.de> Cc: Martin Sperl <kernel@martin.sperl.org> Cc: Florian Meier <florian.meier@koalo.de> Cc: Clive Messer <clive.m.messer@gmail.com> Cc: Matthias Reichl <hias@horus.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: Florian Kauer <florian.kauer@koalo.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
| | * | dmaengine: imx-dma: fix wrong callback invokeLeonid Iziumtsev2019-02-041-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once the "ld_queue" list is not empty, next descriptor will migrate into "ld_active" list. The "desc" variable will be overwritten during that transition. And later the dmaengine_desc_get_callback_invoke() will use it as an argument. As result we invoke wrong callback. That behaviour was in place since: commit fcaaba6c7136 ("dmaengine: imx-dma: fix callback path in tasklet"). But after commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") things got worse, since possible delay between tasklet_schedule() from DMA irq handler and actual tasklet function execution got bigger. And that gave more time for new DMA request to be submitted and to be put into "ld_queue" list. It has been noticed that DMA issue is causing problems for "mxc-mmc" driver. While stressing the system with heavy network traffic and writing/reading to/from sd card simultaneously the timeout may happen: 10013000.sdhci: mxcmci_watchdog: read time out (status = 0x30004900) That often lead to file system corruption. Signed-off-by: Leonid Iziumtsev <leonid.iziumtsev@gmail.com> Signed-off-by: Vinod Koul <vkoul@kernel.org> Cc: stable@vger.kernel.org
| | * | dmaengine: at_xdmac: Fix wrongfull report of a channel as in useCodrin Ciubotariu2019-02-021-9/+10
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atchan->status variable is used to store two different information: - pass channel interrupts status from interrupt handler to tasklet; - channel information like whether it is cyclic or paused; This causes a bug when device_terminate_all() is called, (AT_XDMAC_CHAN_IS_CYCLIC cleared on atchan->status) and then a late End of Block interrupt arrives (AT_XDMAC_CIS_BIS), which sets bit 0 of atchan->status. Bit 0 is also used for AT_XDMAC_CHAN_IS_CYCLIC, so when a new descriptor for a cyclic transfer is created, the driver reports the channel as in use: if (test_and_set_bit(AT_XDMAC_CHAN_IS_CYCLIC, &atchan->status)) { dev_err(chan2dev(chan), "channel currently used\n"); return NULL; } This patch fixes the bug by adding a different struct member to keep the interrupts status separated from the channel status bits. Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2019-02-104-26/+29
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A handful of fixes: - Fix an MCE corner case bug/crash found via MCE injection testing - Fix 5-level paging boot crash - Fix MCE recovery cache invalidation bug - Fix regression on Xen guests caused by a recent PMD level mremap speedup optimization" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Make set_pmd_at() paravirt aware x86/mm/cpa: Fix set_mce_nospec() x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()
| | * | x86/mm: Make set_pmd_at() paravirt awareJuergen Gross2019-02-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | set_pmd_at() calls native_set_pmd() unconditionally on x86. This was fine as long as only huge page entries were written via set_pmd_at(), as Xen pv guests don't support those. Commit 2c91bd4a4e2e53 ("mm: speed up mremap by 20x on large regions") introduced a usage of set_pmd_at() possible on pv guests, leading to failures like: BUG: unable to handle kernel paging request at ffff888023e26778 #PF error: [PROT] [WRITE] RIP: e030:move_page_tables+0x7c1/0xae0 move_vma.isra.3+0xd1/0x2d0 __se_sys_mremap+0x3c6/0x5b0 do_syscall_64+0x49/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Make set_pmd_at() paravirt aware by just letting it use set_pmd(). Fixes: 2c91bd4a4e2e53 ("mm: speed up mremap by 20x on large regions") Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Cc: sstabellini@kernel.org Cc: hpa@zytor.com Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Link: https://lkml.kernel.org/r/20190210074056.11842-1-jgross@suse.com
| | * | x86/mm/cpa: Fix set_mce_nospec()Peter Zijlstra2019-02-081-25/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent commit fe0937b24ff5 ("x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function") accidentally made the call to make_addr_canonical_again() go away, which breaks set_mce_nospec(). Re-instate the call to convert the address back into canonical form right before invoking either CLFLUSH or INVLPG. Rename the function while at it to be shorter (and less MAGA). Fixes: fe0937b24ff5 ("x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function") Reported-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tony Luck <tony.luck@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20190208120859.GH32511@hirez.programming.kicks-ass.net
| | * | x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 settingKirill A. Shutemov2019-02-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RDMSR in the trampoline code overwrites EDX but that register is used to indicate whether 5-level paging has to be enabled and if clobbered, leads to failure to boot on a 5-level paging machine. Preserve EDX on the stack while we are dealing with EFER. Fixes: b677dfae5aa1 ("x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode") Reported-by: Kyle D Pelton <kyle.d.pelton@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: dave.hansen@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Huang <wei@redhat.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190206115253.1907-1-kirill.shutemov@linux.intel.com
| | * | x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()Tony Luck2019-02-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Internal injection testing crashed with a console log that said: mce: [Hardware Error]: CPU 7: Machine Check Exception: f Bank 0: bd80000000100134 This caused a lot of head scratching because the MCACOD (bits 15:0) of that status is a signature from an L1 data cache error. But Linux says that it found it in "Bank 0", which on this model CPU only reports L1 instruction cache errors. The answer was that Linux doesn't initialize "m->bank" in the case that it finds a fatal error in the mce_no_way_out() pre-scan of banks. If this was a local machine check, then this partially initialized struct mce is being passed to mce_panic(). Fix is simple: just initialize m->bank in the case of a fatal error. Fixes: 40c36e2741d7 ("x86/mce: Fix incorrect "Machine check from unknown source" message") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Cc: stable@vger.kernel.org # v4.18 Note pre-v5.0 arch/x86/kernel/cpu/mce/core.c was called arch/x86/kernel/cpu/mcheck/mce.c Link: https://lkml.kernel.org/r/20190201003341.10638-1-tony.luck@intel.com
| * | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2019-02-103-24/+85
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "irqchip driver fixes: most of them are race fixes for ARM GIC (General Interrupt Controller) variants, but also a fix for the ARM MMP (Marvell PXA168 et al) irqchip affecting OLPC keyboards" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Fix ITT_entry_size accessor irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable irqchip/gic-v3-its: Gracefully fail on LPI exhaustion irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID irqchip/gic-v4: Fix occasional VLPI drop
| | * \ \ Merge tag 'irqchip-5.0-3' of ↵Thomas Gleixner2019-02-073-24/+85
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip updates from Marc Zyngier: - Another GICv3 ITS fix for devices sharing the same DevID - Don't return invalid data on exhaustion of the GICv3 LPI pool - Fix a GICv3 field decoding bug leading to memory over-allocation - Init GICv4 at boot time instead of lazy init - Fix interrupt masking on PJ4
| | | * | | irqchip/gic-v3-its: Fix ITT_entry_size accessorZenghui Yu2019-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to ARM IHI 0069C (ID070116), we should use GITS_TYPER's bits [7:4] as ITT_entry_size instead of [8:4]. Although this is pretty annoying, it only results in a potential over-allocation of memory, and nothing bad happens. Fixes: 3dfa576bfb45 ("irqchip/gic-v3-its: Add probing for VLPI properties") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> [maz: massaged subject and commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | * | | irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disableLubomir Rintel2019-01-291-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resetting bit 4 disables the interrupt delivery to the "secure processor" core. This breaks the keyboard on a OLPC XO 1.75 laptop, where the firmware running on the "secure processor" bit-bangs the PS/2 protocol over the GPIO lines. It is not clear what the rest of the bits are and Marvell was unhelpful when asked for documentation. Aside from the SP bit, there are probably priority bits. Leaving the unknown bits as the firmware set them up seems to be a wiser course of action compared to just turning them off. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Pavel Machek <pavel@ucw.cz> [maz: fixed-up subject and commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | * | | irqchip/gic-v3-its: Gracefully fail on LPI exhaustionMarc Zyngier2019-01-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the unlikely event that we cannot find any available LPI in the system, we should gracefully return an error instead of carrying on with no LPI allocated at all. Fixes: 38dd7c494cf6 ("irqchip/gic-v3-its: Drop chunk allocation compatibility") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>