summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* block: remove blk_needs_flush_plugChristoph Hellwig2022-02-024-16/+3
| | | | | | | | | | blk_needs_flush_plug fails to account for the cb_list, which needs flushing as well. Remove it and just check if there is a plug instead of poking into the internals of the plug structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220127070549.1377856-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: pass a block_device and opf to bio_resetChristoph Hellwig2022-02-0211-52/+29
| | | | | | | | | | | | Pass the block_device that we plan to use this bio for and the operation to bio_reset to optimize the assigment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: pass a block_device and opf to bio_initChristoph Hellwig2022-02-0226-91/+68
| | | | | | | | | | | | Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: pass a block_device and opf to bio_allocChristoph Hellwig2022-02-0242-194/+130
| | | | | | | | | | | | | | | Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: pass a block_device and opf to bio_alloc_kiocbChristoph Hellwig2022-02-023-15/+18
| | | | | | | | | | Pass the block_device and operation that we plan to use this bio for to bio_alloc_kiocb to optimize the assigment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: pass a block_device and opf to bio_alloc_biosetChristoph Hellwig2022-02-0219-79/+75
| | | | | | | | | | | | | | | Pass the block_device and operation that we plan to use this bio for to bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: pass a block_device and opf to blk_next_bioChaitanya Kulkarni2022-02-026-27/+18
| | | | | | | | | | All callers need to set the block_device and operation, so lift that into the common code. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move blk_next_bio to bio.cChristoph Hellwig2022-02-022-13/+13
| | | | | | | | | Keep blk_next_bio next to the core bio infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* xen-blkback: bio_alloc can't fail if it is allow to sleepChristoph Hellwig2022-02-021-14/+0
| | | | | | | | | Remove handling of NULL returns from sleeping bio_alloc calls given that those can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* rnbd-srv: remove struct rnbd_dev_blk_ioChristoph Hellwig2022-02-024-38/+7
| | | | | | | | | | Only the priv field of rnbd_dev_blk_io is used, so store the value of that in bio->bi_private directly and remove the entire bio_set overhead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20220124091107.642561-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* rnbd-srv: simplify bio mapping in process_rdmaChristoph Hellwig2022-02-023-69/+16
| | | | | | | | | | | | The memory mapped in process_rdma is contiguous, so there is no need to loop over bio_add_page. Remove rnbd_bio_map_kern and just open code the bio allocation and mapping in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jack Wang <jinpu.wang@ionons.com> Tested-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20220124091107.642561-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* drbd: bio_alloc can't fail if it is allow to sleepChristoph Hellwig2022-02-021-18/+4
| | | | | | | | | Remove handling of NULL returns from sleeping bio_alloc calls given that those can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm-thin: use blkdev_issue_flush instead of open coding itChristoph Hellwig2022-02-021-10/+1
| | | | | | | | | Use blkdev_issue_flush, which uses an on-stack bio instead of an opencoded version with a bio embedded into struct pool. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm-snap: use blkdev_issue_flush instead of open coding itChristoph Hellwig2022-02-021-20/+1
| | | | | | | | | Use blkdev_issue_flush, which uses an on-stack bio instead of an opencoded version with a bio embedded into struct dm_snapshot. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm-crypt: remove clone_initChristoph Hellwig2022-02-021-13/+8
| | | | | | | | | | Just open code it next to the bio allocations, which saves a few lines of code, prepares for future changes and allows to remove the duplicate bi_opf assignment for the bio_clone_fast case in kcryptd_io_read. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm: bio_alloc can't fail if it is allowed to sleepChristoph Hellwig2022-02-025-51/+10
| | | | | | | | | Remove handling of NULL returns from sleeping bio_alloc calls given that those can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ntfs3: remove ntfs_alloc_bioChristoph Hellwig2022-02-021-21/+2
| | | | | | | | | | bio_alloc will never fail if it is allowed to sleep, so there is no need for this loop. Also remove the __GFP_HIGH specifier as it doesn't make sense here given that we'll always fall back to the mempool anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nfs/blocklayout: remove bl_alloc_init_bioChristoph Hellwig2022-02-021-21/+5
| | | | | | | | | bio_alloc will never fail when it can sleep. Remove the now simple bl_alloc_init_bio helper and open code it in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nilfs2: remove nilfs_alloc_seg_bioChristoph Hellwig2022-02-021-27/+4
| | | | | | | | | bio_alloc will never fail when it can sleep. Remove the now simple nilfs_alloc_seg_bio helper and open code it in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* fs: remove mpage_allocChristoph Hellwig2022-02-021-29/+6
| | | | | | | | | | | | | | | | | open code mpage_alloc in it's two callers and simplify the results because of the context: - __mpage_writepage always passes GFP_NOFS and can thus always sleep and will never get a NULL return from bio_alloc at all. - do_mpage_readpage can only get a non-sleeping context for readahead which never sets PF_MEMALLOC and thus doesn't need the retry loop either. Both cases will never have __GFP_HIGH set. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove genhd.hChristoph Hellwig2022-02-0262-350/+282
| | | | | | | | | | | | There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move blk_drop_partitions to blk.hChristoph Hellwig2022-02-022-1/+1
| | | | | | | | | | No need to have this declaration in a public header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move disk_{block,unblock,flush}_events to blk.hChristoph Hellwig2022-02-022-3/+3
| | | | | | | | | | No need to have these declarations in a public header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: deprecate autoloading based on dev_tChristoph Hellwig2022-02-023-3/+24
| | | | | | | | | | Make the legacy dev_t based autoloading optional and add a deprecation warning. This kind of autoloading has ceased to be useful about 20 years ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220104071647.164918-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Linux 5.17-rc2v5.17-rc2Linus Torvalds2022-01-301-1/+1
|
* Merge tag 'irq_urgent_for_v5.17_rc2_p2' of ↵Linus Torvalds2022-01-306-36/+121
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Drop an unused private data field in the AIC driver - Various fixes to the realtek-rtl driver - Make the GICv3 ITS driver compile again in !SMP configurations - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec - Yet another kfree/bitmap_free conversion - Various DT updates (Renesas, SiFive) * tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support irqchip/gic-v3-its: Reset each ITS's BASERn register before probe irqchip/gic-v3-its: Fix build for !SMP irqchip/loongson-pch-ms: Use bitmap_free() to free bitmap irqchip/realtek-rtl: Service all pending interrupts irqchip/realtek-rtl: Fix off-by-one in routing irqchip/realtek-rtl: Map control data to virq irqchip/apple-aic: Drop unused ipi_hwirq field
| * Merge tag 'irqchip-fixes-5.17-1' of ↵Thomas Gleixner2022-01-296-36/+121
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Drop an unused private data field in the AIC driver - Various fixes to the realtek-rtl driver - Make the GICv3 ITS driver compile again in !SMP configurations - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec - Yet another kfree/bitmap_free conversion - Various DT updates (Renesas, SiFive) Link: https://lore.kernel.org/r/20220128174217.517041-1-maz@kernel.org
| | * dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuplesGeert Uytterhoeven2022-01-281-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To improve human readability and enable automatic validation, the tuples in "interrupts-extended" properties should be grouped using angle brackets. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/211705e74a2ce77de43d036c5dea032484119bf7.1643360419.git.geert@linux-m68k.org
| | * dt-bindings: interrupt-controller: sifive,plic: Fix number of interruptsGeert Uytterhoeven2022-01-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The number of interrupts lacks an upper bound, thus assuming one, causing properly grouped "interrupts-extended" properties to be flagged as an error by "make dtbs_check". Fix this by adding the missing "maxItems", using the architectural maximum of 15872 interrupts. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/f73a0aead89e1426b146c4c64f797aa035868bf0.1643360419.git.geert@linux-m68k.org
| | * dt-bindings: irqchip: renesas-irqc: Add R-Car V3U supportGeert Uytterhoeven2022-01-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document support for the Interrupt Controller for External Devices (INT-EC) in the Renesas R-Car V3U (r8a779a0) SoC. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/85b246cc0792663c72c1bb12a8576bd23d2299d3.1643200256.git.geert+renesas@glider.be
| | * irqchip/gic-v3-its: Reset each ITS's BASERn register before probeMarc Zyngier2022-01-261-21/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent bug report outlined that the way GICv4.1 is handled across kexec is pretty bad. We can end-up in a situation where ITSs share memory (this is the case when SVPET==1) and reprogram the base registers, creating a situation where ITSs that are part of a given affinity group see different pointers. Which is illegal. Boo. In order to restore some sanity, reset the BASERn registers to 0 *before* probing any ITS. Although this isn't optimised at all, this is only a once-per-boot cost, which shouldn't show up on anyone's radar. Cc: Jay Chen <jkchen@linux.alibaba.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20211216190315.GA14220@lpieralisi Link: https://lore.kernel.org/r/20220124133809.1291195-1-maz@kernel.org
| | * irqchip/gic-v3-its: Fix build for !SMPArd Biesheuvel2022-01-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 835f442fdbce ("irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime") added a reference to cpus_booted_once_mask, which does not exist on !SMP builds, breaking the build for such configurations. Given the intent of the check, short circuit it to always pass. Cc: Valentin Schneider <valentin.schneider@arm.com> Fixes: 835f442fdbce ("irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220122151614.133766-1-ardb@kernel.org
| | * irqchip/loongson-pch-ms: Use bitmap_free() to free bitmapChristophe JAILLET2022-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | kfree() and bitmap_free() are the same. But using the latter is more consistent when freeing memory allocated with bitmap_zalloc(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/0b982ab54844803049c217b2899baa59602faacd.1640529916.git.christophe.jaillet@wanadoo.fr
| | * irqchip/realtek-rtl: Service all pending interruptsSander Vanheule2022-01-171-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of only servicing the lowest pending interrupt line, make sure all pending SoC interrupts are serviced before exiting the chained handler. This adds a small overhead if only one interrupt is pending, but should prevent rapid re-triggering of the handler. Signed-off-by: Sander Vanheule <sander@svanheule.net> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/5082ad3cb8b4eedf55075561b93eff6570299fe1.1641739718.git.sander@svanheule.net
| | * irqchip/realtek-rtl: Fix off-by-one in routingSander Vanheule2022-01-171-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is an offset between routing values (1..6) and the connected MIPS CPU interrupts (2..7), but no distinction was made between these two values. This issue was previously hidden during testing, because an interrupt mapping was used where for each required interrupt another (unused) routing was configured, with an offset of +1. Offset the CPU IRQ numbers by -1 to retrieve the correct routing value. Fixes: 9f3a0f34b84a ("irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller") Signed-off-by: Sander Vanheule <sander@svanheule.net> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/177b920aa8d8610615692d0e657e509f363c85ca.1641739718.git.sander@svanheule.net
| | * irqchip/realtek-rtl: Map control data to virqSander Vanheule2022-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver assigned the irqchip and irq handler to the hardware irq, instead of the virq. This is incorrect, and only worked because these irq numbers happened to be the same on the devices used for testing the original driver. Fixes: 9f3a0f34b84a ("irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller") Signed-off-by: Sander Vanheule <sander@svanheule.net> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/4b4936606480265db47df152f00bc2ed46340599.1641739718.git.sander@svanheule.net
| | * irqchip/apple-aic: Drop unused ipi_hwirq fieldMarc Zyngier2022-01-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | This field was never used, remove it. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Hector Martin <marcan@marcan.st> Link: https://lore.kernel.org/r/20220108140118.3378937-1-maz@kernel.org
* | | Merge tag 'perf_urgent_for_v5.17_rc2_p2' of ↵Linus Torvalds2022-01-301-4/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Prevent accesses to the per-CPU cgroup context list from another CPU except the one it belongs to, to avoid list corruption - Make sure parent events are always woken up to avoid indefinite hangs in the traced workload * tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix cgroup event list management perf: Always wake the parent event
| * | | perf/core: Fix cgroup event list managementNamhyung Kim2022-01-261-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The active cgroup events are managed in the per-cpu cgrp_cpuctx_list. This list is only accessed from current cpu and not protected by any locks. But from the commit ef54c1a476ae ("perf: Rework perf_event_exit_event()"), it's possible to access (actually modify) the list from another cpu. In the perf_remove_from_context(), it can remove an event from the context without an IPI when the context is not active. This is not safe with cgroup events which can have some active events in the context even if ctx->is_active is 0 at the moment. The target cpu might be in the middle of list iteration at the same time. If the event is enabled when it's about to be closed, it might call perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list on a different cpu. This resulted in a crash due to an invalid list pointer access during the cgroup list traversal on the cpu which the event belongs to. Let's fallback to IPI to access the cgrp_cpuctx_list from that cpu. Similarly, perf_install_in_context() should use IPI for the cgroup events too. Fixes: ef54c1a476ae ("perf: Rework perf_event_exit_event()") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124195808.2252071-1-namhyung@kernel.org
| * | | perf: Always wake the parent eventJames Clark2022-01-261-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using per-process mode and event inheritance is set to true, forked processes will create a new perf events via inherit_event() -> perf_event_alloc(). But these events will not have ring buffers assigned to them. Any call to wakeup will be dropped if it's called on an event with no ring buffer assigned because that's the object that holds the wakeup list. If the child event is disabled due to a call to perf_aux_output_begin() or perf_aux_output_end(), the wakeup is dropped leaving userspace hanging forever on the poll. Normally the event is explicitly re-enabled by userspace after it wakes up to read the aux data, but in this case it does not get woken up so the event remains disabled. This can be reproduced when using Arm SPE and 'stress' which forks once before running the workload. By looking at the list of aux buffers read, it's apparent that they stop after the fork: perf record -e arm_spe// -vvv -- stress -c 1 With this patch applied they continue to be printed. This behaviour doesn't happen when using systemwide or per-cpu mode. Reported-by: Ruben Ayrapetyan <Ruben.Ayrapetyan@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211206113840.130802-2-james.clark@arm.com
* | | | Merge tag 'sched_urgent_for_v5.17_rc2_p2' of ↵Linus Torvalds2022-01-301-4/+5
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: "Make sure the membarrier-rseq fence commands are part of the reported set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY" * tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
| * | | | sched/membarrier: Fix membarrier-rseq fence command missing from query bitmaskMathieu Desnoyers2022-01-251-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The membarrier command MEMBARRIER_CMD_QUERY allows querying the available membarrier commands. When the membarrier-rseq fence commands were added, a new MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK was introduced with the intent to expose them with the MEMBARRIER_CMD_QUERY command, the but it was never added to MEMBARRIER_CMD_BITMASK. The membarrier-rseq fence commands are therefore not wired up with the query command. Rename MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK to MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK (the bitmask is not a command per-se), and change the erroneous MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_BITMASK (which does not actually exist) to MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ. Wire up MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK in MEMBARRIER_CMD_BITMASK. Fixing this allows discovering availability of the membarrier-rseq fence feature. Fixes: 2a36ab717e8f ("rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 5.10+ Link: https://lkml.kernel.org/r/20220117203010.30129-1-mathieu.desnoyers@efficios.com
* | | | | Merge tag 'x86_urgent_for_v5.17_rc2' of ↵Linus Torvalds2022-01-302-1/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add another Intel CPU model to the list of CPUs supporting the processor inventory unique number - Allow writing to MCE thresholding sysfs files again - a previous change had accidentally disabled it and no one noticed. Goes to show how much is this stuff used * tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN x86/MCE/AMD: Allow thresholding interface updates after init
| * | | | | x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPINTony Luck2022-01-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Missed adding the Icelake-D CPU to the list. It uses the same MSRs to control and read the inventory number as all the other models. Fixes: dc6b025de95b ("x86/mce: Add Xeon Icelake to list of CPUs that support PPIN") Reported-by: Ailin Xu <ailin.xu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220121174743.1875294-2-tony.luck@intel.com
| * | | | | x86/MCE/AMD: Allow thresholding interface updates after initYazen Ghannam2022-01-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes to the AMD Thresholding sysfs code prevents sysfs writes from updating the underlying registers once CPU init is completed, i.e. "threshold_banks" is set. Allow the registers to be updated if the thresholding interface is already initialized or if in the init path. Use the "set_lvt_off" value to indicate if running in the init path, since this value is only set during init. Fixes: a037f3ca0ea0 ("x86/mce/amd: Make threshold bank setting hotplug robust") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220117161328.19148-1-yazen.ghannam@amd.com
* | | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2022-01-3012-70/+91
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "12 patches. Subsystems affected by this patch series: sysctl, binfmt, ia64, mm (memory-failure, folios, kasan, and psi), selftests, and ocfs2" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: ocfs2: fix a deadlock when commit trans jbd2: export jbd2_journal_[grab|put]_journal_head psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n mm, kasan: use compare-exchange operation to set KASAN page tag kasan: test: fix compatibility with FORTIFY_SOURCE tools/testing/scatterlist: add missing defines mm: page->mapping folio->mapping should have the same offset memory-failure: fetch compound_head after pgmap_pfn_valid() ia64: make IA64_MCA_RECOVERY bool instead of tristate binfmt_misc: fix crash when load/unload module include/linux/sysctl.h: fix register_sysctl_mount_point() return type
| * | | | | | ocfs2: fix a deadlock when commit transJoseph Qi2022-01-301-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6f1b228529ae introduces a regression which can deadlock as follows: Task1: Task2: jbd2_journal_commit_transaction ocfs2_test_bg_bit_allocatable spin_lock(&jh->b_state_lock) jbd_lock_bh_journal_head __jbd2_journal_remove_checkpoint spin_lock(&jh->b_state_lock) jbd2_journal_put_journal_head jbd_lock_bh_journal_head Task1 and Task2 lock bh->b_state and jh->b_state_lock in different order, which finally result in a deadlock. So use jbd2_journal_[grab|put]_journal_head instead in ocfs2_test_bg_bit_allocatable() to fix it. Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.com Fixes: 6f1b228529ae ("ocfs2: fix race between searching chunks and release journal_head from buffer_head") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Tested-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | jbd2: export jbd2_journal_[grab|put]_journal_headJoseph Qi2022-01-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "ocfs2: fix a deadlock case". This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols jbd2_journal_[grab|put]_journal_head as preparation and later use them in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the deadlock. This patch (of 2): This exports symbols jbd2_journal_[grab|put]_journal_head, which will be used outside modules, e.g. ocfs2. Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | psi: fix "defined but not used" warnings when CONFIG_PROC_FS=nSuren Baghdasaryan2022-01-301-38/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_PROC_FS is disabled psi code generates the following warnings: kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=] 1364 | static const struct proc_ops psi_cpu_proc_ops = { | ^~~~~~~~~~~~~~~~ kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=] 1355 | static const struct proc_ops psi_memory_proc_ops = { | ^~~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=] 1346 | static const struct proc_ops psi_io_proc_ops = { | ^~~~~~~~~~~~~~~ Make definitions of these structures and related functions conditional on CONFIG_PROC_FS config. Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com Fixes: 0e94682b73bf ("psi: introduce psi monitor") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=nSuren Baghdasaryan2022-01-301-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_CGROUPS is disabled psi code generates the following warnings: kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes] 1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group, | ^~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes] 1182 | void psi_trigger_destroy(struct psi_trigger *t) | ^~~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes] 1249 | __poll_t psi_trigger_poll(void **trigger_ptr, | ^~~~~~~~~~~~~~~~ Change the declarations of these functions in the header to provide the prototypes even when they are unused. Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com Fixes: 0e94682b73bf ("psi: introduce psi monitor") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>