linux - linux

	Commit message (Collapse)	Author	Files	Lines
2023-05-10	drm/sched: Check scheduler work queue before calling timeout handling	Vitaly Prosyak	1	-1/+1
	During an IGT GPU reset test we see again oops despite of commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling timeout handling). It uses ready condition whether to call drm_sched_fault which unwind the TDR leads to GPU reset. However it looks the ready condition is overloaded with other meanings, for example, for the following stack is related GPU reset : 0 gfx_v9_0_cp_gfx_start 1 gfx_v9_0_cp_gfx_resume 2 gfx_v9_0_cp_resume 3 gfx_v9_0_hw_init 4 gfx_v9_0_resume 5 amdgpu_device_ip_resume_phase2 does the following: /* start the ring */ gfx_v9_0_cp_gfx_start(adev); ring->sched.ready = true; The same approach is for other ASICs as well : gfx_v8_0_cp_gfx_resume gfx_v10_0_kiq_resume, etc... As a result, our GPU reset test causes GPU fault which calls unconditionally gfx_v9_0_fault and then drm_sched_fault. However now it depends on whether the interrupt service routine drm_sched_fault is executed after gfx_v9_0_cp_gfx_start is completed which sets the ready field of the scheduler to true even for uninitialized schedulers and causes oops vs no fault or when ISR drm_sched_fault is completed prior gfx_v9_0_cp_gfx_start and NULL pointer dereference does not occur. Use the field timeout_wq to prevent oops for uninitialized schedulers. The field could be initialized by the work queue of resetting the domain. v1: Corrections to commit message (Luben) Fixes: 11b3b9f461c5c4 ("drm/sched: Check scheduler ready before calling timeout handling") Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Link: https://lore.kernel.org/r/20230510135111.58631-1-vitaly.prosyak@amd.com Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
2023-05-04	drm/mipi-dsi: Set the fwnode for mipi_dsi_device	Saravana Kannan	1	-1/+1
	After commit 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust"), fw_devlink prints an error when consumer devices don't have their fwnode set. This used to be ignored silently. Set the fwnode mipi_dsi_device so fw_devlink can find them and properly track their dependencies. This fixes errors like this: [ 0.334054] nwl-dsi 30a00000.mipi-dsi: Failed to create device link with regulator-lcd-1v8 [ 0.346964] nwl-dsi 30a00000.mipi-dsi: Failed to create device link with backlight-dsi Reported-by: Martin Kepplinger <martin.kepplinger@puri.sm> Link: https://lore.kernel.org/lkml/2a8e407f4f18c9350f8629a2b5fa18673355b2ae.camel@puri.sm/ Fixes: 068a00233969 ("drm: Add MIPI DSI bus support") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Martin Kepplinger <martin.kepplinger@puri.sm> Link: https://lore.kernel.org/r/20230310063910.2474472-1-saravanak@google.com Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2023-04-28	drm/nouveau/disp: More DP_RECEIVER_CAP_SIZE array fixes	Kees Cook	3	-3/+6
	More arrays (and arguments) for dcpd were set to 16, when it looks like DP_RECEIVER_CAP_SIZE (15) should be used. Fix the remaining cases, seen with GCC 13: ../drivers/gpu/drm/nouveau/nvif/outp.c: In function 'nvif_outp_acquire_dp': ../include/linux/fortify-string.h:57:33: warning: array subscript 'unsigned char[16][0]' is partly outside array bounds of 'u8[15]' {aka 'unsigned char[15]'} [-Warray-bounds=] 57 \| #define __underlying_memcpy __builtin_memcpy \| ^ ... ../drivers/gpu/drm/nouveau/nvif/outp.c:140:9: note: in expansion of macro 'memcpy' 140 \| memcpy(args.dp.dpcd, dpcd, sizeof(args.dp.dpcd)); \| ^~~~~~ ../drivers/gpu/drm/nouveau/nvif/outp.c:130:49: note: object 'dpcd' of size [0, 15] 130 \| nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[DP_RECEIVER_CAP_SIZE], \| ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 813443721331 ("drm/nouveau/disp: move DP link config into acquire") Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: dri-devel@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <git@karolherbst.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230204184307.never.825-kees@kernel.org
2023-04-24	drm/dsc: fix DP_DSC_MAX_BPP_DELTA_* macro values	Jani Nikula	1	-2/+2
	The macro values just don't match the specs. Fix them. Fixes: 1482ec00be4a ("drm: Add missing DP DSC extended capability definitions.") Cc: Vinod Govindapillai <vinod.govindapillai@intel.com> Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406134615.1422509-2-jani.nikula@intel.com
2023-04-24	drm/dsc: fix drm_edp_dsc_sink_output_bpp() DPCD high byte usage	Jani Nikula	2	-4/+2
	The operator precedence between << and & is wrong, leading to the high byte being completely ignored. For example, with the 6.4 format, 32 becomes 0 and 24 becomes 8. Fix it, and remove the slightly confusing and unnecessary DP_DSC_MAX_BITS_PER_PIXEL_HI_SHIFT macro while at it. Fixes: 0575650077ea ("drm/dp: DRM DP helper/macros to get DP sink DSC parameters") Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Cc: Manasi Navare <navaremanasi@google.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: <stable@vger.kernel.org> # v5.0+ Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406134615.1422509-1-jani.nikula@intel.com
2023-04-21	firmware/sysfb: Fix VESA format selection	Pierre Asselin	1	-1/+3
	Some legacy BIOSes report no reserved bits in their 32-bit rgb mode, breaking the calculation of bits_per_pixel in commit f35cd3fa7729 ("firmware/sysfb: Fix EFI/VESA format selection"). However they report lfb_depth correctly for those modes. Keep the computation but set bits_per_pixel to lfb_depth if the latter is larger. v2 fixes the warnings from a max3() macro with arguments of different types; split the bits_per_pixel assignment to avoid uglyfing the code with too many typecasts. v3 fixes space and formatting blips pointed out by Javier, and change the bit_per_pixel assignment back to a single statement using two casts. v4 go back to v2 and use max_t() Signed-off-by: Pierre Asselin <pa@panix.com> Fixes: f35cd3fa7729 ("firmware/sysfb: Fix EFI/VESA format selection") Link: https://lore.kernel.org/r/4Psm6B6Lqkz1QXM@panix3.panix.com Link: https://lore.kernel.org/r/20230412150225.3757223-1-javierm@redhat.com Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230419044834.10816-1-pa@panix.com
2023-04-21	drm/fbdev-generic: prohibit potential out-of-bounds access	Sui Jingfeng	1	-4/+12
	The fbdev test of IGT may write after EOF, which lead to out-of-bound access for drm drivers with fbdev-generic. For example, run fbdev test on a x86+ast2400 platform, with 1680x1050 resolution, will cause the linux kernel hang with the following call trace: Oops: 0000 [#1] PREEMPT SMP PTI [IGT] fbdev: starting subtest eof Workqueue: events drm_fb_helper_damage_work [drm_kms_helper] [IGT] fbdev: starting subtest nullptr RIP: 0010:memcpy_erms+0xa/0x20 RSP: 0018:ffffa17d40167d98 EFLAGS: 00010246 RAX: ffffa17d4eb7fa80 RBX: ffffa17d40e0aa80 RCX: 00000000000014c0 RDX: 0000000000001a40 RSI: ffffa17d40e0b000 RDI: ffffa17d4eb80000 RBP: ffffa17d40167e20 R08: 0000000000000000 R09: ffff89522ecff8c0 R10: ffffa17d4e4c5000 R11: 0000000000000000 R12: ffffa17d4eb7fa80 R13: 0000000000001a40 R14: 000000000000041a R15: ffffa17d40167e30 FS: 0000000000000000(0000) GS:ffff895257380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa17d40e0b000 CR3: 00000001eaeca006 CR4: 00000000001706e0 Call Trace: <TASK> ? drm_fbdev_generic_helper_fb_dirty+0x207/0x330 [drm_kms_helper] drm_fb_helper_damage_work+0x8f/0x170 [drm_kms_helper] process_one_work+0x21f/0x430 worker_thread+0x4e/0x3c0 ? __pfx_worker_thread+0x10/0x10 kthread+0xf4/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> CR2: ffffa17d40e0b000 ---[ end trace 0000000000000000 ]--- The is because damage rectangles computed by drm_fb_helper_memory_range_to_clip() function is not guaranteed to be bound in the screen's active display area. Possible reasons are: 1) Buffers are allocated in the granularity of page size, for mmap system call support. The shadow screen buffer consumed by fbdev emulation may also choosed be page size aligned. 2) The DIV_ROUND_UP() used in drm_fb_helper_memory_range_to_clip() will introduce off-by-one error. For example, on a 16KB page size system, in order to store a 1920x1080 XRGB framebuffer, we need allocate 507 pages. Unfortunately, the size 192010804 can not be divided exactly by 16KB. 1920 * 1080 * 4 = 8294400 bytes 506 * 16 * 1024 = 8290304 bytes 507 * 16 * 1024 = 8306688 bytes line_length = 19204 = 7680 bytes 507 16 * 1024 / 7680 = 1081.6 off / line_length = 507 * 16 * 1024 / 7680 = 1081 DIV_ROUND_UP(507 * 16 * 1024, 7680) will yeild 1082 memcpy_toio() typically issue the copy line by line, when copy the last line, out-of-bound access will be happen. Because: 1082 * line_length = 1082 * 7680 = 8309760, and 8309760 > 8306688 Note that userspace may still write to the invisiable area if a larger buffer than width x stride is exposed. But it is not a big issue as long as there still have memory resolve the access if not drafting so far. - Also limit the y1 (Daniel) - keep fix patch it to minimal (Daniel) - screen_size is page size aligned because of it need mmap (Thomas) - Adding fixes tag (Thomas) Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> Fixes: aa15c677cc34 ("drm/fb-helper: Fix vertical damage clipping") Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/dri-devel/ad44df29-3241-0d9e-e708-b0338bf3c623@189.cn/ Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230420030500.1578756-1-suijingfeng@loongson.cn
2023-04-21	drm/ast: Fix ARM compatibility	Jammy Huang	1	-4/+5
	ARM architecture only has 'memory', so all devices are accessed by MMIO if possible. Signed-off-by: Jammy Huang <jammy_huang@aspeedtech.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230421003354.27767-1-jammy_huang@aspeedtech.com
2023-04-17	drm/rockchip: vop2: Use regcache_sync() to fix suspend/resume	Sascha Hauer	1	-7/+3
	afa965a45e01 ("drm/rockchip: vop2: fix suspend/resume") uses regmap_reinit_cache() to fix the suspend/resume issue with the VOP2 driver. During discussion it came up that we should rather use regcache_sync() instead. As the original patch is already applied fix this up in this follow-up patch. Fixes: afa965a45e01 ("drm/rockchip: vop2: fix suspend/resume") Cc: stable@vger.kernel.org Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230417123747.2179695-1-s.hauer@pengutronix.de
2023-04-17	drm/nouveau: fix incorrect conversion to dma_resv_wait_timeout()	John Ogness	1	-6/+12
	Commit 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait") converted from ttm_bo_wait_ctx() to dma_resv_wait_timeout(). However, dma_resv_wait_timeout() returns greater than zero on success as opposed to ttm_bo_wait_ctx(). As a result, relocs will fail and log errors even when it was a success. Change the return code handling to match that of nouveau_gem_ioctl_cpu_prep(), which was already using dma_resv_wait_timeout() correctly. Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait") Reported-by: Tanmay Bhushan <007047221b@gmail.com> Link: https://lore.kernel.org/lkml/20230119225351.71657-1-007047221b@gmail.com Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/87edolaomt.fsf@jogness.linutronix.de
2023-04-17	drm/rockchip: vop2: fix suspend/resume	Sascha Hauer	1	-0/+8
	During a suspend/resume cycle the VO power domain will be disabled and the VOP2 registers will reset to their default values. After that the cached register values will be out of sync and the read/modify/write operations we do on the window registers will result in bogus values written. Fix this by re-initializing the register cache each time we enable the VOP2. With this the VOP2 will show a picture after a suspend/resume cycle whereas without this the screen stays dark. Fixes: 604be85547ce4 ("drm/rockchip: Add VOP2 driver") Cc: stable@vger.kernel.org Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Tested-by: Chris Morgan <macromorgan@hotmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230413144347.3506023-1-s.hauer@pengutronix.de
2023-04-14	drm/sched: Check scheduler ready before calling timeout handling	Vitaly Prosyak	1	-1/+2
	During an IGT GPU reset test we see the following oops, [ +0.000003] ------------[ cut here ]------------ [ +0.000000] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:1656 __queue_delayed_work+0x6d/0xa0 [ +0.000004] Modules linked in: iptable_filter bpfilter amdgpu(OE) nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr ledtrig_audio snd_hda_codec_hdmi intel_rapl_common snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core iommu_v2 gpu_sched(OE) kvm_amd drm_buddy snd_hwdep kvm video drm_ttm_helper snd_pcm ttm snd_seq_midi drm_display_helper snd_seq_midi_event snd_rawmidi cec crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_seq aesni_intel rc_core crypto_simd cryptd binfmt_misc drm_kms_helper rapl snd_seq_device input_leds joydev snd_timer i2c_algo_bit syscopyarea snd ccp sysfillrect sysimgblt wmi_bmof k10temp soundcore mac_hid sch_fq_codel msr parport_pc ppdev drm lp parport ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid r8169 ahci xhci_pci gpio_amdpt realtek i2c_piix4 wmi crc32_pclmul xhci_pci_renesas libahci gpio_generic [ +0.000070] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G W OE 6.1.11+ #2 [ +0.000003] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F7 06/16/2017 [ +0.000001] RIP: 0010:__queue_delayed_work+0x6d/0xa0 [ +0.000003] Code: 7a 50 48 01 c1 48 89 4a 30 81 ff 00 20 00 00 75 38 4c 89 cf e8 64 3e 0a 00 5d e9 1e c5 11 01 e8 99 f7 ff ff 5d e9 13 c5 11 01 <0f> 0b eb c1 0f 0b 48 81 7a 38 70 5c 0e 81 74 9f 0f 0b 48 8b 42 28 [ +0.000002] RSP: 0018:ffffc90000398d60 EFLAGS: 00010007 [ +0.000002] RAX: ffff88810d589c60 RBX: 0000000000000000 RCX: 0000000000000000 [ +0.000002] RDX: ffff88810d589c58 RSI: 0000000000000000 RDI: 0000000000002000 [ +0.000001] RBP: ffffc90000398d60 R08: 0000000000000000 R09: ffff88810d589c78 [ +0.000002] R10: 72705f305f39765f R11: 7866673a6d72645b R12: ffff88810d589c58 [ +0.000001] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000000000 [ +0.000002] FS: 0000000000000000(0000) GS:ffff8887fee40000(0000) knlGS:0000000000000000 [ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000002] CR2: 00005562c4797fa0 CR3: 0000000110da0000 CR4: 00000000003506e0 [ +0.000002] Call Trace: [ +0.000001] <IRQ> [ +0.000001] mod_delayed_work_on+0x5e/0xa0 [ +0.000004] drm_sched_fault+0x23/0x30 [gpu_sched] [ +0.000007] gfx_v9_0_fault.isra.0+0xa6/0xd0 [amdgpu] [ +0.000258] gfx_v9_0_priv_reg_irq+0x29/0x40 [amdgpu] [ +0.000254] amdgpu_irq_dispatch+0x1ac/0x2b0 [amdgpu] [ +0.000243] amdgpu_ih_process+0x89/0x130 [amdgpu] [ +0.000245] amdgpu_irq_handler+0x24/0x60 [amdgpu] [ +0.000165] __handle_irq_event_percpu+0x4f/0x1a0 [ +0.000003] handle_irq_event_percpu+0x15/0x50 [ +0.000001] handle_irq_event+0x39/0x60 [ +0.000002] handle_edge_irq+0xa8/0x250 [ +0.000003] __common_interrupt+0x7b/0x150 [ +0.000002] common_interrupt+0xc1/0xe0 [ +0.000003] </IRQ> [ +0.000000] <TASK> [ +0.000001] asm_common_interrupt+0x27/0x40 [ +0.000002] RIP: 0010:native_safe_halt+0xb/0x10 [ +0.000003] Code: 46 ff ff ff cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 69 f2 5e 00 f4 e9 f1 3b 3e 00 90 eb 07 0f 00 2d 59 f2 5e 00 fb f4 <e9> e0 3b 3e 00 0f 1f 44 00 00 55 48 89 e5 53 e8 b1 d4 fe ff 66 90 [ +0.000002] RSP: 0018:ffffc9000018fdc8 EFLAGS: 00000246 [ +0.000002] RAX: 0000000000004000 RBX: 000000000002e5a8 RCX: 000000000000001f [ +0.000001] RDX: 0000000000000001 RSI: ffff888101298800 RDI: ffff888101298864 [ +0.000001] RBP: ffffc9000018fdd0 R08: 000000527f64bd8b R09: 000000000001dc90 [ +0.000001] R10: 000000000001dc90 R11: 0000000000000003 R12: 0000000000000001 [ +0.000001] R13: ffff888101298864 R14: ffffffff832d9e20 R15: ffff888193aa8c00 [ +0.000003] ? acpi_idle_do_entry+0x5e/0x70 [ +0.000002] acpi_idle_enter+0xd1/0x160 [ +0.000003] cpuidle_enter_state+0x9a/0x6e0 [ +0.000003] cpuidle_enter+0x2e/0x50 [ +0.000003] call_cpuidle+0x23/0x50 [ +0.000002] do_idle+0x1de/0x260 [ +0.000002] cpu_startup_entry+0x20/0x30 [ +0.000002] start_secondary+0x120/0x150 [ +0.000003] secondary_startup_64_no_verify+0xe5/0xeb [ +0.000004] </TASK> [ +0.000000] ---[ end trace 0000000000000000 ]--- [ +0.000003] BUG: kernel NULL pointer dereference, address: 0000000000000102 [ +0.006233] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx_low timeout, signaled seq=3, emitted seq=4 [ +0.000734] #PF: supervisor read access in kernel mode [ +0.009670] [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process amd_deadlock pid 2002 thread amd_deadlock pid 2002 [ +0.005135] #PF: error_code(0x0000) - not-present page [ +0.000002] PGD 0 P4D 0 [ +0.000002] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000002] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G W OE 6.1.11+ #2 [ +0.000002] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F7 06/16/2017 [ +0.012101] amdgpu 0000:0c:00.0: amdgpu: GPU reset begin! [ +0.005136] RIP: 0010:__queue_work+0x1f/0x4e0 [ +0.000004] Code: 87 cd 11 01 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 83 ec 10 89 7d d4 <f6> 86 02 01 00 00 01 0f 85 6c 03 00 00 e8 7f 36 08 00 8b 45 d4 48 For gfx_rings the schedulers may not be initialized by amdgpu_device_init_schedulers() due to ring->no_scheduler flag being set to true and thus the timeout_wq is NULL. As a result, since all ASICs call drm_sched_fault() unconditionally even for schedulers which have not been initialized, it is simpler to use the ready condition which indicates whether the given scheduler worker thread runs and whether the timeout_wq of the reset domain has been initialized. Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20230406200054.633379-1-luben.tuikov@amd.com
2023-04-11	drm/armada: Fix a potential double free in an error handling path	Christophe JAILLET	1	-1/+0
	'priv' is a managed resource, so there is no need to free it explicitly or there will be a double free(). Fixes: 90ad200b4cbc ("drm/armada: Use devm_drm_dev_alloc") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/c4f3c9207a9fce35cb6dd2cc60e755275961588a.1640536364.git.christophe.jaillet@wanadoo.fr
2023-04-11	fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace	Daniel Vetter	1	-0/+2
	This is an oversight from dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore") - I failed to realize that nasty userspace could set this. It's not pretty to mix up kernel-internal and userspace uapi flags like this, but since the entire fb_var_screeninfo structure is uapi we'd need to either add a new parameter to the ->fb_set_par callback and fb_set_par() function, which has a _lot_ of users. Or some other fairly ugly side-channel int fb_info. Neither is a pretty prospect. Instead just correct the issue at hand by filtering out this kernel-internal flag in the ioctl handling code. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Fixes: dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: shlomo@fastmail.com Cc: Michel Dänzer <michel@daenzer.net> Cc: Noralf Trønnes <noralf@tronnes.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.7+ Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Qiujun Huang <hqjagain@gmail.com> Cc: Peter Rosin <peda@axentia.se> Cc: linux-fbdev@vger.kernel.org Cc: Helge Deller <deller@gmx.de> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Shigeru Yoshida <syoshida@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230404193934.472457-1-daniel.vetter@ffwll.ch
2023-04-11	drm/nouveau/fb: add missing sysmen flush callbacks	Karol Herbst	4	-0/+4
	Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/203 Fixes: 5728d064190e1 ("drm/nouveau/fb: handle sysmem flush page from common code") Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230405110455.1368428-1-kherbst@redhat.com
2023-04-09	Linux 6.3-rc6v6.3-rc6	Linus Torvalds	1	-1/+1

2023-04-07	cifs: double lock in cifs_reconnect_tcon()	Dan Carpenter	1	-1/+1
	This lock was supposed to be an unlock. Fixes: 6cc041e90c17 ("cifs: avoid races in parallel reconnects in smb1") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-07	block: don't set GD_NEED_PART_SCAN if scan partition failed	Yu Kuai	1	-1/+7
	Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still set, and partition scan will be proceed again when blkdev_get_by_dev() is called. However, this will cause a problem that re-assemble partitioned raid device will creat partition for underlying disk. Test procedure: mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 sgdisk -n 0:0:+100MiB /dev/md0 blockdev --rereadpt /dev/sda blockdev --rereadpt /dev/sdb mdadm -S /dev/md0 mdadm -A /dev/md0 /dev/sda /dev/sdb Test result: underlying disk partition and raid partition can be observed at the same time Note that this can still happen in come corner cases that GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid device. Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-06	drm/scheduler: Fix UAF race in drm_sched_entity_push_job()	Asahi Lina	1	-2/+9
	After a job is pushed into the queue, it is owned by the scheduler core and may be freed at any time, so we can't write nor read the submit timestamp after that point. Fixes oopses observed with the drm/asahi driver, found with kASAN. Signed-off-by: Asahi Lina <lina@asahilina.net> Link: https://lore.kernel.org/r/20230406-scheduler-uaf-2-v1-1-972531cf0a81@asahilina.net Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
2023-04-06	tracing/synthetic: Make lastcmd_mutex static	Steven Rostedt (Google)	1	-1/+1
	The lastcmd_mutex is only used in trace_events_synth.c and should be static. Link: https://lore.kernel.org/linux-trace-kernel/202304062033.cRStgOuP-lkp@intel.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230406111033.6e26de93@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Tze-nan Wu <Tze-nan.Wu@mediatek.com> Fixes: 4ccf11c4e8a8e ("tracing/synthetic: Fix races on freeing last_cmd") Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-06	net: stmmac: check fwnode for phy device before scanning for phy	Michael Sit Wei Hong	1	-4/+11
	Some DT devices already have phy device configured in the DT/ACPI. Current implementation scans for a phy unconditionally even though there is a phy listed in the DT/ACPI and already attached. We should check the fwnode if there is any phy device listed in fwnode and decide whether to scan for a phy to attach to. Fixes: fe2cfbc96803 ("net: stmmac: check if MAC needs to attach to a PHY") Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/lkml/20230403212434.296975-1-martin.blumenstingl@googlemail.com/ Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Shahab Vahedi <shahab@synopsys.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Link: https://lore.kernel.org/r/20230406024541.3556305-1-michael.wei.hong.sit@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06	ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()	Zheng Yejian	1	-6/+9
	Syzkaller report a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct(). Root cause is 'direct->addr' was changed from 'old_addr' to 'new_addr' but not restored if error happened on calling ftrace_modify_direct_caller(). Then it can no longer find 'direct' by that 'old_addr'. To fix it, restore 'direct->addr' to 'old_addr' explicitly in error path. Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyejian1@huawei.com Cc: stable@vger.kernel.org Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: <ast@kernel.org> Cc: <daniel@iogearbox.net> Fixes: 8a141dd7f706 ("ftrace: Fix modify_ftrace_direct.") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-06	swiotlb: fix a braino in the alignment check fix	Petr Tesarik	1	-3/+3
	The alignment mask in swiotlb_do_find_slots() masks off the high bits which are not relevant for the alignment, so multiple requirements are combined with a bitwise OR rather than AND. In plain English, the stricter the alignment, the more bits must be set in iotlb_align_mask. Confusion may arise from the fact that the same variable is also used to mask off the offset within a swiotlb slot, which is achieved with a bitwise AND. Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks") Reported-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/all/CAA42JLa1y9jJ7BgQvXeUYQh-K2mDNHd2BYZ4iZUz33r5zY7oAQ@mail.gmail.com/ Reported-by: Kelsey Steele <kelseysteele@linux.microsoft.com> Link: https://lore.kernel.org/all/20230405003549.GA21326@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/ Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-06	block: ublk: make sure that block size is set correctly	Ming Lei	1	-1/+3
	block size is one very key setting for block layer, and bad block size could panic kernel easily. Make sure that block size is set correctly. Meantime if ublk_validate_params() fails, clear ub->params so that disk is prevented from being added. Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reported-and-tested-by: Breno Leitao <leitao@debian.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-06	ublk: read any SQE values upfront	Jens Axboe	1	-2/+20
	Since SQE memory is shared with userspace, we should only be reading it once. We cannot read it multiple times, particularly when it's read once for validation and then read again for the actual use. ublk_ch_uring_cmd() is safe when called as a retry operation, as the memory backing is stable at that point. But for normal issue, we want to ensure that we only read ublksrv_io_cmd once. Wrap the function in a helper that reads the value into an on-stack copy of the struct. Cc: stable@vger.kernel.org # 6.0+ Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-06	net: stmmac: Add queue reset into stmmac_xdp_open() function	Song Yoong Siang	1	-0/+2
	Queue reset was moved out from __init_dma_rx_desc_rings() and __init_dma_tx_desc_rings() functions. Thus, the driver fails to transmit and receive packet after XDP prog setup. This commit adds the missing queue reset into stmmac_xdp_open() function. Fixes: f9ec5723c3db ("net: ethernet: stmicro: stmmac: move queue reset to dedicated functions") Cc: <stable@vger.kernel.org> # 6.0+ Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230404044823.3226144-1-yoong.siang.song@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06	selftests: net: rps_default_mask.sh: delete veth link specifically	Hangbin Liu	1	-0/+1
	When deleting the netns and recreating a new one while re-adding the veth interface, there is a small window of time during which the old veth interface has not yet been removed. This can cause the new addition to fail. To resolve this issue, we can either wait for a short while to ensure that the old veth interface is deleted, or we can specifically remove the veth interface. Before this patch: # ./rps_default_mask.sh empty rps_default_mask [ ok ] changing rps_default_mask dont affect existing devices [ ok ] changing rps_default_mask dont affect existing netns [ ok ] changing rps_default_mask affect newly created devices [ ok ] changing rps_default_mask don't affect newly child netns[II][ ok ] rps_default_mask is 0 by default in child netns [ ok ] RTNETLINK answers: File exists changing rps_default_mask in child ns don't affect the main one[ ok ] cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected [fail] expected 1 found changing rps_default_mask in child ns don't affect existing devices[ ok ] After this patch: # ./rps_default_mask.sh empty rps_default_mask [ ok ] changing rps_default_mask dont affect existing devices [ ok ] changing rps_default_mask dont affect existing netns [ ok ] changing rps_default_mask affect newly created devices [ ok ] changing rps_default_mask don't affect newly child netns[II][ ok ] rps_default_mask is 0 by default in child netns [ ok ] changing rps_default_mask in child ns don't affect the main one[ ok ] changing rps_default_mask in child ns affects new childns devices[ ok ] changing rps_default_mask in child ns don't affect existing devices[ ok ] Fixes: 3a7d84eae03b ("self-tests: more rps self tests") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06	net: fec: make use of MDIO C45 quirk	Greg Ungerer	2	-12/+25
	Not all fec MDIO bus drivers support C45 mode transactions. The older fec hardware block in many ColdFire SoCs does not appear to support them, at least according to most of the different ColdFire SoC reference manuals. The bits used to generate C45 access on the iMX parts, in the OP field of the MMFR register, are documented as generating non-compliant MII frames (it is not documented as to exactly how they are non-compliant). Commit 8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions") means the fec driver will always register c45 MDIO read and write methods. During probe these will always be accessed now generating non-compliant MII accesses on ColdFire based devices. Add a quirk define, FEC_QUIRK_HAS_MDIO_C45, that can be used to distinguish silicon that supports MDIO C45 framing or not. Add this to all the existing iMX quirks, so they will be behave as they do now (). () it seems that some iMX parts may not support C45 transactions either. The iMX25 and iMX50 Reference Manuals contain similar wording to the ColdFire Reference Manuals on this. Fixes: 8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions") Signed-off-by: Greg Ungerer <gerg@linux-m68k.org> Reviewed-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230404052207.3064861-1-gerg@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06	maple_tree: fix a potential concurrency bug in RCU mode	Peng Zhang	1	-2/+1
	There is a concurrency bug that may cause the wrong value to be loaded when a CPU is modifying the maple tree. CPU1: mtree_insert_range() mas_insert() mas_store_root() ... mas_root_expand() ... rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node)); ma_set_meta(node, maple_leaf_64, 0, slot); <---IP CPU2: mtree_load() mtree_lookup_walk() ma_data_end(); When CPU1 is about to execute the instruction pointed to by IP, the ma_data_end() executed by CPU2 may return the wrong end position, which will cause the value loaded by mtree_load() to be wrong. An example of triggering the bug: Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in mas_root_expand(). static DEFINE_MTREE(tree); int work(void p) { unsigned long val; for (int i = 0 ; i< 30; ++i) { val = (unsigned long)mtree_load(&tree, 8); mdelay(5); pr_info("%lu",val); } return 0; } mt_init_flags(&tree, MT_FLAGS_USE_RCU); mtree_insert(&tree, 0, (void)12345, GFP_KERNEL); run_thread(work) mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL); In RCU mode, mtree_load() should always return the value before or after the data structure is modified, and in this example mtree_load(&tree, 8) may return 56789 which is not expected, it should always return NULL. Fix it by put ma_set_meta() before rcu_assign_pointer(). Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	maple_tree: fix get wrong data_end in mtree_lookup_walk()	Peng Zhang	1	-10/+5
	if (likely(offset > end)) max = pivots[offset]; The above code should be changed to if (likely(offset < end)), which is correct. This affects the correctness of ma_data_end(). Now it seems that the final result will not be wrong, but it is best to change it. This patch does not change the code as above, because it simplifies the code by the way. Link: https://lkml.kernel.org/r/20230314124203.91572-1-zhangpeng.00@bytedance.com Link: https://lkml.kernel.org/r/20230314124203.91572-2-zhangpeng.00@bytedance.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()	Rongwei Wang	1	-1/+2
	The si->lock must be held when deleting the si from the available list. Otherwise, another thread can re-add the si to the available list, which can lead to memory corruption. The only place we have found where this happens is in the swapoff path. This case can be described as below: core 0 core 1 swapoff del_from_avail_list(si) waiting try lock si->lock acquire swap_avail_lock and re-add si into swap_avail_head acquire si->lock but missing si already being added again, and continuing to clear SWP_WRITEOK, etc. It can be easily found that a massive warning messages can be triggered inside get_swap_pages() by some special cases, for example, we call madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile, run much swapon-swapoff operations (e.g. stress-ng-swap). However, in the worst case, panic can be caused by the above scene. In swapoff(), the memory used by si could be kept in swap_info[] after turning off a swap. This means memory corruption will not be caused immediately until allocated and reset for a new swap in the swapon path. A panic message caused: (with CONFIG_PLIST_DEBUG enabled) ------------[ cut here ]------------ top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70 Modules linked in: rfkill(E) crct10dif_ce(E)... CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+ Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015 pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) pc : plist_check_prev_next_node+0x50/0x70 lr : plist_check_prev_next_node+0x50/0x70 sp : ffff0018009d3c30 x29: ffff0018009d3c40 x28: ffff800011b32a98 x27: 0000000000000000 x26: ffff001803908000 x25: ffff8000128ea088 x24: ffff800011b32a48 x23: 0000000000000028 x22: ffff001800875c00 x21: ffff800010f9e520 x20: ffff001800875c00 x19: ffff001800fdc6e0 x18: 0000000000000030 x17: 0000000000000000 x16: 0000000000000000 x15: 0736076307640766 x14: 0730073007380731 x13: 0736076307640766 x12: 0730073007380731 x11: 000000000004058d x10: 0000000085a85b76 x9 : ffff8000101436e4 x8 : ffff800011c8ce08 x7 : 0000000000000000 x6 : 0000000000000001 x5 : ffff0017df9ed338 x4 : 0000000000000001 x3 : ffff8017ce62a000 x2 : ffff0017df9ed340 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: plist_check_prev_next_node+0x50/0x70 plist_check_head+0x80/0xf0 plist_add+0x28/0x140 add_to_avail_list+0x9c/0xf0 _enable_swap_info+0x78/0xb4 __do_sys_swapon+0x918/0xa10 __arm64_sys_swapon+0x20/0x30 el0_svc_common+0x8c/0x220 do_el0_svc+0x2c/0x90 el0_svc+0x1c/0x30 el0_sync_handler+0xa8/0xb0 el0_sync+0x148/0x180 irq event stamp: 2082270 Now, si->lock locked before calling 'del_from_avail_list()' to make sure other thread see the si had been deleted and SWP_WRITEOK cleared together, will not reinsert again. This problem exists in versions after stable 5.10.y. Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node") Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com> Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	nilfs2: fix sysfs interface lifetime	Ryusuke Konishi	2	-5/+9
	The current nilfs2 sysfs support has issues with the timing of creation and deletion of sysfs entries, potentially leading to null pointer dereferences, use-after-free, and lockdep warnings. Some of the sysfs attributes for nilfs2 per-filesystem instance refer to metadata file "cpfile", "sufile", or "dat", but nilfs_sysfs_create_device_group that creates those attributes is executed before the inodes for these metadata files are loaded, and nilfs_sysfs_delete_device_group which deletes these sysfs entries is called after releasing their metadata file inodes. Therefore, access to some of these sysfs attributes may occur outside of the lifetime of these metadata files, resulting in inode NULL pointer dereferences or use-after-free. In addition, the call to nilfs_sysfs_create_device_group() is made during the locking period of the semaphore "ns_sem" of nilfs object, so the shrinker call caused by the memory allocation for the sysfs entries, may derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in nilfs_evict_inode()". Since nilfs2 may acquire "ns_sem" deep in the call stack holding other locks via its error handler __nilfs_error(), this causes lockdep to report circular locking. This is a false positive and no circular locking actually occurs as no inodes exist yet when nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep warnings can be resolved by simply moving the call to nilfs_sysfs_create_device_group() out of "ns_sem". This fixes these sysfs issues by revising where the device's sysfs interface is created/deleted and keeping its lifetime within the lifetime of the metadata files above. Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	mm: take a page reference when removing device exclusive entries	Alistair Popple	1	-1/+15
	Device exclusive page table entries are used to prevent CPU access to a page whilst it is being accessed from a device. Typically this is used to implement atomic operations when the underlying bus does not support atomic access. When a CPU thread encounters a device exclusive entry it locks the page and restores the original entry after calling mmu notifiers to signal drivers that exclusive access is no longer available. The device exclusive entry holds a reference to the page making it safe to access the struct page whilst the entry is present. However the fault handling code does not hold the PTL when taking the page lock. This means if there are multiple threads faulting concurrently on the device exclusive entry one will remove the entry whilst others will wait on the page lock without holding a reference. This can lead to threads locking or waiting on a folio with a zero refcount. Whilst mmap_lock prevents the pages getting freed via munmap() they may still be freed by a migration. This leads to warnings such as PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount drops to zero. Fix this by trying to take a reference on the folio before locking it. The code already checks the PTE under the PTL and aborts if the entry is no longer there. It is also possible the folio has been unmapped, freed and re-allocated allowing a reference to be taken on an unrelated folio. This case is also detected by the PTE check and the folio is unlocked without further changes. Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	mm: vmalloc: avoid warn_alloc noise caused by fatal signal	Yafang Shao	1	-3/+5
	There're some suspicious warn_alloc on my test serer, for example, [13366.518837] warn_alloc: 81 callbacks suppressed [13366.518841] test_verifier: vmalloc error: size 4096, page order 0, failed to allocate pages, mode:0x500dc2(GFP_HIGHUSER\|__GFP_ZERO\|__GFP_ACCOUNT), nodemask=(null),cpuset=/,mems_allowed=0-1 [13366.522240] CPU: 30 PID: 722463 Comm: test_verifier Kdump: loaded Tainted: G W O 6.2.0+ #638 [13366.524216] Call Trace: [13366.524702] <TASK> [13366.525148] dump_stack_lvl+0x6c/0x80 [13366.525712] dump_stack+0x10/0x20 [13366.526239] warn_alloc+0x119/0x190 [13366.526783] ? alloc_pages_bulk_array_mempolicy+0x9e/0x2a0 [13366.527470] __vmalloc_area_node+0x546/0x5b0 [13366.528066] __vmalloc_node_range+0xc2/0x210 [13366.528660] __vmalloc_node+0x42/0x50 [13366.529186] ? bpf_prog_realloc+0x53/0xc0 [13366.529743] __vmalloc+0x1e/0x30 [13366.530235] bpf_prog_realloc+0x53/0xc0 [13366.530771] bpf_patch_insn_single+0x80/0x1b0 [13366.531351] bpf_jit_blind_constants+0xe9/0x1c0 [13366.531932] ? __free_pages+0xee/0x100 [13366.532457] ? free_large_kmalloc+0x58/0xb0 [13366.533002] bpf_int_jit_compile+0x8c/0x5e0 [13366.533546] bpf_prog_select_runtime+0xb4/0x100 [13366.534108] bpf_prog_load+0x6b1/0xa50 [13366.534610] ? perf_event_task_tick+0x96/0xb0 [13366.535151] ? security_capable+0x3a/0x60 [13366.535663] __sys_bpf+0xb38/0x2190 [13366.536120] ? kvm_clock_get_cycles+0x9/0x10 [13366.536643] __x64_sys_bpf+0x1c/0x30 [13366.537094] do_syscall_64+0x38/0x90 [13366.537554] entry_SYSCALL_64_after_hwframe+0x72/0xdc [13366.538107] RIP: 0033:0x7f78310f8e29 [13366.538561] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48 [13366.540286] RSP: 002b:00007ffe2a61fff8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [13366.541031] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f78310f8e29 [13366.541749] RDX: 0000000000000080 RSI: 00007ffe2a6200b0 RDI: 0000000000000005 [13366.542470] RBP: 00007ffe2a620010 R08: 00007ffe2a6202a0 R09: 00007ffe2a6200b0 [13366.543183] R10: 00000000000f423e R11: 0000000000000206 R12: 0000000000407800 [13366.543900] R13: 00007ffe2a620540 R14: 0000000000000000 R15: 0000000000000000 [13366.544623] </TASK> [13366.545260] Mem-Info: [13366.546121] active_anon:81319 inactive_anon:20733 isolated_anon:0 active_file:69450 inactive_file:5624 isolated_file:0 unevictable:0 dirty:10 writeback:0 slab_reclaimable:69649 slab_unreclaimable:48930 mapped:27400 shmem:12868 pagetables:4929 sec_pagetables:0 bounce:0 kernel_misc_reclaimable:0 free:15870308 free_pcp:142935 free_cma:0 [13366.551886] Node 0 active_anon:224836kB inactive_anon:33528kB active_file:175692kB inactive_file:13752kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:59248kB dirty:32kB writeback:0kB shmem:18252kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4616kB pagetables:10664kB sec_pagetables:0kB all_unreclaimable? no [13366.555184] Node 1 active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:50352kB dirty:8kB writeback:0kB shmem:33220kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:3896kB pagetables:9052kB sec_pagetables:0kB all_unreclaimable? no [13366.558262] Node 0 DMA free:15360kB boost:0kB min:304kB low:380kB high:456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [13366.560821] lowmem_reserve[]: 0 2735 31873 31873 31873 [13366.561981] Node 0 DMA32 free:2790904kB boost:0kB min:56028kB low:70032kB high:84036kB reserved_highatomic:0KB active_anon:1936kB inactive_anon:20kB active_file:396kB inactive_file:344kB unevictable:0kB writepending:0kB present:3129200kB managed:2801520kB mlocked:0kB bounce:0kB free_pcp:5188kB local_pcp:0kB free_cma:0kB [13366.565148] lowmem_reserve[]: 0 0 29137 29137 29137 [13366.566168] Node 0 Normal free:28533824kB boost:0kB min:596740kB low:745924kB high:895108kB reserved_highatomic:28672KB active_anon:222900kB inactive_anon:33508kB active_file:175296kB inactive_file:13408kB unevictable:0kB writepending:32kB present:30408704kB managed:29837172kB mlocked:0kB bounce:0kB free_pcp:295724kB local_pcp:0kB free_cma:0kB [13366.569485] lowmem_reserve[]: 0 0 0 0 0 [13366.570416] Node 1 Normal free:32141144kB boost:0kB min:660504kB low:825628kB high:990752kB reserved_highatomic:69632KB active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB writepending:8kB present:33554432kB managed:33025372kB mlocked:0kB bounce:0kB free_pcp:270880kB local_pcp:46860kB free_cma:0kB [13366.573403] lowmem_reserve[]: 0 0 0 0 0 [13366.574015] Node 0 DMA: 04kB 08kB 016kB 032kB 064kB 0128kB 0256kB 0512kB 11024kB (U) 12048kB (M) 34096kB (M) = 15360kB [13366.575474] Node 0 DMA32: 7824kB (UME) 7568kB (UME) 73616kB (UME) 74532kB (UME) 69464kB (UME) 653128kB (UME) 595256kB (UME) 552512kB (UME) 4541024kB (UME) 3472048kB (UME) 2464096kB (UME) = 2790904kB [13366.577442] Node 0 Normal: 338564kB (UMEH) 518158kB (UMEH) 4241816kB (UMEH) 3627232kB (UMEH) 2219564kB (UMEH) 10296128kB (UMEH) 7238256kB (UMEH) 5638512kB (UEH) 53371024kB (UMEH) 35062048kB (UMEH) 14704096kB (UME) = 28533784kB [13366.580460] Node 1 Normal: 157764kB (UMEH) 374858kB (UMEH) 2950916kB (UMEH) 2142032kB (UMEH) 1481864kB (UMEH) 13051128kB (UMEH) 9918256kB (UMEH) 7374512kB (UMEH) 53971024kB (UMEH) 38872048kB (UMEH) 20024096kB (UME) = 32141240kB [13366.583027] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [13366.584380] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [13366.585702] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [13366.587042] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [13366.588372] 87386 total pagecache pages [13366.589266] 0 pages in swap cache [13366.590327] Free swap = 0kB [13366.591227] Total swap = 0kB [13366.592142] 16777082 pages RAM [13366.593057] 0 pages HighMem/MovableOnly [13366.594037] 357226 pages reserved [13366.594979] 0 pages hwpoisoned This failure really confuse me as there're still lots of available pages. Finally I figured out it was caused by a fatal signal. When a process is allocating memory via vm_area_alloc_pages(), it will break directly even if it hasn't allocated the requested pages when it receives a fatal signal. In that case, we shouldn't show this warn_alloc, as it is useless. We only need to show this warning when there're really no enough pages. Link: https://lkml.kernel.org/r/20230330162625.13604-1-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field	Tetsuo Handa	2	-0/+2
	nilfs_btree_assign_p() and nilfs_direct_assign_p() are not initializing "struct nilfs_binfo_dat"->bi_pad field, causing uninit-value reports when being passed to CRC function. Link: https://lkml.kernel.org/r/20230326152146.15872-1-konishi.ryusuke@gmail.com Reported-by: syzbot <syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lkml.kernel.org/r/CANX2M5bVbzRi6zH3PTcNE_31TzerstOXUa9Bay4E6y6dX23_pg@mail.gmail.com Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()	Ryusuke Konishi	1	-2/+1
	The finalization of nilfs_segctor_thread() can race with nilfs_segctor_kill_thread() which terminates that thread, potentially causing a use-after-free BUG as KASAN detected. At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member of "struct nilfs_sc_info" to indicate the thread has finished, and then notifies nilfs_segctor_kill_thread() of this using waitqueue "sc_wait_task" on the struct nilfs_sc_info. However, here, immediately after the NULL assignment to "sc_task", it is possible that nilfs_segctor_kill_thread() will detect it and return to continue the deallocation, freeing the nilfs_sc_info structure before the thread does the notification. This fixes the issue by protecting the NULL assignment to "sc_task" and its notification, with spinlock "sc_state_lock" of the struct nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate the race. Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	zsmalloc: document freeable stats	Sergey Senozhatsky	1	-0/+2
	When freeable class stat was added to classes file (back in 2016) we forgot to update zsmalloc documentation. Fix that. Link: https://lkml.kernel.org/r/20230325024631.2817153-3-senozhatsky@chromium.org Fixes: 1120ed548394 ("mm/zsmalloc: add `freeable' column to pool stat") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	zsmalloc: document new fullness grouping	Sergey Senozhatsky	1	-59/+74
	Patch series "zsmalloc: minor documentation updates". Two minor patches that bring zsmalloc documentation up to date. This patch (of 2): Update documentation and reflect new zspages fullness grouping (we don't use almost_empty and almost_full anymore). Link: https://lkml.kernel.org/r/20230325024631.2817153-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20230325024631.2817153-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Fixes: 67e157eb3639 ("zsmalloc: show per fullness group class stats") Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	fsdax: force clear dirty mark if CoW	Shiyang Ruan	1	-0/+37
	XFS allows CoW on non-shared extents to combat fragmentation[1]. The old non-shared extent could be mwrited before, its dax entry is marked dirty. This results in a WARNing: [ 28.512349] ------------[ cut here ]------------ [ 28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390 [ 28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables [ 28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117 [ 28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014 [ 28.516307] RIP: 0010:dax_insert_entry+0x342/0x390 [ 28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48 [ 28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086 [ 28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b [ 28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff [ 28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8 [ 28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155 [ 28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8 [ 28.519703] FS: 00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000 [ 28.520148] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0 [ 28.520863] PKRU: 55555554 [ 28.521043] Call Trace: [ 28.521219] <TASK> [ 28.521368] dax_fault_iter+0x196/0x390 [ 28.521595] dax_iomap_pte_fault+0x19b/0x3d0 [ 28.521852] __xfs_filemap_fault+0x234/0x2b0 [ 28.522116] __do_fault+0x30/0x130 [ 28.522334] do_fault+0x193/0x340 [ 28.522586] __handle_mm_fault+0x2d3/0x690 [ 28.522975] handle_mm_fault+0xe6/0x2c0 [ 28.523259] do_user_addr_fault+0x1bc/0x6f0 [ 28.523521] exc_page_fault+0x60/0x140 [ 28.523763] asm_exc_page_fault+0x22/0x30 [ 28.524001] RIP: 0033:0x7f14a0b589ca [ 28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 [ 28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202 [ 28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046 [ 28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000 [ 28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000 [ 28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c [ 28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0 [ 28.527449] </TASK> [ 28.527600] ---[ end trace 0000000000000000 ]--- To be able to delete this entry, clear its dirty mark before invalidate_inode_pages2_range(). [1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/ Link: https://lkml.kernel.org/r/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com Fixes: f80e1668888f3 ("fsdax: invalidate pages when CoW") Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	mm/hugetlb: fix uffd wr-protection for CoW optimization path	Peter Xu	1	-2/+12
	This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be writable even with uffd-wp bit set. It only happens with hugetlb private mappings, when someone firstly wr-protects a missing pte (which will install a pte marker), then a write to the same page without any prior access to the page. Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before reaching hugetlb_wp() to avoid taking more locks that userfault won't need. However there's one CoW optimization path that can trigger hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap. This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit is detected. The new path will only trigger in the CoW optimization path because generic hugetlb_fault() (e.g. when a present pte was wr-protected) will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE won't be affected and can still be resolved, IOW only skip CoW not CoR. This patch will be needed for v5.19+ hence copy stable. [peterx@redhat.com: v2] Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n [peterx@redhat.com: v3] Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	mm: enable maple tree RCU mode by default	Liam R. Howlett	3	-2/+7
	Use the maple tree in RCU mode for VMA tracking. The maple tree tracks the stack and is able to update the pivot (lower/upper boundary) in-place to allow the page fault handler to write to the tree while holding just the mmap read lock. This is safe as the writes to the stack have a guard VMA which ensures there will always be a NULL in the direction of the growth and thus will only update a pivot. It is possible, but not recommended, to have VMAs that grow up/down without guard VMAs. syzbot has constructed a testcase which sets up a VMA to grow and consume the empty space. Overwriting the entire NULL entry causes the tree to be altered in a way that is not safe for concurrent readers; the readers may see a node being rewritten or one that does not match the maple state they are using. Enabling RCU mode allows the concurrent readers to see a stable node and will return the expected result. [Liam.Howlett@Oracle.com: we don't need to free the nodes with RCU[ Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/ Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: syzbot+8d95422d3537159ca390@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	maple_tree: add RCU lock checking to rcu callback functions	Liam R. Howlett	1	-92/+96
	Dereferencing RCU objects within the RCU callback without the RCU check has caused lockdep to complain. Fix the RCU dereferencing by using the RCU callback lock to ensure the operation is safe. Also stop creating a new lock to use for dereferencing during destruction of the tree or subtree. Instead, pass through a pointer to the tree that has the lock that is held for RCU dereferencing checking. It also does not make sense to use the maple state in the freeing scenario as the tree walk is a special case where the tree no longer has the normal encodings and parent pointers. Link: https://lkml.kernel.org/r/20230227173632.3292573-8-surenb@google.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	maple_tree: add smp_rmb() to dead node detection	Liam R. Howlett	1	-2/+6
	Add an smp_rmb() before reading the parent pointer to ensure that anything read from the node prior to the parent pointer hasn't been reordered ahead of this check. The is necessary for RCU mode. Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	maple_tree: fix write memory barrier of nodes once dead for RCU mode	Liam R. Howlett	2	-2/+21
	During the development of the maple tree, the strategy of freeing multiple nodes changed and, in the process, the pivots were reused to store pointers to dead nodes. To ensure the readers see accurate pivots, the writers need to mark the nodes as dead and call smp_wmb() to ensure any readers can identify the node as dead before using the pivot values. There were two places where the old method of marking the node as dead without smp_wmb() were being used, which resulted in RCU readers seeing the wrong pivot value before seeing the node was dead. Fix this race condition by using mte_set_node_dead() which has the smp_wmb() call to ensure the race is closed. Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed are marked as dead to ensure there are no other call paths besides the two updated paths. This is necessary for the RCU mode of the maple tree. Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	maple_tree: remove extra smp_wmb() from mas_dead_leaves()	Liam Howlett	1	-1/+0
	The call to mte_set_dead_node() before the smp_wmb() already calls smp_wmb() so this is not needed. This is an optimization for the RCU mode of the maple tree. Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	maple_tree: fix freeing of nodes in rcu mode	Liam Howlett	1	-11/+62
	The walk to destroy the nodes was not always setting the node type and would result in a destroy method potentially using the values as nodes. Avoid this by setting the correct node types. This is necessary for the RCU mode of the maple tree. Link: https://lkml.kernel.org/r/20230227173632.3292573-4-surenb@google.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	maple_tree: detect dead nodes in mas_start()	Liam Howlett	1	-0/+4
	When initially starting a search, the root node may already be in the process of being replaced in RCU mode. Detect and restart the walk if this is the case. This is necessary for RCU mode of the maple tree. Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-06	maple_tree: be more cautious about dead nodes	Liam Howlett	1	-9/+43
	Patch series "Fix VMA tree modification under mmap read lock". Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an inconsistency between threads walking the VMA maple tree. The inconsistency is caused by the page fault handler modifying the maple tree while holding the mmap_lock for read. This only happens for stack VMAs. We had thought this was safe as it only modifies a single pivot in the tree. Unfortunately, syzbot constructed a test case where the stack had no guard page and grew the stack to abut the next VMA. This causes us to delete the NULL entry between the two VMAs and rewrite the node. We considered several options for fixing this, including dropping the mmap_lock, then reacquiring it for write; and relaxing the definition of the tree to permit a zero-length NULL entry in the node. We decided the best option was to backport some of the RCU patches from -next, which solve the problem by allocating a new node and RCU-freeing the old node. Since the problem exists in 6.1, we preferred a solution which is similar to the one we intended to merge next merge window. These patches have been in -next since next-20230301, and have received intensive testing in Android as part of the RCU page fault patchset. They were also sent as part of the "Per-VMA locks" v4 patch series. Patches 1 to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode for the tree. Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through mmtests showed there was a 15-20% performance decrease in will-it-scale/brk1-processes. This tests creating and inserting a single VMA repeatedly through the brk interface and isn't representative of any real world applications. This patch (of 8): ma_pivots() and ma_data_end() may be called with a dead node. Ensure to that the node isn't dead before using the returned values. This is necessary for RCU mode of the maple tree. Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Li <chriscli@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: freak07 <michalechner92@googlemail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05	ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530	Hans de Goede	1	-0/+14
	The Lenovo ThinkPad W530 uses a nvidia k1000m GPU. When this gets used together with one of the older nvidia binary driver series (the latest series does not support it), then backlight control does not work. This is caused by commit 3dbc80a3e4c5 ("ACPI: video: Make backlight class device registration a separate step (v2)") combined with commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default"). After these changes the acpi_video# backlight device is only registered when requested by a GPU driver calling acpi_video_register_backlight() which the nvidia binary driver does not do. I realize that using the nvidia binary driver is not a supported use-case and users can workaround this by adding acpi_backlight=video on the kernel commandline, but the ThinkPad W530 is a popular model under Linux users, so it seems worthwhile to add a quirk for this. I will also email Nvidia asking them to make the driver call acpi_video_register_backlight() when an internal LCD panel is detected. So maybe the next maintenance release of the drivers will fix this... Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-05	ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2	Hans de Goede	1	-0/+23
	On the Apple iMac14,1 and iMac14,2 all-in-ones (monitors with builtin "PC") the connection between the GPU and the panel is seen by the GPU driver as regular DP instead of eDP, causing the GPU driver to never call acpi_video_register_backlight(). (GPU drivers only call acpi_video_register_backlight() when an internal panel is detected, to avoid non working acpi_video# devices getting registered on desktops which unfortunately is a real issue.) Fix the missing acpi_video# backlight device on these all-in-ones by adding a acpi_backlight=video DMI quirk, so that video.ko will immediately register the backlight device instead of waiting for an acpi_video_register_backlight() call. Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>