summaryrefslogtreecommitdiffstats
path: root/arch/arm64 (follow)
Commit message (Collapse)AuthorAgeFilesLines
* KVM: arm64: Shave a few bytes from the EL2 idmap codeMarc Zyngier2024-10-173-23/+31
| | | | | | | | | | | | | | | Our idmap is becoming too big, to the point where it doesn't fit in a 4kB page anymore. There are some low-hanging fruits though, such as the el2_init_state horror that is expanded 3 times in the kernel. Let's at least limit ourselves to two copies, which makes the kernel link again. At some point, we'll have to have a better way of doing this. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241009204903.GA3353168@thelio-3990X
* KVM: arm64: Don't eagerly teardown the vgic on init errorMarc Zyngier2024-10-112-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As there is very little ordering in the KVM API, userspace can instanciate a half-baked GIC (missing its memory map, for example) at almost any time. This means that, with the right timing, a thread running vcpu-0 can enter the kernel without a GIC configured and get a GIC created behind its back by another thread. Amusingly, it will pick up that GIC and start messing with the data structures without the GIC having been fully initialised. Similarly, a thread running vcpu-1 can enter the kernel, and try to init the GIC that was previously created. Since this GIC isn't properly configured (no memory map), it fails to correctly initialise. And that's the point where we decide to teardown the GIC, freeing all its resources. Behind vcpu-0's back. Things stop pretty abruptly, with a variety of symptoms. Clearly, this isn't good, we should be a bit more careful about this. It is obvious that this guest is not viable, as it is missing some important part of its configuration. So instead of trying to tear bits of it down, let's just mark it as *dead*. It means that any further interaction from userspace will result in -EIO. The memory will be released on the "normal" path, when userspace gives up. Cc: stable@vger.kernel.org Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241009183603.3221824-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: Expose S1PIE to guestsMark Brown2024-10-081-1/+3
| | | | | | | | | | | | | | | Prior to commit 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1") we just exposed the santised view of ID_AA64MMFR3_EL1 to guests, meaning that they saw both TCRX and S1PIE if present on the host machine. That commit added VMM control over the contents of the register and exposed S1POE but removed S1PIE, meaning that the extension is no longer visible to guests. Reenable support for S1PIE with VMM control. Fixes: 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241005-kvm-arm64-fix-s1pie-v1-1-5901f02de749@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to rescheduleOliver Upton2024-10-081-0/+27
| | | | | | | | | | | There's been a decent amount of attention around unmaps of nested MMUs, and TLBI handling is no exception to this. Add a comment clarifying why it is safe to reschedule during a TLBI unmap, even without a reference on the MMU in progress. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-5-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: nv: Punt stage-2 recycling to a vCPU requestOliver Upton2024-10-084-2/+37
| | | | | | | | | | | | | | | | | | | Currently, when a nested MMU is repurposed for some other MMU context, KVM unmaps everything during vcpu_load() while holding the MMU lock for write. This is quite a performance bottleneck for large nested VMs, as all vCPU scheduling will spin until the unmap completes. Start punting the MMU cleanup to a vCPU request, where it is then possible to periodically release the MMU lock and CPU in the presence of contention. Ensure that no vCPU winds up using a stale MMU by tracking the pending unmap on the S2 MMU itself and requesting an unmap on every vCPU that finds it. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: nv: Do not block when unmapping stage-2 if disallowedOliver Upton2024-10-085-15/+17
| | | | | | | | | | | | | | | | Right now the nested code allows unmap operations on a shadow stage-2 to block unconditionally. This is wrong in a couple places, such as a non-blocking MMU notifier or on the back of a sched_in() notifier as part of shadow MMU recycling. Carry through whether or not blocking is allowed to kvm_pgtable_stage2_unmap(). This 'fixes' an issue where stage-2 MMU reclaim would precipitate a stack overflow from a pile of kvm_sched_in() callbacks, all trying to recycle a stage-2 MMU. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: nv: Keep reference on stage-2 MMU when scheduled outOliver Upton2024-10-081-3/+18
| | | | | | | | | | | | | | | | | | | | | | If a vCPU is scheduling out and not in WFI emulation, it is highly likely it will get scheduled again soon and reuse the MMU it had before. Dropping the MMU at vcpu_put() can have some unfortunate consequences, as the MMU could get reclaimed and used in a different context, forcing another 'cold start' on an otherwise active MMU. Avoid that altogether by keeping a reference on the MMU if the vCPU is scheduling out, ensuring that another vCPU cannot reclaim it while the current vCPU is away. Since there are more MMUs than vCPUs, this does not affect the guarantee that an unused MMU is available at any time. Furthermore, this makes the vcpu->arch.hw_mmu ~stable in preemptible code, at least for where it matters in the stage-2 abort path. Yes, the MMU can change across WFI emulation, but there isn't even a use case where this would matter. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: Unregister redistributor for failed vCPU creationOliver Upton2024-10-081-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alex reports that syzkaller has managed to trigger a use-after-free when tearing down a VM: BUG: KASAN: slab-use-after-free in kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 Read of size 8 at addr ffffff801c6890d0 by task syz.3.2219/10758 CPU: 3 UID: 0 PID: 10758 Comm: syz.3.2219 Not tainted 6.11.0-rc6-dirty #64 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x17c/0x1a8 arch/arm64/kernel/stacktrace.c:317 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x94/0xc0 lib/dump_stack.c:119 print_report+0x144/0x7a4 mm/kasan/report.c:377 kasan_report+0xcc/0x128 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1409 __fput+0x198/0x71c fs/file_table.c:422 ____fput+0x20/0x30 fs/file_table.c:450 task_work_run+0x1cc/0x23c kernel/task_work.c:228 do_notify_resume+0x144/0x1a0 include/linux/resume_user_mode.h:50 el0_svc+0x64/0x68 arch/arm64/kernel/entry-common.c:169 el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Upon closer inspection, it appears that we do not properly tear down the MMIO registration for a vCPU that fails creation late in the game, e.g. a vCPU w/ the same ID already exists in the VM. It is important to consider the context of commit that introduced this bug by moving the unregistration out of __kvm_vgic_vcpu_destroy(). That change correctly sought to avoid an srcu v. config_lock inversion by breaking up the vCPU teardown into two parts, one guarded by the config_lock. Fix the use-after-free while avoiding lock inversion by adding a special-cased unregistration to __kvm_vgic_vcpu_destroy(). This is safe because failed vCPUs are torn down outside of the config_lock. Cc: stable@vger.kernel.org Fixes: f616506754d3 ("KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007223909.2157336-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
* Merge branch kvm-arm64/idregs-6.12 into kvmarm/fixesMarc Zyngier2024-10-082-8/+42
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/idregs-6.12: : . : Make some fields of ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1 : writable from userspace, so that a VMM can influence the : set of guest-visible features. : : - for ID_AA64DFR0_EL1: DoubleLock, WRPs, PMUVer and DebugVer : are writable (courtesy of Shameer Kolothum) : : - for ID_AA64PFR1_EL1: BT, SSBS, CVS2_frac are writable : (courtesy of Shaoqin Huang) : . KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1 KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1 KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest KVM: arm64: Disable fields that KVM doesn't know how to handle in ID_AA64PFR1_EL1 KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace Signed-off-by: Marc Zyngier <maz@kernel.org>
| * KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1Shaoqin Huang2024-08-251-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow userspace to change the guest-visible value of the register with different way of handling: - Since the RAS and MPAM is not writable in the ID_AA64PFR0_EL1 register, RAS_frac and MPAM_frac are also not writable in the ID_AA64PFR1_EL1 register. - The MTE is controlled by a separate UAPI (KVM_CAP_ARM_MTE) with an internal flag (KVM_ARCH_FLAG_MTE_ENABLED). So it's not writable. - For those fields which KVM doesn't know how to handle, they are not exposed to the guest (being disabled in the register read accessor), those fields value will always be 0. Those fields don't have a known behavior now, so don't advertise them to the userspace. Thus still not writable. Those fields include SME, RNDR_trap, NMI, GCS, THE, DF2, PFAR, MTE_frac, MTEX. - The BT, SSBS, CSV2_frac don't introduce any new registers which KVM doesn't know how to handle, they can be written without ill effect. So let them writable. Besides, we don't do the crosscheck in KVM about the CSV2_frac even if it depends on the value of CSV2, it should be made sure by the VMM instead of KVM. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-4-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
| * KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guestShaoqin Huang2024-08-251-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently KVM use cpus_have_final_cap() to check if FEAT_SSBS is advertised to the guest. But if FEAT_SSBS is writable and isn't advertised to the guest, this is wrong. Update it to use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest, thus the KVM can do the right thing if FEAT_SSBS isn't advertised to the guest. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-3-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
| * KVM: arm64: Disable fields that KVM doesn't know how to handle in ↵Shaoqin Huang2024-08-251-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ID_AA64PFR1_EL1 For some of the fields in the ID_AA64PFR1_EL1 register, KVM doesn't know how to handle them right now. So explicitly disable them in the register accessor, then those fields value will be masked to 0 even if on the hardware the field value is 1. This is safe because from a UAPI point of view that read_sanitised_ftr_reg() doesn't yet return a nonzero value for any of those fields. This will benifit the migration if the host and VM have different values when restoring a VM. Those fields include RNDR_trap, NMI, MTE_frac, GCS, THE, MTEX, DF2, PFAR. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-2-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
| * KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from ↵Shameer Kolothum2024-08-221-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | userspace KVM exposes the OS double lock feature bit to Guests but returns RAZ/WI on Guest OSDLR_EL1 access. This breaks Guest migration between systems where this feature differ. Add support to make this feature writable from userspace by setting the mask bit. While at it, set the mask bits for the exposed WRPs(Number of Watchpoints) as well. Also update the selftest to cover these fields. However we still can't make BRPs and CTX_CMPs fields writable, because as per ARM ARM DDI 0487K.a, section D2.8.3 Breakpoint types and linking of breakpoints, highest numbered breakpoints(BRPs) must be context aware breakpoints(CTX_CMPs). KVM does not trap + emulate the breakpoint registers, and as such cannot support a layout that misaligns with the underlying hardware. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240816132819.34316-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Fix kvm_has_feat*() handling of negative featuresMarc Zyngier2024-10-031-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oliver reports that the kvm_has_feat() helper is not behaviing as expected for negative feature. On investigation, the main issue seems to be caused by the following construct: #define get_idreg_field(kvm, id, fld) \ (id##_##fld##_SIGNED ? \ get_idreg_field_signed(kvm, id, fld) : \ get_idreg_field_unsigned(kvm, id, fld)) where one side of the expression evaluates as something signed, and the other as something unsigned. In retrospect, this is totally braindead, as the compiler converts this into an unsigned expression. When compared to something that is 0, the test is simply elided. Epic fail. Similar issue exists in the expand_field_sign() macro. The correct way to handle this is to chose between signed and unsigned comparisons, so that both sides of the ternary expression are of the same type (bool). In order to keep the code readable (sort of), we introduce new comparison primitives taking an operator as a parameter, and rewrite the kvm_has_feat*() helpers in terms of these primitives. Fixes: c62d7a23b947 ("KVM: arm64: Add feature checking helpers") Reported-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Oliver Upton <oliver.upton@linux.dev> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241002204239.2051637-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVMMark Brown2024-10-012-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When pKVM saves and restores the host floating point state on a SVE system, it programs the vector length in ZCR_EL2.LEN to be whatever the maximum VL for the PE is. But it uses a buffer allocated with kvm_host_sve_max_vl, the maximum VL shared by all PEs in the system. This means that if we run on a system where the maximum VLs are not consistent, we will overflow the buffer on PEs which support larger VLs. Since the host will not currently attempt to make use of non-shared VLs, fix this by explicitly setting the EL2 VL to be the maximum shared VL when we save and restore. This will enforce the limit on host VL usage. Should we wish to support asymmetric VLs, this code will need to be updated along with the required changes for the host: https://lore.kernel.org/r/20240730-kvm-arm64-fix-pkvm-sve-vl-v6-0-cae8a2e0bd66@kernel.org Fixes: b5b9955617bc ("KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM") Signed-off-by: Mark Brown <broonie@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240912-kvm-arm64-limit-guest-vl-v2-1-dd2c29cb2ac9@kernel.org [maz: added punctuation to the commit message] Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error pathVincent Donnefort2024-10-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | On an error, hyp_vcpu will be accessed while this memory has already been relinquished to the host and unmapped from the hypervisor. Protect the CPTR assignment with an early return. Fixes: b5b9955617bc ("KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20240919110500.2345927-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2024-09-281-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull x86 kvm updates from Paolo Bonzini: "x86: - KVM currently invalidates the entirety of the page tables, not just those for the memslot being touched, when a memslot is moved or deleted. This does not traditionally have particularly noticeable overhead, but Intel's TDX will require the guest to re-accept private pages if they are dropped from the secure EPT, which is a non starter. Actually, the only reason why this is not already being done is a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs, so allow userspace to opt into the new behavior. - Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon) - Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work) - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value at the ICR offset - Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler - Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit - Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the nested guest) - Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU - Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding multi generation LRU support in KVM - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled, i.e. when the CPU has already flushed the RSB - Trace the per-CPU host save area as a VMCB pointer to improve readability and cleanup the retrieval of the SEV-ES host save area - Remove unnecessary accounting of temporary nested VMCB related allocations - Set FINAL/PAGE in the page fault error code for EPT violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2 - Add a lockdep assertion to detect unsafe accesses of vmcs12 structures - Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible) - Minor SGX fix and a cleanup - Misc cleanups Generic: - Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed - Enable virtualization when KVM is loaded, not right before the first VM is created Together with the previous change, this simplifies a lot the logic of the callbacks, because their very existence implies virtualization is enabled - Fix a bug that results in KVM prematurely exiting to userspace for coalesced MMIO/PIO in many cases, clean up the related code, and add a testcase - Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_ the gpa+len crosses a page boundary, which thankfully is guaranteed to not happen in the current code base. Add WARNs in more helpers that read/write guest memory to detect similar bugs Selftests: - Fix a goof that caused some Hyper-V tests to be skipped when run on bare metal, i.e. NOT in a VM - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest - Explicitly include one-off assets in .gitignore. Past Sean was completely wrong about not being able to detect missing .gitignore entries - Verify userspace single-stepping works when KVM happens to handle a VM-Exit in its fastpath - Misc cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) Documentation: KVM: fix warning in "make htmldocs" s390: Enable KVM_S390_UCONTROL config in debug_defconfig selftests: kvm: s390: Add VM run test case KVM: SVM: let alternatives handle the cases when RSB filling is required KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range() KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps() KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range() KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure() KVM: x86: Update retry protection fields when forcing retry on emulation failure KVM: x86: Apply retry protection to "unprotect on failure" path KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn ...
| * \ Merge branch 'kvm-redo-enable-virt' into HEADPaolo Bonzini2024-09-171-3/+3
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed. The primary motivation for this series is to simplify dealing with enabling virtualization for Intel's TDX, which needs to enable virtualization when kvm-intel.ko is loaded, i.e. long before the first VM is created. That said, this is a nice cleanup on its own. By registering the callbacks on-demand, the callbacks themselves don't need to check kvm_usage_count, because their very existence implies a non-zero count. Patch 1 (re)adds a dedicated lock for kvm_usage_count. This avoids a lock ordering issue between cpus_read_lock() and kvm_lock. The lock ordering issue still exist in very rare cases, and will be fixed for good by switching vm_list to an (S)RCU-protected list. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| | * | KVM: Rename arch hooks related to per-CPU virtualization enablingSean Christopherson2024-09-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the per-CPU hooks used to enable virtualization in hardware to align with the KVM-wide helpers in kvm_main.c, and to better capture that the callbacks are invoked on every online CPU. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20240830043600.127750-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | Merge tag 'asm-generic-6.12' of ↵Linus Torvalds2024-09-261-2/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are only two small patches, one cleanup for arch/alpha and a preparation patch cleaning up the handling of runtime constants in the linker scripts" * tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: runtime constants: move list of constants to vmlinux.lds.h alpha: no need to include asm/xchg.h twice
| * | | | runtime constants: move list of constants to vmlinux.lds.hJann Horn2024-08-191-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor the list of constant variables into a macro. This should make it easier to add more constants in the future. Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* | | | | Merge tag 'char-misc-6.12-rc1' of ↵Linus Torvalds2024-09-261-1/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here is the "big" set of char/misc and other driver subsystem changes for 6.12-rc1. Lots of changes in here, primarily dominated by the usual IIO driver updates and additions, but there are also small driver subsystem updates all over the place. Included in here are: - lots and lots of new IIO drivers and updates to existing ones - interconnect subsystem updates and new drivers - nvmem subsystem updates and new drivers - mhi driver updates - power supply subsystem updates - kobj_type const work for many different small subsystems - comedi driver fix - coresight subsystem and driver updates - fpga subsystem improvements - slimbus fixups - binder new feature addition for "frozen" notifications - lots and lots of other small driver updates and cleanups All of these have been in linux-next for a long time with no reported problems" * tag 'char-misc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (354 commits) greybus: gb-beagleplay: Add firmware upload API arm64: dts: ti: k3-am625-beagleplay: Add bootloader-backdoor-gpios to cc1352p7 dt-bindings: net: ti,cc1352p7: Add bootloader-backdoor-gpios MAINTAINERS: Update path for U-Boot environment variables YAML nvmem: layouts: add U-Boot env layout comedi: ni_routing: tools: Check when the file could not be opened ocxl: Remove the unused declarations in headr file hpet: Fix the wrong format specifier uio: Constify struct kobj_type cxl: Constify struct kobj_type binder: modify the comment for binder_proc_unlock iio: adc: axp20x_adc: add support for AXP717 ADC dt-bindings: iio: adc: Add AXP717 compatible iio: adc: axp20x_adc: Add adc_en1 and adc_en2 to axp_data w1: ds2482: Drop explicit initialization of struct i2c_device_id::driver_data to 0 tools: iio: rm .*.cmd when make clean iio: adc: standardize on formatting for id match tables iio: proximity: aw96103: Add support for aw96103/aw96105 proximity sensor bus: mhi: host: pci_generic: Enable EDL trigger for Foxconn modems bus: mhi: host: pci_generic: Update EDL firmware path for Foxconn modems ...
| * | | | | arm64: dts: ti: k3-am625-beagleplay: Add bootloader-backdoor-gpios to cc1352p7Ayush Singh2024-09-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add bootloader-backdoor-gpios which is required for enabling bootloader backdoor for flashing firmware to cc1352p7. Also fix the incorrect reset-gpio. Signed-off-by: Ayush Singh <ayush@beagleboard.org> Reviewed-by: Dhruva Gole <d-gole@ti.com> Link: https://lore.kernel.org/r/20240903-beagleplay_fw_upgrade-v4-2-526fc62204a7@beagleboard.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | | | Merge tag 'tty-6.12-rc1' of ↵Linus Torvalds2024-09-261-0/+33
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver updates from Greg KH: "Here is the "big" set of tty/serial driver updates for 6.12-rc1. Nothing major in here, just nice forward progress in the slow cleanup of the serial apis, and lots of other driver updates and fixes. Included in here are: - serial api updates from Jiri to make things more uniform and sane - 8250_platform driver cleanups - samsung serial driver fixes and updates - qcom-geni serial driver fixes from Johan for the bizarre UART engine that that chip seems to have. Hopefully it's in a better state now, but hardware designers still seem to come up with more ways to make broken UARTS 40+ years after this all should have finished. - sc16is7xx driver updates - omap 8250 driver updates - 8250_bcm2835aux driver updates - a few new serial driver bindings added - other serial minor driver updates All of these have been in linux-next for a long time with no reported problems" * tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits) tty: serial: samsung: Fix serial rx on Apple A7-A9 tty: serial: samsung: Fix A7-A11 serial earlycon SError tty: serial: samsung: Use bit manipulation macros for APPLE_S5L_* tty: rp2: Fix reset with non forgiving PCIe host bridges serial: 8250_aspeed_vuart: Enable module autoloading serial: qcom-geni: fix polled console corruption serial: qcom-geni: disable interrupts during console writes serial: qcom-geni: fix console corruption serial: qcom-geni: introduce qcom_geni_serial_poll_bitfield() serial: qcom-geni: fix arg types for qcom_geni_serial_poll_bit() soc: qcom: geni-se: add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers serial: qcom-geni: fix false console tx restart serial: qcom-geni: fix fifo polling timeout tty: hvc: convert comma to semicolon mxser: convert comma to semicolon serial: 8250_bcm2835aux: Fix clock imbalance in PM resume serial: sc16is7xx: convert bitmask definitions to use BIT() macro serial: sc16is7xx: fix copy-paste errors in EFR_SWFLOWx_BIT constants serial: sc16is7xx: remove SC16IS7XX_MSR_DELTA_MASK serial: xilinx_uartps: Make cdns_rs485_supported static ...
| * \ \ \ \ \ Merge 6.11-rc4 into tty-nextGreg Kroah-Hartman2024-08-1920-38/+43
| |\ \ \ \ \ \ | | | |/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | | | Merge 6.11-rc3 into tty-nextGreg Kroah-Hartman2024-08-1213-59/+57
| |\ \ \ \ \ \ | | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | We need the tty/serial fixes in here to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | | | arm64: dts: mediatek: mt7981: add UART controllersRafał Miłecki2024-07-311-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MT7981 has three on-SoC UART controllers. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240727121447.1016-2-zajec5@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | | | | Merge tag 'rust-6.12' of https://github.com/Rust-for-Linux/linuxLinus Torvalds2024-09-252-1/+16
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up objtool warnings), teach objtool about 'noreturn' Rust symbols and mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we should be objtool-warning-free, so enable it to run for all Rust object files. - KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support. - Support 'RUSTC_VERSION', including re-config and re-build on change. - Split helpers file into several files in a folder, to avoid conflicts in it. Eventually those files will be moved to the right places with the new build system. In addition, remove the need to manually export the symbols defined there, reusing existing machinery for that. - Relax restriction on configurations with Rust + GCC plugins to just the RANDSTRUCT plugin. 'kernel' crate: - New 'list' module: doubly-linked linked list for use with reference counted values, which is heavily used by the upcoming Rust Binder. This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed unique for the given ID), 'AtomicTracker' (tracks whether a 'ListArc' exists using an atomic), 'ListLinks' (the prev/next pointers for an item in a linked list), 'List' (the linked list itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor into a 'List' that allows to remove elements), 'ListArcField' (a field exclusively owned by a 'ListArc'), as well as support for heterogeneous lists. - New 'rbtree' module: red-black tree abstractions used by the upcoming Rust Binder. This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a node), 'RBTreeNodeReservation' (a memory reservation for a node), 'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor' (bidirectional cursor that allows to remove elements), as well as an entry API similar to the Rust standard library one. - 'init' module: add 'write_[pin_]init' methods and the 'InPlaceWrite' trait. Add the 'assert_pinned!' macro. - 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by introducing an associated type in the trait. - 'alloc' module: add 'drop_contents' method to 'BoxExt'. - 'types' module: implement the 'ForeignOwnable' trait for 'Pin<Box<T>>' and improve the trait's documentation. In addition, add the 'into_raw' method to the 'ARef' type. - 'error' module: in preparation for the upcoming Rust support for 32-bit architectures, like arm, locally allow Clippy lint for those. Documentation: - https://rust.docs.kernel.org has been announced, so link to it. - Enable rustdoc's "jump to definition" feature, making its output a bit closer to the experience in a cross-referencer. - Debian Testing now also provides recent Rust releases (outside of the freeze period), so add it to the list. MAINTAINERS: - Trevor is joining as reviewer of the "RUST" entry. And a few other small bits" * tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits) kasan: rust: Add KASAN smoke test via UAF kbuild: rust: Enable KASAN support rust: kasan: Rust does not support KHWASAN kbuild: rust: Define probing macros for rustc kasan: simplify and clarify Makefile rust: cfi: add support for CFI_CLANG with Rust cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS rust: support for shadow call stack sanitizer docs: rust: include other expressions in conditional compilation section kbuild: rust: replace proc macros dependency on `core.o` with the version text kbuild: rust: rebuild if the version text changes kbuild: rust: re-run Kconfig if the version text changes kbuild: rust: add `CONFIG_RUSTC_VERSION` rust: avoid `box_uninit_write` feature MAINTAINERS: add Trevor Gross as Rust reviewer rust: rbtree: add `RBTree::entry` rust: rbtree: add cursor rust: rbtree: add mutable iterator rust: rbtree: add iterator rust: rbtree: add red-black tree implementation backed by the C version ...
| * | | | | | | rust: support for shadow call stack sanitizerAlice Ryhl2024-09-132-1/+16
| | |_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add all of the flags that are needed to support the shadow call stack (SCS) sanitizer with Rust, and updates Kconfig to allow only configurations that work. The -Zfixed-x18 flag is required to use SCS on arm64, and requires rustc version 1.80.0 or greater. This restriction is reflected in Kconfig. When CONFIG_DYNAMIC_SCS is enabled, the build will be configured to include unwind tables in the build artifacts. Dynamic SCS uses the unwind tables at boot to find all places that need to be patched. The -Cforce-unwind-tables=y flag ensures that unwind tables are available for Rust code. In non-dynamic mode, the -Zsanitizer=shadow-call-stack flag is what enables the SCS sanitizer. Using this flag requires rustc version 1.82.0 or greater on the targets used by Rust in the kernel. This restriction is reflected in Kconfig. It is possible to avoid the requirement of rustc 1.80.0 by using -Ctarget-feature=+reserve-x18 instead of -Zfixed-x18. However, this flag emits a warning during the build, so this patch does not add support for using it and instead requires 1.80.0 or greater. The dependency is placed on `select HAVE_RUST` to avoid a situation where enabling Rust silently turns off the sanitizer. Instead, turning on the sanitizer results in Rust being disabled. We generally do not want changes to CONFIG_RUST to result in any mitigations being changed or turned off. At the time of writing, rustc 1.82.0 only exists via the nightly release channel. There is a chance that the -Zsanitizer=shadow-call-stack flag will end up needing 1.83.0 instead, but I think it is small. Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240829-shadow-call-stack-v7-1-2f62a4432abf@google.com [ Fixed indentation using spaces. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
* | | | | | | Merge tag 'i2c-for-6.12-rc1' of ↵Linus Torvalds2024-09-231-0/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "I2C core: - finally remove the I2C_COMPAT symbol after 15 years of deprecation - lock client addresses during initialization to prevent race conditions between different kinds of instantiation - use scoped foreach OF child loops - testunit cleanups and documentation improvements, as well as two new tests, one for repeated start and one for triggering SMBusAlert interrupts I2C host drivers: - DesignWare and Renesas I2C driver updates. The first has has undergone through a series of cleanups that have been sent to the mailing list a year ago for the first time and finally get merged in this pull request. They are many, from typos (e.g. i2/i2c), to cosmetics, to refactoring (e.g. move inline functions to librarieas) and many others. - all the DesignWare Kconfig options have been grouped under the I2C_DESIGNWARE_CORE and this required some adaptation in many of the kernel configuration files for different arm and mips boards Cleanups: - improve the exit path in the runtime resume function for the Qualcomm Geni platform - get rid of the unused "target_addr" parameter in the Intel LJCA driver - intialize the restart_flag in the MediaTek controller in one single place - constify a few global data structures in the virtio driver - simplify the bus speed handling in the Renesas driver init function making it more readable - improved probe function of the Renesas R-Car driver - switch the iMX/MXC driver to use RUNTIME_PM_OPS() instead of SET_RUNTIME_PM_OPS() - iMX/MXC driver cleanups - use devm_clk_get_enabled() to simplify the Renesas EMEV2, Ingenic and MPC drivers Refactoring: - Fix a potential out of boundary array access in the Nuvoton driver. This is not a bug fix because the issue could never occur due to hardware not having the properties listed in the array. The change makes the driver more future proof and, at the same time, silences code analyzers. Improvements: - several patches improving the runtime power management handling of the Renesas I2C (riic) driver - use a more descriptive adapter name in the Intel i801 driver to show the presence of the IDF feature - kill pending transactions when irq's can't complete their handling in the Intel Denverton (ismt) driver, triggering a timeout New Feature: - support fast mode plus in the Renesas I2C (riic) driver New support: - Added support for: - Renesas R9A08G045 - Rockchip RK3576 - KEBA I2C - Theobroma Systems Mule Multiplexer. - new i2c-keba.c driver - new driver for The Mule i2c multiplexer Core I2C framework: - move runtime PM functions in order to allow them to be accessed during device add Devicetree: - nVidia and Qualcomm binding improvements - get rid of redundant "multi-master" property in the aspeed binding - convert i2c-sprd binding to YAML AT24 updates: - document a new model from giantec in DT bindings" * tag 'i2c-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (69 commits) i2c: designware: Use pci_get_drvdata() i2c: designware: Propagate firmware node i2c: designware: Uninline i2c_dw_probe() i2c: ljca: Remove unused "target_addr" parameter i2c: keba: Add KEBA I2C controller support i2c: i801: Use a different adapter-name for IDF adapters i2c: core: Setup i2c_adapter runtime-pm before calling device_add() dt-bindings: i2c: i2c-sprd: convert to YAML i2c: ismt: kill transaction in hardware on timeout i2c: designware: Group all DesignWare drivers under a single option net: txgbe: Fix I2C Kconfig dependencies RISC-V: configs: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM mips: configs: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM arm64: defconfig: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM ARM: configs: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM ARC: configs: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM i2c: virtio: Constify struct i2c_algorithm and struct virtio_device_id i2c: rcar: tidyup priv->devtype handling on rcar_i2c_probe() i2c: imx: Convert comma to semicolon i2c: jz4780: Use devm_clk_get_enabled() helpers ...
| * | | | | | | arm64: defconfig: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORMHeikki Krogerus2024-09-101-0/+1
| | |_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dependency handling of the Synopsys DesignWare I2C adapter drivers is going to be changed so that the glue drivers for the PCI and platform buses depend on I2C_DESIGNWARE_CORE. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
* | | | | | | Merge tag 'pci-v6.12-changes' of ↵Linus Torvalds2024-09-231-0/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Wait for device readiness after reset by polling Vendor ID and looking for Configuration RRS instead of polling the Command register and looking for non-error completions, to avoid hardware retries done for RRS on non-Vendor ID reads (Bjorn Helgaas) - Rename CRS Completion Status to RRS ('Request Retry Status') to match PCIe r6.0 spec usage (Bjorn Helgaas) - Clear LBMS bit after a manual link retrain so we don't try to retrain a link when there's no downstream device anymore (Maciej W. Rozycki) - Revert to the original link speed after retraining fails instead of leaving it restricted to 2.5GT/s, so a future device has a chance to use higher speeds (Maciej W. Rozycki) - Wait for each level of downstream bus, not just the first, to become accessible before restoring devices on that bus (Ilpo Järvinen) - Add ARCH_PCI_DEV_GROUPS so s390 can add its own attribute_groups without having to stomp on the core's pdev->dev.groups (Lukas Wunner) Driver binding: - Export pcim_request_region(), a managed counterpart of pci_request_region(), for use by drivers (Philipp Stanner) - Export pcim_iomap_region() and deprecate pcim_iomap_regions() (Philipp Stanner) - Request the PCI BAR used by xboxvideo (Philipp Stanner) - Request and map drm/ast BARs with pcim_iomap_region() (Philipp Stanner) MSI: - Add MSI_FLAG_NO_AFFINITY flag for devices that mux MSIs onto a single IRQ line and cannot set the affinity of each MSI to a specific CPU core (Marek Vasut) - Use MSI_FLAG_NO_AFFINITY and remove unnecessary .irq_set_affinity() implementations in aardvark, altera, brcmstb, dwc, mediatek-gen3, mediatek, mobiveil, plda, rcar, tegra, vmd, xilinx-nwl, xilinx-xdma, and xilinx drivers to avoid 'IRQ: set affinity failed' warnings (Marek Vasut) Power management: - Add pwrctl support for ATH11K inside the WCN6855 package (Konrad Dybcio) PCI device hotplug: - Remove unnecessary hpc_ops struct from shpchp (ngn) - Check for PCI_POSSIBLE_ERROR(), not 0xffffffff, in cpqphp (weiyufeng) Virtualization: - Mark Creative Labs EMU20k2 INTx masking as broken (Alex Williamson) - Add an ACS quirk for Qualcomm SA8775P, which doesn't advertise ACS but does provide ACS-like features (Subramanian Ananthanarayanan) IOMMU: - Add function 0 DMA alias quirk for Glenfly Arise audio function, which uses the function 0 Requester ID (WangYuli) NPEM: - Add Native PCIe Enclosure Management (NPEM) support for sysfs control of NVMe RAID storage indicators (ok/fail/locate/ rebuild/etc) (Mariusz Tkaczyk) - Add support for the ACPI _DSM PCIe SSD status LED management, which is functionally similar to NPEM but mediated by platform firmware (Mariusz Tkaczyk) Device trees: - Drop minItems and maxItems from ranges in PCI generic host binding since host bridges may have several MMIO and I/O port apertures (Frank Li) - Add kirin, rcar-gen2, uniphier DT binding top-level constraints for clocks (Krzysztof Kozlowski) Altera PCIe controller driver: - Convert altera DT bindings from text to YAML (Matthew Gerlach) - Replace TLP_REQ_ID() with macro PCI_DEVID(), which does the same thing and is what other drivers use (Jinjie Ruan) Broadcom STB PCIe controller driver: - Add DT binding maxItems for reset controllers (Jim Quinlan) - Use the 'bridge' reset method if described in the DT (Jim Quinlan) - Use the 'swinit' reset method if described in the DT (Jim Quinlan) - Add 'has_phy' so the existence of a 'rescal' reset controller doesn't imply software control of it (Jim Quinlan) - Add support for many inbound DMA windows (Jim Quinlan) - Rename SoC 'type' to 'soc_base' express the fact that SoCs come in families of multiple similar devices (Jim Quinlan) - Add Broadcom 7712 DT description and driver support (Jim Quinlan) - Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings for maintainability (Bjorn Helgaas) Freescale i.MX6 PCIe controller driver: - Add imx6q-pcie 'dbi2' and 'atu' reg-names for i.MX8M Endpoints (Richard Zhu) - Fix a code restructuring error that caused i.MX8MM and i.MX8MP Endpoints to fail to establish link (Richard Zhu) - Fix i.MX8MP Endpoint occasional failure to trigger MSI by enforcing outbound alignment requirement (Richard Zhu) - Call phy_power_off() in the .probe() error path (Frank Li) - Rename internal names from imx6_* to imx_* since i.MX7/8/9 are also supported (Frank Li) - Manage Refclk by using SoC-specific callbacks instead of switch statements (Frank Li) - Manage core reset by using SoC-specific callbacks instead of switch statements (Frank Li) - Expand comments for erratum ERR010728 workaround (Frank Li) - Use generic PHY APIs to configure mode, speed, and submode, which is harmless for devices that implement their own internal PHY management and don't set the generic imx_pcie->phy (Frank Li) - Add i.MX8Q (i.MX8QM, i.MX8QXP, and i.MX8DXL) DT binding and driver Root Complex support (Richard Zhu) Freescale Layerscape PCIe controller driver: - Replace layerscape-pcie DT binding compatible fsl,lx2160a-pcie with fsl,lx2160ar2-pcie (Frank Li) - Add layerscape-pcie DT binding deprecated 'num-viewport' property to address a DT checker warning (Frank Li) - Change layerscape-pcie DT binding 'fsl,pcie-scfg' to phandle-array (Frank Li) Loongson PCIe controller driver: - Increase max PCI hosts to 8 for Loongson-3C6000 and newer chipsets (Huacai Chen) Marvell Aardvark PCIe controller driver: - Fix issue with emulating Configuration RRS for two-byte reads of Vendor ID; previously it only worked for four-byte reads (Bjorn Helgaas) MediaTek PCIe Gen3 controller driver: - Add per-SoC struct mtk_gen3_pcie_pdata to support multiple SoC types (Lorenzo Bianconi) - Use reset_bulk APIs to manage PHY reset lines (Lorenzo Bianconi) - Add DT and driver support for Airoha EN7581 PCIe controller (Lorenzo Bianconi) Qualcomm PCIe controller driver: - Update qcom,pcie-sc7280 DT binding with eight interrupts (Rayyan Ansari) - Add back DT 'vddpe-3v3-supply', which was incorrectly removed earlier (Johan Hovold) - Drop endpoint redundant masking of global IRQ events (Manivannan Sadhasivam) - Clarify unknown global IRQ message and only log it once to avoid a flood (Manivannan Sadhasivam) - Add 'linux,pci-domain' property to endpoint DT binding (Manivannan Sadhasivam) - Assign PCI domain number for endpoint controllers (Manivannan Sadhasivam) - Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for endpoint controller (Manivannan Sadhasivam) - Add global SPI interrupt for PCIe link events to DT binding (Manivannan Sadhasivam) - Add global RC interrupt handler to handle 'Link up' events and automatically enumerate hot-added devices (Manivannan Sadhasivam) - Avoid mirroring of DBI and iATU register space so it doesn't overlap BAR MMIO space (Prudhvi Yarlagadda) - Enable controller resources like PHY only after PERST# is deasserted to partially avoid the problem that the endpoint SoC crashes when accessing things when Refclk is absent (Manivannan Sadhasivam) - Add 16.0 GT/s equalization and RX lane margining settings (Shashank Babu Chinta Venkata) - Pass domain number to pci_bus_release_domain_nr() explicitly to avoid a NULL pointer dereference (Manivannan Sadhasivam) Renesas R-Car PCIe controller driver: - Make the read-only const array 'check_addr' static (Colin Ian King) - Add R-Car V4M (R8A779H0) PCIe host and endpoint to DT binding (Yoshihiro Shimoda) TI DRA7xx PCIe controller driver: - Request IRQF_ONESHOT for 'dra7xx-pcie-main' IRQ since the primary handler is NULL (Siddharth Vadapalli) - Handle IRQ request errors during root port and endpoint probe (Siddharth Vadapalli) TI J721E PCIe driver: - Add DT 'ti,syscon-acspcie-proxy-ctrl' and driver support to enable the ACSPCIE module to drive Refclk for the Endpoint (Siddharth Vadapalli) - Extract the cadence link setup from cdns_pcie_host_setup() so link setup can be done separately during resume (Thomas Richard) - Add T_PERST_CLK_US definition for the mandatory delay between Refclk becoming stable and PERST# being deasserted (Thomas Richard) - Add j721e suspend and resume support (Théo Lebrun) TI Keystone PCIe controller driver: - Fix NULL pointer checking when applying MRRS limitation quirk for AM65x SR 1.0 Errata #i2037 (Dan Carpenter) Xilinx NWL PCIe controller driver: - Fix off-by-one error in INTx IRQ handler that caused INTx interrupts to be lost or delivered as the wrong interrupt (Sean Anderson) - Rate-limit misc interrupt messages (Sean Anderson) - Turn off the clock on probe failure and device removal (Sean Anderson) - Add DT binding and driver support for enabling/disabling PHYs (Sean Anderson) - Add PCIe phy bindings for the ZCU102 (Sean Anderson) Xilinx XDMA PCIe controller driver: - Add support for Xilinx QDMA Soft IP PCIe Root Port Bridge to DT binding and xilinx-dma-pl driver (Thippeswamy Havalige) Miscellaneous: - Fix buffer overflow in kirin_pcie_parse_port() (Alexandra Diupina) - Fix minor kerneldoc issues and typos (Bjorn Helgaas) - Use PCI_DEVID() macro in aer_inject() instead of open-coding it (Jinjie Ruan) - Check pcie_find_root_port() return in x86 fixups to avoid NULL pointer dereferences (Samasth Norway Ananda) - Make pci_bus_type constant (Kunwu Chan) - Remove unused declarations of __pci_pme_wakeup() and pci_vpd_release() (Yue Haibing) - Remove any leftover .*.cmd files with make clean (zhang jiao) - Remove unused BILLION macro (zhang jiao)" * tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (132 commits) PCI: Fix typos dt-bindings: PCI: qcom: Allow 'vddpe-3v3-supply' again tools: PCI: Remove unused BILLION macro tools: PCI: Remove .*.cmd files with make clean PCI: Pass domain number to pci_bus_release_domain_nr() explicitly PCI: dra7xx: Fix error handling when IRQ request fails in probe PCI: dra7xx: Fix threaded IRQ request for "dra7xx-pcie-main" IRQ PCI: qcom: Add RX lane margining settings for 16.0 GT/s PCI: qcom: Add equalization settings for 16.0 GT/s PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed' PCI: qcom-ep: Enable controller resources like PHY only after refclk is available PCI: Mark Creative Labs EMU20k2 INTx masking as broken dt-bindings: PCI: imx6q-pcie: Add reg-name "dbi2" and "atu" for i.MX8M PCIe Endpoint dt-bindings: PCI: altera: msi: Convert to YAML PCI: imx6: Add i.MX8Q PCIe Root Complex (RC) support PCI: Rename CRS Completion Status to RRS PCI: aardvark: Correct Configuration RRS checking PCI: Wait for device readiness with Configuration RRS PCI: brcmstb: Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings ...
| * | | | | | | arm64: zynqmp: Add PCIe phys property for ZCU102Sean Anderson2024-08-301-0/+1
| | |_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add PCIe phy bindings for the ZCU102. Link: https://lore.kernel.org/r/20240531161337.864994-8-sean.anderson@linux.dev Tested-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Michal Simek <michal.simek@amd.com>
* | | | | | | Merge tag 'bpf-next-6.12' of ↵Linus Torvalds2024-09-211-217/+291
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Introduce '__attribute__((bpf_fastcall))' for helpers and kfuncs with corresponding support in LLVM. It is similar to existing 'no_caller_saved_registers' attribute in GCC/LLVM with a provision for backward compatibility. It allows compilers generate more efficient BPF code assuming the verifier or JITs will inline or partially inline a helper/kfunc with such attribute. bpf_cast_to_kern_ctx, bpf_rdonly_cast, bpf_get_smp_processor_id are the first set of such helpers. - Harden and extend ELF build ID parsing logic. When called from sleepable context the relevants parts of ELF file will be read to find and fetch .note.gnu.build-id information. Also harden the logic to avoid TOCTOU, overflow, out-of-bounds problems. - Improvements and fixes for sched-ext: - Allow passing BPF iterators as kfunc arguments - Make the pointer returned from iter_next method trusted - Fix x86 JIT convergence issue due to growing/shrinking conditional jumps in variable length encoding - BPF_LSM related: - Introduce few VFS kfuncs and consolidate them in fs/bpf_fs_kfuncs.c - Enforce correct range of return values from certain LSM hooks - Disallow attaching to other LSM hooks - Prerequisite work for upcoming Qdisc in BPF: - Allow kptrs in program provided structs - Support for gen_epilogue in verifier_ops - Important fixes: - Fix uprobe multi pid filter check - Fix bpf_strtol and bpf_strtoul helpers - Track equal scalars history on per-instruction level - Fix tailcall hierarchy on x86 and arm64 - Fix signed division overflow to prevent INT_MIN/-1 trap on x86 - Fix get kernel stack in BPF progs attached to tracepoint:syscall - Selftests: - Add uprobe bench/stress tool - Generate file dependencies to drastically improve re-build time - Match JIT-ed and BPF asm with __xlated/__jited keywords - Convert older tests to test_progs framework - Add support for RISC-V - Few fixes when BPF programs are compiled with GCC-BPF backend (support for GCC-BPF in BPF CI is ongoing in parallel) - Add traffic monitor - Enable cross compile and musl libc * tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (260 commits) btf: require pahole 1.21+ for DEBUG_INFO_BTF with default DWARF version btf: move pahole check in scripts/link-vmlinux.sh to lib/Kconfig.debug btf: remove redundant CONFIG_BPF test in scripts/link-vmlinux.sh bpf: Call the missed kfree() when there is no special field in btf bpf: Call the missed btf_record_free() when map creation fails selftests/bpf: Add a test case to write mtu result into .rodata selftests/bpf: Add a test case to write strtol result into .rodata selftests/bpf: Rename ARG_PTR_TO_LONG test description selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error bpf: Improve check_raw_mode_ok test for MEM_UNINIT-tagged types bpf: Fix helper writes to read-only maps bpf: Remove truncation test in bpf_strtol and bpf_strtoul helpers bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit selftests/bpf: Add tests for sdiv/smod overflow cases bpf: Fix a sdiv overflow issue libbpf: Add bpf_object__token_fd accessor docs/bpf: Add missing BPF program types to docs docs/bpf: Add constant values for linkages bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing ...
| * | | | | | | bpf, arm64: Jit BPF_CALL to direct call when possibleXu Kuohai2024-09-041-16/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, BPF_CALL is always jited to indirect call. When target is within the range of direct call, BPF_CALL can be jited to direct call. For example, the following BPF_CALL call __htab_map_lookup_elem is always jited to indirect call: mov x10, #0xffffffffffff18f4 movk x10, #0x821, lsl #16 movk x10, #0x8000, lsl #32 blr x10 When the address of target __htab_map_lookup_elem is within the range of direct call, the BPF_CALL can be jited to: bl 0xfffffffffd33bc98 This patch does such jit optimization by emitting arm64 direct calls for BPF_CALL when possible, indirect calls otherwise. Without this patch, the jit works as follows. 1. First pass A. Determine jited position and size for each bpf instruction. B. Computed the jited image size. 2. Allocate jited image with size computed in step 1. 3. Second pass A. Adjust jump offset for jump instructions B. Write the final image. This works because, for a given bpf prog, regardless of where the jited image is allocated, the jited result for each instruction is fixed. The second pass differs from the first only in adjusting the jump offsets, like changing "jmp imm1" to "jmp imm2", while the position and size of the "jmp" instruction remain unchanged. Now considering whether to jit BPF_CALL to arm64 direct or indirect call instruction. The choice depends solely on the jump offset: direct call if the jump offset is within 128MB, indirect call otherwise. For a given BPF_CALL, the target address is known, so the jump offset is decided by the jited address of the BPF_CALL instruction. In other words, for a given bpf prog, the jited result for each BPF_CALL is determined by its jited address. The jited address for a BPF_CALL is the jited image address plus the total jited size of all preceding instructions. For a given bpf prog, there are clearly no BPF_CALL instructions before the first BPF_CALL instruction. Since the jited result for all other instructions other than BPF_CALL are fixed, the total jited size preceding the first BPF_CALL is also fixed. Therefore, once the jited image is allocated, the jited address for the first BPF_CALL is fixed. Now that the jited result for the first BPF_CALL is fixed, the jited results for all instructions preceding the second BPF_CALL are fixed. So the jited address and result for the second BPF_CALL are also fixed. Similarly, we can conclude that the jited addresses and results for all subsequent BPF_CALL instructions are fixed. This means that, for a given bpf prog, once the jited image is allocated, the jited address and result for all instructions, including all BPF_CALL instructions, are fixed. Based on the observation, with this patch, the jit works as follows. 1. First pass Estimate the maximum jited image size. In this pass, all BPF_CALLs are jited to arm64 indirect calls since the jump offsets are unknown because the jited image is not allocated. 2. Allocate jited image with size estimated in step 1. 3. Second pass A. Determine the jited result for each BPF_CALL. B. Determine jited address and size for each bpf instruction. 4. Third pass A. Adjust jump offset for jump instructions. B. Write the final image. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20240903094407.601107-1-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | | | | bpf, arm64: Avoid blindly saving/restoring all callee-saved registersXu Kuohai2024-08-281-111/+183
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arm64 jit blindly saves/restores all callee-saved registers, making the jited result looks a bit too compliated. For example, for an empty prog, the jited result is: 0: bti jc 4: mov x9, lr 8: nop c: paciasp 10: stp fp, lr, [sp, #-16]! 14: mov fp, sp 18: stp x19, x20, [sp, #-16]! 1c: stp x21, x22, [sp, #-16]! 20: stp x26, x25, [sp, #-16]! 24: mov x26, #0 28: stp x26, x25, [sp, #-16]! 2c: mov x26, sp 30: stp x27, x28, [sp, #-16]! 34: mov x25, sp 38: bti j // tailcall target 3c: sub sp, sp, #0 40: mov x7, #0 44: add sp, sp, #0 48: ldp x27, x28, [sp], #16 4c: ldp x26, x25, [sp], #16 50: ldp x26, x25, [sp], #16 54: ldp x21, x22, [sp], #16 58: ldp x19, x20, [sp], #16 5c: ldp fp, lr, [sp], #16 60: mov x0, x7 64: autiasp 68: ret Clearly, there is no need to save/restore unused callee-saved registers. This patch does this change, making the jited image to only save/restore the callee-saved registers it uses. Now the jited result of empty prog is: 0: bti jc 4: mov x9, lr 8: nop c: paciasp 10: stp fp, lr, [sp, #-16]! 14: mov fp, sp 18: stp xzr, x26, [sp, #-16]! 1c: mov x26, sp 20: bti j // tailcall target 24: mov x7, #0 28: ldp xzr, x26, [sp], #16 2c: ldp fp, lr, [sp], #16 30: mov x0, x7 34: autiasp 38: ret Since bpf prog saves/restores its own callee-saved registers as needed, to make tailcall work correctly, the caller needs to restore its saved registers before tailcall, and the callee needs to save its callee-saved registers after tailcall. This extra restoring/saving instructions increases preformance overhead. [1] provides 2 benchmarks for tailcall scenarios. Below is the perf number measured in an arm64 KVM guest. The result indicates that the performance difference before and after the patch in typical tailcall scenarios is negligible. - Before: Performance counter stats for './test_progs -t tailcalls' (5 runs): 4313.43 msec task-clock # 0.874 CPUs utilized ( +- 0.16% ) 574 context-switches # 133.073 /sec ( +- 1.14% ) 0 cpu-migrations # 0.000 /sec 538 page-faults # 124.727 /sec ( +- 0.57% ) 10697772784 cycles # 2.480 GHz ( +- 0.22% ) (61.19%) 25511241955 instructions # 2.38 insn per cycle ( +- 0.08% ) (66.70%) 5108910557 branches # 1.184 G/sec ( +- 0.08% ) (72.38%) 2800459 branch-misses # 0.05% of all branches ( +- 0.51% ) (72.36%) TopDownL1 # 0.60 retiring ( +- 0.09% ) (66.84%) # 0.21 frontend_bound ( +- 0.15% ) (61.31%) # 0.12 bad_speculation ( +- 0.08% ) (50.11%) # 0.07 backend_bound ( +- 0.16% ) (33.30%) 8274201819 L1-dcache-loads # 1.918 G/sec ( +- 0.18% ) (33.15%) 468268 L1-dcache-load-misses # 0.01% of all L1-dcache accesses ( +- 4.69% ) (33.16%) 385383 LLC-loads # 89.345 K/sec ( +- 5.22% ) (33.16%) 38296 LLC-load-misses # 9.94% of all LL-cache accesses ( +- 42.52% ) (38.69%) 6886576501 L1-icache-loads # 1.597 G/sec ( +- 0.35% ) (38.69%) 1848585 L1-icache-load-misses # 0.03% of all L1-icache accesses ( +- 4.52% ) (44.23%) 9043645883 dTLB-loads # 2.097 G/sec ( +- 0.10% ) (44.33%) 416672 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 5.15% ) (49.89%) 6925626111 iTLB-loads # 1.606 G/sec ( +- 0.35% ) (55.46%) 66220 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 1.88% ) (55.50%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 4.9372 +- 0.0526 seconds time elapsed ( +- 1.07% ) Performance counter stats for './test_progs -t flow_dissector' (5 runs): 10924.50 msec task-clock # 0.945 CPUs utilized ( +- 0.08% ) 603 context-switches # 55.197 /sec ( +- 1.13% ) 0 cpu-migrations # 0.000 /sec 566 page-faults # 51.810 /sec ( +- 0.42% ) 27381270695 cycles # 2.506 GHz ( +- 0.18% ) (60.46%) 56996583922 instructions # 2.08 insn per cycle ( +- 0.21% ) (66.11%) 10321647567 branches # 944.816 M/sec ( +- 0.17% ) (71.79%) 3347735 branch-misses # 0.03% of all branches ( +- 3.72% ) (72.15%) TopDownL1 # 0.52 retiring ( +- 0.13% ) (66.74%) # 0.27 frontend_bound ( +- 0.14% ) (61.27%) # 0.14 bad_speculation ( +- 0.19% ) (50.36%) # 0.07 backend_bound ( +- 0.42% ) (33.89%) 18740797617 L1-dcache-loads # 1.715 G/sec ( +- 0.43% ) (33.71%) 13715669 L1-dcache-load-misses # 0.07% of all L1-dcache accesses ( +- 32.85% ) (33.34%) 4087551 LLC-loads # 374.164 K/sec ( +- 29.53% ) (33.26%) 267906 LLC-load-misses # 6.55% of all LL-cache accesses ( +- 23.90% ) (38.76%) 15811864229 L1-icache-loads # 1.447 G/sec ( +- 0.12% ) (38.73%) 2976833 L1-icache-load-misses # 0.02% of all L1-icache accesses ( +- 9.73% ) (44.22%) 20138907471 dTLB-loads # 1.843 G/sec ( +- 0.18% ) (44.15%) 732850 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 11.18% ) (49.64%) 15895726702 iTLB-loads # 1.455 G/sec ( +- 0.15% ) (55.13%) 152075 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 4.71% ) (54.98%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 11.5613 +- 0.0317 seconds time elapsed ( +- 0.27% ) - After: Performance counter stats for './test_progs -t tailcalls' (5 runs): 4278.78 msec task-clock # 0.871 CPUs utilized ( +- 0.15% ) 569 context-switches # 132.982 /sec ( +- 0.58% ) 0 cpu-migrations # 0.000 /sec 539 page-faults # 125.970 /sec ( +- 0.43% ) 10588986432 cycles # 2.475 GHz ( +- 0.20% ) (60.91%) 25303825043 instructions # 2.39 insn per cycle ( +- 0.08% ) (66.48%) 5110756256 branches # 1.194 G/sec ( +- 0.07% ) (72.03%) 2719569 branch-misses # 0.05% of all branches ( +- 2.42% ) (72.03%) TopDownL1 # 0.60 retiring ( +- 0.22% ) (66.31%) # 0.22 frontend_bound ( +- 0.21% ) (60.83%) # 0.12 bad_speculation ( +- 0.26% ) (50.25%) # 0.06 backend_bound ( +- 0.17% ) (33.52%) 8163648527 L1-dcache-loads # 1.908 G/sec ( +- 0.33% ) (33.52%) 694979 L1-dcache-load-misses # 0.01% of all L1-dcache accesses ( +- 30.53% ) (33.52%) 1902347 LLC-loads # 444.600 K/sec ( +- 48.84% ) (33.69%) 96677 LLC-load-misses # 5.08% of all LL-cache accesses ( +- 43.48% ) (39.30%) 6863517589 L1-icache-loads # 1.604 G/sec ( +- 0.37% ) (39.17%) 1871519 L1-icache-load-misses # 0.03% of all L1-icache accesses ( +- 6.78% ) (44.56%) 8927782813 dTLB-loads # 2.087 G/sec ( +- 0.14% ) (44.37%) 438237 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 6.00% ) (49.75%) 6886906831 iTLB-loads # 1.610 G/sec ( +- 0.36% ) (55.08%) 67568 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 3.27% ) (54.86%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 4.9114 +- 0.0309 seconds time elapsed ( +- 0.63% ) Performance counter stats for './test_progs -t flow_dissector' (5 runs): 10948.40 msec task-clock # 0.942 CPUs utilized ( +- 0.05% ) 615 context-switches # 56.173 /sec ( +- 1.65% ) 1 cpu-migrations # 0.091 /sec ( +- 31.62% ) 567 page-faults # 51.788 /sec ( +- 0.44% ) 27334194328 cycles # 2.497 GHz ( +- 0.08% ) (61.05%) 56656528828 instructions # 2.07 insn per cycle ( +- 0.08% ) (66.67%) 10270389422 branches # 938.072 M/sec ( +- 0.10% ) (72.21%) 3453837 branch-misses # 0.03% of all branches ( +- 3.75% ) (72.27%) TopDownL1 # 0.52 retiring ( +- 0.16% ) (66.55%) # 0.27 frontend_bound ( +- 0.09% ) (60.91%) # 0.14 bad_speculation ( +- 0.08% ) (49.85%) # 0.07 backend_bound ( +- 0.16% ) (33.33%) 18982866028 L1-dcache-loads # 1.734 G/sec ( +- 0.24% ) (33.34%) 8802454 L1-dcache-load-misses # 0.05% of all L1-dcache accesses ( +- 52.30% ) (33.31%) 2612962 LLC-loads # 238.661 K/sec ( +- 29.78% ) (33.45%) 264107 LLC-load-misses # 10.11% of all LL-cache accesses ( +- 18.34% ) (39.07%) 15793205997 L1-icache-loads # 1.443 G/sec ( +- 0.15% ) (39.09%) 3930802 L1-icache-load-misses # 0.02% of all L1-icache accesses ( +- 3.72% ) (44.66%) 20097828496 dTLB-loads # 1.836 G/sec ( +- 0.09% ) (44.68%) 961757 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 3.32% ) (50.15%) 15838728506 iTLB-loads # 1.447 G/sec ( +- 0.09% ) (55.62%) 167652 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 1.28% ) (55.52%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 11.6173 +- 0.0268 seconds time elapsed ( +- 0.23% ) [1] https://lore.kernel.org/bpf/20200724123644.5096-1-maciej.fijalkowski@intel.com/ Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20240826071624.350108-3-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | | | | bpf, arm64: Get rid of fpbXu Kuohai2024-08-281-93/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf prog accesses stack using BPF_FP as the base address and a negative immediate number as offset. But arm64 ldr/str instructions only support non-negative immediate number as offset. To simplify the jited result, commit 5b3d19b9bd40 ("bpf, arm64: Adjust the offset of str/ldr(immediate) to positive number") introduced FPB to represent the lowest stack address that the bpf prog being jited may access, and with this address as the baseline, it converts BPF_FP plus negative immediate offset number to FPB plus non-negative immediate offset. Considering that for a given bpf prog, the jited stack space is fixed with A64_SP as the lowest address and BPF_FP as the highest address. Thus we can get rid of FPB and converts BPF_FP plus negative immediate offset to A64_SP plus non-negative immediate offset. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20240826071624.350108-2-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfAlexei Starovoitov2024-08-2238-108/+149
| |\ \ \ \ \ \ \ | | | |_|/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge bpf fixes after downstream PR including important fixes (from bpf-next point of view): commit 41c24102af7b ("selftests/bpf: Filter out _GNU_SOURCE when compiling test_cpp") commit fdad456cbcca ("bpf: Fix updating attached freplace prog in prog_array map") No conflicts. Adjacent changes in: include/linux/bpf_verifier.h kernel/bpf/verifier.c tools/testing/selftests/bpf/Makefile Link: https://lore.kernel.org/bpf/20240813234307.82773-1-alexei.starovoitov@gmail.com/ Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | | | | bpf, arm64: Fix tailcall hierarchyLeon Hwang2024-07-291-16/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a tailcall issue caused by abusing the tailcall in bpf2bpf feature on arm64 like the way of "bpf, x64: Fix tailcall hierarchy". On arm64, when a tail call happens, it uses tail_call_cnt_ptr to increment tail_call_cnt, too. At the prologue of main prog, it has to initialize tail_call_cnt and prepare tail_call_cnt_ptr. At the prologue of subprog, it pushes x26 register twice, and does not initialize tail_call_cnt. At the epilogue, it pops x26 twice, no matter whether it is main prog or subprog. Fixes: d4609a5d8c70 ("bpf, arm64: Keep tail call count across bpf2bpf calls") Acked-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Link: https://lore.kernel.org/r/20240714123902.32305-3-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
* | | | | | | | Merge tag 'mm-nonmm-stable-2024-09-21-07-52' of ↵Linus Torvalds2024-09-211-1/+4
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Many singleton patches - please see the various changelogs for details. Quite a lot of nilfs2 work this time around. Notable patch series in this pull request are: - "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with assistance from Uwe Kleine-König. Reimplement mul_u64_u64_div_u64() to provide (much) more accurate results. The current implementation was causing Uwe some issues in the PWM drivers. - "xz: Updates to license, filters, and compression options" from Lasse Collin. Miscellaneous maintenance and kinor feature work to the xz decompressor. - "Fix some GDB command error and add some GDB commands" from Kuan-Ying Lee. Fixes and enhancements to the gdb scripts. - "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff Johnson. Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of warnings about this. - "nilfs2: add support for some common ioctls" from Ryusuke Konishi. Adds various commonly-available ioctls to nilfs2. - "This series fixes a number of formatting issues in kernel doc comments" from Ryusuke Konishi does that. - "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke Konishi. Fix issues where -ENOENT was being unintentionally and inappropriately returned to userspace. - "nilfs2: assorted cleanups" from Huang Xiaojia. - "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke Konishi fixes some issues which can occur on corrupted nilfs2 filesystems. - "scripts/decode_stacktrace.sh: improve error reporting and usability" from Luca Ceresoli does those things" * tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (103 commits) list: test: increase coverage of list_test_list_replace*() list: test: fix tests for list_cut_position() proc: use __auto_type more treewide: correct the typo 'retun' ocfs2: cleanup return value and mlog in ocfs2_global_read_info() nilfs2: remove duplicate 'unlikely()' usage nilfs2: fix potential oob read in nilfs_btree_check_delete() nilfs2: determine empty node blocks as corrupted nilfs2: fix potential null-ptr-deref in nilfs_btree_insert() user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation tools/mm: rm thp_swap_allocator_test when make clean squashfs: fix percpu address space issues in decompressor_multi_percpu.c lib: glob.c: added null check for character class nilfs2: refactor nilfs_segctor_thread() nilfs2: use kthread_create and kthread_stop for the log writer thread nilfs2: remove sc_timer_task nilfs2: do not repair reserved inode bitmap in nilfs_new_inode() nilfs2: eliminate the shared counter and spinlock for i_generation nilfs2: separate inode type information from i_state field nilfs2: use the BITS_PER_LONG macro ...
| * | | | | | | | arm64: boot: add Image.xz supportLasse Collin2024-09-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Image.* targets existed for other compressors already. Bootloader support is needed for decompression. This is for CONFIG_EFI_ZBOOT=n. With CONFIG_EFI_ZBOOT=y, XZ was already available. Link: https://lkml.kernel.org/r/20240721133633.47721-16-lasse.collin@tukaani.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Cc: Simon Glass <sjg@chromium.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jules Maselbas <jmaselbas@zdiv.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joel Stanley <joel@jms.id.au> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jubin Zhong <zhongjubin@huawei.com> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rui Li <me@lirui.org> Cc: Sam James <sam@gentoo.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | | | | | | | Merge tag 'mm-stable-2024-09-20-02-31' of ↵Linus Torvalds2024-09-216-15/+35
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are: - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. - "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. - "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. - "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". - "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. - "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. - "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. - "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. - "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). - "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. - "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. - "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. - "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. - "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. - "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. - "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. - "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). - "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! - "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. - "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. - "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. - "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. - "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. - "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas. - "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. - "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. - "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. - "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios" * tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits) zram: free secondary algorithms names uprobes: turn xol_area->pages[2] into xol_area->page uprobes: introduce the global struct vm_special_mapping xol_mapping Revert "uprobes: use vm_special_mapping close() functionality" mm: support large folios swap-in for sync io devices mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios mm: fix swap_read_folio_zeromap() for large folios with partial zeromap mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries set_memory: add __must_check to generic stubs mm/vma: return the exact errno in vms_gather_munmap_vmas() memcg: cleanup with !CONFIG_MEMCG_V1 mm/show_mem.c: report alloc tags in human readable units mm: support poison recovery from copy_present_page() mm: support poison recovery from do_cow_fault() resource, kunit: add test case for region_intersects() resource: make alloc_free_mem_region() works for iomem_resource mm: z3fold: deprecate CONFIG_Z3FOLD vfio/pci: implement huge_fault support mm/arm64: support large pfn mappings mm/x86: support large pfn mappings ...
| * | | | | | | | | mm/arm64: support large pfn mappingsPeter Xu2024-09-172-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support huge pfnmaps by using bit 56 (PTE_SPECIAL) for "special" on pmds/puds. Provide the pmd/pud helpers to set/get special bit. There's one more thing missing for arm64 which is the pxx_pgprot() for pmd/pud. Add them too, which is mostly the same as the pte version by dropping the pfn field. These helpers are essential to be used in the new follow_pfnmap*() API to report valid pgprot_t results. Note that arm64 doesn't yet support huge PUD yet, but it's still straightforward to provide the pud helpers that we need altogether. Only PMD helpers will make an immediate benefit until arm64 will support huge PUDs first in general (e.g. in THPs). Link: https://lkml.kernel.org/r/20240826204353.2228736-19-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | | | | | mm: always define pxx_pgprot()Peter Xu2024-09-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There're: - 8 archs (arc, arm64, include, mips, powerpc, s390, sh, x86) that support pte_pgprot(). - 2 archs (x86, sparc) that support pmd_pgprot(). - 1 arch (x86) that support pud_pgprot(). Always define them to be used in generic code, and then we don't need to fiddle with "#ifdef"s when doing so. Link: https://lkml.kernel.org/r/20240826204353.2228736-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | | | | | x86: remove PG_uncachedMatthew Wilcox (Oracle)2024-09-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert x86 to use PG_arch_2 instead of PG_uncached and remove PG_uncached. Link: https://lkml.kernel.org/r/20240821193445.2294269-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | | | | | arch, mm: move definition of node_data to generic codeMike Rapoport (Microsoft)2024-09-043-13/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every architecture that supports NUMA defines node_data in the same way: struct pglist_data *node_data[MAX_NUMNODES]; No reason to keep multiple copies of this definition and its forward declarations, especially when such forward declaration is the only thing in include/asm/mmzone.h for many architectures. Add definition and declaration of node_data to generic code and drop architecture-specific versions. Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | | | | | mm: kvmalloc: align kvrealloc() with krealloc()Danilo Krummrich2024-09-021-1/+0
| |/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. Besides that, implementing kvrealloc() by making use of krealloc() and vrealloc() provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: disable KASAN when switching to vmalloc] Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | | | | | | | Merge tag 'sched-rt-2024-09-17' of ↵Linus Torvalds2024-09-201-0/+1
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RT enablement from Thomas Gleixner: "Enable PREEMPT_RT on supported architectures: After twenty years of development we finally reached the point to enable PREEMPT_RT support in the mainline kernel. All prerequisites are merged, so enable it on the supported architectures ARM64, RISCV and X86(32/64-bit)" * tag 'sched-rt-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: riscv: Allow to enable PREEMPT_RT. arm64: Allow to enable PREEMPT_RT. x86: Allow to enable PREEMPT_RT.
| * | | | | | | | | arm64: Allow to enable PREEMPT_RT.Sebastian Andrzej Siewior2024-09-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is really time. arm64 has all the required architecture related changes, that have been identified over time, in order to enable PREEMPT_RT. With the recent printk changes, the last known road block has been addressed. Allow to enable PREEMPT_RT on arm64. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/all/20240906111841.562402-3-bigeasy@linutronix.de
* | | | | | | | | | Merge tag 'dma-mapping-6.12-2024-09-19' of ↵Linus Torvalds2024-09-192-21/+19
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - support DMA zones for arm64 systems where memory starts at > 4GB (Baruch Siach, Catalin Marinas) - support direct calls into dma-iommu and thus obsolete dma_map_ops for many common configurations (Leon Romanovsky) - add DMA-API tracing (Sean Anderson) - remove the not very useful return value from various dma_set_* APIs (Christoph Hellwig) - misc cleanups and minor optimizations (Chen Y, Yosry Ahmed, Christoph Hellwig) * tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: reflow dma_supported dma-mapping: reliably inform about DMA support for IOMMU dma-mapping: add tracing for dma-mapping API calls dma-mapping: use IOMMU DMA calls for common alloc/free page calls dma-direct: optimize page freeing when it is not addressable dma-mapping: clearly mark DMA ops as an architecture feature vdpa_sim: don't select DMA_OPS arm64: mm: keep low RAM dma zone dma-mapping: don't return errors from dma_set_max_seg_size dma-mapping: don't return errors from dma_set_seg_boundary dma-mapping: don't return errors from dma_set_min_align_mask scsi: check that busses support the DMA API before setting dma parameters arm64: mm: fix DMA zone when dma-ranges is missing dma-mapping: direct calls for dma-iommu dma-mapping: call ->unmap_page and ->unmap_sg unconditionally arm64: support DMA zone above 4GB dma-mapping: replace zone_dma_bits by zone_dma_limit dma-mapping: use bit masking to check VM_DMA_COHERENT