summaryrefslogtreecommitdiffstats
path: root/include
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2024-11-23 18:58:07 +0100
committerLinus Torvalds <torvalds@linux-foundation.org>2024-11-23 18:58:07 +0100
commit5c00ff742bf5caf85f60e1c73999f99376fb865d (patch)
treefa484e83c27af79f1c0511e7e0673507461c9379 /include
parentMerge tag '6.13-rc-part1-SMB3-client-fixes' of git://git.samba.org/sfrench/ci... (diff)
parentcma: enforce non-zero pageblock_order during cma_init_reserved_mem() (diff)
downloadlinux-5c00ff742bf5caf85f60e1c73999f99376fb865d.tar.xz
linux-5c00ff742bf5caf85f60e1c73999f99376fb865d.zip
Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from MaĆ­ra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...
Diffstat (limited to 'include')
-rw-r--r--include/asm-generic/codetag.lds.h19
-rw-r--r--include/asm-generic/hugetlb.h15
-rw-r--r--include/asm-generic/text-patching.h5
-rw-r--r--include/linux/alloc_tag.h21
-rw-r--r--include/linux/bootmem_info.h35
-rw-r--r--include/linux/codetag.h40
-rw-r--r--include/linux/execmem.h49
-rw-r--r--include/linux/gfp.h6
-rw-r--r--include/linux/highmem.h8
-rw-r--r--include/linux/huge_mm.h16
-rw-r--r--include/linux/hugetlb.h22
-rw-r--r--include/linux/kasan.h12
-rw-r--r--include/linux/khugepaged.h2
-rw-r--r--include/linux/kmemleak.h4
-rw-r--r--include/linux/ksm.h8
-rw-r--r--include/linux/list_lru.h26
-rw-r--r--include/linux/maple_tree.h16
-rw-r--r--include/linux/memcontrol.h97
-rw-r--r--include/linux/mempolicy.h2
-rw-r--r--include/linux/mm.h77
-rw-r--r--include/linux/mm_inline.h27
-rw-r--r--include/linux/mm_types.h84
-rw-r--r--include/linux/mmu_notifier.h7
-rw-r--r--include/linux/mmzone.h5
-rw-r--r--include/linux/module.h16
-rw-r--r--include/linux/moduleloader.h4
-rw-r--r--include/linux/oom.h1
-rw-r--r--include/linux/page-flags-layout.h7
-rw-r--r--include/linux/page-flags.h18
-rw-r--r--include/linux/page-isolation.h8
-rw-r--r--include/linux/pagemap.h31
-rw-r--r--include/linux/pagewalk.h18
-rw-r--r--include/linux/pgalloc_tag.h202
-rw-r--r--include/linux/pgtable.h59
-rw-r--r--include/linux/rmap.h17
-rw-r--r--include/linux/sched/coredump.h82
-rw-r--r--include/linux/set_memory.h6
-rw-r--r--include/linux/shmem_fs.h6
-rw-r--r--include/linux/swapops.h24
-rw-r--r--include/linux/text-patching.h15
-rw-r--r--include/linux/vmalloc.h63
-rw-r--r--include/linux/zswap.h2
-rw-r--r--include/trace/events/memcg.h106
-rw-r--r--include/trace/events/mmap_lock.h14
-rw-r--r--include/trace/events/vmscan.h45
-rw-r--r--include/uapi/asm-generic/mman-common.h3
46 files changed, 869 insertions, 481 deletions
diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
index 64f536b80380..372c320c5043 100644
--- a/include/asm-generic/codetag.lds.h
+++ b/include/asm-generic/codetag.lds.h
@@ -11,4 +11,23 @@
#define CODETAG_SECTIONS() \
SECTION_WITH_BOUNDARIES(alloc_tags)
+/*
+ * Module codetags which aren't used after module unload, therefore have the
+ * same lifespan as the module and can be safely unloaded with the module.
+ */
+#define MOD_CODETAG_SECTIONS()
+
+#define MOD_SEPARATE_CODETAG_SECTION(_name) \
+ .codetag.##_name : { \
+ SECTION_WITH_BOUNDARIES(_name) \
+ }
+
+/*
+ * For codetags which might be used after module unload, therefore might stay
+ * longer in memory. Each such codetag type has its own section so that we can
+ * unload them individually once unused.
+ */
+#define MOD_SEPARATE_CODETAG_SECTIONS() \
+ MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
+
#endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 594d5905f615..f42133dae68e 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -42,20 +42,26 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
return pte_modify(pte, newprot);
}
+#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD_WP
static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
{
return huge_pte_wrprotect(pte_mkuffd_wp(pte));
}
+#endif
+#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD_WP
static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
{
return pte_clear_uffd_wp(pte);
}
+#endif
+#ifndef __HAVE_ARCH_HUGE_PTE_UFFD_WP
static inline int huge_pte_uffd_wp(pte_t pte)
{
return pte_uffd_wp(pte);
}
+#endif
#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
@@ -106,22 +112,17 @@ static inline int huge_pte_none(pte_t pte)
#endif
/* Please refer to comments above pte_none_mostly() for the usage */
+#ifndef __HAVE_ARCH_HUGE_PTE_NONE_MOSTLY
static inline int huge_pte_none_mostly(pte_t pte)
{
return huge_pte_none(pte) || is_pte_marker(pte);
}
+#endif
#ifndef __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
static inline int prepare_hugepage_range(struct file *file,
unsigned long addr, unsigned long len)
{
- struct hstate *h = hstate_file(file);
-
- if (len & ~huge_page_mask(h))
- return -EINVAL;
- if (addr & ~huge_page_mask(h))
- return -EINVAL;
-
return 0;
}
#endif
diff --git a/include/asm-generic/text-patching.h b/include/asm-generic/text-patching.h
new file mode 100644
index 000000000000..2245c641b741
--- /dev/null
+++ b/include/asm-generic/text-patching.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_TEXT_PATCHING_H
+#define _ASM_GENERIC_TEXT_PATCHING_H
+
+#endif /* _ASM_GENERIC_TEXT_PATCHING_H */
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 941deffc590d..7c0786bdf9af 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,21 @@ struct alloc_tag {
struct alloc_tag_counters __percpu *counters;
} __aligned(8);
+struct alloc_tag_kernel_section {
+ struct alloc_tag *first_tag;
+ unsigned long count;
+};
+
+struct alloc_tag_module_section {
+ union {
+ unsigned long start_addr;
+ struct alloc_tag *first_tag;
+ };
+ unsigned long end_addr;
+ /* used size */
+ unsigned long size;
+};
+
#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
#define CODETAG_EMPTY ((void *)1)
@@ -54,6 +69,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) {}
#ifdef CONFIG_MEM_ALLOC_PROFILING
+#define ALLOC_TAG_SECTION_NAME "alloc_tags"
+
struct codetag_bytes {
struct codetag *ct;
s64 bytes;
@@ -76,7 +93,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
#define DEFINE_ALLOC_TAG(_alloc_tag) \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { \
+ __section(ALLOC_TAG_SECTION_NAME) = { \
.ct = CODE_TAG_INIT, \
.counters = &_shared_alloc_tag };
@@ -85,7 +102,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
#define DEFINE_ALLOC_TAG(_alloc_tag) \
static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { \
+ __section(ALLOC_TAG_SECTION_NAME) = { \
.ct = CODE_TAG_INIT, \
.counters = &_alloc_tag_cntr };
diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
index cffa38a73618..d8a8d245824a 100644
--- a/include/linux/bootmem_info.h
+++ b/include/linux/bootmem_info.h
@@ -6,11 +6,10 @@
#include <linux/kmemleak.h>
/*
- * Types for free bootmem stored in page->lru.next. These have to be in
- * some random range in unsigned long space for debugging purposes.
+ * Types for free bootmem stored in the low bits of page->private.
*/
-enum {
- MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE = 12,
+enum bootmem_type {
+ MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE = 1,
SECTION_INFO = MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE,
MIX_SECTION_INFO,
NODE_INFO,
@@ -21,9 +20,19 @@ enum {
void __init register_page_bootmem_info_node(struct pglist_data *pgdat);
void get_page_bootmem(unsigned long info, struct page *page,
- unsigned long type);
+ enum bootmem_type type);
void put_page_bootmem(struct page *page);
+static inline enum bootmem_type bootmem_type(const struct page *page)
+{
+ return (unsigned long)page->private & 0xf;
+}
+
+static inline unsigned long bootmem_info(const struct page *page)
+{
+ return (unsigned long)page->private >> 4;
+}
+
/*
* Any memory allocated via the memblock allocator and not via the
* buddy will be marked reserved already in the memmap. For those
@@ -31,7 +40,7 @@ void put_page_bootmem(struct page *page);
*/
static inline void free_bootmem_page(struct page *page)
{
- unsigned long magic = page->index;
+ enum bootmem_type type = bootmem_type(page);
/*
* The reserve_bootmem_region sets the reserved flag on bootmem
@@ -39,7 +48,7 @@ static inline void free_bootmem_page(struct page *page)
*/
VM_BUG_ON_PAGE(page_ref_count(page) != 2, page);
- if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
+ if (type == SECTION_INFO || type == MIX_SECTION_INFO)
put_page_bootmem(page);
else
VM_BUG_ON_PAGE(1, page);
@@ -53,8 +62,18 @@ static inline void put_page_bootmem(struct page *page)
{
}
+static inline enum bootmem_type bootmem_type(const struct page *page)
+{
+ return SECTION_INFO;
+}
+
+static inline unsigned long bootmem_info(const struct page *page)
+{
+ return 0;
+}
+
static inline void get_page_bootmem(unsigned long info, struct page *page,
- unsigned long type)
+ enum bootmem_type type)
{
}
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c2a579ccd455..d14dbd26b370 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -13,6 +13,9 @@ struct codetag_module;
struct seq_buf;
struct module;
+#define CODETAG_SECTION_START_PREFIX "__start_"
+#define CODETAG_SECTION_STOP_PREFIX "__stop_"
+
/*
* An instance of this structure is created in a special ELF section at every
* code location being tagged. At runtime, the special section is treated as
@@ -35,8 +38,15 @@ struct codetag_type_desc {
size_t tag_size;
void (*module_load)(struct codetag_type *cttype,
struct codetag_module *cmod);
- bool (*module_unload)(struct codetag_type *cttype,
+ void (*module_unload)(struct codetag_type *cttype,
struct codetag_module *cmod);
+#ifdef CONFIG_MODULES
+ void (*module_replaced)(struct module *mod, struct module *new_mod);
+ bool (*needs_section_mem)(struct module *mod, unsigned long size);
+ void *(*alloc_section_mem)(struct module *mod, unsigned long size,
+ unsigned int prepend, unsigned long align);
+ void (*free_section_mem)(struct module *mod, bool used);
+#endif
};
struct codetag_iterator {
@@ -71,11 +81,31 @@ struct codetag_type *
codetag_register_type(const struct codetag_type_desc *desc);
#if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
+
+bool codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size);
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align);
+void codetag_free_module_sections(struct module *mod);
+void codetag_module_replaced(struct module *mod, struct module *new_mod);
void codetag_load_module(struct module *mod);
-bool codetag_unload_module(struct module *mod);
-#else
+void codetag_unload_module(struct module *mod);
+
+#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
+
+static inline bool
+codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size) { return false; }
+static inline void *
+codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align) { return NULL; }
+static inline void codetag_free_module_sections(struct module *mod) {}
+static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
static inline void codetag_load_module(struct module *mod) {}
-static inline bool codetag_unload_module(struct module *mod) { return true; }
-#endif
+static inline void codetag_unload_module(struct module *mod) {}
+
+#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
#endif /* _LINUX_CODETAG_H */
diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 32cef1144117..64130ae19690 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -46,11 +46,27 @@ enum execmem_type {
/**
* enum execmem_range_flags - options for executable memory allocations
* @EXECMEM_KASAN_SHADOW: allocate kasan shadow
+ * @EXECMEM_ROX_CACHE: allocations should use ROX cache of huge pages
*/
enum execmem_range_flags {
EXECMEM_KASAN_SHADOW = (1 << 0),
+ EXECMEM_ROX_CACHE = (1 << 1),
};
+#ifdef CONFIG_ARCH_HAS_EXECMEM_ROX
+/**
+ * execmem_fill_trapping_insns - set memory to contain instructions that
+ * will trap
+ * @ptr: pointer to memory to fill
+ * @size: size of the range to fill
+ * @writable: is the memory poited by @ptr is writable or ROX
+ *
+ * A hook for architecures to fill execmem ranges with invalid instructions.
+ * Architectures that use EXECMEM_ROX_CACHE must implement this.
+ */
+void execmem_fill_trapping_insns(void *ptr, size_t size, bool writable);
+#endif
+
/**
* struct execmem_range - definition of an address space suitable for code and
* related data allocations
@@ -123,6 +139,39 @@ void *execmem_alloc(enum execmem_type type, size_t size);
*/
void execmem_free(void *ptr);
+#ifdef CONFIG_MMU
+/**
+ * execmem_vmap - create virtual mapping for EXECMEM_MODULE_DATA memory
+ * @size: size of the virtual mapping in bytes
+ *
+ * Maps virtually contiguous area in the range suitable for EXECMEM_MODULE_DATA.
+ *
+ * Return: the area descriptor on success or %NULL on failure.
+ */
+struct vm_struct *execmem_vmap(size_t size);
+#endif
+
+/**
+ * execmem_update_copy - copy an update to executable memory
+ * @dst: destination address to update
+ * @src: source address containing the data
+ * @size: how many bytes of memory shold be copied
+ *
+ * Copy @size bytes from @src to @dst using text poking if the memory at
+ * @dst is read-only.
+ *
+ * Return: a pointer to @dst or NULL on error
+ */
+void *execmem_update_copy(void *dst, const void *src, size_t size);
+
+/**
+ * execmem_is_rox - check if execmem is read-only
+ * @type - the execmem type to check
+ *
+ * Return: %true if the @type is read-only, %false if it's writable
+ */
+bool execmem_is_rox(enum execmem_type type);
+
#if defined(CONFIG_EXECMEM) && !defined(CONFIG_ARCH_WANTS_EXECMEM_LATE)
void execmem_init(void);
#else
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index a0a6d25f883f..b0fe9f62d15b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -306,7 +306,7 @@ struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order);
struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
struct mempolicy *mpol, pgoff_t ilx, int nid);
struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct *vma,
- unsigned long addr, bool hugepage);
+ unsigned long addr);
#else
static inline struct page *alloc_pages_noprof(gfp_t gfp_mask, unsigned int order)
{
@@ -326,7 +326,7 @@ static inline struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int orde
{
return folio_alloc_noprof(gfp, order);
}
-#define vma_alloc_folio_noprof(gfp, order, vma, addr, hugepage) \
+#define vma_alloc_folio_noprof(gfp, order, vma, addr) \
folio_alloc_noprof(gfp, order)
#endif
@@ -341,7 +341,7 @@ static inline struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int orde
static inline struct page *alloc_page_vma_noprof(gfp_t gfp,
struct vm_area_struct *vma, unsigned long addr)
{
- struct folio *folio = vma_alloc_folio_noprof(gfp, 0, vma, addr, false);
+ struct folio *folio = vma_alloc_folio_noprof(gfp, 0, vma, addr);
return &folio->page;
}
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 930a591b9b61..6e452bd8e7e3 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -224,13 +224,7 @@ static inline
struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
unsigned long vaddr)
{
- struct folio *folio;
-
- folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
- if (folio)
- clear_user_highpage(&folio->page, vaddr);
-
- return folio;
+ return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
}
#endif
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ef5b80e48599..b94c2e8ee918 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -2,7 +2,6 @@
#ifndef _LINUX_HUGE_MM_H
#define _LINUX_HUGE_MM_H
-#include <linux/sched/coredump.h>
#include <linux/mm_types.h>
#include <linux/fs.h> /* only for vma_is_dax() */
@@ -120,6 +119,8 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_ALLOC,
MTHP_STAT_ANON_FAULT_FALLBACK,
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+ MTHP_STAT_ZSWPOUT,
+ MTHP_STAT_SWPIN,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
MTHP_STAT_SHMEM_ALLOC,
@@ -253,19 +254,6 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
return orders;
}
-static inline bool file_thp_enabled(struct vm_area_struct *vma)
-{
- struct inode *inode;
-
- if (!vma->vm_file)
- return false;
-
- inode = vma->vm_file->f_inode;
-
- return (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) &&
- !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
-}
-
unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e4697539b665..ae4fe8615bb6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -546,16 +546,10 @@ static inline struct hstate *hstate_inode(struct inode *i)
}
#endif /* !CONFIG_HUGETLBFS */
-#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
-unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
- unsigned long len, unsigned long pgoff,
- unsigned long flags);
-#endif /* HAVE_ARCH_HUGETLB_UNMAPPED_AREA */
-
unsigned long
-generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
- unsigned long len, unsigned long pgoff,
- unsigned long flags);
+hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags);
/*
* huegtlb page specific state flags. These flags are located in page.private
@@ -1035,9 +1029,19 @@ void hugetlb_unregister_node(struct node *node);
*/
bool is_raw_hwpoison_page_in_hugepage(struct page *page);
+static inline unsigned long huge_page_mask_align(struct file *file)
+{
+ return PAGE_MASK & ~huge_page_mask(hstate_file(file));
+}
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
+static inline unsigned long huge_page_mask_align(struct file *file)
+{
+ return 0;
+}
+
static inline struct hugepage_subpool *hugetlb_folio_subpool(struct folio *folio)
{
return NULL;
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 00a3bf7c0d8f..6bbfc8aa42e8 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -29,6 +29,9 @@ typedef unsigned int __bitwise kasan_vmalloc_flags_t;
#define KASAN_VMALLOC_VM_ALLOC ((__force kasan_vmalloc_flags_t)0x02u)
#define KASAN_VMALLOC_PROT_NORMAL ((__force kasan_vmalloc_flags_t)0x04u)
+#define KASAN_VMALLOC_PAGE_RANGE 0x1 /* Apply exsiting page range */
+#define KASAN_VMALLOC_TLB_FLUSH 0x2 /* TLB flush */
+
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
#include <linux/pgtable.h>
@@ -564,7 +567,8 @@ void kasan_populate_early_vm_area_shadow(void *start, unsigned long size);
int kasan_populate_vmalloc(unsigned long addr, unsigned long size);
void kasan_release_vmalloc(unsigned long start, unsigned long end,
unsigned long free_region_start,
- unsigned long free_region_end);
+ unsigned long free_region_end,
+ unsigned long flags);
#else /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
@@ -579,7 +583,8 @@ static inline int kasan_populate_vmalloc(unsigned long start,
static inline void kasan_release_vmalloc(unsigned long start,
unsigned long end,
unsigned long free_region_start,
- unsigned long free_region_end) { }
+ unsigned long free_region_end,
+ unsigned long flags) { }
#endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
@@ -614,7 +619,8 @@ static inline int kasan_populate_vmalloc(unsigned long start,
static inline void kasan_release_vmalloc(unsigned long start,
unsigned long end,
unsigned long free_region_start,
- unsigned long free_region_end) { }
+ unsigned long free_region_end,
+ unsigned long flags) { }
static inline void *kasan_unpoison_vmalloc(const void *start,
unsigned long size,
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 30baae91b225..1f46046080f5 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -2,8 +2,6 @@
#ifndef _LINUX_KHUGEPAGED_H
#define _LINUX_KHUGEPAGED_H
-#include <linux/sched/coredump.h> /* MMF_VM_HUGEPAGE */
-
extern unsigned int khugepaged_max_ptes_none __read_mostly;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern struct attribute_group khugepaged_attr_group;
diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h
index 6a3cd1bf4680..93a73c076d16 100644
--- a/include/linux/kmemleak.h
+++ b/include/linux/kmemleak.h
@@ -26,6 +26,7 @@ extern void kmemleak_free_part(const void *ptr, size_t size) __ref;
extern void kmemleak_free_percpu(const void __percpu *ptr) __ref;
extern void kmemleak_update_trace(const void *ptr) __ref;
extern void kmemleak_not_leak(const void *ptr) __ref;
+extern void kmemleak_transient_leak(const void *ptr) __ref;
extern void kmemleak_ignore(const void *ptr) __ref;
extern void kmemleak_scan_area(const void *ptr, size_t size, gfp_t gfp) __ref;
extern void kmemleak_no_scan(const void *ptr) __ref;
@@ -93,6 +94,9 @@ static inline void kmemleak_update_trace(const void *ptr)
static inline void kmemleak_not_leak(const void *ptr)
{
}
+static inline void kmemleak_transient_leak(const void *ptr)
+{
+}
static inline void kmemleak_ignore(const void *ptr)
{
}
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index ec9c05044d4f..6a53ac4885bb 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -13,7 +13,6 @@
#include <linux/pagemap.h>
#include <linux/rmap.h>
#include <linux/sched.h>
-#include <linux/sched/coredump.h>
#ifdef CONFIG_KSM
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
@@ -91,7 +90,7 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc);
void folio_migrate_ksm(struct folio *newfolio, struct folio *folio);
-void collect_procs_ksm(struct folio *folio, struct page *page,
+void collect_procs_ksm(const struct folio *folio, const struct page *page,
struct list_head *to_kill, int force_early);
long ksm_process_profit(struct mm_struct *);
@@ -123,8 +122,9 @@ static inline void ksm_might_unmap_zero_page(struct mm_struct *mm, pte_t pte)
{
}
-static inline void collect_procs_ksm(struct folio *folio, struct page *page,
- struct list_head *to_kill, int force_early)
+static inline void collect_procs_ksm(const struct folio *folio,
+ const struct page *page, struct list_head *to_kill,
+ int force_early)
{
}
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 5099a8ccd5f4..05c166811f6b 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -32,6 +32,8 @@ struct list_lru_one {
struct list_head list;
/* may become negative during memcg reparenting */
long nr_items;
+ /* protects all fields above */
+ spinlock_t lock;
};
struct list_lru_memcg {
@@ -41,11 +43,9 @@ struct list_lru_memcg {
};
struct list_lru_node {
- /* protects all lists on the node, including per cgroup */
- spinlock_t lock;
/* global list, used for the root cgroup in cgroup aware lrus */
struct list_lru_one lru;
- long nr_items;
+ atomic_long_t nr_items;
} ____cacheline_aligned_in_smp;
struct list_lru {
@@ -56,16 +56,28 @@ struct list_lru {
bool memcg_aware;
struct xarray xa;
#endif
+#ifdef CONFIG_LOCKDEP
+ struct lock_class_key *key;
+#endif
};
void list_lru_destroy(struct list_lru *lru);
int __list_lru_init(struct list_lru *lru, bool memcg_aware,
- struct lock_class_key *key, struct shrinker *shrinker);
+ struct shrinker *shrinker);
#define list_lru_init(lru) \
- __list_lru_init((lru), false, NULL, NULL)
+ __list_lru_init((lru), false, NULL)
#define list_lru_init_memcg(lru, shrinker) \
- __list_lru_init((lru), true, NULL, shrinker)
+ __list_lru_init((lru), true, shrinker)
+
+static inline int list_lru_init_memcg_key(struct list_lru *lru, struct shrinker *shrinker,
+ struct lock_class_key *key)
+{
+#ifdef CONFIG_LOCKDEP
+ lru->key = key;
+#endif
+ return list_lru_init_memcg(lru, shrinker);
+}
int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
gfp_t gfp);
@@ -172,7 +184,7 @@ void list_lru_isolate_move(struct list_lru_one *list, struct list_head *item,
struct list_head *head);
typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item,
- struct list_lru_one *list, spinlock_t *lock, void *cb_arg);
+ struct list_lru_one *list, void *cb_arg);
/**
* list_lru_walk_one: walk a @lru, isolating and disposing freeable items.
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index c2c11004085e..cbbcd18d4186 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -224,7 +224,7 @@ typedef struct { /* nothing */ } lockdep_map_p;
* (set at tree creation time) and dynamic information set under the spinlock.
*
* Another use of flags are to indicate global states of the tree. This is the
- * case with the MAPLE_USE_RCU flag, which indicates the tree is currently in
+ * case with the MT_FLAGS_USE_RCU flag, which indicates the tree is currently in
* RCU mode. This mode was added to allow the tree to reuse nodes instead of
* re-allocating and RCU freeing nodes when there is a single user.
*/
@@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state *mas)
#define mas_for_each(__mas, __entry, __max) \
while (((__entry) = mas_find((__mas), (__max))) != NULL)
+/**
+ * mas_for_each_rev() - Iterate over a range of the maple tree in reverse order.
+ * @__mas: Maple Tree operation state (maple_state)
+ * @__entry: Entry retrieved from the tree
+ * @__min: minimum index to retrieve from the tree
+ *
+ * When returned, mas->index and mas->last will hold the entire range for the
+ * entry.
+ *
+ * Note: may return the zero entry.
+ */
+#define mas_for_each_rev(__mas, __entry, __min) \
+ while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
+
#ifdef CONFIG_DEBUG_MAPLE_TREE
enum mt_dump_format {
mt_dump_dec,
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e1b41554a5fb..5502aa8e138e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -299,25 +299,10 @@ struct mem_cgroup {
/* For oom notifier event fd */
struct list_head oom_notify;
- /*
- * Should we move charges of a task when a task is moved into this
- * mem_cgroup ? And what type of charges should we move ?
- */
- unsigned long move_charge_at_immigrate;
- /* taken only while moving_account > 0 */
- spinlock_t move_lock;
- unsigned long move_lock_flags;
-
/* Legacy tcp memory accounting */
bool tcpmem_active;
int tcpmem_pressure;
- /*
- * set > 0 if pages under this cgroup are moving to other cgroup.
- */
- atomic_t moving_account;
- struct task_struct *move_lock_task;
-
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
@@ -433,9 +418,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
*
* - the folio lock
* - LRU isolation
- * - folio_memcg_lock()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*
* For a kmem folio a caller should hold an rcu read lock to protect memcg
* associated with a kmem folio from being released.
@@ -460,35 +443,6 @@ static inline bool folio_memcg_charged(struct folio *folio)
return __folio_memcg(folio) != NULL;
}
-/**
- * folio_memcg_rcu - Locklessly get the memory cgroup associated with a folio.
- * @folio: Pointer to the folio.
- *
- * This function assumes that the folio is known to have a
- * proper memory cgroup pointer. It's not safe to call this function
- * against some type of folios, e.g. slab folios or ex-slab folios.
- *
- * Return: A pointer to the memory cgroup associated with the folio,
- * or NULL.
- */
-static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
-{
- unsigned long memcg_data = READ_ONCE(folio->memcg_data);
-
- VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
-
- if (memcg_data & MEMCG_DATA_KMEM) {
- struct obj_cgroup *objcg;
-
- objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
- return obj_cgroup_memcg(objcg);
- }
-
- WARN_ON_ONCE(!rcu_read_lock_held());
-
- return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
-}
-
/*
* folio_memcg_check - Get the memory cgroup associated with a folio.
* @folio: Pointer to the folio.
@@ -504,9 +458,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
*
* - the folio lock
* - LRU isolation
- * - lock_folio_memcg()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*
* For a kmem folio a caller should hold an rcu read lock to protect memcg
* associated with a kmem folio from being released.
@@ -1103,10 +1055,9 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
return NULL;
}
-static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
+static inline bool folio_memcg_charged(struct folio *folio)
{
- WARN_ON_ONCE(!rcu_read_lock_held());
- return NULL;
+ return false;
}
static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
@@ -1282,6 +1233,10 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
return NULL;
}
+static inline void obj_cgroup_get(struct obj_cgroup *objcg)
+{
+}
+
static inline void obj_cgroup_put(struct obj_cgroup *objcg)
{
}
@@ -1874,26 +1829,6 @@ static inline bool task_in_memcg_oom(struct task_struct *p)
return p->memcg_in_oom;
}
-void folio_memcg_lock(struct folio *folio);
-void folio_memcg_unlock(struct folio *folio);
-
-/* try to stablize folio_memcg() for all the pages in a memcg */
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
- rcu_read_lock();
-
- if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account))
- return true;
-
- rcu_read_unlock();
- return false;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
- rcu_read_unlock();
-}
-
static inline void mem_cgroup_enter_user_fault(void)
{
WARN_ON(current->in_user_fault);
@@ -1915,26 +1850,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return 0;
}
-static inline void folio_memcg_lock(struct folio *folio)
-{
-}
-
-static inline void folio_memcg_unlock(struct folio *folio)
-{
-}
-
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
- /* to match folio_memcg_rcu() */
- rcu_read_lock();
- return true;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
- rcu_read_unlock();
-}
-
static inline bool task_in_memcg_oom(struct task_struct *p)
{
return false;
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 1add16f21612..ce9885e0178a 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -47,7 +47,7 @@ struct mempolicy {
atomic_t refcnt;
unsigned short mode; /* See MPOL_* above */
unsigned short flags; /* See set_mempolicy() MPOL_F_* above */
- nodemask_t nodes; /* interleave/bind/perfer */
+ nodemask_t nodes; /* interleave/bind/preferred/etc */
int home_node; /* Home node to use for MPOL_BIND and MPOL_PREFERRED_MANY */
union {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 673771f34674..2bbf73eb53e7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -97,11 +97,11 @@ extern const int mmap_rnd_compat_bits_max;
extern int mmap_rnd_compat_bits __read_mostly;
#endif
-#ifndef PHYSMEM_END
+#ifndef DIRECT_MAP_PHYSMEM_END
# ifdef MAX_PHYSMEM_BITS
-# define PHYSMEM_END ((1ULL << MAX_PHYSMEM_BITS) - 1)
+# define DIRECT_MAP_PHYSMEM_END ((1ULL << MAX_PHYSMEM_BITS) - 1)
# else
-# define PHYSMEM_END (((phys_addr_t)-1)&~(1ULL<<63))
+# define DIRECT_MAP_PHYSMEM_END (((phys_addr_t)-1)&~(1ULL<<63))
# endif
#endif
@@ -1298,8 +1298,6 @@ static inline struct folio *virt_to_folio(const void *x)
void __folio_put(struct folio *folio);
-void put_pages_list(struct list_head *pages);
-
void split_page(struct page *page, unsigned int order);
void folio_copy(struct folio *dst, struct folio *src);
int folio_mc_copy(struct folio *dst, struct folio *src);
@@ -1907,7 +1905,7 @@ static inline unsigned long page_to_section(const struct page *page)
*
* Return: The Page Frame Number of the first page in the folio.
*/
-static inline unsigned long folio_pfn(struct folio *folio)
+static inline unsigned long folio_pfn(const struct folio *folio)
{
return page_to_pfn(&folio->page);
}
@@ -3028,8 +3026,11 @@ static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
return pte;
}
-pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
- unsigned long addr, spinlock_t **ptlp);
+pte_t *pte_offset_map_ro_nolock(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long addr, spinlock_t **ptlp);
+pte_t *pte_offset_map_rw_nolock(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long addr, pmd_t *pmdvalp,
+ spinlock_t **ptlp);
#define pte_unmap_unlock(pte, ptl) do { \
spin_unlock(ptl); \
@@ -3831,9 +3832,6 @@ void *sparse_buffer_alloc(unsigned long size);
struct page * __populate_section_memmap(unsigned long pfn,
unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
struct dev_pagemap *pgmap);
-void pud_init(void *addr);
-void pmd_init(void *addr);
-void kernel_pte_init(void *addr);
pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
@@ -4176,63 +4174,6 @@ static inline int do_mseal(unsigned long start, size_t len_in, unsigned long fla
}
#endif
-#ifdef CONFIG_MEM_ALLOC_PROFILING
-static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
-{
- int i;
- struct alloc_tag *tag;
- unsigned int nr_pages = 1 << new_order;
-
- if (!mem_alloc_profiling_enabled())
- return;
-
- tag = pgalloc_tag_get(&folio->page);
- if (!tag)
- return;
-
- for (i = nr_pages; i < (1 << old_order); i += nr_pages) {
- union codetag_ref *ref = get_page_tag_ref(folio_page(folio, i));
-
- if (ref) {
- /* Set new reference to point to the original tag */
- alloc_tag_ref_set(ref, tag);
- put_page_tag_ref(ref);
- }
- }
-}
-
-static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
-{
- struct alloc_tag *tag;
- union codetag_ref *ref;
-
- tag = pgalloc_tag_get(&old->page);
- if (!tag)
- return;
-
- ref = get_page_tag_ref(&new->page);
- if (!ref)
- return;
-
- /* Clear the old ref to the original allocation tag. */
- clear_page_tag_ref(&old->page);
- /* Decrement the counters of the tag on get_new_folio. */
- alloc_tag_sub(ref, folio_nr_pages(new));
-
- __alloc_tag_ref_set(ref, tag);
-
- put_page_tag_ref(ref);
-}
-#else /* !CONFIG_MEM_ALLOC_PROFILING */
-static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
-{
-}
-
-static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
-{
-}
-#endif /* CONFIG_MEM_ALLOC_PROFILING */
-
int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index f4fe593c1400..1b6a917fffa4 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -155,6 +155,11 @@ static inline int folio_lru_refs(struct folio *folio)
return ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) + workingset;
}
+static inline void folio_clear_lru_refs(struct folio *folio)
+{
+ set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS, 0);
+}
+
static inline int folio_lru_gen(struct folio *folio)
{
unsigned long flags = READ_ONCE(folio->flags);
@@ -222,6 +227,7 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
{
unsigned long seq;
unsigned long flags;
+ unsigned long mask;
int gen = folio_lru_gen(folio);
int type = folio_is_file_lru(folio);
int zone = folio_zonenum(folio);
@@ -257,7 +263,14 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
gen = lru_gen_from_seq(seq);
flags = (gen + 1UL) << LRU_GEN_PGOFF;
/* see the comment on MIN_NR_GENS about PG_active */
- set_mask_bits(&folio->flags, LRU_GEN_MASK | BIT(PG_active), flags);
+ mask = LRU_GEN_MASK;
+ /*
+ * Don't clear PG_workingset here because it can affect PSI accounting
+ * if the activation is due to workingset refault.
+ */
+ if (folio_test_active(folio))
+ mask |= LRU_REFS_MASK | BIT(PG_referenced) | BIT(PG_active);
+ set_mask_bits(&folio->flags, mask, flags);
lru_gen_update_size(lruvec, folio, -1, gen);
/* for folio_rotate_reclaimable() */
@@ -291,6 +304,12 @@ static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio,
return true;
}
+static inline void folio_migrate_refs(struct folio *new, struct folio *old)
+{
+ unsigned long refs = READ_ONCE(old->flags) & LRU_REFS_MASK;
+
+ set_mask_bits(&new->flags, LRU_REFS_MASK, refs);
+}
#else /* !CONFIG_LRU_GEN */
static inline bool lru_gen_enabled(void)
@@ -313,6 +332,10 @@ static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio,
return false;
}
+static inline void folio_migrate_refs(struct folio *new, struct folio *old)
+{
+
+}
#endif /* CONFIG_LRU_GEN */
static __always_inline
@@ -521,7 +544,7 @@ static inline pte_marker copy_pte_marker(
{
pte_marker srcm = pte_marker_get(entry);
/* Always copy error entries. */
- pte_marker dstm = srcm & PTE_MARKER_POISONED;
+ pte_marker dstm = srcm & (PTE_MARKER_POISONED | PTE_MARKER_GUARD);
/* Only copy PTE markers if UFFD register matches. */
if ((srcm & PTE_MARKER_UFFD_WP) && userfaultfd_wp(dst_vma))
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e85beea1206e..7361a8f3ab68 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1535,4 +1535,88 @@ enum {
/* See also internal only FOLL flags in mm/internal.h */
};
+/* mm flags */
+
+/*
+ * The first two bits represent core dump modes for set-user-ID,
+ * the modes are SUID_DUMP_* defined in linux/sched/coredump.h
+ */
+#define MMF_DUMPABLE_BITS 2
+#define MMF_DUMPABLE_MASK ((1 << MMF_DUMPABLE_BITS) - 1)
+/* coredump filter bits */
+#define MMF_DUMP_ANON_PRIVATE 2
+#define MMF_DUMP_ANON_SHARED 3
+#define MMF_DUMP_MAPPED_PRIVATE 4
+#define MMF_DUMP_MAPPED_SHARED 5
+#define MMF_DUMP_ELF_HEADERS 6
+#define MMF_DUMP_HUGETLB_PRIVATE 7
+#define MMF_DUMP_HUGETLB_SHARED 8
+#define MMF_DUMP_DAX_PRIVATE 9
+#define MMF_DUMP_DAX_SHARED 10
+
+#define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS
+#define MMF_DUMP_FILTER_BITS 9
+#define MMF_DUMP_FILTER_MASK \
+ (((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
+#define MMF_DUMP_FILTER_DEFAULT \
+ ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED) |\
+ (1 << MMF_DUMP_HUGETLB_PRIVATE) | MMF_DUMP_MASK_DEFAULT_ELF)
+
+#ifdef CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS
+# define MMF_DUMP_MASK_DEFAULT_ELF (1 << MMF_DUMP_ELF_HEADERS)
+#else
+# define MMF_DUMP_MASK_DEFAULT_ELF 0
+#endif
+ /* leave room for more dump flags */
+#define MMF_VM_MERGEABLE 16 /* KSM may merge identical pages */
+#define MMF_VM_HUGEPAGE 17 /* set when mm is available for khugepaged */
+
+/*
+ * This one-shot flag is dropped due to necessity of changing exe once again
+ * on NFS restore
+ */
+//#define MMF_EXE_FILE_CHANGED 18 /* see prctl_set_mm_exe_file() */
+
+#define MMF_HAS_UPROBES 19 /* has uprobes */
+#define MMF_RECALC_UPROBES 20 /* MMF_HAS_UPROBES can be wrong */
+#define MMF_OOM_SKIP 21 /* mm is of no interest for the OOM killer */
+#define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */
+#define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */
+#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
+#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
+#define MMF_OOM_REAP_QUEUED 25 /* mm was queued for oom_reaper */
+#define MMF_MULTIPROCESS 26 /* mm is shared between processes */
+/*
+ * MMF_HAS_PINNED: Whether this mm has pinned any pages. This can be either
+ * replaced in the future by mm.pinned_vm when it becomes stable, or grow into
+ * a counter on its own. We're aggresive on this bit for now: even if the
+ * pinned pages were unpinned later on, we'll still keep this bit set for the
+ * lifecycle of this mm, just for simplicity.
+ */
+#define MMF_HAS_PINNED 27 /* FOLL_PIN has run, never cleared */
+
+#define MMF_HAS_MDWE 28
+#define MMF_HAS_MDWE_MASK (1 << MMF_HAS_MDWE)
+
+
+#define MMF_HAS_MDWE_NO_INHERIT 29
+
+#define MMF_VM_MERGE_ANY 30
+#define MMF_VM_MERGE_ANY_MASK (1 << MMF_VM_MERGE_ANY)
+
+#define MMF_TOPDOWN 31 /* mm searches top down by default */
+#define MMF_TOPDOWN_MASK (1 << MMF_TOPDOWN)
+
+#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
+ MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
+ MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
+
+static inline unsigned long mmf_init_flags(unsigned long flags)
+{
+ if (flags & (1UL << MMF_HAS_MDWE_NO_INHERIT))
+ flags &= ~((1UL << MMF_HAS_MDWE) |
+ (1UL << MMF_HAS_MDWE_NO_INHERIT));
+ return flags & MMF_INIT_MASK;
+}
+
#endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index d39ebb10caeb..e2dd57ca368b 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -606,6 +606,13 @@ static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
return 0;
}
+static inline int mmu_notifier_clear_young(struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end)
+{
+ return 0;
+}
+
static inline int mmu_notifier_test_young(struct mm_struct *mm,
unsigned long address)
{
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 80bc5640bb60..b36124145a16 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -220,6 +220,9 @@ enum node_stat_item {
PGDEMOTE_KSWAPD,
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
+#ifdef CONFIG_HUGETLB_PAGE
+ NR_HUGETLB,
+#endif
NR_VM_NODE_STAT_ITEMS
};
@@ -403,6 +406,8 @@ enum {
NR_LRU_GEN_CAPS
};
+#define LRU_REFS_FLAGS (BIT(PG_referenced) | BIT(PG_workingset))
+
#define MIN_LRU_BATCH BITS_PER_LONG
#define MAX_LRU_BATCH (MIN_LRU_BATCH * 64)
diff --git a/include/linux/module.h b/include/linux/module.h
index 88ecc5e9f523..2a9386cbdf85 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -367,6 +367,8 @@ enum mod_mem_type {
struct module_memory {
void *base;
+ void *rw_copy;
+ bool is_rox;
unsigned int size;
#ifdef CONFIG_MODULES_TREE_LOOKUP
@@ -767,6 +769,15 @@ static inline bool is_livepatch_module(struct module *mod)
void set_module_sig_enforced(void);
+void *__module_writable_address(struct module *mod, void *loc);
+
+static inline void *module_writable_address(struct module *mod, void *loc)
+{
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_EXECMEM_ROX) || !mod)
+ return loc;
+ return __module_writable_address(mod, loc);
+}
+
#else /* !CONFIG_MODULES... */
static inline struct module *__module_address(unsigned long addr)
@@ -874,6 +885,11 @@ static inline bool module_is_coming(struct module *mod)
{
return false;
}
+
+static inline void *module_writable_address(struct module *mod, void *loc)
+{
+ return loc;
+}
#endif /* CONFIG_MODULES */
#ifdef CONFIG_SYSFS
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index e395461d59e5..1f5507ba5a12 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -108,6 +108,10 @@ int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *mod);
+int module_post_finalize(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ struct module *mod);
+
#ifdef CONFIG_MODULES
void flush_module_init_free_work(void);
#else
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7d0c9c48a0c5..1e0fc6931ce9 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -7,7 +7,6 @@
#include <linux/types.h>
#include <linux/nodemask.h>
#include <uapi/linux/oom.h>
-#include <linux/sched/coredump.h> /* MMF_* */
#include <linux/mm.h> /* VM_FAULT* */
struct zonelist;
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index 7d79818dc065..4f5c9e979bb9 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -111,5 +111,12 @@
ZONES_WIDTH - LRU_GEN_WIDTH - SECTIONS_WIDTH - \
NODES_WIDTH - KASAN_TAG_WIDTH - LAST_CPUPID_WIDTH)
+#define NR_NON_PAGEFLAG_BITS (SECTIONS_WIDTH + NODES_WIDTH + ZONES_WIDTH + \
+ LAST_CPUPID_SHIFT + KASAN_TAG_WIDTH + \
+ LRU_GEN_WIDTH + LRU_REFS_WIDTH)
+
+#define NR_UNUSED_PAGEFLAG_BITS (BITS_PER_LONG - \
+ (NR_NON_PAGEFLAG_BITS + NR_PAGEFLAGS))
+
#endif
#endif /* _LINUX_PAGE_FLAGS_LAYOUT */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 908ee0aad554..2220bfec278e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -689,6 +689,13 @@ static __always_inline bool folio_test_anon(const struct folio *folio)
return ((unsigned long)folio->mapping & PAGE_MAPPING_ANON) != 0;
}
+static __always_inline bool PageAnonNotKsm(const struct page *page)
+{
+ unsigned long flags = (unsigned long)page_folio(page)->mapping;
+
+ return (flags & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_ANON;
+}
+
static __always_inline bool PageAnon(const struct page *page)
{
return folio_test_anon(page_folio(page));
@@ -718,13 +725,8 @@ static __always_inline bool folio_test_ksm(const struct folio *folio)
return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) ==
PAGE_MAPPING_KSM;
}
-
-static __always_inline bool PageKsm(const struct page *page)
-{
- return folio_test_ksm(page_folio(page));
-}
#else
-TESTPAGEFLAG_FALSE(Ksm, ksm)
+FOLIO_TEST_FLAG_FALSE(ksm)
#endif
u64 stable_page_flags(const struct page *page);
@@ -1137,14 +1139,14 @@ static __always_inline int PageAnonExclusive(const struct page *page)
static __always_inline void SetPageAnonExclusive(struct page *page)
{
- VM_BUG_ON_PGFLAGS(!PageAnon(page) || PageKsm(page), page);
+ VM_BUG_ON_PGFLAGS(!PageAnonNotKsm(page), page);
VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
set_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags);
}
static __always_inline void ClearPageAnonExclusive(struct page *page)
{
- VM_BUG_ON_PGFLAGS(!PageAnon(page) || PageKsm(page), page);
+ VM_BUG_ON_PGFLAGS(!PageAnonNotKsm(page), page);
VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags);
}
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c16db0067090..73dc2c1841ec 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,10 +3,6 @@
#define __LINUX_PAGEISOLATION_H
#ifdef CONFIG_MEMORY_ISOLATION
-static inline bool has_isolate_pageblock(struct zone *zone)
-{
- return zone->nr_isolate_pageblock;
-}
static inline bool is_migrate_isolate_page(struct page *page)
{
return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
@@ -16,10 +12,6 @@ static inline bool is_migrate_isolate(int migratetype)
return migratetype == MIGRATE_ISOLATE;
}
#else
-static inline bool has_isolate_pageblock(struct zone *zone)
-{
- return false;
-}
static inline bool is_migrate_isolate_page(struct page *page)
{
return false;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 68a5f1ff3301..bcf0865a38ae 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1011,22 +1011,25 @@ static inline struct folio *read_mapping_folio(struct address_space *mapping,
return read_cache_folio(mapping, index, NULL, file);
}
-/*
- * Get the offset in PAGE_SIZE (even for hugetlb pages).
+/**
+ * page_pgoff - Calculate the logical page offset of this page.
+ * @folio: The folio containing this page.
+ * @page: The page which we need the offset of.
+ *
+ * For file pages, this is the offset from the beginning of the file
+ * in units of PAGE_SIZE. For anonymous pages, this is the offset from
+ * the beginning of the anon_vma in units of PAGE_SIZE. This will
+ * return nonsense for KSM pages.
+ *
+ * Context: Caller must have a reference on the folio or otherwise
+ * prevent it from being split or freed.
+ *
+ * Return: The offset in units of PAGE_SIZE.
*/
-static inline pgoff_t page_to_pgoff(struct page *page)
+static inline pgoff_t page_pgoff(const struct folio *folio,
+ const struct page *page)
{
- struct page *head;
-
- if (likely(!PageTransTail(page)))
- return page->index;
-
- head = compound_head(page);
- /*
- * We don't initialize ->index for tail pages: calculate based on
- * head page
- */
- return head->index + page - head;
+ return folio->index + folio_page_idx(folio, page);
}
/*
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index f5eb5a32aeed..9700a29f8afb 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -25,12 +25,15 @@ enum page_walk_lock {
* this handler is required to be able to handle
* pmd_trans_huge() pmds. They may simply choose to
* split_huge_page() instead of handling it explicitly.
- * @pte_entry: if set, called for each PTE (lowest-level) entry,
- * including empty ones
+ * @pte_entry: if set, called for each PTE (lowest-level) entry
+ * including empty ones, except if @install_pte is set.
+ * If @install_pte is set, @pte_entry is called only for
+ * existing PTEs.
* @pte_hole: if set, called for each hole at all levels,
* depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD.
* Any folded depths (where PTRS_PER_P?D is equal to 1)
- * are skipped.
+ * are skipped. If @install_pte is specified, this will
+ * not trigger for any populated ranges.
* @hugetlb_entry: if set, called for each hugetlb entry. This hook
* function is called with the vma lock held, in order to
* protect against a concurrent freeing of the pte_t* or
@@ -51,6 +54,13 @@ enum page_walk_lock {
* @pre_vma: if set, called before starting walk on a non-null vma.
* @post_vma: if set, called after a walk on a non-null vma, provided
* that @pre_vma and the vma walk succeeded.
+ * @install_pte: if set, missing page table entries are installed and
+ * thus all levels are always walked in the specified
+ * range. This callback is then invoked at the PTE level
+ * (having split any THP pages prior), providing the PTE to
+ * install. If allocations fail, the walk is aborted. This
+ * operation is only available for userland memory. Not
+ * usable for hugetlb ranges.
*
* p?d_entry callbacks are called even if those levels are folded on a
* particular architecture/configuration.
@@ -76,6 +86,8 @@ struct mm_walk_ops {
int (*pre_vma)(unsigned long start, unsigned long end,
struct mm_walk *walk);
void (*post_vma)(struct mm_walk *walk);
+ int (*install_pte)(unsigned long addr, unsigned long next,
+ pte_t *ptep, struct mm_walk *walk);
enum page_walk_lock walk_lock;
};
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 59a3deb792a8..0e43ab653ab6 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -12,45 +12,166 @@
#include <linux/page_ext.h>
extern struct page_ext_operations page_alloc_tagging_ops;
+extern unsigned long alloc_tag_ref_mask;
+extern int alloc_tag_ref_offs;
+extern struct alloc_tag_kernel_section kernel_tags;
-static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext *page_ext)
+DECLARE_STATIC_KEY_FALSE(mem_profiling_compressed);
+
+typedef u16 pgalloc_tag_idx;
+
+union pgtag_ref_handle {
+ union codetag_ref *ref; /* reference in page extension */
+ struct page *page; /* reference in page flags */
+};
+
+/* Reserved indexes */
+#define CODETAG_ID_NULL 0
+#define CODETAG_ID_EMPTY 1
+#define CODETAG_ID_FIRST 2
+
+#ifdef CONFIG_MODULES
+
+extern struct alloc_tag_module_section module_tags;
+
+static inline struct alloc_tag *module_idx_to_tag(pgalloc_tag_idx idx)
{
- return (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
+ return &module_tags.first_tag[idx - kernel_tags.count];
}
-static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref *ref)
+static inline pgalloc_tag_idx module_tag_to_idx(struct alloc_tag *tag)
{
- return (void *)ref - page_alloc_tagging_ops.offset;
+ return CODETAG_ID_FIRST + kernel_tags.count + (tag - module_tags.first_tag);
}
-/* Should be called only if mem_alloc_profiling_enabled() */
-static inline union codetag_ref *get_page_tag_ref(struct page *page)
+#else /* CONFIG_MODULES */
+
+static inline struct alloc_tag *module_idx_to_tag(pgalloc_tag_idx idx)
{
- if (page) {
- struct page_ext *page_ext = page_ext_get(page);
+ pr_warn("invalid page tag reference %lu\n", (unsigned long)idx);
+ return NULL;
+}
- if (page_ext)
- return codetag_ref_from_page_ext(page_ext);
+static inline pgalloc_tag_idx module_tag_to_idx(struct alloc_tag *tag)
+{
+ pr_warn("invalid page tag 0x%lx\n", (unsigned long)tag);
+ return CODETAG_ID_NULL;
+}
+
+#endif /* CONFIG_MODULES */
+
+static inline void idx_to_ref(pgalloc_tag_idx idx, union codetag_ref *ref)
+{
+ switch (idx) {
+ case (CODETAG_ID_NULL):
+ ref->ct = NULL;
+ break;
+ case (CODETAG_ID_EMPTY):
+ set_codetag_empty(ref);
+ break;
+ default:
+ idx -= CODETAG_ID_FIRST;
+ ref->ct = idx < kernel_tags.count ?
+ &kernel_tags.first_tag[idx].ct :
+ &module_idx_to_tag(idx)->ct;
+ break;
}
- return NULL;
}
-static inline void put_page_tag_ref(union codetag_ref *ref)
+static inline pgalloc_tag_idx ref_to_idx(union codetag_ref *ref)
+{
+ struct alloc_tag *tag;
+
+ if (!ref->ct)
+ return CODETAG_ID_NULL;
+
+ if (is_codetag_empty(ref))
+ return CODETAG_ID_EMPTY;
+
+ tag = ct_to_alloc_tag(ref->ct);
+ if (tag >= kernel_tags.first_tag && tag < kernel_tags.first_tag + kernel_tags.count)
+ return CODETAG_ID_FIRST + (tag - kernel_tags.first_tag);
+
+ return module_tag_to_idx(tag);
+}
+
+
+
+/* Should be called only if mem_alloc_profiling_enabled() */
+static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
+ union pgtag_ref_handle *handle)
{
- if (WARN_ON(!ref))
+ if (!page)
+ return false;
+
+ if (static_key_enabled(&mem_profiling_compressed)) {
+ pgalloc_tag_idx idx;
+
+ idx = (page->flags >> alloc_tag_ref_offs) & alloc_tag_ref_mask;
+ idx_to_ref(idx, ref);
+ handle->page = page;
+ } else {
+ struct page_ext *page_ext;
+ union codetag_ref *tmp;
+
+ page_ext = page_ext_get(page);
+ if (!page_ext)
+ return false;
+
+ tmp = (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
+ ref->ct = tmp->ct;
+ handle->ref = tmp;
+ }
+
+ return true;
+}
+
+static inline void put_page_tag_ref(union pgtag_ref_handle handle)
+{
+ if (WARN_ON(!handle.ref))
return;
- page_ext_put(page_ext_from_codetag_ref(ref));
+ if (!static_key_enabled(&mem_profiling_compressed))
+ page_ext_put((void *)handle.ref - page_alloc_tagging_ops.offset);
+}
+
+static inline void update_page_tag_ref(union pgtag_ref_handle handle, union codetag_ref *ref)
+{
+ if (static_key_enabled(&mem_profiling_compressed)) {
+ struct page *page = handle.page;
+ unsigned long old_flags;
+ unsigned long flags;
+ unsigned long idx;
+
+ if (WARN_ON(!page || !ref))
+ return;
+
+ idx = (unsigned long)ref_to_idx(ref);
+ idx = (idx & alloc_tag_ref_mask) << alloc_tag_ref_offs;
+ do {
+ old_flags = READ_ONCE(page->flags);
+ flags = old_flags;
+ flags &= ~(alloc_tag_ref_mask << alloc_tag_ref_offs);
+ flags |= idx;
+ } while (unlikely(!try_cmpxchg(&page->flags, &old_flags, flags)));
+ } else {
+ if (WARN_ON(!handle.ref || !ref))
+ return;
+
+ handle.ref->ct = ref->ct;
+ }
}
static inline void clear_page_tag_ref(struct page *page)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- set_codetag_empty(ref);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ set_codetag_empty(&ref);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -59,11 +180,13 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- alloc_tag_add(ref, task->alloc_tag, PAGE_SIZE * nr);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -71,11 +194,13 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- alloc_tag_sub(ref, PAGE_SIZE * nr);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_sub(&ref, PAGE_SIZE * nr);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -85,13 +210,14 @@ static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
struct alloc_tag *tag = NULL;
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- alloc_tag_sub_check(ref);
- if (ref) {
- if (ref->ct)
- tag = ct_to_alloc_tag(ref->ct);
- put_page_tag_ref(ref);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
+
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_sub_check(&ref);
+ if (ref.ct)
+ tag = ct_to_alloc_tag(ref.ct);
+ put_page_tag_ref(handle);
}
}
@@ -104,16 +230,22 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
}
+void pgalloc_tag_split(struct folio *folio, int old_order, int new_order);
+void pgalloc_tag_copy(struct folio *new, struct folio *old);
+
+void __init alloc_tag_sec_init(void);
+
#else /* CONFIG_MEM_ALLOC_PROFILING */
-static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
-static inline void put_page_tag_ref(union codetag_ref *ref) {}
static inline void clear_page_tag_ref(struct page *page) {}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr) {}
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
+static inline void alloc_tag_sec_init(void) {}
+static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order) {}
+static inline void pgalloc_tag_copy(struct folio *new, struct folio *old) {}
#endif /* CONFIG_MEM_ALLOC_PROFILING */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e8b2ac6bd2ae..adef9d6e9b1b 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -90,6 +90,27 @@ static inline unsigned long pud_index(unsigned long address)
#define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
#endif
+#ifndef kernel_pte_init
+static inline void kernel_pte_init(void *addr)
+{
+}
+#define kernel_pte_init kernel_pte_init
+#endif
+
+#ifndef pmd_init
+static inline void pmd_init(void *addr)
+{
+}
+#define pmd_init pmd_init
+#endif
+
+#ifndef pud_init
+static inline void pud_init(void *addr)
+{
+}
+#define pud_init pud_init
+#endif
+
#ifndef pte_offset_kernel
static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
{
@@ -1056,44 +1077,6 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
}
#endif
-/*
- * Use set_p*_safe(), and elide TLB flushing, when confident that *no*
- * TLB flush will be required as a result of the "set". For example, use
- * in scenarios where it is known ahead of time that the routine is
- * setting non-present entries, or re-setting an existing entry to the
- * same value. Otherwise, use the typical "set" helpers and flush the
- * TLB.
- */
-#define set_pte_safe(ptep, pte) \
-({ \
- WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
- set_pte(ptep, pte); \
-})
-
-#define set_pmd_safe(pmdp, pmd) \
-({ \
- WARN_ON_ONCE(pmd_present(*pmdp) && !pmd_same(*pmdp, pmd)); \
- set_pmd(pmdp, pmd); \
-})
-
-#define set_pud_safe(pudp, pud) \
-({ \
- WARN_ON_ONCE(pud_present(*pudp) && !pud_same(*pudp, pud)); \
- set_pud(pudp, pud); \
-})
-
-#define set_p4d_safe(p4dp, p4d) \
-({ \
- WARN_ON_ONCE(p4d_present(*p4dp) && !p4d_same(*p4dp, p4d)); \
- set_p4d(p4dp, p4d); \
-})
-
-#define set_pgd_safe(pgdp, pgd) \
-({ \
- WARN_ON_ONCE(pgd_present(*pgdp) && !pgd_same(*pgdp, pgd)); \
- set_pgd(pgdp, pgd); \
-})
-
#ifndef __HAVE_ARCH_DO_SWAP_PAGE
static inline void arch_do_swap_page_nr(struct mm_struct *mm,
struct vm_area_struct *vma,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index d5e93e44322e..683a04088f3f 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -171,7 +171,7 @@ static inline void anon_vma_merge(struct vm_area_struct *vma,
unlink_anon_vmas(next);
}
-struct anon_vma *folio_get_anon_vma(struct folio *folio);
+struct anon_vma *folio_get_anon_vma(const struct folio *folio);
/* RMAP flags, currently only relevant for some anon rmap operations. */
typedef int __bitwise rmap_t;
@@ -194,8 +194,8 @@ enum rmap_level {
RMAP_LEVEL_PMD,
};
-static inline void __folio_rmap_sanity_checks(struct folio *folio,
- struct page *page, int nr_pages, enum rmap_level level)
+static inline void __folio_rmap_sanity_checks(const struct folio *folio,
+ const struct page *page, int nr_pages, enum rmap_level level)
{
/* hugetlb folios are handled separately. */
VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
@@ -728,11 +728,8 @@ page_vma_mapped_walk_restart(struct page_vma_mapped_walk *pvmw)
}
bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw);
-
-/*
- * Used by swapoff to help locate where page is expected in vma.
- */
-unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
+unsigned long page_address_in_vma(const struct folio *folio,
+ const struct page *, const struct vm_area_struct *);
/*
* Cleans the PTEs of shared mappings.
@@ -774,14 +771,14 @@ struct rmap_walk_control {
bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
unsigned long addr, void *arg);
int (*done)(struct folio *folio);
- struct anon_vma *(*anon_lock)(struct folio *folio,
+ struct anon_vma *(*anon_lock)(const struct folio *folio,
struct rmap_walk_control *rwc);
bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};
void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
-struct anon_vma *folio_lock_anon_vma_read(struct folio *folio,
+struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
struct rmap_walk_control *rwc);
#else /* !CONFIG_MMU */
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index e62ff805cfc9..6eb65ceed213 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -8,12 +8,6 @@
#define SUID_DUMP_USER 1 /* Dump as user of process */
#define SUID_DUMP_ROOT 2 /* Dump as root */
-/* mm flags */
-
-/* for SUID_DUMP_* above */
-#define MMF_DUMPABLE_BITS 2
-#define MMF_DUMPABLE_MASK ((1 << MMF_DUMPABLE_BITS) - 1)
-
extern void set_dumpable(struct mm_struct *mm, int value);
/*
* This returns the actual value of the suid_dumpable flag. For things
@@ -31,80 +25,4 @@ static inline int get_dumpable(struct mm_struct *mm)
return __get_dumpable(mm->flags);
}
-/* coredump filter bits */
-#define MMF_DUMP_ANON_PRIVATE 2
-#define MMF_DUMP_ANON_SHARED 3
-#define MMF_DUMP_MAPPED_PRIVATE 4
-#define MMF_DUMP_MAPPED_SHARED 5
-#define MMF_DUMP_ELF_HEADERS 6
-#define MMF_DUMP_HUGETLB_PRIVATE 7
-#define MMF_DUMP_HUGETLB_SHARED 8
-#define MMF_DUMP_DAX_PRIVATE 9
-#define MMF_DUMP_DAX_SHARED 10
-
-#define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS
-#define MMF_DUMP_FILTER_BITS 9
-#define MMF_DUMP_FILTER_MASK \
- (((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
-#define MMF_DUMP_FILTER_DEFAULT \
- ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED) |\
- (1 << MMF_DUMP_HUGETLB_PRIVATE) | MMF_DUMP_MASK_DEFAULT_ELF)
-
-#ifdef CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS
-# define MMF_DUMP_MASK_DEFAULT_ELF (1 << MMF_DUMP_ELF_HEADERS)
-#else
-# define MMF_DUMP_MASK_DEFAULT_ELF 0
-#endif
- /* leave room for more dump flags */
-#define MMF_VM_MERGEABLE 16 /* KSM may merge identical pages */
-#define MMF_VM_HUGEPAGE 17 /* set when mm is available for
- khugepaged */
-/*
- * This one-shot flag is dropped due to necessity of changing exe once again
- * on NFS restore
- */
-//#define MMF_EXE_FILE_CHANGED 18 /* see prctl_set_mm_exe_file() */
-
-#define MMF_HAS_UPROBES 19 /* has uprobes */
-#define MMF_RECALC_UPROBES 20 /* MMF_HAS_UPROBES can be wrong */
-#define MMF_OOM_SKIP 21 /* mm is of no interest for the OOM killer */
-#define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */
-#define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */
-#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
-#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
-#define MMF_OOM_REAP_QUEUED 25 /* mm was queued for oom_reaper */
-#define MMF_MULTIPROCESS 26 /* mm is shared between processes */
-/*
- * MMF_HAS_PINNED: Whether this mm has pinned any pages. This can be either
- * replaced in the future by mm.pinned_vm when it becomes stable, or grow into
- * a counter on its own. We're aggresive on this bit for now: even if the
- * pinned pages were unpinned later on, we'll still keep this bit set for the
- * lifecycle of this mm, just for simplicity.
- */
-#define MMF_HAS_PINNED 27 /* FOLL_PIN has run, never cleared */
-
-#define MMF_HAS_MDWE 28
-#define MMF_HAS_MDWE_MASK (1 << MMF_HAS_MDWE)
-
-
-#define MMF_HAS_MDWE_NO_INHERIT 29
-
-#define MMF_VM_MERGE_ANY 30
-#define MMF_VM_MERGE_ANY_MASK (1 << MMF_VM_MERGE_ANY)
-
-#define MMF_TOPDOWN 31 /* mm searches top down by default */
-#define MMF_TOPDOWN_MASK (1 << MMF_TOPDOWN)
-
-#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
- MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
- MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
-
-static inline unsigned long mmf_init_flags(unsigned long flags)
-{
- if (flags & (1UL << MMF_HAS_MDWE_NO_INHERIT))
- flags &= ~((1UL << MMF_HAS_MDWE) |
- (1UL << MMF_HAS_MDWE_NO_INHERIT));
- return flags & MMF_INIT_MASK;
-}
-
#endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index e7aec20fb44f..3030d9245f5a 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -34,6 +34,12 @@ static inline int set_direct_map_default_noflush(struct page *page)
return 0;
}
+static inline int set_direct_map_valid_noflush(struct page *page,
+ unsigned nr, bool valid)
+{
+ return 0;
+}
+
static inline bool kernel_page_present(struct page *page)
{
return true;
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 018da28c01e7..0b273a7b9f01 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -114,6 +114,7 @@ int shmem_unuse(unsigned int type);
unsigned long shmem_allowable_huge_orders(struct inode *inode,
struct vm_area_struct *vma, pgoff_t index,
loff_t write_end, bool shmem_huge_force);
+bool shmem_hpage_pmd_enabled(void);
#else
static inline unsigned long shmem_allowable_huge_orders(struct inode *inode,
struct vm_area_struct *vma, pgoff_t index,
@@ -121,6 +122,11 @@ static inline unsigned long shmem_allowable_huge_orders(struct inode *inode,
{
return 0;
}
+
+static inline bool shmem_hpage_pmd_enabled(void)
+{
+ return false;
+}
#endif
#ifdef CONFIG_SHMEM
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index cb468e418ea1..96f26e29fefe 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -426,9 +426,19 @@ typedef unsigned long pte_marker;
* "Poisoned" here is meant in the very general sense of "future accesses are
* invalid", instead of referring very specifically to hardware memory errors.
* This marker is meant to represent any of various different causes of this.
+ *
+ * Note that, when encountered by the faulting logic, PTEs with this marker will
+ * result in VM_FAULT_HWPOISON and thus regardless trigger hardware memory error
+ * logic.
*/
#define PTE_MARKER_POISONED BIT(1)
-#define PTE_MARKER_MASK (BIT(2) - 1)
+/*
+ * Indicates that, on fault, this PTE will case a SIGSEGV signal to be
+ * sent. This means guard markers behave in effect as if the region were mapped
+ * PROT_NONE, rather than if they were a memory hole or equivalent.
+ */
+#define PTE_MARKER_GUARD BIT(2)
+#define PTE_MARKER_MASK (BIT(3) - 1)
static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
{
@@ -464,6 +474,18 @@ static inline int is_poisoned_swp_entry(swp_entry_t entry)
{
return is_pte_marker_entry(entry) &&
(pte_marker_get(entry) & PTE_MARKER_POISONED);
+
+}
+
+static inline swp_entry_t make_guard_swp_entry(void)
+{
+ return make_pte_marker_entry(PTE_MARKER_GUARD);
+}
+
+static inline int is_guard_swp_entry(swp_entry_t entry)
+{
+ return is_pte_marker_entry(entry) &&
+ (pte_marker_get(entry) & PTE_MARKER_GUARD);
}
/*
diff --git a/include/linux/text-patching.h b/include/linux/text-patching.h
new file mode 100644
index 000000000000..ad5877ab0855
--- /dev/null
+++ b/include/linux/text-patching.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TEXT_PATCHING_H
+#define _LINUX_TEXT_PATCHING_H
+
+#include <asm/text-patching.h>
+
+#ifndef text_poke_copy
+static inline void *text_poke_copy(void *dst, const void *src, size_t len)
+{
+ return memcpy(dst, src, len);
+}
+#define text_poke_copy text_poke_copy
+#endif
+
+#endif /* _LINUX_TEXT_PATCHING_H */
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index ad2ce7a6ab7a..31e9ffd936e3 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -134,12 +134,6 @@ extern void vm_unmap_ram(const void *mem, unsigned int count);
extern void *vm_map_ram(struct page **pages, unsigned int count, int node);
extern void vm_unmap_aliases(void);
-#ifdef CONFIG_MMU
-extern unsigned long vmalloc_nr_pages(void);
-#else
-static inline unsigned long vmalloc_nr_pages(void) { return 0; }
-#endif
-
extern void *vmalloc_noprof(unsigned long size) __alloc_size(1);
#define vmalloc(...) alloc_hooks(vmalloc_noprof(__VA_ARGS__))
@@ -208,6 +202,9 @@ extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
+int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
+ struct page **pages, unsigned int page_shift);
+
/*
* Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
* and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
@@ -266,12 +263,29 @@ static inline bool is_vm_area_hugepages(const void *addr)
#endif
}
+/* for /proc/kcore */
+long vread_iter(struct iov_iter *iter, const char *addr, size_t count);
+
+/*
+ * Internals. Don't use..
+ */
+__init void vm_area_add_early(struct vm_struct *vm);
+__init void vm_area_register_early(struct vm_struct *vm, size_t align);
+
+int register_vmap_purge_notifier(struct notifier_block *nb);
+int unregister_vmap_purge_notifier(struct notifier_block *nb);
+
#ifdef CONFIG_MMU
+#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
+
+unsigned long vmalloc_nr_pages(void);
+
int vm_area_map_pages(struct vm_struct *area, unsigned long start,
unsigned long end, struct page **pages);
void vm_area_unmap_pages(struct vm_struct *area, unsigned long start,
unsigned long end);
void vunmap_range(unsigned long addr, unsigned long end);
+
static inline void set_vm_flush_reset_perms(void *addr)
{
struct vm_struct *vm = find_vm_area(addr);
@@ -279,24 +293,14 @@ static inline void set_vm_flush_reset_perms(void *addr)
if (vm)
vm->flags |= VM_FLUSH_RESET_PERMS;
}
+#else /* !CONFIG_MMU */
+#define VMALLOC_TOTAL 0UL
-#else
-static inline void set_vm_flush_reset_perms(void *addr)
-{
-}
-#endif
-
-/* for /proc/kcore */
-extern long vread_iter(struct iov_iter *iter, const char *addr, size_t count);
-
-/*
- * Internals. Don't use..
- */
-extern __init void vm_area_add_early(struct vm_struct *vm);
-extern __init void vm_area_register_early(struct vm_struct *vm, size_t align);
+static inline unsigned long vmalloc_nr_pages(void) { return 0; }
+static inline void set_vm_flush_reset_perms(void *addr) {}
+#endif /* CONFIG_MMU */
-#ifdef CONFIG_SMP
-# ifdef CONFIG_MMU
+#if defined(CONFIG_MMU) && defined(CONFIG_SMP)
struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
const size_t *sizes, int nr_vms,
size_t align);
@@ -311,22 +315,9 @@ pcpu_get_vm_areas(const unsigned long *offsets,
return NULL;
}
-static inline void
-pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
-{
-}
-# endif
-#endif
-
-#ifdef CONFIG_MMU
-#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
-#else
-#define VMALLOC_TOTAL 0UL
+static inline void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms) {}
#endif
-int register_vmap_purge_notifier(struct notifier_block *nb);
-int unregister_vmap_purge_notifier(struct notifier_block *nb);
-
#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
bool vmalloc_dump_obj(void *object);
#else
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 9cd1beef0654..d961ead91bf1 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -7,7 +7,7 @@
struct lruvec;
-extern atomic_t zswap_stored_pages;
+extern atomic_long_t zswap_stored_pages;
#ifdef CONFIG_ZSWAP
diff --git a/include/trace/events/memcg.h b/include/trace/events/memcg.h
new file mode 100644
index 000000000000..dfe2f51019b4
--- /dev/null
+++ b/include/trace/events/memcg.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM memcg
+
+#if !defined(_TRACE_MEMCG_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MEMCG_H
+
+#include <linux/memcontrol.h>
+#include <linux/tracepoint.h>
+
+
+DECLARE_EVENT_CLASS(memcg_rstat_stats,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+ TP_ARGS(memcg, item, val),
+
+ TP_STRUCT__entry(
+ __field(u64, id)
+ __field(int, item)
+ __field(int, val)
+ ),
+
+ TP_fast_assign(
+ __entry->id = cgroup_id(memcg->css.cgroup);
+ __entry->item = item;
+ __entry->val = val;
+ ),
+
+ TP_printk("memcg_id=%llu item=%d val=%d",
+ __entry->id, __entry->item, __entry->val)
+);
+
+DEFINE_EVENT(memcg_rstat_stats, mod_memcg_state,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+ TP_ARGS(memcg, item, val)
+);
+
+DEFINE_EVENT(memcg_rstat_stats, mod_memcg_lruvec_state,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+ TP_ARGS(memcg, item, val)
+);
+
+DECLARE_EVENT_CLASS(memcg_rstat_events,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, unsigned long val),
+
+ TP_ARGS(memcg, item, val),
+
+ TP_STRUCT__entry(
+ __field(u64, id)
+ __field(int, item)
+ __field(unsigned long, val)
+ ),
+
+ TP_fast_assign(
+ __entry->id = cgroup_id(memcg->css.cgroup);
+ __entry->item = item;
+ __entry->val = val;
+ ),
+
+ TP_printk("memcg_id=%llu item=%d val=%lu",
+ __entry->id, __entry->item, __entry->val)
+);
+
+DEFINE_EVENT(memcg_rstat_events, count_memcg_events,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, unsigned long val),
+
+ TP_ARGS(memcg, item, val)
+);
+
+TRACE_EVENT(memcg_flush_stats,
+
+ TP_PROTO(struct mem_cgroup *memcg, s64 stats_updates,
+ bool force, bool needs_flush),
+
+ TP_ARGS(memcg, stats_updates, force, needs_flush),
+
+ TP_STRUCT__entry(
+ __field(u64, id)
+ __field(s64, stats_updates)
+ __field(bool, force)
+ __field(bool, needs_flush)
+ ),
+
+ TP_fast_assign(
+ __entry->id = cgroup_id(memcg->css.cgroup);
+ __entry->stats_updates = stats_updates;
+ __entry->force = force;
+ __entry->needs_flush = needs_flush;
+ ),
+
+ TP_printk("memcg_id=%llu stats_updates=%lld force=%d needs_flush=%d",
+ __entry->id, __entry->stats_updates,
+ __entry->force, __entry->needs_flush)
+);
+
+#endif /* _TRACE_MEMCG_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/events/mmap_lock.h b/include/trace/events/mmap_lock.h
index f2827f98a44f..bc2e3ad787b3 100644
--- a/include/trace/events/mmap_lock.h
+++ b/include/trace/events/mmap_lock.h
@@ -10,9 +10,6 @@
struct mm_struct;
-extern int trace_mmap_lock_reg(void);
-extern void trace_mmap_lock_unreg(void);
-
DECLARE_EVENT_CLASS(mmap_lock,
TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write),
@@ -40,16 +37,15 @@ DECLARE_EVENT_CLASS(mmap_lock,
);
#define DEFINE_MMAP_LOCK_EVENT(name) \
- DEFINE_EVENT_FN(mmap_lock, name, \
+ DEFINE_EVENT(mmap_lock, name, \
TP_PROTO(struct mm_struct *mm, const char *memcg_path, \
bool write), \
- TP_ARGS(mm, memcg_path, write), \
- trace_mmap_lock_reg, trace_mmap_lock_unreg)
+ TP_ARGS(mm, memcg_path, write))
DEFINE_MMAP_LOCK_EVENT(mmap_lock_start_locking);
DEFINE_MMAP_LOCK_EVENT(mmap_lock_released);
-TRACE_EVENT_FN(mmap_lock_acquire_returned,
+TRACE_EVENT(mmap_lock_acquire_returned,
TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write,
bool success),
@@ -76,9 +72,7 @@ TRACE_EVENT_FN(mmap_lock_acquire_returned,
__get_str(memcg_path),
__entry->write ? "true" : "false",
__entry->success ? "true" : "false"
- ),
-
- trace_mmap_lock_reg, trace_mmap_lock_unreg
+ )
);
#endif /* _TRACE_MMAP_LOCK_H */
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1a488c30afa5..490958fa10de 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -346,6 +346,51 @@ TRACE_EVENT(mm_vmscan_write_folio,
show_reclaim_flags(__entry->reclaim_flags))
);
+TRACE_EVENT(mm_vmscan_reclaim_pages,
+
+ TP_PROTO(int nid,
+ unsigned long nr_scanned, unsigned long nr_reclaimed,
+ struct reclaim_stat *stat),
+
+ TP_ARGS(nid, nr_scanned, nr_reclaimed, stat),
+
+ TP_STRUCT__entry(
+ __field(int, nid)
+ __field(unsigned long, nr_scanned)
+ __field(unsigned long, nr_reclaimed)
+ __field(unsigned long, nr_dirty)
+ __field(unsigned long, nr_writeback)
+ __field(unsigned long, nr_congested)
+ __field(unsigned long, nr_immediate)
+ __field(unsigned int, nr_activate0)
+ __field(unsigned int, nr_activate1)
+ __field(unsigned long, nr_ref_keep)
+ __field(unsigned long, nr_unmap_fail)
+ ),
+
+ TP_fast_assign(
+ __entry->nid = nid;
+ __entry->nr_scanned = nr_scanned;
+ __entry->nr_reclaimed = nr_reclaimed;
+ __entry->nr_dirty = stat->nr_dirty;
+ __entry->nr_writeback = stat->nr_writeback;
+ __entry->nr_congested = stat->nr_congested;
+ __entry->nr_immediate = stat->nr_immediate;
+ __entry->nr_activate0 = stat->nr_activate[0];
+ __entry->nr_activate1 = stat->nr_activate[1];
+ __entry->nr_ref_keep = stat->nr_ref_keep;
+ __entry->nr_unmap_fail = stat->nr_unmap_fail;
+ ),
+
+ TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld nr_dirty=%ld nr_writeback=%ld nr_congested=%ld nr_immediate=%ld nr_activate_anon=%d nr_activate_file=%d nr_ref_keep=%ld nr_unmap_fail=%ld",
+ __entry->nid,
+ __entry->nr_scanned, __entry->nr_reclaimed,
+ __entry->nr_dirty, __entry->nr_writeback,
+ __entry->nr_congested, __entry->nr_immediate,
+ __entry->nr_activate0, __entry->nr_activate1,
+ __entry->nr_ref_keep, __entry->nr_unmap_fail)
+);
+
TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
TP_PROTO(int nid,
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6ce1f1ceb432..1ea2c4c33b86 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -79,6 +79,9 @@
#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */
+#define MADV_GUARD_INSTALL 102 /* fatal signal on access to range */
+#define MADV_GUARD_REMOVE 103 /* unguard range */
+
/* compatibility flags */
#define MAP_FILE 0