diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-05-26 21:32:41 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-05-26 21:32:41 +0200 |
commit | 98931dd95fd489fcbfa97da563505a6f071d7c77 (patch) | |
tree | 44683fc4a92efa614acdca2742a7ff19d26da1e3 /Documentation/filesystems | |
parent | Merge tag 'kbuild-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mas... (diff) | |
parent | mm: kfence: use PAGE_ALIGNED helper (diff) | |
download | linux-98931dd95fd489fcbfa97da563505a6f071d7c77.tar.xz linux-98931dd95fd489fcbfa97da563505a6f071d7c77.zip |
Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Almost all of MM here. A few things are still getting finished off,
reviewed, etc.
- Yang Shi has improved the behaviour of khugepaged collapsing of
readonly file-backed transparent hugepages.
- Johannes Weiner has arranged for zswap memory use to be tracked and
managed on a per-cgroup basis.
- Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
runtime enablement of the recent huge page vmemmap optimization
feature.
- Baolin Wang contributes a series to fix some issues around hugetlb
pagetable invalidation.
- Zhenwei Pi has fixed some interactions between hwpoisoned pages and
virtualization.
- Tong Tiangen has enabled the use of the presently x86-only
page_table_check debugging feature on arm64 and riscv.
- David Vernet has done some fixup work on the memcg selftests.
- Peter Xu has taught userfaultfd to handle write protection faults
against shmem- and hugetlbfs-backed files.
- More DAMON development from SeongJae Park - adding online tuning of
the feature and support for monitoring of fixed virtual address
ranges. Also easier discovery of which monitoring operations are
available.
- Nadav Amit has done some optimization of TLB flushing during
mprotect().
- Neil Brown continues to labor away at improving our swap-over-NFS
support.
- David Hildenbrand has some fixes to anon page COWing versus
get_user_pages().
- Peng Liu fixed some errors in the core hugetlb code.
- Joao Martins has reduced the amount of memory consumed by
device-dax's compound devmaps.
- Some cleanups of the arch-specific pagemap code from Anshuman
Khandual.
- Muchun Song has found and fixed some errors in the TLB flushing of
transparent hugepages.
- Roman Gushchin has done more work on the memcg selftests.
... and, of course, many smaller fixes and cleanups. Notably, the
customary million cleanup serieses from Miaohe Lin"
* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
mm: kfence: use PAGE_ALIGNED helper
selftests: vm: add the "settings" file with timeout variable
selftests: vm: add "test_hmm.sh" to TEST_FILES
selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
selftests: vm: add migration to the .gitignore
selftests/vm/pkeys: fix typo in comment
ksm: fix typo in comment
selftests: vm: add process_mrelease tests
Revert "mm/vmscan: never demote for memcg reclaim"
mm/kfence: print disabling or re-enabling message
include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
mm: fix a potential infinite loop in start_isolate_page_range()
MAINTAINERS: add Muchun as co-maintainer for HugeTLB
zram: fix Kconfig dependency warning
mm/shmem: fix shmem folio swapoff hang
cgroup: fix an error handling path in alloc_pagecache_max_30M()
mm: damon: use HPAGE_PMD_SIZE
tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
nodemask.h: fix compilation error with GCC12
...
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/locking.rst | 18 | ||||
-rw-r--r-- | Documentation/filesystems/proc.rst | 154 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.rst | 17 |
3 files changed, 122 insertions, 67 deletions
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 515bc48ab58b..d1bf77ef3bc1 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -258,8 +258,9 @@ prototypes:: int (*launder_folio)(struct folio *); bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count); int (*error_remove_page)(struct address_space *, struct page *); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); locking rules: All except dirty_folio and free_folio may block @@ -287,6 +288,7 @@ is_partially_uptodate: yes error_remove_page: yes swap_activate: no swap_deactivate: no +swap_rw: yes, unlocks ====================== ======================== ========= =============== ->write_begin(), ->write_end() and ->read_folio() may be called from @@ -386,15 +388,19 @@ cleaned, or an error value if not. Note that in order to prevent the folio getting mapped back in and redirtied, it needs to be kept locked across the entire operation. -->swap_activate will be called with a non-zero argument on -files backing (non block device backed) swapfiles. A return value -of zero indicates success, in which case this file can be used for -backing swapspace. The swapspace operations will be proxied to the -address space operations. +->swap_activate() will be called to prepare the given file for swap. It +should perform any validation and preparation necessary to ensure that +writes can be performed with minimal memory allocation. It should call +add_swap_extent(), or the helper iomap_swapfile_activate(), and return +the number of extents added. If IO should be submitted through +->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted +directly to the block device ``sis->bdev``. ->swap_deactivate() will be called in the sys_swapoff() path after ->swap_activate() returned success. +->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate(). + file_lock_operations ==================== diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 6a0dd99786f9..1bc91fb8c321 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -942,56 +942,73 @@ can be substantial. In many cases there are other means to find out additional memory using subsystem specific interfaces, for instance /proc/net/sockstat for TCP memory allocations. -The following is from a 16GB PIII, which has highmem enabled. -You may not have all of these fields. +Example output. You may not have all of these fields. :: > cat /proc/meminfo - MemTotal: 16344972 kB - MemFree: 13634064 kB - MemAvailable: 14836172 kB - Buffers: 3656 kB - Cached: 1195708 kB - SwapCached: 0 kB - Active: 891636 kB - Inactive: 1077224 kB - HighTotal: 15597528 kB - HighFree: 13629632 kB - LowTotal: 747444 kB - LowFree: 4432 kB - SwapTotal: 0 kB - SwapFree: 0 kB - Dirty: 968 kB - Writeback: 0 kB - AnonPages: 861800 kB - Mapped: 280372 kB - Shmem: 644 kB - KReclaimable: 168048 kB - Slab: 284364 kB - SReclaimable: 159856 kB - SUnreclaim: 124508 kB - PageTables: 24448 kB - NFS_Unstable: 0 kB - Bounce: 0 kB - WritebackTmp: 0 kB - CommitLimit: 7669796 kB - Committed_AS: 100056 kB - VmallocTotal: 112216 kB - VmallocUsed: 428 kB - VmallocChunk: 111088 kB - Percpu: 62080 kB - HardwareCorrupted: 0 kB - AnonHugePages: 49152 kB - ShmemHugePages: 0 kB - ShmemPmdMapped: 0 kB + MemTotal: 32858820 kB + MemFree: 21001236 kB + MemAvailable: 27214312 kB + Buffers: 581092 kB + Cached: 5587612 kB + SwapCached: 0 kB + Active: 3237152 kB + Inactive: 7586256 kB + Active(anon): 94064 kB + Inactive(anon): 4570616 kB + Active(file): 3143088 kB + Inactive(file): 3015640 kB + Unevictable: 0 kB + Mlocked: 0 kB + SwapTotal: 0 kB + SwapFree: 0 kB + Zswap: 1904 kB + Zswapped: 7792 kB + Dirty: 12 kB + Writeback: 0 kB + AnonPages: 4654780 kB + Mapped: 266244 kB + Shmem: 9976 kB + KReclaimable: 517708 kB + Slab: 660044 kB + SReclaimable: 517708 kB + SUnreclaim: 142336 kB + KernelStack: 11168 kB + PageTables: 20540 kB + NFS_Unstable: 0 kB + Bounce: 0 kB + WritebackTmp: 0 kB + CommitLimit: 16429408 kB + Committed_AS: 7715148 kB + VmallocTotal: 34359738367 kB + VmallocUsed: 40444 kB + VmallocChunk: 0 kB + Percpu: 29312 kB + HardwareCorrupted: 0 kB + AnonHugePages: 4149248 kB + ShmemHugePages: 0 kB + ShmemPmdMapped: 0 kB + FileHugePages: 0 kB + FilePmdMapped: 0 kB + CmaTotal: 0 kB + CmaFree: 0 kB + HugePages_Total: 0 + HugePages_Free: 0 + HugePages_Rsvd: 0 + HugePages_Surp: 0 + Hugepagesize: 2048 kB + Hugetlb: 0 kB + DirectMap4k: 401152 kB + DirectMap2M: 10008576 kB + DirectMap1G: 24117248 kB MemTotal Total usable RAM (i.e. physical RAM minus a few reserved bits and the kernel binary code) MemFree - The sum of LowFree+HighFree + Total free RAM. On highmem systems, the sum of LowFree+HighFree MemAvailable An estimate of how much memory is available for starting new applications, without swapping. Calculated from MemFree, @@ -1005,8 +1022,9 @@ Buffers Relatively temporary storage for raw disk blocks shouldn't get tremendously large (20MB or so) Cached - in-memory cache for files read from the disk (the - pagecache). Doesn't include SwapCached + In-memory cache for files read from the disk (the + pagecache) as well as tmpfs & shmem. + Doesn't include SwapCached. SwapCached Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it @@ -1018,6 +1036,11 @@ Active Inactive Memory which has been less recently used. It is more eligible to be reclaimed for other purposes +Unevictable + Memory allocated for userspace which cannot be reclaimed, such + as mlocked pages, ramfs backing pages, secret memfd pages etc. +Mlocked + Memory locked with mlock(). HighTotal, HighFree Highmem is all memory above ~860MB of physical memory. Highmem areas are for use by userspace programs, or @@ -1034,26 +1057,20 @@ SwapTotal SwapFree Memory which has been evicted from RAM, and is temporarily on the disk +Zswap + Memory consumed by the zswap backend (compressed size) +Zswapped + Amount of anonymous memory stored in zswap (original size) Dirty Memory which is waiting to get written back to the disk Writeback Memory which is actively being written back to the disk AnonPages Non-file backed pages mapped into userspace page tables -HardwareCorrupted - The amount of RAM/memory in KB, the kernel identifies as - corrupted. -AnonHugePages - Non-file backed huge pages mapped into userspace page tables Mapped files which have been mmaped, such as libraries Shmem Total memory used by shared memory (shmem) and tmpfs -ShmemHugePages - Memory used by shared memory (shmem) and tmpfs allocated - with huge pages -ShmemPmdMapped - Shared memory mapped into userspace with huge pages KReclaimable Kernel allocations that the kernel will attempt to reclaim under memory pressure. Includes SReclaimable (below), and other @@ -1064,9 +1081,10 @@ SReclaimable Part of Slab, that might be reclaimed, such as caches SUnreclaim Part of Slab, that cannot be reclaimed on memory pressure +KernelStack + Memory consumed by the kernel stacks of all tasks PageTables - amount of memory dedicated to the lowest level of page - tables. + Memory consumed by userspace page tables NFS_Unstable Always zero. Previous counted pages which had been written to the server, but has not been committed to stable storage. @@ -1098,7 +1116,7 @@ Committed_AS has been allocated by processes, even if it has not been "used" by them as of yet. A process which malloc()'s 1G of memory, but only touches 300M of it will show up as - using 1G. This 1G is memory which has been "committed" to + using 1G. This 1G is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 in 'vm.overcommit_memory'), allocations which would @@ -1107,7 +1125,7 @@ Committed_AS not fail due to lack of memory once that memory has been successfully allocated. VmallocTotal - total size of vmalloc memory area + total size of vmalloc virtual address space VmallocUsed amount of vmalloc area which is used VmallocChunk @@ -1115,6 +1133,30 @@ VmallocChunk Percpu Memory allocated to the percpu allocator used to back percpu allocations. This stat excludes the cost of metadata. +HardwareCorrupted + The amount of RAM/memory in KB, the kernel identifies as + corrupted. +AnonHugePages + Non-file backed huge pages mapped into userspace page tables +ShmemHugePages + Memory used by shared memory (shmem) and tmpfs allocated + with huge pages +ShmemPmdMapped + Shared memory mapped into userspace with huge pages +FileHugePages + Memory used for filesystem data (page cache) allocated + with huge pages +FilePmdMapped + Page cache mapped into userspace with huge pages +CmaTotal + Memory reserved for the Contiguous Memory Allocator (CMA) +CmaFree + Free remaining memory in the CMA reserves +HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb + See Documentation/admin-guide/mm/hugetlbpage.rst. +DirectMap4k, DirectMap2M, DirectMap1G + Breakdown of page table sizes used in the kernel's + identity mapping of RAM vmallocinfo ~~~~~~~~~~~ diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 12a011d2cbc6..08069ecd49a6 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -749,8 +749,9 @@ cache in your filesystem. The following members are defined: size_t count); void (*is_dirty_writeback)(struct folio *, bool *, bool *); int (*error_remove_page) (struct mapping *mapping, struct page *page); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; ``writepage`` @@ -948,15 +949,21 @@ cache in your filesystem. The following members are defined: unless you have them locked or reference counts increased. ``swap_activate`` - Called when swapon is used on a file to allocate space if - necessary and pin the block lookup information in memory. A - return value of zero indicates success, in which case this file - can be used to back swapspace. + + Called to prepare the given file for swap. It should perform + any validation and preparation necessary to ensure that writes + can be performed with minimal memory allocation. It should call + add_swap_extent(), or the helper iomap_swapfile_activate(), and + return the number of extents added. If IO should be submitted + through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will + be submitted directly to the block device ``sis->bdev``. ``swap_deactivate`` Called during swapoff on files where swap_activate was successful. +``swap_rw`` + Called to read or write swap pages when SWP_FS_OPS is set. The File Object =============== |