Merge tag 'docs-6.4' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet: "Commit volume in documentation is relatively low this time, but there is still a fair amount going on, including: - Reorganize the architecture-specific documentation under Documentation/arch This makes the structure match the source directory and helps to clean up the mess that is the top-level Documentation directory a bit. This work creates the new directory and moves x86 and most of the less-active architectures there. The current plan is to move the rest of the architectures in 6.5, with the patches going through the appropriate subsystem trees. - Some more Spanish translations and maintenance of the Italian translation - A new "Kernel contribution maturity model" document from Ted - A new tutorial on quickly building a trimmed kernel from Thorsten Plus the usual set of updates and fixes" * tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits) media: Adjust column width for pdfdocs media: Fix building pdfdocs docs: clk: add documentation to log which clocks have been disabled docs: trace: Fix typo in ftrace.rst Documentation/process: always CC responsible lists docs: kmemleak: adjust to config renaming ELF: document some de-facto PT_* ABI quirks Documentation: arm: remove stih415/stih416 related entries docs: turn off "smart quotes" in the HTML build Documentation: firmware: Clarify firmware path usage docs/mm: Physical Memory: Fix grammar Documentation: Add document for false sharing dma-api-howto: typo fix docs: move m68k architecture documentation under Documentation/arch/ docs: move parisc documentation under Documentation/arch/ docs: move ia64 architecture docs under Documentation/arch/ docs: Move arc architecture docs under Documentation/arch/ docs: move nios2 documentation under Documentation/arch/ docs: move openrisc documentation under Documentation/arch/ docs: move superh documentation under Documentation/arch/ ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2023-04-24 21:35:49 +0200
committer: Linus Torvalds <torvalds@linux-foundation.org> 2023-04-24 21:35:49 +0200
commit: c23f28975abc2eb02cecc8bc1f2c95473a59ed2e (patch)
tree: d356762219c0d3079d37f1cc3ab3eef5cd37567a /Documentation/arch/x86/tlb.rst
parent: Merge tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/lin... (diff)
parent: media: Adjust column width for pdfdocs (diff)
download: linux-c23f28975abc2eb02cecc8bc1f2c95473a59ed2e.tar.xz
linux-c23f28975abc2eb02cecc8bc1f2c95473a59ed2e.zip
1 files changed, 83 insertions, 0 deletions
diff --git a/Documentation/arch/x86/tlb.rst b/Documentation/arch/x86/tlb.rst
new file mode 100644
index 000000000000..82ec58ae63a8
--- /dev/null
+++ b/Documentation/arch/x86/tlb.rst
@@ -0,0 +1,83 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+The TLB
+=======
+
+When the kernel unmaps or modified the attributes of a range of
+memory, it has two choices:
+
+ 1. Flush the entire TLB with a two-instruction sequence.  This is
+    a quick operation, but it causes collateral damage: TLB entries
+    from areas other than the one we are trying to flush will be
+    destroyed and must be refilled later, at some cost.
+ 2. Use the invlpg instruction to invalidate a single page at a
+    time.  This could potentially cost many more instructions, but
+    it is a much more precise operation, causing no collateral
+    damage to other TLB entries.
+
+Which method to do depends on a few things:
+
+ 1. The size of the flush being performed.  A flush of the entire
+    address space is obviously better performed by flushing the
+    entire TLB than doing 2^48/PAGE_SIZE individual flushes.
+ 2. The contents of the TLB.  If the TLB is empty, then there will
+    be no collateral damage caused by doing the global flush, and
+    all of the individual flush will have ended up being wasted
+    work.
+ 3. The size of the TLB.  The larger the TLB, the more collateral
+    damage we do with a full flush.  So, the larger the TLB, the
+    more attractive an individual flush looks.  Data and
+    instructions have separate TLBs, as do different page sizes.
+ 4. The microarchitecture.  The TLB has become a multi-level
+    cache on modern CPUs, and the global flushes have become more
+    expensive relative to single-page flushes.
+
+There is obviously no way the kernel can know all these things,
+especially the contents of the TLB during a given flush.  The
+sizes of the flush will vary greatly depending on the workload as
+well.  There is essentially no "right" point to choose.
+
+You may be doing too many individual invalidations if you see the
+invlpg instruction (or instructions _near_ it) show up high in
+profiles.  If you believe that individual invalidations being
+called too often, you can lower the tunable::
+
+	/sys/kernel/debug/x86/tlb_single_page_flush_ceiling
+
+This will cause us to do the global flush for more cases.
+Lowering it to 0 will disable the use of the individual flushes.
+Setting it to 1 is a very conservative setting and it should
+never need to be 0 under normal circumstances.
+
+Despite the fact that a single individual flush on x86 is
+guaranteed to flush a full 2MB [1]_, hugetlbfs always uses the full
+flushes.  THP is treated exactly the same as normal memory.
+
+You might see invlpg inside of flush_tlb_mm_range() show up in
+profiles, or you can use the trace_tlb_flush() tracepoints. to
+determine how long the flush operations are taking.
+
+Essentially, you are balancing the cycles you spend doing invlpg
+with the cycles that you spend refilling the TLB later.
+
+You can measure how expensive TLB refills are by using
+performance counters and 'perf stat', like this::
+
+  perf stat -e
+    cpu/event=0x8,umask=0x84,name=dtlb_load_misses_walk_duration/,
+    cpu/event=0x8,umask=0x82,name=dtlb_load_misses_walk_completed/,
+    cpu/event=0x49,umask=0x4,name=dtlb_store_misses_walk_duration/,
+    cpu/event=0x49,umask=0x2,name=dtlb_store_misses_walk_completed/,
+    cpu/event=0x85,umask=0x4,name=itlb_misses_walk_duration/,
+    cpu/event=0x85,umask=0x2,name=itlb_misses_walk_completed/
+
+That works on an IvyBridge-era CPU (i5-3320M).  Different CPUs
+may have differently-named counters, but they should at least
+be there in some form.  You can use pmu-tools 'ocperf list'
+(https://github.com/andikleen/pmu-tools) to find the right
+counters for a given CPU.
+
+.. [1] A footnote in Intel's SDM "4.10.4.2 Recommended Invalidation"
+   says: "One execution of INVLPG is sufficient even for a page
+   with size greater than 4 KBytes."
author	Linus Torvalds <torvalds@linux-foundation.org>	2023-04-24 21:35:49 +0200
committer	Linus Torvalds <torvalds@linux-foundation.org>	2023-04-24 21:35:49 +0200
commit	c23f28975abc2eb02cecc8bc1f2c95473a59ed2e (patch)
tree	d356762219c0d3079d37f1cc3ab3eef5cd37567a /Documentation/arch/x86/tlb.rst
parent	Merge tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/lin... (diff)
parent	media: Adjust column width for pdfdocs (diff)
download	linux-c23f28975abc2eb02cecc8bc1f2c95473a59ed2e.tar.xz linux-c23f28975abc2eb02cecc8bc1f2c95473a59ed2e.zip