summaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/btmrvl.rst124
-rw-r--r--Documentation/admin-guide/clearing-warn-once.rst9
-rw-r--r--Documentation/admin-guide/cpu-load.rst114
-rw-r--r--Documentation/admin-guide/cputopology.rst177
-rw-r--r--Documentation/admin-guide/device-mapper/statistics.rst4
-rw-r--r--Documentation/admin-guide/efi-stub.rst100
-rw-r--r--Documentation/admin-guide/highuid.rst80
-rw-r--r--Documentation/admin-guide/hw-vuln/l1tf.rst2
-rw-r--r--Documentation/admin-guide/hw_random.rst105
-rw-r--r--Documentation/admin-guide/index.rst17
-rw-r--r--Documentation/admin-guide/iostats.rst197
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt2
-rw-r--r--Documentation/admin-guide/kernel-per-CPU-kthreads.rst356
-rw-r--r--Documentation/admin-guide/lcd-panel-cgram.rst27
-rw-r--r--Documentation/admin-guide/ldm.rst121
-rw-r--r--Documentation/admin-guide/lockup-watchdogs.rst83
-rw-r--r--Documentation/admin-guide/mm/cma_debugfs.rst25
-rw-r--r--Documentation/admin-guide/mm/index.rst1
-rw-r--r--Documentation/admin-guide/numastat.rst30
-rw-r--r--Documentation/admin-guide/pnp.rst292
-rw-r--r--Documentation/admin-guide/rtc.rst140
-rw-r--r--Documentation/admin-guide/svga.rst249
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst2
-rw-r--r--Documentation/admin-guide/video-output.rst34
24 files changed, 2286 insertions, 5 deletions
diff --git a/Documentation/admin-guide/btmrvl.rst b/Documentation/admin-guide/btmrvl.rst
new file mode 100644
index 000000000000..ec57740ead0c
--- /dev/null
+++ b/Documentation/admin-guide/btmrvl.rst
@@ -0,0 +1,124 @@
+=============
+btmrvl driver
+=============
+
+All commands are used via debugfs interface.
+
+Set/get driver configurations
+=============================
+
+Path: /debug/btmrvl/config/
+
+gpiogap=[n], hscfgcmd
+ These commands are used to configure the host sleep parameters::
+ bit 8:0 -- Gap
+ bit 16:8 -- GPIO
+
+ where GPIO is the pin number of GPIO used to wake up the host.
+ It could be any valid GPIO pin# (e.g. 0-7) or 0xff (SDIO interface
+ wakeup will be used instead).
+
+ where Gap is the gap in milli seconds between wakeup signal and
+ wakeup event, or 0xff for special host sleep setting.
+
+ Usage::
+
+ # Use SDIO interface to wake up the host and set GAP to 0x80:
+ echo 0xff80 > /debug/btmrvl/config/gpiogap
+ echo 1 > /debug/btmrvl/config/hscfgcmd
+
+ # Use GPIO pin #3 to wake up the host and set GAP to 0xff:
+ echo 0x03ff > /debug/btmrvl/config/gpiogap
+ echo 1 > /debug/btmrvl/config/hscfgcmd
+
+psmode=[n], pscmd
+ These commands are used to enable/disable auto sleep mode
+
+ where the option is::
+
+ 1 -- Enable auto sleep mode
+ 0 -- Disable auto sleep mode
+
+ Usage::
+
+ # Enable auto sleep mode
+ echo 1 > /debug/btmrvl/config/psmode
+ echo 1 > /debug/btmrvl/config/pscmd
+
+ # Disable auto sleep mode
+ echo 0 > /debug/btmrvl/config/psmode
+ echo 1 > /debug/btmrvl/config/pscmd
+
+
+hsmode=[n], hscmd
+ These commands are used to enable host sleep or wake up firmware
+
+ where the option is::
+
+ 1 -- Enable host sleep
+ 0 -- Wake up firmware
+
+ Usage::
+
+ # Enable host sleep
+ echo 1 > /debug/btmrvl/config/hsmode
+ echo 1 > /debug/btmrvl/config/hscmd
+
+ # Wake up firmware
+ echo 0 > /debug/btmrvl/config/hsmode
+ echo 1 > /debug/btmrvl/config/hscmd
+
+
+Get driver status
+=================
+
+Path: /debug/btmrvl/status/
+
+Usage::
+
+ cat /debug/btmrvl/status/<args>
+
+where the args are:
+
+curpsmode
+ This command displays current auto sleep status.
+
+psstate
+ This command display the power save state.
+
+hsstate
+ This command display the host sleep state.
+
+txdnldrdy
+ This command displays the value of Tx download ready flag.
+
+Issuing a raw hci command
+=========================
+
+Use hcitool to issue raw hci command, refer to hcitool manual
+
+Usage::
+
+ Hcitool cmd <ogf> <ocf> [Parameters]
+
+Interface Control Command::
+
+ hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00 --Enable All interface
+ hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01 --Enable Wlan interface
+ hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02 --Enable BT interface
+ hcitool cmd 0x3f 0x5b 0xf5 0x00 0x00 --Disable All interface
+ hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01 --Disable Wlan interface
+ hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02 --Disable BT interface
+
+SD8688 firmware
+===============
+
+Images:
+
+- /lib/firmware/sd8688_helper.bin
+- /lib/firmware/sd8688.bin
+
+
+The images can be downloaded from:
+
+git.infradead.org/users/dwmw2/linux-firmware.git/libertas/
diff --git a/Documentation/admin-guide/clearing-warn-once.rst b/Documentation/admin-guide/clearing-warn-once.rst
new file mode 100644
index 000000000000..211fd926cf00
--- /dev/null
+++ b/Documentation/admin-guide/clearing-warn-once.rst
@@ -0,0 +1,9 @@
+Clearing WARN_ONCE
+------------------
+
+WARN_ONCE / WARN_ON_ONCE / printk_once only emit a message once.
+
+echo 1 > /sys/kernel/debug/clear_warn_once
+
+clears the state and allows the warnings to print once again.
+This can be useful after test suite runs to reproduce problems.
diff --git a/Documentation/admin-guide/cpu-load.rst b/Documentation/admin-guide/cpu-load.rst
new file mode 100644
index 000000000000..2d01ce43d2a2
--- /dev/null
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -0,0 +1,114 @@
+========
+CPU load
+========
+
+Linux exports various bits of information via ``/proc/stat`` and
+``/proc/uptime`` that userland tools, such as top(1), use to calculate
+the average time system spent in a particular state, for example::
+
+ $ iostat
+ Linux 2.6.18.3-exp (linmac) 02/20/2007
+
+ avg-cpu: %user %nice %system %iowait %steal %idle
+ 10.01 0.00 2.92 5.44 0.00 81.63
+
+ ...
+
+Here the system thinks that over the default sampling period the
+system spent 10.01% of the time doing work in user space, 2.92% in the
+kernel, and was overall 81.63% of the time idle.
+
+In most cases the ``/proc/stat`` information reflects the reality quite
+closely, however due to the nature of how/when the kernel collects
+this data sometimes it can not be trusted at all.
+
+So how is this information collected? Whenever timer interrupt is
+signalled the kernel looks what kind of task was running at this
+moment and increments the counter that corresponds to this tasks
+kind/state. The problem with this is that the system could have
+switched between various states multiple times between two timer
+interrupts yet the counter is incremented only for the last state.
+
+
+Example
+-------
+
+If we imagine the system with one task that periodically burns cycles
+in the following manner::
+
+ time line between two timer interrupts
+ |--------------------------------------|
+ ^ ^
+ |_ something begins working |
+ |_ something goes to sleep
+ (only to be awaken quite soon)
+
+In the above situation the system will be 0% loaded according to the
+``/proc/stat`` (since the timer interrupt will always happen when the
+system is executing the idle handler), but in reality the load is
+closer to 99%.
+
+One can imagine many more situations where this behavior of the kernel
+will lead to quite erratic information inside ``/proc/stat``::
+
+
+ /* gcc -o hog smallhog.c */
+ #include <time.h>
+ #include <limits.h>
+ #include <signal.h>
+ #include <sys/time.h>
+ #define HIST 10
+
+ static volatile sig_atomic_t stop;
+
+ static void sighandler (int signr)
+ {
+ (void) signr;
+ stop = 1;
+ }
+ static unsigned long hog (unsigned long niters)
+ {
+ stop = 0;
+ while (!stop && --niters);
+ return niters;
+ }
+ int main (void)
+ {
+ int i;
+ struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
+ .it_value = { .tv_sec = 0, .tv_usec = 1 } };
+ sigset_t set;
+ unsigned long v[HIST];
+ double tmp = 0.0;
+ unsigned long n;
+ signal (SIGALRM, &sighandler);
+ setitimer (ITIMER_REAL, &it, NULL);
+
+ hog (ULONG_MAX);
+ for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
+ for (i = 0; i < HIST; ++i) tmp += v[i];
+ tmp /= HIST;
+ n = tmp - (tmp / 3.0);
+
+ sigemptyset (&set);
+ sigaddset (&set, SIGALRM);
+
+ for (;;) {
+ hog (n);
+ sigwait (&set, &i);
+ }
+ return 0;
+ }
+
+
+References
+----------
+
+- http://lkml.org/lkml/2007/2/12/6
+- Documentation/filesystems/proc.txt (1.8)
+
+
+Thanks
+------
+
+Con Kolivas, Pavel Machek
diff --git a/Documentation/admin-guide/cputopology.rst b/Documentation/admin-guide/cputopology.rst
new file mode 100644
index 000000000000..b90dafcc8237
--- /dev/null
+++ b/Documentation/admin-guide/cputopology.rst
@@ -0,0 +1,177 @@
+===========================================
+How CPU topology info is exported via sysfs
+===========================================
+
+Export CPU topology info via sysfs. Items (attributes) are similar
+to /proc/cpuinfo output of some architectures. They reside in
+/sys/devices/system/cpu/cpuX/topology/:
+
+physical_package_id:
+
+ physical package id of cpuX. Typically corresponds to a physical
+ socket number, but the actual value is architecture and platform
+ dependent.
+
+die_id:
+
+ the CPU die ID of cpuX. Typically it is the hardware platform's
+ identifier (rather than the kernel's). The actual value is
+ architecture and platform dependent.
+
+core_id:
+
+ the CPU core ID of cpuX. Typically it is the hardware platform's
+ identifier (rather than the kernel's). The actual value is
+ architecture and platform dependent.
+
+book_id:
+
+ the book ID of cpuX. Typically it is the hardware platform's
+ identifier (rather than the kernel's). The actual value is
+ architecture and platform dependent.
+
+drawer_id:
+
+ the drawer ID of cpuX. Typically it is the hardware platform's
+ identifier (rather than the kernel's). The actual value is
+ architecture and platform dependent.
+
+core_cpus:
+
+ internal kernel map of CPUs within the same core.
+ (deprecated name: "thread_siblings")
+
+core_cpus_list:
+
+ human-readable list of CPUs within the same core.
+ (deprecated name: "thread_siblings_list");
+
+package_cpus:
+
+ internal kernel map of the CPUs sharing the same physical_package_id.
+ (deprecated name: "core_siblings")
+
+package_cpus_list:
+
+ human-readable list of CPUs sharing the same physical_package_id.
+ (deprecated name: "core_siblings_list")
+
+die_cpus:
+
+ internal kernel map of CPUs within the same die.
+
+die_cpus_list:
+
+ human-readable list of CPUs within the same die.
+
+book_siblings:
+
+ internal kernel map of cpuX's hardware threads within the same
+ book_id.
+
+book_siblings_list:
+
+ human-readable list of cpuX's hardware threads within the same
+ book_id.
+
+drawer_siblings:
+
+ internal kernel map of cpuX's hardware threads within the same
+ drawer_id.
+
+drawer_siblings_list:
+
+ human-readable list of cpuX's hardware threads within the same
+ drawer_id.
+
+Architecture-neutral, drivers/base/topology.c, exports these attributes.
+However, the book and drawer related sysfs files will only be created if
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
+
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390,
+where they reflect the cpu and cache hierarchy.
+
+For an architecture to support this feature, it must define some of
+these macros in include/asm-XXX/topology.h::
+
+ #define topology_physical_package_id(cpu)
+ #define topology_die_id(cpu)
+ #define topology_core_id(cpu)
+ #define topology_book_id(cpu)
+ #define topology_drawer_id(cpu)
+ #define topology_sibling_cpumask(cpu)
+ #define topology_core_cpumask(cpu)
+ #define topology_die_cpumask(cpu)
+ #define topology_book_cpumask(cpu)
+ #define topology_drawer_cpumask(cpu)
+
+The type of ``**_id macros`` is int.
+The type of ``**_cpumask macros`` is ``(const) struct cpumask *``. The latter
+correspond with appropriate ``**_siblings`` sysfs attributes (except for
+topology_sibling_cpumask() which corresponds with thread_siblings).
+
+To be consistent on all architectures, include/linux/topology.h
+provides default definitions for any of the above macros that are
+not defined by include/asm-XXX/topology.h:
+
+1) topology_physical_package_id: -1
+2) topology_die_id: -1
+3) topology_core_id: 0
+4) topology_sibling_cpumask: just the given CPU
+5) topology_core_cpumask: just the given CPU
+6) topology_die_cpumask: just the given CPU
+
+For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
+default definitions for topology_book_id() and topology_book_cpumask().
+For architectures that don't support drawers (CONFIG_SCHED_DRAWER) there are
+no default definitions for topology_drawer_id() and topology_drawer_cpumask().
+
+Additionally, CPU topology information is provided under
+/sys/devices/system/cpu and includes these files. The internal
+source for the output is in brackets ("[]").
+
+ =========== ==========================================================
+ kernel_max: the maximum CPU index allowed by the kernel configuration.
+ [NR_CPUS-1]
+
+ offline: CPUs that are not online because they have been
+ HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit
+ of CPUs allowed by the kernel configuration (kernel_max
+ above). [~cpu_online_mask + cpus >= NR_CPUS]
+
+ online: CPUs that are online and being scheduled [cpu_online_mask]
+
+ possible: CPUs that have been allocated resources and can be
+ brought online if they are present. [cpu_possible_mask]
+
+ present: CPUs that have been identified as being present in the
+ system. [cpu_present_mask]
+ =========== ==========================================================
+
+The format for the above output is compatible with cpulist_parse()
+[see <linux/cpumask.h>]. Some examples follow.
+
+In this example, there are 64 CPUs in the system but cpus 32-63 exceed
+the kernel max which is limited to 0..31 by the NR_CPUS config option
+being 32. Note also that CPUs 2 and 4-31 are not online but could be
+brought online as they are both present and possible::
+
+ kernel_max: 31
+ offline: 2,4-31,32-63
+ online: 0-1,3
+ possible: 0-31
+ present: 0-31
+
+In this example, the NR_CPUS config option is 128, but the kernel was
+started with possible_cpus=144. There are 4 CPUs in the system and cpu2
+was manually taken offline (and is the only CPU that can be brought
+online.)::
+
+ kernel_max: 127
+ offline: 2,4-127,128-143
+ online: 0-1,3
+ possible: 0-127
+ present: 0-3
+
+See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter
+as well as more information on the various cpumasks.
diff --git a/Documentation/admin-guide/device-mapper/statistics.rst b/Documentation/admin-guide/device-mapper/statistics.rst
index 3d80a9f850cc..41ded0bc5933 100644
--- a/Documentation/admin-guide/device-mapper/statistics.rst
+++ b/Documentation/admin-guide/device-mapper/statistics.rst
@@ -13,7 +13,7 @@ the range specified.
The I/O statistics counters for each step-sized area of a region are
in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see:
-Documentation/iostats.txt). But two extra counters (12 and 13) are
+Documentation/admin-guide/iostats.rst). But two extra counters (12 and 13) are
provided: total time spent reading and writing. When the histogram
argument is used, the 14th parameter is reported that represents the
histogram of latencies. All these counters may be accessed by sending
@@ -151,7 +151,7 @@ Messages
The first 11 counters have the same meaning as
`/sys/block/*/stat or /proc/diskstats`.
- Please refer to Documentation/iostats.txt for details.
+ Please refer to Documentation/admin-guide/iostats.rst for details.
1. the number of reads completed
2. the number of reads merged
diff --git a/Documentation/admin-guide/efi-stub.rst b/Documentation/admin-guide/efi-stub.rst
new file mode 100644
index 000000000000..833edb0d0bc4
--- /dev/null
+++ b/Documentation/admin-guide/efi-stub.rst
@@ -0,0 +1,100 @@
+=================
+The EFI Boot Stub
+=================
+
+On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade
+as a PE/COFF image, thereby convincing EFI firmware loaders to load
+it as an EFI executable. The code that modifies the bzImage header,
+along with the EFI-specific entry point that the firmware loader
+jumps to are collectively known as the "EFI boot stub", and live in
+arch/x86/boot/header.S and arch/x86/boot/compressed/eboot.c,
+respectively. For ARM the EFI stub is implemented in
+arch/arm/boot/compressed/efi-header.S and
+arch/arm/boot/compressed/efi-stub.c. EFI stub code that is shared
+between architectures is in drivers/firmware/efi/libstub.
+
+For arm64, there is no compressed kernel support, so the Image itself
+masquerades as a PE/COFF image and the EFI stub is linked into the
+kernel. The arm64 EFI stub lives in arch/arm64/kernel/efi-entry.S
+and drivers/firmware/efi/libstub/arm64-stub.c.
+
+By using the EFI boot stub it's possible to boot a Linux kernel
+without the use of a conventional EFI boot loader, such as grub or
+elilo. Since the EFI boot stub performs the jobs of a boot loader, in
+a certain sense it *IS* the boot loader.
+
+The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option.
+
+
+How to install bzImage.efi
+--------------------------
+
+The bzImage located in arch/x86/boot/bzImage must be copied to the EFI
+System Partition (ESP) and renamed with the extension ".efi". Without
+the extension the EFI firmware loader will refuse to execute it. It's
+not possible to execute bzImage.efi from the usual Linux file systems
+because EFI firmware doesn't have support for them. For ARM the
+arch/arm/boot/zImage should be copied to the system partition, and it
+may not need to be renamed. Similarly for arm64, arch/arm64/boot/Image
+should be copied but not necessarily renamed.
+
+
+Passing kernel parameters from the EFI shell
+--------------------------------------------
+
+Arguments to the kernel can be passed after bzImage.efi, e.g.::
+
+ fs0:> bzImage.efi console=ttyS0 root=/dev/sda4
+
+
+The "initrd=" option
+--------------------
+
+Like most boot loaders, the EFI stub allows the user to specify
+multiple initrd files using the "initrd=" option. This is the only EFI
+stub-specific command line parameter, everything else is passed to the
+kernel when it boots.
+
+The path to the initrd file must be an absolute path from the
+beginning of the ESP, relative path names do not work. Also, the path
+is an EFI-style path and directory elements must be separated with
+backslashes (\). For example, given the following directory layout::
+
+ fs0:>
+ Kernels\
+ bzImage.efi
+ initrd-large.img
+
+ Ramdisks\
+ initrd-small.img
+ initrd-medium.img
+
+to boot with the initrd-large.img file if the current working
+directory is fs0:\Kernels, the following command must be used::
+
+ fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img
+
+Notice how bzImage.efi can be specified with a relative path. That's
+because the image we're executing is interpreted by the EFI shell,
+which understands relative paths, whereas the rest of the command line
+is passed to bzImage.efi.
+
+
+The "dtb=" option
+-----------------
+
+For the ARM and arm64 architectures, a device tree must be provided to
+the kernel. Normally firmware shall supply the device tree via the
+EFI CONFIGURATION TABLE. However, the "dtb=" command line option can
+be used to override the firmware supplied device tree, or to supply
+one when firmware is unable to.
+
+Please note: Firmware adds runtime configuration information to the
+device tree before booting the kernel. If dtb= is used to override
+the device tree, then any runtime data provided by firmware will be
+lost. The dtb= option should only be used either as a debug tool, or
+as a last resort when a device tree is not provided in the EFI
+CONFIGURATION TABLE.
+
+"dtb=" is processed in the same manner as the "initrd=" option that is
+described above.
diff --git a/Documentation/admin-guide/highuid.rst b/Documentation/admin-guide/highuid.rst
new file mode 100644
index 000000000000..6ee70465c0ea
--- /dev/null
+++ b/Documentation/admin-guide/highuid.rst
@@ -0,0 +1,80 @@
+===================================================
+Notes on the change from 16-bit UIDs to 32-bit UIDs
+===================================================
+
+:Author: Chris Wing <wingc@umich.edu>
+:Last updated: January 11, 2000
+
+- kernel code MUST take into account __kernel_uid_t and __kernel_uid32_t
+ when communicating between user and kernel space in an ioctl or data
+ structure.
+
+- kernel code should use uid_t and gid_t in kernel-private structures and
+ code.
+
+What's left to be done for 32-bit UIDs on all Linux architectures:
+
+- Disk quotas have an interesting limitation that is not related to the
+ maximum UID/GID. They are limited by the maximum file size on the
+ underlying filesystem, because quota records are written at offsets
+ corresponding to the UID in question.
+ Further investigation is needed to see if the quota system can cope
+ properly with huge UIDs. If it can deal with 64-bit file offsets on all
+ architectures, this should not be a problem.
+
+- Decide whether or not to keep backwards compatibility with the system
+ accounting file, or if we should break it as the comments suggest
+ (currently, the old 16-bit UID and GID are still written to disk, and
+ part of the former pad space is used to store separate 32-bit UID and
+ GID)
+
+- Need to validate that OS emulation calls the 16-bit UID
+ compatibility syscalls, if the OS being emulated used 16-bit UIDs, or
+ uses the 32-bit UID system calls properly otherwise.
+
+ This affects at least:
+
+ - iBCS on Intel
+
+ - sparc32 emulation on sparc64
+ (need to support whatever new 32-bit UID system calls are added to
+ sparc32)
+
+- Validate that all filesystems behave properly.
+
+ At present, 32-bit UIDs _should_ work for:
+
+ - ext2
+ - ufs
+ - isofs
+ - nfs
+ - coda
+ - udf
+
+ Ioctl() fixups have been made for:
+
+ - ncpfs
+ - smbfs
+
+ Filesystems with simple fixups to prevent 16-bit UID wraparound:
+
+ - minix
+ - sysv
+ - qnx4
+
+ Other filesystems have not been checked yet.
+
+- The ncpfs and smpfs filesystems cannot presently use 32-bit UIDs in
+ all ioctl()s. Some new ioctl()s have been added with 32-bit UIDs, but
+ more are needed. (as well as new user<->kernel data structures)
+
+- The ELF core dump format only supports 16-bit UIDs on arm, i386, m68k,
+ sh, and sparc32. Fixing this is probably not that important, but would
+ require adding a new ELF section.
+
+- The ioctl()s used to control the in-kernel NFS server only support
+ 16-bit UIDs on arm, i386, m68k, sh, and sparc32.
+
+- make sure that the UID mapping feature of AX25 networking works properly
+ (it should be safe because it's always used a 32-bit integer to
+ communicate between user and kernel)
diff --git a/Documentation/admin-guide/hw-vuln/l1tf.rst b/Documentation/admin-guide/hw-vuln/l1tf.rst
index 656aee262e23..f83212fae4d5 100644
--- a/Documentation/admin-guide/hw-vuln/l1tf.rst
+++ b/Documentation/admin-guide/hw-vuln/l1tf.rst
@@ -241,7 +241,7 @@ Guest mitigation mechanisms
For further information about confining guests to a single or to a group
of cores consult the cpusets documentation:
- https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.rst
+ https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst
.. _interrupt_isolation:
diff --git a/Documentation/admin-guide/hw_random.rst b/Documentation/admin-guide/hw_random.rst
new file mode 100644
index 000000000000..121de96e395e
--- /dev/null
+++ b/Documentation/admin-guide/hw_random.rst
@@ -0,0 +1,105 @@
+==========================================================
+Linux support for random number generator in i8xx chipsets
+==========================================================
+
+Introduction
+============
+
+The hw_random framework is software that makes use of a
+special hardware feature on your CPU or motherboard,
+a Random Number Generator (RNG). The software has two parts:
+a core providing the /dev/hwrng character device and its
+sysfs support, plus a hardware-specific driver that plugs
+into that core.
+
+To make the most effective use of these mechanisms, you
+should download the support software as well. Download the
+latest version of the "rng-tools" package from the
+hw_random driver's official Web site:
+
+ http://sourceforge.net/projects/gkernel/
+
+Those tools use /dev/hwrng to fill the kernel entropy pool,
+which is used internally and exported by the /dev/urandom and
+/dev/random special files.
+
+Theory of operation
+===================
+
+CHARACTER DEVICE. Using the standard open()
+and read() system calls, you can read random data from
+the hardware RNG device. This data is NOT CHECKED by any
+fitness tests, and could potentially be bogus (if the
+hardware is faulty or has been tampered with). Data is only
+output if the hardware "has-data" flag is set, but nevertheless
+a security-conscious person would run fitness tests on the
+data before assuming it is truly random.
+
+The rng-tools package uses such tests in "rngd", and lets you
+run them by hand with a "rngtest" utility.
+
+/dev/hwrng is char device major 10, minor 183.
+
+CLASS DEVICE. There is a /sys/class/misc/hw_random node with
+two unique attributes, "rng_available" and "rng_current". The
+"rng_available" attribute lists the hardware-specific drivers
+available, while "rng_current" lists the one which is currently
+connected to /dev/hwrng. If your system has more than one
+RNG available, you may change the one used by writing a name from
+the list in "rng_available" into "rng_current".
+
+==========================================================================
+
+
+Hardware driver for Intel/AMD/VIA Random Number Generators (RNG)
+ - Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com>
+ - Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com>
+
+
+About the Intel RNG hardware, from the firmware hub datasheet
+=============================================================
+
+The Firmware Hub integrates a Random Number Generator (RNG)
+using thermal noise generated from inherently random quantum
+mechanical properties of silicon. When not generating new random
+bits the RNG circuitry will enter a low power state. Intel will
+provide a binary software driver to give third party software
+access to our RNG for use as a security feature. At this time,
+the RNG is only to be used with a system in an OS-present state.
+
+Intel RNG Driver notes
+======================
+
+FIXME: support poll(2)
+
+.. note::
+
+ request_mem_region was removed, for three reasons:
+
+ 1) Only one RNG is supported by this driver;
+ 2) The location used by the RNG is a fixed location in
+ MMIO-addressable memory;
+ 3) users with properly working BIOS e820 handling will always
+ have the region in which the RNG is located reserved, so
+ request_mem_region calls always fail for proper setups.
+ However, for people who use mem=XX, BIOS e820 information is
+ **not** in /proc/iomem, and request_mem_region(RNG_ADDR) can
+ succeed.
+
+Driver details
+==============
+
+Based on:
+ Intel 82802AB/82802AC Firmware Hub (FWH) Datasheet
+ May 1999 Order Number: 290658-002 R
+
+Intel 82802 Firmware Hub:
+ Random Number Generator
+ Programmer's Reference Manual
+ December 1999 Order Number: 298029-001 R
+
+Intel 82802 Firmware HUB Random Number Generator Driver
+ Copyright (c) 2000 Matt Sottek <msottek@quiknet.com>
+
+Special thanks to Matt Sottek. I did the "guts", he
+did the "brains" and all the testing.
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index a5fdb1a846ce..4e98f5596da0 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -85,8 +85,25 @@ configure specific aspects of kernel behavior to your liking.
perf-security
acpi/index
aoe/index
+ btmrvl
+ clearing-warn-once
+ cpu-load
+ cputopology
device-mapper/index
+ efi-stub
+ highuid
+ hw_random
+ iostats
+ kernel-per-CPU-kthreads
laptops/index
+ lcd-panel-cgram
+ ldm
+ lockup-watchdogs
+ numastat
+ pnp
+ rtc
+ svga
+ video-output
.. only:: subproject and html
diff --git a/Documentation/admin-guide/iostats.rst b/Documentation/admin-guide/iostats.rst
new file mode 100644
index 000000000000..5d63b18bd6d1
--- /dev/null
+++ b/Documentation/admin-guide/iostats.rst
@@ -0,0 +1,197 @@
+=====================
+I/O statistics fields
+=====================
+
+Since 2.4.20 (and some versions before, with patches), and 2.5.45,
+more extensive disk statistics have been introduced to help measure disk
+activity. Tools such as ``sar`` and ``iostat`` typically interpret these and do
+the work for you, but in case you are interested in creating your own
+tools, the fields are explained here.
+
+In 2.4 now, the information is found as additional fields in
+``/proc/partitions``. In 2.6 and upper, the same information is found in two
+places: one is in the file ``/proc/diskstats``, and the other is within
+the sysfs file system, which must be mounted in order to obtain
+the information. Throughout this document we'll assume that sysfs
+is mounted on ``/sys``, although of course it may be mounted anywhere.
+Both ``/proc/diskstats`` and sysfs use the same source for the information
+and so should not differ.
+
+Here are examples of these different formats::
+
+ 2.4:
+ 3 0 39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
+ 3 1 9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030
+
+ 2.6+ sysfs:
+ 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
+ 35486 38030 38030 38030
+
+ 2.6+ diskstats:
+ 3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
+ 3 1 hda1 35486 38030 38030 38030
+
+ 4.18+ diskstats:
+ 3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0
+
+On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
+a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.
+
+The advantage of one over the other is that the sysfs choice works well
+if you are watching a known, small set of disks. ``/proc/diskstats`` may
+be a better choice if you are watching a large number of disks because
+you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
+each snapshot of your disk statistics.
+
+In 2.4, the statistics fields are those after the device name. In
+the above example, the first field of statistics would be 446216.
+By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
+find just the eleven fields, beginning with 446216. If you look at
+``/proc/diskstats``, the eleven fields will be preceded by the major and
+minor device numbers, and device name. Each of these formats provides
+eleven fields of statistics, each meaning exactly the same things.
+All fields except field 9 are cumulative since boot. Field 9 should
+go to zero as I/Os complete; all others only increase (unless they
+overflow and wrap). Yes, these are (32-bit or 64-bit) unsigned long
+(native word size) numbers, and on a very busy or long-lived system they
+may wrap. Applications should be prepared to deal with that; unless
+your observations are measured in large numbers of minutes or hours,
+they should not wrap twice before you notice them.
+
+Each set of stats only applies to the indicated device; if you want
+system-wide stats you'll have to find all the devices and sum them all up.
+
+Field 1 -- # of reads completed
+ This is the total number of reads completed successfully.
+
+Field 2 -- # of reads merged, field 6 -- # of writes merged
+ Reads and writes which are adjacent to each other may be merged for
+ efficiency. Thus two 4K reads may become one 8K read before it is
+ ultimately handed to the disk, and so it will be counted (and queued)
+ as only one I/O. This field lets you know how often this was done.
+
+Field 3 -- # of sectors read
+ This is the total number of sectors read successfully.
+
+Field 4 -- # of milliseconds spent reading
+ This is the total number of milliseconds spent by all reads (as
+ measured from __make_request() to end_that_request_last()).
+
+Field 5 -- # of writes completed
+ This is the total number of writes completed successfully.
+
+Field 6 -- # of writes merged
+ See the description of field 2.
+
+Field 7 -- # of sectors written
+ This is the total number of sectors written successfully.
+
+Field 8 -- # of milliseconds spent writing
+ This is the total number of milliseconds spent by all writes (as
+ measured from __make_request() to end_that_request_last()).
+
+Field 9 -- # of I/Os currently in progress
+ The only field that should go to zero. Incremented as requests are
+ given to appropriate struct request_queue and decremented as they finish.
+
+Field 10 -- # of milliseconds spent doing I/Os
+ This field increases so long as field 9 is nonzero.
+
+ Since 5.0 this field counts jiffies when at least one request was
+ started or completed. If request runs more than 2 jiffies then some
+ I/O time will not be accounted unless there are other requests.
+
+Field 11 -- weighted # of milliseconds spent doing I/Os
+ This field is incremented at each I/O start, I/O completion, I/O
+ merge, or read of these stats by the number of I/Os in progress
+ (field 9) times the number of milliseconds spent doing I/O since the
+ last update of this field. This can provide an easy measure of both
+ I/O completion time and the backlog that may be accumulating.
+
+Field 12 -- # of discards completed
+ This is the total number of discards completed successfully.
+
+Field 13 -- # of discards merged
+ See the description of field 2
+
+Field 14 -- # of sectors discarded
+ This is the total number of sectors discarded successfully.
+
+Field 15 -- # of milliseconds spent discarding
+ This is the total number of milliseconds spent by all discards (as
+ measured from __make_request() to end_that_request_last()).
+
+To avoid introducing performance bottlenecks, no locks are held while
+modifying these counters. This implies that minor inaccuracies may be
+introduced when changes collide, so (for instance) adding up all the
+read I/Os issued per partition should equal those made to the disks ...
+but due to the lack of locking it may only be very close.
+
+In 2.6+, there are counters for each CPU, which make the lack of locking
+almost a non-issue. When the statistics are read, the per-CPU counters
+are summed (possibly overflowing the unsigned long variable they are
+summed to) and the result given to the user. There is no convenient
+user interface for accessing the per-CPU counters themselves.
+
+Disks vs Partitions
+-------------------
+
+There were significant changes between 2.4 and 2.6+ in the I/O subsystem.
+As a result, some statistic information disappeared. The translation from
+a disk address relative to a partition to the disk address relative to
+the host disk happens much earlier. All merges and timings now happen
+at the disk level rather than at both the disk and partition level as
+in 2.4. Consequently, you'll see a different statistics output on 2.6+ for
+partitions from that for disks. There are only *four* fields available
+for partitions on 2.6+ machines. This is reflected in the examples above.
+
+Field 1 -- # of reads issued
+ This is the total number of reads issued to this partition.
+
+Field 2 -- # of sectors read
+ This is the total number of sectors requested to be read from this
+ partition.
+
+Field 3 -- # of writes issued
+ This is the total number of writes issued to this partition.
+
+Field 4 -- # of sectors written
+ This is the total number of sectors requested to be written to
+ this partition.
+
+Note that since the address is translated to a disk-relative one, and no
+record of the partition-relative address is kept, the subsequent success
+or failure of the read cannot be attributed to the partition. In other
+words, the number of reads for partitions is counted slightly before time
+of queuing for partitions, and at completion for whole disks. This is
+a subtle distinction that is probably uninteresting for most cases.
+
+More significant is the error induced by counting the numbers of
+reads/writes before merges for partitions and after for disks. Since a
+typical workload usually contains a lot of successive and adjacent requests,
+the number of reads/writes issued can be several times higher than the
+number of reads/writes completed.
+
+In 2.6.25, the full statistic set is again available for partitions and
+disk and partition statistics are consistent again. Since we still don't
+keep record of the partition-relative address, an operation is attributed to
+the partition which contains the first sector of the request after the
+eventual merges. As requests can be merged across partition, this could lead
+to some (probably insignificant) inaccuracy.
+
+Additional notes
+----------------
+
+In 2.6+, sysfs is not mounted by default. If your distribution of
+Linux hasn't added it already, here's the line you'll want to add to
+your ``/etc/fstab``::
+
+ none /sys sysfs defaults 0 0
+
+
+In 2.6+, all disk statistics were removed from ``/proc/stat``. In 2.4, they
+appear in both ``/proc/partitions`` and ``/proc/stat``, although the ones in
+``/proc/stat`` take a very different format from those in ``/proc/partitions``
+(see proc(5), if your system has it.)
+
+-- ricklind@us.ibm.com
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a571a67e0c85..19b1e3bef56c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5066,7 +5066,7 @@
vga= [BOOT,X86-32] Select a particular video mode
See Documentation/x86/boot.rst and
- Documentation/svga.txt.
+ Documentation/admin-guide/svga.rst.
Use vga=ask for menu.
This is actually a boot loader parameter; the value is
passed to the kernel using a special protocol.
diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
new file mode 100644
index 000000000000..4f18456dd3b1
--- /dev/null
+++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
@@ -0,0 +1,356 @@
+==========================================
+Reducing OS jitter due to per-cpu kthreads
+==========================================
+
+This document lists per-CPU kthreads in the Linux kernel and presents
+options to control their OS jitter. Note that non-per-CPU kthreads are
+not listed here. To reduce OS jitter from non-per-CPU kthreads, bind
+them to a "housekeeping" CPU dedicated to such work.
+
+References
+==========
+
+- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
+
+- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
+
+- man taskset: Using the taskset command to bind tasks to sets
+ of CPUs.
+
+- man sched_setaffinity: Using the sched_setaffinity() system
+ call to bind tasks to sets of CPUs.
+
+- /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state,
+ writing "0" to offline and "1" to online.
+
+- In order to locate kernel-generated OS jitter on CPU N:
+
+ cd /sys/kernel/debug/tracing
+ echo 1 > max_graph_depth # Increase the "1" for more detail
+ echo function_graph > current_tracer
+ # run workload
+ cat per_cpu/cpuN/trace
+
+kthreads
+========
+
+Name:
+ ehca_comp/%u
+
+Purpose:
+ Periodically process Infiniband-related work.
+
+To reduce its OS jitter, do any of the following:
+
+1. Don't use eHCA Infiniband hardware, instead choosing hardware
+ that does not require per-CPU kthreads. This will prevent these
+ kthreads from being created in the first place. (This will
+ work for most people, as this hardware, though important, is
+ relatively old and is produced in relatively low unit volumes.)
+2. Do all eHCA-Infiniband-related work on other CPUs, including
+ interrupts.
+3. Rework the eHCA driver so that its per-CPU kthreads are
+ provisioned only on selected CPUs.
+
+
+Name:
+ irq/%d-%s
+
+Purpose:
+ Handle threaded interrupts.
+
+To reduce its OS jitter, do the following:
+
+1. Use irq affinity to force the irq threads to execute on
+ some other CPU.
+
+Name:
+ kcmtpd_ctr_%d
+
+Purpose:
+ Handle Bluetooth work.
+
+To reduce its OS jitter, do one of the following:
+
+1. Don't use Bluetooth, in which case these kthreads won't be
+ created in the first place.
+2. Use irq affinity to force Bluetooth-related interrupts to
+ occur on some other CPU and furthermore initiate all
+ Bluetooth activity on some other CPU.
+
+Name:
+ ksoftirqd/%u
+
+Purpose:
+ Execute softirq handlers when threaded or when under heavy load.
+
+To reduce its OS jitter, each softirq vector must be handled
+separately as follows:
+
+TIMER_SOFTIRQ
+-------------
+
+Do all of the following:
+
+1. To the extent possible, keep the CPU out of the kernel when it
+ is non-idle, for example, by avoiding system calls and by forcing
+ both kernel threads and interrupts to execute elsewhere.
+2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force
+ the CPU offline, then bring it back online. This forces
+ recurring timers to migrate elsewhere. If you are concerned
+ with multiple CPUs, force them all offline before bringing the
+ first one back online. Once you have onlined the CPUs in question,
+ do not offline any other CPUs, because doing so could force the
+ timer back onto one of the CPUs in question.
+
+NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
+---------------------------------
+
+Do all of the following:
+
+1. Force networking interrupts onto other CPUs.
+2. Initiate any network I/O on other CPUs.
+3. Once your application has started, prevent CPU-hotplug operations
+ from being initiated from tasks that might run on the CPU to
+ be de-jittered. (It is OK to force this CPU offline and then
+ bring it back online before you start your application.)
+
+BLOCK_SOFTIRQ
+-------------
+
+Do all of the following:
+
+1. Force block-device interrupts onto some other CPU.
+2. Initiate any block I/O on other CPUs.
+3. Once your application has started, prevent CPU-hotplug operations
+ from being initiated from tasks that might run on the CPU to
+ be de-jittered. (It is OK to force this CPU offline and then
+ bring it back online before you start your application.)
+
+IRQ_POLL_SOFTIRQ
+----------------
+
+Do all of the following:
+
+1. Force block-device interrupts onto some other CPU.
+2. Initiate any block I/O and block-I/O polling on other CPUs.
+3. Once your application has started, prevent CPU-hotplug operations
+ from being initiated from tasks that might run on the CPU to
+ be de-jittered. (It is OK to force this CPU offline and then
+ bring it back online before you start your application.)
+
+TASKLET_SOFTIRQ
+---------------
+
+Do one or more of the following:
+
+1. Avoid use of drivers that use tasklets. (Such drivers will contain
+ calls to things like tasklet_schedule().)
+2. Convert all drivers that you must use from tasklets to workqueues.
+3. Force interrupts for drivers using tasklets onto other CPUs,
+ and also do I/O involving these drivers on other CPUs.
+
+SCHED_SOFTIRQ
+-------------
+
+Do all of the following:
+
+1. Avoid sending scheduler IPIs to the CPU to be de-jittered,
+ for example, ensure that at most one runnable kthread is present
+ on that CPU. If a thread that expects to run on the de-jittered
+ CPU awakens, the scheduler will send an IPI that can result in
+ a subsequent SCHED_SOFTIRQ.
+2. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
+ is marked as an adaptive-ticks CPU using the "nohz_full="
+ boot parameter. This reduces the number of scheduler-clock
+ interrupts that the de-jittered CPU receives, minimizing its
+ chances of being selected to do the load balancing work that
+ runs in SCHED_SOFTIRQ context.
+3. To the extent possible, keep the CPU out of the kernel when it
+ is non-idle, for example, by avoiding system calls and by
+ forcing both kernel threads and interrupts to execute elsewhere.
+ This further reduces the number of scheduler-clock interrupts
+ received by the de-jittered CPU.
+
+HRTIMER_SOFTIRQ
+---------------
+
+Do all of the following:
+
+1. To the extent possible, keep the CPU out of the kernel when it
+ is non-idle. For example, avoid system calls and force both
+ kernel threads and interrupts to execute elsewhere.
+2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the
+ CPU offline, then bring it back online. This forces recurring
+ timers to migrate elsewhere. If you are concerned with multiple
+ CPUs, force them all offline before bringing the first one
+ back online. Once you have onlined the CPUs in question, do not
+ offline any other CPUs, because doing so could force the timer
+ back onto one of the CPUs in question.
+
+RCU_SOFTIRQ
+-----------
+
+Do at least one of the following:
+
+1. Offload callbacks and keep the CPU in either dyntick-idle or
+ adaptive-ticks state by doing all of the following:
+
+ a. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
+ de-jittered is marked as an adaptive-ticks CPU using the
+ "nohz_full=" boot parameter. Bind the rcuo kthreads to
+ housekeeping CPUs, which can tolerate OS jitter.
+ b. To the extent possible, keep the CPU out of the kernel
+ when it is non-idle, for example, by avoiding system
+ calls and by forcing both kernel threads and interrupts
+ to execute elsewhere.
+
+2. Enable RCU to do its processing remotely via dyntick-idle by
+ doing all of the following:
+
+ a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
+ b. Ensure that the CPU goes idle frequently, allowing other
+ CPUs to detect that it has passed through an RCU quiescent
+ state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
+ userspace execution also allows other CPUs to detect that
+ the CPU in question has passed through a quiescent state.
+ c. To the extent possible, keep the CPU out of the kernel
+ when it is non-idle, for example, by avoiding system
+ calls and by forcing both kernel threads and interrupts
+ to execute elsewhere.
+
+Name:
+ kworker/%u:%d%s (cpu, id, priority)
+
+Purpose:
+ Execute workqueue requests
+
+To reduce its OS jitter, do any of the following:
+
+1. Run your workload at a real-time priority, which will allow
+ preempting the kworker daemons.
+2. A given workqueue can be made visible in the sysfs filesystem
+ by passing the WQ_SYSFS to that workqueue's alloc_workqueue().
+ Such a workqueue can be confined to a given subset of the
+ CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
+ files. The set of WQ_SYSFS workqueues can be displayed using
+ "ls sys/devices/virtual/workqueue". That said, the workqueues
+ maintainer would like to caution people against indiscriminately
+ sprinkling WQ_SYSFS across all the workqueues. The reason for
+ caution is that it is easy to add WQ_SYSFS, but because sysfs is
+ part of the formal user/kernel API, it can be nearly impossible
+ to remove it, even if its addition was a mistake.
+3. Do any of the following needed to avoid jitter that your
+ application cannot tolerate:
+
+ a. Build your kernel with CONFIG_SLUB=y rather than
+ CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
+ use of each CPU's workqueues to run its cache_reap()
+ function.
+ b. Avoid using oprofile, thus avoiding OS jitter from
+ wq_sync_buffer().
+ c. Limit your CPU frequency so that a CPU-frequency
+ governor is not required, possibly enlisting the aid of
+ special heatsinks or other cooling technologies. If done
+ correctly, and if you CPU architecture permits, you should
+ be able to build your kernel with CONFIG_CPU_FREQ=n to
+ avoid the CPU-frequency governor periodically running
+ on each CPU, including cs_dbs_timer() and od_dbs_timer().
+
+ WARNING: Please check your CPU specifications to
+ make sure that this is safe on your particular system.
+ d. As of v3.18, Christoph Lameter's on-demand vmstat workers
+ commit prevents OS jitter due to vmstat_update() on
+ CONFIG_SMP=y systems. Before v3.18, is not possible
+ to entirely get rid of the OS jitter, but you can
+ decrease its frequency by writing a large value to
+ /proc/sys/vm/stat_interval. The default value is HZ,
+ for an interval of one second. Of course, larger values
+ will make your virtual-memory statistics update more
+ slowly. Of course, you can also run your workload at
+ a real-time priority, thus preempting vmstat_update(),
+ but if your workload is CPU-bound, this is a bad idea.
+ However, there is an RFC patch from Christoph Lameter
+ (based on an earlier one from Gilad Ben-Yossef) that
+ reduces or even eliminates vmstat overhead for some
+ workloads at https://lkml.org/lkml/2013/9/4/379.
+ e. Boot with "elevator=noop" to avoid workqueue use by
+ the block layer.
+ f. If running on high-end powerpc servers, build with
+ CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS
+ daemon from running on each CPU every second or so.
+ (This will require editing Kconfig files and will defeat
+ this platform's RAS functionality.) This avoids jitter
+ due to the rtas_event_scan() function.
+ WARNING: Please check your CPU specifications to
+ make sure that this is safe on your particular system.
+ g. If running on Cell Processor, build your kernel with
+ CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
+ spu_gov_work().
+ WARNING: Please check your CPU specifications to
+ make sure that this is safe on your particular system.
+ h. If running on PowerMAC, build your kernel with
+ CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
+ avoiding OS jitter from rackmeter_do_timer().
+
+Name:
+ rcuc/%u
+
+Purpose:
+ Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
+
+To reduce its OS jitter, do at least one of the following:
+
+1. Build the kernel with CONFIG_PREEMPT=n. This prevents these
+ kthreads from being created in the first place, and also obviates
+ the need for RCU priority boosting. This approach is feasible
+ for workloads that do not require high degrees of responsiveness.
+2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these
+ kthreads from being created in the first place. This approach
+ is feasible only if your workload never requires RCU priority
+ boosting, for example, if you ensure frequent idle time on all
+ CPUs that might execute within the kernel.
+3. Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
+ boot parameter offloading RCU callbacks from all CPUs susceptible
+ to OS jitter. This approach prevents the rcuc/%u kthreads from
+ having any work to do, so that they are never awakened.
+4. Ensure that the CPU never enters the kernel, and, in particular,
+ avoid initiating any CPU hotplug operations on this CPU. This is
+ another way of preventing any callbacks from being queued on the
+ CPU, again preventing the rcuc/%u kthreads from having any work
+ to do.
+
+Name:
+ rcuop/%d and rcuos/%d
+
+Purpose:
+ Offload RCU callbacks from the corresponding CPU.
+
+To reduce its OS jitter, do at least one of the following:
+
+1. Use affinity, cgroups, or other mechanism to force these kthreads
+ to execute on some other CPU.
+2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
+ kthreads from being created in the first place. However, please
+ note that this will not eliminate OS jitter, but will instead
+ shift it to RCU_SOFTIRQ.
+
+Name:
+ watchdog/%u
+
+Purpose:
+ Detect software lockups on each CPU.
+
+To reduce its OS jitter, do at least one of the following:
+
+1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
+ kthreads from being created in the first place.
+2. Boot with "nosoftlockup=0", which will also prevent these kthreads
+ from being created. Other related watchdog and softlockup boot
+ parameters may be found in Documentation/admin-guide/kernel-parameters.rst
+ and Documentation/watchdog/watchdog-parameters.rst.
+3. Echo a zero to /proc/sys/kernel/watchdog to disable the
+ watchdog timer.
+4. Echo a large number of /proc/sys/kernel/watchdog_thresh in
+ order to reduce the frequency of OS jitter due to the watchdog
+ timer down to a level that is acceptable for your workload.
diff --git a/Documentation/admin-guide/lcd-panel-cgram.rst b/Documentation/admin-guide/lcd-panel-cgram.rst
new file mode 100644
index 000000000000..a3eb00c62f53
--- /dev/null
+++ b/Documentation/admin-guide/lcd-panel-cgram.rst
@@ -0,0 +1,27 @@
+======================================
+Parallel port LCD/Keypad Panel support
+======================================
+
+Some LCDs allow you to define up to 8 characters, mapped to ASCII
+characters 0 to 7. The escape code to define a new character is
+'\e[LG' followed by one digit from 0 to 7, representing the character
+number, and up to 8 couples of hex digits terminated by a semi-colon
+(';'). Each couple of digits represents a line, with 1-bits for each
+illuminated pixel with LSB on the right. Lines are numbered from the
+top of the character to the bottom. On a 5x7 matrix, only the 5 lower
+bits of the 7 first bytes are used for each character. If the string
+is incomplete, only complete lines will be redefined. Here are some
+examples::
+
+ printf "\e[LG0010101050D1F0C04;" => 0 = [enter]
+ printf "\e[LG1040E1F0000000000;" => 1 = [up]
+ printf "\e[LG2000000001F0E0400;" => 2 = [down]
+ printf "\e[LG3040E1F001F0E0400;" => 3 = [up-down]
+ printf "\e[LG40002060E1E0E0602;" => 4 = [left]
+ printf "\e[LG500080C0E0F0E0C08;" => 5 = [right]
+ printf "\e[LG60016051516141400;" => 6 = "IP"
+
+ printf "\e[LG00103071F1F070301;" => big speaker
+ printf "\e[LG00002061E1E060200;" => small speaker
+
+Willy
diff --git a/Documentation/admin-guide/ldm.rst b/Documentation/admin-guide/ldm.rst
new file mode 100644
index 000000000000..12c571368e73
--- /dev/null
+++ b/Documentation/admin-guide/ldm.rst
@@ -0,0 +1,121 @@
+==========================================
+LDM - Logical Disk Manager (Dynamic Disks)
+==========================================
+
+:Author: Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>.
+:Last Updated: Anton Altaparmakov on 30 March 2007 for Windows Vista.
+
+Overview
+--------
+
+Windows 2000, XP, and Vista use a new partitioning scheme. It is a complete
+replacement for the MSDOS style partitions. It stores its information in a
+1MiB journalled database at the end of the physical disk. The size of
+partitions is limited only by disk space. The maximum number of partitions is
+nearly 2000.
+
+Any partitions created under the LDM are called "Dynamic Disks". There are no
+longer any primary or extended partitions. Normal MSDOS style partitions are
+now known as Basic Disks.
+
+If you wish to use Spanned, Striped, Mirrored or RAID 5 Volumes, you must use
+Dynamic Disks. The journalling allows Windows to make changes to these
+partitions and filesystems without the need to reboot.
+
+Once the LDM driver has divided up the disk, you can use the MD driver to
+assemble any multi-partition volumes, e.g. Stripes, RAID5.
+
+To prevent legacy applications from repartitioning the disk, the LDM creates a
+dummy MSDOS partition containing one disk-sized partition. This is what is
+supported with the Linux LDM driver.
+
+A newer approach that has been implemented with Vista is to put LDM on top of a
+GPT label disk. This is not supported by the Linux LDM driver yet.
+
+
+Example
+-------
+
+Below we have a 50MiB disk, divided into seven partitions.
+
+.. note::
+
+ The missing 1MiB at the end of the disk is where the LDM database is
+ stored.
+
++-------++--------------+---------+-----++--------------+---------+----+
+|Device || Offset Bytes | Sectors | MiB || Size Bytes | Sectors | MiB|
++=======++==============+=========+=====++==============+=========+====+
+|hda || 0 | 0 | 0 || 52428800 | 102400 | 50|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda1 || 51380224 | 100352 | 49 || 1048576 | 2048 | 1|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda2 || 16384 | 32 | 0 || 6979584 | 13632 | 6|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda3 || 6995968 | 13664 | 6 || 10485760 | 20480 | 10|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda4 || 17481728 | 34144 | 16 || 4194304 | 8192 | 4|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda5 || 21676032 | 42336 | 20 || 5242880 | 10240 | 5|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda6 || 26918912 | 52576 | 25 || 10485760 | 20480 | 10|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda7 || 37404672 | 73056 | 35 || 13959168 | 27264 | 13|
++-------++--------------+---------+-----++--------------+---------+----+
+
+The LDM Database may not store the partitions in the order that they appear on
+disk, but the driver will sort them.
+
+When Linux boots, you will see something like::
+
+ hda: 102400 sectors w/32KiB Cache, CHS=50/64/32
+ hda: [LDM] hda1 hda2 hda3 hda4 hda5 hda6 hda7
+
+
+Compiling LDM Support
+---------------------
+
+To enable LDM, choose the following two options:
+
+ - "Advanced partition selection" CONFIG_PARTITION_ADVANCED
+ - "Windows Logical Disk Manager (Dynamic Disk) support" CONFIG_LDM_PARTITION
+
+If you believe the driver isn't working as it should, you can enable the extra
+debugging code. This will produce a LOT of output. The option is:
+
+ - "Windows LDM extra logging" CONFIG_LDM_DEBUG
+
+N.B. The partition code cannot be compiled as a module.
+
+As with all the partition code, if the driver doesn't see signs of its type of
+partition, it will pass control to another driver, so there is no harm in
+enabling it.
+
+If you have Dynamic Disks but don't enable the driver, then all you will see
+is a dummy MSDOS partition filling the whole disk. You won't be able to mount
+any of the volumes on the disk.
+
+
+Booting
+-------
+
+If you enable LDM support, then lilo is capable of booting from any of the
+discovered partitions. However, grub does not understand the LDM partitioning
+and cannot boot from a Dynamic Disk.
+
+
+More Documentation
+------------------
+
+There is an Overview of the LDM together with complete Technical Documentation.
+It is available for download.
+
+ http://www.linux-ntfs.org/
+
+If you have any LDM questions that aren't answered in the documentation, email
+me.
+
+Cheers,
+ FlatCap - Richard Russon
+ ldm@flatcap.org
+
diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
new file mode 100644
index 000000000000..290840c160af
--- /dev/null
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -0,0 +1,83 @@
+===============================================================
+Softlockup detector and hardlockup detector (aka nmi_watchdog)
+===============================================================
+
+The Linux kernel can act as a watchdog to detect both soft and hard
+lockups.
+
+A 'softlockup' is defined as a bug that causes the kernel to loop in
+kernel mode for more than 20 seconds (see "Implementation" below for
+details), without giving other tasks a chance to run. The current
+stack trace is displayed upon detection and, by default, the system
+will stay locked up. Alternatively, the kernel can be configured to
+panic; a sysctl, "kernel.softlockup_panic", a kernel parameter,
+"softlockup_panic" (see "Documentation/admin-guide/kernel-parameters.rst" for
+details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are
+provided for this.
+
+A 'hardlockup' is defined as a bug that causes the CPU to loop in
+kernel mode for more than 10 seconds (see "Implementation" below for
+details), without letting other interrupts have a chance to run.
+Similarly to the softlockup case, the current stack trace is displayed
+upon detection and the system will stay locked up unless the default
+behavior is changed, which can be done through a sysctl,
+'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC",
+and a kernel parameter, "nmi_watchdog"
+(see "Documentation/admin-guide/kernel-parameters.rst" for details).
+
+The panic option can be used in combination with panic_timeout (this
+timeout is set through the confusingly named "kernel.panic" sysctl),
+to cause the system to reboot automatically after a specified amount
+of time.
+
+Implementation
+==============
+
+The soft and hard lockup detectors are built on top of the hrtimer and
+perf subsystems, respectively. A direct consequence of this is that,
+in principle, they should work in any architecture where these
+subsystems are present.
+
+A periodic hrtimer runs to generate interrupts and kick the watchdog
+task. An NMI perf event is generated every "watchdog_thresh"
+(compile-time initialized to 10 and configurable through sysctl of the
+same name) seconds to check for hardlockups. If any CPU in the system
+does not receive any hrtimer interrupt during that time the
+'hardlockup detector' (the handler for the NMI perf event) will
+generate a kernel warning or call panic, depending on the
+configuration.
+
+The watchdog task is a high priority kernel thread that updates a
+timestamp every time it is scheduled. If that timestamp is not updated
+for 2*watchdog_thresh seconds (the softlockup threshold) the
+'softlockup detector' (coded inside the hrtimer callback function)
+will dump useful debug information to the system log, after which it
+will call panic if it was instructed to do so or resume execution of
+other kernel code.
+
+The period of the hrtimer is 2*watchdog_thresh/5, which means it has
+two or three chances to generate an interrupt before the hardlockup
+detector kicks in.
+
+As explained above, a kernel knob is provided that allows
+administrators to configure the period of the hrtimer and the perf
+event. The right value for a particular environment is a trade-off
+between fast response to lockups and detection overhead.
+
+By default, the watchdog runs on all online cores. However, on a
+kernel configured with NO_HZ_FULL, by default the watchdog runs only
+on the housekeeping cores, not the cores specified in the "nohz_full"
+boot argument. If we allowed the watchdog to run by default on
+the "nohz_full" cores, we would have to run timer ticks to activate
+the scheduler, which would prevent the "nohz_full" functionality
+from protecting the user code on those cores from the kernel.
+Of course, disabling it by default on the nohz_full cores means that
+when those cores do enter the kernel, by default we will not be
+able to detect if they lock up. However, allowing the watchdog
+to continue to run on the housekeeping (non-tickless) cores means
+that we will continue to detect lockups properly on those cores.
+
+In either case, the set of cores excluded from running the watchdog
+may be adjusted via the kernel.watchdog_cpumask sysctl. For
+nohz_full cores, this may be useful for debugging a case where the
+kernel seems to be hanging on the nohz_full cores.
diff --git a/Documentation/admin-guide/mm/cma_debugfs.rst b/Documentation/admin-guide/mm/cma_debugfs.rst
new file mode 100644
index 000000000000..4e06ffabd78a
--- /dev/null
+++ b/Documentation/admin-guide/mm/cma_debugfs.rst
@@ -0,0 +1,25 @@
+=====================
+CMA Debugfs Interface
+=====================
+
+The CMA debugfs interface is useful to retrieve basic information out of the
+different CMA areas and to test allocation/release in each of the areas.
+
+Each CMA zone represents a directory under <debugfs>/cma/, indexed by the
+kernel's CMA index. So the first CMA zone would be:
+
+ <debugfs>/cma/cma-0
+
+The structure of the files created under that directory is as follows:
+
+ - [RO] base_pfn: The base PFN (Page Frame Number) of the zone.
+ - [RO] count: Amount of memory in the CMA area.
+ - [RO] order_per_bit: Order of pages represented by one bit.
+ - [RO] bitmap: The bitmap of page states in the zone.
+ - [WO] alloc: Allocate N pages from that CMA area. For example::
+
+ echo 5 > <debugfs>/cma/cma-2/alloc
+
+would try to allocate 5 pages from the cma-2 area.
+
+ - [WO] free: Free N pages from that CMA area, similar to the above.
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 5f61a6c429e0..11db46448354 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -26,6 +26,7 @@ the Linux memory management.
:maxdepth: 1
concepts
+ cma_debugfs
hugetlbpage
idle_page_tracking
ksm
diff --git a/Documentation/admin-guide/numastat.rst b/Documentation/admin-guide/numastat.rst
new file mode 100644
index 000000000000..aaf1667489f8
--- /dev/null
+++ b/Documentation/admin-guide/numastat.rst
@@ -0,0 +1,30 @@
+===============================
+Numa policy hit/miss statistics
+===============================
+
+/sys/devices/system/node/node*/numastat
+
+All units are pages. Hugepages have separate counters.
+
+=============== ============================================================
+numa_hit A process wanted to allocate memory from this node,
+ and succeeded.
+
+numa_miss A process wanted to allocate memory from another node,
+ but ended up with memory from this node.
+
+numa_foreign A process wanted to allocate on this node,
+ but ended up with memory from another one.
+
+local_node A process ran on this node and got memory from it.
+
+other_node A process ran on this node and got memory from another node.
+
+interleave_hit Interleaving wanted to allocate from this node
+ and succeeded.
+=============== ============================================================
+
+For easier reading you can use the numastat utility from the numactl package
+(http://oss.sgi.com/projects/libnuma/). Note that it only works
+well right now on machines with a small number of CPUs.
+
diff --git a/Documentation/admin-guide/pnp.rst b/Documentation/admin-guide/pnp.rst
new file mode 100644
index 000000000000..bab2d10631f0
--- /dev/null
+++ b/Documentation/admin-guide/pnp.rst
@@ -0,0 +1,292 @@
+=================================
+Linux Plug and Play Documentation
+=================================
+
+:Author: Adam Belay <ambx1@neo.rr.com>
+:Last updated: Oct. 16, 2002
+
+
+Overview
+--------
+
+Plug and Play provides a means of detecting and setting resources for legacy or
+otherwise unconfigurable devices. The Linux Plug and Play Layer provides these
+services to compatible drivers.
+
+
+The User Interface
+------------------
+
+The Linux Plug and Play user interface provides a means to activate PnP devices
+for legacy and user level drivers that do not support Linux Plug and Play. The
+user interface is integrated into sysfs.
+
+In addition to the standard sysfs file the following are created in each
+device's directory:
+- id - displays a list of support EISA IDs
+- options - displays possible resource configurations
+- resources - displays currently allocated resources and allows resource changes
+
+activating a device
+^^^^^^^^^^^^^^^^^^^
+
+::
+
+ # echo "auto" > resources
+
+this will invoke the automatic resource config system to activate the device
+
+manually activating a device
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+ # echo "manual <depnum> <mode>" > resources
+
+ <depnum> - the configuration number
+ <mode> - static or dynamic
+ static = for next boot
+ dynamic = now
+
+disabling a device
+^^^^^^^^^^^^^^^^^^
+
+::
+
+ # echo "disable" > resources
+
+
+EXAMPLE:
+
+Suppose you need to activate the floppy disk controller.
+
+1. change to the proper directory, in my case it is
+ /driver/bus/pnp/devices/00:0f::
+
+ # cd /driver/bus/pnp/devices/00:0f
+ # cat name
+ PC standard floppy disk controller
+
+2. check if the device is already active::
+
+ # cat resources
+ DISABLED
+
+ - Notice the string "DISABLED". This means the device is not active.
+
+3. check the device's possible configurations (optional)::
+
+ # cat options
+ Dependent: 01 - Priority acceptable
+ port 0x3f0-0x3f0, align 0x7, size 0x6, 16-bit address decoding
+ port 0x3f7-0x3f7, align 0x0, size 0x1, 16-bit address decoding
+ irq 6
+ dma 2 8-bit compatible
+ Dependent: 02 - Priority acceptable
+ port 0x370-0x370, align 0x7, size 0x6, 16-bit address decoding
+ port 0x377-0x377, align 0x0, size 0x1, 16-bit address decoding
+ irq 6
+ dma 2 8-bit compatible
+
+4. now activate the device::
+
+ # echo "auto" > resources
+
+5. finally check if the device is active::
+
+ # cat resources
+ io 0x3f0-0x3f5
+ io 0x3f7-0x3f7
+ irq 6
+ dma 2
+
+also there are a series of kernel parameters::
+
+ pnp_reserve_irq=irq1[,irq2] ....
+ pnp_reserve_dma=dma1[,dma2] ....
+ pnp_reserve_io=io1,size1[,io2,size2] ....
+ pnp_reserve_mem=mem1,size1[,mem2,size2] ....
+
+
+
+The Unified Plug and Play Layer
+-------------------------------
+
+All Plug and Play drivers, protocols, and services meet at a central location
+called the Plug and Play Layer. This layer is responsible for the exchange of
+information between PnP drivers and PnP protocols. Thus it automatically
+forwards commands to the proper protocol. This makes writing PnP drivers
+significantly easier.
+
+The following functions are available from the Plug and Play Layer:
+
+pnp_get_protocol
+ increments the number of uses by one
+
+pnp_put_protocol
+ deincrements the number of uses by one
+
+pnp_register_protocol
+ use this to register a new PnP protocol
+
+pnp_unregister_protocol
+ use this function to remove a PnP protocol from the Plug and Play Layer
+
+pnp_register_driver
+ adds a PnP driver to the Plug and Play Layer
+
+ this includes driver model integration
+ returns zero for success or a negative error number for failure; count
+ calls to the .add() method if you need to know how many devices bind to
+ the driver
+
+pnp_unregister_driver
+ removes a PnP driver from the Plug and Play Layer
+
+
+
+Plug and Play Protocols
+-----------------------
+
+This section contains information for PnP protocol developers.
+
+The following Protocols are currently available in the computing world:
+
+- PNPBIOS:
+ used for system devices such as serial and parallel ports.
+- ISAPNP:
+ provides PnP support for the ISA bus
+- ACPI:
+ among its many uses, ACPI provides information about system level
+ devices.
+
+It is meant to replace the PNPBIOS. It is not currently supported by Linux
+Plug and Play but it is planned to be in the near future.
+
+
+Requirements for a Linux PnP protocol:
+1. the protocol must use EISA IDs
+2. the protocol must inform the PnP Layer of a device's current configuration
+
+- the ability to set resources is optional but preferred.
+
+The following are PnP protocol related functions:
+
+pnp_add_device
+ use this function to add a PnP device to the PnP layer
+
+ only call this function when all wanted values are set in the pnp_dev
+ structure
+
+pnp_init_device
+ call this to initialize the PnP structure
+
+pnp_remove_device
+ call this to remove a device from the Plug and Play Layer.
+ it will fail if the device is still in use.
+ automatically will free mem used by the device and related structures
+
+pnp_add_id
+ adds an EISA ID to the list of supported IDs for the specified device
+
+For more information consult the source of a protocol such as
+/drivers/pnp/pnpbios/core.c.
+
+
+
+Linux Plug and Play Drivers
+---------------------------
+
+This section contains information for Linux PnP driver developers.
+
+The New Way
+^^^^^^^^^^^
+
+1. first make a list of supported EISA IDS
+
+ ex::
+
+ static const struct pnp_id pnp_dev_table[] = {
+ /* Standard LPT Printer Port */
+ {.id = "PNP0400", .driver_data = 0},
+ /* ECP Printer Port */
+ {.id = "PNP0401", .driver_data = 0},
+ {.id = ""}
+ };
+
+ Please note that the character 'X' can be used as a wild card in the function
+ portion (last four characters).
+
+ ex::
+
+ /* Unknown PnP modems */
+ { "PNPCXXX", UNKNOWN_DEV },
+
+ Supported PnP card IDs can optionally be defined.
+ ex::
+
+ static const struct pnp_id pnp_card_table[] = {
+ { "ANYDEVS", 0 },
+ { "", 0 }
+ };
+
+2. Optionally define probe and remove functions. It may make sense not to
+ define these functions if the driver already has a reliable method of detecting
+ the resources, such as the parport_pc driver.
+
+ ex::
+
+ static int
+ serial_pnp_probe(struct pnp_dev * dev, const struct pnp_id *card_id, const
+ struct pnp_id *dev_id)
+ {
+ . . .
+
+ ex::
+
+ static void serial_pnp_remove(struct pnp_dev * dev)
+ {
+ . . .
+
+ consult /drivers/serial/8250_pnp.c for more information.
+
+3. create a driver structure
+
+ ex::
+
+ static struct pnp_driver serial_pnp_driver = {
+ .name = "serial",
+ .card_id_table = pnp_card_table,
+ .id_table = pnp_dev_table,
+ .probe = serial_pnp_probe,
+ .remove = serial_pnp_remove,
+ };
+
+ * name and id_table cannot be NULL.
+
+4. register the driver
+
+ ex::
+
+ static int __init serial8250_pnp_init(void)
+ {
+ return pnp_register_driver(&serial_pnp_driver);
+ }
+
+The Old Way
+^^^^^^^^^^^
+
+A series of compatibility functions have been created to make it easy to convert
+ISAPNP drivers. They should serve as a temporary solution only.
+
+They are as follows::
+
+ struct pnp_card *pnp_find_card(unsigned short vendor,
+ unsigned short device,
+ struct pnp_card *from)
+
+ struct pnp_dev *pnp_find_dev(struct pnp_card *card,
+ unsigned short vendor,
+ unsigned short function,
+ struct pnp_dev *from)
+
diff --git a/Documentation/admin-guide/rtc.rst b/Documentation/admin-guide/rtc.rst
new file mode 100644
index 000000000000..688c95b11919
--- /dev/null
+++ b/Documentation/admin-guide/rtc.rst
@@ -0,0 +1,140 @@
+=======================================
+Real Time Clock (RTC) Drivers for Linux
+=======================================
+
+When Linux developers talk about a "Real Time Clock", they usually mean
+something that tracks wall clock time and is battery backed so that it
+works even with system power off. Such clocks will normally not track
+the local time zone or daylight savings time -- unless they dual boot
+with MS-Windows -- but will instead be set to Coordinated Universal Time
+(UTC, formerly "Greenwich Mean Time").
+
+The newest non-PC hardware tends to just count seconds, like the time(2)
+system call reports, but RTCs also very commonly represent time using
+the Gregorian calendar and 24 hour time, as reported by gmtime(3).
+
+Linux has two largely-compatible userspace RTC API families you may
+need to know about:
+
+ * /dev/rtc ... is the RTC provided by PC compatible systems,
+ so it's not very portable to non-x86 systems.
+
+ * /dev/rtc0, /dev/rtc1 ... are part of a framework that's
+ supported by a wide variety of RTC chips on all systems.
+
+Programmers need to understand that the PC/AT functionality is not
+always available, and some systems can do much more. That is, the
+RTCs use the same API to make requests in both RTC frameworks (using
+different filenames of course), but the hardware may not offer the
+same functionality. For example, not every RTC is hooked up to an
+IRQ, so they can't all issue alarms; and where standard PC RTCs can
+only issue an alarm up to 24 hours in the future, other hardware may
+be able to schedule one any time in the upcoming century.
+
+
+Old PC/AT-Compatible driver: /dev/rtc
+--------------------------------------
+
+All PCs (even Alpha machines) have a Real Time Clock built into them.
+Usually they are built into the chipset of the computer, but some may
+actually have a Motorola MC146818 (or clone) on the board. This is the
+clock that keeps the date and time while your computer is turned off.
+
+ACPI has standardized that MC146818 functionality, and extended it in
+a few ways (enabling longer alarm periods, and wake-from-hibernate).
+That functionality is NOT exposed in the old driver.
+
+However it can also be used to generate signals from a slow 2Hz to a
+relatively fast 8192Hz, in increments of powers of two. These signals
+are reported by interrupt number 8. (Oh! So *that* is what IRQ 8 is
+for...) It can also function as a 24hr alarm, raising IRQ 8 when the
+alarm goes off. The alarm can also be programmed to only check any
+subset of the three programmable values, meaning that it could be set to
+ring on the 30th second of the 30th minute of every hour, for example.
+The clock can also be set to generate an interrupt upon every clock
+update, thus generating a 1Hz signal.
+
+The interrupts are reported via /dev/rtc (major 10, minor 135, read only
+character device) in the form of an unsigned long. The low byte contains
+the type of interrupt (update-done, alarm-rang, or periodic) that was
+raised, and the remaining bytes contain the number of interrupts since
+the last read. Status information is reported through the pseudo-file
+/proc/driver/rtc if the /proc filesystem was enabled. The driver has
+built in locking so that only one process is allowed to have the /dev/rtc
+interface open at a time.
+
+A user process can monitor these interrupts by doing a read(2) or a
+select(2) on /dev/rtc -- either will block/stop the user process until
+the next interrupt is received. This is useful for things like
+reasonably high frequency data acquisition where one doesn't want to
+burn up 100% CPU by polling gettimeofday etc. etc.
+
+At high frequencies, or under high loads, the user process should check
+the number of interrupts received since the last read to determine if
+there has been any interrupt "pileup" so to speak. Just for reference, a
+typical 486-33 running a tight read loop on /dev/rtc will start to suffer
+occasional interrupt pileup (i.e. > 1 IRQ event since last read) for
+frequencies above 1024Hz. So you really should check the high bytes
+of the value you read, especially at frequencies above that of the
+normal timer interrupt, which is 100Hz.
+
+Programming and/or enabling interrupt frequencies greater than 64Hz is
+only allowed by root. This is perhaps a bit conservative, but we don't want
+an evil user generating lots of IRQs on a slow 386sx-16, where it might have
+a negative impact on performance. This 64Hz limit can be changed by writing
+a different value to /proc/sys/dev/rtc/max-user-freq. Note that the
+interrupt handler is only a few lines of code to minimize any possibility
+of this effect.
+
+Also, if the kernel time is synchronized with an external source, the
+kernel will write the time back to the CMOS clock every 11 minutes. In
+the process of doing this, the kernel briefly turns off RTC periodic
+interrupts, so be aware of this if you are doing serious work. If you
+don't synchronize the kernel time with an external source (via ntp or
+whatever) then the kernel will keep its hands off the RTC, allowing you
+exclusive access to the device for your applications.
+
+The alarm and/or interrupt frequency are programmed into the RTC via
+various ioctl(2) calls as listed in ./include/linux/rtc.h
+Rather than write 50 pages describing the ioctl() and so on, it is
+perhaps more useful to include a small test program that demonstrates
+how to use them, and demonstrates the features of the driver. This is
+probably a lot more useful to people interested in writing applications
+that will be using this driver. See the code at the end of this document.
+
+(The original /dev/rtc driver was written by Paul Gortmaker.)
+
+
+New portable "RTC Class" drivers: /dev/rtcN
+--------------------------------------------
+
+Because Linux supports many non-ACPI and non-PC platforms, some of which
+have more than one RTC style clock, it needed a more portable solution
+than expecting a single battery-backed MC146818 clone on every system.
+Accordingly, a new "RTC Class" framework has been defined. It offers
+three different userspace interfaces:
+
+ * /dev/rtcN ... much the same as the older /dev/rtc interface
+
+ * /sys/class/rtc/rtcN ... sysfs attributes support readonly
+ access to some RTC attributes.
+
+ * /proc/driver/rtc ... the system clock RTC may expose itself
+ using a procfs interface. If there is no RTC for the system clock,
+ rtc0 is used by default. More information is (currently) shown
+ here than through sysfs.
+
+The RTC Class framework supports a wide variety of RTCs, ranging from those
+integrated into embeddable system-on-chip (SOC) processors to discrete chips
+using I2C, SPI, or some other bus to communicate with the host CPU. There's
+even support for PC-style RTCs ... including the features exposed on newer PCs
+through ACPI.
+
+The new framework also removes the "one RTC per system" restriction. For
+example, maybe the low-power battery-backed RTC is a discrete I2C chip, but
+a high functionality RTC is integrated into the SOC. That system might read
+the system clock from the discrete RTC, but use the integrated one for all
+other tasks, because of its greater functionality.
+
+Check out tools/testing/selftests/rtc/rtctest.c for an example usage of the
+ioctl interface.
diff --git a/Documentation/admin-guide/svga.rst b/Documentation/admin-guide/svga.rst
new file mode 100644
index 000000000000..b6c2f9acca92
--- /dev/null
+++ b/Documentation/admin-guide/svga.rst
@@ -0,0 +1,249 @@
+.. include:: <isonum.txt>
+
+=================================
+Video Mode Selection Support 2.13
+=================================
+
+:Copyright: |copy| 1995--1999 Martin Mares, <mj@ucw.cz>
+
+Intro
+~~~~~
+
+This small document describes the "Video Mode Selection" feature which
+allows the use of various special video modes supported by the video BIOS. Due
+to usage of the BIOS, the selection is limited to boot time (before the
+kernel decompression starts) and works only on 80X86 machines.
+
+.. note::
+
+ Short intro for the impatient: Just use vga=ask for the first time,
+ enter ``scan`` on the video mode prompt, pick the mode you want to use,
+ remember its mode ID (the four-digit hexadecimal number) and then
+ set the vga parameter to this number (converted to decimal first).
+
+The video mode to be used is selected by a kernel parameter which can be
+specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..."
+option of LILO (or some other boot loader you use) or by the "vidmode" utility
+(present in standard Linux utility packages). You can use the following values
+of this parameter::
+
+ NORMAL_VGA - Standard 80x25 mode available on all display adapters.
+
+ EXTENDED_VGA - Standard 8-pixel font mode: 80x43 on EGA, 80x50 on VGA.
+
+ ASK_VGA - Display a video mode menu upon startup (see below).
+
+ 0..35 - Menu item number (when you have used the menu to view the list of
+ modes available on your adapter, you can specify the menu item you want
+ to use). 0..9 correspond to "0".."9", 10..35 to "a".."z". Warning: the
+ mode list displayed may vary as the kernel version changes, because the
+ modes are listed in a "first detected -- first displayed" manner. It's
+ better to use absolute mode numbers instead.
+
+ 0x.... - Hexadecimal video mode ID (also displayed on the menu, see below
+ for exact meaning of the ID). Warning: rdev and LILO don't support
+ hexadecimal numbers -- you have to convert it to decimal manually.
+
+Menu
+~~~~
+
+The ASK_VGA mode causes the kernel to offer a video mode menu upon
+bootup. It displays a "Press <RETURN> to see video modes available, <SPACE>
+to continue or wait 30 secs" message. If you press <RETURN>, you enter the
+menu, if you press <SPACE> or wait 30 seconds, the kernel will boot up in
+the standard 80x25 mode.
+
+The menu looks like::
+
+ Video adapter: <name-of-detected-video-adapter>
+ Mode: COLSxROWS:
+ 0 0F00 80x25
+ 1 0F01 80x50
+ 2 0F02 80x43
+ 3 0F03 80x26
+ ....
+ Enter mode number or ``scan``: <flashing-cursor-here>
+
+<name-of-detected-video-adapter> tells what video adapter did Linux detect
+-- it's either a generic adapter name (MDA, CGA, HGC, EGA, VGA, VESA VGA [a VGA
+with VESA-compliant BIOS]) or a chipset name (e.g., Trident). Direct detection
+of chipsets is turned off by default as it's inherently unreliable due to
+absolutely insane PC design.
+
+"0 0F00 80x25" means that the first menu item (the menu items are numbered
+from "0" to "9" and from "a" to "z") is a 80x25 mode with ID=0x0f00 (see the
+next section for a description of mode IDs).
+
+<flashing-cursor-here> encourages you to enter the item number or mode ID
+you wish to set and press <RETURN>. If the computer complains something about
+"Unknown mode ID", it is trying to tell you that it isn't possible to set such
+a mode. It's also possible to press only <RETURN> which leaves the current mode.
+
+The mode list usually contains a few basic modes and some VESA modes. In
+case your chipset has been detected, some chipset-specific modes are shown as
+well (some of these might be missing or unusable on your machine as different
+BIOSes are often shipped with the same card and the mode numbers depend purely
+on the VGA BIOS).
+
+The modes displayed on the menu are partially sorted: The list starts with
+the standard modes (80x25 and 80x50) followed by "special" modes (80x28 and
+80x43), local modes (if the local modes feature is enabled), VESA modes and
+finally SVGA modes for the auto-detected adapter.
+
+If you are not happy with the mode list offered (e.g., if you think your card
+is able to do more), you can enter "scan" instead of item number / mode ID. The
+program will try to ask the BIOS for all possible video mode numbers and test
+what happens then. The screen will be probably flashing wildly for some time and
+strange noises will be heard from inside the monitor and so on and then, really
+all consistent video modes supported by your BIOS will appear (plus maybe some
+``ghost modes``). If you are afraid this could damage your monitor, don't use
+this function.
+
+After scanning, the mode ordering is a bit different: the auto-detected SVGA
+modes are not listed at all and the modes revealed by ``scan`` are shown before
+all VESA modes.
+
+Mode IDs
+~~~~~~~~
+
+Because of the complexity of all the video stuff, the video mode IDs
+used here are also a bit complex. A video mode ID is a 16-bit number usually
+expressed in a hexadecimal notation (starting with "0x"). You can set a mode
+by entering its mode directly if you know it even if it isn't shown on the menu.
+
+The ID numbers can be divided to those regions::
+
+ 0x0000 to 0x00ff - menu item references. 0x0000 is the first item. Don't use
+ outside the menu as this can change from boot to boot (especially if you
+ have used the ``scan`` feature).
+
+ 0x0100 to 0x017f - standard BIOS modes. The ID is a BIOS video mode number
+ (as presented to INT 10, function 00) increased by 0x0100.
+
+ 0x0200 to 0x08ff - VESA BIOS modes. The ID is a VESA mode ID increased by
+ 0x0100. All VESA modes should be autodetected and shown on the menu.
+
+ 0x0900 to 0x09ff - Video7 special modes. Set by calling INT 0x10, AX=0x6f05.
+ (Usually 940=80x43, 941=132x25, 942=132x44, 943=80x60, 944=100x60,
+ 945=132x28 for the standard Video7 BIOS)
+
+ 0x0f00 to 0x0fff - special modes (they are set by various tricks -- usually
+ by modifying one of the standard modes). Currently available:
+ 0x0f00 standard 80x25, don't reset mode if already set (=FFFF)
+ 0x0f01 standard with 8-point font: 80x43 on EGA, 80x50 on VGA
+ 0x0f02 VGA 80x43 (VGA switched to 350 scanlines with a 8-point font)
+ 0x0f03 VGA 80x28 (standard VGA scans, but 14-point font)
+ 0x0f04 leave current video mode
+ 0x0f05 VGA 80x30 (480 scans, 16-point font)
+ 0x0f06 VGA 80x34 (480 scans, 14-point font)
+ 0x0f07 VGA 80x60 (480 scans, 8-point font)
+ 0x0f08 Graphics hack (see the VIDEO_GFX_HACK paragraph below)
+
+ 0x1000 to 0x7fff - modes specified by resolution. The code has a "0xRRCC"
+ form where RR is a number of rows and CC is a number of columns.
+ E.g., 0x1950 corresponds to a 80x25 mode, 0x2b84 to 132x43 etc.
+ This is the only fully portable way to refer to a non-standard mode,
+ but it relies on the mode being found and displayed on the menu
+ (remember that mode scanning is not done automatically).
+
+ 0xff00 to 0xffff - aliases for backward compatibility:
+ 0xffff equivalent to 0x0f00 (standard 80x25)
+ 0xfffe equivalent to 0x0f01 (EGA 80x43 or VGA 80x50)
+
+If you add 0x8000 to the mode ID, the program will try to recalculate
+vertical display timing according to mode parameters, which can be used to
+eliminate some annoying bugs of certain VGA BIOSes (usually those used for
+cards with S3 chipsets and old Cirrus Logic BIOSes) -- mainly extra lines at the
+end of the display.
+
+Options
+~~~~~~~
+
+Build options for arch/x86/boot/* are selected by the kernel kconfig
+utility and the kernel .config file.
+
+VIDEO_GFX_HACK - includes special hack for setting of graphics modes
+to be used later by special drivers.
+Allows to set _any_ BIOS mode including graphic ones and forcing specific
+text screen resolution instead of peeking it from BIOS variables. Don't use
+unless you think you know what you're doing. To activate this setup, use
+mode number 0x0f08 (see the Mode IDs section above).
+
+Still doesn't work?
+~~~~~~~~~~~~~~~~~~~
+
+When the mode detection doesn't work (e.g., the mode list is incorrect or
+the machine hangs instead of displaying the menu), try to switch off some of
+the configuration options listed under "Options". If it fails, you can still use
+your kernel with the video mode set directly via the kernel parameter.
+
+In either case, please send me a bug report containing what _exactly_
+happens and how do the configuration switches affect the behaviour of the bug.
+
+If you start Linux from M$-DOS, you might also use some DOS tools for
+video mode setting. In this case, you must specify the 0x0f04 mode ("leave
+current settings") to Linux, because if you don't and you use any non-standard
+mode, Linux will switch to 80x25 automatically.
+
+If you set some extended mode and there's one or more extra lines on the
+bottom of the display containing already scrolled-out text, your VGA BIOS
+contains the most common video BIOS bug called "incorrect vertical display
+end setting". Adding 0x8000 to the mode ID might fix the problem. Unfortunately,
+this must be done manually -- no autodetection mechanisms are available.
+
+History
+~~~~~~~
+
+=============== ================================================================
+1.0 (??-Nov-95) First version supporting all adapters supported by the old
+ setup.S + Cirrus Logic 54XX. Present in some 1.3.4? kernels
+ and then removed due to instability on some machines.
+2.0 (28-Jan-96) Rewritten from scratch. Cirrus Logic 64XX support added, almost
+ everything is configurable, the VESA support should be much more
+ stable, explicit mode numbering allowed, "scan" implemented etc.
+2.1 (30-Jan-96) VESA modes moved to 0x200-0x3ff. Mode selection by resolution
+ supported. Few bugs fixed. VESA modes are listed prior to
+ modes supplied by SVGA autodetection as they are more reliable.
+ CLGD autodetect works better. Doesn't depend on 80x25 being
+ active when started. Scanning fixed. 80x43 (any VGA) added.
+ Code cleaned up.
+2.2 (01-Feb-96) EGA 80x43 fixed. VESA extended to 0x200-0x4ff (non-standard 02XX
+ VESA modes work now). Display end bug workaround supported.
+ Special modes renumbered to allow adding of the "recalculate"
+ flag, 0xffff and 0xfffe became aliases instead of real IDs.
+ Screen contents retained during mode changes.
+2.3 (15-Mar-96) Changed to work with 1.3.74 kernel.
+2.4 (18-Mar-96) Added patches by Hans Lermen fixing a memory overwrite problem
+ with some boot loaders. Memory management rewritten to reflect
+ these changes. Unfortunately, screen contents retaining works
+ only with some loaders now.
+ Added a Tseng 132x60 mode.
+2.5 (19-Mar-96) Fixed a VESA mode scanning bug introduced in 2.4.
+2.6 (25-Mar-96) Some VESA BIOS errors not reported -- it fixes error reports on
+ several cards with broken VESA code (e.g., ATI VGA).
+2.7 (09-Apr-96) - Accepted all VESA modes in range 0x100 to 0x7ff, because some
+ cards use very strange mode numbers.
+ - Added Realtek VGA modes (thanks to Gonzalo Tornaria).
+ - Hardware testing order slightly changed, tests based on ROM
+ contents done as first.
+ - Added support for special Video7 mode switching functions
+ (thanks to Tom Vander Aa).
+ - Added 480-scanline modes (especially useful for notebooks,
+ original version written by hhanemaa@cs.ruu.nl, patched by
+ Jeff Chua, rewritten by me).
+ - Screen store/restore fixed.
+2.8 (14-Apr-96) - Previous release was not compilable without CONFIG_VIDEO_SVGA.
+ - Better recognition of text modes during mode scan.
+2.9 (12-May-96) - Ignored VESA modes 0x80 - 0xff (more VESA BIOS bugs!)
+2.10(11-Nov-96) - The whole thing made optional.
+ - Added the CONFIG_VIDEO_400_HACK switch.
+ - Added the CONFIG_VIDEO_GFX_HACK switch.
+ - Code cleanup.
+2.11(03-May-97) - Yet another cleanup, now including also the documentation.
+ - Direct testing of SVGA adapters turned off by default, ``scan``
+ offered explicitly on the prompt line.
+ - Removed the doc section describing adding of new probing
+ functions as I try to get rid of _all_ hardware probing here.
+2.12(25-May-98) Added support for VESA frame buffer graphics.
+2.13(14-May-99) Minor documentation fixes.
+=============== ================================================================
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index a0c1d4ce403a..032c7cd3cede 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -327,7 +327,7 @@ when a hard lockup is detected.
0 - don't panic on hard lockup
1 - panic on hard lockup
-See Documentation/lockup-watchdogs.txt for more information. This can
+See Documentation/admin-guide/lockup-watchdogs.rst for more information. This can
also be set using the nmi_watchdog kernel parameter.
diff --git a/Documentation/admin-guide/video-output.rst b/Documentation/admin-guide/video-output.rst
new file mode 100644
index 000000000000..56d6fa2e2368
--- /dev/null
+++ b/Documentation/admin-guide/video-output.rst
@@ -0,0 +1,34 @@
+Video Output Switcher Control
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+2006 luming.yu@intel.com
+
+The output sysfs class driver provides an abstract video output layer that
+can be used to hook platform specific methods to enable/disable video output
+device through common sysfs interface. For example, on my IBM ThinkPad T42
+laptop, The ACPI video driver registered its output devices and read/write
+method for 'state' with output sysfs class. The user interface under sysfs is::
+
+ linux:/sys/class/video_output # tree .
+ .
+ |-- CRT0
+ | |-- device -> ../../../devices/pci0000:00/0000:00:01.0
+ | |-- state
+ | |-- subsystem -> ../../../class/video_output
+ | `-- uevent
+ |-- DVI0
+ | |-- device -> ../../../devices/pci0000:00/0000:00:01.0
+ | |-- state
+ | |-- subsystem -> ../../../class/video_output
+ | `-- uevent
+ |-- LCD0
+ | |-- device -> ../../../devices/pci0000:00/0000:00:01.0
+ | |-- state
+ | |-- subsystem -> ../../../class/video_output
+ | `-- uevent
+ `-- TV0
+ |-- device -> ../../../devices/pci0000:00/0000:00:01.0
+ |-- state
+ |-- subsystem -> ../../../class/video_output
+ `-- uevent
+