diff options
Diffstat (limited to 'Documentation')
289 files changed, 8379 insertions, 5478 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX deleted file mode 100644 index 2754fe83f0d4..000000000000 --- a/Documentation/00-INDEX +++ /dev/null @@ -1,428 +0,0 @@ - -This is a brief list of all the files in ./linux/Documentation and what -they contain. If you add a documentation file, please list it here in -alphabetical order as well, or risk being hunted down like a rabid dog. -Please keep the descriptions small enough to fit on one line. - Thanks -- Paul G. - -Following translations are available on the WWW: - - - Japanese, maintained by the JF Project (jf@listserv.linux.or.jp), at - http://linuxjf.sourceforge.jp/ - -00-INDEX - - this file. -ABI/ - - info on kernel <-> userspace ABI and relative interface stability. -CodingStyle - - nothing here, just a pointer to process/coding-style.rst. -DMA-API.txt - - DMA API, pci_ API & extensions for non-consistent memory machines. -DMA-API-HOWTO.txt - - Dynamic DMA mapping Guide -DMA-ISA-LPC.txt - - How to do DMA with ISA (and LPC) devices. -DMA-attributes.txt - - listing of the various possible attributes a DMA region can have -EDID/ - - directory with info on customizing EDID for broken gfx/displays. -IPMI.txt - - info on Linux Intelligent Platform Management Interface (IPMI) Driver. -IRQ-affinity.txt - - how to select which CPU(s) handle which interrupt events on SMP. -IRQ-domain.txt - - info on interrupt numbering and setting up IRQ domains. -IRQ.txt - - description of what an IRQ is. -Intel-IOMMU.txt - - basic info on the Intel IOMMU virtualization support. -Makefile - - It's not of interest for those who aren't touching the build system. -PCI/ - - info related to PCI drivers. -RCU/ - - directory with info on RCU (read-copy update). -SAK.txt - - info on Secure Attention Keys. -SM501.txt - - Silicon Motion SM501 multimedia companion chip -SubmittingPatches - - nothing here, just a pointer to process/coding-style.rst. -accounting/ - - documentation on accounting and taskstats. -acpi/ - - info on ACPI-specific hooks in the kernel. -admin-guide/ - - info related to Linux users and system admins. -aoe/ - - description of AoE (ATA over Ethernet) along with config examples. -arm/ - - directory with info about Linux on the ARM architecture. -arm64/ - - directory with info about Linux on the 64 bit ARM architecture. -auxdisplay/ - - misc. LCD driver documentation (cfag12864b, ks0108). -backlight/ - - directory with info on controlling backlights in flat panel displays -block/ - - info on the Block I/O (BIO) layer. -blockdev/ - - info on block devices & drivers -bt8xxgpio.txt - - info on how to modify a bt8xx video card for GPIO usage. -btmrvl.txt - - info on Marvell Bluetooth driver usage. -bus-devices/ - - directory with info on TI GPMC (General Purpose Memory Controller) -bus-virt-phys-mapping.txt - - how to access I/O mapped memory from within device drivers. -cdrom/ - - directory with information on the CD-ROM drivers that Linux has. -cgroup-v1/ - - cgroups v1 features, including cpusets and memory controller. -cma/ - - Continuous Memory Area (CMA) debugfs interface. -conf.py - - It's not of interest for those who aren't touching the build system. -connector/ - - docs on the netlink based userspace<->kernel space communication mod. -console/ - - documentation on Linux console drivers. -core-api/ - - documentation on kernel core components. -cpu-freq/ - - info on CPU frequency and voltage scaling. -cpu-hotplug.txt - - document describing CPU hotplug support in the Linux kernel. -cpu-load.txt - - document describing how CPU load statistics are collected. -cpuidle/ - - info on CPU_IDLE, CPU idle state management subsystem. -cputopology.txt - - documentation on how CPU topology info is exported via sysfs. -crc32.txt - - brief tutorial on CRC computation -crypto/ - - directory with info on the Crypto API. -dcdbas.txt - - information on the Dell Systems Management Base Driver. -debugging-modules.txt - - some notes on debugging modules after Linux 2.6.3. -debugging-via-ohci1394.txt - - how to use firewire like a hardware debugger memory reader. -dell_rbu.txt - - document demonstrating the use of the Dell Remote BIOS Update driver. -dev-tools/ - - directory with info on development tools for the kernel. -device-mapper/ - - directory with info on Device Mapper. -dmaengine/ - - the DMA engine and controller API guides. -devicetree/ - - directory with info on device tree files used by OF/PowerPC/ARM -digsig.txt - -info on the Digital Signature Verification API -dma-buf-sharing.txt - - the DMA Buffer Sharing API Guide -docutils.conf - - nothing here. Just a configuration file for docutils. -dontdiff - - file containing a list of files that should never be diff'ed. -driver-api/ - - the Linux driver implementer's API guide. -driver-model/ - - directory with info about Linux driver model. -early-userspace/ - - info about initramfs, klibc, and userspace early during boot. -efi-stub.txt - - How to use the EFI boot stub to bypass GRUB or elilo on EFI systems. -eisa.txt - - info on EISA bus support. -extcon/ - - directory with porting guide for Android kernel switch driver. -isa.txt - - info on EISA bus support. -fault-injection/ - - dir with docs about the fault injection capabilities infrastructure. -fb/ - - directory with info on the frame buffer graphics abstraction layer. -features/ - - status of feature implementation on different architectures. -filesystems/ - - info on the vfs and the various filesystems that Linux supports. -firmware_class/ - - request_firmware() hotplug interface info. -flexible-arrays.txt - - how to make use of flexible sized arrays in linux -fmc/ - - information about the FMC bus abstraction -fpga/ - - FPGA Manager Core. -futex-requeue-pi.txt - - info on requeueing of tasks from a non-PI futex to a PI futex -gcc-plugins.txt - - GCC plugin infrastructure. -gpio/ - - gpio related documentation -gpu/ - - directory with information on GPU driver developer's guide. -hid/ - - directory with information on human interface devices -highuid.txt - - notes on the change from 16 bit to 32 bit user/group IDs. -hwspinlock.txt - - hardware spinlock provides hardware assistance for synchronization -timers/ - - info on the timer related topics -hw_random.txt - - info on Linux support for random number generator in i8xx chipsets. -hwmon/ - - directory with docs on various hardware monitoring drivers. -i2c/ - - directory with info about the I2C bus/protocol (2 wire, kHz speed). -x86/i386/ - - directory with info about Linux on Intel 32 bit architecture. -ia64/ - - directory with info about Linux on Intel 64 bit architecture. -ide/ - - Information regarding the Enhanced IDE drive. -iio/ - - info on industrial IIO configfs support. -index.rst - - main index for the documentation at ReST format. -infiniband/ - - directory with documents concerning Linux InfiniBand support. -input/ - - info on Linux input device support. -intel_txt.txt - - info on intel Trusted Execution Technology (intel TXT). -io-mapping.txt - - description of io_mapping functions in linux/io-mapping.h -io_ordering.txt - - info on ordering I/O writes to memory-mapped addresses. -ioctl/ - - directory with documents describing various IOCTL calls. -iostats.txt - - info on I/O statistics Linux kernel provides. -irqflags-tracing.txt - - how to use the irq-flags tracing feature. -isapnp.txt - - info on Linux ISA Plug & Play support. -isdn/ - - directory with info on the Linux ISDN support, and supported cards. -kbuild/ - - directory with info about the kernel build process. -kdump/ - - directory with mini HowTo on getting the crash dump code to work. -doc-guide/ - - how to write and format reStructuredText kernel documentation -kernel-per-CPU-kthreads.txt - - List of all per-CPU kthreads and how they introduce jitter. -kobject.txt - - info of the kobject infrastructure of the Linux kernel. -kprobes.txt - - documents the kernel probes debugging feature. -kref.txt - - docs on adding reference counters (krefs) to kernel objects. -laptops/ - - directory with laptop related info and laptop driver documentation. -ldm.txt - - a brief description of LDM (Windows Dynamic Disks). -leds/ - - directory with info about LED handling under Linux. -livepatch/ - - info on kernel live patching. -locking/ - - directory with info about kernel locking primitives -lockup-watchdogs.txt - - info on soft and hard lockup detectors (aka nmi_watchdog). -logo.gif - - full colour GIF image of Linux logo (penguin - Tux). -logo.txt - - info on creator of above logo & site to get additional images from. -lsm.txt - - Linux Security Modules: General Security Hooks for Linux -lzo.txt - - kernel LZO decompressor input formats -m68k/ - - directory with info about Linux on Motorola 68k architecture. -mailbox.txt - - How to write drivers for the common mailbox framework (IPC). -md/ - - directory with info about Linux Software RAID -media/ - - info on media drivers: uAPI, kAPI and driver documentation. -memory-barriers.txt - - info on Linux kernel memory barriers. -memory-devices/ - - directory with info on parts like the Texas Instruments EMIF driver -memory-hotplug.txt - - Hotpluggable memory support, how to use and current status. -men-chameleon-bus.txt - - info on MEN chameleon bus. -mic/ - - Intel Many Integrated Core (MIC) architecture device driver. -mips/ - - directory with info about Linux on MIPS architecture. -misc-devices/ - - directory with info about devices using the misc dev subsystem -mmc/ - - directory with info about the MMC subsystem -mtd/ - - directory with info about memory technology devices (flash) -namespaces/ - - directory with various information about namespaces -netlabel/ - - directory with information on the NetLabel subsystem. -networking/ - - directory with info on various aspects of networking with Linux. -nfc/ - - directory relating info about Near Field Communications support. -nios2/ - - Linux on the Nios II architecture. -nommu-mmap.txt - - documentation about no-mmu memory mapping support. -numastat.txt - - info on how to read Numa policy hit/miss statistics in sysfs. -ntb.txt - - info on Non-Transparent Bridge (NTB) drivers. -nvdimm/ - - info on non-volatile devices. -nvmem/ - - info on non volatile memory framework. -output/ - - default directory where html/LaTeX/pdf files will be written. -padata.txt - - An introduction to the "padata" parallel execution API -parisc/ - - directory with info on using Linux on PA-RISC architecture. -parport-lowlevel.txt - - description and usage of the low level parallel port functions. -pcmcia/ - - info on the Linux PCMCIA driver. -percpu-rw-semaphore.txt - - RCU based read-write semaphore optimized for locking for reading -perf/ - - info about the APM X-Gene SoC Performance Monitoring Unit (PMU). -phy/ - - ino on Samsung USB 2.0 PHY adaptation layer. -phy.txt - - Description of the generic PHY framework. -pi-futex.txt - - documentation on lightweight priority inheritance futexes. -pinctrl.txt - - info on pinctrl subsystem and the PINMUX/PINCONF and drivers -platform/ - - List of supported hardware by compal and Dell laptop. -pnp.txt - - Linux Plug and Play documentation. -power/ - - directory with info on Linux PCI power management. -powerpc/ - - directory with info on using Linux with the PowerPC. -prctl/ - - directory with info on the priveledge control subsystem -preempt-locking.txt - - info on locking under a preemptive kernel. -process/ - - how to work with the mainline kernel development process. -pps/ - - directory with information on the pulse-per-second support -pti/ - - directory with info on Intel MID PTI. -ptp/ - - directory with info on support for IEEE 1588 PTP clocks in Linux. -pwm.txt - - info on the pulse width modulation driver subsystem -rapidio/ - - directory with info on RapidIO packet-based fabric interconnect -rbtree.txt - - info on what red-black trees are and what they are for. -remoteproc.txt - - info on how to handle remote processor (e.g. AMP) offloads/usage. -rfkill.txt - - info on the radio frequency kill switch subsystem/support. -robust-futex-ABI.txt - - documentation of the robust futex ABI. -robust-futexes.txt - - a description of what robust futexes are. -rpmsg.txt - - info on the Remote Processor Messaging (rpmsg) Framework -rtc.txt - - notes on how to use the Real Time Clock (aka CMOS clock) driver. -s390/ - - directory with info on using Linux on the IBM S390. -scheduler/ - - directory with info on the scheduler. -scsi/ - - directory with info on Linux scsi support. -security/ - - directory that contains security-related info -serial/ - - directory with info on the low level serial API. -sgi-ioc4.txt - - description of the SGI IOC4 PCI (multi function) device. -sh/ - - directory with info on porting Linux to a new architecture. -smsc_ece1099.txt - -info on the smsc Keyboard Scan Expansion/GPIO Expansion device. -sound/ - - directory with info on sound card support. -spi/ - - overview of Linux kernel Serial Peripheral Interface (SPI) support. -sphinx/ - - no documentation here, just files required by Sphinx toolchain. -sphinx-static/ - - no documentation here, just files required by Sphinx toolchain. -static-keys.txt - - info on how static keys allow debug code in hotpaths via patching -svga.txt - - short guide on selecting video modes at boot via VGA BIOS. -sync_file.txt - - Sync file API guide. -sysctl/ - - directory with info on the /proc/sys/* files. -target/ - - directory with info on generating TCM v4 fabric .ko modules -tee.txt - - info on the TEE subsystem and drivers -this_cpu_ops.txt - - List rationale behind and the way to use this_cpu operations. -thermal/ - - directory with information on managing thermal issues (CPU/temp) -trace/ - - directory with info on tracing technologies within linux -translations/ - - translations of this document from English to another language -unaligned-memory-access.txt - - info on how to avoid arch breaking unaligned memory access in code. -unshare.txt - - description of the Linux unshare system call. -usb/ - - directory with info regarding the Universal Serial Bus. -vfio.txt - - info on Virtual Function I/O used in guest/hypervisor instances. -video-output.txt - - sysfs class driver interface to enable/disable a video output device. -virtual/ - - directory with information on the various linux virtualizations. -vm/ - - directory with info on the Linux vm code. -w1/ - - directory with documents regarding the 1-wire (w1) subsystem. -watchdog/ - - how to auto-reboot Linux if it has "fallen and can't get up". ;-) -wimax/ - - directory with info about Intel Wireless Wimax Connections -core-api/workqueue.rst - - information on the Concurrency Managed Workqueue implementation -x86/x86_64/ - - directory with info on Linux support for AMD x86-64 (Hammer) machines. -xillybus.txt - - Overview and basic ui of xillybus driver -xtensa/ - - directory with documents relating to arch/xtensa port/implementation -xz.txt - - how to make use of the XZ data compression within linux kernel -zorro.txt - - info on writing drivers for Zorro bus devices found on Amigas. diff --git a/Documentation/ABI/stable/sysfs-driver-usb-usbtmc b/Documentation/ABI/stable/sysfs-driver-usb-usbtmc index e960cd027e1e..a9e123ba32cd 100644 --- a/Documentation/ABI/stable/sysfs-driver-usb-usbtmc +++ b/Documentation/ABI/stable/sysfs-driver-usb-usbtmc @@ -25,38 +25,3 @@ Description: 4.2.2. The files are read only. - - -What: /sys/bus/usb/drivers/usbtmc/*/TermChar -Date: August 2008 -Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -Description: - This file is the TermChar value to be sent to the USB TMC - device as described by the document, "Universal Serial Bus Test - and Measurement Class Specification - (USBTMC) Revision 1.0" as published by the USB-IF. - - Note that the TermCharEnabled file determines if this value is - sent to the device or not. - - -What: /sys/bus/usb/drivers/usbtmc/*/TermCharEnabled -Date: August 2008 -Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -Description: - This file determines if the TermChar is to be sent to the - device on every transaction or not. For more details about - this, please see the document, "Universal Serial Bus Test and - Measurement Class Specification (USBTMC) Revision 1.0" as - published by the USB-IF. - - -What: /sys/bus/usb/drivers/usbtmc/*/auto_abort -Date: August 2008 -Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -Description: - This file determines if the transaction of the USB TMC - device is to be automatically aborted if there is any error. - For more details about this, please see the document, - "Universal Serial Bus Test and Measurement Class Specification - (USBTMC) Revision 1.0" as published by the USB-IF. diff --git a/Documentation/ABI/testing/configfs-stp-policy-p_sys-t b/Documentation/ABI/testing/configfs-stp-policy-p_sys-t new file mode 100644 index 000000000000..b290d1c00dcf --- /dev/null +++ b/Documentation/ABI/testing/configfs-stp-policy-p_sys-t @@ -0,0 +1,41 @@ +What: /config/stp-policy/<device>:p_sys-t.<policy>/<node>/uuid +Date: June 2018 +KernelVersion: 4.19 +Description: + UUID source identifier string, RW. + Default value is randomly generated at the mkdir <node> time. + Data coming from trace sources that use this <node> will be + tagged with this UUID in the MIPI SyS-T packet stream, to + allow the decoder to discern between different sources + within the same master/channel range, and identify the + higher level decoders that may be needed for each source. + +What: /config/stp-policy/<device>:p_sys-t.<policy>/<node>/do_len +Date: June 2018 +KernelVersion: 4.19 +Description: + Include payload length in the MIPI SyS-T header, boolean. + If enabled, the SyS-T protocol encoder will include payload + length in each packet's metadata. This is normally redundant + if the underlying transport protocol supports marking message + boundaries (which STP does), so this is off by default. + +What: /config/stp-policy/<device>:p_sys-t.<policy>/<node>/ts_interval +Date: June 2018 +KernelVersion: 4.19 +Description: + Time interval in milliseconds. Include a timestamp in the + MIPI SyS-T packet metadata, if this many milliseconds have + passed since the previous packet from this source. Zero is + the default and stands for "never send the timestamp". + +What: /config/stp-policy/<device>:p_sys-t.<policy>/<node>/clocksync_interval +Date: June 2018 +KernelVersion: 4.19 +Description: + Time interval in milliseconds. Send a CLOCKSYNC packet if + this many milliseconds have passed since the previous + CLOCKSYNC packet from this source. Zero is the default and + stands for "never send the CLOCKSYNC". It makes sense to + use this option with sources that generate constant and/or + periodic data, like stm_heartbeat. diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc index 9281e2aa38df..809765bd9573 100644 --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc @@ -12,6 +12,10 @@ Date: Dec 2014 KernelVersion: 4.0 Description: Control descriptors + All attributes read only: + bInterfaceNumber - USB interface number for this + streaming interface + What: /config/usb-gadget/gadget/functions/uvc.name/control/class Date: Dec 2014 KernelVersion: 4.0 @@ -109,6 +113,10 @@ Date: Dec 2014 KernelVersion: 4.0 Description: Streaming descriptors + All attributes read only: + bInterfaceNumber - USB interface number for this + streaming interface + What: /config/usb-gadget/gadget/functions/uvc.name/streaming/class Date: Dec 2014 KernelVersion: 4.0 @@ -160,6 +168,10 @@ Description: Specific MJPEG format descriptors All attributes read only, except bmaControls and bDefaultFrameIndex: + bFormatIndex - unique id for this format descriptor; + only defined after parent header is + linked into the streaming class; + read-only bmaControls - this format's data for bmaControls in the streaming header bmInterfaceFlags - specifies interlace information, @@ -177,6 +189,10 @@ Date: Dec 2014 KernelVersion: 4.0 Description: Specific MJPEG frame descriptors + bFrameIndex - unique id for this framedescriptor; + only defined after parent format is + linked into the streaming header; + read-only dwFrameInterval - indicates how frame interval can be programmed; a number of values separated by newline can be specified @@ -204,6 +220,10 @@ Date: Dec 2014 KernelVersion: 4.0 Description: Specific uncompressed format descriptors + bFormatIndex - unique id for this format descriptor; + only defined after parent header is + linked into the streaming class; + read-only bmaControls - this format's data for bmaControls in the streaming header bmInterfaceFlags - specifies interlace information, @@ -224,6 +244,10 @@ Date: Dec 2014 KernelVersion: 4.0 Description: Specific uncompressed frame descriptors + bFrameIndex - unique id for this framedescriptor; + only defined after parent format is + linked into the streaming header; + read-only dwFrameInterval - indicates how frame interval can be programmed; a number of values separated by newline can be specified diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 44d4b2be92fd..8bfee557e50e 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -323,3 +323,27 @@ Description: This is similar to /sys/bus/pci/drivers_autoprobe, but affects only the VFs associated with a specific PF. + +What: /sys/bus/pci/devices/.../p2pmem/size +Date: November 2017 +Contact: Logan Gunthorpe <logang@deltatee.com> +Description: + If the device has any Peer-to-Peer memory registered, this + file contains the total amount of memory that the device + provides (in decimal). + +What: /sys/bus/pci/devices/.../p2pmem/available +Date: November 2017 +Contact: Logan Gunthorpe <logang@deltatee.com> +Description: + If the device has any Peer-to-Peer memory registered, this + file contains the amount of memory that has not been + allocated (in decimal). + +What: /sys/bus/pci/devices/.../p2pmem/published +Date: November 2017 +Contact: Logan Gunthorpe <logang@deltatee.com> +Description: + If the device has any Peer-to-Peer memory registered, this + file contains a '1' if the memory has been published for + use outside the driver that owns the device. diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb index 08d456e07b53..559baa5c418c 100644 --- a/Documentation/ABI/testing/sysfs-bus-usb +++ b/Documentation/ABI/testing/sysfs-bus-usb @@ -189,6 +189,16 @@ Description: The file will read "hotplug", "wired" and "not used" if the information is available, and "unknown" otherwise. +What: /sys/bus/usb/devices/.../(hub interface)/portX/location +Date: October 2018 +Contact: Bjørn Mork <bjorn@mork.no> +Description: + Some platforms provide usb port physical location through + firmware. This is used by the kernel to pair up logical ports + mapping to the same physical connector. The attribute exposes the + raw location value as a hex integer. + + What: /sys/bus/usb/devices/.../(hub interface)/portX/quirks Date: May 2018 Contact: Nicolas Boichat <drinkcat@chromium.org> @@ -219,7 +229,14 @@ Description: ports and report them to the kernel. This attribute is to expose the number of over-current situation occurred on a specific port to user space. This file will contain an unsigned 32 bit value - which wraps to 0 after its maximum is reached. + which wraps to 0 after its maximum is reached. This file supports + poll() for monitoring changes to this value in user space. + + Any time this value changes the corresponding hub device will send a + udev event with the following attributes: + + OVER_CURRENT_PORT=/sys/bus/usb/devices/.../(hub interface)/portX + OVER_CURRENT_COUNT=[current value of this sysfs attribute] What: /sys/bus/usb/devices/.../(hub interface)/portX/usb3_lpm_permit Date: November 2015 diff --git a/Documentation/ABI/testing/sysfs-bus-vmbus b/Documentation/ABI/testing/sysfs-bus-vmbus new file mode 100644 index 000000000000..91e6c065973c --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-vmbus @@ -0,0 +1,21 @@ +What: /sys/bus/vmbus/devices/.../driver_override +Date: August 2019 +Contact: Stephen Hemminger <sthemmin@microsoft.com> +Description: + This file allows the driver for a device to be specified which + will override standard static and dynamic ID matching. When + specified, only a driver with a name matching the value written + to driver_override will have an opportunity to bind to the + device. The override is specified by writing a string to the + driver_override file (echo uio_hv_generic > driver_override) and + may be cleared with an empty string (echo > driver_override). + This returns the device to standard matching rules binding. + Writing to driver_override does not automatically unbind the + device from its current driver or make any attempt to + automatically load the specified driver. If no driver with a + matching name is currently loaded in the kernel, the device + will not bind to any driver. This also allows devices to + opt-out of driver binding using a driver_override name such as + "none". Only a single driver may be specified in the override, + there is no support for parsing delimiters. + diff --git a/Documentation/ABI/testing/sysfs-class-lcd-s6e63m0 b/Documentation/ABI/testing/sysfs-class-lcd-s6e63m0 deleted file mode 100644 index ae0a2d3dcc07..000000000000 --- a/Documentation/ABI/testing/sysfs-class-lcd-s6e63m0 +++ /dev/null @@ -1,27 +0,0 @@ -sysfs interface for the S6E63M0 AMOLED LCD panel driver -------------------------------------------------------- - -What: /sys/class/lcd/<lcd>/gamma_mode -Date: May, 2010 -KernelVersion: v2.6.35 -Contact: dri-devel@lists.freedesktop.org -Description: - (RW) Read or write the gamma mode. Following three modes are - supported: - 0 - gamma value 2.2, - 1 - gamma value 1.9 and - 2 - gamma value 1.7. - - -What: /sys/class/lcd/<lcd>/gamma_table -Date: May, 2010 -KernelVersion: v2.6.35 -Contact: dri-devel@lists.freedesktop.org -Description: - (RO) Displays the size of the gamma table i.e. the number of - gamma modes available. - -This is a backlight lcd driver. These interfaces are an extension to the API -documented in Documentation/ABI/testing/sysfs-class-lcd and in -Documentation/ABI/stable/sysfs-class-backlight (under -/sys/class/backlight/<backlight>/). diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net index 2f1788111cd9..664a8f6a634f 100644 --- a/Documentation/ABI/testing/sysfs-class-net +++ b/Documentation/ABI/testing/sysfs-class-net @@ -91,6 +91,24 @@ Description: stacked (e.g: VLAN interfaces) but still have the same MAC address as their parent device. +What: /sys/class/net/<iface>/dev_port +Date: February 2014 +KernelVersion: 3.15 +Contact: netdev@vger.kernel.org +Description: + Indicates the port number of this network device, formatted + as a decimal value. Some NICs have multiple independent ports + on the same PCI bus, device and function. This attribute allows + userspace to distinguish the respective interfaces. + + Note: some device drivers started to use 'dev_id' for this + purpose since long before 3.15 and have not adopted the new + attribute ever since. To query the port number, some tools look + exclusively at 'dev_port', while others only consult 'dev_id'. + If a network device has multiple client adapter ports as + described in the previous paragraph and does not set this + attribute to its port number, it's a kernel bug. + What: /sys/class/net/<iface>/dormant Date: March 2006 KernelVersion: 2.6.17 @@ -117,7 +135,7 @@ Description: full: full duplex Note: This attribute is only valid for interfaces that implement - the ethtool get_settings method (mostly Ethernet). + the ethtool get_link_ksettings method (mostly Ethernet). What: /sys/class/net/<iface>/flags Date: April 2005 @@ -224,7 +242,7 @@ Description: an integer representing the link speed in Mbits/sec. Note: this attribute is only valid for interfaces that implement - the ethtool get_settings method (mostly Ethernet ). + the ethtool get_link_ksettings method (mostly Ethernet). What: /sys/class/net/<iface>/tx_queue_len Date: April 2005 diff --git a/Documentation/ABI/testing/sysfs-class-net-dsa b/Documentation/ABI/testing/sysfs-class-net-dsa new file mode 100644 index 000000000000..f240221e071e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-net-dsa @@ -0,0 +1,7 @@ +What: /sys/class/net/<iface>/tagging +Date: August 2018 +KernelVersion: 4.20 +Contact: netdev@vger.kernel.org +Description: + String indicating the type of tagging protocol used by the + DSA slave network device. diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 94a24aedcdb2..3ac41774ad3c 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -121,7 +121,22 @@ What: /sys/fs/f2fs/<disk>/idle_interval Date: January 2016 Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> Description: - Controls the idle timing. + Controls the idle timing for all paths other than + discard and gc path. + +What: /sys/fs/f2fs/<disk>/discard_idle_interval +Date: September 2018 +Contact: "Chao Yu" <yuchao0@huawei.com> +Contact: "Sahitya Tummala" <stummala@codeaurora.org> +Description: + Controls the idle timing for discard path. + +What: /sys/fs/f2fs/<disk>/gc_idle_interval +Date: September 2018 +Contact: "Chao Yu" <yuchao0@huawei.com> +Contact: "Sahitya Tummala" <stummala@codeaurora.org> +Description: + Controls the idle timing for gc path. What: /sys/fs/f2fs/<disk>/iostat_enable Date: August 2017 diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX deleted file mode 100644 index 206b1d5c1e71..000000000000 --- a/Documentation/PCI/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file -acpi-info.txt - - info on how PCI host bridges are represented in ACPI -MSI-HOWTO.txt - - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ. -PCIEBUS-HOWTO.txt - - a guide describing the PCI Express Port Bus driver -pci-error-recovery.txt - - info on PCI error recovery -pci-iov-howto.txt - - the PCI Express I/O Virtualization HOWTO -pci.txt - - info on the PCI subsystem for device driver authors -pcieaer-howto.txt - - the PCI Express Advanced Error Reporting Driver Guide HOWTO -endpoint/pci-endpoint.txt - - guide to add endpoint controller driver and endpoint function driver. -endpoint/pci-endpoint-cfs.txt - - guide to use configfs to configure the PCI endpoint function. -endpoint/pci-test-function.txt - - specification of *PCI test* function device. -endpoint/pci-test-howto.txt - - userguide for PCI endpoint test function. -endpoint/function/binding/ - - binding documentation for PCI endpoint function diff --git a/Documentation/PCI/endpoint/pci-test-howto.txt b/Documentation/PCI/endpoint/pci-test-howto.txt index e40cf0fb58d7..040479f437a5 100644 --- a/Documentation/PCI/endpoint/pci-test-howto.txt +++ b/Documentation/PCI/endpoint/pci-test-howto.txt @@ -99,17 +99,20 @@ Note that the devices listed here correspond to the value populated in 1.4 above 2.2 Using Endpoint Test function Device pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint -tests. Before pcitest.sh can be used pcitest.c should be compiled using the -following commands. +tests. To compile this tool the following commands should be used: - cd <kernel-dir> - make headers_install ARCH=arm - arm-linux-gnueabihf-gcc -Iusr/include tools/pci/pcitest.c -o pcitest - cp pcitest <rootfs>/usr/sbin/ - cp tools/pci/pcitest.sh <rootfs> + # cd <kernel-dir> + # make -C tools/pci + +or if you desire to compile and install in your system: + + # cd <kernel-dir> + # make -C tools/pci install + +The tool and script will be located in <rootfs>/usr/bin/ 2.2.1 pcitest.sh Output - # ./pcitest.sh + # pcitest.sh BAR tests BAR0: OKAY diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt index 688b69121e82..0b6bb3ef449e 100644 --- a/Documentation/PCI/pci-error-recovery.txt +++ b/Documentation/PCI/pci-error-recovery.txt @@ -110,7 +110,7 @@ The actual steps taken by a platform to recover from a PCI error event will be platform-dependent, but will follow the general sequence described below. -STEP 0: Error Event: ERR_NONFATAL +STEP 0: Error Event ------------------- A PCI bus error is detected by the PCI hardware. On powerpc, the slot is isolated, in that all I/O is blocked: all reads return 0xffffffff, @@ -228,7 +228,13 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform proceeds to STEP 4 (Slot Reset) -STEP 3: Slot Reset +STEP 3: Link Reset +------------------ +The platform resets the link. This is a PCI-Express specific step +and is done whenever a fatal error has been detected that can be +"solved" by resetting the link. + +STEP 4: Slot Reset ------------------ In response to a return value of PCI_ERS_RESULT_NEED_RESET, the @@ -314,7 +320,7 @@ Failure). >>> However, it probably should. -STEP 4: Resume Operations +STEP 5: Resume Operations ------------------------- The platform will call the resume() callback on all affected device drivers if all drivers on the segment have returned @@ -326,7 +332,7 @@ a result code. At this point, if a new error happens, the platform will restart a new error recovery sequence. -STEP 5: Permanent Failure +STEP 6: Permanent Failure ------------------------- A "permanent failure" has occurred, and the platform cannot recover the device. The platform will call error_detected() with a @@ -349,27 +355,6 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt for additional detail on real-life experience of the causes of software errors. -STEP 0: Error Event: ERR_FATAL -------------------- -PCI bus error is detected by the PCI hardware. On powerpc, the slot is -isolated, in that all I/O is blocked: all reads return 0xffffffff, all -writes are ignored. - -STEP 1: Remove devices --------------------- -Platform removes the devices depending on the error agent, it could be -this port for all subordinates or upstream component (likely downstream -port) - -STEP 2: Reset link --------------------- -The platform resets the link. This is a PCI-Express specific step and is -done whenever a fatal error has been detected that can be "solved" by -resetting the link. - -STEP 3: Re-enumerate the devices --------------------- -Initiates the re-enumeration. Conclusion; General Remarks --------------------------- diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX deleted file mode 100644 index f46980c060aa..000000000000 --- a/Documentation/RCU/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -00-INDEX - - This file -arrayRCU.txt - - Using RCU to Protect Read-Mostly Arrays -checklist.txt - - Review Checklist for RCU Patches -listRCU.txt - - Using RCU to Protect Read-Mostly Linked Lists -lockdep.txt - - RCU and lockdep checking -lockdep-splat.txt - - RCU Lockdep splats explained. -NMI-RCU.txt - - Using RCU to Protect Dynamic NMI Handlers -rcu_dereference.txt - - Proper care and feeding of return values from rcu_dereference() -rcubarrier.txt - - RCU and Unloadable Modules -rculist_nulls.txt - - RCU list primitives for use with SLAB_TYPESAFE_BY_RCU -rcuref.txt - - Reference-count design for elements of lists/arrays protected by RCU -rcu.txt - - RCU Concepts -RTFP.txt - - List of RCU papers (bibliography) going back to 1980. -stallwarn.txt - - RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress) -torture.txt - - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) -UP.txt - - RCU on Uniprocessor Systems -whatisRCU.txt - - What is RCU? diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 7d4ae110c2c9..721b3e426515 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt @@ -87,7 +87,3 @@ o Where can I find more information on RCU? See the RTFP.txt file in this directory. Or point your browser at http://www.rdrop.com/users/paulmck/RCU/. - -o What are all these files in this directory? - - See 00-INDEX for the list. diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt new file mode 100644 index 000000000000..b8ca28b60215 --- /dev/null +++ b/Documentation/accounting/psi.txt @@ -0,0 +1,73 @@ +================================ +PSI - Pressure Stall Information +================================ + +:Date: April, 2018 +:Author: Johannes Weiner <hannes@cmpxchg.org> + +When CPU, memory or IO devices are contended, workloads experience +latency spikes, throughput losses, and run the risk of OOM kills. + +Without an accurate measure of such contention, users are forced to +either play it safe and under-utilize their hardware resources, or +roll the dice and frequently suffer the disruptions resulting from +excessive overcommit. + +The psi feature identifies and quantifies the disruptions caused by +such resource crunches and the time impact it has on complex workloads +or even entire systems. + +Having an accurate measure of productivity losses caused by resource +scarcity aids users in sizing workloads to hardware--or provisioning +hardware according to workload demand. + +As psi aggregates this information in realtime, systems can be managed +dynamically using techniques such as load shedding, migrating jobs to +other systems or data centers, or strategically pausing or killing low +priority or restartable batch jobs. + +This allows maximizing hardware utilization without sacrificing +workload health or risking major disruptions such as OOM kills. + +Pressure interface +================== + +Pressure information for each resource is exported through the +respective file in /proc/pressure/ -- cpu, memory, and io. + +The format for CPU is as such: + +some avg10=0.00 avg60=0.00 avg300=0.00 total=0 + +and for memory and IO: + +some avg10=0.00 avg60=0.00 avg300=0.00 total=0 +full avg10=0.00 avg60=0.00 avg300=0.00 total=0 + +The "some" line indicates the share of time in which at least some +tasks are stalled on a given resource. + +The "full" line indicates the share of time in which all non-idle +tasks are stalled on a given resource simultaneously. In this state +actual CPU cycles are going to waste, and a workload that spends +extended time in this state is considered to be thrashing. This has +severe impact on performance, and it's useful to distinguish this +situation from a state where some tasks are stalled but the CPU is +still doing productive work. As such, time spent in this subset of the +stall state is tracked separately and exported in the "full" averages. + +The ratios are tracked as recent trends over ten, sixty, and three +hundred second windows, which gives insight into short term events as +well as medium and long term trends. The total absolute stall time is +tracked and exported as well, to allow detection of latency spikes +which wouldn't necessarily make a dent in the time averages, or to +average trends over custom time frames. + +Cgroup2 interface +================= + +In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem +mounted, pressure stall information is also tracked for tasks grouped +into cgroups. Each subdirectory in the cgroupfs mountpoint contains +cpu.pressure, memory.pressure, and io.pressure files; the format is +the same as the /proc/pressure/ files. diff --git a/Documentation/admin-guide/LSM/Yama.rst b/Documentation/admin-guide/LSM/Yama.rst index 13468ea696b7..d0a060de3973 100644 --- a/Documentation/admin-guide/LSM/Yama.rst +++ b/Documentation/admin-guide/LSM/Yama.rst @@ -64,8 +64,8 @@ The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are: Using ``PTRACE_TRACEME`` is unchanged. 2 - admin-only attach: - only processes with ``CAP_SYS_PTRACE`` may use ptrace - with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``. + only processes with ``CAP_SYS_PTRACE`` may use ptrace, either with + ``PTRACE_ATTACH`` or through children calling ``PTRACE_TRACEME``. 3 - no attach: no processes may use ptrace with ``PTRACE_ATTACH`` nor via diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 15ea785b2dfa..0797eec76be1 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -51,8 +51,7 @@ Documentation - There are various README files in the Documentation/ subdirectory: these typically contain kernel-specific installation notes for some - drivers for example. See Documentation/00-INDEX for a list of what - is contained in each file. Please read the + drivers for example. Please read the :ref:`Documentation/process/changes.rst <changes>` file, as it contains information about the problems, which may result by upgrading your kernel. diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index caf36105a1c7..8384c681a4b2 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -966,6 +966,12 @@ All time durations are in microseconds. $PERIOD duration. "max" for $MAX indicates no limit. If only one number is written, $MAX is updated. + cpu.pressure + A read-only nested-key file which exists on non-root cgroups. + + Shows pressure stall information for CPU. See + Documentation/accounting/psi.txt for details. + Memory ------ @@ -1127,6 +1133,10 @@ PAGE_SIZE multiple when read back. disk readahead. For now OOM in memory cgroup kills tasks iff shortage has happened inside page fault. + This event is not raised if the OOM killer is not + considered as an option, e.g. for failed high-order + allocations. + oom_kill The number of processes belonging to this cgroup killed by any kind of OOM killer. @@ -1271,6 +1281,12 @@ PAGE_SIZE multiple when read back. higher than the limit for an extended period of time. This reduces the impact on the workload and memory management. + memory.pressure + A read-only nested-key file which exists on non-root cgroups. + + Shows pressure stall information for memory. See + Documentation/accounting/psi.txt for details. + Usage Guidelines ~~~~~~~~~~~~~~~~ @@ -1408,6 +1424,12 @@ IO Interface Files 8:16 rbps=2097152 wbps=max riops=max wiops=max + io.pressure + A read-only nested-key file which exists on non-root cgroups. + + Shows pressure stall information for IO. See + Documentation/accounting/psi.txt for details. + Writeback ~~~~~~~~~ diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst new file mode 100644 index 000000000000..e506d3dae510 --- /dev/null +++ b/Documentation/admin-guide/ext4.rst @@ -0,0 +1,574 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +ext4 General Information +======================== + +Ext4 is an advanced level of the ext3 filesystem which incorporates +scalability and reliability enhancements for supporting large filesystems +(64 bit) in keeping with increasing disk capacities and state-of-the-art +feature requirements. + +Mailing list: linux-ext4@vger.kernel.org +Web site: http://ext4.wiki.kernel.org + + +Quick usage instructions +======================== + +Note: More extensive information for getting started with ext4 can be +found at the ext4 wiki site at the URL: +http://ext4.wiki.kernel.org/index.php/Ext4_Howto + + - The latest version of e2fsprogs can be found at: + + https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ + + or + + http://sourceforge.net/project/showfiles.php?group_id=2406 + + or grab the latest git repository from: + + https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git + + - Create a new filesystem using the ext4 filesystem type: + + # mke2fs -t ext4 /dev/hda1 + + Or to configure an existing ext3 filesystem to support extents: + + # tune2fs -O extents /dev/hda1 + + If the filesystem was created with 128 byte inodes, it can be + converted to use 256 byte for greater efficiency via: + + # tune2fs -I 256 /dev/hda1 + + - Mounting: + + # mount -t ext4 /dev/hda1 /wherever + + - When comparing performance with other filesystems, it's always + important to try multiple workloads; very often a subtle change in a + workload parameter can completely change the ranking of which + filesystems do well compared to others. When comparing versus ext3, + note that ext4 enables write barriers by default, while ext3 does + not enable write barriers by default. So it is useful to use + explicitly specify whether barriers are enabled or not when via the + '-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems + for a fair comparison. When tuning ext3 for best benchmark numbers, + it is often worthwhile to try changing the data journaling mode; '-o + data=writeback' can be faster for some workloads. (Note however that + running mounted with data=writeback can potentially leave stale data + exposed in recently written files in case of an unclean shutdown, + which could be a security exposure in some situations.) Configuring + the filesystem with a large journal can also be helpful for + metadata-intensive workloads. + +Features +======== + +Currently Available +------------------- + +* ability to use filesystems > 16TB (e2fsprogs support not available yet) +* extent format reduces metadata overhead (RAM, IO for access, transactions) +* extent format more robust in face of on-disk corruption due to magics, +* internal redundancy in tree +* improved file allocation (multi-block alloc) +* lift 32000 subdirectory limit imposed by i_links_count[1] +* nsec timestamps for mtime, atime, ctime, create time +* inode version field on disk (NFSv4, Lustre) +* reduced e2fsck time via uninit_bg feature +* journal checksumming for robustness, performance +* persistent file preallocation (e.g for streaming media, databases) +* ability to pack bitmaps and inode tables into larger virtual groups via the + flex_bg feature +* large file support +* inode allocation using large virtual block groups via flex_bg +* delayed allocation +* large block (up to pagesize) support +* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force + the ordering) + +[1] Filesystems with a block size of 1k may see a limit imposed by the +directory hash tree having a maximum depth of two. + +Options +======= + +When mounting an ext4 filesystem, the following option are accepted: +(*) == default + + ro + Mount filesystem read only. Note that ext4 will replay the journal (and + thus write to the partition) even when mounted "read only". The mount + options "ro,noload" can be used to prevent writes to the filesystem. + + journal_checksum + Enable checksumming of the journal transactions. This will allow the + recovery code in e2fsck and the kernel to detect corruption in the + kernel. It is a compatible change and will be ignored by older + kernels. + + journal_async_commit + Commit block can be written to disk without waiting for descriptor + blocks. If enabled older kernels cannot mount the device. This will + enable 'journal_checksum' internally. + + journal_path=path, journal_dev=devnum + When the external journal device's major/minor numbers have changed, + these options allow the user to specify the new journal location. The + journal device is identified through either its new major/minor numbers + encoded in devnum, or via a path to the device. + + norecovery, noload + Don't load the journal on mounting. Note that if the filesystem was + not unmounted cleanly, skipping the journal replay will lead to the + filesystem containing inconsistencies that can lead to any number of + problems. + + data=journal + All data are committed into the journal prior to being written into the + main file system. Enabling this mode will disable delayed allocation + and O_DIRECT support. + + data=ordered (*) + All data are forced directly out to the main file system prior to its + metadata being committed to the journal. + + data=writeback + Data ordering is not preserved, data may be written into the main file + system after its metadata has been committed to the journal. + + commit=nrsec (*) + Ext4 can be told to sync all its data and metadata every 'nrsec' + seconds. The default value is 5 seconds. This means that if you lose + your power, you will lose as much as the latest 5 seconds of work (your + filesystem will not be damaged though, thanks to the journaling). This + default value (or any low value) will hurt performance, but it's good + for data-safety. Setting it to 0 will have the same effect as leaving + it at the default (5 seconds). Setting it to very large values will + improve performance. + + barrier=<0|1(*)>, barrier(*), nobarrier + This enables/disables the use of write barriers in the jbd code. + barrier=0 disables, barrier=1 enables. This also requires an IO stack + which can support barriers, and if jbd gets an error on a barrier + write, it will disable again with a warning. Write barriers enforce + proper on-disk ordering of journal commits, making volatile disk write + caches safe to use, at some performance penalty. If your disks are + battery-backed in one way or another, disabling barriers may safely + improve performance. The mount options "barrier" and "nobarrier" can + also be used to enable or disable barriers, for consistency with other + ext4 mount options. + + inode_readahead_blks=n + This tuning parameter controls the maximum number of inode table blocks + that ext4's inode table readahead algorithm will pre-read into the + buffer cache. The default value is 32 blocks. + + nouser_xattr + Disables Extended User Attributes. See the attr(5) manual page for + more information about extended attributes. + + noacl + This option disables POSIX Access Control List support. If ACL support + is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL + is enabled by default on mount. See the acl(5) manual page for more + information about acl. + + bsddf (*) + Make 'df' act like BSD. + + minixdf + Make 'df' act like Minix. + + debug + Extra debugging information is sent to syslog. + + abort + Simulate the effects of calling ext4_abort() for debugging purposes. + This is normally used while remounting a filesystem which is already + mounted. + + errors=remount-ro + Remount the filesystem read-only on an error. + + errors=continue + Keep going on a filesystem error. + + errors=panic + Panic and halt the machine if an error occurs. (These mount options + override the errors behavior specified in the superblock, which can be + configured using tune2fs) + + data_err=ignore(*) + Just print an error message if an error occurs in a file data buffer in + ordered mode. + data_err=abort + Abort the journal if an error occurs in a file data buffer in ordered + mode. + + grpid | bsdgroups + New objects have the group ID of their parent. + + nogrpid (*) | sysvgroups + New objects have the group ID of their creator. + + resgid=n + The group ID which may use the reserved blocks. + + resuid=n + The user ID which may use the reserved blocks. + + sb= + Use alternate superblock at this location. + + quota, noquota, grpquota, usrquota + These options are ignored by the filesystem. They are used only by + quota tools to recognize volumes where quota should be turned on. See + documentation in the quota-tools package for more details + (http://sourceforge.net/projects/linuxquota). + + jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file> + These options tell filesystem details about quota so that quota + information can be properly updated during journal replay. They replace + the above quota options. See documentation in the quota-tools package + for more details (http://sourceforge.net/projects/linuxquota). + + stripe=n + Number of filesystem blocks that mballoc will try to use for allocation + size and alignment. For RAID5/6 systems this should be the number of + data disks * RAID chunk size in file system blocks. + + delalloc (*) + Defer block allocation until just before ext4 writes out the block(s) + in question. This allows ext4 to better allocation decisions more + efficiently. + + nodelalloc + Disable delayed allocation. Blocks are allocated when the data is + copied from userspace to the page cache, either via the write(2) system + call or when an mmap'ed page which was previously unallocated is + written for the first time. + + max_batch_time=usec + Maximum amount of time ext4 should wait for additional filesystem + operations to be batch together with a synchronous write operation. + Since a synchronous write operation is going to force a commit and then + a wait for the I/O complete, it doesn't cost much, and can be a huge + throughput win, we wait for a small amount of time to see if any other + transactions can piggyback on the synchronous write. The algorithm + used is designed to automatically tune for the speed of the disk, by + measuring the amount of time (on average) that it takes to finish + committing a transaction. Call this time the "commit time". If the + time that the transaction has been running is less than the commit + time, ext4 will try sleeping for the commit time to see if other + operations will join the transaction. The commit time is capped by + the max_batch_time, which defaults to 15000us (15ms). This + optimization can be turned off entirely by setting max_batch_time to 0. + + min_batch_time=usec + This parameter sets the commit time (as described above) to be at least + min_batch_time. It defaults to zero microseconds. Increasing this + parameter may improve the throughput of multi-threaded, synchronous + workloads on very fast disks, at the cost of increasing latency. + + journal_ioprio=prio + The I/O priority (from 0 to 7, where 0 is the highest priority) which + should be used for I/O operations submitted by kjournald2 during a + commit operation. This defaults to 3, which is a slightly higher + priority than the default I/O priority. + + auto_da_alloc(*), noauto_da_alloc + Many broken applications don't use fsync() when replacing existing + files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/ + rename("foo.new", "foo"), or worse yet, fd = open("foo", + O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4 + will detect the replace-via-rename and replace-via-truncate patterns + and force that any delayed allocation blocks are allocated such that at + the next journal commit, in the default data=ordered mode, the data + blocks of the new file are forced to disk before the rename() operation + is committed. This provides roughly the same level of guarantees as + ext3, and avoids the "zero-length" problem that can happen when a + system crashes before the delayed allocation blocks are forced to disk. + + noinit_itable + Do not initialize any uninitialized inode table blocks in the + background. This feature may be used by installation CD's so that the + install process can complete as quickly as possible; the inode table + initialization process would then be deferred until the next time the + file system is unmounted. + + init_itable=n + The lazy itable init code will wait n times the number of milliseconds + it took to zero out the previous block group's inode table. This + minimizes the impact on the system performance while file system's + inode table is being initialized. + + discard, nodiscard(*) + Controls whether ext4 should issue discard/TRIM commands to the + underlying block device when blocks are freed. This is useful for SSD + devices and sparse/thinly-provisioned LUNs, but it is off by default + until sufficient testing has been done. + + nouid32 + Disables 32-bit UIDs and GIDs. This is for interoperability with + older kernels which only store and expect 16-bit values. + + block_validity(*), noblock_validity + These options enable or disable the in-kernel facility for tracking + filesystem metadata blocks within internal data structures. This + allows multi- block allocator and other routines to notice bugs or + corrupted allocation bitmaps which cause blocks to be allocated which + overlap with filesystem metadata blocks. + + dioread_lock, dioread_nolock + Controls whether or not ext4 should use the DIO read locking. If the + dioread_nolock option is specified ext4 will allocate uninitialized + extent before buffer write and convert the extent to initialized after + IO completes. This approach allows ext4 code to avoid using inode + mutex, which improves scalability on high speed storages. However this + does not work with data journaling and dioread_nolock option will be + ignored with kernel warning. Note that dioread_nolock code path is only + used for extent-based files. Because of the restrictions this options + comprises it is off by default (e.g. dioread_lock). + + max_dir_size_kb=n + This limits the size of directories so that any attempt to expand them + beyond the specified limit in kilobytes will cause an ENOSPC error. + This is useful in memory constrained environments, where a very large + directory can cause severe performance problems or even provoke the Out + Of Memory killer. (For example, if there is only 512mb memory + available, a 176mb directory may seriously cramp the system's style.) + + i_version + Enable 64-bit inode version support. This option is off by default. + + dax + Use direct access (no page cache). See + Documentation/filesystems/dax.txt. Note that this option is + incompatible with data=journal. + +Data Mode +========= +There are 3 different data modes: + +* writeback mode + + In data=writeback mode, ext4 does not journal data at all. This mode provides + a similar level of journaling as that of XFS, JFS, and ReiserFS in its default + mode - metadata journaling. A crash+recovery can cause incorrect data to + appear in files which were written shortly before the crash. This mode will + typically provide the best ext4 performance. + +* ordered mode + + In data=ordered mode, ext4 only officially journals metadata, but it logically + groups metadata information related to data changes with the data blocks into + a single unit called a transaction. When it's time to write the new metadata + out to disk, the associated data blocks are written first. In general, this + mode performs slightly slower than writeback but significantly faster than + journal mode. + +* journal mode + + data=journal mode provides full data and metadata journaling. All new data is + written to the journal first, and then to its final location. In the event of + a crash, the journal can be replayed, bringing both data and metadata into a + consistent state. This mode is the slowest except when data needs to be read + from and written to disk at the same time where it outperforms all others + modes. Enabling this mode will disable delayed allocation and O_DIRECT + support. + +/proc entries +============= + +Information about mounted ext4 file systems can be found in +/proc/fs/ext4. Each mounted filesystem will have a directory in +/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or +/proc/fs/ext4/dm-0). The files in each per-device directory are shown +in table below. + +Files in /proc/fs/ext4/<devname> + + mb_groups + details of multiblock allocator buddy cache of free blocks + +/sys entries +============ + +Information about mounted ext4 file systems can be found in +/sys/fs/ext4. Each mounted filesystem will have a directory in +/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or +/sys/fs/ext4/dm-0). The files in each per-device directory are shown +in table below. + +Files in /sys/fs/ext4/<devname>: + +(see also Documentation/ABI/testing/sysfs-fs-ext4) + + delayed_allocation_blocks + This file is read-only and shows the number of blocks that are dirty in + the page cache, but which do not have their location in the filesystem + allocated yet. + + inode_goal + Tuning parameter which (if non-zero) controls the goal inode used by + the inode allocator in preference to all other allocation heuristics. + This is intended for debugging use only, and should be 0 on production + systems. + + inode_readahead_blks + Tuning parameter which controls the maximum number of inode table + blocks that ext4's inode table readahead algorithm will pre-read into + the buffer cache. + + lifetime_write_kbytes + This file is read-only and shows the number of kilobytes of data that + have been written to this filesystem since it was created. + + max_writeback_mb_bump + The maximum number of megabytes the writeback code will try to write + out before move on to another inode. + + mb_group_prealloc + The multiblock allocator will round up allocation requests to a + multiple of this tuning parameter if the stripe size is not set in the + ext4 superblock + + mb_max_to_scan + The maximum number of extents the multiblock allocator will search to + find the best extent. + + mb_min_to_scan + The minimum number of extents the multiblock allocator will search to + find the best extent. + + mb_order2_req + Tuning parameter which controls the minimum size for requests (as a + power of 2) where the buddy cache is used. + + mb_stats + Controls whether the multiblock allocator should collect statistics, + which are shown during the unmount. 1 means to collect statistics, 0 + means not to collect statistics. + + mb_stream_req + Files which have fewer blocks than this tunable parameter will have + their blocks allocated out of a block group specific preallocation + pool, so that small files are packed closely together. Each large file + will have its blocks allocated out of its own unique preallocation + pool. + + session_write_kbytes + This file is read-only and shows the number of kilobytes of data that + have been written to this filesystem since it was mounted. + + reserved_clusters + This is RW file and contains number of reserved clusters in the file + system which will be used in the specific situations to avoid costly + zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or + 4096 clusters, whichever is smaller and this can be changed however it + can never exceed number of clusters in the file system. If there is not + enough space for the reserved space when mounting the file mount will + _not_ fail. + +Ioctls +====== + +There is some Ext4 specific functionality which can be accessed by applications +through the system call interfaces. The list of all Ext4 specific ioctls are +shown in the table below. + +Table of Ext4 specific ioctls + + EXT4_IOC_GETFLAGS + Get additional attributes associated with inode. The ioctl argument is + an integer bitfield, with bit values described in ext4.h. This ioctl is + an alias for FS_IOC_GETFLAGS. + + EXT4_IOC_SETFLAGS + Set additional attributes associated with inode. The ioctl argument is + an integer bitfield, with bit values described in ext4.h. This ioctl is + an alias for FS_IOC_SETFLAGS. + + EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD + Get the inode i_generation number stored for each inode. The + i_generation number is normally changed only when new inode is created + and it is particularly useful for network filesystems. The '_OLD' + version of this ioctl is an alias for FS_IOC_GETVERSION. + + EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD + Set the inode i_generation number stored for each inode. The '_OLD' + version of this ioctl is an alias for FS_IOC_SETVERSION. + + EXT4_IOC_GROUP_EXTEND + This ioctl has the same purpose as the resize mount option. It allows + to resize filesystem to the end of the last existing block group, + further resize has to be done with resize2fs, either online, or + offline. The argument points to the unsigned logn number representing + the filesystem new block count. + + EXT4_IOC_MOVE_EXT + Move the block extents from orig_fd (the one this ioctl is pointing to) + to the donor_fd (the one specified in move_extent structure passed as + an argument to this ioctl). Then, exchange inode metadata between + orig_fd and donor_fd. This is especially useful for online + defragmentation, because the allocator has the opportunity to allocate + moved blocks better, ideally into one contiguous extent. + + EXT4_IOC_GROUP_ADD + Add a new group descriptor to an existing or new group descriptor + block. The new group descriptor is described by ext4_new_group_input + structure, which is passed as an argument to this ioctl. This is + especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which + allows online resize of the filesystem to the end of the last existing + block group. Those two ioctls combined is used in userspace online + resize tool (e.g. resize2fs). + + EXT4_IOC_MIGRATE + This ioctl operates on the filesystem itself. It converts (migrates) + ext3 indirect block mapped inode to ext4 extent mapped inode by walking + through indirect block mapping of the original inode and converting + contiguous block ranges into ext4 extents of the temporary inode. Then, + inodes are swapped. This ioctl might help, when migrating from ext3 to + ext4 filesystem, however suggestion is to create fresh ext4 filesystem + and copy data from the backup. Note, that filesystem has to support + extents for this ioctl to work. + + EXT4_IOC_ALLOC_DA_BLKS + Force all of the delay allocated blocks to be allocated to preserve + application-expected ext3 behaviour. Note that this will also start + triggering a write of the data blocks, but this behaviour may change in + the future as it is not necessary and has been done this way only for + sake of simplicity. + + EXT4_IOC_RESIZE_FS + Resize the filesystem to a new size. The number of blocks of resized + filesystem is passed in via 64 bit integer argument. The kernel + allocates bitmaps and inode table, the userspace tool thus just passes + the new number of blocks. + + EXT4_IOC_SWAP_BOOT + Swap i_blocks and associated attributes (like i_blocks, i_size, + i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO + (#5). This is typically used to store a boot loader in a secure part of + the filesystem, where it can't be changed by a normal user by accident. + The data blocks of the previous boot loader will be associated with the + given inode. + +References +========== + +kernel source: <file:fs/ext4/> + <file:fs/jbd2/> + +programs: http://e2fsprogs.sourceforge.net/ + +useful links: http://fedoraproject.org/wiki/ext3-devel + http://www.bullopensource.org/ext4/ + http://ext4.wiki.kernel.org/index.php/Main_Page + http://fedoraproject.org/wiki/Features/Ext4 diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 0873685bab0f..965745d5fb9a 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -71,6 +71,7 @@ configure specific aspects of kernel behavior to your liking. java ras bcache + ext4 pm/index thunderbolt LSM/index diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6d380890e075..b90fe3b6bc6c 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1759,12 +1759,24 @@ nobypass [PPC/POWERNV] Disable IOMMU bypass, using IOMMU for PCI devices. + iommu.strict= [ARM64] Configure TLB invalidation behaviour + Format: { "0" | "1" } + 0 - Lazy mode. + Request that DMA unmap operations use deferred + invalidation of hardware TLBs, for increased + throughput at the cost of reduced device isolation. + Will fall back to strict mode if not supported by + the relevant IOMMU driver. + 1 - Strict mode (default). + DMA unmap operations invalidate IOMMU hardware TLBs + synchronously. + iommu.passthrough= [ARM64] Configure DMA to bypass the IOMMU by default. Format: { "0" | "1" } 0 - Use IOMMU translation for DMA. 1 - Bypass the IOMMU for DMA. - unset - Use IOMMU translation for DMA. + unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH. io7= [HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in @@ -2284,6 +2296,8 @@ ltpc= [NET] Format: <io>,<irq>,<dma> + lsm.debug [SECURITY] Enable LSM initialization debugging output. + machvec= [IA-64] Force the use of a particular machine-vector (machvec) in a generic kernel. Example: machvec=hpzx1_swiotlb @@ -2414,7 +2428,7 @@ seconds. Use this parameter to check at some other rate. 0 disables periodic checking. - memtest= [KNL,X86,ARM] Enable memtest + memtest= [KNL,X86,ARM,PPC] Enable memtest Format: <integer> default : 0 <disable> Specifies the number of memtest passes to be @@ -4621,7 +4635,8 @@ usbcore.old_scheme_first= [USB] Start with the old device initialization - scheme (default 0 = off). + scheme, applies only to low and full-speed devices + (default 0 = off). usbcore.usbfs_memory_mb= [USB] Memory limit (in MB) for buffers allocated by @@ -4836,6 +4851,18 @@ This is actually a boot loader parameter; the value is passed to the kernel using a special protocol. + vm_debug[=options] [KNL] Available with CONFIG_DEBUG_VM=y. + May slow down system boot speed, especially when + enabled on systems with a large amount of memory. + All options are enabled by default, and this + interface is meant to allow for selectively + enabling or disabling specific virtual memory + debugging features. + + Available options are: + P Enable page structure init time poisoning + - Disable all of the above options + vmalloc=nn[KMG] [KNL,BOOT] Forces the vmalloc area to have an exact size of <nn>. This can be used to increase the minimum size (128MB on x86). It can also be used to diff --git a/Documentation/admin-guide/l1tf.rst b/Documentation/admin-guide/l1tf.rst index bae52b845de0..b85dd80510b0 100644 --- a/Documentation/admin-guide/l1tf.rst +++ b/Documentation/admin-guide/l1tf.rst @@ -553,7 +553,7 @@ When nested virtualization is in use, three operating systems are involved: the bare metal hypervisor, the nested hypervisor and the nested virtual machine. VMENTER operations from the nested hypervisor into the nested guest will always be processed by the bare metal hypervisor. If KVM is the -bare metal hypervisor it wiil: +bare metal hypervisor it will: - Flush the L1D cache on every switch from the nested hypervisor to the nested virtual machine, so that the nested hypervisor's secrets are not diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst index ceead68c2df7..8edb35f11317 100644 --- a/Documentation/admin-guide/mm/index.rst +++ b/Documentation/admin-guide/mm/index.rst @@ -29,6 +29,7 @@ the Linux memory management. hugetlbpage idle_page_tracking ksm + memory-hotplug numa_memory_policy pagemap soft-dirty diff --git a/Documentation/memory-hotplug.txt b/Documentation/admin-guide/mm/memory-hotplug.rst index 7f49ebf3ddb2..25157aec5b31 100644 --- a/Documentation/memory-hotplug.txt +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -1,3 +1,5 @@ +.. _admin_guide_memory_hotplug: + ============== Memory Hotplug ============== @@ -9,39 +11,19 @@ This document is about memory hotplug including how-to-use and current status. Because Memory Hotplug is still under development, contents of this text will be changed often. -.. CONTENTS - - 1. Introduction - 1.1 purpose of memory hotplug - 1.2. Phases of memory hotplug - 1.3. Unit of Memory online/offline operation - 2. Kernel Configuration - 3. sysfs files for memory hotplug - 4. Physical memory hot-add phase - 4.1 Hardware(Firmware) Support - 4.2 Notify memory hot-add event by hand - 5. Logical Memory hot-add phase - 5.1. State of memory - 5.2. How to online memory - 6. Logical memory remove - 6.1 Memory offline and ZONE_MOVABLE - 6.2. How to offline memory - 7. Physical memory remove - 8. Memory hotplug event notifier - 9. Future Work List - +.. contents:: :local: .. note:: (1) x86_64's has special implementation for memory hotplug. This text does not describe it. - (2) This text assumes that sysfs is mounted at /sys. + (2) This text assumes that sysfs is mounted at ``/sys``. Introduction ============ -purpose of memory hotplug +Purpose of memory hotplug ------------------------- Memory Hotplug allows users to increase/decrease the amount of memory. @@ -57,7 +39,6 @@ hardware which supports memory power management. Linux memory hotplug is designed for both purpose. - Phases of memory hotplug ------------------------ @@ -92,7 +73,6 @@ phase by hand. (However, if you writes udev's hotplug scripts for memory hotplug, these phases can be execute in seamless way.) - Unit of Memory online/offline operation --------------------------------------- @@ -107,10 +87,9 @@ unit upon which memory online/offline operations are to be performed. The default size of a memory block is the same as memory section size unless an architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.) -To determine the size (in bytes) of a memory block please read this file: - -/sys/devices/system/memory/block_size_bytes +To determine the size (in bytes) of a memory block please read this file:: + /sys/devices/system/memory/block_size_bytes Kernel Configuration ==================== @@ -119,22 +98,22 @@ To use memory hotplug feature, kernel must be compiled with following config options. - For all memory hotplug: - - Memory model -> Sparse Memory (CONFIG_SPARSEMEM) - - Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG) + - Memory model -> Sparse Memory (``CONFIG_SPARSEMEM``) + - Allow for memory hot-add (``CONFIG_MEMORY_HOTPLUG``) - To enable memory removal, the following are also necessary: - - Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE) - - Page Migration (CONFIG_MIGRATION) + - Allow for memory hot remove (``CONFIG_MEMORY_HOTREMOVE``) + - Page Migration (``CONFIG_MIGRATION``) - For ACPI memory hotplug, the following are also necessary: - - Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY) + - Memory hotplug (under ACPI Support menu) (``CONFIG_ACPI_HOTPLUG_MEMORY``) - This option can be kernel module. - As a related configuration, if your box has a feature of NUMA-node hotplug via ACPI, then this option is necessary too. - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu) - (CONFIG_ACPI_CONTAINER). + (``CONFIG_ACPI_CONTAINER``). This option can be kernel module too. @@ -145,10 +124,11 @@ sysfs files for memory hotplug ============================== All memory blocks have their device information in sysfs. Each memory block -is described under /sys/devices/system/memory as: +is described under ``/sys/devices/system/memory`` as:: /sys/devices/system/memory/memoryXXX - (XXX is the memory block id.) + +where XXX is the memory block id. For the memory block covered by the sysfs directory. It is expected that all memory sections in this range are present and no memory holes exist in the @@ -157,7 +137,7 @@ the existence of one should not affect the hotplug capabilities of the memory block. For example, assume 1GiB memory block size. A device for a memory starting at -0x100000000 is /sys/device/system/memory/memory4:: +0x100000000 is ``/sys/device/system/memory/memory4``:: (0x100000000 / 1Gib = 4) @@ -165,11 +145,11 @@ This device covers address range [0x100000000 ... 0x140000000) Under each memory block, you can see 5 files: -- /sys/devices/system/memory/memoryXXX/phys_index -- /sys/devices/system/memory/memoryXXX/phys_device -- /sys/devices/system/memory/memoryXXX/state -- /sys/devices/system/memory/memoryXXX/removable -- /sys/devices/system/memory/memoryXXX/valid_zones +- ``/sys/devices/system/memory/memoryXXX/phys_index`` +- ``/sys/devices/system/memory/memoryXXX/phys_device`` +- ``/sys/devices/system/memory/memoryXXX/state`` +- ``/sys/devices/system/memory/memoryXXX/removable`` +- ``/sys/devices/system/memory/memoryXXX/valid_zones`` =================== ============================================================ ``phys_index`` read-only and contains memory block id, same as XXX. @@ -207,13 +187,15 @@ Under each memory block, you can see 5 files: These directories/files appear after physical memory hotplug phase. If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed -via symbolic links located in the /sys/devices/system/node/node* directories. +via symbolic links located in the ``/sys/devices/system/node/node*`` directories. + +For example:: -For example: -/sys/devices/system/node/node0/memory9 -> ../../memory/memory9 + /sys/devices/system/node/node0/memory9 -> ../../memory/memory9 -A backlink will also be created: -/sys/devices/system/memory/memory9/node0 -> ../../node/node0 +A backlink will also be created:: + + /sys/devices/system/memory/memory9/node0 -> ../../node/node0 .. _memory_hotplug_physical_mem: @@ -240,7 +222,6 @@ If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004", calls hotplug code for all of objects which are defined in it. If memory device is found, memory hotplug code will be called. - Notify memory hot-add event by hand ----------------------------------- @@ -251,8 +232,9 @@ CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86 if hotplug is supported, although for x86 this should be handled by ACPI notification. -Probe interface is located at -/sys/devices/system/memory/probe +Probe interface is located at:: + + /sys/devices/system/memory/probe You can tell the physical address of new memory to the kernel by:: @@ -263,7 +245,6 @@ memory_block_size] memory range is hot-added. In this case, hotplug script is not called (in current implementation). You'll have to online memory by yourself. Please see :ref:`memory_hotplug_how_to_online_memory`. - Logical Memory hot-add phase ============================ @@ -301,7 +282,7 @@ This sets a global policy and impacts all memory blocks that will subsequently be hotplugged. Currently offline blocks keep their state. It is possible, under certain circumstances, that some memory blocks will be added but will fail to online. User space tools can check their "state" files -(/sys/devices/system/memory/memoryXXX/state) and try to online them manually. +(``/sys/devices/system/memory/memoryXXX/state``) and try to online them manually. If the automatic onlining wasn't requested, failed, or some memory block was offlined it is possible to change the individual block's state by writing to the @@ -334,8 +315,6 @@ available memory will be increased. This may be changed in future. - - Logical memory remove ===================== @@ -413,88 +392,6 @@ Need more implementation yet.... - Notification completion of remove works by OS to firmware. - Guard from remove if not yet. -Memory hotplug event notifier -============================= - -Hotplugging events are sent to a notification queue. - -There are six types of notification defined in include/linux/memory.h: - -MEM_GOING_ONLINE - Generated before new memory becomes available in order to be able to - prepare subsystems to handle memory. The page allocator is still unable - to allocate from the new memory. - -MEM_CANCEL_ONLINE - Generated if MEMORY_GOING_ONLINE fails. - -MEM_ONLINE - Generated when memory has successfully brought online. The callback may - allocate pages from the new memory. - -MEM_GOING_OFFLINE - Generated to begin the process of offlining memory. Allocations are no - longer possible from the memory but some of the memory to be offlined - is still in use. The callback can be used to free memory known to a - subsystem from the indicated memory block. - -MEM_CANCEL_OFFLINE - Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from - the memory block that we attempted to offline. - -MEM_OFFLINE - Generated after offlining memory is complete. - -A callback routine can be registered by calling:: - - hotplug_memory_notifier(callback_func, priority) - -Callback functions with higher values of priority are called before callback -functions with lower values. - -A callback function must have the following prototype:: - - int callback_func( - struct notifier_block *self, unsigned long action, void *arg); - -The first argument of the callback function (self) is a pointer to the block -of the notifier chain that points to the callback function itself. -The second argument (action) is one of the event types described above. -The third argument (arg) passes a pointer of struct memory_notify:: - - struct memory_notify { - unsigned long start_pfn; - unsigned long nr_pages; - int status_change_nid_normal; - int status_change_nid_high; - int status_change_nid; - } - -- start_pfn is start_pfn of online/offline memory. -- nr_pages is # of pages of online/offline memory. -- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid is set node id when N_MEMORY of nodemask is (will be) - set/clear. It means a new(memoryless) node gets new memory by online and a - node loses all memory. If this is -1, then nodemask status is not changed. - - If status_changed_nid* >= 0, callback should create/discard structures for the - node if necessary. - -The callback routine shall return one of the values -NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP -defined in include/linux/notifier.h - -NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. - -NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, -MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops -further processing of the notification queue. - -NOTIFY_STOP stops further processing of the notification queue. - Future Work =========== diff --git a/Documentation/admin-guide/security-bugs.rst b/Documentation/admin-guide/security-bugs.rst index 30491d91e93d..164bf71149fd 100644 --- a/Documentation/admin-guide/security-bugs.rst +++ b/Documentation/admin-guide/security-bugs.rst @@ -26,23 +26,34 @@ information is helpful. Any exploit code is very helpful and will not be released without consent from the reporter unless it has already been made public. -Disclosure ----------- - -The goal of the Linux kernel security team is to work with the bug -submitter to understand and fix the bug. We prefer to publish the fix as -soon as possible, but try to avoid public discussion of the bug itself -and leave that to others. - -Publishing the fix may be delayed when the bug or the fix is not yet -fully understood, the solution is not well-tested or for vendor -coordination. However, we expect these delays to be short, measurable in -days, not weeks or months. A release date is negotiated by the security -team working with the bug submitter as well as vendors. However, the -kernel security team holds the final say when setting a timeframe. The -timeframe varies from immediate (esp. if it's already publicly known bug) -to a few weeks. As a basic default policy, we expect report date to -release date to be on the order of 7 days. +Disclosure and embargoed information +------------------------------------ + +The security list is not a disclosure channel. For that, see Coordination +below. + +Once a robust fix has been developed, our preference is to release the +fix in a timely fashion, treating it no differently than any of the other +thousands of changes and fixes the Linux kernel project releases every +month. + +However, at the request of the reporter, we will postpone releasing the +fix for up to 5 business days after the date of the report or after the +embargo has lifted; whichever comes first. The only exception to that +rule is if the bug is publicly known, in which case the preference is to +release the fix as soon as it's available. + +Whilst embargoed information may be shared with trusted individuals in +order to develop a fix, such information will not be published alongside +the fix or on any other disclosure channel without the permission of the +reporter. This includes but is not limited to the original bug report +and followup discussions (if any), exploits, CVE information or the +identity of the reporter. + +In other words our only interest is in getting bugs fixed. All other +information submitted to the security list and any followup discussions +of the report are treated confidentially even after the embargo has been +lifted, in perpetuity. Coordination ------------ @@ -68,7 +79,7 @@ may delay the bug handling. If a reporter wishes to have a CVE identifier assigned ahead of public disclosure, they will need to contact the private linux-distros list, described above. When such a CVE identifier is known before a patch is provided, it is desirable to mention it in the commit -message, though. +message if the reporter agrees. Non-disclosure agreements ------------------------- diff --git a/Documentation/arm/00-INDEX b/Documentation/arm/00-INDEX deleted file mode 100644 index b6e69fd371c4..000000000000 --- a/Documentation/arm/00-INDEX +++ /dev/null @@ -1,50 +0,0 @@ -00-INDEX - - this file -Booting - - requirements for booting -CCN.txt - - Cache Coherent Network ring-bus and perf PMU driver. -Interrupts - - ARM Interrupt subsystem documentation -IXP4xx - - Intel IXP4xx Network processor. -Netwinder - - Netwinder specific documentation -Porting - - Symbol definitions for porting Linux to a new ARM machine. -Setup - - Kernel initialization parameters on ARM Linux -README - - General ARM documentation -SA1100/ - - SA1100 documentation -Samsung-S3C24XX/ - - S3C24XX ARM Linux Overview -SPEAr/ - - ST SPEAr platform Linux Overview -VFP/ - - Release notes for Linux Kernel Vector Floating Point support code -cluster-pm-race-avoidance.txt - - Algorithm for CPU and Cluster setup/teardown -empeg/ - - Ltd's Empeg MP3 Car Audio Player -firmware.txt - - Secure firmware registration and calling. -kernel_mode_neon.txt - - How to use NEON instructions in kernel mode -kernel_user_helpers.txt - - Helper functions in kernel space made available for userspace. -mem_alignment - - alignment abort handler documentation -memory.txt - - description of the virtual memory layout -nwfpe/ - - NWFPE floating point emulator documentation -swp_emulation - - SWP/SWPB emulation handler/logging description -tcm.txt - - ARM Tightly Coupled Memory -uefi.txt - - [U]EFI configuration and runtime services documentation -vlocks.txt - - Voting locks, low-level mechanism relying on memory system atomic writes. diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX deleted file mode 100644 index 8d55b4bbb5e2..000000000000 --- a/Documentation/block/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -00-INDEX - - This file -bfq-iosched.txt - - BFQ IO scheduler and its tunables -biodoc.txt - - Notes on the Generic Block Layer Rewrite in Linux 2.5 -biovecs.txt - - Immutable biovecs and biovec iterators -capability.txt - - Generic Block Device Capability (/sys/block/<device>/capability) -cfq-iosched.txt - - CFQ IO scheduler tunables -cmdline-partition.txt - - how to specify block device partitions on kernel command line -data-integrity.txt - - Block data integrity -deadline-iosched.txt - - Deadline IO scheduler tunables -ioprio.txt - - Block io priorities (in CFQ scheduler) -pr.txt - - Block layer support for Persistent Reservations -null_blk.txt - - Null block for block-layer benchmarking. -queue-sysfs.txt - - Queue's sysfs entries -request.txt - - The members of struct request (in include/linux/blkdev.h) -stat.txt - - Block layer statistics in /sys/block/<device>/stat -switching-sched.txt - - Switching I/O schedulers at runtime -writeback_cache_control.txt - - Control of volatile write back caches diff --git a/Documentation/blockdev/00-INDEX b/Documentation/blockdev/00-INDEX deleted file mode 100644 index c08df56dd91b..000000000000 --- a/Documentation/blockdev/00-INDEX +++ /dev/null @@ -1,18 +0,0 @@ -00-INDEX - - this file -README.DAC960 - - info on Mylex DAC960/DAC1100 PCI RAID Controller Driver for Linux. -cciss.txt - - info, major/minor #'s for Compaq's SMART Array Controllers. -cpqarray.txt - - info on using Compaq's SMART2 Intelligent Disk Array Controllers. -floppy.txt - - notes and driver options for the floppy disk driver. -mflash.txt - - info on mGine m(g)flash driver for linux. -nbd.txt - - info on a TCP implementation of a network block device. -paride.txt - - information about the parallel port IDE subsystem. -ramdisk.txt - - short guide on how to set up and use the RAM disk. diff --git a/Documentation/cdrom/00-INDEX b/Documentation/cdrom/00-INDEX deleted file mode 100644 index 433edf23dc49..000000000000 --- a/Documentation/cdrom/00-INDEX +++ /dev/null @@ -1,11 +0,0 @@ -00-INDEX - - this file (info on CD-ROMs and Linux) -Makefile - - only used to generate TeX output from the documentation. -cdrom-standard.tex - - LaTeX document on standardizing the CD-ROM programming interface. -ide-cd - - info on setting up and using ATAPI (aka IDE) CD-ROMs. -packet-writing.txt - - Info on the CDRW packet writing module - diff --git a/Documentation/cgroup-v1/00-INDEX b/Documentation/cgroup-v1/00-INDEX deleted file mode 100644 index 13e0c85e7b35..000000000000 --- a/Documentation/cgroup-v1/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file -blkio-controller.txt - - Description for Block IO Controller, implementation and usage details. -cgroups.txt - - Control Groups definition, implementation details, examples and API. -cpuacct.txt - - CPU Accounting Controller; account CPU usage for groups of tasks. -cpusets.txt - - documents the cpusets feature; assign CPUs and Mem to a set of tasks. -admin-guide/devices.rst - - Device Whitelist Controller; description, interface and security. -freezer-subsystem.txt - - checkpointing; rationale to not use signals, interface. -hugetlb.txt - - HugeTLB Controller implementation and usage details. -memcg_test.txt - - Memory Resource Controller; implementation details. -memory.txt - - Memory Resource Controller; design, accounting, interface, testing. -net_cls.txt - - Network classifier cgroups details and usages. -net_prio.txt - - Network priority cgroups details and usages. -pids.txt - - Process number cgroups details and usages. diff --git a/Documentation/cgroup-v1/rdma.txt b/Documentation/cgroup-v1/rdma.txt index af618171e0eb..9bdb7fd03f83 100644 --- a/Documentation/cgroup-v1/rdma.txt +++ b/Documentation/cgroup-v1/rdma.txt @@ -27,7 +27,7 @@ cgroup. Currently user space applications can easily take away all the rdma verb specific resources such as AH, CQ, QP, MR etc. Due to which other applications in other cgroup or kernel space ULPs may not even get chance to allocate any -rdma resources. This can leads to service unavailability. +rdma resources. This can lead to service unavailability. Therefore RDMA controller is needed through which resource consumption of processes can be limited. Through this controller different rdma diff --git a/Documentation/conf.py b/Documentation/conf.py index b691af4831fa..72647a38b5c2 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -259,7 +259,7 @@ latex_elements = { 'papersize': 'a4paper', # The font size ('10pt', '11pt' or '12pt'). -'pointsize': '8pt', +'pointsize': '11pt', # Latex figure (float) alignment #'figure_align': 'htbp', @@ -272,8 +272,8 @@ latex_elements = { 'preamble': ''' % Use some font with UTF-8 support with XeLaTeX \\usepackage{fontspec} - \\setsansfont{DejaVu Serif} - \\setromanfont{DejaVu Sans} + \\setsansfont{DejaVu Sans} + \\setromanfont{DejaVu Serif} \\setmonofont{DejaVu Sans Mono} ''' @@ -383,6 +383,10 @@ latex_documents = [ 'The kernel development community', 'manual'), ('filesystems/index', 'filesystems.tex', 'Linux Filesystems API', 'The kernel development community', 'manual'), + ('admin-guide/ext4', 'ext4-admin-guide.tex', 'ext4 Administration Guide', + 'ext4 Community', 'manual'), + ('filesystems/ext4/index', 'ext4-data-structures.tex', + 'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'), ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide', 'The kernel development community', 'manual'), ('input/index', 'linux-input.tex', 'The Linux input driver subsystem', diff --git a/Documentation/core-api/boot-time-mm.rst b/Documentation/core-api/boot-time-mm.rst index 03cb1643f46f..6e12e89a03e0 100644 --- a/Documentation/core-api/boot-time-mm.rst +++ b/Documentation/core-api/boot-time-mm.rst @@ -76,7 +76,7 @@ These interfaces available only with bootmem, i.e when ``CONFIG_NO_BOOTMEM=n`` .. kernel-doc:: include/linux/bootmem.h .. kernel-doc:: mm/bootmem.c - :nodocs: + :functions: Memblock specific API --------------------- @@ -89,4 +89,4 @@ really happens under the hood. .. kernel-doc:: include/linux/memblock.h .. kernel-doc:: mm/memblock.c - :nodocs: + :functions: diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst b/Documentation/core-api/gfp_mask-from-fs-io.rst index e0df8f416582..e7c32a8de126 100644 --- a/Documentation/core-api/gfp_mask-from-fs-io.rst +++ b/Documentation/core-api/gfp_mask-from-fs-io.rst @@ -1,3 +1,5 @@ +.. _gfp_mask_from_fs_io: + ================================= GFP masks used from FS/IO context ================================= diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 26b735cefb93..29c790f571a5 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -27,10 +27,13 @@ Core utilities errseq printk-formats circular-buffers + memory-allocation mm-api gfp_mask-from-fs-io timekeeping boot-time-mm + memory-hotplug + Interfaces for kernel debugging =============================== diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst new file mode 100644 index 000000000000..f8bb9aa120c4 --- /dev/null +++ b/Documentation/core-api/memory-allocation.rst @@ -0,0 +1,122 @@ +======================= +Memory Allocation Guide +======================= + +Linux provides a variety of APIs for memory allocation. You can +allocate small chunks using `kmalloc` or `kmem_cache_alloc` families, +large virtually contiguous areas using `vmalloc` and its derivatives, +or you can directly request pages from the page allocator with +`alloc_pages`. It is also possible to use more specialized allocators, +for instance `cma_alloc` or `zs_malloc`. + +Most of the memory allocation APIs use GFP flags to express how that +memory should be allocated. The GFP acronym stands for "get free +pages", the underlying memory allocation function. + +Diversity of the allocation APIs combined with the numerous GFP flags +makes the question "How should I allocate memory?" not that easy to +answer, although very likely you should use + +:: + + kzalloc(<size>, GFP_KERNEL); + +Of course there are cases when other allocation APIs and different GFP +flags must be used. + +Get Free Page flags +=================== + +The GFP flags control the allocators behavior. They tell what memory +zones can be used, how hard the allocator should try to find free +memory, whether the memory can be accessed by the userspace etc. The +:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides +reference documentation for the GFP flags and their combinations and +here we briefly outline their recommended usage: + + * Most of the time ``GFP_KERNEL`` is what you need. Memory for the + kernel data structures, DMAable memory, inode cache, all these and + many other allocations types can use ``GFP_KERNEL``. Note, that + using ``GFP_KERNEL`` implies ``GFP_RECLAIM``, which means that + direct reclaim may be triggered under memory pressure; the calling + context must be allowed to sleep. + * If the allocation is performed from an atomic context, e.g interrupt + handler, use ``GFP_NOWAIT``. This flag prevents direct reclaim and + IO or filesystem operations. Consequently, under memory pressure + ``GFP_NOWAIT`` allocation is likely to fail. Allocations which + have a reasonable fallback should be using ``GFP_NOWARN``. + * If you think that accessing memory reserves is justified and the kernel + will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``. + * Untrusted allocations triggered from userspace should be a subject + of kmem accounting and must have ``__GFP_ACCOUNT`` bit set. There + is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL`` + allocations that should be accounted. + * Userspace allocations should use either of the ``GFP_USER``, + ``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer + the flag name the less restrictive it is. + + ``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory + will be directly accessible by the kernel and implies that the + data is movable. + + ``GFP_HIGHUSER`` means that the allocated memory is not movable, + but it is not required to be directly accessible by the kernel. An + example may be a hardware allocation that maps data directly into + userspace but has no addressing limitations. + + ``GFP_USER`` means that the allocated memory is not movable and it + must be directly accessible by the kernel. + +You may notice that quite a few allocations in the existing code +specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to +prevent recursion deadlocks caused by direct memory reclaim calling +back into the FS or IO paths and blocking on already held +resources. Since 4.12 the preferred way to address this issue is to +use new scope APIs described in +:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`. + +Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are +used to ensure that the allocated memory is accessible by hardware +with limited addressing capabilities. So unless you are writing a +driver for a device with such restrictions, avoid using these flags. +And even with hardware with restrictions it is preferable to use +`dma_alloc*` APIs. + +Selecting memory allocator +========================== + +The most straightforward way to allocate memory is to use a function +from the :c:func:`kmalloc` family. And, to be on the safe size it's +best to use routines that set memory to zero, like +:c:func:`kzalloc`. If you need to allocate memory for an array, there +are :c:func:`kmalloc_array` and :c:func:`kcalloc` helpers. + +The maximal size of a chunk that can be allocated with `kmalloc` is +limited. The actual limit depends on the hardware and the kernel +configuration, but it is a good practice to use `kmalloc` for objects +smaller than page size. + +For large allocations you can use :c:func:`vmalloc` and +:c:func:`vzalloc`, or directly request pages from the page +allocator. The memory allocated by `vmalloc` and related functions is +not physically contiguous. + +If you are not sure whether the allocation size is too large for +`kmalloc`, it is possible to use :c:func:`kvmalloc` and its +derivatives. It will try to allocate memory with `kmalloc` and if the +allocation fails it will be retried with `vmalloc`. There are +restrictions on which GFP flags can be used with `kvmalloc`; please +see :c:func:`kvmalloc_node` reference documentation. Note that +`kvmalloc` may return memory that is not physically contiguous. + +If you need to allocate many identical objects you can use the slab +cache allocator. The cache should be set up with +:c:func:`kmem_cache_create` before it can be used. Afterwards +:c:func:`kmem_cache_alloc` and its convenience wrappers can allocate +memory from that cache. + +When the allocated memory is no longer needed it must be freed. You +can use :c:func:`kvfree` for the memory allocated with `kmalloc`, +`vmalloc` and `kvmalloc`. The slab caches should be freed with +:c:func:`kmem_cache_free`. And don't forget to destroy the cache with +:c:func:`kmem_cache_destroy`. diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst new file mode 100644 index 000000000000..de7467e48067 --- /dev/null +++ b/Documentation/core-api/memory-hotplug.rst @@ -0,0 +1,125 @@ +.. _memory_hotplug: + +============== +Memory hotplug +============== + +Memory hotplug event notifier +============================= + +Hotplugging events are sent to a notification queue. + +There are six types of notification defined in ``include/linux/memory.h``: + +MEM_GOING_ONLINE + Generated before new memory becomes available in order to be able to + prepare subsystems to handle memory. The page allocator is still unable + to allocate from the new memory. + +MEM_CANCEL_ONLINE + Generated if MEM_GOING_ONLINE fails. + +MEM_ONLINE + Generated when memory has successfully brought online. The callback may + allocate pages from the new memory. + +MEM_GOING_OFFLINE + Generated to begin the process of offlining memory. Allocations are no + longer possible from the memory but some of the memory to be offlined + is still in use. The callback can be used to free memory known to a + subsystem from the indicated memory block. + +MEM_CANCEL_OFFLINE + Generated if MEM_GOING_OFFLINE fails. Memory is available again from + the memory block that we attempted to offline. + +MEM_OFFLINE + Generated after offlining memory is complete. + +A callback routine can be registered by calling:: + + hotplug_memory_notifier(callback_func, priority) + +Callback functions with higher values of priority are called before callback +functions with lower values. + +A callback function must have the following prototype:: + + int callback_func( + struct notifier_block *self, unsigned long action, void *arg); + +The first argument of the callback function (self) is a pointer to the block +of the notifier chain that points to the callback function itself. +The second argument (action) is one of the event types described above. +The third argument (arg) passes a pointer of struct memory_notify:: + + struct memory_notify { + unsigned long start_pfn; + unsigned long nr_pages; + int status_change_nid_normal; + int status_change_nid_high; + int status_change_nid; + } + +- start_pfn is start_pfn of online/offline memory. +- nr_pages is # of pages of online/offline memory. +- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid is set node id when N_MEMORY of nodemask is (will be) + set/clear. It means a new(memoryless) node gets new memory by online and a + node loses all memory. If this is -1, then nodemask status is not changed. + + If status_changed_nid* >= 0, callback should create/discard structures for the + node if necessary. + +The callback routine shall return one of the values +NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP +defined in ``include/linux/notifier.h`` + +NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. + +NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, +MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops +further processing of the notification queue. + +NOTIFY_STOP stops further processing of the notification queue. + +Locking Internals +================= + +When adding/removing memory that uses memory block devices (i.e. ordinary RAM), +the device_hotplug_lock should be held to: + +- synchronize against online/offline requests (e.g. via sysfs). This way, memory + block devices can only be accessed (.online/.state attributes) by user + space once memory has been fully added. And when removing memory, we + know nobody is in critical sections. +- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC) + +Especially, there is a possible lock inversion that is avoided using +device_hotplug_lock when adding memory and user space tries to online that +memory faster than expected: + +- device_online() will first take the device_lock(), followed by + mem_hotplug_lock +- add_memory_resource() will first take the mem_hotplug_lock, followed by + the device_lock() (while creating the devices, during bus_add_device()). + +As the device is visible to user space before taking the device_lock(), this +can result in a lock inversion. + +onlining/offlining of memory should be done via device_online()/ +device_offline() - to make sure it is properly synchronized to actions +via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type) + +When adding/removing/onlining/offlining memory or adding/removing +heterogeneous/device memory, we should always hold the mem_hotplug_lock in +write mode to serialise memory hotplug (e.g. access to global/zone +variables). + +In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read +mode allows for a quite efficient get_online_mems/put_online_mems +implementation, so code accessing memory can protect from that memory +vanishing. diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst index 46ae3537fb12..5ce1ec1dd066 100644 --- a/Documentation/core-api/mm-api.rst +++ b/Documentation/core-api/mm-api.rst @@ -14,6 +14,8 @@ User Space Memory Access .. kernel-doc:: mm/util.c :functions: get_user_pages_fast +.. _mm-api-gfp-flags: + Memory Allocation Controls ========================== diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index 25dc591cb110..ff48b55040ef 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -376,15 +376,15 @@ correctness of the format string and va_list arguments. Passed by reference. -kobjects --------- +Device tree nodes +----------------- :: %pOF[fnpPcCF] -For printing kobject based structs (device nodes). Default behaviour is +For printing device tree node structures. Default behaviour is equivalent to %pOFf. - f - device node full_name @@ -420,9 +420,8 @@ struct clk %pC pll1 %pCn pll1 -For printing struct clk structures. %pC and %pCn print the name -(Common Clock Framework) or address (legacy clock framework) of the -structure. +For printing struct clk structures. %pC and %pCn print the name of the clock +(Common Clock Framework) or a unique 32-bit ID (legacy clock framework). Passed by reference. diff --git a/Documentation/dev-tools/coccinelle.rst b/Documentation/dev-tools/coccinelle.rst index 94f41c290bfc..aa14f05cabb1 100644 --- a/Documentation/dev-tools/coccinelle.rst +++ b/Documentation/dev-tools/coccinelle.rst @@ -30,18 +30,29 @@ of many distributions, e.g. : - NetBSD - FreeBSD -You can get the latest version released from the Coccinelle homepage at +Some distribution packages are obsolete and it is recommended +to use the latest version released from the Coccinelle homepage at http://coccinelle.lip6.fr/ -Once you have it, run the following command:: +Or from Github at: - ./configure +https://github.com/coccinelle/coccinelle + +Once you have it, run the following commands:: + + ./autogen + ./configure make as a regular user, and install it with:: sudo make install +More detailed installation instructions to build from source can be +found at: + +https://github.com/coccinelle/coccinelle/blob/master/install.txt + Supplemental documentation --------------------------- @@ -51,6 +62,10 @@ https://bottest.wiki.kernel.org/coccicheck The wiki documentation always refers to the linux-next version of the script. +For Semantic Patch Language(SmPL) grammar documentation refer to: + +http://coccinelle.lip6.fr/documentation.php + Using Coccinelle on the Linux kernel ------------------------------------ @@ -223,7 +238,7 @@ Since coccicheck runs through make, it naturally runs from the kernel proper dir, as such the second rule above would be implied for picking up a .cocciconfig when using ``make coccicheck``. -``make coccicheck`` also supports using M= targets.If you do not supply +``make coccicheck`` also supports using M= targets. If you do not supply any M= target, it is assumed you want to target the entire kernel. The kernel coccicheck script has:: diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index 6f653acea248..dad1bb8711e2 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -159,7 +159,7 @@ Contributing new tests (details) * If a test needs specific kernel config options enabled, add a config file in the test directory to enable them. - e.g: tools/testing/selftests/android/ion/config + e.g: tools/testing/selftests/android/config Test Harness ============ diff --git a/Documentation/device-mapper/dm-flakey.txt b/Documentation/device-mapper/dm-flakey.txt index c43030718cef..9f0e247d0877 100644 --- a/Documentation/device-mapper/dm-flakey.txt +++ b/Documentation/device-mapper/dm-flakey.txt @@ -33,6 +33,10 @@ Optional feature parameters: All write I/O is silently ignored. Read I/O is handled correctly. + error_writes: + All write I/O is failed with an error signalled. + Read I/O is handled correctly. + corrupt_bio_byte <Nth_byte> <direction> <value> <flags>: During <down interval>, replace <Nth_byte> of the data of each matching bio with <value>. diff --git a/Documentation/devicetree/00-INDEX b/Documentation/devicetree/00-INDEX deleted file mode 100644 index 8c4102c6a5e7..000000000000 --- a/Documentation/devicetree/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -Documentation for device trees, a data structure by which bootloaders pass -hardware layout to Linux in a device-independent manner, simplifying hardware -probing. This subsystem is maintained by Grant Likely -<grant.likely@secretlab.ca> and has a mailing list at -https://lists.ozlabs.org/listinfo/devicetree-discuss - -00-INDEX - - this file -booting-without-of.txt - - Booting Linux without Open Firmware, describes history and format of device trees. -usage-model.txt - - How Linux uses DT and what DT aims to solve.
\ No newline at end of file diff --git a/Documentation/devicetree/bindings/arm/al,alpine.txt b/Documentation/devicetree/bindings/arm/al,alpine.txt index f404a4f9b165..d00debe2e86f 100644 --- a/Documentation/devicetree/bindings/arm/al,alpine.txt +++ b/Documentation/devicetree/bindings/arm/al,alpine.txt @@ -14,75 +14,3 @@ compatible: must contain "al,alpine" ... } - -* CPU node: - -The Alpine platform includes cortex-a15 cores. -enable-method: must be "al,alpine-smp" to allow smp [1] - -Example: - -cpus { - #address-cells = <1>; - #size-cells = <0>; - enable-method = "al,alpine-smp"; - - cpu@0 { - compatible = "arm,cortex-a15"; - device_type = "cpu"; - reg = <0>; - }; - - cpu@1 { - compatible = "arm,cortex-a15"; - device_type = "cpu"; - reg = <1>; - }; - - cpu@2 { - compatible = "arm,cortex-a15"; - device_type = "cpu"; - reg = <2>; - }; - - cpu@3 { - compatible = "arm,cortex-a15"; - device_type = "cpu"; - reg = <3>; - }; -}; - - -* Alpine CPU resume registers - -The CPU resume register are used to define required resume address after -reset. - -Properties: -- compatible : Should contain "al,alpine-cpu-resume". -- reg : Offset and length of the register set for the device - -Example: - -cpu_resume { - compatible = "al,alpine-cpu-resume"; - reg = <0xfbff5ed0 0x30>; -}; - -* Alpine System-Fabric Service Registers - -The System-Fabric Service Registers allow various operation on CPU and -system fabric, like powering CPUs off. - -Properties: -- compatible : Should contain "al,alpine-sysfabric-service" and "syscon". -- reg : Offset and length of the register set for the device - -Example: - -nb_service { - compatible = "al,alpine-sysfabric-service", "syscon"; - reg = <0xfb070000 0x10000>; -}; - -[1] arm/cpu-enable-method/al,alpine-smp diff --git a/Documentation/devicetree/bindings/arm/atmel-at91.txt b/Documentation/devicetree/bindings/arm/atmel-at91.txt index 31220b54d85d..4bf1b4da7659 100644 --- a/Documentation/devicetree/bindings/arm/atmel-at91.txt +++ b/Documentation/devicetree/bindings/arm/atmel-at91.txt @@ -70,173 +70,3 @@ compatible: must be one of: - "atmel,samv71q19" - "atmel,samv71q20" - "atmel,samv71q21" - -Chipid required properties: -- compatible: Should be "atmel,sama5d2-chipid" -- reg : Should contain registers location and length - -PIT Timer required properties: -- compatible: Should be "atmel,at91sam9260-pit" -- reg: Should contain registers location and length -- interrupts: Should contain interrupt for the PIT which is the IRQ line - shared across all System Controller members. - -System Timer (ST) required properties: -- compatible: Should be "atmel,at91rm9200-st", "syscon", "simple-mfd" -- reg: Should contain registers location and length -- interrupts: Should contain interrupt for the ST which is the IRQ line - shared across all System Controller members. -- clocks: phandle to input clock. -Its subnodes can be: -- watchdog: compatible should be "atmel,at91rm9200-wdt" - -RSTC Reset Controller required properties: -- compatible: Should be "atmel,<chip>-rstc". - <chip> can be "at91sam9260" or "at91sam9g45" or "sama5d3" -- reg: Should contain registers location and length -- clocks: phandle to input clock. - -Example: - - rstc@fffffd00 { - compatible = "atmel,at91sam9260-rstc"; - reg = <0xfffffd00 0x10>; - clocks = <&clk32k>; - }; - -RAMC SDRAM/DDR Controller required properties: -- compatible: Should be "atmel,at91rm9200-sdramc", "syscon" - "atmel,at91sam9260-sdramc", - "atmel,at91sam9g45-ddramc", - "atmel,sama5d3-ddramc", -- reg: Should contain registers location and length - -Examples: - - ramc0: ramc@ffffe800 { - compatible = "atmel,at91sam9g45-ddramc"; - reg = <0xffffe800 0x200>; - }; - -SHDWC Shutdown Controller - -required properties: -- compatible: Should be "atmel,<chip>-shdwc". - <chip> can be "at91sam9260", "at91sam9rl" or "at91sam9x5". -- reg: Should contain registers location and length -- clocks: phandle to input clock. - -optional properties: -- atmel,wakeup-mode: String, operation mode of the wakeup mode. - Supported values are: "none", "high", "low", "any". -- atmel,wakeup-counter: Counter on Wake-up 0 (between 0x0 and 0xf). - -optional at91sam9260 properties: -- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up. - -optional at91sam9rl properties: -- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up. -- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up. - -optional at91sam9x5 properties: -- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up. - -Example: - - shdwc@fffffd10 { - compatible = "atmel,at91sam9260-shdwc"; - reg = <0xfffffd10 0x10>; - clocks = <&clk32k>; - }; - -SHDWC SAMA5D2-Compatible Shutdown Controller - -1) shdwc node - -required properties: -- compatible: should be "atmel,sama5d2-shdwc". -- reg: should contain registers location and length -- clocks: phandle to input clock. -- #address-cells: should be one. The cell is the wake-up input index. -- #size-cells: should be zero. - -optional properties: - -- debounce-delay-us: minimum wake-up inputs debouncer period in - microseconds. It's usually a board-related property. -- atmel,wakeup-rtc-timer: boolean to enable Real-Time Clock wake-up. - -The node contains child nodes for each wake-up input that the platform uses. - -2) input nodes - -Wake-up input nodes are usually described in the "board" part of the Device -Tree. Note also that input 0 is linked to the wake-up pin and is frequently -used. - -Required properties: -- reg: should contain the wake-up input index [0 - 15]. - -Optional properties: -- atmel,wakeup-active-high: boolean, the corresponding wake-up input described - by the child, forces the wake-up of the core power supply on a high level. - The default is to be active low. - -Example: - -On the SoC side: - shdwc@f8048010 { - compatible = "atmel,sama5d2-shdwc"; - reg = <0xf8048010 0x10>; - clocks = <&clk32k>; - #address-cells = <1>; - #size-cells = <0>; - atmel,wakeup-rtc-timer; - }; - -On the board side: - shdwc@f8048010 { - debounce-delay-us = <976>; - - input@0 { - reg = <0>; - }; - - input@1 { - reg = <1>; - atmel,wakeup-active-high; - }; - }; - -Special Function Registers (SFR) - -Special Function Registers (SFR) manage specific aspects of the integrated -memory, bridge implementations, processor and other functionality not controlled -elsewhere. - -required properties: -- compatible: Should be "atmel,<chip>-sfr", "syscon" or - "atmel,<chip>-sfrbu", "syscon" - <chip> can be "sama5d3", "sama5d4" or "sama5d2". -- reg: Should contain registers location and length - - sfr@f0038000 { - compatible = "atmel,sama5d3-sfr", "syscon"; - reg = <0xf0038000 0x60>; - }; - -Security Module (SECUMOD) - -The Security Module macrocell provides all necessary secure functions to avoid -voltage, temperature, frequency and mechanical attacks on the chip. It also -embeds secure memories that can be scrambled - -required properties: -- compatible: Should be "atmel,<chip>-secumod", "syscon". - <chip> can be "sama5d2". -- reg: Should contain registers location and length - - secumod@fc040000 { - compatible = "atmel,sama5d2-secumod", "syscon"; - reg = <0xfc040000 0x100>; - }; diff --git a/Documentation/devicetree/bindings/arm/atmel-sysregs.txt b/Documentation/devicetree/bindings/arm/atmel-sysregs.txt new file mode 100644 index 000000000000..4b96608ad692 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/atmel-sysregs.txt @@ -0,0 +1,171 @@ +Atmel system registers + +Chipid required properties: +- compatible: Should be "atmel,sama5d2-chipid" +- reg : Should contain registers location and length + +PIT Timer required properties: +- compatible: Should be "atmel,at91sam9260-pit" +- reg: Should contain registers location and length +- interrupts: Should contain interrupt for the PIT which is the IRQ line + shared across all System Controller members. + +System Timer (ST) required properties: +- compatible: Should be "atmel,at91rm9200-st", "syscon", "simple-mfd" +- reg: Should contain registers location and length +- interrupts: Should contain interrupt for the ST which is the IRQ line + shared across all System Controller members. +- clocks: phandle to input clock. +Its subnodes can be: +- watchdog: compatible should be "atmel,at91rm9200-wdt" + +RSTC Reset Controller required properties: +- compatible: Should be "atmel,<chip>-rstc". + <chip> can be "at91sam9260" or "at91sam9g45" or "sama5d3" +- reg: Should contain registers location and length +- clocks: phandle to input clock. + +Example: + + rstc@fffffd00 { + compatible = "atmel,at91sam9260-rstc"; + reg = <0xfffffd00 0x10>; + clocks = <&clk32k>; + }; + +RAMC SDRAM/DDR Controller required properties: +- compatible: Should be "atmel,at91rm9200-sdramc", "syscon" + "atmel,at91sam9260-sdramc", + "atmel,at91sam9g45-ddramc", + "atmel,sama5d3-ddramc", +- reg: Should contain registers location and length + +Examples: + + ramc0: ramc@ffffe800 { + compatible = "atmel,at91sam9g45-ddramc"; + reg = <0xffffe800 0x200>; + }; + +SHDWC Shutdown Controller + +required properties: +- compatible: Should be "atmel,<chip>-shdwc". + <chip> can be "at91sam9260", "at91sam9rl" or "at91sam9x5". +- reg: Should contain registers location and length +- clocks: phandle to input clock. + +optional properties: +- atmel,wakeup-mode: String, operation mode of the wakeup mode. + Supported values are: "none", "high", "low", "any". +- atmel,wakeup-counter: Counter on Wake-up 0 (between 0x0 and 0xf). + +optional at91sam9260 properties: +- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up. + +optional at91sam9rl properties: +- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up. +- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up. + +optional at91sam9x5 properties: +- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up. + +Example: + + shdwc@fffffd10 { + compatible = "atmel,at91sam9260-shdwc"; + reg = <0xfffffd10 0x10>; + clocks = <&clk32k>; + }; + +SHDWC SAMA5D2-Compatible Shutdown Controller + +1) shdwc node + +required properties: +- compatible: should be "atmel,sama5d2-shdwc". +- reg: should contain registers location and length +- clocks: phandle to input clock. +- #address-cells: should be one. The cell is the wake-up input index. +- #size-cells: should be zero. + +optional properties: + +- debounce-delay-us: minimum wake-up inputs debouncer period in + microseconds. It's usually a board-related property. +- atmel,wakeup-rtc-timer: boolean to enable Real-Time Clock wake-up. + +The node contains child nodes for each wake-up input that the platform uses. + +2) input nodes + +Wake-up input nodes are usually described in the "board" part of the Device +Tree. Note also that input 0 is linked to the wake-up pin and is frequently +used. + +Required properties: +- reg: should contain the wake-up input index [0 - 15]. + +Optional properties: +- atmel,wakeup-active-high: boolean, the corresponding wake-up input described + by the child, forces the wake-up of the core power supply on a high level. + The default is to be active low. + +Example: + +On the SoC side: + shdwc@f8048010 { + compatible = "atmel,sama5d2-shdwc"; + reg = <0xf8048010 0x10>; + clocks = <&clk32k>; + #address-cells = <1>; + #size-cells = <0>; + atmel,wakeup-rtc-timer; + }; + +On the board side: + shdwc@f8048010 { + debounce-delay-us = <976>; + + input@0 { + reg = <0>; + }; + + input@1 { + reg = <1>; + atmel,wakeup-active-high; + }; + }; + +Special Function Registers (SFR) + +Special Function Registers (SFR) manage specific aspects of the integrated +memory, bridge implementations, processor and other functionality not controlled +elsewhere. + +required properties: +- compatible: Should be "atmel,<chip>-sfr", "syscon" or + "atmel,<chip>-sfrbu", "syscon" + <chip> can be "sama5d3", "sama5d4" or "sama5d2". +- reg: Should contain registers location and length + + sfr@f0038000 { + compatible = "atmel,sama5d3-sfr", "syscon"; + reg = <0xf0038000 0x60>; + }; + +Security Module (SECUMOD) + +The Security Module macrocell provides all necessary secure functions to avoid +voltage, temperature, frequency and mechanical attacks on the chip. It also +embeds secure memories that can be scrambled + +required properties: +- compatible: Should be "atmel,<chip>-secumod", "syscon". + <chip> can be "sama5d2". +- reg: Should contain registers location and length + + secumod@fc040000 { + compatible = "atmel,sama5d2-secumod", "syscon"; + reg = <0xfc040000 0x100>; + }; diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt index 5d1ad09bafb4..f8aff65ab921 100644 --- a/Documentation/devicetree/bindings/arm/coresight.txt +++ b/Documentation/devicetree/bindings/arm/coresight.txt @@ -54,9 +54,7 @@ its hardware characteristcs. clocks the core of that coresight component. The latter clock is optional. - * port or ports: The representation of the component's port - layout using the generic DT graph presentation found in - "bindings/graph.txt". + * port or ports: see "Graph bindings for Coresight" below. * Additional required properties for System Trace Macrocells (STM): * reg: along with the physical base address and length of the register @@ -73,7 +71,7 @@ its hardware characteristcs. AMBA markee): - "arm,coresight-replicator" - * port or ports: same as above. + * port or ports: see "Graph bindings for Coresight" below. * Optional properties for ETM/PTMs: @@ -96,6 +94,20 @@ its hardware characteristcs. * interrupts : Exactly one SPI may be listed for reporting the address error +Graph bindings for Coresight +------------------------------- + +Coresight components are interconnected to create a data path for the flow of +trace data generated from the "sources" to their collection points "sink". +Each coresight component must describe the "input" and "output" connections. +The connections must be described via generic DT graph bindings as described +by the "bindings/graph.txt", where each "port" along with an "endpoint" +component represents a hardware port and the connection. + + * All output ports must be listed inside a child node named "out-ports" + * All input ports must be listed inside a child node named "in-ports". + * Port address must match the hardware port number. + Example: 1. Sinks @@ -105,10 +117,11 @@ Example: clocks = <&oscclk6a>; clock-names = "apb_pclk"; - port { - etb_in_port: endpoint@0 { - slave-mode; - remote-endpoint = <&replicator_out_port0>; + in-ports { + port { + etb_in_port: endpoint@0 { + remote-endpoint = <&replicator_out_port0>; + }; }; }; }; @@ -119,10 +132,11 @@ Example: clocks = <&oscclk6a>; clock-names = "apb_pclk"; - port { - tpiu_in_port: endpoint@0 { - slave-mode; - remote-endpoint = <&replicator_out_port1>; + in-ports { + port { + tpiu_in_port: endpoint@0 { + remote-endpoint = <&replicator_out_port1>; + }; }; }; }; @@ -133,22 +147,16 @@ Example: clocks = <&oscclk6a>; clock-names = "apb_pclk"; - ports { - #address-cells = <1>; - #size-cells = <0>; - - /* input port */ - port@0 { - reg = <0>; + in-ports { + port { etr_in_port: endpoint { - slave-mode; remote-endpoint = <&replicator2_out_port0>; }; }; + }; - /* CATU link represented by output port */ - port@1 { - reg = <1>; + out-ports { + port { etr_out_port: endpoint { remote-endpoint = <&catu_in_port>; }; @@ -163,7 +171,7 @@ Example: */ compatible = "arm,coresight-replicator"; - ports { + out-ports { #address-cells = <1>; #size-cells = <0>; @@ -181,12 +189,11 @@ Example: remote-endpoint = <&tpiu_in_port>; }; }; + }; - /* replicator input port */ - port@2 { - reg = <0>; + in-ports { + port { replicator_in_port0: endpoint { - slave-mode; remote-endpoint = <&funnel_out_port0>; }; }; @@ -199,40 +206,36 @@ Example: clocks = <&oscclk6a>; clock-names = "apb_pclk"; - ports { - #address-cells = <1>; - #size-cells = <0>; - - /* funnel output port */ - port@0 { - reg = <0>; + out-ports { + port { funnel_out_port0: endpoint { remote-endpoint = <&replicator_in_port0>; }; }; + }; - /* funnel input ports */ - port@1 { + in-ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { reg = <0>; funnel_in_port0: endpoint { - slave-mode; remote-endpoint = <&ptm0_out_port>; }; }; - port@2 { + port@1 { reg = <1>; funnel_in_port1: endpoint { - slave-mode; remote-endpoint = <&ptm1_out_port>; }; }; - port@3 { + port@2 { reg = <2>; funnel_in_port2: endpoint { - slave-mode; remote-endpoint = <&etm0_out_port>; }; }; @@ -248,9 +251,11 @@ Example: cpu = <&cpu0>; clocks = <&oscclk6a>; clock-names = "apb_pclk"; - port { - ptm0_out_port: endpoint { - remote-endpoint = <&funnel_in_port0>; + out-ports { + port { + ptm0_out_port: endpoint { + remote-endpoint = <&funnel_in_port0>; + }; }; }; }; @@ -262,9 +267,11 @@ Example: cpu = <&cpu1>; clocks = <&oscclk6a>; clock-names = "apb_pclk"; - port { - ptm1_out_port: endpoint { - remote-endpoint = <&funnel_in_port1>; + out-ports { + port { + ptm1_out_port: endpoint { + remote-endpoint = <&funnel_in_port1>; + }; }; }; }; @@ -278,9 +285,11 @@ Example: clocks = <&soc_smc50mhz>; clock-names = "apb_pclk"; - port { - stm_out_port: endpoint { - remote-endpoint = <&main_funnel_in_port2>; + out-ports { + port { + stm_out_port: endpoint { + remote-endpoint = <&main_funnel_in_port2>; + }; }; }; }; @@ -295,10 +304,11 @@ Example: clock-names = "apb_pclk"; interrupts = <GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>; - port { - catu_in_port: endpoint { - slave-mode; - remote-endpoint = <&etr_out_port>; + in-ports { + port { + catu_in_port: endpoint { + remote-endpoint = <&etr_out_port>; + }; }; }; }; diff --git a/Documentation/devicetree/bindings/arm/cpu-enable-method/al,alpine-smp b/Documentation/devicetree/bindings/arm/cpu-enable-method/al,alpine-smp index c2e0cc5e4cfd..35e5afb6d9ad 100644 --- a/Documentation/devicetree/bindings/arm/cpu-enable-method/al,alpine-smp +++ b/Documentation/devicetree/bindings/arm/cpu-enable-method/al,alpine-smp @@ -14,7 +14,28 @@ Related properties: (none) Note: This enable method requires valid nodes compatible with -"al,alpine-cpu-resume" and "al,alpine-nb-service"[1]. +"al,alpine-cpu-resume" and "al,alpine-nb-service". + + +* Alpine CPU resume registers + +The CPU resume register are used to define required resume address after +reset. + +Properties: +- compatible : Should contain "al,alpine-cpu-resume". +- reg : Offset and length of the register set for the device + + +* Alpine System-Fabric Service Registers + +The System-Fabric Service Registers allow various operation on CPU and +system fabric, like powering CPUs off. + +Properties: +- compatible : Should contain "al,alpine-sysfabric-service" and "syscon". +- reg : Offset and length of the register set for the device + Example: @@ -48,5 +69,12 @@ cpus { }; }; --- -[1] arm/al,alpine.txt +cpu_resume { + compatible = "al,alpine-cpu-resume"; + reg = <0xfbff5ed0 0x30>; +}; + +nb_service { + compatible = "al,alpine-sysfabric-service", "syscon"; + reg = <0xfb070000 0x10000>; +}; diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt index 96dfccc0faa8..b0198a1cf403 100644 --- a/Documentation/devicetree/bindings/arm/cpus.txt +++ b/Documentation/devicetree/bindings/arm/cpus.txt @@ -276,7 +276,7 @@ described below. Usage: optional Value type: <prop-encoded-array> Definition: A u32 value that represents the running time dynamic - power coefficient in units of mW/MHz/uV^2. The + power coefficient in units of uW/MHz/V^2. The coefficient can either be calculated from power measurements or derived by analysis. @@ -287,7 +287,7 @@ described below. Pdyn = dynamic-power-coefficient * V^2 * f - where voltage is in uV, frequency is in MHz. + where voltage is in V, frequency is in MHz. Example 1 (dual-cluster big.LITTLE system 32-bit): diff --git a/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-dcfg.txt b/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-dcfg.txt new file mode 100644 index 000000000000..b5cb374dc47d --- /dev/null +++ b/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-dcfg.txt @@ -0,0 +1,19 @@ +Freescale DCFG + +DCFG is the device configuration unit, that provides general purpose +configuration and status for the device. Such as setting the secondary +core start address and release the secondary core from holdoff and startup. + +Required properties: + - compatible: Should contain a chip-specific compatible string, + Chip-specific strings are of the form "fsl,<chip>-dcfg", + The following <chip>s are known to be supported: + ls1012a, ls1021a, ls1043a, ls1046a, ls2080a. + + - reg : should contain base address and length of DCFG memory-mapped registers + +Example: + dcfg: dcfg@1ee0000 { + compatible = "fsl,ls1021a-dcfg"; + reg = <0x0 0x1ee0000 0x0 0x10000>; + }; diff --git a/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt b/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt new file mode 100644 index 000000000000..0ab67b0b216d --- /dev/null +++ b/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt @@ -0,0 +1,19 @@ +Freescale SCFG + +SCFG is the supplemental configuration unit, that provides SoC specific +configuration and status registers for the chip. Such as getting PEX port +status. + +Required properties: + - compatible: Should contain a chip-specific compatible string, + Chip-specific strings are of the form "fsl,<chip>-scfg", + The following <chip>s are known to be supported: + ls1012a, ls1021a, ls1043a, ls1046a, ls2080a. + + - reg: should contain base address and length of SCFG memory-mapped registers + +Example: + scfg: scfg@1570000 { + compatible = "fsl,ls1021a-scfg"; + reg = <0x0 0x1570000 0x0 0x10000>; + }; diff --git a/Documentation/devicetree/bindings/arm/fsl.txt b/Documentation/devicetree/bindings/arm/fsl.txt index 8a1baa2b9723..1e775aaa5c5b 100644 --- a/Documentation/devicetree/bindings/arm/fsl.txt +++ b/Documentation/devicetree/bindings/arm/fsl.txt @@ -101,45 +101,6 @@ Freescale LS1021A Platform Device Tree Bindings Required root node compatible properties: - compatible = "fsl,ls1021a"; -Freescale SoC-specific Device Tree Bindings -------------------------------------------- - -Freescale SCFG - SCFG is the supplemental configuration unit, that provides SoC specific -configuration and status registers for the chip. Such as getting PEX port -status. - Required properties: - - compatible: Should contain a chip-specific compatible string, - Chip-specific strings are of the form "fsl,<chip>-scfg", - The following <chip>s are known to be supported: - ls1012a, ls1021a, ls1043a, ls1046a, ls2080a. - - - reg: should contain base address and length of SCFG memory-mapped registers - -Example: - scfg: scfg@1570000 { - compatible = "fsl,ls1021a-scfg"; - reg = <0x0 0x1570000 0x0 0x10000>; - }; - -Freescale DCFG - DCFG is the device configuration unit, that provides general purpose -configuration and status for the device. Such as setting the secondary -core start address and release the secondary core from holdoff and startup. - Required properties: - - compatible: Should contain a chip-specific compatible string, - Chip-specific strings are of the form "fsl,<chip>-dcfg", - The following <chip>s are known to be supported: - ls1012a, ls1021a, ls1043a, ls1046a, ls2080a. - - - reg : should contain base address and length of DCFG memory-mapped registers - -Example: - dcfg: dcfg@1ee0000 { - compatible = "fsl,ls1021a-dcfg"; - reg = <0x0 0x1ee0000 0x0 0x10000>; - }; - Freescale ARMv8 based Layerscape SoC family Device Tree Bindings ---------------------------------------------------------------- diff --git a/Documentation/devicetree/bindings/arm/secure.txt b/Documentation/devicetree/bindings/arm/secure.txt index e31303fb233a..f27bbff2c780 100644 --- a/Documentation/devicetree/bindings/arm/secure.txt +++ b/Documentation/devicetree/bindings/arm/secure.txt @@ -32,7 +32,8 @@ describe the view of Secure world using the standard bindings. These secure- bindings only need to be used where both the Secure and Normal world views need to be described in a single device tree. -Valid Secure world properties: +Valid Secure world properties +----------------------------- - secure-status : specifies whether the device is present and usable in the secure world. The combination of this with "status" allows @@ -51,3 +52,19 @@ Valid Secure world properties: status = "disabled"; secure-status = "okay"; /* S-only */ status = "disabled"; /* disabled in both */ status = "disabled"; secure-status = "disabled"; /* disabled in both */ + +The secure-chosen node +---------------------- + +Similar to the /chosen node which serves as a place for passing data +between firmware and the operating system, the /secure-chosen node may +be used to pass data to the Secure OS. Only the properties defined +below may appear in the /secure-chosen node. + +- stdout-path : specifies the device to be used by the Secure OS for + its console output. The syntax is the same as for /chosen/stdout-path. + If the /secure-chosen node exists but the stdout-path property is not + present, the Secure OS should not perform any console output. If + /secure-chosen does not exist, the Secure OS should use the value of + /chosen/stdout-path instead (that is, use the same device as the + Normal world OS). diff --git a/Documentation/devicetree/bindings/arm/zte,sysctrl.txt b/Documentation/devicetree/bindings/arm/zte,sysctrl.txt new file mode 100644 index 000000000000..7e66b7f7ba96 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/zte,sysctrl.txt @@ -0,0 +1,30 @@ +ZTE sysctrl Registers + +Registers for 'zte,zx296702' SoC: + +System management required properties: + - compatible = "zte,sysctrl" + +Low power management required properties: + - compatible = "zte,zx296702-pcu" + +Bus matrix required properties: + - compatible = "zte,zx-bus-matrix" + + +Registers for 'zte,zx296718' SoC: + +System management required properties: + - compatible = "zte,zx296718-aon-sysctrl" + - compatible = "zte,zx296718-sysctrl" + +Example: +aon_sysctrl: aon-sysctrl@116000 { + compatible = "zte,zx296718-aon-sysctrl", "syscon"; + reg = <0x116000 0x1000>; +}; + +sysctrl: sysctrl@1463000 { + compatible = "zte,zx296718-sysctrl", "syscon"; + reg = <0x1463000 0x1000>; +}; diff --git a/Documentation/devicetree/bindings/arm/zte.txt b/Documentation/devicetree/bindings/arm/zte.txt index 83369785d29c..340612794a37 100644 --- a/Documentation/devicetree/bindings/arm/zte.txt +++ b/Documentation/devicetree/bindings/arm/zte.txt @@ -1,20 +1,10 @@ ZTE platforms device tree bindings ---------------------------------------- +--------------------------------------- - ZX296702 board: Required root node properties: - compatible = "zte,zx296702-ad1", "zte,zx296702" -System management required properties: - - compatible = "zte,sysctrl" - -Low power management required properties: - - compatible = "zte,zx296702-pcu" - -Bus matrix required properties: - - compatible = "zte,zx-bus-matrix" - - --------------------------------------- - ZX296718 SoC: Required root node properties: @@ -22,18 +12,3 @@ Bus matrix required properties: ZX296718 EVB board: - "zte,zx296718-evb" - -System management required properties: - - compatible = "zte,zx296718-aon-sysctrl" - - compatible = "zte,zx296718-sysctrl" - -Example: -aon_sysctrl: aon-sysctrl@116000 { - compatible = "zte,zx296718-aon-sysctrl", "syscon"; - reg = <0x116000 0x1000>; -}; - -sysctrl: sysctrl@1463000 { - compatible = "zte,zx296718-sysctrl", "syscon"; - reg = <0x1463000 0x1000>; -}; diff --git a/Documentation/devicetree/bindings/connector/usb-connector.txt b/Documentation/devicetree/bindings/connector/usb-connector.txt index 8855bfcfd778..d90e17e2428b 100644 --- a/Documentation/devicetree/bindings/connector/usb-connector.txt +++ b/Documentation/devicetree/bindings/connector/usb-connector.txt @@ -29,15 +29,15 @@ Required properties for usb-c-connector with power delivery support: in "Universal Serial Bus Power Delivery Specification" chapter 6.4.1.2 Source_Capabilities Message, the order of each entry(PDO) should follow the PD spec chapter 6.4.1. Required for power source and power dual role. - User can specify the source PDO array via PDO_FIXED/BATT/VAR() defined in - dt-bindings/usb/pd.h. + User can specify the source PDO array via PDO_FIXED/BATT/VAR/PPS_APDO() + defined in dt-bindings/usb/pd.h. - sink-pdos: An array of u32 with each entry providing supported power sink data object(PDO), the detailed bit definitions of PDO can be found in "Universal Serial Bus Power Delivery Specification" chapter 6.4.1.3 Sink Capabilities Message, the order of each entry(PDO) should follow the PD spec chapter 6.4.1. Required for power sink and power dual role. - User can specify the sink PDO array via PDO_FIXED/BATT/VAR() defined in - dt-bindings/usb/pd.h. + User can specify the sink PDO array via PDO_FIXED/BATT/VAR/PPS_APDO() defined + in dt-bindings/usb/pd.h. - op-sink-microwatt: Sink required operating power in microwatt, if source can't offer the power, Capability Mismatch is set. Required for power sink and power dual role. diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt index 78d2db9d4de5..d28fd1af01b4 100644 --- a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt +++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt @@ -24,7 +24,7 @@ Optional properties: Example: -p1_sec_a: crypto@400,d2000000 { +p1_sec_a: crypto@400d2000000 { compatible = "hisilicon,hip07-sec"; reg = <0x400 0xd0000000 0x0 0x10000 0x400 0xd2000000 0x0 0x10000 diff --git a/Documentation/devicetree/bindings/dma/jz4780-dma.txt b/Documentation/devicetree/bindings/dma/jz4780-dma.txt index 03e9cf7b42e0..636fcb26b164 100644 --- a/Documentation/devicetree/bindings/dma/jz4780-dma.txt +++ b/Documentation/devicetree/bindings/dma/jz4780-dma.txt @@ -2,8 +2,13 @@ Required properties: -- compatible: Should be "ingenic,jz4780-dma" -- reg: Should contain the DMA controller registers location and length. +- compatible: Should be one of: + * ingenic,jz4740-dma + * ingenic,jz4725b-dma + * ingenic,jz4770-dma + * ingenic,jz4780-dma +- reg: Should contain the DMA channel registers location and length, followed + by the DMA controller registers location and length. - interrupts: Should contain the interrupt specifier of the DMA controller. - clocks: Should contain a clock specifier for the JZ4780 PDMA clock. - #dma-cells: Must be <2>. Number of integer cells in the dmas property of @@ -19,9 +24,10 @@ Optional properties: Example: -dma: dma@13420000 { +dma: dma-controller@13420000 { compatible = "ingenic,jz4780-dma"; - reg = <0x13420000 0x10000>; + reg = <0x13420000 0x400 + 0x13421000 0x40>; interrupt-parent = <&intc>; interrupts = <10>; diff --git a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt index 946229c48657..a5a7c3f5a1e3 100644 --- a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt +++ b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt @@ -17,6 +17,7 @@ Required Properties: - compatible: "renesas,dmac-<soctype>", "renesas,rcar-dmac" as fallback. Examples with soctypes are: - "renesas,dmac-r8a7743" (RZ/G1M) + - "renesas,dmac-r8a7744" (RZ/G1N) - "renesas,dmac-r8a7745" (RZ/G1E) - "renesas,dmac-r8a77470" (RZ/G1C) - "renesas,dmac-r8a7790" (R-Car H2) diff --git a/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt b/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt index 482e54362d3e..1743017bd948 100644 --- a/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt +++ b/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt @@ -4,6 +4,7 @@ Required Properties: -compatible: "renesas,<soctype>-usb-dmac", "renesas,usb-dmac" as fallback. Examples with soctypes are: - "renesas,r8a7743-usb-dmac" (RZ/G1M) + - "renesas,r8a7744-usb-dmac" (RZ/G1N) - "renesas,r8a7745-usb-dmac" (RZ/G1E) - "renesas,r8a7790-usb-dmac" (R-Car H2) - "renesas,r8a7791-usb-dmac" (R-Car M2-W) diff --git a/Documentation/devicetree/bindings/fpga/fpga-region.txt b/Documentation/devicetree/bindings/fpga/fpga-region.txt index 6db8aeda461a..90c44694a30b 100644 --- a/Documentation/devicetree/bindings/fpga/fpga-region.txt +++ b/Documentation/devicetree/bindings/fpga/fpga-region.txt @@ -415,7 +415,7 @@ DT Overlay contains: firmware-name = "base.rbf"; fpga-bridge@4400 { - compatible = "altr,freeze-bridge"; + compatible = "altr,freeze-bridge-controller"; reg = <0x4400 0x10>; fpga_region1: fpga-region1 { @@ -427,7 +427,7 @@ DT Overlay contains: }; fpga-bridge@4420 { - compatible = "altr,freeze-bridge"; + compatible = "altr,freeze-bridge-controller"; reg = <0x4420 0x10>; fpga_region2: fpga-region2 { diff --git a/Documentation/devicetree/bindings/i2c/i2c.txt b/Documentation/devicetree/bindings/i2c/i2c.txt index 11263982470e..44efafdfd7f5 100644 --- a/Documentation/devicetree/bindings/i2c/i2c.txt +++ b/Documentation/devicetree/bindings/i2c/i2c.txt @@ -84,7 +84,7 @@ Binding may contain optional "interrupts" property, describing interrupts used by the device. I2C core will assign "irq" interrupt (or the very first interrupt if not using interrupt names) as primary interrupt for the slave. -Alternatively, devices supporting SMbus Host Notify, and connected to +Alternatively, devices supporting SMBus Host Notify, and connected to adapters that support this feature, may use "host-notify" property. I2C core will create a virtual interrupt for Host Notify and assign it as primary interrupt for the slave. diff --git a/Documentation/devicetree/bindings/interrupt-controller/marvell,icu.txt b/Documentation/devicetree/bindings/interrupt-controller/marvell,icu.txt index aa8bf2ec8905..1c94a57a661e 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/marvell,icu.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/marvell,icu.txt @@ -5,6 +5,8 @@ The Marvell ICU (Interrupt Consolidation Unit) controller is responsible for collecting all wired-interrupt sources in the CP and communicating them to the GIC in the AP, the unit translates interrupt requests on input wires to MSG memory mapped transactions to the GIC. +These messages will access a different GIC memory area depending on +their type (NSR, SR, SEI, REI, etc). Required properties: @@ -12,20 +14,23 @@ Required properties: - reg: Should contain ICU registers location and length. -- #interrupt-cells: Specifies the number of cells needed to encode an - interrupt source. The value shall be 3. +Subnodes: Each group of interrupt is declared as a subnode of the ICU, +with their own compatible. + +Required properties for the icu_nsr/icu_sei subnodes: - The 1st cell is the group type of the ICU interrupt. Possible group - types are: +- compatible: Should be one of: + * "marvell,cp110-icu-nsr" + * "marvell,cp110-icu-sr" + * "marvell,cp110-icu-sei" + * "marvell,cp110-icu-rei" - ICU_GRP_NSR (0x0) : Shared peripheral interrupt, non-secure - ICU_GRP_SR (0x1) : Shared peripheral interrupt, secure - ICU_GRP_SEI (0x4) : System error interrupt - ICU_GRP_REI (0x5) : RAM error interrupt +- #interrupt-cells: Specifies the number of cells needed to encode an + interrupt source. The value shall be 2. - The 2nd cell is the index of the interrupt in the ICU unit. + The 1st cell is the index of the interrupt in the ICU unit. - The 3rd cell is the type of the interrupt. See arm,gic.txt for + The 2nd cell is the type of the interrupt. See arm,gic.txt for details. - interrupt-controller: Identifies the node as an interrupt @@ -35,17 +40,73 @@ Required properties: that allows to trigger interrupts using MSG memory mapped transactions. +Note: each 'interrupts' property referring to any 'icu_xxx' node shall + have a different number within [0:206]. + Example: icu: interrupt-controller@1e0000 { compatible = "marvell,cp110-icu"; - reg = <0x1e0000 0x10>; + reg = <0x1e0000 0x440>; + + CP110_LABEL(icu_nsr): interrupt-controller@10 { + compatible = "marvell,cp110-icu-nsr"; + reg = <0x10 0x20>; + #interrupt-cells = <2>; + interrupt-controller; + msi-parent = <&gicp>; + }; + + CP110_LABEL(icu_sei): interrupt-controller@50 { + compatible = "marvell,cp110-icu-sei"; + reg = <0x50 0x10>; + #interrupt-cells = <2>; + interrupt-controller; + msi-parent = <&sei>; + }; +}; + +node1 { + interrupt-parent = <&icu_nsr>; + interrupts = <106 IRQ_TYPE_LEVEL_HIGH>; +}; + +node2 { + interrupt-parent = <&icu_sei>; + interrupts = <107 IRQ_TYPE_LEVEL_HIGH>; +}; + +/* Would not work with the above nodes */ +node3 { + interrupt-parent = <&icu_nsr>; + interrupts = <107 IRQ_TYPE_LEVEL_HIGH>; +}; + +The legacy bindings were different in this way: + +- #interrupt-cells: The value was 3. + The 1st cell was the group type of the ICU interrupt. Possible + group types were: + ICU_GRP_NSR (0x0) : Shared peripheral interrupt, non-secure + ICU_GRP_SR (0x1) : Shared peripheral interrupt, secure + ICU_GRP_SEI (0x4) : System error interrupt + ICU_GRP_REI (0x5) : RAM error interrupt + The 2nd cell was the index of the interrupt in the ICU unit. + The 3rd cell was the type of the interrupt. See arm,gic.txt for + details. + +Example: + +icu: interrupt-controller@1e0000 { + compatible = "marvell,cp110-icu"; + reg = <0x1e0000 0x440>; + #interrupt-cells = <3>; interrupt-controller; msi-parent = <&gicp>; }; -usb3h0: usb3@500000 { +node1 { interrupt-parent = <&icu>; interrupts = <ICU_GRP_NSR 106 IRQ_TYPE_LEVEL_HIGH>; }; diff --git a/Documentation/devicetree/bindings/interrupt-controller/marvell,sei.txt b/Documentation/devicetree/bindings/interrupt-controller/marvell,sei.txt new file mode 100644 index 000000000000..0beafed502f5 --- /dev/null +++ b/Documentation/devicetree/bindings/interrupt-controller/marvell,sei.txt @@ -0,0 +1,36 @@ +Marvell SEI (System Error Interrupt) Controller +----------------------------------------------- + +Marvell SEI (System Error Interrupt) controller is an interrupt +aggregator. It receives interrupts from several sources and aggregates +them to a single interrupt line (an SPI) on the parent interrupt +controller. + +This interrupt controller can handle up to 64 SEIs, a set comes from the +AP and is wired while a second set comes from the CPs by the mean of +MSIs. + +Required properties: + +- compatible: should be one of: + * "marvell,ap806-sei" +- reg: SEI registers location and length. +- interrupts: identifies the parent IRQ that will be triggered. +- #interrupt-cells: number of cells to define an SEI wired interrupt + coming from the AP, should be 1. The cell is the IRQ + number. +- interrupt-controller: identifies the node as an interrupt controller + for AP interrupts. +- msi-controller: identifies the node as an MSI controller for the CPs + interrupts. + +Example: + + sei: interrupt-controller@3f0200 { + compatible = "marvell,ap806-sei"; + reg = <0x3f0200 0x40>; + interrupts = <GIC_SPI 0 IRQ_TYPE_LEVEL_HIGH>; + #interrupt-cells = <1>; + interrupt-controller; + msi-controller; + }; diff --git a/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt b/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt index a046ed374d80..8de96a4fb2d5 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt @@ -2,10 +2,12 @@ DT bindings for the R-Mobile/R-Car/RZ/G interrupt controller Required properties: -- compatible: has to be "renesas,irqc-<soctype>", "renesas,irqc" as fallback. +- compatible: must be "renesas,irqc-<soctype>" or "renesas,intc-ex-<soctype>", + and "renesas,irqc" as fallback. Examples with soctypes are: - "renesas,irqc-r8a73a4" (R-Mobile APE6) - "renesas,irqc-r8a7743" (RZ/G1M) + - "renesas,irqc-r8a7744" (RZ/G1N) - "renesas,irqc-r8a7745" (RZ/G1E) - "renesas,irqc-r8a77470" (RZ/G1C) - "renesas,irqc-r8a7790" (R-Car H2) @@ -19,6 +21,7 @@ Required properties: - "renesas,intc-ex-r8a77965" (R-Car M3-N) - "renesas,intc-ex-r8a77970" (R-Car V3M) - "renesas,intc-ex-r8a77980" (R-Car V3H) + - "renesas,intc-ex-r8a77990" (R-Car E3) - "renesas,intc-ex-r8a77995" (R-Car D3) - #interrupt-cells: has to be <2>: an interrupt index and flags, as defined in interrupts.txt in this directory diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt index c6e2d855fe13..377ee639d103 100644 --- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt +++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt @@ -12,6 +12,7 @@ Required Properties: - "renesas,ipmmu-r8a73a4" for the R8A73A4 (R-Mobile APE6) IPMMU. - "renesas,ipmmu-r8a7743" for the R8A7743 (RZ/G1M) IPMMU. + - "renesas,ipmmu-r8a7744" for the R8A7744 (RZ/G1N) IPMMU. - "renesas,ipmmu-r8a7745" for the R8A7745 (RZ/G1E) IPMMU. - "renesas,ipmmu-r8a7790" for the R8A7790 (R-Car H2) IPMMU. - "renesas,ipmmu-r8a7791" for the R8A7791 (R-Car M2-W) IPMMU. diff --git a/Documentation/devicetree/bindings/mfd/arizona.txt b/Documentation/devicetree/bindings/mfd/arizona.txt index 9b62831fdf3e..148ef621a5e5 100644 --- a/Documentation/devicetree/bindings/mfd/arizona.txt +++ b/Documentation/devicetree/bindings/mfd/arizona.txt @@ -76,7 +76,7 @@ Deprecated properties: Also see child specific device properties: Regulator - ../regulator/arizona-regulator.txt Extcon - ../extcon/extcon-arizona.txt - Sound - ../sound/arizona.txt + Sound - ../sound/wlf,arizona.txt Example: diff --git a/Documentation/devicetree/bindings/serial/atmel-usart.txt b/Documentation/devicetree/bindings/mfd/atmel-usart.txt index 7c0d6b2f53e4..7f0cd72f47d2 100644 --- a/Documentation/devicetree/bindings/serial/atmel-usart.txt +++ b/Documentation/devicetree/bindings/mfd/atmel-usart.txt @@ -1,6 +1,6 @@ * Atmel Universal Synchronous Asynchronous Receiver/Transmitter (USART) -Required properties: +Required properties for USART: - compatible: Should be "atmel,<chip>-usart" or "atmel,<chip>-dbgu" The compatible <chip> indicated will be the first SoC to support an additional mode or an USART new feature. @@ -11,7 +11,13 @@ Required properties: Required elements: "usart" - clocks: phandles to input clocks. -Optional properties: +Required properties for USART in SPI mode: +- #size-cells : Must be <0> +- #address-cells : Must be <1> +- cs-gpios: chipselects (internal cs not supported) +- atmel,usart-mode : Must be <AT91_USART_MODE_SPI> (found in dt-bindings/mfd/at91-usart.h) + +Optional properties in serial mode: - atmel,use-dma-rx: use of PDC or DMA for receiving data - atmel,use-dma-tx: use of PDC or DMA for transmitting data - {rts,cts,dtr,dsr,rng,dcd}-gpios: specify a GPIO for RTS/CTS/DTR/DSR/RI/DCD line respectively. @@ -62,3 +68,18 @@ Example: dma-names = "tx", "rx"; atmel,fifo-size = <32>; }; + +- SPI mode: + #include <dt-bindings/mfd/at91-usart.h> + + spi0: spi@f001c000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "atmel,at91rm9200-usart", "atmel,at91sam9260-usart"; + atmel,usart-mode = <AT91_USART_MODE_SPI>; + reg = <0xf001c000 0x100>; + interrupts = <12 IRQ_TYPE_LEVEL_HIGH 5>; + clocks = <&usart0_clk>; + clock-names = "usart"; + cs-gpios = <&pioB 3 0>; + }; diff --git a/Documentation/devicetree/bindings/mips/mscc.txt b/Documentation/devicetree/bindings/mips/mscc.txt index ae15ec333542..bc817e984628 100644 --- a/Documentation/devicetree/bindings/mips/mscc.txt +++ b/Documentation/devicetree/bindings/mips/mscc.txt @@ -41,3 +41,19 @@ Example: compatible = "mscc,ocelot-cpu-syscon", "syscon"; reg = <0x70000000 0x2c>; }; + +o HSIO regs: + +The SoC has a few registers (HSIO) handling miscellaneous functionalities: +configuration and status of PLL5, RCOMP, SyncE, SerDes configurations and +status, SerDes muxing and a thermal sensor. + +Required properties: +- compatible: Should be "mscc,ocelot-hsio", "syscon", "simple-mfd" +- reg : Should contain registers location and length + +Example: + syscon@10d0000 { + compatible = "mscc,ocelot-hsio", "syscon", "simple-mfd"; + reg = <0x10d0000 0x10000>; + }; diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt index 6611a7c2053a..01fdc33a41d0 100644 --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt @@ -9,6 +9,25 @@ blocks that can be used to create functional hardware objects/devices such as network interfaces, crypto accelerator instances, L2 switches, etc. +For an overview of the DPAA2 architecture and fsl-mc bus see: +Documentation/networking/dpaa2/overview.rst + +As described in the above overview, all DPAA2 objects in a DPRC share the +same hardware "isolation context" and a 10-bit value called an ICID +(isolation context id) is expressed by the hardware to identify +the requester. + +The generic 'iommus' property is insufficient to describe the relationship +between ICIDs and IOMMUs, so an iommu-map property is used to define +the set of possible ICIDs under a root DPRC and how they map to +an IOMMU. + +For generic IOMMU bindings, see +Documentation/devicetree/bindings/iommu/iommu.txt. + +For arm-smmu binding, see: +Documentation/devicetree/bindings/iommu/arm,smmu.txt. + Required properties: - compatible @@ -88,14 +107,34 @@ Sub-nodes: Value type: <phandle> Definition: Specifies the phandle to the PHY device node associated with the this dpmac. +Optional properties: + +- iommu-map: Maps an ICID to an IOMMU and associated iommu-specifier + data. + + The property is an arbitrary number of tuples of + (icid-base,iommu,iommu-base,length). + + Any ICID i in the interval [icid-base, icid-base + length) is + associated with the listed IOMMU, with the iommu-specifier + (i - icid-base + iommu-base). Example: + smmu: iommu@5000000 { + compatible = "arm,mmu-500"; + #iommu-cells = <1>; + stream-match-mask = <0x7C00>; + ... + }; + fsl_mc: fsl-mc@80c000000 { compatible = "fsl,qoriq-mc"; reg = <0x00000008 0x0c000000 0 0x40>, /* MC portal base */ <0x00000000 0x08340000 0 0x40000>; /* MC control reg */ msi-parent = <&its>; + /* define map for ICIDs 23-64 */ + iommu-map = <23 &smmu 23 41>; #address-cells = <3>; #size-cells = <1>; diff --git a/Documentation/devicetree/bindings/misc/lwn-bk4.txt b/Documentation/devicetree/bindings/misc/lwn-bk4.txt new file mode 100644 index 000000000000..d6a8c188c087 --- /dev/null +++ b/Documentation/devicetree/bindings/misc/lwn-bk4.txt @@ -0,0 +1,26 @@ +* Liebherr's BK4 controller external SPI + +A device which handles data acquisition from compatible industrial +peripherals. +The SPI is used for data and management purposes in both master and +slave modes. + +Required properties: + +- compatible : Should be "lwn,bk4" + +Required SPI properties: + +- reg : Should be address of the device chip select within + the controller. + +- spi-max-frequency : Maximum SPI clocking speed of device in Hz, should be + 30MHz at most for the Liebherr's BK4 external bus. + +Example: + +spidev0: spi@0 { + compatible = "lwn,bk4"; + spi-max-frequency = <30000000>; + reg = <0>; +}; diff --git a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt index 4648948f7c3b..e15589f47787 100644 --- a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt +++ b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt @@ -19,6 +19,9 @@ Optional properties: - interrupt-names: must be "mdio_done_error" when there is a share interrupt fed to this hardware block, or must be "mdio_done" for the first interrupt and "mdio_error" for the second when there are separate interrupts +- clocks: A reference to the clock supplying the MDIO bus controller +- clock-frequency: the MDIO bus clock that must be output by the MDIO bus + hardware, if absent, the default hardware values are used Child nodes of this MDIO bus controller node are standard Ethernet PHY device nodes as described in Documentation/devicetree/bindings/net/phy.txt diff --git a/Documentation/devicetree/bindings/net/can/rcar_can.txt b/Documentation/devicetree/bindings/net/can/rcar_can.txt index 94a7f33ac5e9..cc4372842bf3 100644 --- a/Documentation/devicetree/bindings/net/can/rcar_can.txt +++ b/Documentation/devicetree/bindings/net/can/rcar_can.txt @@ -3,6 +3,7 @@ Renesas R-Car CAN controller Device Tree Bindings Required properties: - compatible: "renesas,can-r8a7743" if CAN controller is a part of R8A7743 SoC. + "renesas,can-r8a7744" if CAN controller is a part of R8A7744 SoC. "renesas,can-r8a7745" if CAN controller is a part of R8A7745 SoC. "renesas,can-r8a7778" if CAN controller is a part of R8A7778 SoC. "renesas,can-r8a7779" if CAN controller is a part of R8A7779 SoC. diff --git a/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt b/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt new file mode 100644 index 000000000000..886cbe8ffb38 --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt @@ -0,0 +1,143 @@ +Lantiq GSWIP Ethernet switches +================================== + +Required properties for GSWIP core: + +- compatible : "lantiq,xrx200-gswip" for the embedded GSWIP in the + xRX200 SoC +- reg : memory range of the GSWIP core registers + : memory range of the GSWIP MDIO registers + : memory range of the GSWIP MII registers + +See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of +additional required and optional properties. + + +Required properties for MDIO bus: +- compatible : "lantiq,xrx200-mdio" for the MDIO bus inside the GSWIP + core of the xRX200 SoC and the PHYs connected to it. + +See Documentation/devicetree/bindings/net/mdio.txt for a list of additional +required and optional properties. + + +Required properties for GPHY firmware loading: +- compatible : "lantiq,xrx200-gphy-fw", "lantiq,gphy-fw" + "lantiq,xrx300-gphy-fw", "lantiq,gphy-fw" + "lantiq,xrx330-gphy-fw", "lantiq,gphy-fw" + for the loading of the firmware into the embedded + GPHY core of the SoC. +- lantiq,rcu : reference to the rcu syscon + +The GPHY firmware loader has a list of GPHY entries, one for each +embedded GPHY + +- reg : Offset of the GPHY firmware register in the RCU + register range +- resets : list of resets of the embedded GPHY +- reset-names : list of names of the resets + +Example: + +Ethernet switch on the VRX200 SoC: + +switch@e108000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "lantiq,xrx200-gswip"; + reg = < 0xe108000 0x3100 /* switch */ + 0xe10b100 0xd8 /* mdio */ + 0xe10b1d8 0x130 /* mii */ + >; + dsa,member = <0 0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + label = "lan3"; + phy-mode = "rgmii"; + phy-handle = <&phy0>; + }; + + port@1 { + reg = <1>; + label = "lan4"; + phy-mode = "rgmii"; + phy-handle = <&phy1>; + }; + + port@2 { + reg = <2>; + label = "lan2"; + phy-mode = "internal"; + phy-handle = <&phy11>; + }; + + port@4 { + reg = <4>; + label = "lan1"; + phy-mode = "internal"; + phy-handle = <&phy13>; + }; + + port@5 { + reg = <5>; + label = "wan"; + phy-mode = "rgmii"; + phy-handle = <&phy5>; + }; + + port@6 { + reg = <0x6>; + label = "cpu"; + ethernet = <ð0>; + }; + }; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + compatible = "lantiq,xrx200-mdio"; + reg = <0>; + + phy0: ethernet-phy@0 { + reg = <0x0>; + }; + phy1: ethernet-phy@1 { + reg = <0x1>; + }; + phy5: ethernet-phy@5 { + reg = <0x5>; + }; + phy11: ethernet-phy@11 { + reg = <0x11>; + }; + phy13: ethernet-phy@13 { + reg = <0x13>; + }; + }; + + gphy-fw { + compatible = "lantiq,xrx200-gphy-fw", "lantiq,gphy-fw"; + lantiq,rcu = <&rcu0>; + #address-cells = <1>; + #size-cells = <0>; + + gphy@20 { + reg = <0x20>; + + resets = <&reset0 31 30>; + reset-names = "gphy"; + }; + + gphy@68 { + reg = <0x68>; + + resets = <&reset0 29 28>; + reset-names = "gphy"; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt new file mode 100644 index 000000000000..5ff5e68bbbb6 --- /dev/null +++ b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt @@ -0,0 +1,21 @@ +Lantiq xRX200 GSWIP PMAC Ethernet driver +================================== + +Required properties: + +- compatible : "lantiq,xrx200-net" for the PMAC of the embedded + : GSWIP in the xXR200 +- reg : memory range of the PMAC core inside of the GSWIP core +- interrupts : TX and RX DMA interrupts. Use interrupt-names "tx" for + : the TX interrupt and "rx" for the RX interrupt. + +Example: + +ethernet@e10b308 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "lantiq,xrx200-net"; + reg = <0xe10b308 0xcf8>; + interrupts = <73>, <72>; + interrupt-names = "tx", "rx"; +}; diff --git a/Documentation/devicetree/bindings/net/marvell-pp2.txt b/Documentation/devicetree/bindings/net/marvell-pp2.txt index fc019df0d863..b78397669320 100644 --- a/Documentation/devicetree/bindings/net/marvell-pp2.txt +++ b/Documentation/devicetree/bindings/net/marvell-pp2.txt @@ -31,7 +31,7 @@ required. Required properties (port): -- interrupts: interrupt for the port +- interrupts: interrupt(s) for the port - port-id: ID of the port from the MAC point of view - gop-port-id: only for marvell,armada-7k-pp2, ID of the port from the GOP (Group Of Ports) point of view. This ID is used to index the @@ -43,10 +43,12 @@ Optional properties (port): - marvell,loopback: port is loopback mode - phy: a phandle to a phy node defining the PHY address (as the reg property, a single integer). -- interrupt-names: if more than a single interrupt for rx is given, must - be the name associated to the interrupts listed. Valid - names are: "tx-cpu0", "tx-cpu1", "tx-cpu2", "tx-cpu3", - "rx-shared", "link". +- interrupt-names: if more than a single interrupt for is given, must be the + name associated to the interrupts listed. Valid names are: + "hifX", with X in [0..8], and "link". The names "tx-cpu0", + "tx-cpu1", "tx-cpu2", "tx-cpu3" and "rx-shared" are supported + for backward compatibility but shouldn't be used for new + additions. - marvell,system-controller: a phandle to the system controller. Example for marvell,armada-375-pp2: @@ -89,9 +91,14 @@ cpm_ethernet: ethernet@0 { <ICU_GRP_NSR 43 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 47 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 51 IRQ_TYPE_LEVEL_HIGH>, - <ICU_GRP_NSR 55 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "tx-cpu0", "tx-cpu1", "tx-cpu2", - "tx-cpu3", "rx-shared"; + <ICU_GRP_NSR 55 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 59 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 63 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 67 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 71 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 129 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4", + "hif5", "hif6", "hif7", "hif8", "link"; port-id = <0>; gop-port-id = <0>; }; @@ -101,9 +108,14 @@ cpm_ethernet: ethernet@0 { <ICU_GRP_NSR 44 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 48 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 52 IRQ_TYPE_LEVEL_HIGH>, - <ICU_GRP_NSR 56 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "tx-cpu0", "tx-cpu1", "tx-cpu2", - "tx-cpu3", "rx-shared"; + <ICU_GRP_NSR 56 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 60 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 64 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 68 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 72 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 128 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4", + "hif5", "hif6", "hif7", "hif8", "link"; port-id = <1>; gop-port-id = <2>; }; @@ -113,9 +125,14 @@ cpm_ethernet: ethernet@0 { <ICU_GRP_NSR 45 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 49 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 53 IRQ_TYPE_LEVEL_HIGH>, - <ICU_GRP_NSR 57 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "tx-cpu0", "tx-cpu1", "tx-cpu2", - "tx-cpu3", "rx-shared"; + <ICU_GRP_NSR 57 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 61 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 65 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 69 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 73 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 127 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4", + "hif5", "hif6", "hif7", "hif8", "link"; port-id = <2>; gop-port-id = <3>; }; diff --git a/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt b/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt index e22d8cfea687..5100358177c9 100644 --- a/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt +++ b/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt @@ -1,4 +1,4 @@ -Micrel KSZ9021/KSZ9031 Gigabit Ethernet PHY +Micrel KSZ9021/KSZ9031/KSZ9131 Gigabit Ethernet PHY Some boards require special tuning values, particularly when it comes to clock delays. You can specify clock delay values in the PHY OF @@ -64,6 +64,32 @@ KSZ9031: Attention: The link partner must be configurable as slave otherwise no link will be established. +KSZ9131: + + All skew control options are specified in picoseconds. The increment + step is 100ps. Unlike KSZ9031, the values represent picoseccond delays. + A negative value can be assigned as rxc-skew-psec = <(-100)>;. + + Optional properties: + + Range of the value -700 to 2400, default value 0: + + - rxc-skew-psec : Skew control of RX clock pad + - txc-skew-psec : Skew control of TX clock pad + + Range of the value -700 to 800, default value 0: + + - rxdv-skew-psec : Skew control of RX CTL pad + - txen-skew-psec : Skew control of TX CTL pad + - rxd0-skew-psec : Skew control of RX data 0 pad + - rxd1-skew-psec : Skew control of RX data 1 pad + - rxd2-skew-psec : Skew control of RX data 2 pad + - rxd3-skew-psec : Skew control of RX data 3 pad + - txd0-skew-psec : Skew control of TX data 0 pad + - txd1-skew-psec : Skew control of TX data 1 pad + - txd2-skew-psec : Skew control of TX data 2 pad + - txd3-skew-psec : Skew control of TX data 3 pad + Examples: mdio { diff --git a/Documentation/devicetree/bindings/net/mscc-ocelot.txt b/Documentation/devicetree/bindings/net/mscc-ocelot.txt index 0a84711abece..9e5c17d426ce 100644 --- a/Documentation/devicetree/bindings/net/mscc-ocelot.txt +++ b/Documentation/devicetree/bindings/net/mscc-ocelot.txt @@ -12,7 +12,6 @@ Required properties: - "sys" - "rew" - "qs" - - "hsio" - "qsys" - "ana" - "portX" with X from 0 to the number of last port index available on that @@ -45,7 +44,6 @@ Example: reg = <0x1010000 0x10000>, <0x1030000 0x10000>, <0x1080000 0x100>, - <0x10d0000 0x10000>, <0x11e0000 0x100>, <0x11f0000 0x100>, <0x1200000 0x100>, @@ -59,10 +57,9 @@ Example: <0x1280000 0x100>, <0x1800000 0x80000>, <0x1880000 0x10000>; - reg-names = "sys", "rew", "qs", "hsio", "port0", - "port1", "port2", "port3", "port4", "port5", - "port6", "port7", "port8", "port9", "port10", - "qsys", "ana"; + reg-names = "sys", "rew", "qs", "port0", "port1", "port2", + "port3", "port4", "port5", "port6", "port7", + "port8", "port9", "port10", "qsys", "ana"; interrupts = <21 22>; interrupt-names = "xtr", "inj"; diff --git a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt index 0eedabe22cc3..5ff37c68c941 100644 --- a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt +++ b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt @@ -1,10 +1,5 @@ * Microsemi - vsc8531 Giga bit ethernet phy -Required properties: -- compatible : Should contain phy id as "ethernet-phy-idAAAA.BBBB" - The PHY device uses the binding described in - Documentation/devicetree/bindings/net/phy.txt - Optional properties: - vsc8531,vddmac : The vddmac in mV. Allowed values is listed in the first row of Table 1 (below). @@ -27,14 +22,16 @@ Optional properties: 'vddmac'. Default value is 0%. Ref: Table:1 - Edge rate change (below). -- vsc8531,led-0-mode : LED mode. Specify how the LED[0] should behave. - Allowed values are define in - "include/dt-bindings/net/mscc-phy-vsc8531.h". - Default value is VSC8531_LINK_1000_ACTIVITY (1). -- vsc8531,led-1-mode : LED mode. Specify how the LED[1] should behave. - Allowed values are define in +- vsc8531,led-[N]-mode : LED mode. Specify how the LED[N] should behave. + N depends on the number of LEDs supported by a + PHY. + Allowed values are defined in "include/dt-bindings/net/mscc-phy-vsc8531.h". - Default value is VSC8531_LINK_100_ACTIVITY (2). + Default values are VSC8531_LINK_1000_ACTIVITY (1), + VSC8531_LINK_100_ACTIVITY (2), + VSC8531_LINK_ACTIVITY (0) and + VSC8531_DUPLEX_COLLISION (8). + Table: 1 - Edge rate change ----------------------------------------------------------------| diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt index da249b7c406c..3530256a879c 100644 --- a/Documentation/devicetree/bindings/net/renesas,ravb.txt +++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt @@ -6,6 +6,7 @@ interface contains. Required properties: - compatible: Must contain one or more of the following: - "renesas,etheravb-r8a7743" for the R8A7743 SoC. + - "renesas,etheravb-r8a7744" for the R8A7744 SoC. - "renesas,etheravb-r8a7745" for the R8A7745 SoC. - "renesas,etheravb-r8a77470" for the R8A77470 SoC. - "renesas,etheravb-r8a7790" for the R8A7790 SoC. diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt index 7fd4e8ce4149..2196d1ab3c8c 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt @@ -56,6 +56,11 @@ Optional properties: the length can vary between hw versions. - <supply-name>-supply: handle to the regulator device tree node optional "supply-name" is "vdd-0.8-cx-mx". +- memory-region: + Usage: optional + Value type: <phandle> + Definition: reference to the reserved-memory for the msa region + used by the wifi firmware running in Q6. Example (to supply the calibration data alone): @@ -149,4 +154,5 @@ wifi@18000000 { <0 140 0 /* CE10 */ >, <0 141 0 /* CE11 */ >; vdd-0.8-cx-mx-supply = <&pm8998_l5>; + memory-region = <&wifi_msa_mem>; }; diff --git a/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt b/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt index cb33421184a0..f37494d5a7be 100644 --- a/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt +++ b/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt @@ -50,6 +50,7 @@ Additional required properties for imx7d-pcie: - reset-names: Must contain the following entires: - "pciephy" - "apps" + - "turnoff" Example: diff --git a/Documentation/devicetree/bindings/pci/pci-keystone.txt b/Documentation/devicetree/bindings/pci/pci-keystone.txt index 4dd17de549a7..2030ee0dc4f9 100644 --- a/Documentation/devicetree/bindings/pci/pci-keystone.txt +++ b/Documentation/devicetree/bindings/pci/pci-keystone.txt @@ -19,6 +19,9 @@ pcie_msi_intc : Interrupt controller device node for MSI IRQ chip interrupt-cells: should be set to 1 interrupts: GIC interrupt lines connected to PCI MSI interrupt lines +ti,syscon-pcie-id : phandle to the device control module required to set device + id and vendor id. + Example: pcie_msi_intc: msi-interrupt-controller { interrupt-controller; diff --git a/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt b/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt index 9fe7e12a7bf3..b94078f58d8e 100644 --- a/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt +++ b/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt @@ -7,6 +7,7 @@ OHCI and EHCI controllers. Required properties: - compatible: "renesas,pci-r8a7743" for the R8A7743 SoC; + "renesas,pci-r8a7744" for the R8A7744 SoC; "renesas,pci-r8a7745" for the R8A7745 SoC; "renesas,pci-r8a7790" for the R8A7790 SoC; "renesas,pci-r8a7791" for the R8A7791 SoC; diff --git a/Documentation/devicetree/bindings/pci/rcar-pci.txt b/Documentation/devicetree/bindings/pci/rcar-pci.txt index a5f7fc62d10e..976ef7bfff93 100644 --- a/Documentation/devicetree/bindings/pci/rcar-pci.txt +++ b/Documentation/devicetree/bindings/pci/rcar-pci.txt @@ -2,6 +2,7 @@ Required properties: compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC; + "renesas,pcie-r8a7744" for the R8A7744 SoC; "renesas,pcie-r8a7779" for the R8A7779 SoC; "renesas,pcie-r8a7790" for the R8A7790 SoC; "renesas,pcie-r8a7791" for the R8A7791 SoC; @@ -9,6 +10,7 @@ compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC; "renesas,pcie-r8a7795" for the R8A7795 SoC; "renesas,pcie-r8a7796" for the R8A7796 SoC; "renesas,pcie-r8a77980" for the R8A77980 SoC; + "renesas,pcie-r8a77990" for the R8A77990 SoC; "renesas,pcie-rcar-gen2" for a generic R-Car Gen2 or RZ/G1 compatible device. "renesas,pcie-rcar-gen3" for a generic R-Car Gen3 compatible device. diff --git a/Documentation/devicetree/bindings/pci/ti-pci.txt b/Documentation/devicetree/bindings/pci/ti-pci.txt index 7f7af3044016..452fe48c4fdd 100644 --- a/Documentation/devicetree/bindings/pci/ti-pci.txt +++ b/Documentation/devicetree/bindings/pci/ti-pci.txt @@ -26,6 +26,11 @@ HOST MODE ranges, interrupt-map-mask, interrupt-map : as specified in ../designware-pcie.txt + - ti,syscon-unaligned-access: phandle to the syscon DT node. The 1st argument + should contain the register offset within syscon + and the 2nd argument should contain the bit field + for setting the bit to enable unaligned + access. DEVICE MODE =========== diff --git a/Documentation/devicetree/bindings/phy/brcm-sata-phy.txt b/Documentation/devicetree/bindings/phy/brcm-sata-phy.txt index 0aced97d8092..b640845fec67 100644 --- a/Documentation/devicetree/bindings/phy/brcm-sata-phy.txt +++ b/Documentation/devicetree/bindings/phy/brcm-sata-phy.txt @@ -8,6 +8,7 @@ Required properties: "brcm,iproc-nsp-sata-phy" "brcm,phy-sata3" "brcm,iproc-sr-sata-phy" + "brcm,bcm63138-sata-phy" - address-cells: should be 1 - size-cells: should be 0 - reg: register ranges for the PHY PCB interface diff --git a/Documentation/devicetree/bindings/phy/phy-cadence-dp.txt b/Documentation/devicetree/bindings/phy/phy-cadence-dp.txt new file mode 100644 index 000000000000..7f49fd54ebc1 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/phy-cadence-dp.txt @@ -0,0 +1,30 @@ +Cadence MHDP DisplayPort SD0801 PHY binding +=========================================== + +This binding describes the Cadence SD0801 PHY hardware included with +the Cadence MHDP DisplayPort controller. + +------------------------------------------------------------------------------- +Required properties (controller (parent) node): +- compatible : Should be "cdns,dp-phy" +- reg : Defines the following sets of registers in the parent + mhdp device: + - Offset of the DPTX PHY configuration registers + - Offset of the SD0801 PHY configuration registers +- #phy-cells : from the generic PHY bindings, must be 0. + +Optional properties: +- num_lanes : Number of DisplayPort lanes to use (1, 2 or 4) +- max_bit_rate : Maximum DisplayPort link bit rate to use, in Mbps (2160, + 2430, 2700, 3240, 4320, 5400 or 8100) +------------------------------------------------------------------------------- + +Example: + dp_phy: phy@f0fb030a00 { + compatible = "cdns,dp-phy"; + reg = <0xf0 0xfb030a00 0x0 0x00000040>, + <0xf0 0xfb500000 0x0 0x00100000>; + num_lanes = <4>; + max_bit_rate = <8100>; + #phy-cells = <0>; + }; diff --git a/Documentation/devicetree/bindings/phy/phy-ocelot-serdes.txt b/Documentation/devicetree/bindings/phy/phy-ocelot-serdes.txt new file mode 100644 index 000000000000..332219860187 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/phy-ocelot-serdes.txt @@ -0,0 +1,43 @@ +Microsemi Ocelot SerDes muxing driver +------------------------------------- + +On Microsemi Ocelot, there is a handful of registers in HSIO address +space for setting up the SerDes to switch port muxing. + +A SerDes X can be "muxed" to work with switch port Y or Z for example. +One specific SerDes can also be used as a PCIe interface. + +Hence, a SerDes represents an interface, be it an Ethernet or a PCIe one. + +There are two kinds of SerDes: SERDES1G supports 10/100Mbps in +half/full-duplex and 1000Mbps in full-duplex mode while SERDES6G supports +10/100Mbps in half/full-duplex and 1000/2500Mbps in full-duplex mode. + +Also, SERDES6G number (aka "macro") 0 is the only interface supporting +QSGMII. + +This is a child of the HSIO syscon ("mscc,ocelot-hsio", see +Documentation/devicetree/bindings/mips/mscc.txt) on the Microsemi Ocelot. + +Required properties: + +- compatible: should be "mscc,vsc7514-serdes" +- #phy-cells : from the generic phy bindings, must be 2. + The first number defines the input port to use for a given + SerDes macro. The second defines the macro to use. They are + defined in dt-bindings/phy/phy-ocelot-serdes.h + +Example: + + serdes: serdes { + compatible = "mscc,vsc7514-serdes"; + #phy-cells = <2>; + }; + + ethernet { + port1 { + phy-handle = <&phy_foo>; + /* Link SERDES1G_5 to port1 */ + phys = <&serdes 1 SERDES1G_5>; + }; + }; diff --git a/Documentation/devicetree/bindings/phy/phy-rockchip-inno-hdmi.txt b/Documentation/devicetree/bindings/phy/phy-rockchip-inno-hdmi.txt new file mode 100644 index 000000000000..710cccd5ee56 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/phy-rockchip-inno-hdmi.txt @@ -0,0 +1,43 @@ +ROCKCHIP HDMI PHY WITH INNO IP BLOCK + +Required properties: + - compatible : should be one of the listed compatibles: + * "rockchip,rk3228-hdmi-phy", + * "rockchip,rk3328-hdmi-phy"; + - reg : Address and length of the hdmi phy control register set + - clocks : phandle + clock specifier for the phy clocks + - clock-names : string, clock name, must contain "sysclk" for system + control and register configuration, "refoclk" for crystal- + oscillator reference PLL clock input and "refpclk" for pclk- + based refeference PLL clock input. + - #clock-cells: should be 0. + - clock-output-names : shall be the name for the output clock. + - interrupts : phandle + interrupt specified for the hdmiphy interrupt + - #phy-cells : must be 0. See ./phy-bindings.txt for details. + +Optional properties for rk3328-hdmi-phy: + - nvmem-cells = phandle + nvmem specifier for the cpu-version efuse + - nvmem-cell-names : "cpu-version" to read the chip version, required + for adjustment to some frequency settings + +Example: + hdmi_phy: hdmi-phy@12030000 { + compatible = "rockchip,rk3228-hdmi-phy"; + reg = <0x12030000 0x10000>; + #phy-cells = <0>; + clocks = <&cru PCLK_HDMI_PHY>, <&xin24m>, <&cru DCLK_HDMIPHY>; + clock-names = "sysclk", "refoclk", "refpclk"; + #clock-cells = <0>; + clock-output-names = "hdmi_phy"; + status = "disabled"; + }; + +Then the PHY can be used in other nodes such as: + + hdmi: hdmi@200a0000 { + compatible = "rockchip,rk3228-dw-hdmi"; + ... + phys = <&hdmi_phy>; + phy-names = "hdmi"; + ... + }; diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt index 0c7629e88bf3..adf20b2bdf71 100644 --- a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt +++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt @@ -10,16 +10,20 @@ Required properties: "qcom,msm8996-qmp-pcie-phy" for 14nm PCIe phy on msm8996, "qcom,msm8996-qmp-usb3-phy" for 14nm USB3 phy on msm8996, "qcom,sdm845-qmp-usb3-phy" for USB3 QMP V3 phy on sdm845, - "qcom,sdm845-qmp-usb3-uni-phy" for USB3 QMP V3 UNI phy on sdm845. + "qcom,sdm845-qmp-usb3-uni-phy" for USB3 QMP V3 UNI phy on sdm845, + "qcom,sdm845-qmp-ufs-phy" for UFS QMP phy on sdm845. - - reg: - - For "qcom,sdm845-qmp-usb3-phy": - - index 0: address and length of register set for PHY's common serdes - block. - - named register "dp_com" (using reg-names): address and length of the - DP_COM control block. - - For all others: - - offset and length of register set for PHY's common serdes block. +- reg: + - index 0: address and length of register set for PHY's common + serdes block. + - index 1: address and length of the DP_COM control block (for + "qcom,sdm845-qmp-usb3-phy" only). + +- reg-names: + - For "qcom,sdm845-qmp-usb3-phy": + - Should be: "reg-base", "dp_com" + - For all others: + - The reg-names property shouldn't be defined. - #clock-cells: must be 1 - Phy pll outputs a bunch of clocks for Tx, Rx and Pipe @@ -35,6 +39,7 @@ Required properties: "aux" for phy aux clock, "ref" for 19.2 MHz ref clk, "com_aux" for phy common block aux clock, + "ref_aux" for phy reference aux clock, For "qcom,msm8996-qmp-pcie-phy" must contain: "aux", "cfg_ahb", "ref". For "qcom,msm8996-qmp-usb3-phy" must contain: diff --git a/Documentation/devicetree/bindings/phy/rcar-gen2-phy.txt b/Documentation/devicetree/bindings/phy/rcar-gen2-phy.txt index eeb9e1874ea6..4f0879a0ca12 100644 --- a/Documentation/devicetree/bindings/phy/rcar-gen2-phy.txt +++ b/Documentation/devicetree/bindings/phy/rcar-gen2-phy.txt @@ -5,6 +5,7 @@ This file provides information on what the device node for the R-Car generation Required properties: - compatible: "renesas,usb-phy-r8a7743" if the device is a part of R8A7743 SoC. + "renesas,usb-phy-r8a7744" if the device is a part of R8A7744 SoC. "renesas,usb-phy-r8a7745" if the device is a part of R8A7745 SoC. "renesas,usb-phy-r8a7790" if the device is a part of R8A7790 SoC. "renesas,usb-phy-r8a7791" if the device is a part of R8A7791 SoC. diff --git a/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt b/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt index fb4a204da2bf..de7b5393c163 100644 --- a/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt +++ b/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt @@ -1,10 +1,12 @@ * Renesas R-Car generation 3 USB 2.0 PHY This file provides information on what the device node for the R-Car generation -3 USB 2.0 PHY contains. +3 and RZ/G2 USB 2.0 PHY contain. Required properties: -- compatible: "renesas,usb2-phy-r8a7795" if the device is a part of an R8A7795 +- compatible: "renesas,usb2-phy-r8a774a1" if the device is a part of an R8A774A1 + SoC. + "renesas,usb2-phy-r8a7795" if the device is a part of an R8A7795 SoC. "renesas,usb2-phy-r8a7796" if the device is a part of an R8A7796 SoC. @@ -14,7 +16,8 @@ Required properties: R8A77990 SoC. "renesas,usb2-phy-r8a77995" if the device is a part of an R8A77995 SoC. - "renesas,rcar-gen3-usb2-phy" for a generic R-Car Gen3 compatible device. + "renesas,rcar-gen3-usb2-phy" for a generic R-Car Gen3 or RZ/G2 + compatible device. When compatible with the generic version, nodes must list the SoC-specific version corresponding to the platform first @@ -31,6 +34,8 @@ channel as USB OTG: - interrupts: interrupt specifier for the PHY. - vbus-supply: Phandle to a regulator that provides power to the VBUS. This regulator will be managed during the PHY power on/off sequence. +- renesas,no-otg-pins: boolean, specify when a board does not provide proper + otg pins. Example (R-Car H3): diff --git a/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb3.txt b/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb3.txt index 47dd296ecead..9d9826609c2f 100644 --- a/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb3.txt +++ b/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb3.txt @@ -1,20 +1,22 @@ * Renesas R-Car generation 3 USB 3.0 PHY This file provides information on what the device node for the R-Car generation -3 USB 3.0 PHY contains. +3 and RZ/G2 USB 3.0 PHY contain. If you want to enable spread spectrum clock (ssc), you should use USB_EXTAL instead of USB3_CLK. However, if you don't want to these features, you don't need this driver. Required properties: -- compatible: "renesas,r8a7795-usb3-phy" if the device is a part of an R8A7795 +- compatible: "renesas,r8a774a1-usb3-phy" if the device is a part of an R8A774A1 + SoC. + "renesas,r8a7795-usb3-phy" if the device is a part of an R8A7795 SoC. "renesas,r8a7796-usb3-phy" if the device is a part of an R8A7796 SoC. "renesas,r8a77965-usb3-phy" if the device is a part of an R8A77965 SoC. - "renesas,rcar-gen3-usb3-phy" for a generic R-Car Gen3 compatible - device. + "renesas,rcar-gen3-usb3-phy" for a generic R-Car Gen3 or RZ/G2 + compatible device. When compatible with the generic version, nodes must list the SoC-specific version corresponding to the platform first diff --git a/Documentation/devicetree/bindings/phy/uniphier-pcie-phy.txt b/Documentation/devicetree/bindings/phy/uniphier-pcie-phy.txt new file mode 100644 index 000000000000..1889d3b89d68 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/uniphier-pcie-phy.txt @@ -0,0 +1,31 @@ +Socionext UniPhier PCIe PHY bindings + +This describes the devicetree bindings for PHY interface built into +PCIe controller implemented on Socionext UniPhier SoCs. + +Required properties: +- compatible: Should contain one of the following: + "socionext,uniphier-ld20-pcie-phy" - for LD20 PHY + "socionext,uniphier-pxs3-pcie-phy" - for PXs3 PHY +- reg: Specifies offset and length of the register set for the device. +- #phy-cells: Must be zero. +- clocks: A phandle to the clock gate for PCIe glue layer including + this phy. +- resets: A phandle to the reset line for PCIe glue layer including + this phy. + +Optional properties: +- socionext,syscon: A phandle to system control to set configurations + for phy. + +Refer to phy/phy-bindings.txt for the generic PHY binding properties. + +Example: + pcie_phy: phy@66038000 { + compatible = "socionext,uniphier-ld20-pcie-phy"; + reg = <0x66038000 0x4000>; + #phy-cells = <0>; + clocks = <&sys_clk 24>; + resets = <&sys_rst 24>; + socionext,syscon = <&soc_glue>; + }; diff --git a/Documentation/devicetree/bindings/phy/uniphier-usb2-phy.txt b/Documentation/devicetree/bindings/phy/uniphier-usb2-phy.txt new file mode 100644 index 000000000000..b43b28250cc0 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/uniphier-usb2-phy.txt @@ -0,0 +1,45 @@ +Socionext UniPhier USB2 PHY + +This describes the devicetree bindings for PHY interface built into +USB2 controller implemented on Socionext UniPhier SoCs. + +Pro4 SoC has both USB2 and USB3 host controllers, however, this USB3 +controller doesn't include its own High-Speed PHY. This needs to specify +USB2 PHY instead of USB3 HS-PHY. + +Required properties: +- compatible: Should contain one of the following: + "socionext,uniphier-pro4-usb2-phy" - for Pro4 SoC + "socionext,uniphier-ld11-usb2-phy" - for LD11 SoC + +Sub-nodes: +Each PHY should be represented as a sub-node. + +Sub-nodes required properties: +- #phy-cells: Should be 0. +- reg: The number of the PHY. + +Sub-nodes optional properties: +- vbus-supply: A phandle to the regulator for USB VBUS. + +Refer to phy/phy-bindings.txt for the generic PHY binding properties. + +Example: + soc-glue@5f800000 { + ... + usb-phy { + compatible = "socionext,uniphier-ld11-usb2-phy"; + usb_phy0: phy@0 { + reg = <0>; + #phy-cells = <0>; + }; + ... + }; + }; + + usb@5a800100 { + compatible = "socionext,uniphier-ehci", "generic-ehci"; + ... + phy-names = "usb"; + phys = <&usb_phy0>; + }; diff --git a/Documentation/devicetree/bindings/phy/uniphier-usb3-hsphy.txt b/Documentation/devicetree/bindings/phy/uniphier-usb3-hsphy.txt new file mode 100644 index 000000000000..e8d8086a7ae9 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/uniphier-usb3-hsphy.txt @@ -0,0 +1,69 @@ +Socionext UniPhier USB3 High-Speed (HS) PHY + +This describes the devicetree bindings for PHY interfaces built into +USB3 controller implemented on Socionext UniPhier SoCs. +Although the controller includes High-Speed PHY and Super-Speed PHY, +this describes about High-Speed PHY. + +Required properties: +- compatible: Should contain one of the following: + "socionext,uniphier-pro4-usb3-hsphy" - for Pro4 SoC + "socionext,uniphier-pxs2-usb3-hsphy" - for PXs2 SoC + "socionext,uniphier-ld20-usb3-hsphy" - for LD20 SoC + "socionext,uniphier-pxs3-usb3-hsphy" - for PXs3 SoC +- reg: Specifies offset and length of the register set for the device. +- #phy-cells: Should be 0. +- clocks: A list of phandles to the clock gate for USB3 glue layer. + According to the clock-names, appropriate clocks are required. +- clock-names: Should contain the following: + "gio", "link" - for Pro4 SoC + "phy", "phy-ext", "link" - for PXs3 SoC, "phy-ext" is optional. + "phy", "link" - for others +- resets: A list of phandles to the reset control for USB3 glue layer. + According to the reset-names, appropriate resets are required. +- reset-names: Should contain the following: + "gio", "link" - for Pro4 SoC + "phy", "link" - for others + +Optional properties: +- vbus-supply: A phandle to the regulator for USB VBUS. +- nvmem-cells: Phandles to nvmem cell that contains the trimming data. + Available only for HS-PHY implemented on LD20 and PXs3, and + if unspecified, default value is used. +- nvmem-cell-names: Should be the following names, which correspond to + each nvmem-cells. + All of the 3 parameters associated with the following names are + required for each port, if any one is omitted, the trimming data + of the port will not be set at all. + "rterm", "sel_t", "hs_i" - Each cell name for phy parameters + +Refer to phy/phy-bindings.txt for the generic PHY binding properties. + +Example: + + usb-glue@65b00000 { + compatible = "socionext,uniphier-ld20-dwc3-glue", + "simple-mfd"; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0 0x65b00000 0x400>; + + usb_vbus0: regulator { + ... + }; + + usb_hsphy0: hs-phy@200 { + compatible = "socionext,uniphier-ld20-usb3-hsphy"; + reg = <0x200 0x10>; + #phy-cells = <0>; + clock-names = "link", "phy"; + clocks = <&sys_clk 14>, <&sys_clk 16>; + reset-names = "link", "phy"; + resets = <&sys_rst 14>, <&sys_rst 16>; + vbus-supply = <&usb_vbus0>; + nvmem-cell-names = "rterm", "sel_t", "hs_i"; + nvmem-cells = <&usb_rterm0>, <&usb_sel_t0>, + <&usb_hs_i0>; + }; + ... + }; diff --git a/Documentation/devicetree/bindings/phy/uniphier-usb3-ssphy.txt b/Documentation/devicetree/bindings/phy/uniphier-usb3-ssphy.txt new file mode 100644 index 000000000000..490b815445e8 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/uniphier-usb3-ssphy.txt @@ -0,0 +1,57 @@ +Socionext UniPhier USB3 Super-Speed (SS) PHY + +This describes the devicetree bindings for PHY interfaces built into +USB3 controller implemented on Socionext UniPhier SoCs. +Although the controller includes High-Speed PHY and Super-Speed PHY, +this describes about Super-Speed PHY. + +Required properties: +- compatible: Should contain one of the following: + "socionext,uniphier-pro4-usb3-ssphy" - for Pro4 SoC + "socionext,uniphier-pxs2-usb3-ssphy" - for PXs2 SoC + "socionext,uniphier-ld20-usb3-ssphy" - for LD20 SoC + "socionext,uniphier-pxs3-usb3-ssphy" - for PXs3 SoC +- reg: Specifies offset and length of the register set for the device. +- #phy-cells: Should be 0. +- clocks: A list of phandles to the clock gate for USB3 glue layer. + According to the clock-names, appropriate clocks are required. +- clock-names: + "gio", "link" - for Pro4 SoC + "phy", "phy-ext", "link" - for PXs3 SoC, "phy-ext" is optional. + "phy", "link" - for others +- resets: A list of phandles to the reset control for USB3 glue layer. + According to the reset-names, appropriate resets are required. +- reset-names: + "gio", "link" - for Pro4 SoC + "phy", "link" - for others + +Optional properties: +- vbus-supply: A phandle to the regulator for USB VBUS. + +Refer to phy/phy-bindings.txt for the generic PHY binding properties. + +Example: + + usb-glue@65b00000 { + compatible = "socionext,uniphier-ld20-dwc3-glue", + "simple-mfd"; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0 0x65b00000 0x400>; + + usb_vbus0: regulator { + ... + }; + + usb_ssphy0: ss-phy@300 { + compatible = "socionext,uniphier-ld20-usb3-ssphy"; + reg = <0x300 0x10>; + #phy-cells = <0>; + clock-names = "link", "phy"; + clocks = <&sys_clk 14>, <&sys_clk 16>; + reset-names = "link", "phy"; + resets = <&sys_rst 14>, <&sys_rst 16>; + vbus-supply = <&usb_vbus0>; + }; + ... + }; diff --git a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt index 651491bb63b7..5705f575862d 100644 --- a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt +++ b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt @@ -6,7 +6,10 @@ and resin along with the Android reboot-mode. This DT node has pwrkey and resin as sub nodes. Required Properties: --compatible: "qcom,pm8916-pon" +-compatible: Must be one of: + "qcom,pm8916-pon" + "qcom,pms405-pon" + -reg: Specifies the physical address of the pon register Optional subnode: diff --git a/Documentation/devicetree/bindings/power/supply/bq25890.txt b/Documentation/devicetree/bindings/power/supply/bq25890.txt index c9dd17d142ad..dc0568933359 100644 --- a/Documentation/devicetree/bindings/power/supply/bq25890.txt +++ b/Documentation/devicetree/bindings/power/supply/bq25890.txt @@ -1,5 +1,8 @@ Binding for TI bq25890 Li-Ion Charger +This driver will support the bq25896 and the bq25890. There are other ICs +in the same family but those have not been tested. + Required properties: - compatible: Should contain one of the following: * "ti,bq25890" diff --git a/Documentation/devicetree/bindings/power/supply/bq27xxx.txt b/Documentation/devicetree/bindings/power/supply/bq27xxx.txt index 37994fdb18ca..4fa8e08df2b6 100644 --- a/Documentation/devicetree/bindings/power/supply/bq27xxx.txt +++ b/Documentation/devicetree/bindings/power/supply/bq27xxx.txt @@ -23,6 +23,7 @@ Required properties: * "ti,bq27546" - BQ27546 * "ti,bq27742" - BQ27742 * "ti,bq27545" - BQ27545 + * "ti,bq27411" - BQ27411 * "ti,bq27421" - BQ27421 * "ti,bq27425" - BQ27425 * "ti,bq27426" - BQ27426 diff --git a/Documentation/devicetree/bindings/power/supply/sc2731_charger.txt b/Documentation/devicetree/bindings/power/supply/sc2731_charger.txt new file mode 100644 index 000000000000..5266fab16575 --- /dev/null +++ b/Documentation/devicetree/bindings/power/supply/sc2731_charger.txt @@ -0,0 +1,40 @@ +Spreadtrum SC2731 PMIC battery charger binding + +Required properties: + - compatible: Should be "sprd,sc2731-charger". + - reg: Address offset of charger register. + - phys: Contains a phandle to the USB phy. + +Optional Properties: +- monitored-battery: phandle of battery characteristics devicetree node. + The charger uses the following battery properties: +- charge-term-current-microamp: current for charge termination phase. +- constant-charge-voltage-max-microvolt: maximum constant input voltage. + See Documentation/devicetree/bindings/power/supply/battery.txt + +Example: + + bat: battery { + compatible = "simple-battery"; + charge-term-current-microamp = <120000>; + constant-charge-voltage-max-microvolt = <4350000>; + ...... + }; + + sc2731_pmic: pmic@0 { + compatible = "sprd,sc2731"; + reg = <0>; + spi-max-frequency = <26000000>; + interrupts = <GIC_SPI 31 IRQ_TYPE_LEVEL_HIGH>; + interrupt-controller; + #interrupt-cells = <2>; + #address-cells = <1>; + #size-cells = <0>; + + charger@0 { + compatible = "sprd,sc2731-charger"; + reg = <0x0>; + phys = <&ssphy>; + monitored-battery = <&bat>; + }; + }; diff --git a/Documentation/devicetree/bindings/reset/fsl,imx7-src.txt b/Documentation/devicetree/bindings/reset/fsl,imx7-src.txt index 5e1afc3d8480..1ab1d109318e 100644 --- a/Documentation/devicetree/bindings/reset/fsl,imx7-src.txt +++ b/Documentation/devicetree/bindings/reset/fsl,imx7-src.txt @@ -5,7 +5,7 @@ Please also refer to reset.txt in this directory for common reset controller binding usage. Required properties: -- compatible: Should be "fsl,imx7-src", "syscon" +- compatible: Should be "fsl,imx7d-src", "syscon" - reg: should be register base and length as documented in the datasheet - interrupts: Should contain SRC interrupt diff --git a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt index 03c741602c6d..6d2dd8a31482 100644 --- a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt +++ b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt @@ -98,6 +98,12 @@ The property below is dependent on fsl,tdm-interface: usage: optional for tdm interface value type: <empty> Definition : Internal loopback connecting on TDM layer. +- fsl,hmask + usage: optional + Value type: <u16> + Definition: HDLC address recognition. Set to zero to disable + address filtering of packets: + fsl,hmask = /bits/ 16 <0x0000>; Example for tdm interface: diff --git a/Documentation/devicetree/bindings/sound/adi,adau1977.txt b/Documentation/devicetree/bindings/sound/adi,adau1977.txt new file mode 100644 index 000000000000..e79aeef73f28 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/adi,adau1977.txt @@ -0,0 +1,54 @@ +Analog Devices ADAU1977/ADAU1978/ADAU1979 + +Datasheets: +http://www.analog.com/media/en/technical-documentation/data-sheets/ADAU1977.pdf +http://www.analog.com/media/en/technical-documentation/data-sheets/ADAU1978.pdf +http://www.analog.com/media/en/technical-documentation/data-sheets/ADAU1979.pdf + +This driver supports both the I2C and SPI bus. + +Required properties: + - compatible: Should contain one of the following: + "adi,adau1977" + "adi,adau1978" + "adi,adau1979" + + - AVDD-supply: analog power supply for the device, please consult + Documentation/devicetree/bindings/regulator/regulator.txt + +Optional properties: + - reset-gpio: the reset pin for the chip, for more details consult + Documentation/devicetree/bindings/gpio/gpio.txt + + - DVDD-supply: supply voltage for the digital core, please consult + Documentation/devicetree/bindings/regulator/regulator.txt + +For required properties on SPI, please consult +Documentation/devicetree/bindings/spi/spi-bus.txt + +Required properties on I2C: + + - reg: The i2c address. Value depends on the state of ADDR0 + and ADDR1, as wired in hardware. + +Examples: + + adau1977_spi: adau1977@0 { + compatible = "adi,adau1977"; + spi-max-frequency = <600000>; + + AVDD-supply = <®ulator>; + DVDD-supply = <®ulator_digital>; + + reset_gpio = <&gpio 10 GPIO_ACTIVE_LOW>; + }; + + adau1977_i2c: adau1977@11 { + compatible = "adi,adau1977"; + reg = <0x11>; + + AVDD-supply = <®ulator>; + DVDD-supply = <®ulator_digital>; + + reset_gpio = <&gpio 10 GPIO_ACTIVE_LOW>; + }; diff --git a/Documentation/devicetree/bindings/sound/amlogic,axg-pdm.txt b/Documentation/devicetree/bindings/sound/amlogic,axg-pdm.txt new file mode 100644 index 000000000000..5672d0bc5b16 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/amlogic,axg-pdm.txt @@ -0,0 +1,24 @@ +* Amlogic Audio PDM input + +Required properties: +- compatible: 'amlogic,axg-pdm' +- reg: physical base address of the controller and length of memory + mapped region. +- clocks: list of clock phandle, one for each entry clock-names. +- clock-names: should contain the following: + * "pclk" : peripheral clock. + * "dclk" : pdm digital clock + * "sysclk" : dsp system clock +- #sound-dai-cells: must be 0. + +Example of PDM on the A113 SoC: + +pdm: audio-controller@ff632000 { + compatible = "amlogic,axg-pdm"; + reg = <0x0 0xff632000 0x0 0x34>; + #sound-dai-cells = <0>; + clocks = <&clkc_audio AUD_CLKID_PDM>, + <&clkc_audio AUD_CLKID_PDM_DCLK>, + <&clkc_audio AUD_CLKID_PDM_SYSCLK>; + clock-names = "pclk", "dclk", "sysclk"; +}; diff --git a/Documentation/devicetree/bindings/sound/cs42l51.txt b/Documentation/devicetree/bindings/sound/cs42l51.txt new file mode 100644 index 000000000000..4b5de33ce377 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/cs42l51.txt @@ -0,0 +1,17 @@ +CS42L51 audio CODEC + +Optional properties: + + - clocks : a list of phandles + clock-specifiers, one for each entry in + clock-names + + - clock-names : must contain "MCLK" + +Example: + +cs42l51: cs42l51@4a { + compatible = "cirrus,cs42l51"; + reg = <0x4a>; + clocks = <&mclk_prov>; + clock-names = "MCLK"; +}; diff --git a/Documentation/devicetree/bindings/sound/maxim,max98088.txt b/Documentation/devicetree/bindings/sound/maxim,max98088.txt new file mode 100644 index 000000000000..da764d913319 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/maxim,max98088.txt @@ -0,0 +1,23 @@ +MAX98088 audio CODEC + +This device supports I2C only. + +Required properties: + +- compatible: "maxim,max98088" or "maxim,max98089". +- reg: The I2C address of the device. + +Optional properties: + +- clocks: the clock provider of MCLK, see ../clock/clock-bindings.txt section + "consumer" for more information. +- clock-names: must be set to "mclk" + +Example: + +max98089: codec@10 { + compatible = "maxim,max98089"; + reg = <0x10>; + clocks = <&clks IMX6QDL_CLK_CKO2>; + clock-names = "mclk"; +}; diff --git a/Documentation/devicetree/bindings/sound/mikroe,mikroe-proto.txt b/Documentation/devicetree/bindings/sound/mikroe,mikroe-proto.txt new file mode 100644 index 000000000000..912f8fae11c5 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/mikroe,mikroe-proto.txt @@ -0,0 +1,23 @@ +Mikroe-PROTO audio board + +Required properties: + - compatible: "mikroe,mikroe-proto" + - dai-format: Must be "i2s". + - i2s-controller: The phandle of the I2S controller. + - audio-codec: The phandle of the WM8731 audio codec. +Optional properties: + - model: The user-visible name of this sound complex. + - bitclock-master: Indicates dai-link bit clock master; for details see simple-card.txt (1). + - frame-master: Indicates dai-link frame master; for details see simple-card.txt (1). + +(1) : There must be the same master for both bit and frame clocks. + +Example: + sound { + compatible = "mikroe,mikroe-proto"; + model = "wm8731 @ sama5d2_xplained"; + i2s-controller = <&i2s0>; + audio-codec = <&wm8731>; + dai-format = "i2s"; + }; +}; diff --git a/Documentation/devicetree/bindings/sound/nau8822.txt b/Documentation/devicetree/bindings/sound/nau8822.txt new file mode 100644 index 000000000000..a471d162d4e5 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/nau8822.txt @@ -0,0 +1,16 @@ +NAU8822 audio CODEC + +This device supports I2C only. + +Required properties: + + - compatible : "nuvoton,nau8822" + + - reg : the I2C address of the device. + +Example: + +codec: nau8822@1a { + compatible = "nuvoton,nau8822"; + reg = <0x1a>; +}; diff --git a/Documentation/devicetree/bindings/sound/pcm3060.txt b/Documentation/devicetree/bindings/sound/pcm3060.txt new file mode 100644 index 000000000000..90fcb8523099 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/pcm3060.txt @@ -0,0 +1,17 @@ +PCM3060 audio CODEC + +This driver supports both I2C and SPI. + +Required properties: + +- compatible: "ti,pcm3060" + +- reg : the I2C address of the device for I2C, the chip select + number for SPI. + +Examples: + + pcm3060: pcm3060@46 { + compatible = "ti,pcm3060"; + reg = <0x46>; + }; diff --git a/Documentation/devicetree/bindings/sound/qcom,q6afe.txt b/Documentation/devicetree/bindings/sound/qcom,q6afe.txt index a8179409c194..d74888b9f1bb 100644 --- a/Documentation/devicetree/bindings/sound/qcom,q6afe.txt +++ b/Documentation/devicetree/bindings/sound/qcom,q6afe.txt @@ -49,7 +49,7 @@ configuration of each dai. Must contain the following properties. Usage: required for mi2s interface Value type: <prop-encoded-array> Definition: Must be list of serial data lines used by this dai. - should be one or more of the 1-4 sd lines. + should be one or more of the 0-3 sd lines. - qcom,tdm-sync-mode: Usage: required for tdm interface @@ -137,42 +137,42 @@ q6afe@4 { prim-mi2s-rx@16 { reg = <16>; - qcom,sd-lines = <1 3>; + qcom,sd-lines = <0 2>; }; prim-mi2s-tx@17 { reg = <17>; - qcom,sd-lines = <2>; + qcom,sd-lines = <1>; }; sec-mi2s-rx@18 { reg = <18>; - qcom,sd-lines = <1 4>; + qcom,sd-lines = <0 3>; }; sec-mi2s-tx@19 { reg = <19>; - qcom,sd-lines = <2>; + qcom,sd-lines = <1>; }; tert-mi2s-rx@20 { reg = <20>; - qcom,sd-lines = <2 4>; + qcom,sd-lines = <1 3>; }; tert-mi2s-tx@21 { reg = <21>; - qcom,sd-lines = <1>; + qcom,sd-lines = <0>; }; quat-mi2s-rx@22 { reg = <22>; - qcom,sd-lines = <1>; + qcom,sd-lines = <0>; }; quat-mi2s-tx@23 { reg = <23>; - qcom,sd-lines = <2>; + qcom,sd-lines = <1>; }; }; }; diff --git a/Documentation/devicetree/bindings/sound/renesas,rsnd.txt b/Documentation/devicetree/bindings/sound/renesas,rsnd.txt index 9e764270c36b..d92b705e7917 100644 --- a/Documentation/devicetree/bindings/sound/renesas,rsnd.txt +++ b/Documentation/devicetree/bindings/sound/renesas,rsnd.txt @@ -340,10 +340,12 @@ Required properties: - compatible : "renesas,rcar_sound-<soctype>", fallbacks "renesas,rcar_sound-gen1" if generation1, and "renesas,rcar_sound-gen2" if generation2 (or RZ/G1) - "renesas,rcar_sound-gen3" if generation3 + "renesas,rcar_sound-gen3" if generation3 (or RZ/G2) Examples with soctypes are: - "renesas,rcar_sound-r8a7743" (RZ/G1M) + - "renesas,rcar_sound-r8a7744" (RZ/G1N) - "renesas,rcar_sound-r8a7745" (RZ/G1E) + - "renesas,rcar_sound-r8a774a1" (RZ/G2M) - "renesas,rcar_sound-r8a7778" (R-Car M1A) - "renesas,rcar_sound-r8a7779" (R-Car H1) - "renesas,rcar_sound-r8a7790" (R-Car H2) @@ -353,6 +355,7 @@ Required properties: - "renesas,rcar_sound-r8a7795" (R-Car H3) - "renesas,rcar_sound-r8a7796" (R-Car M3-W) - "renesas,rcar_sound-r8a77965" (R-Car M3-N) + - "renesas,rcar_sound-r8a77990" (R-Car E3) - reg : Should contain the register physical address. required register is SRU/ADG/SSI if generation1 diff --git a/Documentation/devicetree/bindings/sound/st,sta32x.txt b/Documentation/devicetree/bindings/sound/st,sta32x.txt index 255de3ae5b2f..52265fb757c5 100644 --- a/Documentation/devicetree/bindings/sound/st,sta32x.txt +++ b/Documentation/devicetree/bindings/sound/st,sta32x.txt @@ -19,6 +19,10 @@ Required properties: Optional properties: + - clocks, clock-names: Clock specifier for XTI input clock. + If specified, the clock will be enabled when the codec is probed, + and disabled when it is removed. The 'clock-names' must be set to 'xti'. + - st,output-conf: number, Selects the output configuration: 0: 2-channel (full-bridge) power, 2-channel data-out 1: 2 (half-bridge). 1 (full-bridge) on-board power @@ -39,6 +43,9 @@ Optional properties: - st,thermal-warning-recover: If present, thermal warning recovery is enabled. + - st,fault-detect-recovery: + If present, fault detect recovery is enabled. + - st,thermal-warning-adjustment: If present, thermal warning adjustment is enabled. @@ -76,6 +83,8 @@ Example: codec: sta32x@38 { compatible = "st,sta32x"; reg = <0x1c>; + clocks = <&clock>; + clock-names = "xti"; reset-gpios = <&gpio1 19 0>; power-down-gpios = <&gpio1 16 0>; st,output-conf = /bits/ 8 <0x3>; // set output to 2-channel diff --git a/Documentation/devicetree/bindings/sound/st,stm32-sai.txt b/Documentation/devicetree/bindings/sound/st,stm32-sai.txt index 3a3fc506e43a..3f4467ff0aa2 100644 --- a/Documentation/devicetree/bindings/sound/st,stm32-sai.txt +++ b/Documentation/devicetree/bindings/sound/st,stm32-sai.txt @@ -31,7 +31,11 @@ SAI subnodes required properties: - reg: Base address and size of SAI sub-block register set. - clocks: Must contain one phandle and clock specifier pair for sai_ck which feeds the internal clock generator. + If the SAI shares a master clock, with another SAI set as MCLK + clock provider, SAI provider phandle must be specified here. - clock-names: Must contain "sai_ck". + Must also contain "MCLK", if SAI shares a master clock, + with a SAI set as MCLK clock provider. - dmas: see Documentation/devicetree/bindings/dma/stm32-dma.txt - dma-names: identifier string for each DMA request line "tx": if sai sub-block is configured as playback DAI @@ -51,6 +55,9 @@ SAI subnodes Optional properties: configured according to protocol defined in related DAI link node, such as i2s, left justified, right justified, dsp and pdm protocols. Note: ac97 protocol is not supported by SAI driver + - #clock-cells: should be 0. This property must be present if the SAI device + is a master clock provider, according to clocks bindings, described in + Documentation/devicetree/bindings/clock/clock-bindings.txt. The device node should contain one 'port' child node with one child 'endpoint' node, according to the bindings defined in Documentation/devicetree/bindings/ diff --git a/Documentation/devicetree/bindings/sound/sun4i-i2s.txt b/Documentation/devicetree/bindings/sound/sun4i-i2s.txt index b9d50d6cdef3..61e71c1729e0 100644 --- a/Documentation/devicetree/bindings/sound/sun4i-i2s.txt +++ b/Documentation/devicetree/bindings/sound/sun4i-i2s.txt @@ -10,6 +10,7 @@ Required properties: - "allwinner,sun6i-a31-i2s" - "allwinner,sun8i-a83t-i2s" - "allwinner,sun8i-h3-i2s" + - "allwinner,sun50i-a64-codec-i2s" - reg: physical base address of the controller and length of memory mapped region. - interrupts: should contain the I2S interrupt. @@ -26,6 +27,7 @@ Required properties for the following compatibles: - "allwinner,sun6i-a31-i2s" - "allwinner,sun8i-a83t-i2s" - "allwinner,sun8i-h3-i2s" + - "allwinner,sun50i-a64-codec-i2s" - resets: phandle to the reset line for this codec Example: diff --git a/Documentation/devicetree/bindings/sound/sun50i-codec-analog.txt b/Documentation/devicetree/bindings/sound/sun50i-codec-analog.txt new file mode 100644 index 000000000000..4f8ad0e04d20 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/sun50i-codec-analog.txt @@ -0,0 +1,12 @@ +* Allwinner A64 Codec Analog Controls + +Required properties: +- compatible: must be one of the following compatibles: + - "allwinner,sun50i-a64-codec-analog" +- reg: must contain the registers location and length + +Example: + codec_analog: codec-analog@1f015c0 { + compatible = "allwinner,sun50i-a64-codec-analog"; + reg = <0x01f015c0 0x4>; + }; diff --git a/Documentation/devicetree/bindings/sound/ts3a227e.txt b/Documentation/devicetree/bindings/sound/ts3a227e.txt index 3ed8359144d3..21ab45bc7e8f 100644 --- a/Documentation/devicetree/bindings/sound/ts3a227e.txt +++ b/Documentation/devicetree/bindings/sound/ts3a227e.txt @@ -14,7 +14,7 @@ Required properties: Optional properies: - ti,micbias: Intended MICBIAS voltage (datasheet section 9.6.7). - Select 0/1/2/3/4/5/6/7 to specify MACBIAS voltage + Select 0/1/2/3/4/5/6/7 to specify MICBIAS voltage 2.1V/2.2V/2.3V/2.4V/2.5V/2.6V/2.7V/2.8V Default value is "1" (2.2V). diff --git a/Documentation/devicetree/bindings/sound/wm8782.txt b/Documentation/devicetree/bindings/sound/wm8782.txt new file mode 100644 index 000000000000..256cdec6ec4d --- /dev/null +++ b/Documentation/devicetree/bindings/sound/wm8782.txt @@ -0,0 +1,17 @@ +WM8782 stereo ADC + +This device does not have any control interface or reset pins. + +Required properties: + + - compatible : "wlf,wm8782" + - Vdda-supply : phandle to a regulator for the analog power supply (2.7V - 5.5V) + - Vdd-supply : phandle to a regulator for the digital power supply (2.7V - 3.6V) + +Example: + +wm8782: stereo-adc { + compatible = "wlf,wm8782"; + Vdda-supply = <&vdda_supply>; + Vdd-supply = <&vdd_supply>; +}; diff --git a/Documentation/devicetree/bindings/thermal/qcom-spmi-temp-alarm.txt b/Documentation/devicetree/bindings/thermal/qcom-spmi-temp-alarm.txt index 290ec06fa33a..0273a92a2a84 100644 --- a/Documentation/devicetree/bindings/thermal/qcom-spmi-temp-alarm.txt +++ b/Documentation/devicetree/bindings/thermal/qcom-spmi-temp-alarm.txt @@ -6,8 +6,7 @@ interrupt signal and status register to identify high PMIC die temperature. Required properties: - compatible: Should contain "qcom,spmi-temp-alarm". -- reg: Specifies the SPMI address and length of the controller's - registers. +- reg: Specifies the SPMI address. - interrupts: PMIC temperature alarm interrupt. - #thermal-sensor-cells: Should be 0. See thermal.txt for a description. @@ -20,7 +19,7 @@ Example: pm8941_temp: thermal-alarm@2400 { compatible = "qcom,spmi-temp-alarm"; - reg = <0x2400 0x100>; + reg = <0x2400>; interrupts = <0 0x24 0 IRQ_TYPE_EDGE_RISING>; #thermal-sensor-cells = <0>; @@ -36,19 +35,14 @@ Example: thermal-sensors = <&pm8941_temp>; trips { - passive { - temperature = <1050000>; + stage1 { + temperature = <105000>; hysteresis = <2000>; type = "passive"; }; - alert { + stage2 { temperature = <125000>; hysteresis = <2000>; - type = "hot"; - }; - crit { - temperature = <145000>; - hysteresis = <2000>; type = "critical"; }; }; diff --git a/Documentation/devicetree/bindings/thermal/qoriq-thermal.txt b/Documentation/devicetree/bindings/thermal/qoriq-thermal.txt index 20ca4ef9d776..04cbb90a5d3e 100644 --- a/Documentation/devicetree/bindings/thermal/qoriq-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/qoriq-thermal.txt @@ -1,9 +1,9 @@ * Thermal Monitoring Unit (TMU) on Freescale QorIQ SoCs Required properties: -- compatible : Must include "fsl,qoriq-tmu". The version of the device is - determined by the TMU IP Block Revision Register (IPBRR0) at - offset 0x0BF8. +- compatible : Must include "fsl,qoriq-tmu" or "fsl,imx8mq-tmu". The + version of the device is determined by the TMU IP Block Revision + Register (IPBRR0) at offset 0x0BF8. Table of correspondences between IPBRR0 values and example chips: Value Device ---------- ----- diff --git a/Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.txt b/Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.txt index cfa154bb0fa7..ad9a435afef4 100644 --- a/Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.txt @@ -7,9 +7,11 @@ inside the LSI. Required properties: - compatible : "renesas,<soctype>-thermal", Examples with soctypes are: + - "renesas,r8a774a1-thermal" (RZ/G2M) - "renesas,r8a7795-thermal" (R-Car H3) - "renesas,r8a7796-thermal" (R-Car M3-W) - "renesas,r8a77965-thermal" (R-Car M3-N) + - "renesas,r8a77980-thermal" (R-Car V3H) - reg : Address ranges of the thermal registers. Each sensor needs one address range. Sorting must be done in increasing order according to datasheet, i.e. @@ -19,7 +21,8 @@ Required properties: Optional properties: -- interrupts : interrupts routed to the TSC (3 for H3, M3-W and M3-N) +- interrupts : interrupts routed to the TSC (3 for H3, M3-W, M3-N, + and V3H) - power-domain : Must contain a reference to the power domain. This property is mandatory if the thermal sensor instance is part of a controllable power domain. diff --git a/Documentation/devicetree/bindings/thermal/rcar-thermal.txt b/Documentation/devicetree/bindings/thermal/rcar-thermal.txt index 67c563f1b4c4..73e1613d2cb0 100644 --- a/Documentation/devicetree/bindings/thermal/rcar-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/rcar-thermal.txt @@ -4,15 +4,17 @@ Required properties: - compatible : "renesas,thermal-<soctype>", "renesas,rcar-gen2-thermal" (with thermal-zone) or "renesas,rcar-thermal" (without thermal-zone) as - fallback except R-Car D3. + fallback except R-Car V3M/D3. Examples with soctypes are: - "renesas,thermal-r8a73a4" (R-Mobile APE6) - "renesas,thermal-r8a7743" (RZ/G1M) + - "renesas,thermal-r8a7744" (RZ/G1N) - "renesas,thermal-r8a7779" (R-Car H1) - "renesas,thermal-r8a7790" (R-Car H2) - "renesas,thermal-r8a7791" (R-Car M2-W) - "renesas,thermal-r8a7792" (R-Car V2H) - "renesas,thermal-r8a7793" (R-Car M2-N) + - "renesas,thermal-r8a77970" (R-Car V3M) - "renesas,thermal-r8a77995" (R-Car D3) - reg : Address range of the thermal registers. The 1st reg will be recognized as common register @@ -21,7 +23,7 @@ Required properties: Option properties: - interrupts : If present should contain 3 interrupts for - R-Car D3 or 1 interrupt otherwise. + R-Car V3M/D3 or 1 interrupt otherwise. Example (non interrupt support): diff --git a/Documentation/devicetree/bindings/thermal/stm32-thermal.txt b/Documentation/devicetree/bindings/thermal/stm32-thermal.txt new file mode 100644 index 000000000000..8c0d5a4d8031 --- /dev/null +++ b/Documentation/devicetree/bindings/thermal/stm32-thermal.txt @@ -0,0 +1,61 @@ +Binding for Thermal Sensor for STMicroelectronics STM32 series of SoCs. + +On STM32 SoCs, the Digital Temperature Sensor (DTS) is in charge of managing an +analog block which delivers a frequency depending on the internal SoC's +temperature. By using a reference frequency, DTS is able to provide a sample +number which can be translated into a temperature by the user. + +DTS provides interrupt notification mechanism by threshold. This mechanism +offers two temperature trip points: passive and critical. The first is intended +for passive cooling notification while the second is used for over-temperature +reset. + +Required parameters: +------------------- + +compatible: Should be "st,stm32-thermal" +reg: This should be the physical base address and length of the + sensor's registers. +clocks: Phandle of the clock used by the thermal sensor. + See: Documentation/devicetree/bindings/clock/clock-bindings.txt +clock-names: Should be "pclk" for register access clock and reference clock. + See: Documentation/devicetree/bindings/resource-names.txt +#thermal-sensor-cells: Should be 0. See ./thermal.txt for a description. +interrupts: Standard way to define interrupt number. + +Example: + + thermal-zones { + cpu_thermal: cpu-thermal { + polling-delay-passive = <0>; + polling-delay = <0>; + + thermal-sensors = <&thermal>; + + trips { + cpu_alert1: cpu-alert1 { + temperature = <85000>; + hysteresis = <0>; + type = "passive"; + }; + + cpu-crit: cpu-crit { + temperature = <120000>; + hysteresis = <0>; + type = "critical"; + }; + }; + + cooling-maps { + }; + }; + }; + + thermal: thermal@50028000 { + compatible = "st,stm32-thermal"; + reg = <0x50028000 0x100>; + clocks = <&rcc TMPSENS>; + clock-names = "pclk"; + #thermal-sensor-cells = <0>; + interrupts = <GIC_SPI 147 IRQ_TYPE_LEVEL_HIGH>; + }; diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt index eb7ee91556a5..ca14ba959e0d 100644 --- a/Documentation/devicetree/bindings/thermal/thermal.txt +++ b/Documentation/devicetree/bindings/thermal/thermal.txt @@ -152,7 +152,7 @@ Optional property: Elem size: one cell the sensors listed in the thermal-sensors property. Elem type: signed Coefficients defaults to 1, in case this property is not specified. A simple linear polynomial is used: - Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1) + cn. + Z = c0 * x0 + c1 * x1 + ... + c(n-1) * x(n-1) + cn. The coefficients are ordered and they match with sensors by means of sensor ID. Additional coefficients are diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt index b40add2d9bb4..33992679a8bd 100644 --- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt +++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt @@ -24,6 +24,8 @@ Required Properties: - "renesas,r8a73a4-cmt1" for the 48-bit CMT1 device included in r8a73a4. - "renesas,r8a7743-cmt0" for the 32-bit CMT0 device included in r8a7743. - "renesas,r8a7743-cmt1" for the 48-bit CMT1 device included in r8a7743. + - "renesas,r8a7744-cmt0" for the 32-bit CMT0 device included in r8a7744. + - "renesas,r8a7744-cmt1" for the 48-bit CMT1 device included in r8a7744. - "renesas,r8a7745-cmt0" for the 32-bit CMT0 device included in r8a7745. - "renesas,r8a7745-cmt1" for the 48-bit CMT1 device included in r8a7745. - "renesas,r8a7790-cmt0" for the 32-bit CMT0 device included in r8a7790. @@ -34,6 +36,10 @@ Required Properties: - "renesas,r8a7793-cmt1" for the 48-bit CMT1 device included in r8a7793. - "renesas,r8a7794-cmt0" for the 32-bit CMT0 device included in r8a7794. - "renesas,r8a7794-cmt1" for the 48-bit CMT1 device included in r8a7794. + - "renesas,r8a77970-cmt0" for the 32-bit CMT0 device included in r8a77970. + - "renesas,r8a77970-cmt1" for the 48-bit CMT1 device included in r8a77970. + - "renesas,r8a77980-cmt0" for the 32-bit CMT0 device included in r8a77980. + - "renesas,r8a77980-cmt1" for the 48-bit CMT1 device included in r8a77980. - "renesas,rcar-gen2-cmt0" for 32-bit CMT0 devices included in R-Car Gen2 and RZ/G1. @@ -41,6 +47,9 @@ Required Properties: and RZ/G1. These are fallbacks for r8a73a4, R-Car Gen2 and RZ/G1 entries listed above. + - "renesas,rcar-gen3-cmt0" for 32-bit CMT0 devices included in R-Car Gen3. + - "renesas,rcar-gen3-cmt1" for 48-bit CMT1 devices included in R-Car Gen3. + These are fallbacks for R-Car Gen3 entries listed above. - reg: base address and length of the registers block for the timer module. - interrupts: interrupt-specifier for the timer, one per channel. diff --git a/Documentation/devicetree/bindings/timer/renesas,ostm.txt b/Documentation/devicetree/bindings/timer/renesas,ostm.txt index be3ae0fdf775..81a78f8bcf17 100644 --- a/Documentation/devicetree/bindings/timer/renesas,ostm.txt +++ b/Documentation/devicetree/bindings/timer/renesas,ostm.txt @@ -9,7 +9,8 @@ Channels are independent from each other. Required Properties: - compatible: must be one or more of the following: - - "renesas,r7s72100-ostm" for the r7s72100 OSTM + - "renesas,r7s72100-ostm" for the R7S72100 (RZ/A1) OSTM + - "renesas,r7s9210-ostm" for the R7S9210 (RZ/A2) OSTM - "renesas,ostm" for any OSTM This is a fallback for the above renesas,*-ostm entries diff --git a/Documentation/devicetree/bindings/trivial-devices.txt b/Documentation/devicetree/bindings/trivial-devices.txt index 763a2808a95c..69c934aec13b 100644 --- a/Documentation/devicetree/bindings/trivial-devices.txt +++ b/Documentation/devicetree/bindings/trivial-devices.txt @@ -35,7 +35,6 @@ at,24c08 i2c serial eeprom (24cxx) atmel,at97sc3204t i2c trusted platform module (TPM) capella,cm32181 CM32181: Ambient Light Sensor capella,cm3232 CM3232: Ambient Light Sensor -cirrus,cs42l51 Cirrus Logic CS42L51 audio codec dallas,ds1374 I2C, 32-Bit Binary Counter Watchdog RTC with Trickle Charger and Reset Input/Output dallas,ds1631 High-Precision Digital Thermometer dallas,ds1672 Dallas DS1672 Real-time Clock diff --git a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt index 2e9318151df7..529e51879fb2 100644 --- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt +++ b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt @@ -80,6 +80,8 @@ Optional properties: controller. It's expected that a mux state of 0 indicates device mode and a mux state of 1 indicates host mode. - mux-control-names: Shall be "usb_switch" if mux-controls is specified. +- pinctrl-names: Names for optional pin modes in "default", "host", "device" +- pinctrl-n: alternate pin modes i.mx specific properties - fsl,usbmisc: phandler of non-core register device, with one diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt index 3e4c38b806ac..636630fb92d7 100644 --- a/Documentation/devicetree/bindings/usb/dwc3.txt +++ b/Documentation/devicetree/bindings/usb/dwc3.txt @@ -19,6 +19,7 @@ Exception for clocks: "cavium,octeon-7130-usb-uctl" "qcom,dwc3" "samsung,exynos5250-dwusb3" + "samsung,exynos5433-dwusb3" "samsung,exynos7-dwusb3" "sprd,sc9860-dwc3" "st,stih407-dwc3" diff --git a/Documentation/devicetree/bindings/usb/ehci-mv.txt b/Documentation/devicetree/bindings/usb/ehci-mv.txt new file mode 100644 index 000000000000..335589895763 --- /dev/null +++ b/Documentation/devicetree/bindings/usb/ehci-mv.txt @@ -0,0 +1,23 @@ +* Marvell PXA/MMP EHCI controller. + +Required properties: + +- compatible: must be "marvell,pxau2o-ehci" +- reg: physical base addresses of the controller and length of memory mapped region +- interrupts: one EHCI controller interrupt should be described here +- clocks: phandle list of usb clocks +- clock-names: should be "USBCLK" +- phys: phandle for the PHY device +- phy-names: should be "usb" + +Example: + + ehci0: usb-ehci@d4208000 { + compatible = "marvell,pxau2o-ehci"; + reg = <0xd4208000 0x200>; + interrupts = <44>; + clocks = <&soc_clocks MMP2_CLK_USB>; + clock-names = "USBCLK"; + phys = <&usb_otg_phy>; + phy-names = "usb"; + }; diff --git a/Documentation/devicetree/bindings/usb/exynos-usb.txt b/Documentation/devicetree/bindings/usb/exynos-usb.txt index c97374315049..b7111f43fa59 100644 --- a/Documentation/devicetree/bindings/usb/exynos-usb.txt +++ b/Documentation/devicetree/bindings/usb/exynos-usb.txt @@ -83,6 +83,8 @@ Required properties: - compatible: should be one of the following - "samsung,exynos5250-dwusb3": for USB 3.0 DWC3 controller on Exynos5250/5420. + "samsung,exynos5433-dwusb3": for USB 3.0 DWC3 controller on + Exynos5433. "samsung,exynos7-dwusb3": for USB 3.0 DWC3 controller on Exynos7. - #address-cells, #size-cells : should be '1' if the device has sub-nodes with 'reg' property. diff --git a/Documentation/devicetree/bindings/usb/faraday,fotg210.txt b/Documentation/devicetree/bindings/usb/faraday,fotg210.txt new file mode 100644 index 000000000000..06a2286e2054 --- /dev/null +++ b/Documentation/devicetree/bindings/usb/faraday,fotg210.txt @@ -0,0 +1,35 @@ +Faraday FOTG Host controller + +This OTG-capable USB host controller is found in Cortina Systems +Gemini and other SoC products. + +Required properties: +- compatible: should be one of: + "faraday,fotg210" + "cortina,gemini-usb", "faraday,fotg210" +- reg: should contain one register range i.e. start and length +- interrupts: description of the interrupt line + +Optional properties: +- clocks: should contain the IP block clock +- clock-names: should be "PCLK" for the IP block clock + +Required properties for "cortina,gemini-usb" compatible: +- syscon: a phandle to the system controller to access PHY registers + +Optional properties for "cortina,gemini-usb" compatible: +- cortina,gemini-mini-b: boolean property that indicates that a Mini-B + OTG connector is in use +- wakeup-source: see power/wakeup-source.txt + +Example for Gemini: + +usb@68000000 { + compatible = "cortina,gemini-usb", "faraday,fotg210"; + reg = <0x68000000 0x1000>; + interrupts = <10 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&cc 12>; + clock-names = "PCLK"; + syscon = <&syscon>; + wakeup-source; +}; diff --git a/Documentation/devicetree/bindings/usb/fcs,fusb302.txt b/Documentation/devicetree/bindings/usb/fcs,fusb302.txt index 6087dc7f209e..a5d011d2efc8 100644 --- a/Documentation/devicetree/bindings/usb/fcs,fusb302.txt +++ b/Documentation/devicetree/bindings/usb/fcs,fusb302.txt @@ -5,10 +5,19 @@ Required properties : - reg : I2C slave address - interrupts : Interrupt specifier -Optional properties : -- fcs,operating-sink-microwatt : - Minimum amount of power accepted from a sink - when negotiating +Required sub-node: +- connector : The "usb-c-connector" attached to the FUSB302 IC. The bindings + of the connector node are specified in: + + Documentation/devicetree/bindings/connector/usb-connector.txt + +Deprecated properties : +- fcs,max-sink-microvolt : Maximum sink voltage accepted by port controller +- fcs,max-sink-microamp : Maximum sink current accepted by port controller +- fcs,max-sink-microwatt : Maximum sink power accepted by port controller +- fcs,operating-sink-microwatt : Minimum amount of power accepted from a sink + when negotiating + Example: @@ -17,7 +26,16 @@ fusb302: typec-portc@54 { reg = <0x54>; interrupt-parent = <&nmi_intc>; interrupts = <0 IRQ_TYPE_LEVEL_LOW>; - fcs,max-sink-microvolt = <12000000>; - fcs,max-sink-microamp = <3000000>; - fcs,max-sink-microwatt = <36000000>; + + usb_con: connector { + compatible = "usb-c-connector"; + label = "USB-C"; + power-role = "dual"; + try-power-role = "sink"; + source-pdos = <PDO_FIXED(5000, 3000, PDO_FIXED_USB_COMM)>; + sink-pdos = <PDO_FIXED(5000, 3000, PDO_FIXED_USB_COMM) + PDO_VAR(3000, 12000, 3000) + PDO_PPS_APDO(3000, 11000, 3000)>; + op-sink-microwatt = <10000000>; + }; }; diff --git a/Documentation/devicetree/bindings/usb/renesas_usb3.txt b/Documentation/devicetree/bindings/usb/renesas_usb3.txt index 2c071bb5801e..d366555166d0 100644 --- a/Documentation/devicetree/bindings/usb/renesas_usb3.txt +++ b/Documentation/devicetree/bindings/usb/renesas_usb3.txt @@ -2,11 +2,13 @@ Renesas Electronics USB3.0 Peripheral driver Required properties: - compatible: Must contain one of the following: + - "renesas,r8a774a1-usb3-peri" - "renesas,r8a7795-usb3-peri" - "renesas,r8a7796-usb3-peri" - "renesas,r8a77965-usb3-peri" - - "renesas,rcar-gen3-usb3-peri" for a generic R-Car Gen3 compatible - device + - "renesas,r8a77990-usb3-peri" + - "renesas,rcar-gen3-usb3-peri" for a generic R-Car Gen3 or RZ/G2 + compatible device When compatible with the generic version, nodes must list the SoC-specific version corresponding to the platform first diff --git a/Documentation/devicetree/bindings/usb/renesas_usbhs.txt b/Documentation/devicetree/bindings/usb/renesas_usbhs.txt index 43960faf5a88..90719f501852 100644 --- a/Documentation/devicetree/bindings/usb/renesas_usbhs.txt +++ b/Documentation/devicetree/bindings/usb/renesas_usbhs.txt @@ -4,7 +4,9 @@ Required properties: - compatible: Must contain one or more of the following: - "renesas,usbhs-r8a7743" for r8a7743 (RZ/G1M) compatible device + - "renesas,usbhs-r8a7744" for r8a7744 (RZ/G1N) compatible device - "renesas,usbhs-r8a7745" for r8a7745 (RZ/G1E) compatible device + - "renesas,usbhs-r8a774a1" for r8a774a1 (RZ/G2M) compatible device - "renesas,usbhs-r8a7790" for r8a7790 (R-Car H2) compatible device - "renesas,usbhs-r8a7791" for r8a7791 (R-Car M2-W) compatible device - "renesas,usbhs-r8a7792" for r8a7792 (R-Car V2H) compatible device @@ -13,10 +15,11 @@ Required properties: - "renesas,usbhs-r8a7795" for r8a7795 (R-Car H3) compatible device - "renesas,usbhs-r8a7796" for r8a7796 (R-Car M3-W) compatible device - "renesas,usbhs-r8a77965" for r8a77965 (R-Car M3-N) compatible device + - "renesas,usbhs-r8a77990" for r8a77990 (R-Car E3) compatible device - "renesas,usbhs-r8a77995" for r8a77995 (R-Car D3) compatible device - "renesas,usbhs-r7s72100" for r7s72100 (RZ/A1) compatible device - "renesas,rcar-gen2-usbhs" for R-Car Gen2 or RZ/G1 compatible devices - - "renesas,rcar-gen3-usbhs" for R-Car Gen3 compatible device + - "renesas,rcar-gen3-usbhs" for R-Car Gen3 or RZ/G2 compatible devices - "renesas,rza1-usbhs" for RZ/A1 compatible device When compatible with the generic version, nodes must list the @@ -25,7 +28,11 @@ Required properties: - reg: Base address and length of the register for the USBHS - interrupts: Interrupt specifier for the USBHS - - clocks: A list of phandle + clock specifier pairs + - clocks: A list of phandle + clock specifier pairs. + - In case of "renesas,rcar-gen3-usbhs", two clocks are required. + First clock should be peripheral and second one should be host. + - In case of except above, one clock is required. First clock + should be peripheral. Optional properties: - renesas,buswait: Integer to use BUSWAIT register diff --git a/Documentation/devicetree/bindings/usb/usb-ehci.txt b/Documentation/devicetree/bindings/usb/usb-ehci.txt index 0f1b75386207..406252d14c6b 100644 --- a/Documentation/devicetree/bindings/usb/usb-ehci.txt +++ b/Documentation/devicetree/bindings/usb/usb-ehci.txt @@ -15,7 +15,11 @@ Optional properties: - needs-reset-on-resume : boolean, set this to force EHCI reset after resume - has-transaction-translator : boolean, set this if EHCI have a Transaction Translator built into the root hub. - - clocks : a list of phandle + clock specifier pairs + - clocks : a list of phandle + clock specifier pairs. In case of Renesas + R-Car Gen3 SoCs: + - if a host only channel: first clock should be host. + - if a USB DRD channel: first clock should be host and second one + should be peripheral. - phys : see usb-hcd.txt in the current directory - resets : phandle + reset specifier pair diff --git a/Documentation/devicetree/bindings/usb/usb-ohci.txt b/Documentation/devicetree/bindings/usb/usb-ohci.txt index a8d2103d1f3d..aaaa5255c972 100644 --- a/Documentation/devicetree/bindings/usb/usb-ohci.txt +++ b/Documentation/devicetree/bindings/usb/usb-ohci.txt @@ -12,7 +12,11 @@ Optional properties: - no-big-frame-no : boolean, set if frame_no lives in bits [15:0] of HCCA - remote-wakeup-connected: remote wakeup is wired on the platform - num-ports : u32, to override the detected port count -- clocks : a list of phandle + clock specifier pairs +- clocks : a list of phandle + clock specifier pairs. In case of Renesas + R-Car Gen3 SoCs: + - if a host only channel: first clock should be host. + - if a USB DRD channel: first clock should be host and second one + should be peripheral. - phys : see usb-hcd.txt in the current directory - resets : a list of phandle + reset specifier pairs diff --git a/Documentation/devicetree/bindings/usb/usb-xhci.txt b/Documentation/devicetree/bindings/usb/usb-xhci.txt index ac4cd0d6195a..fea8b1545751 100644 --- a/Documentation/devicetree/bindings/usb/usb-xhci.txt +++ b/Documentation/devicetree/bindings/usb/usb-xhci.txt @@ -8,6 +8,8 @@ Required properties: - "marvell,armada-375-xhci" for Armada 375 SoCs - "marvell,armada-380-xhci" for Armada 38x SoCs - "renesas,xhci-r8a7743" for r8a7743 SoC + - "renesas,xhci-r8a7744" for r8a7744 SoC + - "renesas,xhci-r8a774a1" for r8a774a1 SoC - "renesas,xhci-r8a7790" for r8a7790 SoC - "renesas,xhci-r8a7791" for r8a7791 SoC - "renesas,xhci-r8a7793" for r8a7793 SoC @@ -17,7 +19,8 @@ Required properties: - "renesas,xhci-r8a77990" for r8a77990 SoC - "renesas,rcar-gen2-xhci" for a generic R-Car Gen2 or RZ/G1 compatible device - - "renesas,rcar-gen3-xhci" for a generic R-Car Gen3 compatible device + - "renesas,rcar-gen3-xhci" for a generic R-Car Gen3 or RZ/G2 compatible + device - "xhci-platform" (deprecated) When compatible with the generic version, nodes must list the diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index 2c3fc512e746..376f24484182 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -127,6 +127,7 @@ everspin Everspin Technologies, Inc. exar Exar Corporation excito Excito ezchip EZchip Semiconductor +facebook Facebook fairphone Fairphone B.V. faraday Faraday Technology Corporation fastrax Fastrax Oy @@ -235,6 +236,7 @@ micrel Micrel Inc. microchip Microchip Technology Inc. microcrystal Micro Crystal AG micron Micron Technology Inc. +mikroe MikroElektronika d.o.o. minix MINIX Technology Ltd. miramems MiraMEMS Sensing Technology Co., Ltd. mitsubishi Mitsubishi Electric Corporation @@ -274,6 +276,7 @@ nxp NXP Semiconductors okaya Okaya Electric America, Inc. oki Oki Electric Industry Co., Ltd. olimex OLIMEX Ltd. +olpc One Laptop Per Child onion Onion Corporation onnn ON Semiconductor Corp. ontat On Tat Industrial Company diff --git a/Documentation/devicetree/bindings/watchdog/renesas-wdt.txt b/Documentation/devicetree/bindings/watchdog/renesas-wdt.txt index 9407212a85a8..d72d1181ec62 100644 --- a/Documentation/devicetree/bindings/watchdog/renesas-wdt.txt +++ b/Documentation/devicetree/bindings/watchdog/renesas-wdt.txt @@ -6,6 +6,7 @@ Required properties: version. Examples with soctypes are: - "renesas,r8a7743-wdt" (RZ/G1M) + - "renesas,r8a7744-wdt" (RZ/G1N) - "renesas,r8a7745-wdt" (RZ/G1E) - "renesas,r8a774a1-wdt" (RZ/G2M) - "renesas,r8a7790-wdt" (R-Car H2) diff --git a/Documentation/driver-api/basics.rst b/Documentation/driver-api/basics.rst index 826e85d50a16..e970fadf4d1a 100644 --- a/Documentation/driver-api/basics.rst +++ b/Documentation/driver-api/basics.rst @@ -121,6 +121,9 @@ Kernel utility functions .. kernel-doc:: kernel/rcu/update.c :export: +.. kernel-doc:: include/linux/overflow.h + :internal: + Device Resource Management -------------------------- diff --git a/Documentation/driver-api/firewire.rst b/Documentation/driver-api/firewire.rst new file mode 100644 index 000000000000..94a2d7f01d99 --- /dev/null +++ b/Documentation/driver-api/firewire.rst @@ -0,0 +1,48 @@ +=========================================== +Firewire (IEEE 1394) driver Interface Guide +=========================================== + +Introduction and Overview +========================= + +The Linux FireWire subsystem adds some interfaces into the Linux system to + use/maintain+any resource on IEEE 1394 bus. + +The main purpose of these interfaces is to access address space on each node +on IEEE 1394 bus by ISO/IEC 13213 (IEEE 1212) procedure, and to control +isochronous resources on the bus by IEEE 1394 procedure. + +Two types of interfaces are added, according to consumers of the interface. A +set of userspace interfaces is available via `firewire character devices`. A set +of kernel interfaces is available via exported symbols in `firewire-core` module. + +Firewire char device data structures +==================================== + +.. include:: /ABI/stable/firewire-cdev + :literal: + +.. kernel-doc:: include/uapi/linux/firewire-cdev.h + :internal: + +Firewire device probing and sysfs interfaces +============================================ + +.. include:: /ABI/stable/sysfs-bus-firewire + :literal: + +.. kernel-doc:: drivers/firewire/core-device.c + :export: + +Firewire core transaction interfaces +==================================== + +.. kernel-doc:: drivers/firewire/core-transaction.c + :export: + +Firewire Isochronous I/O interfaces +=================================== + +.. kernel-doc:: drivers/firewire/core-iso.c + :export: + diff --git a/Documentation/driver-api/fpga/fpga-bridge.rst b/Documentation/driver-api/fpga/fpga-bridge.rst index 2c2aaca894bf..71c5a40da320 100644 --- a/Documentation/driver-api/fpga/fpga-bridge.rst +++ b/Documentation/driver-api/fpga/fpga-bridge.rst @@ -4,6 +4,12 @@ FPGA Bridge API to implement a new FPGA bridge ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* struct :c:type:`fpga_bridge` — The FPGA Bridge structure +* struct :c:type:`fpga_bridge_ops` — Low level Bridge driver ops +* :c:func:`devm_fpga_bridge_create()` — Allocate and init a bridge struct +* :c:func:`fpga_bridge_register()` — Register a bridge +* :c:func:`fpga_bridge_unregister()` — Unregister a bridge + .. kernel-doc:: include/linux/fpga/fpga-bridge.h :functions: fpga_bridge @@ -11,39 +17,10 @@ API to implement a new FPGA bridge :functions: fpga_bridge_ops .. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: fpga_bridge_create - -.. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: fpga_bridge_free + :functions: devm_fpga_bridge_create .. kernel-doc:: drivers/fpga/fpga-bridge.c :functions: fpga_bridge_register .. kernel-doc:: drivers/fpga/fpga-bridge.c :functions: fpga_bridge_unregister - -API to control an FPGA bridge -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -You probably won't need these directly. FPGA regions should handle this. - -.. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: of_fpga_bridge_get - -.. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: fpga_bridge_get - -.. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: fpga_bridge_put - -.. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: fpga_bridge_get_to_list - -.. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: of_fpga_bridge_get_to_list - -.. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: fpga_bridge_enable - -.. kernel-doc:: drivers/fpga/fpga-bridge.c - :functions: fpga_bridge_disable diff --git a/Documentation/driver-api/fpga/fpga-mgr.rst b/Documentation/driver-api/fpga/fpga-mgr.rst index 82b6dbbd31cd..576f1945eacd 100644 --- a/Documentation/driver-api/fpga/fpga-mgr.rst +++ b/Documentation/driver-api/fpga/fpga-mgr.rst @@ -49,18 +49,14 @@ probe function calls fpga_mgr_register(), such as:: * them in priv */ - mgr = fpga_mgr_create(dev, "Altera SOCFPGA FPGA Manager", - &socfpga_fpga_ops, priv); + mgr = devm_fpga_mgr_create(dev, "Altera SOCFPGA FPGA Manager", + &socfpga_fpga_ops, priv); if (!mgr) return -ENOMEM; platform_set_drvdata(pdev, mgr); - ret = fpga_mgr_register(mgr); - if (ret) - fpga_mgr_free(mgr); - - return ret; + return fpga_mgr_register(mgr); } static int socfpga_fpga_remove(struct platform_device *pdev) @@ -102,67 +98,19 @@ The ops include a .state function which will determine the state the FPGA is in and return a code of type enum fpga_mgr_states. It doesn't result in a change in state. -How to write an image buffer to a supported FPGA ------------------------------------------------- - -Some sample code:: - - #include <linux/fpga/fpga-mgr.h> - - struct fpga_manager *mgr; - struct fpga_image_info *info; - int ret; - - /* - * Get a reference to FPGA manager. The manager is not locked, so you can - * hold onto this reference without it preventing programming. - * - * This example uses the device node of the manager. Alternatively, use - * fpga_mgr_get(dev) instead if you have the device. - */ - mgr = of_fpga_mgr_get(mgr_node); - - /* struct with information about the FPGA image to program. */ - info = fpga_image_info_alloc(dev); - - /* flags indicates whether to do full or partial reconfiguration */ - info->flags = FPGA_MGR_PARTIAL_RECONFIG; - - /* - * At this point, indicate where the image is. This is pseudo-code; you're - * going to use one of these three. - */ - if (image is in a scatter gather table) { - - info->sgt = [your scatter gather table] - - } else if (image is in a buffer) { - - info->buf = [your image buffer] - info->count = [image buffer size] - - } else if (image is in a firmware file) { - - info->firmware_name = devm_kstrdup(dev, firmware_name, GFP_KERNEL); - - } - - /* Get exclusive control of FPGA manager */ - ret = fpga_mgr_lock(mgr); - - /* Load the buffer to the FPGA */ - ret = fpga_mgr_buf_load(mgr, &info, buf, count); - - /* Release the FPGA manager */ - fpga_mgr_unlock(mgr); - fpga_mgr_put(mgr); - - /* Deallocate the image info if you're done with it */ - fpga_image_info_free(info); - API for implementing a new FPGA Manager driver ---------------------------------------------- +* ``fpga_mgr_states`` — Values for :c:member:`fpga_manager->state`. +* struct :c:type:`fpga_manager` — the FPGA manager struct +* struct :c:type:`fpga_manager_ops` — Low level FPGA manager driver ops +* :c:func:`devm_fpga_mgr_create` — Allocate and init a manager struct +* :c:func:`fpga_mgr_register` — Register an FPGA manager +* :c:func:`fpga_mgr_unregister` — Unregister an FPGA manager + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :functions: fpga_mgr_states + .. kernel-doc:: include/linux/fpga/fpga-mgr.h :functions: fpga_manager @@ -170,56 +118,10 @@ API for implementing a new FPGA Manager driver :functions: fpga_manager_ops .. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_mgr_create - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_mgr_free + :functions: devm_fpga_mgr_create .. kernel-doc:: drivers/fpga/fpga-mgr.c :functions: fpga_mgr_register .. kernel-doc:: drivers/fpga/fpga-mgr.c :functions: fpga_mgr_unregister - -API for programming an FPGA ---------------------------- - -FPGA Manager flags - -.. kernel-doc:: include/linux/fpga/fpga-mgr.h - :doc: FPGA Manager flags - -.. kernel-doc:: include/linux/fpga/fpga-mgr.h - :functions: fpga_image_info - -.. kernel-doc:: include/linux/fpga/fpga-mgr.h - :functions: fpga_mgr_states - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_image_info_alloc - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_image_info_free - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: of_fpga_mgr_get - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_mgr_get - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_mgr_put - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_mgr_lock - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_mgr_unlock - -.. kernel-doc:: include/linux/fpga/fpga-mgr.h - :functions: fpga_mgr_states - -Note - use :c:func:`fpga_region_program_fpga()` instead of :c:func:`fpga_mgr_load()` - -.. kernel-doc:: drivers/fpga/fpga-mgr.c - :functions: fpga_mgr_load diff --git a/Documentation/driver-api/fpga/fpga-programming.rst b/Documentation/driver-api/fpga/fpga-programming.rst new file mode 100644 index 000000000000..b5484df6ff0f --- /dev/null +++ b/Documentation/driver-api/fpga/fpga-programming.rst @@ -0,0 +1,107 @@ +In-kernel API for FPGA Programming +================================== + +Overview +-------- + +The in-kernel API for FPGA programming is a combination of APIs from +FPGA manager, bridge, and regions. The actual function used to +trigger FPGA programming is :c:func:`fpga_region_program_fpga()`. + +:c:func:`fpga_region_program_fpga()` uses functionality supplied by +the FPGA manager and bridges. It will: + + * lock the region's mutex + * lock the mutex of the region's FPGA manager + * build a list of FPGA bridges if a method has been specified to do so + * disable the bridges + * program the FPGA using info passed in :c:member:`fpga_region->info`. + * re-enable the bridges + * release the locks + +The struct fpga_image_info specifies what FPGA image to program. It is +allocated/freed by :c:func:`fpga_image_info_alloc()` and freed with +:c:func:`fpga_image_info_free()` + +How to program an FPGA using a region +------------------------------------- + +When the FPGA region driver probed, it was given a pointer to an FPGA manager +driver so it knows which manager to use. The region also either has a list of +bridges to control during programming or it has a pointer to a function that +will generate that list. Here's some sample code of what to do next:: + + #include <linux/fpga/fpga-mgr.h> + #include <linux/fpga/fpga-region.h> + + struct fpga_image_info *info; + int ret; + + /* + * First, alloc the struct with information about the FPGA image to + * program. + */ + info = fpga_image_info_alloc(dev); + if (!info) + return -ENOMEM; + + /* Set flags as needed, such as: */ + info->flags = FPGA_MGR_PARTIAL_RECONFIG; + + /* + * Indicate where the FPGA image is. This is pseudo-code; you're + * going to use one of these three. + */ + if (image is in a scatter gather table) { + + info->sgt = [your scatter gather table] + + } else if (image is in a buffer) { + + info->buf = [your image buffer] + info->count = [image buffer size] + + } else if (image is in a firmware file) { + + info->firmware_name = devm_kstrdup(dev, firmware_name, + GFP_KERNEL); + + } + + /* Add info to region and do the programming */ + region->info = info; + ret = fpga_region_program_fpga(region); + + /* Deallocate the image info if you're done with it */ + region->info = NULL; + fpga_image_info_free(info); + + if (ret) + return ret; + + /* Now enumerate whatever hardware has appeared in the FPGA. */ + +API for programming an FPGA +--------------------------- + +* :c:func:`fpga_region_program_fpga` — Program an FPGA +* :c:type:`fpga_image_info` — Specifies what FPGA image to program +* :c:func:`fpga_image_info_alloc()` — Allocate an FPGA image info struct +* :c:func:`fpga_image_info_free()` — Free an FPGA image info struct + +.. kernel-doc:: drivers/fpga/fpga-region.c + :functions: fpga_region_program_fpga + +FPGA Manager flags + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :doc: FPGA Manager flags + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :functions: fpga_image_info + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_image_info_alloc + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_image_info_free diff --git a/Documentation/driver-api/fpga/fpga-region.rst b/Documentation/driver-api/fpga/fpga-region.rst index f30333ce828e..0529b2d2231a 100644 --- a/Documentation/driver-api/fpga/fpga-region.rst +++ b/Documentation/driver-api/fpga/fpga-region.rst @@ -34,41 +34,6 @@ fpga_image_info including: * flags indicating specifics such as whether the image is for partial reconfiguration. -How to program an FPGA using a region -------------------------------------- - -First, allocate the info struct:: - - info = fpga_image_info_alloc(dev); - if (!info) - return -ENOMEM; - -Set flags as needed, i.e.:: - - info->flags |= FPGA_MGR_PARTIAL_RECONFIG; - -Point to your FPGA image, such as:: - - info->sgt = &sgt; - -Add info to region and do the programming:: - - region->info = info; - ret = fpga_region_program_fpga(region); - -:c:func:`fpga_region_program_fpga()` operates on info passed in the -fpga_image_info (region->info). This function will attempt to: - - * lock the region's mutex - * lock the region's FPGA manager - * build a list of FPGA bridges if a method has been specified to do so - * disable the bridges - * program the FPGA - * re-enable the bridges - * release the locks - -Then you will want to enumerate whatever hardware has appeared in the FPGA. - How to add a new FPGA region ---------------------------- @@ -77,26 +42,62 @@ An example of usage can be seen in the probe function of [#f2]_. .. [#f1] ../devicetree/bindings/fpga/fpga-region.txt .. [#f2] ../../drivers/fpga/of-fpga-region.c -API to program an FPGA ----------------------- - -.. kernel-doc:: drivers/fpga/fpga-region.c - :functions: fpga_region_program_fpga - API to add a new FPGA region ---------------------------- +* struct :c:type:`fpga_region` — The FPGA region struct +* :c:func:`devm_fpga_region_create` — Allocate and init a region struct +* :c:func:`fpga_region_register` — Register an FPGA region +* :c:func:`fpga_region_unregister` — Unregister an FPGA region + +The FPGA region's probe function will need to get a reference to the FPGA +Manager it will be using to do the programming. This usually would happen +during the region's probe function. + +* :c:func:`fpga_mgr_get` — Get a reference to an FPGA manager, raise ref count +* :c:func:`of_fpga_mgr_get` — Get a reference to an FPGA manager, raise ref count, + given a device node. +* :c:func:`fpga_mgr_put` — Put an FPGA manager + +The FPGA region will need to specify which bridges to control while programming +the FPGA. The region driver can build a list of bridges during probe time +(:c:member:`fpga_region->bridge_list`) or it can have a function that creates +the list of bridges to program just before programming +(:c:member:`fpga_region->get_bridges`). The FPGA bridge framework supplies the +following APIs to handle building or tearing down that list. + +* :c:func:`fpga_bridge_get_to_list` — Get a ref of an FPGA bridge, add it to a + list +* :c:func:`of_fpga_bridge_get_to_list` — Get a ref of an FPGA bridge, add it to a + list, given a device node +* :c:func:`fpga_bridges_put` — Given a list of bridges, put them + .. kernel-doc:: include/linux/fpga/fpga-region.h :functions: fpga_region .. kernel-doc:: drivers/fpga/fpga-region.c - :functions: fpga_region_create - -.. kernel-doc:: drivers/fpga/fpga-region.c - :functions: fpga_region_free + :functions: devm_fpga_region_create .. kernel-doc:: drivers/fpga/fpga-region.c :functions: fpga_region_register .. kernel-doc:: drivers/fpga/fpga-region.c :functions: fpga_region_unregister + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_get + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: of_fpga_mgr_get + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_put + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_get_to_list + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: of_fpga_bridge_get_to_list + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridges_put diff --git a/Documentation/driver-api/fpga/index.rst b/Documentation/driver-api/fpga/index.rst index c51e5ebd544a..31a4773bd2e6 100644 --- a/Documentation/driver-api/fpga/index.rst +++ b/Documentation/driver-api/fpga/index.rst @@ -11,3 +11,5 @@ FPGA Subsystem fpga-mgr fpga-bridge fpga-region + fpga-programming + diff --git a/Documentation/driver-api/fpga/intro.rst b/Documentation/driver-api/fpga/intro.rst index 50d1cab84950..f54c7dabcc7d 100644 --- a/Documentation/driver-api/fpga/intro.rst +++ b/Documentation/driver-api/fpga/intro.rst @@ -44,7 +44,7 @@ FPGA Region ----------- If you are adding a new interface to the FPGA framework, add it on top -of an FPGA region to allow the most reuse of your interface. +of an FPGA region. The FPGA Region framework (fpga-region.c) associates managers and bridges as reconfigurable regions. A region may refer to the whole diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 6d9f2f9fe20e..909f991b4c0d 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -29,7 +29,8 @@ available subsections can be seen below. iio/index input usb/index - pci + firewire + pci/index spi i2c hsi diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst new file mode 100644 index 000000000000..c6cf1fef61ce --- /dev/null +++ b/Documentation/driver-api/pci/index.rst @@ -0,0 +1,22 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================ +The Linux PCI driver implementer's API guide +============================================ + +.. class:: toc-title + + Table of contents + +.. toctree:: + :maxdepth: 2 + + pci + p2pdma + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst new file mode 100644 index 000000000000..4c577fa7bef9 --- /dev/null +++ b/Documentation/driver-api/pci/p2pdma.rst @@ -0,0 +1,145 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================ +PCI Peer-to-Peer DMA Support +============================ + +The PCI bus has pretty decent support for performing DMA transfers +between two devices on the bus. This type of transaction is henceforth +called Peer-to-Peer (or P2P). However, there are a number of issues that +make P2P transactions tricky to do in a perfectly safe way. + +One of the biggest issues is that PCI doesn't require forwarding +transactions between hierarchy domains, and in PCIe, each Root Port +defines a separate hierarchy domain. To make things worse, there is no +simple way to determine if a given Root Complex supports this or not. +(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel +only supports doing P2P when the endpoints involved are all behind the +same PCI bridge, as such devices are all in the same PCI hierarchy +domain, and the spec guarantees that all transactions within the +hierarchy will be routable, but it does not require routing +between hierarchies. + +The second issue is that to make use of existing interfaces in Linux, +memory that is used for P2P transactions needs to be backed by struct +pages. However, PCI BARs are not typically cache coherent so there are +a few corner case gotchas with these pages so developers need to +be careful about what they do with them. + + +Driver Writer's Guide +===================== + +In a given P2P implementation there may be three or more different +types of kernel drivers in play: + +* Provider - A driver which provides or publishes P2P resources like + memory or doorbell registers to other drivers. +* Client - A driver which makes use of a resource by setting up a + DMA transaction to or from it. +* Orchestrator - A driver which orchestrates the flow of data between + clients and providers. + +In many cases there could be overlap between these three types (i.e., +it may be typical for a driver to be both a provider and a client). + +For example, in the NVMe Target Copy Offload implementation: + +* The NVMe PCI driver is both a client, provider and orchestrator + in that it exposes any CMB (Controller Memory Buffer) as a P2P memory + resource (provider), it accepts P2P memory pages as buffers in requests + to be used directly (client) and it can also make use of the CMB as + submission queue entries (orchastrator). +* The RDMA driver is a client in this arrangement so that an RNIC + can DMA directly to the memory exposed by the NVMe device. +* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC + to the P2P memory (CMB) and then to the NVMe device (and vice versa). + +This is currently the only arrangement supported by the kernel but +one could imagine slight tweaks to this that would allow for the same +functionality. For example, if a specific RNIC added a BAR with some +memory behind it, its driver could add support as a P2P provider and +then the NVMe Target could use the RNIC's memory instead of the CMB +in cases where the NVMe cards in use do not have CMB support. + + +Provider Drivers +---------------- + +A provider simply needs to register a BAR (or a portion of a BAR) +as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`. +This will register struct pages for all the specified memory. + +After that it may optionally publish all of its resources as +P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow +any orchestrator drivers to find and use the memory. When marked in +this way, the resource must be regular memory with no side effects. + +For the time being this is fairly rudimentary in that all resources +are typically going to be P2P memory. Future work will likely expand +this to include other types of resources like doorbells. + + +Client Drivers +-------------- + +A client driver typically only has to conditionally change its DMA map +routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead +of the usual :c:func:`dma_map_sg()` function. Memory mapped in this +way does not need to be unmapped. + +The client may also, optionally, make use of +:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping +functions and when to use the regular mapping functions. In some +situations, it may be more appropriate to use a flag to indicate a +given request is P2P memory and map appropriately. It is important to +ensure that struct pages that back P2P memory stay out of code that +does not have support for them as other code may treat the pages as +regular memory which may not be appropriate. + + +Orchestrator Drivers +-------------------- + +The first task an orchestrator driver must do is compile a list of +all client devices that will be involved in a given transaction. For +example, the NVMe Target driver creates a list including the namespace +block device and the RNIC in use. If the orchestrator has access to +a specific P2P provider to use it may check compatibility using +:c:func:`pci_p2pdma_distance()` otherwise it may find a memory provider +that's compatible with all clients using :c:func:`pci_p2pmem_find()`. +If more than one provider is supported, the one nearest to all the clients will +be chosen first. If more than one provider is an equal distance away, the +one returned will be chosen at random (it is not an arbitrary but +truely random). This function returns the PCI device to use for the provider +with a reference taken and therefore when it's no longer needed it should be +returned with pci_dev_put(). + +Once a provider is selected, the orchestrator can then use +:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to +allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()` +and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for +allocating scatter-gather lists with P2P memory. + +Struct Page Caveats +------------------- + +Driver writers should be very careful about not passing these special +struct pages to code that isn't prepared for it. At this time, the kernel +interfaces do not have any checks for ensuring this. This obviously +precludes passing these pages to userspace. + +P2P memory is also technically IO memory but should never have any side +effects behind it. Thus, the order of loads and stores should not be important +and ioreadX(), iowriteX() and friends should not be necessary. +However, as the memory is not cache coherent, if access ever needs to +be protected by a spinlock then :c:func:`mmiowb()` must be used before +unlocking the lock. (See ACQUIRES VS I/O ACCESSES in +Documentation/memory-barriers.txt) + + +P2P DMA Support Library +======================= + +.. kernel-doc:: drivers/pci/p2pdma.c + :export: diff --git a/Documentation/driver-api/pci.rst b/Documentation/driver-api/pci/pci.rst index ca85e5e78b2c..ca85e5e78b2c 100644 --- a/Documentation/driver-api/pci.rst +++ b/Documentation/driver-api/pci/pci.rst diff --git a/Documentation/driver-api/soundwire/stream.rst b/Documentation/driver-api/soundwire/stream.rst index 29121aa55fb9..26a6064503fd 100644 --- a/Documentation/driver-api/soundwire/stream.rst +++ b/Documentation/driver-api/soundwire/stream.rst @@ -101,6 +101,34 @@ interface. :: +--------------------+ | | +----------------+ +Example 5: Stereo Stream with L and R channel is rendered by 2 Masters, each +rendering one channel, and is received by two different Slaves, each +receiving one channel. Both Masters and both Slaves are using single port. :: + + +---------------+ Clock Signal +---------------+ + | Master +----------------------------------+ Slave | + | Interface | | Interface | + | 1 | | 1 | + | | Data Signal | | + | L +----------------------------------+ L | + | (Data) | Data Direction | (Data) | + +---------------+ +-----------------------> +---------------+ + + +---------------+ Clock Signal +---------------+ + | Master +----------------------------------+ Slave | + | Interface | | Interface | + | 2 | | 2 | + | | Data Signal | | + | R +----------------------------------+ R | + | (Data) | Data Direction | (Data) | + +---------------+ +-----------------------> +---------------+ + +Note: In multi-link cases like above, to lock, one would acquire a global +lock and then go on locking bus instances. But, in this case the caller +framework(ASoC DPCM) guarantees that stream operations on a card are +always serialized. So, there is no race condition and hence no need for +global lock. + SoundWire Stream Management flow ================================ @@ -174,6 +202,7 @@ per stream. From ASoC DPCM framework, this stream state maybe linked to .startup() operation. .. code-block:: c + int sdw_alloc_stream(char * stream_name); @@ -200,6 +229,7 @@ only be invoked once by respective Master(s) and Slave(s). From ASoC DPCM framework, this stream state is linked to .hw_params() operation. .. code-block:: c + int sdw_stream_add_master(struct sdw_bus * bus, struct sdw_stream_config * stream_config, struct sdw_ports_config * ports_config, @@ -245,6 +275,7 @@ stream. From ASoC DPCM framework, this stream state is linked to .prepare() operation. .. code-block:: c + int sdw_prepare_stream(struct sdw_stream_runtime * stream); @@ -274,6 +305,7 @@ stream. From ASoC DPCM framework, this stream state is linked to .trigger() start operation. .. code-block:: c + int sdw_enable_stream(struct sdw_stream_runtime * stream); SDW_STREAM_DISABLED @@ -301,6 +333,7 @@ per stream. From ASoC DPCM framework, this stream state is linked to .trigger() stop operation. .. code-block:: c + int sdw_disable_stream(struct sdw_stream_runtime * stream); @@ -325,6 +358,7 @@ per stream. From ASoC DPCM framework, this stream state is linked to .trigger() stop operation. .. code-block:: c + int sdw_deprepare_stream(struct sdw_stream_runtime * stream); @@ -349,6 +383,7 @@ all the Master(s) and Slave(s) associated with stream. From ASoC DPCM framework, this stream state is linked to .hw_free() operation. .. code-block:: c + int sdw_stream_remove_master(struct sdw_bus * bus, struct sdw_stream_runtime * stream); int sdw_stream_remove_slave(struct sdw_slave * slave, @@ -361,6 +396,7 @@ stream assigned as part of ALLOCATED state. In .shutdown() the data structure maintaining stream state are freed up. .. code-block:: c + void sdw_release_stream(struct sdw_stream_runtime * stream); Not Supported diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst index fb2eb73be4a3..25f50eace28b 100644 --- a/Documentation/driver-api/uio-howto.rst +++ b/Documentation/driver-api/uio-howto.rst @@ -463,8 +463,8 @@ Getting information about your UIO device Information about all UIO devices is available in sysfs. The first thing you should do in your driver is check ``name`` and ``version`` to make -sure your talking to the right device and that its kernel driver has the -version you expect. +sure you're talking to the right device and that its kernel driver has +the version you expect. You should also make sure that the memory mapping you need exists and has the size you expect. diff --git a/Documentation/efi-stub.txt b/Documentation/efi-stub.txt index 41df801f9a50..833edb0d0bc4 100644 --- a/Documentation/efi-stub.txt +++ b/Documentation/efi-stub.txt @@ -83,7 +83,18 @@ is passed to bzImage.efi. The "dtb=" option ----------------- -For the ARM and arm64 architectures, we also need to be able to provide a -device tree to the kernel. This is done with the "dtb=" command line option, -and is processed in the same manner as the "initrd=" option that is +For the ARM and arm64 architectures, a device tree must be provided to +the kernel. Normally firmware shall supply the device tree via the +EFI CONFIGURATION TABLE. However, the "dtb=" command line option can +be used to override the firmware supplied device tree, or to supply +one when firmware is unable to. + +Please note: Firmware adds runtime configuration information to the +device tree before booting the kernel. If dtb= is used to override +the device tree, then any runtime data provided by firmware will be +lost. The dtb= option should only be used either as a debug tool, or +as a last resort when a device tree is not provided in the EFI +CONFIGURATION TABLE. + +"dtb=" is processed in the same manner as the "initrd=" option that is described above. diff --git a/Documentation/fb/00-INDEX b/Documentation/fb/00-INDEX deleted file mode 100644 index fe85e7c5907a..000000000000 --- a/Documentation/fb/00-INDEX +++ /dev/null @@ -1,75 +0,0 @@ -Index of files in Documentation/fb. If you think something about frame -buffer devices needs an entry here, needs correction or you've written one -please mail me. - Geert Uytterhoeven <geert@linux-m68k.org> - -00-INDEX - - this file. -api.txt - - The frame buffer API between applications and buffer devices. -arkfb.txt - - info on the fbdev driver for ARK Logic chips. -aty128fb.txt - - info on the ATI Rage128 frame buffer driver. -cirrusfb.txt - - info on the driver for Cirrus Logic chipsets. -cmap_xfbdev.txt - - an introduction to fbdev's cmap structures. -deferred_io.txt - - an introduction to deferred IO. -efifb.txt - - info on the EFI platform driver for Intel based Apple computers. -ep93xx-fb.txt - - info on the driver for EP93xx LCD controller. -fbcon.txt - - intro to and usage guide for the framebuffer console (fbcon). -framebuffer.txt - - introduction to frame buffer devices. -gxfb.txt - - info on the framebuffer driver for AMD Geode GX2 based processors. -intel810.txt - - documentation for the Intel 810/815 framebuffer driver. -intelfb.txt - - docs for Intel 830M/845G/852GM/855GM/865G/915G/945G fb driver. -internals.txt - - quick overview of frame buffer device internals. -lxfb.txt - - info on the framebuffer driver for AMD Geode LX based processors. -matroxfb.txt - - info on the Matrox framebuffer driver for Alpha, Intel and PPC. -metronomefb.txt - - info on the driver for the Metronome display controller. -modedb.txt - - info on the video mode database. -pvr2fb.txt - - info on the PowerVR 2 frame buffer driver. -pxafb.txt - - info on the driver for the PXA25x LCD controller. -s3fb.txt - - info on the fbdev driver for S3 Trio/Virge chips. -sa1100fb.txt - - information about the driver for the SA-1100 LCD controller. -sh7760fb.txt - - info on the SH7760/SH7763 integrated LCDC Framebuffer driver. -sisfb.txt - - info on the framebuffer device driver for various SiS chips. -sm501.txt - - info on the framebuffer device driver for sm501 videoframebuffer. -sstfb.txt - - info on the frame buffer driver for 3dfx' Voodoo Graphics boards. -tgafb.txt - - info on the TGA (DECChip 21030) frame buffer driver. -tridentfb.txt - info on the framebuffer driver for some Trident chip based cards. -udlfb.txt - - Driver for DisplayLink USB 2.0 chips. -uvesafb.txt - - info on the userspace VESA (VBE2+ compliant) frame buffer device. -vesafb.txt - - info on the VESA frame buffer device. -viafb.modes - - list of modes for VIA Integration Graphic Chip. -viafb.txt - - info on the VIA Integration Graphic Chip console framebuffer driver. -vt8623fb.txt - - info on the fb driver for the graphics core in VIA VT8623 chipsets. diff --git a/Documentation/fb/vesafb.txt b/Documentation/fb/vesafb.txt index 950d5a658cb3..413bb73235be 100644 --- a/Documentation/fb/vesafb.txt +++ b/Documentation/fb/vesafb.txt @@ -114,11 +114,11 @@ to turn it on. You can pass options to vesafb using "video=vesafb:option" on the kernel command line. Multiple options should be separated -by comma, like this: "video=vesafb:ypan,invers" +by comma, like this: "video=vesafb:ypan,inverse" Accepted options: -invers no comment... +inverse use inverse color map ypan enable display panning using the VESA protected mode interface. The visible screen is just a window of the diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX deleted file mode 100644 index 0937bade1099..000000000000 --- a/Documentation/filesystems/00-INDEX +++ /dev/null @@ -1,153 +0,0 @@ -00-INDEX - - this file (info on some of the filesystems supported by linux). -Locking - - info on locking rules as they pertain to Linux VFS. -9p.txt - - 9p (v9fs) is an implementation of the Plan 9 remote fs protocol. -adfs.txt - - info and mount options for the Acorn Advanced Disc Filing System. -afs.txt - - info and examples for the distributed AFS (Andrew File System) fs. -affs.txt - - info and mount options for the Amiga Fast File System. -autofs-mount-control.txt - - info on device control operations for autofs module. -automount-support.txt - - information about filesystem automount support. -befs.txt - - information about the BeOS filesystem for Linux. -bfs.txt - - info for the SCO UnixWare Boot Filesystem (BFS). -btrfs.txt - - info for the BTRFS filesystem. -caching/ - - directory containing filesystem cache documentation. -ceph.txt - - info for the Ceph Distributed File System. -cifs/ - - directory containing CIFS filesystem documentation and example code. -coda.txt - - description of the CODA filesystem. -configfs/ - - directory containing configfs documentation and example code. -cramfs.txt - - info on the cram filesystem for small storage (ROMs etc). -dax.txt - - info on avoiding the page cache for files stored on CPU-addressable - storage devices. -debugfs.txt - - info on the debugfs filesystem. -devpts.txt - - info on the devpts filesystem. -directory-locking - - info about the locking scheme used for directory operations. -dlmfs.txt - - info on the userspace interface to the OCFS2 DLM. -dnotify.txt - - info about directory notification in Linux. -dnotify_test.c - - example program for dnotify. -ecryptfs.txt - - docs on eCryptfs: stacked cryptographic filesystem for Linux. -efivarfs.txt - - info for the efivarfs filesystem. -exofs.txt - - info, usage, mount options, design about EXOFS. -ext2.txt - - info, mount options and specifications for the Ext2 filesystem. -ext3.txt - - info, mount options and specifications for the Ext3 filesystem. -ext4.txt - - info, mount options and specifications for the Ext4 filesystem. -f2fs.txt - - info and mount options for the F2FS filesystem. -fiemap.txt - - info on fiemap ioctl. -files.txt - - info on file management in the Linux kernel. -fuse.txt - - info on the Filesystem in User SpacE including mount options. -gfs2-glocks.txt - - info on the Global File System 2 - Glock internal locking rules. -gfs2-uevents.txt - - info on the Global File System 2 - uevents. -gfs2.txt - - info on the Global File System 2. -hfs.txt - - info on the Macintosh HFS Filesystem for Linux. -hfsplus.txt - - info on the Macintosh HFSPlus Filesystem for Linux. -hpfs.txt - - info and mount options for the OS/2 HPFS. -inotify.txt - - info on the powerful yet simple file change notification system. -isofs.txt - - info and mount options for the ISO 9660 (CDROM) filesystem. -jfs.txt - - info and mount options for the JFS filesystem. -locks.txt - - info on file locking implementations, flock() vs. fcntl(), etc. -mandatory-locking.txt - - info on the Linux implementation of Sys V mandatory file locking. -nfs/ - - nfs-related documentation. -nilfs2.txt - - info and mount options for the NILFS2 filesystem. -ntfs.txt - - info and mount options for the NTFS filesystem (Windows NT). -ocfs2.txt - - info and mount options for the OCFS2 clustered filesystem. -omfs.txt - - info on the Optimized MPEG FileSystem. -path-lookup.txt - - info on path walking and name lookup locking. -pohmelfs/ - - directory containing pohmelfs filesystem documentation. -porting - - various information on filesystem porting. -proc.txt - - info on Linux's /proc filesystem. -qnx6.txt - - info on the QNX6 filesystem. -quota.txt - - info on Quota subsystem. -ramfs-rootfs-initramfs.txt - - info on the 'in memory' filesystems ramfs, rootfs and initramfs. -relay.txt - - info on relay, for efficient streaming from kernel to user space. -romfs.txt - - description of the ROMFS filesystem. -seq_file.txt - - how to use the seq_file API. -sharedsubtree.txt - - a description of shared subtrees for namespaces. -spufs.txt - - info and mount options for the SPU filesystem used on Cell. -squashfs.txt - - info on the squashfs filesystem. -sysfs-pci.txt - - info on accessing PCI device resources through sysfs. -sysfs-tagging.txt - - info on sysfs tagging to avoid duplicates. -sysfs.txt - - info on sysfs, a ram-based filesystem for exporting kernel objects. -sysv-fs.txt - - info on the SystemV/V7/Xenix/Coherent filesystem. -tmpfs.txt - - info on tmpfs, a filesystem that holds all files in virtual memory. -ubifs.txt - - info on the Unsorted Block Images FileSystem. -udf.txt - - info and mount options for the UDF filesystem. -ufs.txt - - info on the ufs filesystem. -vfat.txt - - info on using the VFAT filesystem used in Windows NT and Windows 95. -vfs.txt - - overview of the Virtual File System. -xfs-delayed-logging-design.txt - - info on the XFS Delayed Logging Design. -xfs-self-describing-metadata.txt - - info on XFS Self Describing Metadata. -xfs.txt - - info and mount options for the XFS filesystem. diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt index 70cb68bed2e8..bc393e0a22b8 100644 --- a/Documentation/filesystems/dax.txt +++ b/Documentation/filesystems/dax.txt @@ -75,7 +75,7 @@ exposure of uninitialized data through mmap. These filesystems may be used for inspiration: - ext2: see Documentation/filesystems/ext2.txt -- ext4: see Documentation/filesystems/ext4.txt +- ext4: see Documentation/filesystems/ext4/ext4.rst - xfs: see Documentation/filesystems/xfs.txt diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt index 81c0becab225..a45c9fc0747b 100644 --- a/Documentation/filesystems/ext2.txt +++ b/Documentation/filesystems/ext2.txt @@ -358,7 +358,7 @@ and are copied into the filesystem. If a transaction is incomplete at the time of the crash, then there is no guarantee of consistency for the blocks in that transaction so they are discarded (which means any filesystem changes they represent are also lost). -Check Documentation/filesystems/ext4.txt if you want to read more about +Check Documentation/filesystems/ext4/ext4.rst if you want to read more about ext4 and journaling. References diff --git a/Documentation/filesystems/ext4/ondisk/about.rst b/Documentation/filesystems/ext4/about.rst index 0aadba052264..0aadba052264 100644 --- a/Documentation/filesystems/ext4/ondisk/about.rst +++ b/Documentation/filesystems/ext4/about.rst diff --git a/Documentation/filesystems/ext4/ondisk/allocators.rst b/Documentation/filesystems/ext4/allocators.rst index 7aa85152ace3..7aa85152ace3 100644 --- a/Documentation/filesystems/ext4/ondisk/allocators.rst +++ b/Documentation/filesystems/ext4/allocators.rst diff --git a/Documentation/filesystems/ext4/ondisk/attributes.rst b/Documentation/filesystems/ext4/attributes.rst index 0b01b67b81fe..54386a010a8d 100644 --- a/Documentation/filesystems/ext4/ondisk/attributes.rst +++ b/Documentation/filesystems/ext4/attributes.rst @@ -30,7 +30,7 @@ Extended attributes, when stored after the inode, have a header ``ext4_xattr_ibody_header`` that is 4 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -47,7 +47,7 @@ The beginning of an extended attribute block is in ``struct ext4_xattr_header``, which is 32 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -92,7 +92,7 @@ entries must be stored in sorted order. The sort order is Attributes stored inside an inode do not need be stored in sorted order. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -157,7 +157,7 @@ attribute name index field is set, and matching string is removed from the key name. Here is a map of name index values to key prefixes: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Name Index diff --git a/Documentation/filesystems/ext4/ondisk/bigalloc.rst b/Documentation/filesystems/ext4/bigalloc.rst index c6d88557553c..c6d88557553c 100644 --- a/Documentation/filesystems/ext4/ondisk/bigalloc.rst +++ b/Documentation/filesystems/ext4/bigalloc.rst diff --git a/Documentation/filesystems/ext4/ondisk/bitmaps.rst b/Documentation/filesystems/ext4/bitmaps.rst index c7546dbc197a..c7546dbc197a 100644 --- a/Documentation/filesystems/ext4/ondisk/bitmaps.rst +++ b/Documentation/filesystems/ext4/bitmaps.rst diff --git a/Documentation/filesystems/ext4/ondisk/blockgroup.rst b/Documentation/filesystems/ext4/blockgroup.rst index baf888e4c06a..baf888e4c06a 100644 --- a/Documentation/filesystems/ext4/ondisk/blockgroup.rst +++ b/Documentation/filesystems/ext4/blockgroup.rst diff --git a/Documentation/filesystems/ext4/ondisk/blockmap.rst b/Documentation/filesystems/ext4/blockmap.rst index 30e25750d88a..30e25750d88a 100644 --- a/Documentation/filesystems/ext4/ondisk/blockmap.rst +++ b/Documentation/filesystems/ext4/blockmap.rst diff --git a/Documentation/filesystems/ext4/ondisk/blocks.rst b/Documentation/filesystems/ext4/blocks.rst index 73d4dc0f7bda..73d4dc0f7bda 100644 --- a/Documentation/filesystems/ext4/ondisk/blocks.rst +++ b/Documentation/filesystems/ext4/blocks.rst diff --git a/Documentation/filesystems/ext4/ondisk/checksums.rst b/Documentation/filesystems/ext4/checksums.rst index 9d6a793b2e03..5519e253810d 100644 --- a/Documentation/filesystems/ext4/ondisk/checksums.rst +++ b/Documentation/filesystems/ext4/checksums.rst @@ -28,7 +28,7 @@ of checksum. The checksum function is whatever the superblock describes (crc32c as of October 2013) unless noted otherwise. .. list-table:: - :widths: 1 1 4 + :widths: 20 8 50 :header-rows: 1 * - Metadata diff --git a/Documentation/filesystems/ext4/ondisk/directory.rst b/Documentation/filesystems/ext4/directory.rst index 8fcba68c2884..614034e24669 100644 --- a/Documentation/filesystems/ext4/ondisk/directory.rst +++ b/Documentation/filesystems/ext4/directory.rst @@ -34,7 +34,7 @@ is at most 263 bytes long, though on disk you'll need to reference ``dirent.rec_len`` to know for sure. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -66,7 +66,7 @@ tree traversal. This format is ``ext4_dir_entry_2``, which is at most ``dirent.rec_len`` to know for sure. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -99,7 +99,7 @@ tree traversal. This format is ``ext4_dir_entry_2``, which is at most The directory file type is one of the following values: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -130,7 +130,7 @@ in the place where the name normally goes. The structure is ``struct ext4_dir_entry_tail``: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -212,7 +212,7 @@ The root of the htree is in ``struct dx_root``, which is the full length of a data block: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -305,7 +305,7 @@ of a data block: The directory hash is one of the following values: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -327,7 +327,7 @@ Interior nodes of an htree are recorded as ``struct dx_node``, which is also the full length of a data block: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -375,7 +375,7 @@ The hash maps that exist in both ``struct dx_root`` and long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -405,7 +405,7 @@ directory index (which will ensure that there's space for the checksum. The dx\_tail structure is 8 bytes long and looks like this: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/ondisk/dynamic.rst b/Documentation/filesystems/ext4/dynamic.rst index bb0c84333341..bb0c84333341 100644 --- a/Documentation/filesystems/ext4/ondisk/dynamic.rst +++ b/Documentation/filesystems/ext4/dynamic.rst diff --git a/Documentation/filesystems/ext4/ondisk/eainode.rst b/Documentation/filesystems/ext4/eainode.rst index ecc0d01a0a72..ecc0d01a0a72 100644 --- a/Documentation/filesystems/ext4/ondisk/eainode.rst +++ b/Documentation/filesystems/ext4/eainode.rst diff --git a/Documentation/filesystems/ext4/ext4.rst b/Documentation/filesystems/ext4/ext4.rst deleted file mode 100644 index 9d4368d591fa..000000000000 --- a/Documentation/filesystems/ext4/ext4.rst +++ /dev/null @@ -1,613 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -======================== -General Information -======================== - -Ext4 is an advanced level of the ext3 filesystem which incorporates -scalability and reliability enhancements for supporting large filesystems -(64 bit) in keeping with increasing disk capacities and state-of-the-art -feature requirements. - -Mailing list: linux-ext4@vger.kernel.org -Web site: http://ext4.wiki.kernel.org - - -Quick usage instructions -======================== - -Note: More extensive information for getting started with ext4 can be -found at the ext4 wiki site at the URL: -http://ext4.wiki.kernel.org/index.php/Ext4_Howto - - - The latest version of e2fsprogs can be found at: - - https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ - - or - - http://sourceforge.net/project/showfiles.php?group_id=2406 - - or grab the latest git repository from: - - https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git - - - Create a new filesystem using the ext4 filesystem type: - - # mke2fs -t ext4 /dev/hda1 - - Or to configure an existing ext3 filesystem to support extents: - - # tune2fs -O extents /dev/hda1 - - If the filesystem was created with 128 byte inodes, it can be - converted to use 256 byte for greater efficiency via: - - # tune2fs -I 256 /dev/hda1 - - - Mounting: - - # mount -t ext4 /dev/hda1 /wherever - - - When comparing performance with other filesystems, it's always - important to try multiple workloads; very often a subtle change in a - workload parameter can completely change the ranking of which - filesystems do well compared to others. When comparing versus ext3, - note that ext4 enables write barriers by default, while ext3 does - not enable write barriers by default. So it is useful to use - explicitly specify whether barriers are enabled or not when via the - '-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems - for a fair comparison. When tuning ext3 for best benchmark numbers, - it is often worthwhile to try changing the data journaling mode; '-o - data=writeback' can be faster for some workloads. (Note however that - running mounted with data=writeback can potentially leave stale data - exposed in recently written files in case of an unclean shutdown, - which could be a security exposure in some situations.) Configuring - the filesystem with a large journal can also be helpful for - metadata-intensive workloads. - -Features -======== - -Currently Available -------------------- - -* ability to use filesystems > 16TB (e2fsprogs support not available yet) -* extent format reduces metadata overhead (RAM, IO for access, transactions) -* extent format more robust in face of on-disk corruption due to magics, -* internal redundancy in tree -* improved file allocation (multi-block alloc) -* lift 32000 subdirectory limit imposed by i_links_count[1] -* nsec timestamps for mtime, atime, ctime, create time -* inode version field on disk (NFSv4, Lustre) -* reduced e2fsck time via uninit_bg feature -* journal checksumming for robustness, performance -* persistent file preallocation (e.g for streaming media, databases) -* ability to pack bitmaps and inode tables into larger virtual groups via the - flex_bg feature -* large file support -* inode allocation using large virtual block groups via flex_bg -* delayed allocation -* large block (up to pagesize) support -* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force - the ordering) - -[1] Filesystems with a block size of 1k may see a limit imposed by the -directory hash tree having a maximum depth of two. - -Options -======= - -When mounting an ext4 filesystem, the following option are accepted: -(*) == default - -======================= ======================================================= -Mount Option Description -======================= ======================================================= -ro Mount filesystem read only. Note that ext4 will - replay the journal (and thus write to the - partition) even when mounted "read only". The - mount options "ro,noload" can be used to prevent - writes to the filesystem. - -journal_checksum Enable checksumming of the journal transactions. - This will allow the recovery code in e2fsck and the - kernel to detect corruption in the kernel. It is a - compatible change and will be ignored by older kernels. - -journal_async_commit Commit block can be written to disk without waiting - for descriptor blocks. If enabled older kernels cannot - mount the device. This will enable 'journal_checksum' - internally. - -journal_path=path -journal_dev=devnum When the external journal device's major/minor numbers - have changed, these options allow the user to specify - the new journal location. The journal device is - identified through either its new major/minor numbers - encoded in devnum, or via a path to the device. - -norecovery Don't load the journal on mounting. Note that -noload if the filesystem was not unmounted cleanly, - skipping the journal replay will lead to the - filesystem containing inconsistencies that can - lead to any number of problems. - -data=journal All data are committed into the journal prior to being - written into the main file system. Enabling - this mode will disable delayed allocation and - O_DIRECT support. - -data=ordered (*) All data are forced directly out to the main file - system prior to its metadata being committed to the - journal. - -data=writeback Data ordering is not preserved, data may be written - into the main file system after its metadata has been - committed to the journal. - -commit=nrsec (*) Ext4 can be told to sync all its data and metadata - every 'nrsec' seconds. The default value is 5 seconds. - This means that if you lose your power, you will lose - as much as the latest 5 seconds of work (your - filesystem will not be damaged though, thanks to the - journaling). This default value (or any low value) - will hurt performance, but it's good for data-safety. - Setting it to 0 will have the same effect as leaving - it at the default (5 seconds). - Setting it to very large values will improve - performance. - -barrier=<0|1(*)> This enables/disables the use of write barriers in -barrier(*) the jbd code. barrier=0 disables, barrier=1 enables. -nobarrier This also requires an IO stack which can support - barriers, and if jbd gets an error on a barrier - write, it will disable again with a warning. - Write barriers enforce proper on-disk ordering - of journal commits, making volatile disk write caches - safe to use, at some performance penalty. If - your disks are battery-backed in one way or another, - disabling barriers may safely improve performance. - The mount options "barrier" and "nobarrier" can - also be used to enable or disable barriers, for - consistency with other ext4 mount options. - -inode_readahead_blks=n This tuning parameter controls the maximum - number of inode table blocks that ext4's inode - table readahead algorithm will pre-read into - the buffer cache. The default value is 32 blocks. - -nouser_xattr Disables Extended User Attributes. See the - attr(5) manual page for more information about - extended attributes. - -noacl This option disables POSIX Access Control List - support. If ACL support is enabled in the kernel - configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL is - enabled by default on mount. See the acl(5) manual - page for more information about acl. - -bsddf (*) Make 'df' act like BSD. -minixdf Make 'df' act like Minix. - -debug Extra debugging information is sent to syslog. - -abort Simulate the effects of calling ext4_abort() for - debugging purposes. This is normally used while - remounting a filesystem which is already mounted. - -errors=remount-ro Remount the filesystem read-only on an error. -errors=continue Keep going on a filesystem error. -errors=panic Panic and halt the machine if an error occurs. - (These mount options override the errors behavior - specified in the superblock, which can be configured - using tune2fs) - -data_err=ignore(*) Just print an error message if an error occurs - in a file data buffer in ordered mode. -data_err=abort Abort the journal if an error occurs in a file - data buffer in ordered mode. - -grpid New objects have the group ID of their parent. -bsdgroups - -nogrpid (*) New objects have the group ID of their creator. -sysvgroups - -resgid=n The group ID which may use the reserved blocks. - -resuid=n The user ID which may use the reserved blocks. - -sb=n Use alternate superblock at this location. - -quota These options are ignored by the filesystem. They -noquota are used only by quota tools to recognize volumes -grpquota where quota should be turned on. See documentation -usrquota in the quota-tools package for more details - (http://sourceforge.net/projects/linuxquota). - -jqfmt=<quota type> These options tell filesystem details about quota -usrjquota=<file> so that quota information can be properly updated -grpjquota=<file> during journal replay. They replace the above - quota options. See documentation in the quota-tools - package for more details - (http://sourceforge.net/projects/linuxquota). - -stripe=n Number of filesystem blocks that mballoc will try - to use for allocation size and alignment. For RAID5/6 - systems this should be the number of data - disks * RAID chunk size in file system blocks. - -delalloc (*) Defer block allocation until just before ext4 - writes out the block(s) in question. This - allows ext4 to better allocation decisions - more efficiently. -nodelalloc Disable delayed allocation. Blocks are allocated - when the data is copied from userspace to the - page cache, either via the write(2) system call - or when an mmap'ed page which was previously - unallocated is written for the first time. - -max_batch_time=usec Maximum amount of time ext4 should wait for - additional filesystem operations to be batch - together with a synchronous write operation. - Since a synchronous write operation is going to - force a commit and then a wait for the I/O - complete, it doesn't cost much, and can be a - huge throughput win, we wait for a small amount - of time to see if any other transactions can - piggyback on the synchronous write. The - algorithm used is designed to automatically tune - for the speed of the disk, by measuring the - amount of time (on average) that it takes to - finish committing a transaction. Call this time - the "commit time". If the time that the - transaction has been running is less than the - commit time, ext4 will try sleeping for the - commit time to see if other operations will join - the transaction. The commit time is capped by - the max_batch_time, which defaults to 15000us - (15ms). This optimization can be turned off - entirely by setting max_batch_time to 0. - -min_batch_time=usec This parameter sets the commit time (as - described above) to be at least min_batch_time. - It defaults to zero microseconds. Increasing - this parameter may improve the throughput of - multi-threaded, synchronous workloads on very - fast disks, at the cost of increasing latency. - -journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the - highest priority) which should be used for I/O - operations submitted by kjournald2 during a - commit operation. This defaults to 3, which is - a slightly higher priority than the default I/O - priority. - -auto_da_alloc(*) Many broken applications don't use fsync() when -noauto_da_alloc replacing existing files via patterns such as - fd = open("foo.new")/write(fd,..)/close(fd)/ - rename("foo.new", "foo"), or worse yet, - fd = open("foo", O_TRUNC)/write(fd,..)/close(fd). - If auto_da_alloc is enabled, ext4 will detect - the replace-via-rename and replace-via-truncate - patterns and force that any delayed allocation - blocks are allocated such that at the next - journal commit, in the default data=ordered - mode, the data blocks of the new file are forced - to disk before the rename() operation is - committed. This provides roughly the same level - of guarantees as ext3, and avoids the - "zero-length" problem that can happen when a - system crashes before the delayed allocation - blocks are forced to disk. - -noinit_itable Do not initialize any uninitialized inode table - blocks in the background. This feature may be - used by installation CD's so that the install - process can complete as quickly as possible; the - inode table initialization process would then be - deferred until the next time the file system - is unmounted. - -init_itable=n The lazy itable init code will wait n times the - number of milliseconds it took to zero out the - previous block group's inode table. This - minimizes the impact on the system performance - while file system's inode table is being initialized. - -discard Controls whether ext4 should issue discard/TRIM -nodiscard(*) commands to the underlying block device when - blocks are freed. This is useful for SSD devices - and sparse/thinly-provisioned LUNs, but it is off - by default until sufficient testing has been done. - -nouid32 Disables 32-bit UIDs and GIDs. This is for - interoperability with older kernels which only - store and expect 16-bit values. - -block_validity(*) These options enable or disable the in-kernel -noblock_validity facility for tracking filesystem metadata blocks - within internal data structures. This allows multi- - block allocator and other routines to notice - bugs or corrupted allocation bitmaps which cause - blocks to be allocated which overlap with - filesystem metadata blocks. - -dioread_lock Controls whether or not ext4 should use the DIO read -dioread_nolock locking. If the dioread_nolock option is specified - ext4 will allocate uninitialized extent before buffer - write and convert the extent to initialized after IO - completes. This approach allows ext4 code to avoid - using inode mutex, which improves scalability on high - speed storages. However this does not work with - data journaling and dioread_nolock option will be - ignored with kernel warning. Note that dioread_nolock - code path is only used for extent-based files. - Because of the restrictions this options comprises - it is off by default (e.g. dioread_lock). - -max_dir_size_kb=n This limits the size of directories so that any - attempt to expand them beyond the specified - limit in kilobytes will cause an ENOSPC error. - This is useful in memory constrained - environments, where a very large directory can - cause severe performance problems or even - provoke the Out Of Memory killer. (For example, - if there is only 512mb memory available, a 176mb - directory may seriously cramp the system's style.) - -i_version Enable 64-bit inode version support. This option is - off by default. - -dax Use direct access (no page cache). See - Documentation/filesystems/dax.txt. Note that - this option is incompatible with data=journal. -======================= ======================================================= - -Data Mode -========= -There are 3 different data modes: - -* writeback mode - - In data=writeback mode, ext4 does not journal data at all. This mode provides - a similar level of journaling as that of XFS, JFS, and ReiserFS in its default - mode - metadata journaling. A crash+recovery can cause incorrect data to - appear in files which were written shortly before the crash. This mode will - typically provide the best ext4 performance. - -* ordered mode - - In data=ordered mode, ext4 only officially journals metadata, but it logically - groups metadata information related to data changes with the data blocks into - a single unit called a transaction. When it's time to write the new metadata - out to disk, the associated data blocks are written first. In general, this - mode performs slightly slower than writeback but significantly faster than - journal mode. - -* journal mode - - data=journal mode provides full data and metadata journaling. All new data is - written to the journal first, and then to its final location. In the event of - a crash, the journal can be replayed, bringing both data and metadata into a - consistent state. This mode is the slowest except when data needs to be read - from and written to disk at the same time where it outperforms all others - modes. Enabling this mode will disable delayed allocation and O_DIRECT - support. - -/proc entries -============= - -Information about mounted ext4 file systems can be found in -/proc/fs/ext4. Each mounted filesystem will have a directory in -/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or -/proc/fs/ext4/dm-0). The files in each per-device directory are shown -in table below. - -Files in /proc/fs/ext4/<devname> - -================ ======= - File Content -================ ======= - mb_groups details of multiblock allocator buddy cache of free blocks -================ ======= - -/sys entries -============ - -Information about mounted ext4 file systems can be found in -/sys/fs/ext4. Each mounted filesystem will have a directory in -/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or -/sys/fs/ext4/dm-0). The files in each per-device directory are shown -in table below. - -Files in /sys/fs/ext4/<devname>: - -(see also Documentation/ABI/testing/sysfs-fs-ext4) - -============================= ================================================= -File Content -============================= ================================================= - delayed_allocation_blocks This file is read-only and shows the number of - blocks that are dirty in the page cache, but - which do not have their location in the - filesystem allocated yet. - -inode_goal Tuning parameter which (if non-zero) controls - the goal inode used by the inode allocator in - preference to all other allocation heuristics. - This is intended for debugging use only, and - should be 0 on production systems. - -inode_readahead_blks Tuning parameter which controls the maximum - number of inode table blocks that ext4's inode - table readahead algorithm will pre-read into - the buffer cache - -lifetime_write_kbytes This file is read-only and shows the number of - kilobytes of data that have been written to this - filesystem since it was created. - - max_writeback_mb_bump The maximum number of megabytes the writeback - code will try to write out before move on to - another inode. - - mb_group_prealloc The multiblock allocator will round up allocation - requests to a multiple of this tuning parameter if - the stripe size is not set in the ext4 superblock - - mb_max_to_scan The maximum number of extents the multiblock - allocator will search to find the best extent - - mb_min_to_scan The minimum number of extents the multiblock - allocator will search to find the best extent - - mb_order2_req Tuning parameter which controls the minimum size - for requests (as a power of 2) where the buddy - cache is used - - mb_stats Controls whether the multiblock allocator should - collect statistics, which are shown during the - unmount. 1 means to collect statistics, 0 means - not to collect statistics - - mb_stream_req Files which have fewer blocks than this tunable - parameter will have their blocks allocated out - of a block group specific preallocation pool, so - that small files are packed closely together. - Each large file will have its blocks allocated - out of its own unique preallocation pool. - - session_write_kbytes This file is read-only and shows the number of - kilobytes of data that have been written to this - filesystem since it was mounted. - - reserved_clusters This is RW file and contains number of reserved - clusters in the file system which will be used - in the specific situations to avoid costly - zeroout, unexpected ENOSPC, or possible data - loss. The default is 2% or 4096 clusters, - whichever is smaller and this can be changed - however it can never exceed number of clusters - in the file system. If there is not enough space - for the reserved space when mounting the file - mount will _not_ fail. -============================= ================================================= - -Ioctls -====== - -There is some Ext4 specific functionality which can be accessed by applications -through the system call interfaces. The list of all Ext4 specific ioctls are -shown in the table below. - -Table of Ext4 specific ioctls - -============================= ================================================= -Ioctl Description -============================= ================================================= - EXT4_IOC_GETFLAGS Get additional attributes associated with inode. - The ioctl argument is an integer bitfield, with - bit values described in ext4.h. This ioctl is an - alias for FS_IOC_GETFLAGS. - - EXT4_IOC_SETFLAGS Set additional attributes associated with inode. - The ioctl argument is an integer bitfield, with - bit values described in ext4.h. This ioctl is an - alias for FS_IOC_SETFLAGS. - - EXT4_IOC_GETVERSION - EXT4_IOC_GETVERSION_OLD - Get the inode i_generation number stored for - each inode. The i_generation number is normally - changed only when new inode is created and it is - particularly useful for network filesystems. The - '_OLD' version of this ioctl is an alias for - FS_IOC_GETVERSION. - - EXT4_IOC_SETVERSION - EXT4_IOC_SETVERSION_OLD - Set the inode i_generation number stored for - each inode. The '_OLD' version of this ioctl - is an alias for FS_IOC_SETVERSION. - - EXT4_IOC_GROUP_EXTEND This ioctl has the same purpose as the resize - mount option. It allows to resize filesystem - to the end of the last existing block group, - further resize has to be done with resize2fs, - either online, or offline. The argument points - to the unsigned logn number representing the - filesystem new block count. - - EXT4_IOC_MOVE_EXT Move the block extents from orig_fd (the one - this ioctl is pointing to) to the donor_fd (the - one specified in move_extent structure passed - as an argument to this ioctl). Then, exchange - inode metadata between orig_fd and donor_fd. - This is especially useful for online - defragmentation, because the allocator has the - opportunity to allocate moved blocks better, - ideally into one contiguous extent. - - EXT4_IOC_GROUP_ADD Add a new group descriptor to an existing or - new group descriptor block. The new group - descriptor is described by ext4_new_group_input - structure, which is passed as an argument to - this ioctl. This is especially useful in - conjunction with EXT4_IOC_GROUP_EXTEND, - which allows online resize of the filesystem - to the end of the last existing block group. - Those two ioctls combined is used in userspace - online resize tool (e.g. resize2fs). - - EXT4_IOC_MIGRATE This ioctl operates on the filesystem itself. - It converts (migrates) ext3 indirect block mapped - inode to ext4 extent mapped inode by walking - through indirect block mapping of the original - inode and converting contiguous block ranges - into ext4 extents of the temporary inode. Then, - inodes are swapped. This ioctl might help, when - migrating from ext3 to ext4 filesystem, however - suggestion is to create fresh ext4 filesystem - and copy data from the backup. Note, that - filesystem has to support extents for this ioctl - to work. - - EXT4_IOC_ALLOC_DA_BLKS Force all of the delay allocated blocks to be - allocated to preserve application-expected ext3 - behaviour. Note that this will also start - triggering a write of the data blocks, but this - behaviour may change in the future as it is - not necessary and has been done this way only - for sake of simplicity. - - EXT4_IOC_RESIZE_FS Resize the filesystem to a new size. The number - of blocks of resized filesystem is passed in via - 64 bit integer argument. The kernel allocates - bitmaps and inode table, the userspace tool thus - just passes the new number of blocks. - - EXT4_IOC_SWAP_BOOT Swap i_blocks and associated attributes - (like i_blocks, i_size, i_flags, ...) from - the specified inode with inode - EXT4_BOOT_LOADER_INO (#5). This is typically - used to store a boot loader in a secure part of - the filesystem, where it can't be changed by a - normal user by accident. - The data blocks of the previous boot loader - will be associated with the given inode. -============================= ================================================= - -References -========== - -kernel source: <file:fs/ext4/> - <file:fs/jbd2/> - -programs: http://e2fsprogs.sourceforge.net/ - -useful links: http://fedoraproject.org/wiki/ext3-devel - http://www.bullopensource.org/ext4/ - http://ext4.wiki.kernel.org/index.php/Main_Page - http://fedoraproject.org/wiki/Features/Ext4 diff --git a/Documentation/filesystems/ext4/ondisk/globals.rst b/Documentation/filesystems/ext4/globals.rst index 368bf7662b96..368bf7662b96 100644 --- a/Documentation/filesystems/ext4/ondisk/globals.rst +++ b/Documentation/filesystems/ext4/globals.rst diff --git a/Documentation/filesystems/ext4/ondisk/group_descr.rst b/Documentation/filesystems/ext4/group_descr.rst index 759827e5d2cf..0f783ed88592 100644 --- a/Documentation/filesystems/ext4/ondisk/group_descr.rst +++ b/Documentation/filesystems/ext4/group_descr.rst @@ -43,7 +43,7 @@ entire bitmap. The block group descriptor is laid out in ``struct ext4_group_desc``. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -157,7 +157,7 @@ The block group descriptor is laid out in ``struct ext4_group_desc``. Block group flags can be any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value diff --git a/Documentation/filesystems/ext4/ondisk/ifork.rst b/Documentation/filesystems/ext4/ifork.rst index 5dbe3b2b121a..b9816d5a896b 100644 --- a/Documentation/filesystems/ext4/ondisk/ifork.rst +++ b/Documentation/filesystems/ext4/ifork.rst @@ -68,7 +68,7 @@ The extent tree header is recorded in ``struct ext4_extent_header``, which is 12 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -104,7 +104,7 @@ Internal nodes of the extent tree, also known as index nodes, are recorded as ``struct ext4_extent_idx``, and are 12 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -134,7 +134,7 @@ Leaf nodes of the extent tree are recorded as ``struct ext4_extent``, and are also 12 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -174,7 +174,7 @@ including) the checksum itself. ``struct ext4_extent_tail`` is 4 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/index.rst b/Documentation/filesystems/ext4/index.rst index 71121605558c..3be3e54d480d 100644 --- a/Documentation/filesystems/ext4/index.rst +++ b/Documentation/filesystems/ext4/index.rst @@ -1,17 +1,14 @@ .. SPDX-License-Identifier: GPL-2.0 -=============== -ext4 Filesystem -=============== - -General usage and on-disk artifacts writen by ext4. More documentation may -be ported from the wiki as time permits. This should be considered the -canonical source of information as the details here have been reviewed by -the ext4 community. +=================================== +ext4 Data Structures and Algorithms +=================================== .. toctree:: - :maxdepth: 5 + :maxdepth: 6 :numbered: - ext4 - ondisk/index + about.rst + overview.rst + globals.rst + dynamic.rst diff --git a/Documentation/filesystems/ext4/ondisk/inlinedata.rst b/Documentation/filesystems/ext4/inlinedata.rst index d1075178ce0b..d1075178ce0b 100644 --- a/Documentation/filesystems/ext4/ondisk/inlinedata.rst +++ b/Documentation/filesystems/ext4/inlinedata.rst diff --git a/Documentation/filesystems/ext4/ondisk/inodes.rst b/Documentation/filesystems/ext4/inodes.rst index 655ce898f3f5..6bd35e506b6f 100644 --- a/Documentation/filesystems/ext4/ondisk/inodes.rst +++ b/Documentation/filesystems/ext4/inodes.rst @@ -29,8 +29,9 @@ and the inode structure itself. The inode table entry is laid out in ``struct ext4_inode``. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 + :class: longtable * - Offset - Size @@ -176,7 +177,7 @@ The inode table entry is laid out in ``struct ext4_inode``. The ``i_mode`` value is a combination of the following flags: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -227,7 +228,7 @@ The ``i_mode`` value is a combination of the following flags: The ``i_flags`` field is a combination of these values: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -314,7 +315,7 @@ The ``osd1`` field has multiple meanings depending on the creator: Linux: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -331,7 +332,7 @@ Linux: Hurd: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -346,7 +347,7 @@ Hurd: Masix: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -365,7 +366,7 @@ The ``osd2`` field has multiple meanings depending on the filesystem creator: Linux: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -402,7 +403,7 @@ Linux: Hurd: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -433,7 +434,7 @@ Hurd: Masix: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/ondisk/journal.rst b/Documentation/filesystems/ext4/journal.rst index e7031af86876..ea613ee701f5 100644 --- a/Documentation/filesystems/ext4/ondisk/journal.rst +++ b/Documentation/filesystems/ext4/journal.rst @@ -48,7 +48,7 @@ Layout Generally speaking, the journal has this format: .. list-table:: - :widths: 1 1 78 + :widths: 16 48 16 :header-rows: 1 * - Superblock @@ -76,7 +76,7 @@ The journal superblock will be in the next full block after the superblock. .. list-table:: - :widths: 1 1 1 1 76 + :widths: 12 12 12 32 12 :header-rows: 1 * - 1024 bytes of padding @@ -98,7 +98,7 @@ Every block in the journal starts with a common 12-byte header ``struct journal_header_s``: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -124,7 +124,7 @@ Every block in the journal starts with a common 12-byte header The journal block type can be any one of: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -154,7 +154,7 @@ The journal superblock is recorded as ``struct journal_superblock_s``, which is 1024 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -264,7 +264,7 @@ which is 1024 bytes long: The journal compat features are any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -278,7 +278,7 @@ The journal compat features are any combination of the following: The journal incompat features are any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -306,7 +306,7 @@ Journal checksum type codes are one of the following. crc32 or crc32c are the most likely choices. .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -330,7 +330,7 @@ described by a data structure, but here is the block structure anyway. Descriptor blocks consume at least 36 bytes, but use a full block: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -355,7 +355,7 @@ defined as ``struct journal_block_tag3_s``, which looks like the following. The size is 16 or 32 bytes. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -400,7 +400,7 @@ following. The size is 16 or 32 bytes. The journal tag flags are any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -421,7 +421,7 @@ is defined as ``struct journal_block_tag_s``, which looks like the following. The size is 8, 12, 24, or 28 bytes: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -471,7 +471,7 @@ JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the block is a ``struct jbd2_journal_block_tail``, which looks like this: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -513,7 +513,7 @@ Revocation blocks are described in length, but use a full block: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -543,7 +543,7 @@ JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the revocation block is a ``struct jbd2_journal_revoke_tail``, which has this format: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -567,7 +567,7 @@ The commit block is described by ``struct commit_header``, which is 32 bytes long (but uses a full block): .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/ondisk/mmp.rst b/Documentation/filesystems/ext4/mmp.rst index b7d7a3137f80..25660981d93c 100644 --- a/Documentation/filesystems/ext4/ondisk/mmp.rst +++ b/Documentation/filesystems/ext4/mmp.rst @@ -32,7 +32,7 @@ The checksum is calculated against the FS UUID and the MMP structure. The MMP structure (``struct mmp_struct``) is as follows: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 12 20 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/ondisk/index.rst b/Documentation/filesystems/ext4/ondisk/index.rst deleted file mode 100644 index f7d082c3a435..000000000000 --- a/Documentation/filesystems/ext4/ondisk/index.rst +++ /dev/null @@ -1,9 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -============================== -Data Structures and Algorithms -============================== -.. include:: about.rst -.. include:: overview.rst -.. include:: globals.rst -.. include:: dynamic.rst diff --git a/Documentation/filesystems/ext4/ondisk/overview.rst b/Documentation/filesystems/ext4/overview.rst index cbab18baba12..cbab18baba12 100644 --- a/Documentation/filesystems/ext4/ondisk/overview.rst +++ b/Documentation/filesystems/ext4/overview.rst diff --git a/Documentation/filesystems/ext4/ondisk/special_inodes.rst b/Documentation/filesystems/ext4/special_inodes.rst index a82f70c9baeb..9061aabba827 100644 --- a/Documentation/filesystems/ext4/ondisk/special_inodes.rst +++ b/Documentation/filesystems/ext4/special_inodes.rst @@ -6,7 +6,7 @@ Special inodes ext4 reserves some inode for special features, as follows: .. list-table:: - :widths: 1 79 + :widths: 6 70 :header-rows: 1 * - inode Number diff --git a/Documentation/filesystems/ext4/ondisk/super.rst b/Documentation/filesystems/ext4/super.rst index 5f81dd87e0b9..04ff079a2acf 100644 --- a/Documentation/filesystems/ext4/ondisk/super.rst +++ b/Documentation/filesystems/ext4/super.rst @@ -19,7 +19,7 @@ The ext4 superblock is laid out as follows in ``struct ext4_super_block``: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -483,7 +483,7 @@ The ext4 superblock is laid out as follows in The superblock state is some combination of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -500,7 +500,7 @@ The superblock state is some combination of the following: The superblock error policy is one of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -517,7 +517,7 @@ The superblock error policy is one of the following: The filesystem creator is one of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -538,7 +538,7 @@ The filesystem creator is one of the following: The superblock revision is one of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -556,7 +556,7 @@ The superblock compatible features field is a combination of any of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -595,7 +595,7 @@ The superblock incompatible features field is a combination of any of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -647,7 +647,7 @@ The superblock read-only compatible features field is a combination of any of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -702,7 +702,7 @@ the following: The ``s_def_hash_version`` field is one of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -725,7 +725,7 @@ The ``s_def_hash_version`` field is one of the following: The ``s_default_mount_opts`` field is any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -767,7 +767,7 @@ The ``s_default_mount_opts`` field is any combination of the following: The ``s_flags`` field is any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -784,7 +784,7 @@ The ``s_flags`` field is any combination of the following: The ``s_encrypt_algos`` list can contain any of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index e5edd29687b5..e46c2147ddf8 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -172,9 +172,10 @@ fault_type=%d Support configuring fault injection type, should be FAULT_DIR_DEPTH 0x000000100 FAULT_EVICT_INODE 0x000000200 FAULT_TRUNCATE 0x000000400 - FAULT_IO 0x000000800 + FAULT_READ_IO 0x000000800 FAULT_CHECKPOINT 0x000001000 FAULT_DISCARD 0x000002000 + FAULT_WRITE_IO 0x000004000 mode=%s Control block allocation mode which supports "adaptive" and "lfs". In "lfs" mode, there should be no random writes towards main area. @@ -211,6 +212,11 @@ fsync_mode=%s Control the policy of fsync. Currently supports "posix", non-atomic files likewise "nobarrier" mount option. test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt context. The fake fscrypt context is used by xfstests. +checkpoint=%s Set to "disable" to turn off checkpointing. Set to "enable" + to reenable checkpointing. Is enabled by default. While + disabled, any unmounting or unexpected shutdowns will cause + the filesystem contents to appear as they did when the + filesystem was mounted with that option. ================================================================================ DEBUGFS ENTRIES diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index 48b424de85bb..cfbc18f0d9c9 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -191,21 +191,11 @@ Currently, the following pairs of encryption modes are supported: - AES-256-XTS for contents and AES-256-CTS-CBC for filenames - AES-128-CBC for contents and AES-128-CTS-CBC for filenames -- Speck128/256-XTS for contents and Speck128/256-CTS-CBC for filenames It is strongly recommended to use AES-256-XTS for contents encryption. AES-128-CBC was added only for low-powered embedded devices with crypto accelerators such as CAAM or CESA that do not support XTS. -Similarly, Speck128/256 support was only added for older or low-end -CPUs which cannot do AES fast enough -- especially ARM CPUs which have -NEON instructions but not the Cryptography Extensions -- and for which -it would not otherwise be feasible to use encryption at all. It is -not recommended to use Speck on CPUs that have AES instructions. -Speck support is only available if it has been enabled in the crypto -API via CONFIG_CRYPTO_SPECK. Also, on ARM platforms, to get -acceptable performance CONFIG_CRYPTO_SPECK_NEON must be enabled. - New encryption modes can be added relatively easily, without changes to individual filesystems. However, authenticated encryption (AE) modes are not currently supported because of the difficulty of dealing diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX deleted file mode 100644 index 53f3b596ac0d..000000000000 --- a/Documentation/filesystems/nfs/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file (nfs-related documentation). -Exporting - - explanation of how to make filesystems exportable. -fault_injection.txt - - information for using fault injection on the server -knfsd-stats.txt - - statistics which the NFS server makes available to user space. -nfs.txt - - nfs client, and DNS resolution for fs_locations. -nfs41-server.txt - - info on the Linux server implementation of NFSv4 minor version 1. -nfs-rdma.txt - - how to install and setup the Linux NFS/RDMA client and server software -nfsd-admin-interfaces.txt - - Administrative interfaces for nfsd. -nfsroot.txt - - short guide on setting up a diskless box with NFS root filesystem. -pnfs.txt - - short explanation of some of the internals of the pnfs client code -rpc-cache.txt - - introduction to the caching mechanisms in the sunrpc layer. -idmapper.txt - - information for configuring request-keys to be used by idmapper -rpc-server-gss.txt - - Information on GSS authentication support in the NFS Server diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 7b7b845c490a..321d74b73937 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -622,3 +622,14 @@ in your dentry operations instead. alloc_file_clone(file, flags, ops) does not affect any caller's references. On success you get a new struct file sharing the mount/dentry with the original, on failure - ERR_PTR(). +-- +[recommended] + ->lookup() instances doing an equivalent of + if (IS_ERR(inode)) + return ERR_CAST(inode); + return d_splice_alias(inode, dentry); + don't need to bother with the check - d_splice_alias() will do the + right thing when given ERR_PTR(...) as inode. Moreover, passing NULL + inode to d_splice_alias() will also do the right thing (equivalent of + d_add(dentry, NULL); return NULL;), so that kind of special cases + also doesn't need a separate treatment. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 22b4b00dee31..12a5e6e693b6 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -858,6 +858,7 @@ Writeback: 0 kB AnonPages: 861800 kB Mapped: 280372 kB Shmem: 644 kB +KReclaimable: 168048 kB Slab: 284364 kB SReclaimable: 159856 kB SUnreclaim: 124508 kB @@ -925,6 +926,9 @@ AnonHugePages: Non-file backed huge pages mapped into userspace page tables ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated with huge pages ShmemPmdMapped: Shared memory mapped into userspace with huge pages +KReclaimable: Kernel allocations that the kernel will attempt to reclaim + under memory pressure. Includes SReclaimable (below), and other + direct allocations with a shrinker. Slab: in-kernel data structures cache SReclaimable: Part of Slab, that might be reclaimed, such as caches SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure diff --git a/Documentation/fmc/00-INDEX b/Documentation/fmc/00-INDEX deleted file mode 100644 index 431c69570f43..000000000000 --- a/Documentation/fmc/00-INDEX +++ /dev/null @@ -1,38 +0,0 @@ - -Documentation in this directory comes from sections of the manual we -wrote for the externally-developed fmc-bus package. The complete -manual as of today (2013-02) is available in PDF format at -http://www.ohwr.org/projects/fmc-bus/files - -00-INDEX - - this file. - -FMC-and-SDB.txt - - What are FMC and SDB, basic concepts for this framework - -API.txt - - The functions that are exported by the bus driver - -parameters.txt - - The module parameters - -carrier.txt - - writing a carrier (a device) - -mezzanine.txt - - writing code for your mezzanine (a driver) - -identifiers.txt - - how identification and matching works - -fmc-fakedev.txt - - about drivers/fmc/fmc-fakedev.ko - -fmc-trivial.txt - - about drivers/fmc/fmc-trivial.ko - -fmc-write-eeprom.txt - - about drivers/fmc/fmc-write-eeprom.ko - -fmc-chardev.txt - - about drivers/fmc/fmc-chardev.ko diff --git a/Documentation/gpio/00-INDEX b/Documentation/gpio/00-INDEX deleted file mode 100644 index 17e19a68058f..000000000000 --- a/Documentation/gpio/00-INDEX +++ /dev/null @@ -1,4 +0,0 @@ -00-INDEX - - This file -sysfs.txt - - Information about the GPIO sysfs interface diff --git a/Documentation/ide/00-INDEX b/Documentation/ide/00-INDEX deleted file mode 100644 index 22f98ca79539..000000000000 --- a/Documentation/ide/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - this file -ChangeLog.ide-cd.1994-2004 - - ide-cd changelog -ChangeLog.ide-floppy.1996-2002 - - ide-floppy changelog -ChangeLog.ide-tape.1995-2002 - - ide-tape changelog -ide-tape.txt - - info on the IDE ATAPI streaming tape driver -ide.txt - - important info for users of ATA devices (IDE/EIDE disks and CD-ROMS). -warm-plug-howto.txt - - using sysfs to remove and add IDE devices.
\ No newline at end of file diff --git a/Documentation/index.rst b/Documentation/index.rst index 5db7e87c7cb1..c858c2e66e36 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -22,10 +22,7 @@ The following describes the license of the Linux kernel source code (GPLv2), how to properly mark the license of individual files in the source tree, as well as links to the full license text. -.. toctree:: - :maxdepth: 2 - - process/license-rules.rst +* :ref:`kernel_licensing` User-oriented documentation --------------------------- diff --git a/Documentation/input/event-codes.rst b/Documentation/input/event-codes.rst index a8c0873beb95..cef220c176a4 100644 --- a/Documentation/input/event-codes.rst +++ b/Documentation/input/event-codes.rst @@ -190,7 +190,16 @@ A few EV_REL codes have special meanings: * REL_WHEEL, REL_HWHEEL: - These codes are used for vertical and horizontal scroll wheels, - respectively. + respectively. The value is the number of "notches" moved on the wheel, the + physical size of which varies by device. For high-resolution wheels (which + report multiple events for each notch of movement, or do not have notches) + this may be an approximation based on the high-resolution scroll events. + +* REL_WHEEL_HI_RES: + + - If a vertical scroll wheel supports high-resolution scrolling, this code + will be emitted in addition to REL_WHEEL. The value is the (approximate) + distance travelled by the user's finger, in microns. EV_ABS ------ diff --git a/Documentation/ioctl/00-INDEX b/Documentation/ioctl/00-INDEX deleted file mode 100644 index c1a925787950..000000000000 --- a/Documentation/ioctl/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -00-INDEX - - this file -botching-up-ioctls.txt - - how to avoid botching up ioctls -cdrom.txt - - summary of CDROM ioctl calls -hdio.txt - - summary of HDIO_ ioctl calls -ioctl-decoding.txt - - how to decode the bits of an IOCTL code -ioctl-number.txt - - how to implement and register device/driver ioctl calls diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 13a7c999c04a..d05d93761653 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -201,7 +201,7 @@ Code Seq#(hex) Include File Comments 'X' 01 linux/pktcdvd.h conflict! 'Y' all linux/cyclades.h 'Z' 14-15 drivers/message/fusion/mptctl.h -'[' 00-07 linux/usb/tmc.h USB Test and Measurement Devices +'[' 00-3F linux/usb/tmc.h USB Test and Measurement Devices <mailto:gregkh@linuxfoundation.org> 'a' all linux/atm*.h, linux/sonet.h ATM on linux <http://lrcwww.epfl.ch/> diff --git a/Documentation/isdn/00-INDEX b/Documentation/isdn/00-INDEX deleted file mode 100644 index 2d1889b6c1fa..000000000000 --- a/Documentation/isdn/00-INDEX +++ /dev/null @@ -1,42 +0,0 @@ -00-INDEX - - this file (info on ISDN implementation for Linux) -CREDITS - - list of the kind folks that brought you this stuff. -HiSax.cert - - information about the ITU approval certification of the HiSax driver. -INTERFACE - - description of isdn4linux Link Level and Hardware Level interfaces. -INTERFACE.fax - - description of the fax subinterface of isdn4linux. -INTERFACE.CAPI - - description of kernel CAPI Link Level to Hardware Level interface. -README - - general info on what you need and what to do for Linux ISDN. -README.FAQ - - general info for FAQ. -README.HiSax - - info on the HiSax driver which replaces the old teles. -README.audio - - info for running audio over ISDN. -README.avmb1 - - info on driver for AVM-B1 ISDN card. -README.concap - - info on "CONCAP" encapsulation protocol interface used for X.25. -README.diversion - - info on module for isdn diversion services. -README.fax - - info for using Fax over ISDN. -README.gigaset - - info on the drivers for Siemens Gigaset ISDN adapters -README.hfc-pci - - info on hfc-pci based cards. -README.hysdn - - info on driver for Hypercope active HYSDN cards -README.mISDN - - info on the Modular ISDN subsystem (mISDN) -README.syncppp - - info on running Sync PPP over ISDN. -README.x25 - - info for running X.25 over ISDN. -syncPPP.FAQ - - frequently asked questions about running PPP over ISDN. diff --git a/Documentation/kbuild/00-INDEX b/Documentation/kbuild/00-INDEX deleted file mode 100644 index 8c5e6aa78004..000000000000 --- a/Documentation/kbuild/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - this file: info on the kernel build process -headers_install.txt - - how to export Linux headers for use by userspace -kbuild.txt - - developer information on kbuild -kconfig.txt - - usage help for make *config -kconfig-language.txt - - specification of Config Language, the language in Kconfig files -makefiles.txt - - developer information for linux kernel makefiles -modules.txt - - how to build modules and to install them diff --git a/Documentation/laptops/00-INDEX b/Documentation/laptops/00-INDEX deleted file mode 100644 index 86169dc766f7..000000000000 --- a/Documentation/laptops/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - This file -asus-laptop.txt - - information on the Asus Laptop Extras driver. -disk-shock-protection.txt - - information on hard disk shock protection. -laptop-mode.txt - - how to conserve battery power using laptop-mode. -sony-laptop.txt - - Sony Notebook Control Driver (SNC) Readme. -sonypi.txt - - info on Linux Sony Programmable I/O Device support. -thinkpad-acpi.txt - - information on the (IBM and Lenovo) ThinkPad ACPI Extras driver. -toshiba_haps.txt - - information on the Toshiba HDD Active Protection Sensor driver. diff --git a/Documentation/leds/00-INDEX b/Documentation/leds/00-INDEX deleted file mode 100644 index ae626b29a740..000000000000 --- a/Documentation/leds/00-INDEX +++ /dev/null @@ -1,32 +0,0 @@ -00-INDEX - - This file -leds-blinkm.txt - - Driver for BlinkM LED-devices. -leds-class.txt - - documents LED handling under Linux. -leds-class-flash.txt - - documents flash LED handling under Linux. -leds-lm3556.txt - - notes on how to use the leds-lm3556 driver. -leds-lp3944.txt - - notes on how to use the leds-lp3944 driver. -leds-lp5521.txt - - notes on how to use the leds-lp5521 driver. -leds-lp5523.txt - - notes on how to use the leds-lp5523 driver. -leds-lp5562.txt - - notes on how to use the leds-lp5562 driver. -leds-lp55xx.txt - - description about lp55xx common driver. -leds-lm3556.txt - - notes on how to use the leds-lm3556 driver. -leds-mlxcpld.txt - - notes on how to use the leds-mlxcpld driver. -ledtrig-oneshot.txt - - One-shot LED trigger for both sporadic and dense events. -ledtrig-transient.txt - - LED Transient Trigger, one shot timer activation. -ledtrig-usbport.txt - - notes on how to use the drivers/usb/core/ledtrig-usbport.c trigger. -uleds.txt - - notes on how to use the uleds driver. diff --git a/Documentation/locking/00-INDEX b/Documentation/locking/00-INDEX deleted file mode 100644 index c256c9bee2a4..000000000000 --- a/Documentation/locking/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -lockdep-design.txt - - documentation on the runtime locking correctness validator. -lockstat.txt - - info on collecting statistics on locks (and contention). -mutex-design.txt - - info on the generic mutex subsystem. -rt-mutex-design.txt - - description of the RealTime mutex implementation design. -rt-mutex.txt - - desc. of RT-mutex subsystem with PI (Priority Inheritance) support. -spinlocks.txt - - info on using spinlocks to provide exclusive access in kernel. -ww-mutex-design.txt - - Intro to Mutex wait/would deadlock handling.s diff --git a/Documentation/m68k/00-INDEX b/Documentation/m68k/00-INDEX deleted file mode 100644 index 2be8c6b00e74..000000000000 --- a/Documentation/m68k/00-INDEX +++ /dev/null @@ -1,7 +0,0 @@ -00-INDEX - - this file -README.buddha - - Amiga Buddha and Catweasel IDE Driver -kernel-options.txt - - command line options for Linux/m68k - diff --git a/Documentation/mips/00-INDEX b/Documentation/mips/00-INDEX deleted file mode 100644 index 8ae9cffc2262..000000000000 --- a/Documentation/mips/00-INDEX +++ /dev/null @@ -1,4 +0,0 @@ -00-INDEX - - this file. -AU1xxx_IDE.README - - README for MIPS AU1XXX IDE driver. diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX deleted file mode 100644 index 4623bc0aa0bb..000000000000 --- a/Documentation/mmc/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - this file -mmc-dev-attrs.txt - - info on SD and MMC device attributes -mmc-dev-parts.txt - - info on SD and MMC device partitions -mmc-async-req.txt - - info on mmc asynchronous requests -mmc-tools.txt - - info on mmc-utils tools diff --git a/Documentation/netlabel/00-INDEX b/Documentation/netlabel/00-INDEX deleted file mode 100644 index 837bf35990e2..000000000000 --- a/Documentation/netlabel/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - this file. -cipso_ipv4.txt - - documentation on the IPv4 CIPSO protocol engine. -draft-ietf-cipso-ipsecurity-01.txt - - IETF draft of the CIPSO protocol, dated 16 July 1992. -introduction.txt - - NetLabel introduction, READ THIS FIRST. -lsm_interface.txt - - documentation on the NetLabel kernel security module API. diff --git a/Documentation/netlabel/cipso_ipv4.txt b/Documentation/netlabel/cipso_ipv4.txt index 93dacb132c3c..a6075481fd60 100644 --- a/Documentation/netlabel/cipso_ipv4.txt +++ b/Documentation/netlabel/cipso_ipv4.txt @@ -6,11 +6,12 @@ May 17, 2006 * Overview -The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial IP -Security Option (CIPSO) draft from July 16, 1992. A copy of this draft can be -found in this directory, consult '00-INDEX' for the filename. While the IETF -draft never made it to an RFC standard it has become a de-facto standard for -labeled networking and is used in many trusted operating systems. +The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial +IP Security Option (CIPSO) draft from July 16, 1992. A copy of this +draft can be found in this directory +(draft-ietf-cipso-ipsecurity-01.txt). While the IETF draft never made +it to an RFC standard it has become a de-facto standard for labeled +networking and is used in many trusted operating systems. * Outbound Packet Processing diff --git a/Documentation/netlabel/introduction.txt b/Documentation/netlabel/introduction.txt index 5ecd8d1dcf4e..3caf77bcff0f 100644 --- a/Documentation/netlabel/introduction.txt +++ b/Documentation/netlabel/introduction.txt @@ -22,7 +22,7 @@ refrain from calling the protocol engines directly, instead they should use the NetLabel kernel security module API described below. Detailed information about each NetLabel protocol engine can be found in this -directory, consult '00-INDEX' for filenames. +directory. * Communication Layer diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX deleted file mode 100644 index 02a323c43261..000000000000 --- a/Documentation/networking/00-INDEX +++ /dev/null @@ -1,234 +0,0 @@ -00-INDEX - - this file -3c509.txt - - information on the 3Com Etherlink III Series Ethernet cards. -6pack.txt - - info on the 6pack protocol, an alternative to KISS for AX.25 -LICENSE.qla3xxx - - GPLv2 for QLogic Linux Networking HBA Driver -LICENSE.qlge - - GPLv2 for QLogic Linux qlge NIC Driver -LICENSE.qlcnic - - GPLv2 for QLogic Linux qlcnic NIC Driver -PLIP.txt - - PLIP: The Parallel Line Internet Protocol device driver -README.ipw2100 - - README for the Intel PRO/Wireless 2100 driver. -README.ipw2200 - - README for the Intel PRO/Wireless 2915ABG and 2200BG driver. -README.sb1000 - - info on General Instrument/NextLevel SURFboard1000 cable modem. -altera_tse.txt - - Altera Triple-Speed Ethernet controller. -arcnet-hardware.txt - - tons of info on ARCnet, hubs, jumper settings for ARCnet cards, etc. -arcnet.txt - - info on the using the ARCnet driver itself. -atm.txt - - info on where to get ATM programs and support for Linux. -ax25.txt - - info on using AX.25 and NET/ROM code for Linux -baycom.txt - - info on the driver for Baycom style amateur radio modems -bonding.txt - - Linux Ethernet Bonding Driver HOWTO: link aggregation in Linux. -bridge.txt - - where to get user space programs for ethernet bridging with Linux. -cdc_mbim.txt - - 3G/LTE USB modem (Mobile Broadband Interface Model) -checksum-offloads.txt - - Explanation of checksum offloads; LCO, RCO -cops.txt - - info on the COPS LocalTalk Linux driver -cs89x0.txt - - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver -cxacru.txt - - Conexant AccessRunner USB ADSL Modem -cxacru-cf.py - - Conexant AccessRunner USB ADSL Modem configuration file parser -cxgb.txt - - Release Notes for the Chelsio N210 Linux device driver. -dccp.txt - - the Datagram Congestion Control Protocol (DCCP) (RFC 4340..42). -dctcp.txt - - DataCenter TCP congestion control -de4x5.txt - - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver -decnet.txt - - info on using the DECnet networking layer in Linux. -dl2k.txt - - README for D-Link DL2000-based Gigabit Ethernet Adapters (dl2k.ko). -dm9000.txt - - README for the Simtec DM9000 Network driver. -dmfe.txt - - info on the Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver. -dns_resolver.txt - - The DNS resolver module allows kernel servies to make DNS queries. -driver.txt - - Softnet driver issues. -ena.txt - - info on Amazon's Elastic Network Adapter (ENA) -e100.txt - - info on Intel's EtherExpress PRO/100 line of 10/100 boards -e1000.txt - - info on Intel's E1000 line of gigabit ethernet boards -e1000e.txt - - README for the Intel Gigabit Ethernet Driver (e1000e). -eql.txt - - serial IP load balancing -fib_trie.txt - - Level Compressed Trie (LC-trie) notes: a structure for routing. -filter.txt - - Linux Socket Filtering -fore200e.txt - - FORE Systems PCA-200E/SBA-200E ATM NIC driver info. -framerelay.txt - - info on using Frame Relay/Data Link Connection Identifier (DLCI). -gen_stats.txt - - Generic networking statistics for netlink users. -generic-hdlc.txt - - The generic High Level Data Link Control (HDLC) layer. -generic_netlink.txt - - info on Generic Netlink -gianfar.txt - - Gianfar Ethernet Driver. -i40e.txt - - README for the Intel Ethernet Controller XL710 Driver (i40e). -i40evf.txt - - Short note on the Driver for the Intel(R) XL710 X710 Virtual Function -ieee802154.txt - - Linux IEEE 802.15.4 implementation, API and drivers -igb.txt - - README for the Intel Gigabit Ethernet Driver (igb). -igbvf.txt - - README for the Intel Gigabit Ethernet Driver (igbvf). -ip-sysctl.txt - - /proc/sys/net/ipv4/* variables -ip_dynaddr.txt - - IP dynamic address hack e.g. for auto-dialup links -ipddp.txt - - AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation -iphase.txt - - Interphase PCI ATM (i)Chip IA Linux driver info. -ipsec.txt - - Note on not compressing IPSec payload and resulting failed policy check. -ipv6.txt - - Options to the ipv6 kernel module. -ipvs-sysctl.txt - - Per-inode explanation of the /proc/sys/net/ipv4/vs interface. -irda.txt - - where to get IrDA (infrared) utilities and info for Linux. -ixgb.txt - - README for the Intel 10 Gigabit Ethernet Driver (ixgb). -ixgbe.txt - - README for the Intel 10 Gigabit Ethernet Driver (ixgbe). -ixgbevf.txt - - README for the Intel Virtual Function (VF) Driver (ixgbevf). -l2tp.txt - - User guide to the L2TP tunnel protocol. -lapb-module.txt - - programming information of the LAPB module. -ltpc.txt - - the Apple or Farallon LocalTalk PC card driver -mac80211-auth-assoc-deauth.txt - - authentication and association / deauth-disassoc with max80211 -mac80211-injection.txt - - HOWTO use packet injection with mac80211 -multiqueue.txt - - HOWTO for multiqueue network device support. -netconsole.txt - - The network console module netconsole.ko: configuration and notes. -netdev-features.txt - - Network interface features API description. -netdevices.txt - - info on network device driver functions exported to the kernel. -netif-msg.txt - - Design of the network interface message level setting (NETIF_MSG_*). -netlink_mmap.txt - - memory mapped I/O with netlink -nf_conntrack-sysctl.txt - - list of netfilter-sysctl knobs. -nfc.txt - - The Linux Near Field Communication (NFS) subsystem. -openvswitch.txt - - Open vSwitch developer documentation. -operstates.txt - - Overview of network interface operational states. -packet_mmap.txt - - User guide to memory mapped packet socket rings (PACKET_[RT]X_RING). -phonet.txt - - The Phonet packet protocol used in Nokia cellular modems. -phy.txt - - The PHY abstraction layer. -pktgen.txt - - User guide to the kernel packet generator (pktgen.ko). -policy-routing.txt - - IP policy-based routing -ppp_generic.txt - - Information about the generic PPP driver. -proc_net_tcp.txt - - Per inode overview of the /proc/net/tcp and /proc/net/tcp6 interfaces. -radiotap-headers.txt - - Background on radiotap headers. -ray_cs.txt - - Raylink Wireless LAN card driver info. -rds.txt - - Background on the reliable, ordered datagram delivery method RDS. -regulatory.txt - - Overview of the Linux wireless regulatory infrastructure. -rxrpc.txt - - Guide to the RxRPC protocol. -s2io.txt - - Release notes for Neterion Xframe I/II 10GbE driver. -scaling.txt - - Explanation of network scaling techniques: RSS, RPS, RFS, aRFS, XPS. -sctp.txt - - Notes on the Linux kernel implementation of the SCTP protocol. -secid.txt - - Explanation of the secid member in flow structures. -skfp.txt - - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. -smc9.txt - - the driver for SMC's 9000 series of Ethernet cards -spider_net.txt - - README for the Spidernet Driver (as found in PS3 / Cell BE). -stmmac.txt - - README for the STMicro Synopsys Ethernet driver. -tc-actions-env-rules.txt - - rules for traffic control (tc) actions. -timestamping.txt - - overview of network packet timestamping variants. -tcp.txt - - short blurb on how TCP output takes place. -tcp-thin.txt - - kernel tuning options for low rate 'thin' TCP streams. -team.txt - - pointer to information for ethernet teaming devices. -tlan.txt - - ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info. -tproxy.txt - - Transparent proxy support user guide. -tuntap.txt - - TUN/TAP device driver, allowing user space Rx/Tx of packets. -udplite.txt - - UDP-Lite protocol (RFC 3828) introduction. -vortex.txt - - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. -vxge.txt - - README for the Neterion X3100 PCIe Server Adapter. -vxlan.txt - - Virtual extensible LAN overview -x25.txt - - general info on X.25 development. -x25-iface.txt - - description of the X.25 Packet Layer to LAPB device interface. -xfrm_device.txt - - description of XFRM offload API -xfrm_proc.txt - - description of the statistics package for XFRM. -xfrm_sync.txt - - sync patches for XFRM enable migration of an SA between hosts. -xfrm_sysctl.txt - - description of the XFRM configuration options. -z8530drv.txt - - info about Linux driver for Z8530 based HDLC cards for AX.25 diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index ff929cfab4f4..4ae4f9d8f8fe 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -159,8 +159,8 @@ log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050 and 3000 refers to the same chunk. -UMEM Completetion Ring -~~~~~~~~~~~~~~~~~~~~~~ +UMEM Completion Ring +~~~~~~~~~~~~~~~~~~~~ The Completion Ring is used transfer ownership of UMEM frames from kernel-space to user-space. Just like the Fill ring, UMEM indicies are diff --git a/Documentation/networking/defza.txt b/Documentation/networking/defza.txt new file mode 100644 index 000000000000..663e4a906751 --- /dev/null +++ b/Documentation/networking/defza.txt @@ -0,0 +1,57 @@ +Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver v.1.1.4. + + +DEC FDDIcontroller 700 is DEC's first-generation TURBOchannel FDDI +network card, designed in 1990 specifically for the DECstation 5000 +model 200 workstation. The board is a single attachment station and +it was manufactured in two variations, both of which are supported. + +First is the SAS MMF DEFZA-AA option, the original design implementing +the standard MMF-PMD, however with a pair of ST connectors rather than +the usual MIC connector. The other one is the SAS ThinWire/STP DEFZA-CA +option, denoted 700-C, with the network medium selectable by a switch +between the DEC proprietary ThinWire-PMD using a BNC connector and the +standard STP-PMD using a DE-9F connector. This option can interface to +a DECconcentrator 500 device and, in the case of the STP-PMD, also other +FDDI equipment and was designed to make it easier to transition from +existing IEEE 802.3 10BASE2 Ethernet and IEEE 802.5 Token Ring networks +by providing means to reuse existing cabling. + +This driver handles any number of cards installed in a single system. +They get fddi0, fddi1, etc. interface names assigned in the order of +increasing TURBOchannel slot numbers. + +The board only supports DMA on the receive side. Transmission involves +the use of PIO. As a result under a heavy transmission load there will +be a significant impact on system performance. + +The board supports a 64-entry CAM for matching destination addresses. +Two entries are preoccupied by the Directed Beacon and Ring Purger +multicast addresses and the rest is used as a multicast filter. An +all-multi mode is also supported for LLC frames and it is used if +requested explicitly or if the CAM overflows. The promiscuous mode +supports separate enables for LLC and SMT frames, but this driver +doesn't support changing them individually. + + +Known problems: + +None. + + +To do: + +5. MAC address change. The card does not support changing the Media + Access Controller's address registers but a similar effect can be + achieved by adding an alias to the CAM. There is no way to disable + matching against the original address though. + +7. Queueing incoming/outgoing SMT frames in the driver if the SMT + receive/RMC transmit ring is full. (?) + +8. Retrieving/reporting FDDI/SNMP stats. + + +Both success and failure reports are welcome. + +Maciej W. Rozycki <macro@linux-mips.org> diff --git a/Documentation/networking/devlink-params-bnxt.txt b/Documentation/networking/devlink-params-bnxt.txt new file mode 100644 index 000000000000..481aa303d5b4 --- /dev/null +++ b/Documentation/networking/devlink-params-bnxt.txt @@ -0,0 +1,18 @@ +enable_sriov [DEVICE, GENERIC] + Configuration mode: Permanent + +ignore_ari [DEVICE, GENERIC] + Configuration mode: Permanent + +msix_vec_per_pf_max [DEVICE, GENERIC] + Configuration mode: Permanent + +msix_vec_per_pf_min [DEVICE, GENERIC] + Configuration mode: Permanent + +gre_ver_check [DEVICE, DRIVER-SPECIFIC] + Generic Routing Encapsulation (GRE) version check will + be enabled in the device. If disabled, device skips + version checking for incoming packets. + Type: Boolean + Configuration mode: Permanent diff --git a/Documentation/networking/devlink-params.txt b/Documentation/networking/devlink-params.txt new file mode 100644 index 000000000000..ae444ffe73ac --- /dev/null +++ b/Documentation/networking/devlink-params.txt @@ -0,0 +1,42 @@ +Devlink configuration parameters +================================ +Following is the list of configuration parameters via devlink interface. +Each parameter can be generic or driver specific and are device level +parameters. + +Note that the driver-specific files should contain the generic params +they support to, with supported config modes. + +Each parameter can be set in different configuration modes: + runtime - set while driver is running, no reset required. + driverinit - applied while driver initializes, requires restart + driver by devlink reload command. + permanent - written to device's non-volatile memory, hard reset + required. + +Following is the list of parameters: +==================================== +enable_sriov [DEVICE, GENERIC] + Enable Single Root I/O Virtualisation (SRIOV) in + the device. + Type: Boolean + +ignore_ari [DEVICE, GENERIC] + Ignore Alternative Routing-ID Interpretation (ARI) + capability. If enabled, adapter will ignore ARI + capability even when platforms has the support + enabled and creates same number of partitions when + platform does not support ARI. + Type: Boolean + +msix_vec_per_pf_max [DEVICE, GENERIC] + Provides the maximum number of MSIX interrupts that + a device can create. Value is same across all + physical functions (PFs) in the device. + Type: u32 + +msix_vec_per_pf_min [DEVICE, GENERIC] + Provides the minimum number of MSIX interrupts required + for the device initialization. Value is same across all + physical functions (PFs) in the device. + Type: u32 diff --git a/Documentation/networking/dpaa2/ethernet-driver.rst b/Documentation/networking/dpaa2/ethernet-driver.rst new file mode 100644 index 000000000000..90ec940749e8 --- /dev/null +++ b/Documentation/networking/dpaa2/ethernet-driver.rst @@ -0,0 +1,185 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> + +=============================== +DPAA2 Ethernet driver +=============================== + +:Copyright: |copy| 2017-2018 NXP + +This file provides documentation for the Freescale DPAA2 Ethernet driver. + +Supported Platforms +=================== +This driver provides networking support for Freescale DPAA2 SoCs, e.g. +LS2080A, LS2088A, LS1088A. + + +Architecture Overview +===================== +Unlike regular NICs, in the DPAA2 architecture there is no single hardware block +representing network interfaces; instead, several separate hardware resources +concur to provide the networking functionality: + +- network interfaces +- queues, channels +- buffer pools +- MAC/PHY + +All hardware resources are allocated and configured through the Management +Complex (MC) portals. MC abstracts most of these resources as DPAA2 objects +and exposes ABIs through which they can be configured and controlled. A few +hardware resources, like queues, do not have a corresponding MC object and +are treated as internal resources of other objects. + +For a more detailed description of the DPAA2 architecture and its object +abstractions see *Documentation/networking/dpaa2/overview.rst*. + +Each Linux net device is built on top of a Datapath Network Interface (DPNI) +object and uses Buffer Pools (DPBPs), I/O Portals (DPIOs) and Concentrators +(DPCONs). + +Configuration interface:: + + ----------------------- + | DPAA2 Ethernet Driver | + ----------------------- + . . . + . . . + . . . . . . . . . . . . + . . . + . . . + ---------- ---------- ----------- + | DPBP API | | DPNI API | | DPCON API | + ---------- ---------- ----------- + . . . software + ======= . ========== . ============ . =================== + . . . hardware + ------------------------------------------ + | MC hardware portals | + ------------------------------------------ + . . . + . . . + ------ ------ ------- + | DPBP | | DPNI | | DPCON | + ------ ------ ------- + +The DPNIs are network interfaces without a direct one-on-one mapping to PHYs. +DPBPs represent hardware buffer pools. Packet I/O is performed in the context +of DPCON objects, using DPIO portals for managing and communicating with the +hardware resources. + +Datapath (I/O) interface:: + + ----------------------------------------------- + | DPAA2 Ethernet Driver | + ----------------------------------------------- + | ^ ^ | | + | | | | | + enqueue| dequeue| data | dequeue| seed | + (Tx) | (Rx, TxC)| avail.| request| buffers| + | | notify| | | + | | | | | + V | | V V + ----------------------------------------------- + | DPIO Driver | + ----------------------------------------------- + | | | | | software + | | | | | ================ + | | | | | hardware + ----------------------------------------------- + | I/O hardware portals | + ----------------------------------------------- + | ^ ^ | | + | | | | | + | | | V | + V | ================ V + ---------------------- | ------------- + queues ---------------------- | | Buffer pool | + ---------------------- | ------------- + ======================= + Channel + +Datapath I/O (DPIO) portals provide enqueue and dequeue services, data +availability notifications and buffer pool management. DPIOs are shared between +all DPAA2 objects (and implicitly all DPAA2 kernel drivers) that work with data +frames, but must be affine to the CPUs for the purpose of traffic distribution. + +Frames are transmitted and received through hardware frame queues, which can be +grouped in channels for the purpose of hardware scheduling. The Ethernet driver +enqueues TX frames on egress queues and after transmission is complete a TX +confirmation frame is sent back to the CPU. + +When frames are available on ingress queues, a data availability notification +is sent to the CPU; notifications are raised per channel, so even if multiple +queues in the same channel have available frames, only one notification is sent. +After a channel fires a notification, is must be explicitly rearmed. + +Each network interface can have multiple Rx, Tx and confirmation queues affined +to CPUs, and one channel (DPCON) for each CPU that services at least one queue. +DPCONs are used to distribute ingress traffic to different CPUs via the cores' +affine DPIOs. + +The role of hardware buffer pools is storage of ingress frame data. Each network +interface has a privately owned buffer pool which it seeds with kernel allocated +buffers. + + +DPNIs are decoupled from PHYs; a DPNI can be connected to a PHY through a DPMAC +object or to another DPNI through an internal link, but the connection is +managed by MC and completely transparent to the Ethernet driver. + +:: + + --------- --------- --------- + | eth if1 | | eth if2 | | eth ifn | + --------- --------- --------- + . . . + . . . + . . . + --------------------------- + | DPAA2 Ethernet Driver | + --------------------------- + . . . + . . . + . . . + ------ ------ ------ ------- + | DPNI | | DPNI | | DPNI | | DPMAC |----+ + ------ ------ ------ ------- | + | | | | | + | | | | ----- + =========== ================== | PHY | + ----- + +Creating a Network Interface +============================ +A net device is created for each DPNI object probed on the MC bus. Each DPNI has +a number of properties which determine the network interface configuration +options and associated hardware resources. + +DPNI objects (and the other DPAA2 objects needed for a network interface) can be +added to a container on the MC bus in one of two ways: statically, through a +Datapath Layout Binary file (DPL) that is parsed by MC at boot time; or created +dynamically at runtime, via the DPAA2 objects APIs. + + +Features & Offloads +=================== +Hardware checksum offloading is supported for TCP and UDP over IPv4/6 frames. +The checksum offloads can be independently configured on RX and TX through +ethtool. + +Hardware offload of unicast and multicast MAC filtering is supported on the +ingress path and permanently enabled. + +Scatter-gather frames are supported on both RX and TX paths. On TX, SG support +is configurable via ethtool; on RX it is always enabled. + +The DPAA2 hardware can process jumbo Ethernet frames of up to 10K bytes. + +The Ethernet driver defines a static flow hashing scheme that distributes +traffic based on a 5-tuple key: src IP, dst IP, IP proto, L4 src port, +L4 dst port. No user configuration is supported for now. + +Hardware specific statistics for the network interface as well as some +non-standard driver stats can be consulted through ethtool -S option. diff --git a/Documentation/networking/dpaa2/index.rst b/Documentation/networking/dpaa2/index.rst index 10bea113a7bc..67bd87fe6c53 100644 --- a/Documentation/networking/dpaa2/index.rst +++ b/Documentation/networking/dpaa2/index.rst @@ -7,3 +7,4 @@ DPAA2 Documentation overview dpio-driver + ethernet-driver diff --git a/Documentation/networking/e100.rst b/Documentation/networking/e100.rst index f81111eba9c5..5e2839b4ec92 100644 --- a/Documentation/networking/e100.rst +++ b/Documentation/networking/e100.rst @@ -1,4 +1,5 @@ -============================================================== +.. SPDX-License-Identifier: GPL-2.0+ + Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters ============================================================== diff --git a/Documentation/networking/e1000.rst b/Documentation/networking/e1000.rst index f10dd4086921..6379d4d20771 100644 --- a/Documentation/networking/e1000.rst +++ b/Documentation/networking/e1000.rst @@ -1,4 +1,5 @@ -=========================================================== +.. SPDX-License-Identifier: GPL-2.0+ + Linux* Base Driver for Intel(R) Ethernet Network Connection =========================================================== diff --git a/Documentation/networking/e1000e.rst b/Documentation/networking/e1000e.rst new file mode 100644 index 000000000000..33554e5416c5 --- /dev/null +++ b/Documentation/networking/e1000e.rst @@ -0,0 +1,382 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Driver for Intel(R) Ethernet Network Connection +====================================================== + +Intel Gigabit Linux driver. +Copyright(c) 2008-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Support + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + + +Command Line Parameters +======================= +If the driver is built as a module, the following optional parameters are used +by entering them on the command line with the modprobe command using this +syntax:: + + modprobe e1000e [<option>=<VAL1>,<VAL2>,...] + +There needs to be a <VAL#> for each network port in the system supported by +this driver. The values will be applied to each instance, in function order. +For example:: + + modprobe e1000e InterruptThrottleRate=16000,16000 + +In this case, there are two network ports supported by e1000e in the system. +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +NOTE: A descriptor describes a data buffer and attributes related to the data +buffer. This information is accessed by the hardware. + +InterruptThrottleRate +--------------------- +:Valid Range: 0,1,3,4,100-100000 +:Default Value: 3 + +Interrupt Throttle Rate controls the number of interrupts each interrupt +vector can generate per second. Increasing ITR lowers latency at the cost of +increased CPU utilization, though it may help throughput in some circumstances. + +Setting InterruptThrottleRate to a value greater or equal to 100 +will program the adapter to send out a maximum of that many interrupts +per second, even if more packets have come in. This reduces interrupt +load on the system and can lower CPU utilization under heavy load, +but will increase latency as packets are not processed as quickly. + +The default behaviour of the driver previously assumed a static +InterruptThrottleRate value of 8000, providing a good fallback value for +all traffic types, but lacking in small packet performance and latency. +The hardware can handle many more small packets per second however, and +for this reason an adaptive interrupt moderation algorithm was implemented. + +The driver has two adaptive modes (setting 1 or 3) in which +it dynamically adjusts the InterruptThrottleRate value based on the traffic +that it receives. After determining the type of incoming traffic in the last +timeframe, it will adjust the InterruptThrottleRate to an appropriate value +for that traffic. + +The algorithm classifies the incoming traffic every interval into +classes. Once the class is determined, the InterruptThrottleRate value is +adjusted to suit that traffic type the best. There are three classes defined: +"Bulk traffic", for large amounts of packets of normal size; "Low latency", +for small amounts of traffic and/or a significant percentage of small +packets; and "Lowest latency", for almost completely small packets or +minimal traffic. + + - 0: Off + Turns off any interrupt moderation and may improve small packet latency. + However, this is generally not suitable for bulk throughput traffic due + to the increased CPU utilization of the higher interrupt rate. + - 1: Dynamic mode + This mode attempts to moderate interrupts per vector while maintaining + very low latency. This can sometimes cause extra CPU utilization. If + planning on deploying e1000e in a latency sensitive environment, this + parameter should be considered. + - 3: Dynamic Conservative mode (default) + In dynamic conservative mode, the InterruptThrottleRate value is set to + 4000 for traffic that falls in class "Bulk traffic". If traffic falls in + the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is + increased stepwise to 20000. This default mode is suitable for most + applications. + - 4: Simplified Balancing mode + In simplified mode the interrupt rate is based on the ratio of TX and + RX traffic. If the bytes per second rate is approximately equal, the + interrupt rate will drop as low as 2000 interrupts per second. If the + traffic is mostly transmit or mostly receive, the interrupt rate could + be as high as 8000. + - 100-100000: + Setting InterruptThrottleRate to a value greater or equal to 100 + will program the adapter to send at most that many interrupts per second, + even if more packets have come in. This reduces interrupt load on the + system and can lower CPU utilization under heavy load, but will increase + latency as packets are not processed as quickly. + +NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and +RxAbsIntDelay parameters. In other words, minimizing the receive and/or +transmit absolute delays does not force the controller to generate more +interrupts than what the Interrupt Throttle Rate allows. + +RxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 0 + +This value delays the generation of receive interrupts in units of 1.024 +microseconds. Receive interrupt reduction can improve CPU efficiency if +properly tuned for specific network traffic. Increasing this value adds extra +latency to frame reception and can end up decreasing the throughput of TCP +traffic. If the system is reporting dropped receives, this value may be set +too high, causing the driver to run out of available receive descriptors. + +CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang +(stop transmitting) under certain network conditions. If this occurs a NETDEV +WATCHDOG message is logged in the system event log. In addition, the +controller is automatically reset, restoring the network connection. To +eliminate the potential for the hang ensure that RxIntDelay is set to 0. + +RxAbsIntDelay +------------- +:Valid Range: 0-65535 (0=off) +:Default Value: 8 + +This value, in units of 1.024 microseconds, limits the delay in which a +receive interrupt is generated. This value ensures that an interrupt is +generated after the initial packet is received within the set amount of time, +which is useful only if RxIntDelay is non-zero. Proper tuning, along with +RxIntDelay, may improve traffic throughput in specific network conditions. + +TxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 8 + +This value delays the generation of transmit interrupts in units of 1.024 +microseconds. Transmit interrupt reduction can improve CPU efficiency if +properly tuned for specific network traffic. If the system is reporting +dropped transmits, this value may be set too high causing the driver to run +out of available transmit descriptors. + +TxAbsIntDelay +------------- +:Valid Range: 0-65535 (0=off) +:Default Value: 32 + +This value, in units of 1.024 microseconds, limits the delay in which a +transmit interrupt is generated. It is useful only if TxIntDelay is non-zero. +It ensures that an interrupt is generated after the initial Packet is sent on +the wire within the set amount of time. Proper tuning, along with TxIntDelay, +may improve traffic throughput in specific network conditions. + +copybreak +--------- +:Valid Range: 0-xxxxxxx (0=off) +:Default Value: 256 + +The driver copies all packets below or equaling this size to a fresh receive +buffer before handing it up the stack. +This parameter differs from other parameters because it is a single (not 1,1,1 +etc.) parameter applied to all driver instances and it is also available +during runtime at /sys/module/e1000e/parameters/copybreak. + +To use copybreak, type:: + + modprobe e1000e.ko copybreak=128 + +SmartPowerDownEnable +-------------------- +:Valid Range: 0,1 +:Default Value: 0 (disabled) + +Allows the PHY to turn off in lower power states. The user can turn off this +parameter in supported chipsets. + +KumeranLockLoss +--------------- +:Valid Range: 0,1 +:Default Value: 1 (enabled) + +This workaround skips resetting the PHY at shutdown for the initial silicon +releases of ICH8 systems. + +IntMode +------- +:Valid Range: 0-2 +:Default Value: 0 + + +-------+----------------+ + | Value | Interrupt Mode | + +=======+================+ + | 0 | Legacy | + +-------+----------------+ + | 1 | MSI | + +-------+----------------+ + | 2 | MSI-X | + +-------+----------------+ + +IntMode allows load time control over the type of interrupt registered for by +the driver. MSI-X is required for multiple queue support, and some kernels and +combinations of kernel .config options will force a lower level of interrupt +support. + +This command will show different values for each type of interrupt:: + + cat /proc/interrupts + +CrcStripping +------------ +:Valid Range: 0,1 +:Default Value: 1 (enabled) + +Strip the CRC from received packets before sending up the network stack. If +you have a machine with a BMC enabled but cannot receive IPMI traffic after +loading or enabling the driver, try disabling this feature. + +WriteProtectNVM +--------------- +:Valid Range: 0,1 +:Default Value: 1 (enabled) + +If set to 1, configure the hardware to ignore all write/erase cycles to the +GbE region in the ICHx NVM (in order to prevent accidental corruption of the +NVM). This feature can be disabled by setting the parameter to 0 during initial +driver load. + +NOTE: The machine must be power cycled (full off/on) when enabling NVM writes +via setting the parameter to zero. Once the NVM has been locked (via the +parameter at 1 when the driver loads) it cannot be unlocked except via power +cycle. + +Debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + +This parameter adjusts the level of debug messages displayed in the system logs. + + +Additional Features and Configurations +====================================== + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file: + +- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x> +- For SLES: /etc/sysconfig/network/<config_file> + +NOTE: The maximum MTU setting for Jumbo Frames is 8996. This value coincides +with the maximum Jumbo Frames size of 9018 bytes. + +NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in +poor performance or loss of link. + +NOTE: The following adapters limit Jumbo Frames sized packets to a maximum of +4088 bytes: + + - Intel(R) 82578DM Gigabit Network Connection + - Intel(R) 82577LM Gigabit Network Connection + +The following adapters do not support Jumbo Frames: + + - Intel(R) PRO/1000 Gigabit Server Adapter + - Intel(R) PRO/1000 PM Network Connection + - Intel(R) 82562G 10/100 Network Connection + - Intel(R) 82562G-2 10/100 Network Connection + - Intel(R) 82562GT 10/100 Network Connection + - Intel(R) 82562GT-2 10/100 Network Connection + - Intel(R) 82562V 10/100 Network Connection + - Intel(R) 82562V-2 10/100 Network Connection + - Intel(R) 82566DC Gigabit Network Connection + - Intel(R) 82566DC-2 Gigabit Network Connection + - Intel(R) 82566DM Gigabit Network Connection + - Intel(R) 82566MC Gigabit Network Connection + - Intel(R) 82566MM Gigabit Network Connection + - Intel(R) 82567V-3 Gigabit Network Connection + - Intel(R) 82577LC Gigabit Network Connection + - Intel(R) 82578DC Gigabit Network Connection + +NOTE: Jumbo Frames cannot be configured on an 82579-based Network device if +MACSec is enabled on the system. + + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: + +https://www.kernel.org/pub/software/network/ethtool/ + +NOTE: When validating enable/disable tests on some parts (for example, 82578), +it is necessary to add a few seconds between tests when working with ethtool. + + +Speed and Duplex Configuration +------------------------------ +In addressing speed and duplex configuration issues, you need to distinguish +between copper-based adapters and fiber-based adapters. + +In the default mode, an Intel(R) Ethernet Network Adapter using copper +connections will attempt to auto-negotiate with its link partner to determine +the best setting. If the adapter cannot establish link with the link partner +using auto-negotiation, you may need to manually configure the adapter and link +partner to identical settings to establish link and pass packets. This should +only be needed when attempting to link with an older switch that does not +support auto-negotiation or one that has been forced to a specific speed or +duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds +and higher cannot be forced. Use the autonegotiation advertising setting to +manually set devices for 1 Gbps and higher. + +Speed, duplex, and autonegotiation advertising are configured through the +ethtool* utility. + +Caution: Only experienced network administrators should force speed and duplex +or change autonegotiation advertising manually. The settings at the switch must +always match the adapter settings. Adapter performance may suffer or your +adapter may not operate if you configure the adapter differently from your +switch. + +An Intel(R) Ethernet Network Adapter using fiber-based connections, however, +will not attempt to auto-negotiate with its link partner since those adapters +operate only in full duplex and only at their native speed. + + +Enabling Wake on LAN* (WoL) +--------------------------- +WoL is configured through the ethtool* utility. + +WoL will be enabled on the system during the next shut down or reboot. For +this driver version, in order to enable WoL, the e1000e driver must be loaded +prior to shutting down or suspending the system. + +NOTE: Wake on LAN is only supported on port A for the following devices: +- Intel(R) PRO/1000 PT Dual Port Network Connection +- Intel(R) PRO/1000 PT Dual Port Server Connection +- Intel(R) PRO/1000 PT Dual Port Server Adapter +- Intel(R) PRO/1000 PF Dual Port Server Adapter +- Intel(R) PRO/1000 PT Quad Port Server Adapter +- Intel(R) Gigabit PT Quad Port Server ExpressModule + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/e1000e.txt b/Documentation/networking/e1000e.txt deleted file mode 100644 index 12089547baed..000000000000 --- a/Documentation/networking/e1000e.txt +++ /dev/null @@ -1,312 +0,0 @@ -Linux* Driver for Intel(R) Ethernet Network Connection -====================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Command Line Parameters -- Additional Configurations -- Support - -Identifying Your Adapter -======================== - -The e1000e driver supports all PCI Express Intel(R) Gigabit Network -Connections, except those that are 82575, 82576 and 82580-based*. - -* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by - the e1000 driver, not the e1000e driver due to the 82546 part being used - behind a PCI Express bridge. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -For the latest Intel network drivers for Linux, refer to the following -website. In the search field, enter your adapter name or type, or use the -networking link on the left to search for your adapter: - - http://support.intel.com/support/go/network/adapter/home.htm - -Command Line Parameters -======================= - -The default value for each parameter is generally the recommended setting, -unless otherwise noted. - -NOTES: For more information about the InterruptThrottleRate, - RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay - parameters, see the application note at: - http://www.intel.com/design/network/applnots/ap450.htm - -InterruptThrottleRate ---------------------- -Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative, - 4=simplified balancing) -Default Value: 3 - -The driver can limit the amount of interrupts per second that the adapter -will generate for incoming packets. It does this by writing a value to the -adapter that is based on the maximum amount of interrupts that the adapter -will generate per second. - -Setting InterruptThrottleRate to a value greater or equal to 100 -will program the adapter to send out a maximum of that many interrupts -per second, even if more packets have come in. This reduces interrupt -load on the system and can lower CPU utilization under heavy load, -but will increase latency as packets are not processed as quickly. - -The default behaviour of the driver previously assumed a static -InterruptThrottleRate value of 8000, providing a good fallback value for -all traffic types, but lacking in small packet performance and latency. -The hardware can handle many more small packets per second however, and -for this reason an adaptive interrupt moderation algorithm was implemented. - -The driver has two adaptive modes (setting 1 or 3) in which -it dynamically adjusts the InterruptThrottleRate value based on the traffic -that it receives. After determining the type of incoming traffic in the last -timeframe, it will adjust the InterruptThrottleRate to an appropriate value -for that traffic. - -The algorithm classifies the incoming traffic every interval into -classes. Once the class is determined, the InterruptThrottleRate value is -adjusted to suit that traffic type the best. There are three classes defined: -"Bulk traffic", for large amounts of packets of normal size; "Low latency", -for small amounts of traffic and/or a significant percentage of small -packets; and "Lowest latency", for almost completely small packets or -minimal traffic. - -In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 -for traffic that falls in class "Bulk traffic". If traffic falls in the "Low -latency" or "Lowest latency" class, the InterruptThrottleRate is increased -stepwise to 20000. This default mode is suitable for most applications. - -For situations where low latency is vital such as cluster or -grid computing, the algorithm can reduce latency even more when -InterruptThrottleRate is set to mode 1. In this mode, which operates -the same as mode 3, the InterruptThrottleRate will be increased stepwise to -70000 for traffic in class "Lowest latency". - -In simplified mode the interrupt rate is based on the ratio of TX and -RX traffic. If the bytes per second rate is approximately equal, the -interrupt rate will drop as low as 2000 interrupts per second. If the -traffic is mostly transmit or mostly receive, the interrupt rate could -be as high as 8000. - -Setting InterruptThrottleRate to 0 turns off any interrupt moderation -and may improve small packet latency, but is generally not suitable -for bulk throughput traffic. - -NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and - RxAbsIntDelay parameters. In other words, minimizing the receive - and/or transmit absolute delays does not force the controller to - generate more interrupts than what the Interrupt Throttle Rate - allows. - -NOTE: When e1000e is loaded with default settings and multiple adapters - are in use simultaneously, the CPU utilization may increase non- - linearly. In order to limit the CPU utilization without impacting - the overall throughput, we recommend that you load the driver as - follows: - - modprobe e1000e InterruptThrottleRate=3000,3000,3000 - - This sets the InterruptThrottleRate to 3000 interrupts/sec for - the first, second, and third instances of the driver. The range - of 2000 to 3000 interrupts per second works on a majority of - systems and is a good starting point, but the optimal value will - be platform-specific. If CPU utilization is not a concern, use - RX_POLLING (NAPI) and default driver settings. - -RxIntDelay ----------- -Valid Range: 0-65535 (0=off) -Default Value: 0 - -This value delays the generation of receive interrupts in units of 1.024 -microseconds. Receive interrupt reduction can improve CPU efficiency if -properly tuned for specific network traffic. Increasing this value adds -extra latency to frame reception and can end up decreasing the throughput -of TCP traffic. If the system is reporting dropped receives, this value -may be set too high, causing the driver to run out of available receive -descriptors. - -CAUTION: When setting RxIntDelay to a value other than 0, adapters may - hang (stop transmitting) under certain network conditions. If - this occurs a NETDEV WATCHDOG message is logged in the system - event log. In addition, the controller is automatically reset, - restoring the network connection. To eliminate the potential - for the hang ensure that RxIntDelay is set to 0. - -RxAbsIntDelay -------------- -Valid Range: 0-65535 (0=off) -Default Value: 8 - -This value, in units of 1.024 microseconds, limits the delay in which a -receive interrupt is generated. Useful only if RxIntDelay is non-zero, -this value ensures that an interrupt is generated after the initial -packet is received within the set amount of time. Proper tuning, -along with RxIntDelay, may improve traffic throughput in specific network -conditions. - -TxIntDelay ----------- -Valid Range: 0-65535 (0=off) -Default Value: 8 - -This value delays the generation of transmit interrupts in units of -1.024 microseconds. Transmit interrupt reduction can improve CPU -efficiency if properly tuned for specific network traffic. If the -system is reporting dropped transmits, this value may be set too high -causing the driver to run out of available transmit descriptors. - -TxAbsIntDelay -------------- -Valid Range: 0-65535 (0=off) -Default Value: 32 - -This value, in units of 1.024 microseconds, limits the delay in which a -transmit interrupt is generated. Useful only if TxIntDelay is non-zero, -this value ensures that an interrupt is generated after the initial -packet is sent on the wire within the set amount of time. Proper tuning, -along with TxIntDelay, may improve traffic throughput in specific -network conditions. - -Copybreak ---------- -Valid Range: 0-xxxxxxx (0=off) -Default Value: 256 - -Driver copies all packets below or equaling this size to a fresh RX -buffer before handing it up the stack. - -This parameter is different than other parameters, in that it is a -single (not 1,1,1 etc.) parameter applied to all driver instances and -it is also available during runtime at -/sys/module/e1000e/parameters/copybreak - -SmartPowerDownEnable --------------------- -Valid Range: 0-1 -Default Value: 0 (disabled) - -Allows PHY to turn off in lower power states. The user can set this parameter -in supported chipsets. - -KumeranLockLoss ---------------- -Valid Range: 0-1 -Default Value: 1 (enabled) - -This workaround skips resetting the PHY at shutdown for the initial -silicon releases of ICH8 systems. - -IntMode -------- -Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X) -Default Value: 2 - -Allows changing the interrupt mode at module load time, without requiring a -recompile. If the driver load fails to enable a specific interrupt mode, the -driver will try other interrupt modes, from least to most compatible. The -interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1) -interrupts, only MSI and Legacy will be attempted. - -CrcStripping ------------- -Valid Range: 0-1 -Default Value: 1 (enabled) - -Strip the CRC from received packets before sending up the network stack. If -you have a machine with a BMC enabled but cannot receive IPMI traffic after -loading or enabling the driver, try disabling this feature. - -WriteProtectNVM ---------------- -Valid Range: 0,1 -Default Value: 1 - -If set to 1, configure the hardware to ignore all write/erase cycles to the -GbE region in the ICHx NVM (in order to prevent accidental corruption of the -NVM). This feature can be disabled by setting the parameter to 0 during initial -driver load. -NOTE: The machine must be power cycled (full off/on) when enabling NVM writes -via setting the parameter to zero. Once the NVM has been locked (via the -parameter at 1 when the driver loads) it cannot be unlocked except via power -cycle. - -Additional Configurations -========================= - - Jumbo Frames - ------------ - Jumbo Frames support is enabled by changing the MTU to a value larger than - the default of 1500. Use the ifconfig command to increase the MTU size. - For example: - - ifconfig eth<x> mtu 9000 up - - This setting is not saved across reboots. - - Notes: - - - The maximum MTU setting for Jumbo Frames is 9216. This value coincides - with the maximum Jumbo Frames size of 9234 bytes. - - - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in - poor performance or loss of link. - - - Some adapters limit Jumbo Frames sized packets to a maximum of - 4096 bytes and some adapters do not support Jumbo Frames. - - - Jumbo Frames cannot be configured on an 82579-based Network device, if - MACSec is enabled on the system. - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. We - strongly recommend downloading the latest version of ethtool at: - - https://kernel.org/pub/software/network/ethtool/ - - NOTE: When validating enable/disable tests on some parts (82578, for example) - you need to add a few seconds between tests when working with ethtool. - - Speed and Duplex - ---------------- - Speed and Duplex are configured through the ethtool* utility. For - instructions, refer to the ethtool man page. - - Enabling Wake on LAN* (WoL) - --------------------------- - WoL is configured through the ethtool* utility. For instructions on - enabling WoL with ethtool, refer to the ethtool man page. - - WoL will be enabled on the system during the next shut down or reboot. - For this driver version, in order to enable WoL, the e1000e driver must be - loaded when shutting down or rebooting the system. - - In most cases Wake On LAN is only supported on port A for multiple port - adapters. To verify if a port supports Wake on Lan run ethtool eth<X>. - -Support -======= - -For general information, go to the Intel support website at: - - www.intel.com/support/ - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index e6b4ebb2b243..2196b824e96c 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -203,11 +203,11 @@ opcodes as defined in linux/filter.h stand for: Instruction Addressing mode Description - ld 1, 2, 3, 4, 10 Load word into A + ld 1, 2, 3, 4, 12 Load word into A ldi 4 Load word into A ldh 1, 2 Load half-word into A ldb 1, 2 Load byte into A - ldx 3, 4, 5, 10 Load word into X + ldx 3, 4, 5, 12 Load word into X ldxi 4 Load word into X ldxb 5 Load byte into X @@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for: jmp 6 Jump to label ja 6 Jump to label - jeq 7, 8 Jump on A == k - jneq 8 Jump on A != k - jne 8 Jump on A != k - jlt 8 Jump on A < k - jle 8 Jump on A <= k - jgt 7, 8 Jump on A > k - jge 7, 8 Jump on A >= k - jset 7, 8 Jump on A & k + jeq 7, 8, 9, 10 Jump on A == <x> + jneq 9, 10 Jump on A != <x> + jne 9, 10 Jump on A != <x> + jlt 9, 10 Jump on A < <x> + jle 9, 10 Jump on A <= <x> + jgt 7, 8, 9, 10 Jump on A > <x> + jge 7, 8, 9, 10 Jump on A >= <x> + jset 7, 8, 9, 10 Jump on A & <x> add 0, 4 A + <x> sub 0, 4 A - <x> @@ -240,7 +240,7 @@ opcodes as defined in linux/filter.h stand for: tax Copy A into X txa Copy X into A - ret 4, 9 Return + ret 4, 11 Return The next table shows addressing formats from the 2nd column: @@ -254,9 +254,11 @@ The next table shows addressing formats from the 2nd column: 5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet 6 L Jump label L 7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf - 8 #k,Lt Jump to Lt if predicate is true - 9 a/%a Accumulator A - 10 extension BPF extension + 8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf + 9 #k,Lt Jump to Lt if predicate is true + 10 x/%x,Lt Jump to Lt if predicate is true + 11 a/%a Accumulator A + 12 extension BPF extension The Linux kernel also has a couple of BPF extensions that are used along with the class of load instructions by "overloading" the k argument with @@ -1125,6 +1127,14 @@ pointer type. The types of pointers describe their base, as follows: PTR_TO_STACK Frame pointer. PTR_TO_PACKET skb->data. PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden. + PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted. + PTR_TO_SOCKET_OR_NULL + Either a pointer to a socket, or NULL; socket lookup + returns this type, which becomes a PTR_TO_SOCKET when + checked != NULL. PTR_TO_SOCKET is reference-counted, + so programs must release the reference through the + socket release function before the end of the program. + Arithmetic on these pointers is forbidden. However, a pointer may be offset from this base (as a result of pointer arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable offset'. The former is used when an exactly-known value (e.g. an immediate @@ -1171,6 +1181,13 @@ over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting pointer will have a variable offset known to be 4n+2 for some n, so adding the 2 bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through that pointer are safe. +The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common +to all copies of the pointer returned from a socket lookup. This has similar +behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but +it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly +represents a reference to the corresponding 'struct sock'. To ensure that the +reference is not leaked, it is imperative to NULL-check the reference and in +the non-NULL case, and pass the valid reference to the socket release function. Direct packet access -------------------- @@ -1444,6 +1461,55 @@ Error: 8: (7a) *(u64 *)(r0 +0) = 1 R0 invalid mem access 'imm' +Program that performs a socket lookup then sets the pointer to NULL without +checking it: +value: + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_MOV64_IMM(BPF_REG_3, 4), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_MOV64_IMM(BPF_REG_5, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), +Error: + 0: (b7) r2 = 0 + 1: (63) *(u32 *)(r10 -8) = r2 + 2: (bf) r2 = r10 + 3: (07) r2 += -8 + 4: (b7) r3 = 4 + 5: (b7) r4 = 0 + 6: (b7) r5 = 0 + 7: (85) call bpf_sk_lookup_tcp#65 + 8: (b7) r0 = 0 + 9: (95) exit + Unreleased reference id=1, alloc_insn=7 + +Program that performs a socket lookup but does not NULL-check the returned +value: + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_MOV64_IMM(BPF_REG_3, 4), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_MOV64_IMM(BPF_REG_5, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), + BPF_EXIT_INSN(), +Error: + 0: (b7) r2 = 0 + 1: (63) *(u32 *)(r10 -8) = r2 + 2: (bf) r2 = r10 + 3: (07) r2 += -8 + 4: (b7) r3 = 4 + 5: (b7) r4 = 0 + 6: (b7) r5 = 0 + 7: (85) call bpf_sk_lookup_tcp#65 + 8: (95) exit + Unreleased reference id=1, alloc_insn=7 + Testing ------- diff --git a/Documentation/networking/fm10k.rst b/Documentation/networking/fm10k.rst new file mode 100644 index 000000000000..bf5e5942f28d --- /dev/null +++ b/Documentation/networking/fm10k.rst @@ -0,0 +1,141 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for Intel(R) Ethernet Multi-host Controller +============================================================== + +August 20, 2018 +Copyright(c) 2015-2018 Intel Corporation. + +Contents +======== +- Identifying Your Adapter +- Additional Configurations +- Performance Tuning +- Known Issues +- Support + +Identifying Your Adapter +======================== +The driver in this release is compatible with devices based on the Intel(R) +Ethernet Multi-host Controller. + +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Flow Control +------------ +The Intel(R) Ethernet Switch Host Interface Driver does not support Flow +Control. It will not send pause frames. This may result in dropped frames. + + +Virtual Functions (VFs) +----------------------- +Use sysfs to enable VFs. +Valid Range: 0-64 + +For example:: + + echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs //enable VFs + echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable VFs + +NOTE: Neither the device nor the driver control how VFs are mapped into config +space. Bus layout will vary by operating system. On operating systems that +support it, you can check sysfs to find the mapping. + +NOTE: When SR-IOV mode is enabled, hardware VLAN filtering and VLAN tag +stripping/insertion will remain enabled. Please remove the old VLAN filter +before the new VLAN filter is added. For example:: + + ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0 + ip link set eth0 vf 0 vlan 0 // Delete vlan 100 + ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0 + + +Additional Features and Configurations +====================================== + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file: + +- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x> +- For SLES: /etc/sysconfig/network/<config_file> + +NOTE: The maximum MTU setting for Jumbo Frames is 15342. This value coincides +with the maximum Jumbo Frames size of 15364 bytes. + +NOTE: This driver will attempt to use multiple page sized buffers to receive +each jumbo packet. This should help to avoid buffer starvation issues when +allocating receive packets. + + +Generic Receive Offload, aka GRO +-------------------------------- +The driver supports the in-kernel software implementation of GRO. GRO has +shown that by coalescing Rx traffic into larger chunks of data, CPU +utilization can be significantly reduced when under large Rx load. GRO is an +evolution of the previously-used LRO interface. GRO is able to coalesce +other protocols besides TCP. It's also safe to use with configurations that +are problematic for LRO, namely bridging and iSCSI. + + + +Supported ethtool Commands and Options for Filtering +---------------------------------------------------- +-n --show-nfc + Retrieves the receive network flow classification configurations. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 + Retrieves the hash options for the specified network traffic type. + +-N --config-nfc + Configures the receive network flow classification. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r + Configures the hash options for the specified network traffic type. + +- udp4: UDP over IPv4 +- udp6: UDP over IPv6 +- f Hash on bytes 0 and 1 of the Layer 4 header of the rx packet. +- n Hash on bytes 2 and 3 of the Layer 4 header of the rx packet. + + +Known Issues/Troubleshooting +============================ + +Enabling SR-IOV in a 64-bit Microsoft* Windows Server* 2012/R2 guest OS under Linux KVM +--------------------------------------------------------------------------------------- +KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This +includes traditional PCIe devices, as well as SR-IOV-capable devices based on +the Intel Ethernet Controller XL710. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/i40e.rst b/Documentation/networking/i40e.rst new file mode 100644 index 000000000000..0cc16c525d10 --- /dev/null +++ b/Documentation/networking/i40e.rst @@ -0,0 +1,770 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet Controller 700 Series +================================================================== + +Intel 40 Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Overview +- Identifying Your Adapter +- Intel(R) Ethernet Flow Director +- Additional Configurations +- Known Issues +- Support + + +Driver information can be obtained using ethtool, lspci, and ifconfig. +Instructions on updating ethtool can be found in the section Additional +Configurations later in this document. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + + +Identifying Your Adapter +======================== +The driver is compatible with devices based on the following: + + * Intel(R) Ethernet Controller X710 + * Intel(R) Ethernet Controller XL710 + * Intel(R) Ethernet Network Connection X722 + * Intel(R) Ethernet Controller XXV710 + +For the best performance, make sure the latest NVM/FW is installed on your +device. + +For information on how to identify your adapter, and for the latest NVM/FW +images and Intel network drivers, refer to the Intel Support website: +https://www.intel.com/support + +SFP+ and QSFP+ Devices +---------------------- +For information about supported media, refer to this document: +https://www.intel.com/content/dam/www/public/us/en/documents/release-notes/xl710-ethernet-controller-feature-matrix.pdf + +NOTE: Some adapters based on the Intel(R) Ethernet Controller 700 Series only +support Intel Ethernet Optics modules. On these adapters, other modules are not +supported and will not function. In all cases Intel recommends using Intel +Ethernet Optics; other modules may function but are not validated by Intel. +Contact Intel for supported media types. + +NOTE: For connections based on Intel(R) Ethernet Controller 700 Series, support +is dependent on your system board. Please see your vendor for details. + +NOTE: In systems that do not have adequate airflow to cool the adapter and +optical modules, you must use high temperature optical modules. + +Virtual Functions (VFs) +----------------------- +Use sysfs to enable VFs. For example:: + + #echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs #enable VFs + #echo 0 > /sys/class/net/$dev/device/sriov_numvfs #disable VFs + +For example, the following instructions will configure PF eth0 and the first VF +on VLAN 10:: + + $ ip link set dev eth0 vf 0 vlan 10 + +VLAN Tag Packet Steering +------------------------ +Allows you to send all packets with a specific VLAN tag to a particular SR-IOV +virtual function (VF). Further, this feature allows you to designate a +particular VF as trusted, and allows that trusted VF to request selective +promiscuous mode on the Physical Function (PF). + +To set a VF as trusted or untrusted, enter the following command in the +Hypervisor:: + + # ip link set dev eth0 vf 1 trust [on|off] + +Once the VF is designated as trusted, use the following commands in the VM to +set the VF to promiscuous mode. + +:: + + For promiscuous all: + #ip link set eth2 promisc on + Where eth2 is a VF interface in the VM + + For promiscuous Multicast: + #ip link set eth2 allmulticast on + Where eth2 is a VF interface in the VM + +NOTE: By default, the ethtool priv-flag vf-true-promisc-support is set to +"off",meaning that promiscuous mode for the VF will be limited. To set the +promiscuous mode for the VF to true promiscuous and allow the VF to see all +ingress traffic, use the following command:: + + #ethtool -set-priv-flags p261p1 vf-true-promisc-support on + +The vf-true-promisc-support priv-flag does not enable promiscuous mode; rather, +it designates which type of promiscuous mode (limited or true) you will get +when you enable promiscuous mode using the ip link commands above. Note that +this is a global setting that affects the entire device. However,the +vf-true-promisc-support priv-flag is only exposed to the first PF of the +device. The PF remains in limited promiscuous mode (unless it is in MFP mode) +regardless of the vf-true-promisc-support setting. + +Now add a VLAN interface on the VF interface:: + + #ip link add link eth2 name eth2.100 type vlan id 100 + +Note that the order in which you set the VF to promiscuous mode and add the +VLAN interface does not matter (you can do either first). The end result in +this example is that the VF will get all traffic that is tagged with VLAN 100. + +Intel(R) Ethernet Flow Director +------------------------------- +The Intel Ethernet Flow Director performs the following tasks: + +- Directs receive packets according to their flows to different queues. +- Enables tight control on routing a flow in the platform. +- Matches flows and CPU cores for flow affinity. +- Supports multiple parameters for flexible flow classification and load + balancing (in SFP mode only). + +NOTE: The Linux i40e driver supports the following flow types: IPv4, TCPv4, and +UDPv4. For a given flow type, it supports valid combinations of IP addresses +(source or destination) and UDP/TCP ports (source and destination). For +example, you can supply only a source IP address, a source IP address and a +destination port, or any combination of one or more of these four parameters. + +NOTE: The Linux i40e driver allows you to filter traffic based on a +user-defined flexible two-byte pattern and offset by using the ethtool user-def +and mask fields. Only L3 and L4 flow types are supported for user-defined +flexible filters. For a given flow type, you must clear all Intel Ethernet Flow +Director filters before changing the input set (for that flow type). + +To enable or disable the Intel Ethernet Flow Director:: + + # ethtool -K ethX ntuple <on|off> + +When disabling ntuple filters, all the user programmed filters are flushed from +the driver cache and hardware. All needed filters must be re-added when ntuple +is re-enabled. + +To add a filter that directs packet to queue 2, use -U or -N switch:: + + # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1] + +To set a filter using only the source and destination IP address:: + + # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 action 2 [loc 1] + +To see the list of filters currently present:: + + # ethtool <-u|-n> ethX + +Application Targeted Routing (ATR) Perfect Filters +-------------------------------------------------- +ATR is enabled by default when the kernel is in multiple transmit queue mode. +An ATR Intel Ethernet Flow Director filter rule is added when a TCP-IP flow +starts and is deleted when the flow ends. When a TCP-IP Intel Ethernet Flow +Director rule is added from ethtool (Sideband filter), ATR is turned off by the +driver. To re-enable ATR, the sideband can be disabled with the ethtool -K +option. For example:: + + ethtool –K [adapter] ntuple [off|on] + +If sideband is re-enabled after ATR is re-enabled, ATR remains enabled until a +TCP-IP flow is added. When all TCP-IP sideband rules are deleted, ATR is +automatically re-enabled. + +Packets that match the ATR rules are counted in fdir_atr_match stats in +ethtool, which also can be used to verify whether ATR rules still exist. + +Sideband Perfect Filters +------------------------ +Sideband Perfect Filters are used to direct traffic that matches specified +characteristics. They are enabled through ethtool's ntuple interface. To add a +new filter use the following command:: + + ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \ + dst-port <port> action <queue> + +Where: + <device> - the ethernet device to program + <type> - can be ip4, tcp4, udp4, or sctp4 + <ip> - the ip address to match on + <port> - the port number to match on + <queue> - the queue to direct traffic towards (-1 discards matching traffic) + +Use the following command to display all of the active filters:: + + ethtool -u <device> + +Use the following command to delete a filter:: + + ethtool -U <device> delete <N> + +Where <N> is the filter id displayed when printing all the active filters, and +may also have been specified using "loc <N>" when adding the filter. + +The following example matches TCP traffic sent from 192.168.0.1, port 5300, +directed to 192.168.0.5, port 80, and sends it to queue 7:: + + ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \ + src-port 5300 dst-port 80 action 7 + +For each flow-type, the programmed filters must all have the same matching +input set. For example, issuing the following two commands is acceptable:: + + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10 + +Issuing the next two commands, however, is not acceptable, since the first +specifies src-ip and the second specifies dst-ip:: + + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10 + +The second command will fail with an error. You may program multiple filters +with the same fields, using different values, but, on one device, you may not +program two tcp4 filters with different matching fields. + +Matching on a sub-portion of a field is not supported by the i40e driver, thus +partial mask fields are not supported. + +The driver also supports matching user-defined data within the packet payload. +This flexible data is specified using the "user-def" field of the ethtool +command in the following way: + ++----------------------------+--------------------------+ +| 31 28 24 20 16 | 15 12 8 4 0 | ++----------------------------+--------------------------+ +| offset into packet payload | 2 bytes of flexible data | ++----------------------------+--------------------------+ + +For example, + +:: + + ... user-def 0x4FFFF ... + +tells the filter to look 4 bytes into the payload and match that value against +0xFFFF. The offset is based on the beginning of the payload, and not the +beginning of the packet. Thus + +:: + + flow-type tcp4 ... user-def 0x8BEAF ... + +would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the +TCP/IPv4 payload. + +Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload. +Thus to match the first byte of the payload, you must actually add 4 bytes to +the offset. Also note that ip4 filters match both ICMP frames as well as raw +(unknown) ip4 frames, where the payload will be the L3 payload of the IP4 frame. + +The maximum offset is 64. The hardware will only read up to 64 bytes of data +from the payload. The offset must be even because the flexible data is 2 bytes +long and must be aligned to byte 0 of the packet payload. + +The user-defined flexible offset is also considered part of the input set and +cannot be programmed separately for multiple filters of the same type. However, +the flexible data is not part of the input set and multiple filters may use the +same offset but match against different data. + +To create filters that direct traffic to a specific Virtual Function, use the +"action" parameter. Specify the action as a 64 bit value, where the lower 32 +bits represents the queue number, while the next 8 bits represent which VF. +Note that 0 is the PF, so the VF identifier is offset by 1. For example:: + + ... action 0x800000002 ... + +specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of +that VF. + +Note that these filters will not break internal routing rules, and will not +route traffic that otherwise would not have been sent to the specified Virtual +Function. + +Setting the link-down-on-close Private Flag +------------------------------------------- +When the link-down-on-close private flag is set to "on", the port's link will +go down when the interface is brought down using the ifconfig ethX down command. + +Use ethtool to view and set link-down-on-close, as follows:: + + ethtool --show-priv-flags ethX + ethtool --set-priv-flags ethX link-down-on-close [on|off] + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file:: + + /etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL + /etc/sysconfig/network/<config_file> // for SLES + +NOTE: The maximum MTU setting for Jumbo Frames is 9702. This value coincides +with the maximum Jumbo Frames size of 9728 bytes. + +NOTE: This driver will attempt to use multiple page sized buffers to receive +each jumbo packet. This should help to avoid buffer starvation issues when +allocating receive packets. + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://www.kernel.org/pub/software/network/ethtool/ + +Supported ethtool Commands and Options for Filtering +---------------------------------------------------- +-n --show-nfc + Retrieves the receive network flow classification configurations. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 + Retrieves the hash options for the specified network traffic type. + +-N --config-nfc + Configures the receive network flow classification. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r... + Configures the hash options for the specified network traffic type. + +udp4 UDP over IPv4 +udp6 UDP over IPv6 + +f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet. +n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet. + +Speed and Duplex Configuration +------------------------------ +In addressing speed and duplex configuration issues, you need to distinguish +between copper-based adapters and fiber-based adapters. + +In the default mode, an Intel(R) Ethernet Network Adapter using copper +connections will attempt to auto-negotiate with its link partner to determine +the best setting. If the adapter cannot establish link with the link partner +using auto-negotiation, you may need to manually configure the adapter and link +partner to identical settings to establish link and pass packets. This should +only be needed when attempting to link with an older switch that does not +support auto-negotiation or one that has been forced to a specific speed or +duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds +and higher cannot be forced. Use the autonegotiation advertising setting to +manually set devices for 1 Gbps and higher. + +NOTE: You cannot set the speed for devices based on the Intel(R) Ethernet +Network Adapter XXV710 based devices. + +Speed, duplex, and autonegotiation advertising are configured through the +ethtool* utility. + +Caution: Only experienced network administrators should force speed and duplex +or change autonegotiation advertising manually. The settings at the switch must +always match the adapter settings. Adapter performance may suffer or your +adapter may not operate if you configure the adapter differently from your +switch. + +An Intel(R) Ethernet Network Adapter using fiber-based connections, however, +will not attempt to auto-negotiate with its link partner since those adapters +operate only in full duplex and only at their native speed. + +NAPI +---- +NAPI (Rx polling mode) is supported in the i40e driver. +For more information on NAPI, see +https://wiki.linuxfoundation.org/networking/napi + +Flow Control +------------ +Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable +receiving and transmitting pause frames for i40e. When transmit is enabled, +pause frames are generated when the receive packet buffer crosses a predefined +threshold. When receive is enabled, the transmit unit will halt for the time +delay specified when a pause frame is received. + +NOTE: You must have a flow control capable link partner. + +Flow Control is on by default. + +Use ethtool to change the flow control settings. + +To enable or disable Rx or Tx Flow Control:: + + ethtool -A eth? rx <on|off> tx <on|off> + +Note: This command only enables or disables Flow Control if auto-negotiation is +disabled. If auto-negotiation is enabled, this command changes the parameters +used for auto-negotiation with the link partner. + +To enable or disable auto-negotiation:: + + ethtool -s eth? autoneg <on|off> + +Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending +on your device, you may not be able to change the auto-negotiation setting. + +RSS Hash Flow +------------- +Allows you to set the hash bytes per flow type and any combination of one or +more options for Receive Side Scaling (RSS) hash byte configuration. + +:: + + # ethtool -N <dev> rx-flow-hash <type> <option> + +Where <type> is: + tcp4 signifying TCP over IPv4 + udp4 signifying UDP over IPv4 + tcp6 signifying TCP over IPv6 + udp6 signifying UDP over IPv6 +And <option> is one or more of: + s Hash on the IP source address of the Rx packet. + d Hash on the IP destination address of the Rx packet. + f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet. + n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet. + +MAC and VLAN anti-spoofing feature +---------------------------------- +When a malicious driver attempts to send a spoofed packet, it is dropped by the +hardware and not transmitted. +NOTE: This feature can be disabled for a specific Virtual Function (VF):: + + ip link set <pf dev> vf <vf id> spoofchk {off|on} + +IEEE 1588 Precision Time Protocol (PTP) Hardware Clock (PHC) +------------------------------------------------------------ +Precision Time Protocol (PTP) is used to synchronize clocks in a computer +network. PTP support varies among Intel devices that support this driver. Use +"ethtool -T <netdev name>" to get a definitive list of PTP capabilities +supported by the device. + +IEEE 802.1ad (QinQ) Support +--------------------------- +The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN +IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as +"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks +allow L2 tunneling and the ability to segregate traffic within a particular +VLAN ID, among other uses. + +The following are examples of how to configure 802.1ad (QinQ):: + + ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24 + ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371 + +Where "24" and "371" are example VLAN IDs. + +NOTES: + Receive checksum offloads, cloud filters, and VLAN acceleration are not + supported for 802.1ad (QinQ) packets. + +VXLAN and GENEVE Overlay HW Offloading +-------------------------------------- +Virtual Extensible LAN (VXLAN) allows you to extend an L2 network over an L3 +network, which may be useful in a virtualized or cloud environment. Some +Intel(R) Ethernet Network devices perform VXLAN processing, offloading it from +the operating system. This reduces CPU utilization. + +VXLAN offloading is controlled by the Tx and Rx checksum offload options +provided by ethtool. That is, if Tx checksum offload is enabled, and the +adapter has the capability, VXLAN offloading is also enabled. + +Support for VXLAN and GENEVE HW offloading is dependent on kernel support of +the HW offloading features. + +Multiple Functions per Port +--------------------------- +Some adapters based on the Intel Ethernet Controller X710/XL710 support +multiple functions on a single physical port. Configure these functions through +the System Setup/BIOS. + +Minimum TX Bandwidth is the guaranteed minimum data transmission bandwidth, as +a percentage of the full physical port link speed, that the partition will +receive. The bandwidth the partition is awarded will never fall below the level +you specify. + +The range for the minimum bandwidth values is: +1 to ((100 minus # of partitions on the physical port) plus 1) +For example, if a physical port has 4 partitions, the range would be: +1 to ((100 - 4) + 1 = 97) + +The Maximum Bandwidth percentage represents the maximum transmit bandwidth +allocated to the partition as a percentage of the full physical port link +speed. The accepted range of values is 1-100. The value is used as a limiter, +should you chose that any one particular function not be able to consume 100% +of a port's bandwidth (should it be available). The sum of all the values for +Maximum Bandwidth is not restricted, because no more than 100% of a port's +bandwidth can ever be used. + +NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions +per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says +"add vsi failed for VF N, aq_err 16". To workaround the issue, enable less than +64 virtual functions (VFs). + +Data Center Bridging (DCB) +-------------------------- +DCB is a configuration Quality of Service implementation in hardware. It uses +the VLAN priority tag (802.1p) to filter traffic. That means that there are 8 +different priorities that traffic can be filtered into. It also enables +priority flow control (802.1Qbb) which can limit or eliminate the number of +dropped packets during network stress. Bandwidth can be allocated to each of +these priorities, which is enforced at the hardware level (802.1Qaz). + +Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and +802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only +and can accept settings from a DCBX capable peer. Software configuration of +DCBX parameters via dcbtool/lldptool are not supported. + +NOTE: Firmware LLDP can be disabled by setting the private flag disable-fw-lldp. + +The i40e driver implements the DCB netlink interface layer to allow user-space +to communicate with the driver and query DCB configuration for the port. + +NOTE: +The kernel assumes that TC0 is available, and will disable Priority Flow +Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is +enabled when setting up DCB on your switch. + +Interrupt Rate Limiting +----------------------- +:Valid Range: 0-235 (0=no limit) + +The Intel(R) Ethernet Controller XL710 family supports an interrupt rate +limiting mechanism. The user can control, via ethtool, the number of +microseconds between interrupts. + +Syntax:: + + # ethtool -C ethX rx-usecs-high N + +The range of 0-235 microseconds provides an effective range of 4,310 to 250,000 +interrupts per second. The value of rx-usecs-high can be set independently of +rx-usecs and tx-usecs in the same ethtool command, and is also independent of +the adaptive interrupt moderation algorithm. The underlying hardware supports +granularity in 4-microsecond intervals, so adjacent values may result in the +same interrupt rate. + +One possible use case is the following:: + + # ethtool -C ethX adaptive-rx off adaptive-tx off rx-usecs-high 20 rx-usecs \ + 5 tx-usecs 5 + +The above command would disable adaptive interrupt moderation, and allow a +maximum of 5 microseconds before indicating a receive or transmit was complete. +However, instead of resulting in as many as 200,000 interrupts per second, it +limits total interrupts per second to 50,000 via the rx-usecs-high parameter. + +Performance Optimization +======================== +Driver defaults are meant to fit a wide variety of workloads, but if further +optimization is required we recommend experimenting with the following settings. + +NOTE: For better performance when processing small (64B) frame sizes, try +enabling Hyper threading in the BIOS in order to increase the number of logical +cores in the system and subsequently increase the number of queues available to +the adapter. + +Virtualized Environments +------------------------ +1. Disable XPS on both ends by using the included virt_perf_default script +or by running the following command as root:: + + for file in `ls /sys/class/net/<ethX>/queues/tx-*/xps_cpus`; + do echo 0 > $file; done + +2. Using the appropriate mechanism (vcpupin) in the vm, pin the cpu's to +individual lcpu's, making sure to use a set of cpu's included in the +device's local_cpulist: /sys/class/net/<ethX>/device/local_cpulist. + +3. Configure as many Rx/Tx queues in the VM as available. Do not rely on +the default setting of 1. + + +Non-virtualized Environments +---------------------------- +Pin the adapter's IRQs to specific cores by disabling the irqbalance service +and using the included set_irq_affinity script. Please see the script's help +text for further options. + +- The following settings will distribute the IRQs across all the cores evenly:: + + # scripts/set_irq_affinity -x all <interface1> , [ <interface2>, ... ] + +- The following settings will distribute the IRQs across all the cores that are + local to the adapter (same NUMA node):: + + # scripts/set_irq_affinity -x local <interface1> ,[ <interface2>, ... ] + +For very CPU intensive workloads, we recommend pinning the IRQs to all cores. + +For IP Forwarding: Disable Adaptive ITR and lower Rx and Tx interrupts per +queue using ethtool. + +- Setting rx-usecs and tx-usecs to 125 will limit interrupts to about 8000 + interrupts per second per queue. + +:: + + # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 125 \ + tx-usecs 125 + +For lower CPU utilization: Disable Adaptive ITR and lower Rx and Tx interrupts +per queue using ethtool. + +- Setting rx-usecs and tx-usecs to 250 will limit interrupts to about 4000 + interrupts per second per queue. + +:: + + # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 250 \ + tx-usecs 250 + +For lower latency: Disable Adaptive ITR and ITR by setting Rx and Tx to 0 using +ethtool. + +:: + + # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 0 \ + tx-usecs 0 + +Application Device Queues (ADq) +------------------------------- +Application Device Queues (ADq) allows you to dedicate one or more queues to a +specific application. This can reduce latency for the specified application, +and allow Tx traffic to be rate limited per application. Follow the steps below +to set ADq. + +1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface. +The shaper bw_rlimit parameter is optional. + +Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set +to 1Gbit for tc0 and 3Gbit for tc1. + +:: + + # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 + queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit + max_rate 1Gbit 3Gbit + +map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1 +sets priorities 0-3 to use tc0 and 4-7 to use tc1) + +queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns +16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total +number of queues for all tcs is 64 or number of cores, whichever is lower.) + +hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new new hardware +offload mode in mqprio that makes full use of the mqprio options, the +TCs, the queue configurations, and the QoS parameters. + +shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates. +Totals must be equal or less than port speed. + +For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network +monitoring tools such as ifstat or sar –n DEV [interval] [number of samples] + +2. Enable HW TC offload on interface:: + + # ethtool -K <interface> hw-tc-offload on + +3. Apply TCs to ingress (RX) flow of interface:: + + # tc qdisc add dev <interface> ingress + +NOTES: + - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory. + - ADq is not compatible with cloud filters. + - Setting up channels via ethtool (ethtool -L) is not supported when the + TCs are configured using mqprio. + - You must have iproute2 latest version + - NVM version 6.01 or later is required. + - ADq cannot be enabled when any the following features are enabled: Data + Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband + Filters. + - If another driver (for example, DPDK) has set cloud filters, you cannot + enable ADq. + - Tunnel filters are not supported in ADq. If encapsulated packets do + arrive in non-tunnel mode, filtering will be done on the inner headers. + For example, for VXLAN traffic in non-tunnel mode, PCTYPE is identified + as a VXLAN encapsulated packet, outer headers are ignored. Therefore, + inner headers are matched. + - If a TC filter on a PF matches traffic over a VF (on the PF), that + traffic will be routed to the appropriate queue of the PF, and will + not be passed on the VF. Such traffic will end up getting dropped higher + up in the TCP/IP stack as it does not match PF address data. + - If traffic matches multiple TC filters that point to different TCs, + that traffic will be duplicated and sent to all matching TC queues. + The hardware switch mirrors the packet to a VSI list when multiple + filters are matched. + + +Known Issues/Troubleshooting +============================ + +NOTE: 1 Gb devices based on the Intel(R) Ethernet Network Connection X722 do +not support the following features: + + * Data Center Bridging (DCB) + * QOS + * VMQ + * SR-IOV + * Task Encapsulation offload (VXLAN, NVGRE) + * Energy Efficient Ethernet (EEE) + * Auto-media detect + +Unexpected Issues when the device driver and DPDK share a device +---------------------------------------------------------------- +Unexpected issues may result when an i40e device is in multi driver mode and +the kernel driver and DPDK driver are sharing the device. This is because +access to the global NIC resources is not synchronized between multiple +drivers. Any change to the global NIC configuration (writing to a global +register, setting global configuration by AQ, or changing switch modes) will +affect all ports and drivers on the device. Loading DPDK with the +"multi-driver" module parameter may mitigate some of the issues. + +TC0 must be enabled when setting up DCB on a switch +--------------------------------------------------- +The kernel assumes that TC0 is available, and will disable Priority Flow +Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is +enabled when setting up DCB on your switch. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/i40e.txt b/Documentation/networking/i40e.txt deleted file mode 100644 index c2d6e1824b29..000000000000 --- a/Documentation/networking/i40e.txt +++ /dev/null @@ -1,190 +0,0 @@ -Linux Base Driver for the Intel(R) Ethernet Controller XL710 Family -=================================================================== - -Intel i40e Linux driver. -Copyright(c) 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Performance Tuning -- Known Issues -- Support - - -Identifying Your Adapter -======================== - -The driver in this release is compatible with the Intel Ethernet -Controller XL710 Family. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/network/sb/CS-012904.htm - - -Enabling the driver -=================== - -The driver is enabled via the standard kernel configuration system, -using the make command: - - make config/oldconfig/menuconfig/etc. - -The driver is located in the menu structure at: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet driver support - -> Intel devices - -> Intel(R) Ethernet Controller XL710 Family - -Additional Configurations -========================= - - Generic Receive Offload (GRO) - ----------------------------- - The driver supports the in-kernel software implementation of GRO. GRO has - shown that by coalescing Rx traffic into larger chunks of data, CPU - utilization can be significantly reduced when under large Rx load. GRO is - an evolution of the previously-used LRO interface. GRO is able to coalesce - other protocols besides TCP. It's also safe to use with configurations that - are problematic for LRO, namely bridging and iSCSI. - - Ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The latest - ethtool version is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool - - - Flow Director n-ntuple traffic filters (FDir) - --------------------------------------------- - The driver utilizes the ethtool interface for configuring ntuple filters, - via "ethtool -N <device> <filter>". - - The sctp4, ip4, udp4, and tcp4 flow types are supported with the standard - fields including src-ip, dst-ip, src-port and dst-port. The driver only - supports fully enabling or fully masking the fields, so use of the mask - fields for partial matches is not supported. - - Additionally, the driver supports using the action to specify filters for a - Virtual Function. You can specify the action as a 64bit value, where the - lower 32 bits represents the queue number, while the next 8 bits represent - which VF. Note that 0 is the PF, so the VF identifier is offset by 1. For - example: - - ... action 0x800000002 ... - - Would indicate to direct traffic for Virtual Function 7 (8 minus 1) on queue - 2 of that VF. - - The driver also supports using the user-defined field to specify 2 bytes of - arbitrary data to match within the packet payload in addition to the regular - fields. The data is specified in the lower 32bits of the user-def field in - the following way: - - +----------------------------+---------------------------+ - | 31 28 24 20 16 | 15 12 8 4 0| - +----------------------------+---------------------------+ - | offset into packet payload | 2 bytes of flexible data | - +----------------------------+---------------------------+ - - As an example, - - ... user-def 0x4FFFF .... - - means to match the value 0xFFFF 4 bytes into the packet payload. Note that - the offset is based on the beginning of the payload, and not the beginning - of the packet. Thus - - flow-type tcp4 ... user-def 0x8BEAF .... - - would match TCP/IPv4 packets which have the value 0xBEAF 8bytes into the - TCP/IPv4 payload. - - For ICMP, the hardware parses the ICMP header as 4 bytes of header and 4 - bytes of payload, so if you want to match an ICMP frames payload you may need - to add 4 to the offset in order to match the data. - - Furthermore, the offset can only be up to a value of 64, as the hardware - will only read up to 64 bytes of data from the payload. It must also be even - as the flexible data is 2 bytes long and must be aligned to byte 0 of the - packet payload. - - When programming filters, the hardware is limited to using a single input - set for each flow type. This means that it is an error to program two - different filters with the same type that don't match on the same fields. - Thus the second of the following two commands will fail: - - ethtool -N <device> flow-type tcp4 src-ip 192.168.0.7 action 5 - ethtool -N <device> flow-type tcp4 dst-ip 192.168.15.18 action 1 - - This is because the first filter will be accepted and reprogram the input - set for TCPv4 filters, but the second filter will be unable to reprogram the - input set until all the conflicting TCPv4 filters are first removed. - - Note that the user-defined flexible offset is also considered part of the - input set and cannot be programmed separately for multiple filters of the - same type. However, the flexible data is not part of the input set and - multiple filters may use the same offset but match against different data. - - Data Center Bridging (DCB) - -------------------------- - DCB configuration is not currently supported. - - FCoE - ---- - The driver supports Fiber Channel over Ethernet (FCoE) and Data Center - Bridging (DCB) functionality. Configuring DCB and FCoE is outside the scope - of this driver doc. Refer to http://www.open-fcoe.org/ for FCoE project - information and http://www.open-lldp.org/ or email list - e1000-eedc@lists.sourceforge.net for DCB information. - - MAC and VLAN anti-spoofing feature - ---------------------------------- - When a malicious driver attempts to send a spoofed packet, it is dropped by - the hardware and not transmitted. An interrupt is sent to the PF driver - notifying it of the spoof attempt. - - When a spoofed packet is detected the PF driver will send the following - message to the system log (displayed by the "dmesg" command): - - Spoof event(s) detected on VF (n) - - Where n=the VF that attempted to do the spoofing. - - -Performance Tuning -================== - -An excellent article on performance tuning can be found at: - -http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf - - -Known Issues -============ - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://e1000.sourceforge.net - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sourceforge.net and copy -netdev@vger.kernel.org. diff --git a/Documentation/networking/i40evf.txt b/Documentation/networking/i40evf.txt deleted file mode 100644 index e9b3035b95d0..000000000000 --- a/Documentation/networking/i40evf.txt +++ /dev/null @@ -1,54 +0,0 @@ -Linux* Base Driver for Intel(R) Network Connection -================================================== - -Intel Ethernet Adaptive Virtual Function Linux driver. -Copyright(c) 2013-2017 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Known Issues/Troubleshooting -- Support - -This file describes the i40evf Linux* Base Driver. - -The i40evf driver supports the below mentioned virtual function -devices and can only be activated on kernels running the i40e or -newer Physical Function (PF) driver compiled with CONFIG_PCI_IOV. -The i40evf driver requires CONFIG_PCI_MSI to be enabled. - -The guest OS loading the i40evf driver must support MSI-X interrupts. - -Supported Hardware -================== -Intel XL710 X710 Virtual Function -Intel Ethernet Adaptive Virtual Function -Intel X722 Virtual Function - -Identifying Your Adapter -======================== - -For more information on how to identify your adapter, go to the -Adapter & Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -Known Issues/Troubleshooting -============================ - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/iavf.rst b/Documentation/networking/iavf.rst new file mode 100644 index 000000000000..f8b42b64eb28 --- /dev/null +++ b/Documentation/networking/iavf.rst @@ -0,0 +1,281 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for Intel(R) Ethernet Adaptive Virtual Function +================================================================== + +Intel Ethernet Adaptive Virtual Function Linux driver. +Copyright(c) 2013-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Additional Configurations +- Known Issues/Troubleshooting +- Support + +This file describes the iavf Linux* Base Driver. This driver was formerly +called i40evf. + +The iavf driver supports the below mentioned virtual function devices and +can only be activated on kernels running the i40e or newer Physical Function +(PF) driver compiled with CONFIG_PCI_IOV. The iavf driver requires +CONFIG_PCI_MSI to be enabled. + +The guest OS loading the iavf driver must support MSI-X interrupts. + +Identifying Your Adapter +======================== +The driver in this kernel is compatible with devices based on the following: + * Intel(R) XL710 X710 Virtual Function + * Intel(R) X722 Virtual Function + * Intel(R) XXV710 Virtual Function + * Intel(R) Ethernet Adaptive Virtual Function + +For the best performance, make sure the latest NVM/FW is installed on your +device. + +For information on how to identify your adapter, and for the latest NVM/FW +images and Intel network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Additional Features and Configurations +====================================== + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://www.kernel.org/pub/software/network/ethtool/ + +Setting VLAN Tag Stripping +-------------------------- +If you have applications that require Virtual Functions (VFs) to receive +packets with VLAN tags, you can disable VLAN tag stripping for the VF. The +Physical Function (PF) processes requests issued from the VF to enable or +disable VLAN tag stripping. Note that if the PF has assigned a VLAN to a VF, +then requests from that VF to set VLAN tag stripping will be ignored. + +To enable/disable VLAN tag stripping for a VF, issue the following command +from inside the VM in which you are running the VF:: + + ethtool -K <if_name> rxvlan on/off + +or alternatively:: + + ethtool --offload <if_name> rxvlan on/off + +Adaptive Virtual Function +------------------------- +Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to +adapt to changing feature sets of the physical function driver (PF) with which +it is associated. This allows system administrators to update a PF without +having to update all the VFs associated with it. All AVFs have a single common +device ID and branding string. + +AVFs have a minimum set of features known as "base mode," but may provide +additional features depending on what features are available in the PF with +which the AVF is associated. The following are base mode features: + +- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs) + for Tx/Rx. +- i40e descriptors and ring format. +- Descriptor write-back completion. +- 1 control queue, with i40e descriptors, CSRs and ring format. +- 5 MSI-X interrupt vectors and corresponding i40e CSRs. +- 1 Interrupt Throttle Rate (ITR) index. +- 1 Virtual Station Interface (VSI) per VF. +- 1 Traffic Class (TC), TC0 +- Receive Side Scaling (RSS) with 64 entry indirection table and key, + configured through the PF. +- 1 unicast MAC address reserved per VF. +- 16 MAC address filters for each VF. +- Stateless offloads - non-tunneled checksums. +- AVF device ID. +- HW mailbox is used for VF to PF communications (including on Windows). + +IEEE 802.1ad (QinQ) Support +--------------------------- +The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN +IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as +"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks +allow L2 tunneling and the ability to segregate traffic within a particular +VLAN ID, among other uses. + +The following are examples of how to configure 802.1ad (QinQ):: + + ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24 + ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371 + +Where "24" and "371" are example VLAN IDs. + +NOTES: + Receive checksum offloads, cloud filters, and VLAN acceleration are not + supported for 802.1ad (QinQ) packets. + +Application Device Queues (ADq) +------------------------------- +Application Device Queues (ADq) allows you to dedicate one or more queues to a +specific application. This can reduce latency for the specified application, +and allow Tx traffic to be rate limited per application. Follow the steps below +to set ADq. + +1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface. +The shaper bw_rlimit parameter is optional. + +Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set +to 1Gbit for tc0 and 3Gbit for tc1. + +:: + + # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 + queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit + max_rate 1Gbit 3Gbit + +map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1 +sets priorities 0-3 to use tc0 and 4-7 to use tc1) + +queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns +16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total +number of queues for all tcs is 64 or number of cores, whichever is lower.) + +hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new new hardware +offload mode in mqprio that makes full use of the mqprio options, the +TCs, the queue configurations, and the QoS parameters. + +shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates. +Totals must be equal or less than port speed. + +For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network +monitoring tools such as ifstat or sar –n DEV [interval] [number of samples] + +2. Enable HW TC offload on interface:: + + # ethtool -K <interface> hw-tc-offload on + +3. Apply TCs to ingress (RX) flow of interface:: + + # tc qdisc add dev <interface> ingress + +NOTES: + - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory. + - ADq is not compatible with cloud filters. + - Setting up channels via ethtool (ethtool -L) is not supported when the TCs + are configured using mqprio. + - You must have iproute2 latest version + - NVM version 6.01 or later is required. + - ADq cannot be enabled when any the following features are enabled: Data + Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters. + - If another driver (for example, DPDK) has set cloud filters, you cannot + enable ADq. + - Tunnel filters are not supported in ADq. If encapsulated packets do arrive + in non-tunnel mode, filtering will be done on the inner headers. For example, + for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN + encapsulated packet, outer headers are ignored. Therefore, inner headers are + matched. + - If a TC filter on a PF matches traffic over a VF (on the PF), that traffic + will be routed to the appropriate queue of the PF, and will not be passed on + the VF. Such traffic will end up getting dropped higher up in the TCP/IP + stack as it does not match PF address data. + - If traffic matches multiple TC filters that point to different TCs, that + traffic will be duplicated and sent to all matching TC queues. The hardware + switch mirrors the packet to a VSI list when multiple filters are matched. + + +Known Issues/Troubleshooting +============================ + +Traffic Is Not Being Passed Between VM and Client +------------------------------------------------- +You may not be able to pass traffic between a client system and a +Virtual Machine (VM) running on a separate host if the Virtual Function +(VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled +on the VF. Note that this situation can occur in any combination of client, +host, and guest operating system. For information on how to set the VF to +trusted mode, refer to the section "VLAN Tag Packet Steering" in this +readme document. For information on setting spoof checking, refer to the +section "MAC and VLAN anti-spoofing feature" in this readme document. + +Do not unload port driver if VF with active VM is bound to it +------------------------------------------------------------- +Do not unload a port's driver if a Virtual Function (VF) with an active Virtual +Machine (VM) is bound to it. Doing so will cause the port to appear to hang. +Once the VM shuts down, or otherwise releases the VF, the command will complete. + +Virtual machine does not get link +--------------------------------- +If the virtual machine has more than one virtual port assigned to it, and those +virtual ports are bound to different physical ports, you may not get link on +all of the virtual ports. The following command may work around the issue:: + + ethtool -r <PF> + +Where <PF> is the PF interface in the host, for example: p5p1. You may need to +run the command more than once to get link on all virtual ports. + +MAC address of Virtual Function changes unexpectedly +---------------------------------------------------- +If a Virtual Function's MAC address is not assigned in the host, then the VF +(virtual function) driver will use a random MAC address. This random MAC +address may change each time the VF driver is reloaded. You can assign a static +MAC address in the host machine. This static MAC address will survive +a VF driver reload. + +Driver Buffer Overflow Fix +-------------------------- +The fix to resolve CVE-2016-8105, referenced in Intel SA-00069 +https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html +is included in this and future versions of the driver. + +Multiple Interfaces on Same Ethernet Broadcast Network +------------------------------------------------------ +Due to the default ARP behavior on Linux, it is not possible to have one system +on two IP networks in the same Ethernet broadcast domain (non-partitioned +switch) behave as expected. All Ethernet interfaces will respond to IP traffic +for any IP address assigned to the system. This results in unbalanced receive +traffic. + +If you have multiple interfaces in a server, either turn on ARP filtering by +entering:: + + echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter + +NOTE: This setting is not saved across reboots. The configuration change can be +made permanent by adding the following line to the file /etc/sysctl.conf:: + + net.ipv4.conf.all.arp_filter = 1 + +Another alternative is to install the interfaces in separate broadcast domains +(either in different switches or in a switch partitioned to VLANs). + +Rx Page Allocation Errors +------------------------- +'Page allocation failure. order:0' errors may occur under stress. +This is caused by the way the Linux kernel reports this stressed condition. + + +Support +======= +For general information, go to the Intel support website at: + +https://support.intel.com + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on the supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ice.rst b/Documentation/networking/ice.rst new file mode 100644 index 000000000000..1e4948c9e989 --- /dev/null +++ b/Documentation/networking/ice.rst @@ -0,0 +1,45 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet Connection E800 Series +=================================================================== + +Intel ice Linux driver. +Copyright(c) 2018 Intel Corporation. + +Contents +======== + +- Enabling the driver +- Support + +The driver in this release supports Intel's E800 Series of products. For +more information, visit Intel's support page at https://support.intel.com. + +Enabling the driver +=================== +The driver is enabled via the standard kernel configuration system, +using the make command:: + + make oldconfig/silentoldconfig/menuconfig/etc. + +The driver is located in the menu structure at: + + -> Device Drivers + -> Network device support (NETDEVICES [=y]) + -> Ethernet driver support + -> Intel devices + -> Intel(R) Ethernet Connection E800 Series Support + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/ice.txt b/Documentation/networking/ice.txt deleted file mode 100644 index 6261c46378e1..000000000000 --- a/Documentation/networking/ice.txt +++ /dev/null @@ -1,39 +0,0 @@ -Intel(R) Ethernet Connection E800 Series Linux Driver -=================================================================== - -Intel ice Linux driver. -Copyright(c) 2018 Intel Corporation. - -Contents -======== -- Enabling the driver -- Support - -The driver in this release supports Intel's E800 Series of products. For -more information, visit Intel's support page at http://support.intel.com. - -Enabling the driver -=================== - -The driver is enabled via the standard kernel configuration system, -using the make command: - - Make oldconfig/silentoldconfig/menuconfig/etc. - -The driver is located in the menu structure at: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet driver support - -> Intel devices - -> Intel(R) Ethernet Connection E800 Series Support - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -If an issue is identified with the released source code, please email -the maintainer listed in the MAINTAINERS file. diff --git a/Documentation/networking/igb.rst b/Documentation/networking/igb.rst new file mode 100644 index 000000000000..ba16b86d5593 --- /dev/null +++ b/Documentation/networking/igb.rst @@ -0,0 +1,193 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for Intel(R) Ethernet Network Connection +=========================================================== + +Intel Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Support + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Command Line Parameters +======================== +If the driver is built as a module, the following optional parameters are used +by entering them on the command line with the modprobe command using this +syntax:: + + modprobe igb [<option>=<VAL1>,<VAL2>,...] + +There needs to be a <VAL#> for each network port in the system supported by +this driver. The values will be applied to each instance, in function order. +For example:: + + modprobe igb max_vfs=2,4 + +In this case, there are two network ports supported by igb in the system. + +NOTE: A descriptor describes a data buffer and attributes related to the data +buffer. This information is accessed by the hardware. + +max_vfs +------- +:Valid Range: 0-7 + +This parameter adds support for SR-IOV. It causes the driver to spawn up to +max_vfs worth of virtual functions. If the value is greater than 0 it will +also force the VMDq parameter to be 1 or more. + +The parameters for the driver are referenced by position. Thus, if you have a +dual port adapter, or more than one adapter in your system, and want N virtual +functions per port, you must specify a number for each port with each parameter +separated by a comma. For example:: + + modprobe igb max_vfs=4 + +This will spawn 4 VFs on the first port. + +:: + + modprobe igb max_vfs=2,4 + +This will spawn 2 VFs on the first port and 4 VFs on the second port. + +NOTE: Caution must be used in loading the driver with these parameters. +Depending on your system configuration, number of slots, etc., it is impossible +to predict in all cases where the positions would be on the command line. + +NOTE: Neither the device nor the driver control how VFs are mapped into config +space. Bus layout will vary by operating system. On operating systems that +support it, you can check sysfs to find the mapping. + +NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering +and VLAN tag stripping/insertion will remain enabled. Please remove the old +VLAN filter before the new VLAN filter is added. For example:: + + ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0 + ip link set eth0 vf 0 vlan 0 // Delete vlan 100 + ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0 + +Debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + +This parameter adjusts the level debug messages displayed in the system logs. + + +Additional Features and Configurations +====================================== + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file: + +- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x> +- For SLES: /etc/sysconfig/network/<config_file> + +NOTE: The maximum MTU setting for Jumbo Frames is 9216. This value coincides +with the maximum Jumbo Frames size of 9234 bytes. + +NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in +poor performance or loss of link. + + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: + +https://www.kernel.org/pub/software/network/ethtool/ + + +Enabling Wake on LAN* (WoL) +--------------------------- +WoL is configured through the ethtool* utility. + +WoL will be enabled on the system during the next shut down or reboot. For +this driver version, in order to enable WoL, the igb driver must be loaded +prior to shutting down or suspending the system. + +NOTE: Wake on LAN is only supported on port A of multi-port devices. Also +Wake On LAN is not supported for the following device: +- Intel(R) Gigabit VT Quad Port Server Adapter + + +Multiqueue +---------- +In this mode, a separate MSI-X vector is allocated for each queue and one for +"other" interrupts such as link status change and errors. All interrupts are +throttled via interrupt moderation. Interrupt moderation must be used to avoid +interrupt storms while the driver is processing one interrupt. The moderation +value should be at least as large as the expected time for the driver to +process an interrupt. Multiqueue is off by default. + +REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not found, +the system will fallback to MSI or to Legacy interrupts. This driver supports +receive multiqueue on all kernels that support MSI-X. + +NOTE: On some kernels a reboot is required to switch between single queue mode +and multiqueue mode or vice-versa. + + +MAC and VLAN anti-spoofing feature +---------------------------------- +When a malicious driver attempts to send a spoofed packet, it is dropped by the +hardware and not transmitted. + +An interrupt is sent to the PF driver notifying it of the spoof attempt. When a +spoofed packet is detected, the PF driver will send the following message to +the system log (displayed by the "dmesg" command): +Spoof event(s) detected on VF(n), where n = the VF that attempted to do the +spoofing + + +Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool +------------------------------------------------------------ +You can set a MAC address of a Virtual Function (VF), a default VLAN and the +rate limit using the IProute2 tool. Download the latest version of the +IProute2 tool from Sourceforge if your version does not have all the features +you require. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/igb.txt b/Documentation/networking/igb.txt deleted file mode 100644 index f90643ef39c9..000000000000 --- a/Documentation/networking/igb.txt +++ /dev/null @@ -1,129 +0,0 @@ -Linux* Base Driver for Intel(R) Ethernet Network Connection -=========================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Support - -Identifying Your Adapter -======================== - -This driver supports all 82575, 82576 and 82580-based Intel (R) gigabit network -connections. - -For specific information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -Command Line Parameters -======================= - -The default value for each parameter is generally the recommended setting, -unless otherwise noted. - -max_vfs -------- -Valid Range: 0-7 -Default Value: 0 - -This parameter adds support for SR-IOV. It causes the driver to spawn up to -max_vfs worth of virtual function. - -Additional Configurations -========================= - - Jumbo Frames - ------------ - Jumbo Frames support is enabled by changing the MTU to a value larger than - the default of 1500. Use the ip command to increase the MTU size. - For example: - - ip link set dev eth<x> mtu 9000 - - This setting is not saved across reboots. - - Notes: - - - The maximum MTU setting for Jumbo Frames is 9216. This value coincides - with the maximum Jumbo Frames size of 9234 bytes. - - - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in - poor performance or loss of link. - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The latest - version of ethtool can be found at: - - https://www.kernel.org/pub/software/network/ethtool/ - - Enabling Wake on LAN* (WoL) - --------------------------- - WoL is configured through the ethtool* utility. - - For instructions on enabling WoL with ethtool, refer to the ethtool man page. - - WoL will be enabled on the system during the next shut down or reboot. - For this driver version, in order to enable WoL, the igb driver must be - loaded when shutting down or rebooting the system. - - Wake On LAN is only supported on port A of multi-port adapters. - - Wake On LAN is not supported for the Intel(R) Gigabit VT Quad Port Server - Adapter. - - Multiqueue - ---------- - In this mode, a separate MSI-X vector is allocated for each queue and one - for "other" interrupts such as link status change and errors. All - interrupts are throttled via interrupt moderation. Interrupt moderation - must be used to avoid interrupt storms while the driver is processing one - interrupt. The moderation value should be at least as large as the expected - time for the driver to process an interrupt. Multiqueue is off by default. - - REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not - found, the system will fallback to MSI or to Legacy interrupts. - - MAC and VLAN anti-spoofing feature - ---------------------------------- - When a malicious driver attempts to send a spoofed packet, it is dropped by - the hardware and not transmitted. An interrupt is sent to the PF driver - notifying it of the spoof attempt. - - When a spoofed packet is detected the PF driver will send the following - message to the system log (displayed by the "dmesg" command): - - Spoof event(s) detected on VF(n) - - Where n=the VF that attempted to do the spoofing. - - Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool - ------------------------------------------------------------ - You can set a MAC address of a Virtual Function (VF), a default VLAN and the - rate limit using the IProute2 tool. Download the latest version of the - iproute2 tool from Sourceforge if your version does not have all the - features you require. - - -Support -======= - -For general information, go to the Intel support website at: - - www.intel.com/support/ - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/igbvf.rst b/Documentation/networking/igbvf.rst new file mode 100644 index 000000000000..a8a9ffa4f8d3 --- /dev/null +++ b/Documentation/networking/igbvf.rst @@ -0,0 +1,64 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Virtual Function Driver for Intel(R) 1G Ethernet +============================================================ + +Intel Gigabit Virtual Function Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== +- Identifying Your Adapter +- Additional Configurations +- Support + +This driver supports Intel 82576-based virtual function devices-based virtual +function devices that can only be activated on kernels that support SR-IOV. + +SR-IOV requires the correct platform and OS support. + +The guest OS loading this driver must support MSI-X interrupts. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + +Driver information can be obtained using ethtool, lspci, and ifconfig. +Instructions on updating ethtool can be found in the section Additional +Configurations later in this document. + +NOTE: There is a limit of a total of 32 shared VLANs to 1 or more VFs. + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Additional Features and Configurations +====================================== + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: + +https://www.kernel.org/pub/software/network/ethtool/ + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/igbvf.txt b/Documentation/networking/igbvf.txt deleted file mode 100644 index bd404735fb46..000000000000 --- a/Documentation/networking/igbvf.txt +++ /dev/null @@ -1,80 +0,0 @@ -Linux* Base Driver for Intel(R) Ethernet Network Connection -=========================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Support - -This file describes the igbvf Linux* Base Driver for Intel Network Connection. - -The igbvf driver supports 82576-based virtual function devices that can only -be activated on kernels that support SR-IOV. SR-IOV requires the correct -platform and OS support. - -The igbvf driver requires the igb driver, version 2.0 or later. The igbvf -driver supports virtual functions generated by the igb driver with a max_vfs -value of 1 or greater. For more information on the max_vfs parameter refer -to the README included with the igb driver. - -The guest OS loading the igbvf driver must support MSI-X interrupts. - -This driver is only supported as a loadable module at this time. Intel is -not supplying patches against the kernel source to allow for static linking -of the driver. For questions related to hardware requirements, refer to the -documentation supplied with your Intel Gigabit adapter. All hardware -requirements listed apply to use with Linux. - -Instructions on updating ethtool can be found in the section "Additional -Configurations" later in this document. - -VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs. - -Identifying Your Adapter -======================== - -The igbvf driver supports 82576-based virtual function devices that can only -be activated on kernels that support SR-IOV. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -For the latest Intel network drivers for Linux, refer to the following -website. In the search field, enter your adapter name or type, or use the -networking link on the left to search for your adapter: - - http://downloadcenter.intel.com/scripts-df-external/Support_Intel.aspx - -Additional Configurations -========================= - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The ethtool - version 3.0 or later is required for this functionality, although we - strongly recommend downloading the latest version at: - - https://www.kernel.org/pub/software/network/ethtool/ - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index fcd710f2cc7a..bd89dae8d578 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -14,6 +14,16 @@ Contents: dpaa2/index e100 e1000 + e1000e + fm10k + igb + igbvf + ixgb + ixgbe + ixgbevf + i40e + iavf + ice kapi z8530book msg_zerocopy diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 960de8fe3f40..163b5ff1073c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1442,6 +1442,14 @@ max_hbh_length - INTEGER header. Default: INT_MAX (unlimited) +skip_notify_on_dev_down - BOOLEAN + Controls whether an RTM_DELROUTE message is generated for routes + removed when a device is taken down or deleted. IPv4 does not + generate this message; IPv6 does by default. Setting this sysctl + to true skips the message, making IPv4 and IPv6 on par in relying + on userspace caches to track link events and evict routes. + Default: false (generate message) + IPv6 Fragmentation: ip6frag_high_thresh - INTEGER diff --git a/Documentation/networking/ixgb.rst b/Documentation/networking/ixgb.rst new file mode 100644 index 000000000000..8bd80e27843d --- /dev/null +++ b/Documentation/networking/ixgb.rst @@ -0,0 +1,467 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection +===================================================================== + +October 1, 2018 + + +Contents +======== + +- In This Release +- Identifying Your Adapter +- Command Line Parameters +- Improving Performance +- Additional Configurations +- Known Issues/Troubleshooting +- Support + + + +In This Release +=============== + +This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) +Network Connection. This driver includes support for Itanium(R)2-based +systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your 10 Gigabit adapter. All hardware requirements listed apply +to use with Linux. + +The following features are available in this kernel: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + +The driver information previously displayed in the /proc filesystem is not +supported in this release. Alternatively, you can use ethtool (version 1.6 +or later), lspci, and iproute2 to obtain the same information. + +Instructions on updating ethtool can be found in the section "Additional +Configurations" later in this document. + + +Identifying Your Adapter +======================== + +The following Intel network adapters are compatible with the drivers in this +release: + ++------------+------------------------------+----------------------------------+ +| Controller | Adapter Name | Physical Layer | ++============+==============================+==================================+ +| 82597EX | Intel(R) PRO/10GbE LR/SR/CX4 | - 10G Base-LR (fiber) | +| | Server Adapters | - 10G Base-SR (fiber) | +| | | - 10G Base-CX4 (copper) | ++------------+------------------------------+----------------------------------+ + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + https://support.intel.com + + +Command Line Parameters +======================= + +If the driver is built as a module, the following optional parameters are +used by entering them on the command line with the modprobe command using +this syntax:: + + modprobe ixgb [<option>=<VAL1>,<VAL2>,...] + +For example, with two 10GbE PCI adapters, entering:: + + modprobe ixgb TxDescriptors=80,128 + +loads the ixgb driver with 80 TX resources for the first adapter and 128 TX +resources for the second adapter. + +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +Copybreak +--------- +:Valid Range: 0-XXXX +:Default Value: 256 + + This is the maximum size of packet that is copied to a new buffer on + receive. + +Debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + + This parameter adjusts the level of debug messages displayed in the + system logs. + +FlowControl +----------- +:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) +:Default Value: 1 if no EEPROM, otherwise read from EEPROM + + This parameter controls the automatic generation(Tx) and response(Rx) to + Ethernet PAUSE frames. There are hardware bugs associated with enabling + Tx flow control so beware. + +RxDescriptors +------------- +:Valid Range: 64-4096 +:Default Value: 1024 + + This value is the number of receive descriptors allocated by the driver. + Increasing this value allows the driver to buffer more incoming packets. + Each descriptor is 16 bytes. A receive buffer is also allocated for + each descriptor and can be either 2048, 4056, 8192, or 16384 bytes, + depending on the MTU setting. When the MTU size is 1500 or less, the + receive buffer size is 2048 bytes. When the MTU is greater than 1500 the + receive buffer size will be either 4056, 8192, or 16384 bytes. The + maximum MTU size is 16114. + +TxDescriptors +------------- +:Valid Range: 64-4096 +:Default Value: 256 + + This value is the number of transmit descriptors allocated by the driver. + Increasing this value allows the driver to queue more transmits. Each + descriptor is 16 bytes. + +RxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 72 + + This value delays the generation of receive interrupts in units of + 0.8192 microseconds. Receive interrupt reduction can improve CPU + efficiency if properly tuned for specific network traffic. Increasing + this value adds extra latency to frame reception and can end up + decreasing the throughput of TCP traffic. If the system is reporting + dropped receives, this value may be set too high, causing the driver to + run out of available receive descriptors. + +TxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 32 + + This value delays the generation of transmit interrupts in units of + 0.8192 microseconds. Transmit interrupt reduction can improve CPU + efficiency if properly tuned for specific network traffic. Increasing + this value adds extra latency to frame transmission and can end up + decreasing the throughput of TCP traffic. If this value is set too high, + it will cause the driver to run out of available transmit descriptors. + +XsumRX +------ +:Valid Range: 0-1 +:Default Value: 1 + + A value of '1' indicates that the driver should enable IP checksum + offload for received packets (both UDP and TCP) to the adapter hardware. + +RxFCHighThresh +-------------- +:Valid Range: 1,536-262,136 (0x600 - 0x3FFF8, 8 byte granularity) +:Default Value: 196,608 (0x30000) + + Receive Flow control high threshold (when we send a pause frame) + +RxFCLowThresh +------------- +:Valid Range: 64-262,136 (0x40 - 0x3FFF8, 8 byte granularity) +:Default Value: 163,840 (0x28000) + + Receive Flow control low threshold (when we send a resume frame) + +FCReqTimeout +------------ +:Valid Range: 1-65535 +:Default Value: 65535 + + Flow control request timeout (how long to pause the link partner's tx) + +IntDelayEnable +-------------- +:Value Range: 0,1 +:Default Value: 1 + + Interrupt Delay, 0 disables transmit interrupt delay and 1 enables it. + + +Improving Performance +===================== + +With the 10 Gigabit server adapters, the default Linux configuration will +very likely limit the total available throughput artificially. There is a set +of configuration changes that, when applied together, will increase the ability +of Linux to transmit and receive data. The following enhancements were +originally acquired from settings published at http://www.spec.org/web99/ for +various submitted results using Linux. + +NOTE: + These changes are only suggestions, and serve as a starting point for + tuning your network performance. + +The changes are made in three major ways, listed in order of greatest effect: + +- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen + parameter. +- Use sysctl to modify /proc parameters (essentially kernel tuning) +- Use setpci to modify the MMRBC field in PCI-X configuration space to increase + transmit burst lengths on the bus. + +NOTE: + setpci modifies the adapter's configuration registers to allow it to read + up to 4k bytes at a time (for transmits). However, for some systems the + behavior after modifying this register may be undefined (possibly errors of + some kind). A power-cycle, hard reset or explicitly setting the e6 register + back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a + stable configuration. + +- COPY these lines and paste them into ixgb_perf.sh: + +:: + + #!/bin/bash + echo "configuring network performance , edit this file to change the interface + or device ID of 10GbE card" + # set mmrbc to 4k reads, modify only Intel 10GbE device IDs + # replace 1a48 with appropriate 10GbE device's ID installed on the system, + # if needed. + setpci -d 8086:1a48 e6.b=2e + # set the MTU (max transmission unit) - it requires your switch and clients + # to change as well. + # set the txqueuelen + # your ixgb adapter should be loaded as eth1 for this to work, change if needed + ip li set dev eth1 mtu 9000 txqueuelen 1000 up + # call the sysctl utility to modify /proc/sys entries + sysctl -p ./sysctl_ixgb.conf + +- COPY these lines and paste them into sysctl_ixgb.conf: + +:: + + # some of the defaults may be different for your kernel + # call this file with sysctl -p <this file> + # these are just suggested values that worked well to increase throughput in + # several network benchmark tests, your mileage may vary + + ### IPV4 specific settings + # turn TCP timestamp support off, default 1, reduces CPU use + net.ipv4.tcp_timestamps = 0 + # turn SACK support off, default on + # on systems with a VERY fast bus -> memory interface this is the big gainer + net.ipv4.tcp_sack = 0 + # set min/default/max TCP read buffer, default 4096 87380 174760 + net.ipv4.tcp_rmem = 10000000 10000000 10000000 + # set min/pressure/max TCP write buffer, default 4096 16384 131072 + net.ipv4.tcp_wmem = 10000000 10000000 10000000 + # set min/pressure/max TCP buffer space, default 31744 32256 32768 + net.ipv4.tcp_mem = 10000000 10000000 10000000 + + ### CORE settings (mostly for socket and UDP effect) + # set maximum receive socket buffer size, default 131071 + net.core.rmem_max = 524287 + # set maximum send socket buffer size, default 131071 + net.core.wmem_max = 524287 + # set default receive socket buffer size, default 65535 + net.core.rmem_default = 524287 + # set default send socket buffer size, default 65535 + net.core.wmem_default = 524287 + # set maximum amount of option memory buffers, default 10240 + net.core.optmem_max = 524287 + # set number of unprocessed input packets before kernel starts dropping them; default 300 + net.core.netdev_max_backlog = 300000 + +Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface +your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's +ID installed on the system. + +NOTE: + Unless these scripts are added to the boot process, these changes will + only last only until the next system reboot. + + +Resolving Slow UDP Traffic +-------------------------- +If your server does not seem to be able to receive UDP traffic as fast as it +can receive TCP traffic, it could be because Linux, by default, does not set +the network stack buffers as large as they need to be to support high UDP +transfer rates. One way to alleviate this problem is to allow more memory to +be used by the IP stack to store incoming data. + +For instance, use the commands:: + + sysctl -w net.core.rmem_max=262143 + +and:: + + sysctl -w net.core.rmem_default=262143 + +to increase the read buffer memory max and default to 262143 (256k - 1) from +defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables +will increase the amount of memory used by the network stack for receives, and +can be increased significantly more if necessary for your application. + + +Additional Configurations +========================= + +Configuring the Driver on Different Distributions +------------------------------------------------- +Configuring a network driver to load properly when the system is started is +distribution dependent. Typically, the configuration process involves adding +an alias line to /etc/modprobe.conf as well as editing other system startup +scripts and/or configuration files. Many popular Linux distributions ship +with tools to make these changes for you. To learn the proper way to +configure a network device for your system, refer to your distribution +documentation. If during this process you are asked for the driver or module +name, the name for the Linux Base Driver for the Intel 10GbE Family of +Adapters is ixgb. + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +Jumbo Frames +------------ +The driver supports Jumbo Frames for all adapters. Jumbo Frames support is +enabled by changing the MTU to a value larger than the default of 1500. +The maximum value for the MTU is 16114. Use the ip command to +increase the MTU size. For example:: + + ip li set dev ethx mtu 9000 + +The maximum MTU setting for Jumbo Frames is 16114. This value coincides +with the maximum Jumbo Frames size of 16128. + +Ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The ethtool +version 1.6 or later is required for this functionality. + +The latest release of ethtool can be found from +https://www.kernel.org/pub/software/network/ethtool/ + +NOTE: + The ethtool version 1.6 only supports a limited set of ethtool options. + Support for a more complete ethtool feature set can be enabled by + upgrading to the latest version. + +NAPI +---- +NAPI (Rx polling mode) is supported in the ixgb driver. + +See https://wiki.linuxfoundation.org/networking/napi for more information on +NAPI. + + +Known Issues/Troubleshooting +============================ + +NOTE: + After installing the driver, if your Intel Network Connection is not + working, verify in the "In This Release" section of the readme that you have + installed the correct driver. + +Cable Interoperability Issue with Fujitsu XENPAK Module in SmartBits Chassis +---------------------------------------------------------------------------- +Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 +Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits +chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni. +The CRC errors may be received either by the Intel(R) PRO/10GbE CX4 +Server adapter or the SmartBits. If this situation occurs using a different +cable assembly may resolve the issue. + +Cable Interoperability Issues with HP Procurve 3400cl Switch Port +----------------------------------------------------------------- +Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server +adapter is connected to an HP Procurve 3400cl switch port using short cables +(1 m or shorter). If this situation occurs, using a longer cable may resolve +the issue. + +Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that +Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC +errors may be received either by the CX4 Server adapter or at the switch. If +this situation occurs, using a different cable assembly may resolve the issue. + +Jumbo Frames System Requirement +------------------------------- +Memory allocation failures have been observed on Linux systems with 64 MB +of RAM or less that are running Jumbo Frames. If you are using Jumbo +Frames, your system may require more than the advertised minimum +requirement of 64 MB of system memory. + +Performance Degradation with Jumbo Frames +----------------------------------------- +Degradation in throughput performance may be observed in some Jumbo frames +environments. If this is observed, increasing the application's socket buffer +size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. +See the specific application manual and /usr/src/linux*/Documentation/ +networking/ip-sysctl.txt for more details. + +Allocating Rx Buffers when Using Jumbo Frames +--------------------------------------------- +Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if +the available memory is heavily fragmented. This issue may be seen with PCI-X +adapters or with packet split disabled. This can be reduced or eliminated +by changing the amount of available memory for receive buffer allocation, by +increasing /proc/sys/vm/min_free_kbytes. + +Multiple Interfaces on Same Ethernet Broadcast Network +------------------------------------------------------ +Due to the default ARP behavior on Linux, it is not possible to have +one system on two IP networks in the same Ethernet broadcast domain +(non-partitioned switch) behave as expected. All Ethernet interfaces +will respond to IP traffic for any IP address assigned to the system. +This results in unbalanced receive traffic. + +If you have multiple interfaces in a server, do either of the following: + + - Turn on ARP filtering by entering:: + + echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter + + - Install the interfaces in separate broadcast domains - either in + different switches or in a switch partitioned to VLANs. + +UDP Stress Test Dropped Packet Issue +-------------------------------------- +Under small packets UDP stress test with 10GbE driver, the Linux system +may drop UDP packets due to the fullness of socket buffers. You may want +to change the driver's Flow Control variables to the minimum value for +controlling packet reception. + +Tx Hangs Possible Under Stress +------------------------------ +Under stress conditions, if TX hangs occur, turning off TSO +"ethtool -K eth0 tso off" may resolve the problem. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt deleted file mode 100644 index 09f71d71920a..000000000000 --- a/Documentation/networking/ixgb.txt +++ /dev/null @@ -1,433 +0,0 @@ -Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection -===================================================================== - -March 14, 2011 - - -Contents -======== - -- In This Release -- Identifying Your Adapter -- Building and Installation -- Command Line Parameters -- Improving Performance -- Additional Configurations -- Known Issues/Troubleshooting -- Support - - - -In This Release -=============== - -This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) -Network Connection. This driver includes support for Itanium(R)2-based -systems. - -For questions related to hardware requirements, refer to the documentation -supplied with your 10 Gigabit adapter. All hardware requirements listed apply -to use with Linux. - -The following features are available in this kernel: - - Native VLANs - - Channel Bonding (teaming) - - SNMP - -Channel Bonding documentation can be found in the Linux kernel source: -/Documentation/networking/bonding.txt - -The driver information previously displayed in the /proc filesystem is not -supported in this release. Alternatively, you can use ethtool (version 1.6 -or later), lspci, and iproute2 to obtain the same information. - -Instructions on updating ethtool can be found in the section "Additional -Configurations" later in this document. - - -Identifying Your Adapter -======================== - -The following Intel network adapters are compatible with the drivers in this -release: - -Controller Adapter Name Physical Layer ----------- ------------ -------------- -82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber) - Server Adapters 10G Base-SR (850 nm optical fiber) - 10G Base-CX4(twin-axial copper cabling) - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/network/sb/CS-012904.htm - - -Building and Installation -========================= - -select m for "Intel(R) PRO/10GbE support" located at: - Location: - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) -1. make modules && make modules_install - -2. Load the module: - - modprobe ixgb <parameter>=<value> - - The insmod command can be used if the full - path to the driver module is specified. For example: - - insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/ixgb/ixgb.ko - - With 2.6 based kernels also make sure that older ixgb drivers are - removed from the kernel, before loading the new module: - - rmmod ixgb; modprobe ixgb - -3. Assign an IP address to the interface by entering the following, where - x is the interface number: - - ip addr add ethx <IP_address> - -4. Verify that the interface works. Enter the following, where <IP_address> - is the IP address for another machine on the same subnet as the interface - that is being tested: - - ping <IP_address> - - -Command Line Parameters -======================= - -If the driver is built as a module, the following optional parameters are -used by entering them on the command line with the modprobe command using -this syntax: - - modprobe ixgb [<option>=<VAL1>,<VAL2>,...] - -For example, with two 10GbE PCI adapters, entering: - - modprobe ixgb TxDescriptors=80,128 - -loads the ixgb driver with 80 TX resources for the first adapter and 128 TX -resources for the second adapter. - -The default value for each parameter is generally the recommended setting, -unless otherwise noted. - -FlowControl -Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) -Default: Read from the EEPROM - If EEPROM is not detected, default is 1 - This parameter controls the automatic generation(Tx) and response(Rx) to - Ethernet PAUSE frames. There are hardware bugs associated with enabling - Tx flow control so beware. - -RxDescriptors -Valid Range: 64-512 -Default Value: 512 - This value is the number of receive descriptors allocated by the driver. - Increasing this value allows the driver to buffer more incoming packets. - Each descriptor is 16 bytes. A receive buffer is also allocated for - each descriptor and can be either 2048, 4056, 8192, or 16384 bytes, - depending on the MTU setting. When the MTU size is 1500 or less, the - receive buffer size is 2048 bytes. When the MTU is greater than 1500 the - receive buffer size will be either 4056, 8192, or 16384 bytes. The - maximum MTU size is 16114. - -RxIntDelay -Valid Range: 0-65535 (0=off) -Default Value: 72 - This value delays the generation of receive interrupts in units of - 0.8192 microseconds. Receive interrupt reduction can improve CPU - efficiency if properly tuned for specific network traffic. Increasing - this value adds extra latency to frame reception and can end up - decreasing the throughput of TCP traffic. If the system is reporting - dropped receives, this value may be set too high, causing the driver to - run out of available receive descriptors. - -TxDescriptors -Valid Range: 64-4096 -Default Value: 256 - This value is the number of transmit descriptors allocated by the driver. - Increasing this value allows the driver to queue more transmits. Each - descriptor is 16 bytes. - -XsumRX -Valid Range: 0-1 -Default Value: 1 - A value of '1' indicates that the driver should enable IP checksum - offload for received packets (both UDP and TCP) to the adapter hardware. - - -Improving Performance -===================== - -With the 10 Gigabit server adapters, the default Linux configuration will -very likely limit the total available throughput artificially. There is a set -of configuration changes that, when applied together, will increase the ability -of Linux to transmit and receive data. The following enhancements were -originally acquired from settings published at http://www.spec.org/web99/ for -various submitted results using Linux. - -NOTE: These changes are only suggestions, and serve as a starting point for - tuning your network performance. - -The changes are made in three major ways, listed in order of greatest effect: -- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen - parameter. -- Use sysctl to modify /proc parameters (essentially kernel tuning) -- Use setpci to modify the MMRBC field in PCI-X configuration space to increase - transmit burst lengths on the bus. - -NOTE: setpci modifies the adapter's configuration registers to allow it to read -up to 4k bytes at a time (for transmits). However, for some systems the -behavior after modifying this register may be undefined (possibly errors of -some kind). A power-cycle, hard reset or explicitly setting the e6 register -back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a -stable configuration. - -- COPY these lines and paste them into ixgb_perf.sh: -#!/bin/bash -echo "configuring network performance , edit this file to change the interface -or device ID of 10GbE card" -# set mmrbc to 4k reads, modify only Intel 10GbE device IDs -# replace 1a48 with appropriate 10GbE device's ID installed on the system, -# if needed. -setpci -d 8086:1a48 e6.b=2e -# set the MTU (max transmission unit) - it requires your switch and clients -# to change as well. -# set the txqueuelen -# your ixgb adapter should be loaded as eth1 for this to work, change if needed -ip li set dev eth1 mtu 9000 txqueuelen 1000 up -# call the sysctl utility to modify /proc/sys entries -sysctl -p ./sysctl_ixgb.conf -- END ixgb_perf.sh - -- COPY these lines and paste them into sysctl_ixgb.conf: -# some of the defaults may be different for your kernel -# call this file with sysctl -p <this file> -# these are just suggested values that worked well to increase throughput in -# several network benchmark tests, your mileage may vary - -### IPV4 specific settings -# turn TCP timestamp support off, default 1, reduces CPU use -net.ipv4.tcp_timestamps = 0 -# turn SACK support off, default on -# on systems with a VERY fast bus -> memory interface this is the big gainer -net.ipv4.tcp_sack = 0 -# set min/default/max TCP read buffer, default 4096 87380 174760 -net.ipv4.tcp_rmem = 10000000 10000000 10000000 -# set min/pressure/max TCP write buffer, default 4096 16384 131072 -net.ipv4.tcp_wmem = 10000000 10000000 10000000 -# set min/pressure/max TCP buffer space, default 31744 32256 32768 -net.ipv4.tcp_mem = 10000000 10000000 10000000 - -### CORE settings (mostly for socket and UDP effect) -# set maximum receive socket buffer size, default 131071 -net.core.rmem_max = 524287 -# set maximum send socket buffer size, default 131071 -net.core.wmem_max = 524287 -# set default receive socket buffer size, default 65535 -net.core.rmem_default = 524287 -# set default send socket buffer size, default 65535 -net.core.wmem_default = 524287 -# set maximum amount of option memory buffers, default 10240 -net.core.optmem_max = 524287 -# set number of unprocessed input packets before kernel starts dropping them; default 300 -net.core.netdev_max_backlog = 300000 -- END sysctl_ixgb.conf - -Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface -your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's -ID installed on the system. - -NOTE: Unless these scripts are added to the boot process, these changes will - only last only until the next system reboot. - - -Resolving Slow UDP Traffic --------------------------- -If your server does not seem to be able to receive UDP traffic as fast as it -can receive TCP traffic, it could be because Linux, by default, does not set -the network stack buffers as large as they need to be to support high UDP -transfer rates. One way to alleviate this problem is to allow more memory to -be used by the IP stack to store incoming data. - -For instance, use the commands: - sysctl -w net.core.rmem_max=262143 -and - sysctl -w net.core.rmem_default=262143 -to increase the read buffer memory max and default to 262143 (256k - 1) from -defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables -will increase the amount of memory used by the network stack for receives, and -can be increased significantly more if necessary for your application. - - -Additional Configurations -========================= - - Configuring the Driver on Different Distributions - ------------------------------------------------- - Configuring a network driver to load properly when the system is started is - distribution dependent. Typically, the configuration process involves adding - an alias line to /etc/modprobe.conf as well as editing other system startup - scripts and/or configuration files. Many popular Linux distributions ship - with tools to make these changes for you. To learn the proper way to - configure a network device for your system, refer to your distribution - documentation. If during this process you are asked for the driver or module - name, the name for the Linux Base Driver for the Intel 10GbE Family of - Adapters is ixgb. - - Viewing Link Messages - --------------------- - Link messages will not be displayed to the console if the distribution is - restricting system messages. In order to see network driver link messages on - your console, set dmesg to eight by entering the following: - - dmesg -n 8 - - NOTE: This setting is not saved across reboots. - - - Jumbo Frames - ------------ - The driver supports Jumbo Frames for all adapters. Jumbo Frames support is - enabled by changing the MTU to a value larger than the default of 1500. - The maximum value for the MTU is 16114. Use the ip command to - increase the MTU size. For example: - - ip li set dev ethx mtu 9000 - - The maximum MTU setting for Jumbo Frames is 16114. This value coincides - with the maximum Jumbo Frames size of 16128. - - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The ethtool - version 1.6 or later is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool/ - - NOTE: The ethtool version 1.6 only supports a limited set of ethtool options. - Support for a more complete ethtool feature set can be enabled by - upgrading to the latest version. - - - NAPI - ---- - - NAPI (Rx polling mode) is supported in the ixgb driver. NAPI is enabled - or disabled based on the configuration of the kernel. see CONFIG_IXGB_NAPI - - See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. - - -Known Issues/Troubleshooting -============================ - - NOTE: After installing the driver, if your Intel Network Connection is not - working, verify in the "In This Release" section of the readme that you have - installed the correct driver. - - Intel(R) PRO/10GbE CX4 Server Adapter Cable Interoperability Issue with - Fujitsu XENPAK Module in SmartBits Chassis - --------------------------------------------------------------------- - Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 - Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits - chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni. - The CRC errors may be received either by the Intel(R) PRO/10GbE CX4 - Server adapter or the SmartBits. If this situation occurs using a different - cable assembly may resolve the issue. - - CX4 Server Adapter Cable Interoperability Issues with HP Procurve 3400cl - Switch Port - ------------------------------------------------------------------------ - Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server - adapter is connected to an HP Procurve 3400cl switch port using short cables - (1 m or shorter). If this situation occurs, using a longer cable may resolve - the issue. - - Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that - Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC - errors may be received either by the CX4 Server adapter or at the switch. If - this situation occurs, using a different cable assembly may resolve the issue. - - - Jumbo Frames System Requirement - ------------------------------- - Memory allocation failures have been observed on Linux systems with 64 MB - of RAM or less that are running Jumbo Frames. If you are using Jumbo - Frames, your system may require more than the advertised minimum - requirement of 64 MB of system memory. - - - Performance Degradation with Jumbo Frames - ----------------------------------------- - Degradation in throughput performance may be observed in some Jumbo frames - environments. If this is observed, increasing the application's socket buffer - size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. - See the specific application manual and /usr/src/linux*/Documentation/ - networking/ip-sysctl.txt for more details. - - - Allocating Rx Buffers when Using Jumbo Frames - --------------------------------------------- - Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if - the available memory is heavily fragmented. This issue may be seen with PCI-X - adapters or with packet split disabled. This can be reduced or eliminated - by changing the amount of available memory for receive buffer allocation, by - increasing /proc/sys/vm/min_free_kbytes. - - - Multiple Interfaces on Same Ethernet Broadcast Network - ------------------------------------------------------ - Due to the default ARP behavior on Linux, it is not possible to have - one system on two IP networks in the same Ethernet broadcast domain - (non-partitioned switch) behave as expected. All Ethernet interfaces - will respond to IP traffic for any IP address assigned to the system. - This results in unbalanced receive traffic. - - If you have multiple interfaces in a server, do either of the following: - - - Turn on ARP filtering by entering: - echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter - - - Install the interfaces in separate broadcast domains - either in - different switches or in a switch partitioned to VLANs. - - - UDP Stress Test Dropped Packet Issue - -------------------------------------- - Under small packets UDP stress test with 10GbE driver, the Linux system - may drop UDP packets due to the fullness of socket buffers. You may want - to change the driver's Flow Control variables to the minimum value for - controlling packet reception. - - - Tx Hangs Possible Under Stress - ------------------------------ - Under stress conditions, if TX hangs occur, turning off TSO - "ethtool -K eth0 tso off" may resolve the problem. - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgbe.rst b/Documentation/networking/ixgbe.rst new file mode 100644 index 000000000000..725fc697fd8f --- /dev/null +++ b/Documentation/networking/ixgbe.rst @@ -0,0 +1,527 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters +============================================================================= + +Intel 10 Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Known Issues +- Support + +Identifying Your Adapter +======================== +The driver is compatible with devices based on the following: + + * Intel(R) Ethernet Controller 82598 + * Intel(R) Ethernet Controller 82599 + * Intel(R) Ethernet Controller X520 + * Intel(R) Ethernet Controller X540 + * Intel(R) Ethernet Controller x550 + * Intel(R) Ethernet Controller X552 + * Intel(R) Ethernet Controller X553 + +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + +SFP+ Devices with Pluggable Optics +---------------------------------- + +82599-BASED ADAPTERS +~~~~~~~~~~~~~~~~~~~~ +NOTES: +- If your 82599-based Intel(R) Network Adapter came with Intel optics or is an +Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel optics +and/or the direct attach cables listed below. +- When 82599-based SFP+ devices are connected back to back, they should be set +to the same Speed setting via ethtool. Results may vary if you mix speed +settings. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| SR Modules | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | FTLX8571D3BCV-IT | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDZ-IN2 | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| LR Modules | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | FTLX1471D3BCV-IT | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDZ-IN2 | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDDZ-IN1 | ++---------------+---------------------------------------+------------------+ + +The following is a list of 3rd party SFP+ modules that have received some +testing. Not all modules are applicable to all devices. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL | ++---------------+---------------------------------------+------------------+ +| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ | ++---------------+---------------------------------------+------------------+ +| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL | ++---------------+---------------------------------------+------------------+ +| Finisar | DUAL RATE 1G/10G SFP+ SR (No Bail) | FTLX8571D3QCV-IT | ++---------------+---------------------------------------+------------------+ +| Avago | DUAL RATE 1G/10G SFP+ SR (No Bail) | AFBR-703SDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| Finisar | DUAL RATE 1G/10G SFP+ LR (No Bail) | FTLX1471D3QCV-IT | ++---------------+---------------------------------------+------------------+ +| Avago | DUAL RATE 1G/10G SFP+ LR (No Bail) | AFCT-701SDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| Finisar | 1000BASE-T SFP | FCLF8522P2BTL | ++---------------+---------------------------------------+------------------+ +| Avago | 1000BASE-T | ABCU-5710RZ | ++---------------+---------------------------------------+------------------+ +| HP | 1000BASE-SX SFP | 453153-001 | ++---------------+---------------------------------------+------------------+ + +82599-based adapters support all passive and active limiting direct attach +cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. + +Laser turns off for SFP+ when ifconfig ethX down +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +"ifconfig ethX down" turns off the laser for 82599-based SFP+ fiber adapters. +"ifconfig ethX up" turns on the laser. +Alternatively, you can use "ip link set [down/up] dev ethX" to turn the +laser off and on. + + +82599-based QSFP+ Adapters +~~~~~~~~~~~~~~~~~~~~~~~~~~ +NOTES: +- If your 82599-based Intel(R) Network Adapter came with Intel optics, it only +supports Intel optics. +- 82599-based QSFP+ adapters only support 4x10 Gbps connections. 1x40 Gbps +connections are not supported. QSFP+ link partners must be configured for +4x10 Gbps. +- 82599-based QSFP+ adapters do not support automatic link speed detection. +The link speed must be configured to either 10 Gbps or 1 Gbps to match the link +partners speed capabilities. Incorrect speed configurations will result in +failure to link. +- Intel(R) Ethernet Converged Network Adapter X520-Q1 only supports the optics +and direct attach cables listed below. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| Intel | DUAL RATE 1G/10G QSFP+ SRL (bailed) | E10GQSFPSR | ++---------------+---------------------------------------+------------------+ + +82599-based QSFP+ adapters support all passive and active limiting QSFP+ +direct attach cables that comply with SFF-8436 v4.1 specifications. + +82598-BASED ADAPTERS +~~~~~~~~~~~~~~~~~~~~ +NOTES: +- Intel(r) Ethernet Network Adapters that support removable optical modules +only support their original module type (for example, the Intel(R) 10 Gigabit +SR Dual Port Express Module only supports SR optical modules). If you plug in +a different type of module, the driver will not load. +- Hot Swapping/hot plugging optical modules is not supported. +- Only single speed, 10 gigabit modules are supported. +- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module +types are not supported. Please see your system documentation for details. + +The following is a list of SFP+ modules and direct attach cables that have +received some testing. Not all modules are applicable to all devices. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL | ++---------------+---------------------------------------+------------------+ +| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ | ++---------------+---------------------------------------+------------------+ +| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL | ++---------------+---------------------------------------+------------------+ + +82598-based adapters support all passive direct attach cables that comply with +SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach cables +are not supported. + +Third party optic modules and cables referred to above are listed only for the +purpose of highlighting third party specifications and potential +compatibility, and are not recommendations or endorsements or sponsorship of +any third party's product by Intel. Intel is not endorsing or promoting +products made by any third party and the third party reference is provided +only to share information regarding certain optic modules and cables with the +above specifications. There may be other manufacturers or suppliers, producing +or supplying optic modules and cables with similar or matching descriptions. +Customers must use their own discretion and diligence to purchase optic +modules and cables from any third party of their choice. Customers are solely +responsible for assessing the suitability of the product and/or devices and +for the selection of the vendor for purchasing any product. THE OPTIC MODULES +AND CABLES REFERRED TO ABOVE ARE NOT WARRANTED OR SUPPORTED BY INTEL. INTEL +ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED +WARRANTY, RELATING TO SALE AND/OR USE OF SUCH THIRD PARTY PRODUCTS OR +SELECTION OF VENDOR BY CUSTOMERS. + +Command Line Parameters +======================= + +max_vfs +------- +:Valid Range: 1-63 + +This parameter adds support for SR-IOV. It causes the driver to spawn up to +max_vfs worth of virtual functions. +If the value is greater than 0 it will also force the VMDq parameter to be 1 or +more. + +NOTE: This parameter is only used on kernel 3.7.x and below. On kernel 3.8.x +and above, use sysfs to enable VFs. Also, for Red Hat distributions, this +parameter is only used on version 6.6 and older. For version 6.7 and newer, use +sysfs. For example:: + + #echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs // enable VFs + #echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable VFs + +The parameters for the driver are referenced by position. Thus, if you have a +dual port adapter, or more than one adapter in your system, and want N virtual +functions per port, you must specify a number for each port with each parameter +separated by a comma. For example:: + + modprobe ixgbe max_vfs=4 + +This will spawn 4 VFs on the first port. + +:: + + modprobe ixgbe max_vfs=2,4 + +This will spawn 2 VFs on the first port and 4 VFs on the second port. + +NOTE: Caution must be used in loading the driver with these parameters. +Depending on your system configuration, number of slots, etc., it is impossible +to predict in all cases where the positions would be on the command line. + +NOTE: Neither the device nor the driver control how VFs are mapped into config +space. Bus layout will vary by operating system. On operating systems that +support it, you can check sysfs to find the mapping. + +NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering +and VLAN tag stripping/insertion will remain enabled. Please remove the old +VLAN filter before the new VLAN filter is added. For example, + +:: + + ip link set eth0 vf 0 vlan 100 // set VLAN 100 for VF 0 + ip link set eth0 vf 0 vlan 0 // Delete VLAN 100 + ip link set eth0 vf 0 vlan 200 // set a new VLAN 200 for VF 0 + +With kernel 3.6, the driver supports the simultaneous usage of max_vfs and DCB +features, subject to the constraints described below. Prior to kernel 3.6, the +driver did not support the simultaneous operation of max_vfs greater than 0 and +the DCB features (multiple traffic classes utilizing Priority Flow Control and +Extended Transmission Selection). + +When DCB is enabled, network traffic is transmitted and received through +multiple traffic classes (packet buffers in the NIC). The traffic is associated +with a specific class based on priority, which has a value of 0 through 7 used +in the VLAN tag. When SR-IOV is not enabled, each traffic class is associated +with a set of receive/transmit descriptor queue pairs. The number of queue +pairs for a given traffic class depends on the hardware configuration. When +SR-IOV is enabled, the descriptor queue pairs are grouped into pools. The +Physical Function (PF) and each Virtual Function (VF) is allocated a pool of +receive/transmit descriptor queue pairs. When multiple traffic classes are +configured (for example, DCB is enabled), each pool contains a queue pair from +each traffic class. When a single traffic class is configured in the hardware, +the pools contain multiple queue pairs from the single traffic class. + +The number of VFs that can be allocated depends on the number of traffic +classes that can be enabled. The configurable number of traffic classes for +each enabled VF is as follows: +0 - 15 VFs = Up to 8 traffic classes, depending on device support +16 - 31 VFs = Up to 4 traffic classes +32 - 63 VFs = 1 traffic class + +When VFs are configured, the PF is allocated one pool as well. The PF supports +the DCB features with the constraint that each traffic class will only use a +single queue pair. When zero VFs are configured, the PF can support multiple +queue pairs per traffic class. + +allow_unsupported_sfp +--------------------- +:Valid Range: 0,1 +:Default Value: 0 (disabled) + +This parameter allows unsupported and untested SFP+ modules on 82599-based +adapters, as long as the type of module is known to the driver. + +debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + +This parameter adjusts the level of debug messages displayed in the system +logs. + + +Additional Features and Configurations +====================================== + +Flow Control +------------ +Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable +receiving and transmitting pause frames for ixgbe. When transmit is enabled, +pause frames are generated when the receive packet buffer crosses a predefined +threshold. When receive is enabled, the transmit unit will halt for the time +delay specified when a pause frame is received. + +NOTE: You must have a flow control capable link partner. + +Flow Control is enabled by default. + +Use ethtool to change the flow control settings. To enable or disable Rx or +Tx Flow Control:: + + ethtool -A eth? rx <on|off> tx <on|off> + +Note: This command only enables or disables Flow Control if auto-negotiation is +disabled. If auto-negotiation is enabled, this command changes the parameters +used for auto-negotiation with the link partner. + +To enable or disable auto-negotiation:: + + ethtool -s eth? autoneg <on|off> + +Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending +on your device, you may not be able to change the auto-negotiation setting. + +NOTE: For 82598 backplane cards entering 1 gigabit mode, flow control default +behavior is changed to off. Flow control in 1 gigabit mode on these devices can +lead to transmit hangs. + +Intel(R) Ethernet Flow Director +------------------------------- +The Intel Ethernet Flow Director performs the following tasks: + +- Directs receive packets according to their flows to different queues. +- Enables tight control on routing a flow in the platform. +- Matches flows and CPU cores for flow affinity. +- Supports multiple parameters for flexible flow classification and load + balancing (in SFP mode only). + +NOTE: Intel Ethernet Flow Director masking works in the opposite manner from +subnet masking. In the following command:: + + #ethtool -N eth11 flow-type ip4 src-ip 172.4.1.2 m 255.0.0.0 dst-ip \ + 172.21.1.1 m 255.128.0.0 action 31 + +The src-ip value that is written to the filter will be 0.4.1.2, not 172.0.0.0 +as might be expected. Similarly, the dst-ip value written to the filter will be +0.21.1.1, not 172.0.0.0. + +To enable or disable the Intel Ethernet Flow Director:: + + # ethtool -K ethX ntuple <on|off> + +When disabling ntuple filters, all the user programmed filters are flushed from +the driver cache and hardware. All needed filters must be re-added when ntuple +is re-enabled. + +To add a filter that directs packet to queue 2, use -U or -N switch:: + + # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1] + +To see the list of filters currently present:: + + # ethtool <-u|-n> ethX + +Sideband Perfect Filters +------------------------ +Sideband Perfect Filters are used to direct traffic that matches specified +characteristics. They are enabled through ethtool's ntuple interface. To add a +new filter use the following command:: + + ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \ + dst-port <port> action <queue> + +Where: + <device> - the ethernet device to program + <type> - can be ip4, tcp4, udp4, or sctp4 + <ip> - the IP address to match on + <port> - the port number to match on + <queue> - the queue to direct traffic towards (-1 discards the matched traffic) + +Use the following command to delete a filter:: + + ethtool -U <device> delete <N> + +Where <N> is the filter id displayed when printing all the active filters, and +may also have been specified using "loc <N>" when adding the filter. + +The following example matches TCP traffic sent from 192.168.0.1, port 5300, +directed to 192.168.0.5, port 80, and sends it to queue 7:: + + ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \ + src-port 5300 dst-port 80 action 7 + +For each flow-type, the programmed filters must all have the same matching +input set. For example, issuing the following two commands is acceptable:: + + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10 + +Issuing the next two commands, however, is not acceptable, since the first +specifies src-ip and the second specifies dst-ip:: + + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10 + +The second command will fail with an error. You may program multiple filters +with the same fields, using different values, but, on one device, you may not +program two TCP4 filters with different matching fields. + +Matching on a sub-portion of a field is not supported by the ixgbe driver, thus +partial mask fields are not supported. + +To create filters that direct traffic to a specific Virtual Function, use the +"user-def" parameter. Specify the user-def as a 64 bit value, where the lower 32 +bits represents the queue number, while the next 8 bits represent which VF. +Note that 0 is the PF, so the VF identifier is offset by 1. For example:: + + ... user-def 0x800000002 ... + +specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of +that VF. + +Note that these filters will not break internal routing rules, and will not +route traffic that otherwise would not have been sent to the specified Virtual +Function. + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file:: + + /etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL + /etc/sysconfig/network/<config_file> // for SLES + +NOTE: The maximum MTU setting for Jumbo Frames is 9710. This value coincides +with the maximum Jumbo Frames size of 9728 bytes. + +NOTE: This driver will attempt to use multiple page sized buffers to receive +each jumbo packet. This should help to avoid buffer starvation issues when +allocating receive packets. + +NOTE: For 82599-based network connections, if you are enabling jumbo frames in +a virtual function (VF), jumbo frames must first be enabled in the physical +function (PF). The VF MTU setting cannot be larger than the PF MTU. + +Generic Receive Offload, aka GRO +-------------------------------- +The driver supports the in-kernel software implementation of GRO. GRO has +shown that by coalescing Rx traffic into larger chunks of data, CPU +utilization can be significantly reduced when under large Rx load. GRO is an +evolution of the previously-used LRO interface. GRO is able to coalesce +other protocols besides TCP. It's also safe to use with configurations that +are problematic for LRO, namely bridging and iSCSI. + +Data Center Bridging (DCB) +-------------------------- +NOTE: +The kernel assumes that TC0 is available, and will disable Priority Flow +Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is +enabled when setting up DCB on your switch. + +DCB is a configuration Quality of Service implementation in hardware. It uses +the VLAN priority tag (802.1p) to filter traffic. That means that there are 8 +different priorities that traffic can be filtered into. It also enables +priority flow control (802.1Qbb) which can limit or eliminate the number of +dropped packets during network stress. Bandwidth can be allocated to each of +these priorities, which is enforced at the hardware level (802.1Qaz). + +Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and +802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only +and can accept settings from a DCBX capable peer. Software configuration of +DCBX parameters via dcbtool/lldptool are not supported. + +The ixgbe driver implements the DCB netlink interface layer to allow user-space +to communicate with the driver and query DCB configuration for the port. + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://www.kernel.org/pub/software/network/ethtool/ + +FCoE +---- +The ixgbe driver supports Fiber Channel over Ethernet (FCoE) and Data Center +Bridging (DCB). This code has no default effect on the regular driver +operation. Configuring DCB and FCoE is outside the scope of this README. Refer +to http://www.open-fcoe.org/ for FCoE project information and contact +ixgbe-eedc@lists.sourceforge.net for DCB information. + +MAC and VLAN anti-spoofing feature +---------------------------------- +When a malicious driver attempts to send a spoofed packet, it is dropped by the +hardware and not transmitted. + +An interrupt is sent to the PF driver notifying it of the spoof attempt. When a +spoofed packet is detected, the PF driver will send the following message to +the system log (displayed by the "dmesg" command):: + + ixgbe ethX: ixgbe_spoof_check: n spoofed packets detected + +where "x" is the PF interface number; and "n" is number of spoofed packets. +NOTE: This feature can be disabled for a specific Virtual Function (VF):: + + ip link set <pf dev> vf <vf id> spoofchk {off|on} + + +Known Issues/Troubleshooting +============================ + +Enabling SR-IOV in a 64-bit Microsoft* Windows Server* 2012/R2 guest OS +----------------------------------------------------------------------- +Linux KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. +This includes traditional PCIe devices, as well as SR-IOV-capable devices based +on the Intel Ethernet Controller XL710. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/ixgbe.txt b/Documentation/networking/ixgbe.txt deleted file mode 100644 index 687835415707..000000000000 --- a/Documentation/networking/ixgbe.txt +++ /dev/null @@ -1,349 +0,0 @@ -Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Family of -Adapters -============================================================================= - -Intel 10 Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Performance Tuning -- Known Issues -- Support - -Identifying Your Adapter -======================== - -The driver in this release is compatible with 82598, 82599 and X540-based -Intel Network Connections. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/network/sb/CS-012904.htm - -SFP+ Devices with Pluggable Optics ----------------------------------- - -82599-BASED ADAPTERS - -NOTES: If your 82599-based Intel(R) Network Adapter came with Intel optics, or -is an Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel -optics and/or the direct attach cables listed below. - -When 82599-based SFP+ devices are connected back to back, they should be set to -the same Speed setting via ethtool. Results may vary if you mix speed settings. -82598-based adapters support all passive direct attach cables that comply -with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach -cables are not supported. - -Supplier Type Part Numbers - -SR Modules -Intel DUAL RATE 1G/10G SFP+ SR (bailed) FTLX8571D3BCV-IT -Intel DUAL RATE 1G/10G SFP+ SR (bailed) AFBR-703SDDZ-IN1 -Intel DUAL RATE 1G/10G SFP+ SR (bailed) AFBR-703SDZ-IN2 -LR Modules -Intel DUAL RATE 1G/10G SFP+ LR (bailed) FTLX1471D3BCV-IT -Intel DUAL RATE 1G/10G SFP+ LR (bailed) AFCT-701SDDZ-IN1 -Intel DUAL RATE 1G/10G SFP+ LR (bailed) AFCT-701SDZ-IN2 - -The following is a list of 3rd party SFP+ modules and direct attach cables that -have received some testing. Not all modules are applicable to all devices. - -Supplier Type Part Numbers - -Finisar SFP+ SR bailed, 10g single rate FTLX8571D3BCL -Avago SFP+ SR bailed, 10g single rate AFBR-700SDZ -Finisar SFP+ LR bailed, 10g single rate FTLX1471D3BCL - -Finisar DUAL RATE 1G/10G SFP+ SR (No Bail) FTLX8571D3QCV-IT -Avago DUAL RATE 1G/10G SFP+ SR (No Bail) AFBR-703SDZ-IN1 -Finisar DUAL RATE 1G/10G SFP+ LR (No Bail) FTLX1471D3QCV-IT -Avago DUAL RATE 1G/10G SFP+ LR (No Bail) AFCT-701SDZ-IN1 -Finistar 1000BASE-T SFP FCLF8522P2BTL -Avago 1000BASE-T SFP ABCU-5710RZ - -82599-based adapters support all passive and active limiting direct attach -cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. - -Laser turns off for SFP+ when device is down -------------------------------------------- -"ip link set down" turns off the laser for 82599-based SFP+ fiber adapters. -"ip link set up" turns on the laser. - - -82598-BASED ADAPTERS - -NOTES for 82598-Based Adapters: -- Intel(R) Network Adapters that support removable optical modules only support - their original module type (i.e., the Intel(R) 10 Gigabit SR Dual Port - Express Module only supports SR optical modules). If you plug in a different - type of module, the driver will not load. -- Hot Swapping/hot plugging optical modules is not supported. -- Only single speed, 10 gigabit modules are supported. -- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module - types are not supported. Please see your system documentation for details. - -The following is a list of 3rd party SFP+ modules and direct attach cables that -have received some testing. Not all modules are applicable to all devices. - -Supplier Type Part Numbers - -Finisar SFP+ SR bailed, 10g single rate FTLX8571D3BCL -Avago SFP+ SR bailed, 10g single rate AFBR-700SDZ -Finisar SFP+ LR bailed, 10g single rate FTLX1471D3BCL - -82598-based adapters support all passive direct attach cables that comply -with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach -cables are not supported. - - -Flow Control ------------- -Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable -receiving and transmitting pause frames for ixgbe. When TX is enabled, PAUSE -frames are generated when the receive packet buffer crosses a predefined -threshold. When rx is enabled, the transmit unit will halt for the time delay -specified when a PAUSE frame is received. - -Flow Control is enabled by default. If you want to disable a flow control -capable link partner, use ethtool: - - ethtool -A eth? autoneg off RX off TX off - -NOTE: For 82598 backplane cards entering 1 gig mode, flow control default -behavior is changed to off. Flow control in 1 gig mode on these devices can -lead to Tx hangs. - -Intel(R) Ethernet Flow Director -------------------------------- -Supports advanced filters that direct receive packets by their flows to -different queues. Enables tight control on routing a flow in the platform. -Matches flows and CPU cores for flow affinity. Supports multiple parameters -for flexible flow classification and load balancing. - -Flow director is enabled only if the kernel is multiple TX queue capable. - -An included script (set_irq_affinity.sh) automates setting the IRQ to CPU -affinity. - -You can verify that the driver is using Flow Director by looking at the counter -in ethtool: fdir_miss and fdir_match. - -Other ethtool Commands: -To enable Flow Director - ethtool -K ethX ntuple on -To add a filter - Use -U switch. e.g., ethtool -U ethX flow-type tcp4 src-ip 10.0.128.23 - action 1 -To see the list of filters currently present: - ethtool -u ethX - -Perfect Filter: Perfect filter is an interface to load the filter table that -funnels all flow into queue_0 unless an alternative queue is specified using -"action". In that case, any flow that matches the filter criteria will be -directed to the appropriate queue. - -If the queue is defined as -1, filter will drop matching packets. - -To account for filter matches and misses, there are two stats in ethtool: -fdir_match and fdir_miss. In addition, rx_queue_N_packets shows the number of -packets processed by the Nth queue. - -NOTE: Receive Packet Steering (RPS) and Receive Flow Steering (RFS) are not -compatible with Flow Director. IF Flow Director is enabled, these will be -disabled. - -The following three parameters impact Flow Director. - -FdirMode --------- -Valid Range: 0-2 (0=off, 1=ATR, 2=Perfect filter mode) -Default Value: 1 - - Flow Director filtering modes. - -FdirPballoc ------------ -Valid Range: 0-2 (0=64k, 1=128k, 2=256k) -Default Value: 0 - - Flow Director allocated packet buffer size. - -AtrSampleRate --------------- -Valid Range: 1-100 -Default Value: 20 - - Software ATR Tx packet sample rate. For example, when set to 20, every 20th - packet, looks to see if the packet will create a new flow. - -Node ----- -Valid Range: 0-n -Default Value: 1 (off) - - 0 - n: where n is the number of NUMA nodes (i.e. 0 - 3) currently online in - your system - 1: turns this option off - - The Node parameter will allow you to pick which NUMA node you want to have - the adapter allocate memory on. - -max_vfs -------- -Valid Range: 1-63 -Default Value: 0 - - If the value is greater than 0 it will also force the VMDq parameter to be 1 - or more. - - This parameter adds support for SR-IOV. It causes the driver to spawn up to - max_vfs worth of virtual function. - - -Additional Configurations -========================= - - Jumbo Frames - ------------ - The driver supports Jumbo Frames for all adapters. Jumbo Frames support is - enabled by changing the MTU to a value larger than the default of 1500. - The maximum value for the MTU is 16110. Use the ip command to - increase the MTU size. For example: - - ip link set dev ethx mtu 9000 - - The maximum MTU setting for Jumbo Frames is 9710. This value coincides - with the maximum Jumbo Frames size of 9728. - - Generic Receive Offload, aka GRO - -------------------------------- - The driver supports the in-kernel software implementation of GRO. GRO has - shown that by coalescing Rx traffic into larger chunks of data, CPU - utilization can be significantly reduced when under large Rx load. GRO is an - evolution of the previously-used LRO interface. GRO is able to coalesce - other protocols besides TCP. It's also safe to use with configurations that - are problematic for LRO, namely bridging and iSCSI. - - Data Center Bridging, aka DCB - ----------------------------- - DCB is a configuration Quality of Service implementation in hardware. - It uses the VLAN priority tag (802.1p) to filter traffic. That means - that there are 8 different priorities that traffic can be filtered into. - It also enables priority flow control which can limit or eliminate the - number of dropped packets during network stress. Bandwidth can be - allocated to each of these priorities, which is enforced at the hardware - level. - - To enable DCB support in ixgbe, you must enable the DCB netlink layer to - allow the userspace tools (see below) to communicate with the driver. - This can be found in the kernel configuration here: - - -> Networking support - -> Networking options - -> Data Center Bridging support - - Once this is selected, DCB support must be selected for ixgbe. This can - be found here: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) - -> Intel(R) 10GbE PCI Express adapters support - -> Data Center Bridging (DCB) Support - - After these options are selected, you must rebuild your kernel and your - modules. - - In order to use DCB, userspace tools must be downloaded and installed. - The dcbd tools can be found at: - - http://e1000.sf.net - - Ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The latest - ethtool version is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool/ - - FCoE - ---- - This release of the ixgbe driver contains new code to enable users to use - Fiber Channel over Ethernet (FCoE) and Data Center Bridging (DCB) - functionality that is supported by the 82598-based hardware. This code has - no default effect on the regular driver operation, and configuring DCB and - FCoE is outside the scope of this driver README. Refer to - http://www.open-fcoe.org/ for FCoE project information and contact - e1000-eedc@lists.sourceforge.net for DCB information. - - MAC and VLAN anti-spoofing feature - ---------------------------------- - When a malicious driver attempts to send a spoofed packet, it is dropped by - the hardware and not transmitted. An interrupt is sent to the PF driver - notifying it of the spoof attempt. - - When a spoofed packet is detected the PF driver will send the following - message to the system log (displayed by the "dmesg" command): - - Spoof event(s) detected on VF (n) - - Where n=the VF that attempted to do the spoofing. - - -Performance Tuning -================== - -An excellent article on performance tuning can be found at: - -http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf - - -Known Issues -============ - - Enabling SR-IOV in a 32-bit or 64-bit Microsoft* Windows* Server 2008/R2 - Guest OS using Intel (R) 82576-based GbE or Intel (R) 82599-based 10GbE - controller under KVM - ------------------------------------------------------------------------ - KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This - includes traditional PCIe devices, as well as SR-IOV-capable devices using - Intel 82576-based and 82599-based controllers. - - While direct assignment of a PCIe device or an SR-IOV Virtual Function (VF) - to a Linux-based VM running 2.6.32 or later kernel works fine, there is a - known issue with Microsoft Windows Server 2008 VM that results in a "yellow - bang" error. This problem is within the KVM VMM itself, not the Intel driver, - or the SR-IOV logic of the VMM, but rather that KVM emulates an older CPU - model for the guests, and this older CPU model does not support MSI-X - interrupts, which is a requirement for Intel SR-IOV. - - If you wish to use the Intel 82576 or 82599-based controllers in SR-IOV mode - with KVM and a Microsoft Windows Server 2008 guest try the following - workaround. The workaround is to tell KVM to emulate a different model of CPU - when using qemu to create the KVM guest: - - "-cpu qemu64,model=13" - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://e1000.sourceforge.net - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgbevf.rst b/Documentation/networking/ixgbevf.rst new file mode 100644 index 000000000000..56cde6366c2f --- /dev/null +++ b/Documentation/networking/ixgbevf.rst @@ -0,0 +1,66 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Virtual Function Driver for Intel(R) 10G Ethernet +============================================================= + +Intel 10 Gigabit Virtual Function Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Known Issues +- Support + +This driver supports 82599, X540, X550, and X552-based virtual function devices +that can only be activated on kernels that support SR-IOV. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + + +Identifying Your Adapter +======================== +The driver is compatible with devices based on the following: + + * Intel(R) Ethernet Controller 82598 + * Intel(R) Ethernet Controller 82599 + * Intel(R) Ethernet Controller X520 + * Intel(R) Ethernet Controller X540 + * Intel(R) Ethernet Controller x550 + * Intel(R) Ethernet Controller X552 + * Intel(R) Ethernet Controller X553 + +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + +Known Issues/Troubleshooting +============================ + +SR-IOV requires the correct platform and OS support. + +The guest OS loading this driver must support MSI-X interrupts. + +This driver is only supported as a loadable module at this time. Intel is not +supplying patches against the kernel source to allow for static linking of the +drivers. + +VLANs: There is a limit of a total of 64 shared VLANs to 1 or more VFs. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt deleted file mode 100644 index 53d8d2a5a6a3..000000000000 --- a/Documentation/networking/ixgbevf.txt +++ /dev/null @@ -1,52 +0,0 @@ -Linux* Base Driver for Intel(R) Ethernet Network Connection -=========================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Known Issues/Troubleshooting -- Support - -This file describes the ixgbevf Linux* Base Driver for Intel Network -Connection. - -The ixgbevf driver supports 82599-based virtual function devices that can only -be activated on kernels with CONFIG_PCI_IOV enabled. - -The ixgbevf driver supports virtual functions generated by the ixgbe driver -with a max_vfs value of 1 or greater. - -The guest OS loading the ixgbevf driver must support MSI-X interrupts. - -VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs. - -Identifying Your Adapter -======================== - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -Known Issues/Troubleshooting -============================ - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/netvsc.txt b/Documentation/networking/netvsc.txt index 92f5b31392fa..3bfa635bbbd5 100644 --- a/Documentation/networking/netvsc.txt +++ b/Documentation/networking/netvsc.txt @@ -45,6 +45,15 @@ Features like packets and significantly reduces CPU usage under heavy Rx load. + Large Receive Offload (LRO), or Receive Side Coalescing (RSC) + ------------------------------------------------------------- + The driver supports LRO/RSC in the vSwitch feature. It reduces the per packet + processing overhead by coalescing multiple TCP segments when possible. The + feature is enabled by default on VMs running on Windows Server 2019 and + later. It may be changed by ethtool command: + ethtool -K eth0 lro on + ethtool -K eth0 lro off + SR-IOV support -------------- Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index b5407163d53b..605e00cdd6be 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -1069,6 +1069,31 @@ The kernel interface functions are as follows: This function may transmit a PING ACK. + (*) Get reply timestamp. + + bool rxrpc_kernel_get_reply_time(struct socket *sock, + struct rxrpc_call *call, + ktime_t *_ts) + + This allows the timestamp on the first DATA packet of the reply of a + client call to be queried, provided that it is still in the Rx ring. If + successful, the timestamp will be stored into *_ts and true will be + returned; false will be returned otherwise. + + (*) Get remote client epoch. + + u32 rxrpc_kernel_get_epoch(struct socket *sock, + struct rxrpc_call *call) + + This allows the epoch that's contained in packets of an incoming client + call to be queried. This value is returned. The function always + successful if the call is still in progress. It shouldn't be called once + the call has expired. Note that calling this on a local client call only + returns the local epoch. + + This value can be used to determine if the remote client has been + restarted as it shouldn't change otherwise. + ======================= CONFIGURABLE PARAMETERS diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt deleted file mode 100644 index 9c7139d57e57..000000000000 --- a/Documentation/networking/tcp.txt +++ /dev/null @@ -1,101 +0,0 @@ -TCP protocol -============ - -Last updated: 3 June 2017 - -Contents -======== - -- Congestion control -- How the new TCP output machine [nyi] works - -Congestion control -================== - -The following variables are used in the tcp_sock for congestion control: -snd_cwnd The size of the congestion window -snd_ssthresh Slow start threshold. We are in slow start if - snd_cwnd is less than this. -snd_cwnd_cnt A counter used to slow down the rate of increase - once we exceed slow start threshold. -snd_cwnd_clamp This is the maximum size that snd_cwnd can grow to. -snd_cwnd_stamp Timestamp for when congestion window last validated. -snd_cwnd_used Used as a highwater mark for how much of the - congestion window is in use. It is used to adjust - snd_cwnd down when the link is limited by the - application rather than the network. - -As of 2.6.13, Linux supports pluggable congestion control algorithms. -A congestion control mechanism can be registered through functions in -tcp_cong.c. The functions used by the congestion control mechanism are -registered via passing a tcp_congestion_ops struct to -tcp_register_congestion_control. As a minimum, the congestion control -mechanism must provide a valid name and must implement either ssthresh, -cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control hook. - -Private data for a congestion control mechanism is stored in tp->ca_priv. -tcp_ca(tp) returns a pointer to this space. This is preallocated space - it -is important to check the size of your private data will fit this space, or -alternatively, space could be allocated elsewhere and a pointer to it could -be stored here. - -There are three kinds of congestion control algorithms currently: The -simplest ones are derived from TCP reno (highspeed, scalable) and just -provide an alternative congestion window calculation. More complex -ones like BIC try to look at other events to provide better -heuristics. There are also round trip time based algorithms like -Vegas and Westwood+. - -Good TCP congestion control is a complex problem because the algorithm -needs to maintain fairness and performance. Please review current -research and RFC's before developing new modules. - -The default congestion control mechanism is chosen based on the -DEFAULT_TCP_CONG Kconfig parameter. If you really want a particular default -value then you can set it using sysctl net.ipv4.tcp_congestion_control. The -module will be autoloaded if needed and you will get the expected protocol. If -you ask for an unknown congestion method, then the sysctl attempt will fail. - -If you remove a TCP congestion control module, then you will get the next -available one. Since reno cannot be built as a module, and cannot be -removed, it will always be available. - -How the new TCP output machine [nyi] works. -=========================================== - -Data is kept on a single queue. The skb->users flag tells us if the frame is -one that has been queued already. To add a frame we throw it on the end. Ack -walks down the list from the start. - -We keep a set of control flags - - - sk->tcp_pend_event - - TCP_PEND_ACK Ack needed - TCP_ACK_NOW Needed now - TCP_WINDOW Window update check - TCP_WINZERO Zero probing - - - sk->transmit_queue The transmission frame begin - sk->transmit_new First new frame pointer - sk->transmit_end Where to add frames - - sk->tcp_last_tx_ack Last ack seen - sk->tcp_dup_ack Dup ack count for fast retransmit - - -Frames are queued for output by tcp_write. We do our best to send the frames -off immediately if possible, but otherwise queue and compute the body -checksum in the copy. - -When a write is done we try to clear any pending events and piggy back them. -If the window is full we queue full sized frames. On the first timeout in -zero window we split this. - -On a timer we walk the retransmit list to send any retransmits, update the -backoff timers etc. A change of route table stamp causes a change of header -and recompute. We add any new tcp level headers and refinish the checksum -before sending. - diff --git a/Documentation/networking/xfrm_device.txt b/Documentation/networking/xfrm_device.txt index 50c34ca65efe..267f55b5f54a 100644 --- a/Documentation/networking/xfrm_device.txt +++ b/Documentation/networking/xfrm_device.txt @@ -68,6 +68,10 @@ and an indication of whether it is for Rx or Tx. The driver should - verify the algorithm is supported for offloads - store the SA information (key, salt, target-ip, protocol, etc) - enable the HW offload of the SA + - return status value: + 0 success + -EOPNETSUPP offload not supported, try SW IPsec + other fail the request The driver can also set an offload_handle in the SA, an opaque void pointer that can be used to convey context into the fast-path offload requests. diff --git a/Documentation/nvmem/nvmem.txt b/Documentation/nvmem/nvmem.txt index 8d8d8f58f96f..fc2fe4b18655 100644 --- a/Documentation/nvmem/nvmem.txt +++ b/Documentation/nvmem/nvmem.txt @@ -58,6 +58,37 @@ static int qfprom_probe(struct platform_device *pdev) It is mandatory that the NVMEM provider has a regmap associated with its struct device. Failure to do would return error code from nvmem_register(). +Users of board files can define and register nvmem cells using the +nvmem_cell_table struct: + +static struct nvmem_cell_info foo_nvmem_cells[] = { + { + .name = "macaddr", + .offset = 0x7f00, + .bytes = ETH_ALEN, + } +}; + +static struct nvmem_cell_table foo_nvmem_cell_table = { + .nvmem_name = "i2c-eeprom", + .cells = foo_nvmem_cells, + .ncells = ARRAY_SIZE(foo_nvmem_cells), +}; + +nvmem_add_cell_table(&foo_nvmem_cell_table); + +Additionally it is possible to create nvmem cell lookup entries and register +them with the nvmem framework from machine code as shown in the example below: + +static struct nvmem_cell_lookup foo_nvmem_lookup = { + .nvmem_name = "i2c-eeprom", + .cell_name = "macaddr", + .dev_id = "foo_mac.0", + .con_id = "mac-address", +}; + +nvmem_add_cell_lookups(&foo_nvmem_lookup, 1); + NVMEM Consumers +++++++++++++++ diff --git a/Documentation/parisc/00-INDEX b/Documentation/parisc/00-INDEX deleted file mode 100644 index cbd060961f43..000000000000 --- a/Documentation/parisc/00-INDEX +++ /dev/null @@ -1,6 +0,0 @@ -00-INDEX - - this file. -debugging - - some debugging hints for real-mode code -registers - - current/planned usage of registers diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX deleted file mode 100644 index 7f3c2def2cac..000000000000 --- a/Documentation/power/00-INDEX +++ /dev/null @@ -1,44 +0,0 @@ -00-INDEX - - This file -apm-acpi.txt - - basic info about the APM and ACPI support. -basic-pm-debugging.txt - - Debugging suspend and resume -charger-manager.txt - - Battery charger management. -admin-guide/devices.rst - - How drivers interact with system-wide power management -drivers-testing.txt - - Testing suspend and resume support in device drivers -freezing-of-tasks.txt - - How processes and controlled during suspend -interface.txt - - Power management user interface in /sys/power -opp.txt - - Operating Performance Point library -pci.txt - - How the PCI Subsystem Does Power Management -pm_qos_interface.txt - - info on Linux PM Quality of Service interface -power_supply_class.txt - - Tells userspace about battery, UPS, AC or DC power supply properties -runtime_pm.txt - - Power management framework for I/O devices. -s2ram.txt - - How to get suspend to ram working (and debug it when it isn't) -states.txt - - System power management states -suspend-and-cpuhotplug.txt - - Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug -swsusp-and-swap-files.txt - - Using swap files with software suspend (to disk) -swsusp-dmcrypt.txt - - How to use dm-crypt and software suspend (to disk) together -swsusp.txt - - Goals, implementation, and usage of software suspend (ACPI S3) -tricks.txt - - How to trick software suspend (to disk) into working when it isn't -userland-swsusp.txt - - Experimental implementation of software suspend in userspace -video.txt - - Video issues during resume from suspend diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX deleted file mode 100644 index 9dc845cf7d88..000000000000 --- a/Documentation/powerpc/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -Index of files in Documentation/powerpc. If you think something about -Linux/PPC needs an entry here, needs correction or you've written one -please mail me. - Cort Dougan (cort@fsmlabs.com) - -00-INDEX - - this file -bootwrapper.txt - - Information on how the powerpc kernel is wrapped for boot on various - different platforms. -cpu_features.txt - - info on how we support a variety of CPUs with minimal compile-time - options. -cxl.txt - - Overview of the CXL driver. -eeh-pci-error-recovery.txt - - info on PCI Bus EEH Error Recovery -firmware-assisted-dump.txt - - Documentation on the firmware assisted dump mechanism "fadump". -hvcs.txt - - IBM "Hypervisor Virtual Console Server" Installation Guide -mpc52xx.txt - - Linux 2.6.x on MPC52xx family -pmu-ebb.txt - - Description of the API for using the PMU with Event Based Branches. -qe_firmware.txt - - describes the layout of firmware binaries for the Freescale QUICC - Engine and the code that parses and uploads the microcode therein. -ptrace.txt - - Information on the ptrace interfaces for hardware debug registers. -transactional_memory.txt - - Overview of the Power8 transactional memory support. -dscr.txt - - Overview DSCR (Data Stream Control Register) support. diff --git a/Documentation/preempt-locking.txt b/Documentation/preempt-locking.txt index c945062be66c..509f5a422d57 100644 --- a/Documentation/preempt-locking.txt +++ b/Documentation/preempt-locking.txt @@ -3,7 +3,6 @@ Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe =========================================================================== :Author: Robert Love <rml@tech9.net> -:Last Updated: 28 Aug 2002 Introduction @@ -92,11 +91,12 @@ any locks or interrupts are disabled, since preemption is implicitly disabled in those cases. But keep in mind that 'irqs disabled' is a fundamentally unsafe way of -disabling preemption - any spin_unlock() decreasing the preemption count -to 0 might trigger a reschedule. A simple printk() might trigger a reschedule. -So use this implicit preemption-disabling property only if you know that the -affected codepath does not do any of this. Best policy is to use this only for -small, atomic code that you wrote and which calls no complex functions. +disabling preemption - any cond_resched() or cond_resched_lock() might trigger +a reschedule if the preempt count is 0. A simple printk() might trigger a +reschedule. So use this implicit preemption-disabling property only if you +know that the affected codepath does not do any of this. Best policy is to use +this only for small, atomic code that you wrote and which calls no complex +functions. Example:: diff --git a/Documentation/process/2.Process.rst b/Documentation/process/2.Process.rst index 51d0349c7809..ae020d84d7c4 100644 --- a/Documentation/process/2.Process.rst +++ b/Documentation/process/2.Process.rst @@ -82,7 +82,7 @@ As an example, here is how the 4.16 development cycle went (all dates in March 11 4.16-rc5 March 18 4.16-rc6 March 25 4.16-rc7 - April 1 4.17 stable release + April 1 4.16 stable release ============== =============================== How do the developers decide when to close the development cycle and create diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst index 0d4f29bc798b..88a7d5c8bb2f 100644 --- a/Documentation/process/adding-syscalls.rst +++ b/Documentation/process/adding-syscalls.rst @@ -232,7 +232,7 @@ normally be optional, so add a ``CONFIG`` option (typically to by the option. - Make the option depend on EXPERT if it should be hidden from normal users. - Make any new source files implementing the function dependent on the CONFIG - option in the Makefile (e.g. ``obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.c``). + option in the Makefile (e.g. ``obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.o``). - Double check that the kernel still builds with the new CONFIG option turned off. diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst new file mode 100644 index 000000000000..0ef5a63c06ba --- /dev/null +++ b/Documentation/process/deprecated.rst @@ -0,0 +1,119 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================================================== +Deprecated Interfaces, Language Features, Attributes, and Conventions +===================================================================== + +In a perfect world, it would be possible to convert all instances of +some deprecated API into the new API and entirely remove the old API in +a single development cycle. However, due to the size of the kernel, the +maintainership hierarchy, and timing, it's not always feasible to do these +kinds of conversions at once. This means that new instances may sneak into +the kernel while old ones are being removed, only making the amount of +work to remove the API grow. In order to educate developers about what +has been deprecated and why, this list has been created as a place to +point when uses of deprecated things are proposed for inclusion in the +kernel. + +__deprecated +------------ +While this attribute does visually mark an interface as deprecated, +it `does not produce warnings during builds any more +<https://git.kernel.org/linus/771c035372a036f83353eef46dbb829780330234>`_ +because one of the standing goals of the kernel is to build without +warnings and no one was actually doing anything to remove these deprecated +interfaces. While using `__deprecated` is nice to note an old API in +a header file, it isn't the full solution. Such interfaces must either +be fully removed from the kernel, or added to this file to discourage +others from using them in the future. + +open-coded arithmetic in allocator arguments +-------------------------------------------- +Dynamic size calculations (especially multiplication) should not be +performed in memory allocator (or similar) function arguments due to the +risk of them overflowing. This could lead to values wrapping around and a +smaller allocation being made than the caller was expecting. Using those +allocations could lead to linear overflows of heap memory and other +misbehaviors. (One exception to this is literal values where the compiler +can warn if they might overflow. Though using literals for arguments as +suggested below is also harmless.) + +For example, do not use ``count * size`` as an argument, as in:: + + foo = kmalloc(count * size, GFP_KERNEL); + +Instead, the 2-factor form of the allocator should be used:: + + foo = kmalloc_array(count, size, GFP_KERNEL); + +If no 2-factor form is available, the saturate-on-overflow helpers should +be used:: + + bar = vmalloc(array_size(count, size)); + +Another common case to avoid is calculating the size of a structure with +a trailing array of others structures, as in:: + + header = kzalloc(sizeof(*header) + count * sizeof(*header->item), + GFP_KERNEL); + +Instead, use the helper:: + + header = kzalloc(struct_size(header, item, count), GFP_KERNEL); + +See :c:func:`array_size`, :c:func:`array3_size`, and :c:func:`struct_size`, +for more details as well as the related :c:func:`check_add_overflow` and +:c:func:`check_mul_overflow` family of functions. + +simple_strtol(), simple_strtoll(), simple_strtoul(), simple_strtoull() +---------------------------------------------------------------------- +The :c:func:`simple_strtol`, :c:func:`simple_strtoll`, +:c:func:`simple_strtoul`, and :c:func:`simple_strtoull` functions +explicitly ignore overflows, which may lead to unexpected results +in callers. The respective :c:func:`kstrtol`, :c:func:`kstrtoll`, +:c:func:`kstrtoul`, and :c:func:`kstrtoull` functions tend to be the +correct replacements, though note that those require the string to be +NUL or newline terminated. + +strcpy() +-------- +:c:func:`strcpy` performs no bounds checking on the destination +buffer. This could result in linear overflows beyond the +end of the buffer, leading to all kinds of misbehaviors. While +`CONFIG_FORTIFY_SOURCE=y` and various compiler flags help reduce the +risk of using this function, there is no good reason to add new uses of +this function. The safe replacement is :c:func:`strscpy`. + +strncpy() on NUL-terminated strings +----------------------------------- +Use of :c:func:`strncpy` does not guarantee that the destination buffer +will be NUL terminated. This can lead to various linear read overflows +and other misbehavior due to the missing termination. It also NUL-pads the +destination buffer if the source contents are shorter than the destination +buffer size, which may be a needless performance penalty for callers using +only NUL-terminated strings. The safe replacement is :c:func:`strscpy`. +(Users of :c:func:`strscpy` still needing NUL-padding will need an +explicit :c:func:`memset` added.) + +If a caller is using non-NUL-terminated strings, :c:func:`strncpy()` can +still be used, but destinations should be marked with the `__nonstring +<https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html>`_ +attribute to avoid future compiler warnings. + +strlcpy() +--------- +:c:func:`strlcpy` reads the entire source buffer first, possibly exceeding +the given limit of bytes to copy. This is inefficient and can lead to +linear read overflows if a source string is not NUL-terminated. The +safe replacement is :c:func:`strscpy`. + +Variable Length Arrays (VLAs) +----------------------------- +Using stack VLAs produces much worse machine code than statically +sized stack arrays. While these non-trivial `performance issues +<https://git.kernel.org/linus/02361bc77888>`_ are reason enough to +eliminate VLAs, they are also a security risk. Dynamic growth of a stack +array may exceed the remaining memory in the stack segment. This could +lead to a crash, possible overwriting sensitive contents at the end of the +stack (when built without `CONFIG_THREAD_INFO_IN_TASK=y`), or overwriting +memory adjacent to the stack (when built without `CONFIG_VMAP_STACK=y`) diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index 130bf0f48875..dcb25f94188e 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst @@ -57,12 +57,13 @@ of doing things. Legal Issues ------------ -The Linux kernel source code is released under the GPL. Please see the -file, COPYING, in the main directory of the source tree, for details on -the license. If you have further questions about the license, please -contact a lawyer, and do not ask on the Linux kernel mailing list. The -people on the mailing lists are not lawyers, and you should not rely on -their statements on legal matters. +The Linux kernel source code is released under the GPL. Please see the file +COPYING in the main directory of the source tree. The Linux kernel licensing +rules and how to use `SPDX <https://spdx.org/>`_ identifiers in source code are +descibed in :ref:`Documentation/process/license-rules.rst <kernel_licensing>`. +If you have further questions about the license, please contact a lawyer, and do +not ask on the Linux kernel mailing list. The people on the mailing lists are +not lawyers, and you should not rely on their statements on legal matters. For common questions and answers about the GPL, please see: diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst index 42691e2880eb..757808526d9a 100644 --- a/Documentation/process/index.rst +++ b/Documentation/process/index.rst @@ -19,6 +19,7 @@ Below are the essential guides that every developer should read. .. toctree:: :maxdepth: 1 + license-rules howto code-of-conduct code-of-conduct-interpretation @@ -42,6 +43,7 @@ Other guides to the community that are of interest to most developers are: stable-kernel-rules submit-checklist kernel-docs + deprecated These are some overall technical guides that have been put here for now for lack of a better place. diff --git a/Documentation/process/license-rules.rst b/Documentation/process/license-rules.rst index 8ea26325fe3f..2bb8c0fc2238 100644 --- a/Documentation/process/license-rules.rst +++ b/Documentation/process/license-rules.rst @@ -1,5 +1,7 @@ .. SPDX-License-Identifier: GPL-2.0 +.. _kernel_licensing: + Linux kernel licensing rules ============================ diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX deleted file mode 100644 index 317f0378ae01..000000000000 --- a/Documentation/s390/00-INDEX +++ /dev/null @@ -1,28 +0,0 @@ -00-INDEX - - this file. -3270.ChangeLog - - ChangeLog for the UTS Global 3270-support patch (outdated). -3270.txt - - how to use the IBM 3270 display system support. -cds.txt - - s390 common device support (common I/O layer). -CommonIO - - common I/O layer command line parameters, procfs and debugfs entries -config3270.sh - - example configuration for 3270 devices. -DASD - - information on the DASD disk device driver. -Debugging390.txt - - hints for debugging on s390 systems. -driver-model.txt - - information on s390 devices and the driver model. -monreader.txt - - information on accessing the z/VM monitor stream from Linux. -qeth.txt - - HiperSockets Bridge Port Support. -s390dbf.txt - - information on using the s390 debug feature. -vfio-ccw.txt - information on the vfio-ccw I/O subchannel driver. -zfcpdump.txt - - information on the s390 SCSI dump tool. diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt new file mode 100644 index 000000000000..65167cfe4485 --- /dev/null +++ b/Documentation/s390/vfio-ap.txt @@ -0,0 +1,837 @@ +Introduction: +============ +The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised +of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards. +The AP devices provide cryptographic functions to all CPUs assigned to a +linux system running in an IBM Z system LPAR. + +The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap +is to make AP cards available to KVM guests using the VFIO mediated device +framework. This implementation relies considerably on the s390 virtualization +facilities which do most of the hard work of providing direct access to AP +devices. + +AP Architectural Overview: +========================= +To facilitate the comprehension of the design, let's start with some +definitions: + +* AP adapter + + An AP adapter is an IBM Z adapter card that can perform cryptographic + functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters + assigned to the LPAR in which a linux host is running will be available to + the linux host. Each adapter is identified by a number from 0 to 255; however, + the maximum adapter number is determined by machine model and/or adapter type. + When installed, an AP adapter is accessed by AP instructions executed by any + CPU. + + The AP adapter cards are assigned to a given LPAR via the system's Activation + Profile which can be edited via the HMC. When the linux host system is IPL'd + in the LPAR, the AP bus detects the AP adapter cards assigned to the LPAR and + creates a sysfs device for each assigned adapter. For example, if AP adapters + 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will create the following + sysfs device entries: + + /sys/devices/ap/card04 + /sys/devices/ap/card0a + + Symbolic links to these devices will also be created in the AP bus devices + sub-directory: + + /sys/bus/ap/devices/[card04] + /sys/bus/ap/devices/[card04] + +* AP domain + + An adapter is partitioned into domains. An adapter can hold up to 256 domains + depending upon the adapter type and hardware configuration. A domain is + identified by a number from 0 to 255; however, the maximum domain number is + determined by machine model and/or adapter type.. A domain can be thought of + as a set of hardware registers and memory used for processing AP commands. A + domain can be configured with a secure private key used for clear key + encryption. A domain is classified in one of two ways depending upon how it + may be accessed: + + * Usage domains are domains that are targeted by an AP instruction to + process an AP command. + + * Control domains are domains that are changed by an AP command sent to a + usage domain; for example, to set the secure private key for the control + domain. + + The AP usage and control domains are assigned to a given LPAR via the system's + Activation Profile which can be edited via the HMC. When a linux host system + is IPL'd in the LPAR, the AP bus module detects the AP usage and control + domains assigned to the LPAR. The domain number of each usage domain and + adapter number of each AP adapter are combined to create AP queue devices + (see AP Queue section below). The domain number of each control domain will be + represented in a bitmask and stored in a sysfs file + /sys/bus/ap/ap_control_domain_mask. The bits in the mask, from most to least + significant bit, correspond to domains 0-255. + +* AP Queue + + An AP queue is the means by which an AP command is sent to a usage domain + inside a specific adapter. An AP queue is identified by a tuple + comprised of an AP adapter ID (APID) and an AP queue index (APQI). The + APQI corresponds to a given usage domain number within the adapter. This tuple + forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP + instructions include a field containing the APQN to identify the AP queue to + which the AP command is to be sent for processing. + + The AP bus will create a sysfs device for each APQN that can be derived from + the cross product of the AP adapter and usage domain numbers detected when the + AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage + domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the + following sysfs entries: + + /sys/devices/ap/card04/04.0006 + /sys/devices/ap/card04/04.0047 + /sys/devices/ap/card0a/0a.0006 + /sys/devices/ap/card0a/0a.0047 + + The following symbolic links to these devices will be created in the AP bus + devices subdirectory: + + /sys/bus/ap/devices/[04.0006] + /sys/bus/ap/devices/[04.0047] + /sys/bus/ap/devices/[0a.0006] + /sys/bus/ap/devices/[0a.0047] + +* AP Instructions: + + There are three AP instructions: + + * NQAP: to enqueue an AP command-request message to a queue + * DQAP: to dequeue an AP command-reply message from a queue + * PQAP: to administer the queues + + AP instructions identify the domain that is targeted to process the AP + command; this must be one of the usage domains. An AP command may modify a + domain that is not one of the usage domains, but the modified domain + must be one of the control domains. + +AP and SIE: +========== +Let's now take a look at how AP instructions executed on a guest are interpreted +by the hardware. + +A satellite control block called the Crypto Control Block (CRYCB) is attached to +our main hardware virtualization control block. The CRYCB contains three fields +to identify the adapters, usage domains and control domains assigned to the KVM +guest: + +* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned + to the KVM guest. Each bit in the mask, from left to right (i.e. from most + significant to least significant bit in big endian order), corresponds to + an APID from 0-255. If a bit is set, the corresponding adapter is valid for + use by the KVM guest. + +* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains + assigned to the KVM guest. Each bit in the mask, from left to right (i.e. from + most significant to least significant bit in big endian order), corresponds to + an AP queue index (APQI) from 0-255. If a bit is set, the corresponding queue + is valid for use by the KVM guest. + +* The AP Domain Mask field is a bit mask that identifies the AP control domains + assigned to the KVM guest. The ADM bit mask controls which domains can be + changed by an AP command-request message sent to a usage domain from the + guest. Each bit in the mask, from left to right (i.e. from most significant to + least significant bit in big endian order), corresponds to a domain from + 0-255. If a bit is set, the corresponding domain can be modified by an AP + command-request message sent to a usage domain. + +If you recall from the description of an AP Queue, AP instructions include +an APQN to identify the AP queue to which an AP command-request message is to be +sent (NQAP and PQAP instructions), or from which a command-reply message is to +be received (DQAP instruction). The validity of an APQN is defined by the matrix +calculated from the APM and AQM; it is the cross product of all assigned adapter +numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1 +and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6), +(2,5) and (2,6) will be valid for the guest. + +The APQNs can provide secure key functionality - i.e., a private key is stored +on the adapter card for each of its domains - so each APQN must be assigned to +at most one guest or to the linux host. + + Example 1: Valid configuration: + ------------------------------ + Guest1: adapters 1,2 domains 5,6 + Guest2: adapter 1,2 domain 7 + + This is valid because both guests have a unique set of APQNs: + Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); + Guest2 has APQNs (1,7), (2,7) + + Example 2: Valid configuration: + ------------------------------ + Guest1: adapters 1,2 domains 5,6 + Guest2: adapters 3,4 domains 5,6 + + This is also valid because both guests have a unique set of APQNs: + Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); + Guest2 has APQNs (3,5), (3,6), (4,5), (4,6) + + Example 3: Invalid configuration: + -------------------------------- + Guest1: adapters 1,2 domains 5,6 + Guest2: adapter 1 domains 6,7 + + This is an invalid configuration because both guests have access to + APQN (1,6). + +The Design: +=========== +The design introduces three new objects: + +1. AP matrix device +2. VFIO AP device driver (vfio_ap.ko) +3. VFIO AP mediated matrix pass-through device + +The VFIO AP device driver +------------------------- +The VFIO AP (vfio_ap) device driver serves the following purposes: + +1. Provides the interfaces to secure APQNs for exclusive use of KVM guests. + +2. Sets up the VFIO mediated device interfaces to manage a mediated matrix + device and creates the sysfs interfaces for assigning adapters, usage + domains, and control domains comprising the matrix for a KVM guest. + +3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's + SIE state description to grant the guest access to a matrix of AP devices + +Reserve APQNs for exclusive use of KVM guests +--------------------------------------------- +The following block diagram illustrates the mechanism by which APQNs are +reserved: + + +------------------+ + 7 remove | | + +--------------------> cex4queue driver | + | | | + | +------------------+ + | + | + | +------------------+ +-----------------+ + | 5 register driver | | 3 create | | + | +----------------> Device core +----------> matrix device | + | | | | | | + | | +--------^---------+ +-----------------+ + | | | + | | +-------------------+ + | | +-----------------------------------+ | + | | | 4 register AP driver | | 2 register device + | | | | | ++--------+---+-v---+ +--------+-------+-+ +| | | | +| ap_bus +--------------------- > vfio_ap driver | +| | 8 probe | | ++--------^---------+ +--^--^------------+ +6 edit | | | + apmask | +-----------------------------+ | 9 mdev create + aqmask | | 1 modprobe | ++--------+-----+---+ +----------------+-+ +------------------+ +| | | |8 create | mediated | +| admin | | VFIO device core |---------> matrix | +| + | | | device | ++------+-+---------+ +--------^---------+ +--------^---------+ + | | | | + | | 9 create vfio_ap-passthrough | | + | +------------------------------+ | + +-------------------------------------------------------------+ + 10 assign adapter/domain/control domain + +The process for reserving an AP queue for use by a KVM guest is: + +1. The administrator loads the vfio_ap device driver +2. The vfio-ap driver during its initialization will register a single 'matrix' + device with the device core. This will serve as the parent device for + all mediated matrix devices used to configure an AP matrix for a guest. +3. The /sys/devices/vfio_ap/matrix device is created by the device core +4 The vfio_ap device driver will register with the AP bus for AP queue devices + of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap + driver's probe and remove callback interfaces. Devices older than CEX4 queues + are not supported to simplify the implementation by not needlessly + complicating the design by supporting older devices that will go out of + service in the relatively near future, and for which there are few older + systems around on which to test. +5. The AP bus registers the vfio_ap device driver with the device core +6. The administrator edits the AP adapter and queue masks to reserve AP queues + for use by the vfio_ap device driver. +7. The AP bus removes the AP queues reserved for the vfio_ap driver from the + default zcrypt cex4queue driver. +8. The AP bus probes the vfio_ap device driver to bind the queues reserved for + it. +9. The administrator creates a passthrough type mediated matrix device to be + used by a guest +10 The administrator assigns the adapters, usage domains and control domains + to be exclusively used by a guest. + +Set up the VFIO mediated device interfaces +------------------------------------------ +The VFIO AP device driver utilizes the common interface of the VFIO mediated +device core driver to: +* Register an AP mediated bus driver to add a mediated matrix device to and + remove it from a VFIO group. +* Create and destroy a mediated matrix device +* Add a mediated matrix device to and remove it from the AP mediated bus driver +* Add a mediated matrix device to and remove it from an IOMMU group + +The following high-level block diagram shows the main components and interfaces +of the VFIO AP mediated matrix device driver: + + +-------------+ + | | + | +---------+ | mdev_register_driver() +--------------+ + | | Mdev | +<-----------------------+ | + | | bus | | | vfio_mdev.ko | + | | driver | +----------------------->+ |<-> VFIO user + | +---------+ | probe()/remove() +--------------+ APIs + | | + | MDEV CORE | + | MODULE | + | mdev.ko | + | +---------+ | mdev_register_device() +--------------+ + | |Physical | +<-----------------------+ | + | | device | | | vfio_ap.ko |<-> matrix + | |interface| +----------------------->+ | device + | +---------+ | callback +--------------+ + +-------------+ + +During initialization of the vfio_ap module, the matrix device is registered +with an 'mdev_parent_ops' structure that provides the sysfs attribute +structures, mdev functions and callback interfaces for managing the mediated +matrix device. + +* sysfs attribute structures: + * supported_type_groups + The VFIO mediated device framework supports creation of user-defined + mediated device types. These mediated device types are specified + via the 'supported_type_groups' structure when a device is registered + with the mediated device framework. The registration process creates the + sysfs structures for each mediated device type specified in the + 'mdev_supported_types' sub-directory of the device being registered. Along + with the device type, the sysfs attributes of the mediated device type are + provided. + + The VFIO AP device driver will register one mediated device type for + passthrough devices: + /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough + Only the read-only attributes required by the VFIO mdev framework will + be provided: + ... name + ... device_api + ... available_instances + ... device_api + Where: + * name: specifies the name of the mediated device type + * device_api: the mediated device type's API + * available_instances: the number of mediated matrix passthrough devices + that can be created + * device_api: specifies the VFIO API + * mdev_attr_groups + This attribute group identifies the user-defined sysfs attributes of the + mediated device. When a device is registered with the VFIO mediated device + framework, the sysfs attribute files identified in the 'mdev_attr_groups' + structure will be created in the mediated matrix device's directory. The + sysfs attributes for a mediated matrix device are: + * assign_adapter: + * unassign_adapter: + Write-only attributes for assigning/unassigning an AP adapter to/from the + mediated matrix device. To assign/unassign an adapter, the APID of the + adapter is echoed to the respective attribute file. + * assign_domain: + * unassign_domain: + Write-only attributes for assigning/unassigning an AP usage domain to/from + the mediated matrix device. To assign/unassign a domain, the domain + number of the the usage domain is echoed to the respective attribute + file. + * matrix: + A read-only file for displaying the APQNs derived from the cross product + of the adapter and domain numbers assigned to the mediated matrix device. + * assign_control_domain: + * unassign_control_domain: + Write-only attributes for assigning/unassigning an AP control domain + to/from the mediated matrix device. To assign/unassign a control domain, + the ID of the domain to be assigned/unassigned is echoed to the respective + attribute file. + * control_domains: + A read-only file for displaying the control domain numbers assigned to the + mediated matrix device. + +* functions: + * create: + allocates the ap_matrix_mdev structure used by the vfio_ap driver to: + * Store the reference to the KVM structure for the guest using the mdev + * Store the AP matrix configuration for the adapters, domains, and control + domains assigned via the corresponding sysfs attributes files + * remove: + deallocates the mediated matrix device's ap_matrix_mdev structure. This will + be allowed only if a running guest is not using the mdev. + +* callback interfaces + * open: + The vfio_ap driver uses this callback to register a + VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix + device. The open is invoked when QEMU connects the VFIO iommu group + for the mdev matrix device to the MDEV bus. Access to the KVM structure used + to configure the KVM guest is provided via this callback. The KVM structure, + is used to configure the guest's access to the AP matrix defined via the + mediated matrix device's sysfs attribute files. + * release: + unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the + mdev matrix device and deconfigures the guest's AP matrix. + +Configure the APM, AQM and ADM in the CRYCB: +------------------------------------------- +Configuring the AP matrix for a KVM guest will be performed when the +VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier +function is called when QEMU connects to KVM. The guest's AP matrix is +configured via it's CRYCB by: +* Setting the bits in the APM corresponding to the APIDs assigned to the + mediated matrix device via its 'assign_adapter' interface. +* Setting the bits in the AQM corresponding to the domains assigned to the + mediated matrix device via its 'assign_domain' interface. +* Setting the bits in the ADM corresponding to the domain dIDs assigned to the + mediated matrix device via its 'assign_control_domains' interface. + +The CPU model features for AP +----------------------------- +The AP stack relies on the presence of the AP instructions as well as two +facilities: The AP Facilities Test (APFT) facility; and the AP Query +Configuration Information (QCI) facility. These features/facilities are made +available to a KVM guest via the following CPU model features: + +1. ap: Indicates whether the AP instructions are installed on the guest. This + feature will be enabled by KVM only if the AP instructions are installed + on the host. + +2. apft: Indicates the APFT facility is available on the guest. This facility + can be made available to the guest only if it is available on the host (i.e., + facility bit 15 is set). + +3. apqci: Indicates the AP QCI facility is available on the guest. This facility + can be made available to the guest only if it is available on the host (i.e., + facility bit 12 is set). + +Note: If the user chooses to specify a CPU model different than the 'host' +model to QEMU, the CPU model features and facilities need to be turned on +explicitly; for example: + + /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on + +A guest can be precluded from using AP features/facilities by turning them off +explicitly; for example: + + /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off + +Note: If the APFT facility is turned off (apft=off) for the guest, the guest +will not see any AP devices. The zcrypt device drivers that register for type 10 +and newer AP devices - i.e., the cex4card and cex4queue device drivers - need +the APFT facility to ascertain the facilities installed on a given AP device. If +the APFT facility is not installed on the guest, then the probe of device +drivers will fail since only type 10 and newer devices can be configured for +guest use. + +Example: +======= +Let's now provide an example to illustrate how KVM guests may be given +access to AP facilities. For this example, we will show how to configure +three guests such that executing the lszcrypt command on the guests would +look like this: + +Guest1 +------ +CARD.DOMAIN TYPE MODE +------------------------------ +05 CEX5C CCA-Coproc +05.0004 CEX5C CCA-Coproc +05.00ab CEX5C CCA-Coproc +06 CEX5A Accelerator +06.0004 CEX5A Accelerator +06.00ab CEX5C CCA-Coproc + +Guest2 +------ +CARD.DOMAIN TYPE MODE +------------------------------ +05 CEX5A Accelerator +05.0047 CEX5A Accelerator +05.00ff CEX5A Accelerator + +Guest2 +------ +CARD.DOMAIN TYPE MODE +------------------------------ +06 CEX5A Accelerator +06.0047 CEX5A Accelerator +06.00ff CEX5A Accelerator + +These are the steps: + +1. Install the vfio_ap module on the linux host. The dependency chain for the + vfio_ap module is: + * iommu + * s390 + * zcrypt + * vfio + * vfio_mdev + * vfio_mdev_device + * KVM + + To build the vfio_ap module, the kernel build must be configured with the + following Kconfig elements selected: + * IOMMU_SUPPORT + * S390 + * ZCRYPT + * S390_AP_IOMMU + * VFIO + * VFIO_MDEV + * VFIO_MDEV_DEVICE + * KVM + + If using make menuconfig select the following to build the vfio_ap module: + -> Device Drivers + -> IOMMU Hardware Support + select S390 AP IOMMU Support + -> VFIO Non-Privileged userspace driver framework + -> Mediated device driver frramework + -> VFIO driver for Mediated devices + -> I/O subsystem + -> VFIO support for AP devices + +2. Secure the AP queues to be used by the three guests so that the host can not + access them. To secure them, there are two sysfs files that specify + bitmasks marking a subset of the APQN range as 'usable by the default AP + queue device drivers' or 'not usable by the default device drivers' and thus + available for use by the vfio_ap device driver'. The location of the sysfs + files containing the masks are: + + /sys/bus/ap/apmask + /sys/bus/ap/aqmask + + The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs + (APID). Each bit in the mask, from left to right (i.e., from most significant + to least significant bit in big endian order), corresponds to an APID from + 0-255. If a bit is set, the APID is marked as usable only by the default AP + queue device drivers; otherwise, the APID is usable by the vfio_ap + device driver. + + The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes + (APQI). Each bit in the mask, from left to right (i.e., from most significant + to least significant bit in big endian order), corresponds to an APQI from + 0-255. If a bit is set, the APQI is marked as usable only by the default AP + queue device drivers; otherwise, the APQI is usable by the vfio_ap device + driver. + + Take, for example, the following mask: + + 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff + + It indicates: + + 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6 + belong to the vfio_ap device driver's pool. + + The APQN of each AP queue device assigned to the linux host is checked by the + AP bus against the set of APQNs derived from the cross product of APIDs + and APQIs marked as usable only by the default AP queue device drivers. If a + match is detected, only the default AP queue device drivers will be probed; + otherwise, the vfio_ap device driver will be probed. + + By default, the two masks are set to reserve all APQNs for use by the default + AP queue device drivers. There are two ways the default masks can be changed: + + 1. The sysfs mask files can be edited by echoing a string into the + respective sysfs mask file in one of two formats: + + * An absolute hex string starting with 0x - like "0x12345678" - sets + the mask. If the given string is shorter than the mask, it is padded + with 0s on the right; for example, specifying a mask value of 0x41 is + the same as specifying: + + 0x4100000000000000000000000000000000000000000000000000000000000000 + + Keep in mind that the mask reads from left to right (i.e., most + significant to least significant bit in big endian order), so the mask + above identifies device numbers 1 and 7 (01000001). + + If the string is longer than the mask, the operation is terminated with + an error (EINVAL). + + * Individual bits in the mask can be switched on and off by specifying + each bit number to be switched in a comma separated list. Each bit + number string must be prepended with a ('+') or minus ('-') to indicate + the corresponding bit is to be switched on ('+') or off ('-'). Some + valid values are: + + "+0" switches bit 0 on + "-13" switches bit 13 off + "+0x41" switches bit 65 on + "-0xff" switches bit 255 off + + The following example: + +0,-6,+0x47,-0xf0 + + Switches bits 0 and 71 (0x47) on + Switches bits 6 and 240 (0xf0) off + + Note that the bits not specified in the list remain as they were before + the operation. + + 2. The masks can also be changed at boot time via parameters on the kernel + command line like this: + + ap.apmask=0xffff ap.aqmask=0x40 + + This would create the following masks: + + apmask: + 0xffff000000000000000000000000000000000000000000000000000000000000 + + aqmask: + 0x4000000000000000000000000000000000000000000000000000000000000000 + + Resulting in these two pools: + + default drivers pool: adapter 0-15, domain 1 + alternate drivers pool: adapter 16-255, domains 0, 2-255 + + Securing the APQNs for our example: + ---------------------------------- + To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047, + 06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding + APQNs can either be removed from the default masks: + + echo -5,-6 > /sys/bus/ap/apmask + + echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask + + Or the masks can be set as follows: + + echo 0xf9ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff \ + > apmask + + echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \ + > aqmask + + This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, + 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The + sysfs directory for the vfio_ap device driver will now contain symbolic links + to the AP queue devices bound to it: + + /sys/bus/ap + ... [drivers] + ...... [vfio_ap] + ......... [05.0004] + ......... [05.0047] + ......... [05.00ab] + ......... [05.00ff] + ......... [06.0004] + ......... [06.0047] + ......... [06.00ab] + ......... [06.00ff] + + Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later) + can be bound to the vfio_ap device driver. The reason for this is to + simplify the implementation by not needlessly complicating the design by + supporting older devices that will go out of service in the relatively near + future and for which there are few older systems on which to test. + + The administrator, therefore, must take care to secure only AP queues that + can be bound to the vfio_ap device driver. The device type for a given AP + queue device can be read from the parent card's sysfs directory. For example, + to see the hardware type of the queue 05.0004: + + cat /sys/bus/ap/devices/card05/hwtype + + The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the + vfio_ap device driver. + +3. Create the mediated devices needed to configure the AP matrixes for the + three guests and to provide an interface to the vfio_ap driver for + use by the guests: + + /sys/devices/vfio_ap/matrix/ + --- [mdev_supported_types] + ------ [vfio_ap-passthrough] (passthrough mediated matrix device type) + --------- create + --------- [devices] + + To create the mediated devices for the three guests: + + uuidgen > create + uuidgen > create + uuidgen > create + + or + + echo $uuid1 > create + echo $uuid2 > create + echo $uuid3 > create + + This will create three mediated devices in the [devices] subdirectory named + after the UUID written to the create attribute file. We call them $uuid1, + $uuid2 and $uuid3 and this is the sysfs directory structure after creation: + + /sys/devices/vfio_ap/matrix/ + --- [mdev_supported_types] + ------ [vfio_ap-passthrough] + --------- [devices] + ------------ [$uuid1] + --------------- assign_adapter + --------------- assign_control_domain + --------------- assign_domain + --------------- matrix + --------------- unassign_adapter + --------------- unassign_control_domain + --------------- unassign_domain + + ------------ [$uuid2] + --------------- assign_adapter + --------------- assign_control_domain + --------------- assign_domain + --------------- matrix + --------------- unassign_adapter + ----------------unassign_control_domain + ----------------unassign_domain + + ------------ [$uuid3] + --------------- assign_adapter + --------------- assign_control_domain + --------------- assign_domain + --------------- matrix + --------------- unassign_adapter + ----------------unassign_control_domain + ----------------unassign_domain + +4. The administrator now needs to configure the matrixes for the mediated + devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3). + + This is how the matrix is configured for Guest1: + + echo 5 > assign_adapter + echo 6 > assign_adapter + echo 4 > assign_domain + echo 0xab > assign_domain + + Control domains can similarly be assigned using the assign_control_domain + sysfs file. + + If a mistake is made configuring an adapter, domain or control domain, + you can use the unassign_xxx files to unassign the adapter, domain or + control domain. + + To display the matrix configuration for Guest1: + + cat matrix + + This is how the matrix is configured for Guest2: + + echo 5 > assign_adapter + echo 0x47 > assign_domain + echo 0xff > assign_domain + + This is how the matrix is configured for Guest3: + + echo 6 > assign_adapter + echo 0x47 > assign_domain + echo 0xff > assign_domain + + In order to successfully assign an adapter: + + * The adapter number specified must represent a value from 0 up to the + maximum adapter number configured for the system. If an adapter number + higher than the maximum is specified, the operation will terminate with + an error (ENODEV). + + * All APQNs that can be derived from the adapter ID and the IDs of + the previously assigned domains must be bound to the vfio_ap device + driver. If no domains have yet been assigned, then there must be at least + one APQN with the specified APID bound to the vfio_ap driver. If no such + APQNs are bound to the driver, the operation will terminate with an + error (EADDRNOTAVAIL). + + No APQN that can be derived from the adapter ID and the IDs of the + previously assigned domains can be assigned to another mediated matrix + device. If an APQN is assigned to another mediated matrix device, the + operation will terminate with an error (EADDRINUSE). + + In order to successfully assign a domain: + + * The domain number specified must represent a value from 0 up to the + maximum domain number configured for the system. If a domain number + higher than the maximum is specified, the operation will terminate with + an error (ENODEV). + + * All APQNs that can be derived from the domain ID and the IDs of + the previously assigned adapters must be bound to the vfio_ap device + driver. If no domains have yet been assigned, then there must be at least + one APQN with the specified APQI bound to the vfio_ap driver. If no such + APQNs are bound to the driver, the operation will terminate with an + error (EADDRNOTAVAIL). + + No APQN that can be derived from the domain ID and the IDs of the + previously assigned adapters can be assigned to another mediated matrix + device. If an APQN is assigned to another mediated matrix device, the + operation will terminate with an error (EADDRINUSE). + + In order to successfully assign a control domain, the domain number + specified must represent a value from 0 up to the maximum domain number + configured for the system. If a control domain number higher than the maximum + is specified, the operation will terminate with an error (ENODEV). + +5. Start Guest1: + + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... + +7. Start Guest2: + + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... + +7. Start Guest3: + + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ... + +When the guest is shut down, the mediated matrix devices may be removed. + +Using our example again, to remove the mediated matrix device $uuid1: + + /sys/devices/vfio_ap/matrix/ + --- [mdev_supported_types] + ------ [vfio_ap-passthrough] + --------- [devices] + ------------ [$uuid1] + --------------- remove + + + echo 1 > remove + + This will remove all of the mdev matrix device's sysfs structures including + the mdev device itself. To recreate and reconfigure the mdev matrix device, + all of the steps starting with step 3 will have to be performed again. Note + that the remove will fail if a guest using the mdev is still running. + + It is not necessary to remove an mdev matrix device, but one may want to + remove it if no guest will use it during the remaining lifetime of the linux + host. If the mdev matrix device is removed, one may want to also reconfigure + the pool of adapters and queues reserved for use by the default drivers. + +Limitations +=========== +* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN + to the default drivers pool of a queue that is still assigned to a mediated + device in use by a guest. It is incumbent upon the administrator to + ensure there is no mediated device in use by a guest to which the APQN is + assigned lest the host be given access to the private data of the AP queue + device such as a private key configured specifically for the guest. + +* Dynamically modifying the AP matrix for a running guest (which would amount to + hot(un)plug of AP devices for the guest) is currently not supported + +* Live guest migration is not supported for guests using AP devices. diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX deleted file mode 100644 index eccf7ad2e7f9..000000000000 --- a/Documentation/scheduler/00-INDEX +++ /dev/null @@ -1,18 +0,0 @@ -00-INDEX - - this file. -sched-arch.txt - - CPU Scheduler implementation hints for architecture specific code. -sched-bwc.txt - - CFS bandwidth control overview. -sched-design-CFS.txt - - goals, design and implementation of the Completely Fair Scheduler. -sched-domains.txt - - information on scheduling domains. -sched-nice-design.txt - - How and why the scheduler's nice levels are implemented. -sched-rt-group.txt - - real-time group scheduling. -sched-deadline.txt - - deadline scheduling. -sched-stats.txt - - information on schedstats (Linux Scheduler Statistics). diff --git a/Documentation/scheduler/completion.txt b/Documentation/scheduler/completion.txt index 2dbff579f957..e5b9df4d8078 100644 --- a/Documentation/scheduler/completion.txt +++ b/Documentation/scheduler/completion.txt @@ -267,7 +267,8 @@ queue spinlock. Any such concurrent calls to complete() or complete_all() probably are a design bug. Signaling completion from IRQ context is fine as it will appropriately -lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never sleep. +lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never +sleep. try_wait_for_completion()/completion_done(): diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX deleted file mode 100644 index bb4a76f823e1..000000000000 --- a/Documentation/scsi/00-INDEX +++ /dev/null @@ -1,108 +0,0 @@ -00-INDEX - - this file -53c700.txt - - info on driver for 53c700 based adapters -BusLogic.txt - - info on driver for adapters with BusLogic chips -ChangeLog.1992-1997 - - Changes to scsi files, if not listed elsewhere -ChangeLog.arcmsr - - Changes to driver for ARECA's SATA RAID controller cards -ChangeLog.ips - - IBM ServeRAID driver Changelog -ChangeLog.lpfc - - Changes to lpfc driver -ChangeLog.megaraid - - Changes to LSI megaraid controller. -ChangeLog.megaraid_sas - - Changes to serial attached scsi version of LSI megaraid controller. -ChangeLog.ncr53c8xx - - Changes to ncr53c8xx driver -ChangeLog.sym53c8xx - - Changes to sym53c8xx driver -ChangeLog.sym53c8xx_2 - - Changes to second generation of sym53c8xx driver -FlashPoint.txt - - info on driver for BusLogic FlashPoint adapters -LICENSE.FlashPoint - - Licence of the Flashpoint driver -LICENSE.qla2xxx - - License for QLogic Linux Fibre Channel HBA Driver firmware. -LICENSE.qla4xxx - - License for QLogic Linux iSCSI HBA Driver. -Mylex.txt - - info on driver for Mylex adapters -NinjaSCSI.txt - - info on WorkBiT NinjaSCSI-32/32Bi driver -aacraid.txt - - Driver supporting Adaptec RAID controllers -advansys.txt - - List of Advansys Host Adapters -aha152x.txt - - info on driver for Adaptec AHA152x based adapters -aic79xx.txt - - Adaptec Ultra320 SCSI host adapters -aic7xxx.txt - - info on driver for Adaptec controllers -arcmsr_spec.txt - - ARECA FIRMWARE SPEC (for IOP331 adapter) -bfa.txt - - Brocade FC/FCOE adapter driver. -bnx2fc.txt - - FCoE hardware offload for Broadcom network interfaces. -cxgb3i.txt - - Chelsio iSCSI Linux Driver -dc395x.txt - - README file for the dc395x SCSI driver -dpti.txt - - info on driver for DPT SmartRAID and Adaptec I2O RAID based adapters -dtc3x80.txt - - info on driver for DTC 2x80 based adapters -g_NCR5380.txt - - info on driver for NCR5380 and NCR53c400 based adapters -hpsa.txt - - HP Smart Array Controller SCSI driver. -hptiop.txt - - HIGHPOINT ROCKETRAID 3xxx RAID DRIVER -libsas.txt - - Serial Attached SCSI management layer. -link_power_management_policy.txt - - Link power management options. -lpfc.txt - - LPFC driver release notes -megaraid.txt - - Common Management Module, shared code handling ioctls for LSI drivers -ncr53c8xx.txt - - info on driver for NCR53c8xx based adapters -osd.txt - Object-Based Storage Device, command set introduction. -osst.txt - - info on driver for OnStream SC-x0 SCSI tape -ppa.txt - - info on driver for IOmega zip drive -qlogicfas.txt - - info on driver for QLogic FASxxx based adapters -scsi-changer.txt - - README for the SCSI media changer driver -scsi-generic.txt - - info on the sg driver for generic (non-disk/CD/tape) SCSI devices. -scsi-parameters.txt - - List of SCSI-parameters to pass to the kernel at module load-time. -scsi.txt - - short blurb on using SCSI support as a module. -scsi_mid_low_api.txt - - info on API between SCSI layer and low level drivers -scsi_eh.txt - - info on SCSI midlayer error handling infrastructure -scsi_fc_transport.txt - - SCSI Fiber Channel Tansport -st.txt - - info on scsi tape driver -sym53c500_cs.txt - - info on PCMCIA driver for Symbios Logic 53c500 based adapters -sym53c8xx_2.txt - - info on second generation driver for sym53c8xx based adapters -tmscsim.txt - - info on driver for AM53c974 based adapters -ufs.txt - - info on Universal Flash Storage(UFS) and UFS host controller driver. diff --git a/Documentation/scsi/ufs.txt b/Documentation/scsi/ufs.txt index 41a6164592aa..520b5b033256 100644 --- a/Documentation/scsi/ufs.txt +++ b/Documentation/scsi/ufs.txt @@ -128,6 +128,26 @@ The current UFSHCD implementation supports following functionality, In this version of UFSHCD Query requests and power management functionality are not implemented. +4. BSG Support +------------------ + +This transport driver supports exchanging UFS protocol information units +(UPIUs) with a UFS device. Typically, user space will allocate +struct ufs_bsg_request and struct ufs_bsg_reply (see ufs_bsg.h) as +request_upiu and reply_upiu respectively. Filling those UPIUs should +be done in accordance with JEDEC spec UFS2.1 paragraph 10.7. +*Caveat emptor*: The driver makes no further input validations and sends the +UPIU to the device as it is. Open the bsg device in /dev/ufs-bsg and +send SG_IO with the applicable sg_io_v4: + + io_hdr_v4.guard = 'Q'; + io_hdr_v4.protocol = BSG_PROTOCOL_SCSI; + io_hdr_v4.subprotocol = BSG_SUB_PROTOCOL_SCSI_TRANSPORT; + io_hdr_v4.response = (__u64)reply_upiu; + io_hdr_v4.max_response_len = reply_len; + io_hdr_v4.request_len = request_len; + io_hdr_v4.request = (__u64)request_upiu; + UFS Specifications can be found at, UFS - http://www.jedec.org/sites/default/files/docs/JESD220.pdf UFSHCI - http://www.jedec.org/sites/default/files/docs/JESD223.pdf diff --git a/Documentation/security/LSM.rst b/Documentation/security/LSM.rst index 98522e0e1ee2..8b9ee597e9d0 100644 --- a/Documentation/security/LSM.rst +++ b/Documentation/security/LSM.rst @@ -5,7 +5,7 @@ Linux Security Module Development Based on https://lkml.org/lkml/2007/10/26/215, a new LSM is accepted into the kernel when its intent (a description of what it tries to protect against and in what cases one would expect to -use it) has been appropriately documented in ``Documentation/security/LSM.rst``. +use it) has been appropriately documented in ``Documentation/admin-guide/LSM/``. This allows an LSM's code to be easily compared to its goals, and so that end users and distros can make a more informed decision about which LSMs suit their requirements. diff --git a/Documentation/security/keys/ecryptfs.rst b/Documentation/security/keys/ecryptfs.rst index 4920f3a8ea75..0e2be0a6bb6a 100644 --- a/Documentation/security/keys/ecryptfs.rst +++ b/Documentation/security/keys/ecryptfs.rst @@ -5,10 +5,10 @@ Encrypted keys for the eCryptfs filesystem ECryptfs is a stacked filesystem which transparently encrypts and decrypts each file using a randomly generated File Encryption Key (FEK). -Each FEK is in turn encrypted with a File Encryption Key Encryption Key (FEFEK) +Each FEK is in turn encrypted with a File Encryption Key Encryption Key (FEKEK) either in kernel space or in user space with a daemon called 'ecryptfsd'. In the former case the operation is performed directly by the kernel CryptoAPI -using a key, the FEFEK, derived from a user prompted passphrase; in the latter +using a key, the FEKEK, derived from a user prompted passphrase; in the latter the FEK is encrypted by 'ecryptfsd' with the help of external libraries in order to support other mechanisms like public key cryptography, PKCS#11 and TPM based operations. @@ -22,12 +22,12 @@ by the userspace utility 'mount.ecryptfs' shipped with the package The 'encrypted' key type has been extended with the introduction of the new format 'ecryptfs' in order to be used in conjunction with the eCryptfs filesystem. Encrypted keys of the newly introduced format store an -authentication token in its payload with a FEFEK randomly generated by the +authentication token in its payload with a FEKEK randomly generated by the kernel and protected by the parent master key. In order to avoid known-plaintext attacks, the datablob obtained through commands 'keyctl print' or 'keyctl pipe' does not contain the overall -authentication token, which content is well known, but only the FEFEK in +authentication token, which content is well known, but only the FEKEK in encrypted form. The eCryptfs filesystem may really benefit from using encrypted keys in that the diff --git a/Documentation/serial/00-INDEX b/Documentation/serial/00-INDEX deleted file mode 100644 index 8021a9f29fc5..000000000000 --- a/Documentation/serial/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -README.cycladesZ - - info on Cyclades-Z firmware loading. -driver - - intro to the low level serial driver. -moxa-smartio - - file with info on installing/using Moxa multiport serial driver. -n_gsm.txt - - GSM 0710 tty multiplexer howto. -rocket.txt - - info on the Comtrol RocketPort multiport serial driver. -serial-rs485.txt - - info about RS485 structures and support in the kernel. -tty.txt - - guide to the locking policies of the tty layer. diff --git a/Documentation/sound/hd-audio/models.rst b/Documentation/sound/hd-audio/models.rst index e06238131f77..368a07a165f5 100644 --- a/Documentation/sound/hd-audio/models.rst +++ b/Documentation/sound/hd-audio/models.rst @@ -309,6 +309,8 @@ asus-nx50 ASUS Nx50 fixups asus-nx51 ASUS Nx51 fixups +asus-g751 + ASUS G751 fixups alc891-headset Headset mode support on ALC891 alc891-headset-multi diff --git a/Documentation/sound/kernel-api/writing-an-alsa-driver.rst b/Documentation/sound/kernel-api/writing-an-alsa-driver.rst index a0b268466cb1..b37234afdfa1 100644 --- a/Documentation/sound/kernel-api/writing-an-alsa-driver.rst +++ b/Documentation/sound/kernel-api/writing-an-alsa-driver.rst @@ -3,8 +3,6 @@ Writing an ALSA Driver ====================== :Author: Takashi Iwai <tiwai@suse.de> -:Date: Oct 15, 2007 -:Edition: 0.3.7 Preface ======= @@ -21,11 +19,6 @@ explain the general topic of linux kernel coding and doesn't cover low-level driver implementation details. It only describes the standard way to write a PCI sound driver on ALSA. -If you are already familiar with the older ALSA ver.0.5.x API, you can -check the drivers such as ``sound/pci/es1938.c`` or -``sound/pci/maestro3.c`` which have also almost the same code-base in -the ALSA 0.5.x tree, so you can compare the differences. - This document is still a draft version. Any feedback and corrections, please!! @@ -35,24 +28,7 @@ File Tree Structure General ------- -The ALSA drivers are provided in two ways. - -One is the trees provided as a tarball or via cvs from the ALSA's ftp -site, and another is the 2.6 (or later) Linux kernel tree. To -synchronize both, the ALSA driver tree is split into two different -trees: alsa-kernel and alsa-driver. The former contains purely the -source code for the Linux 2.6 (or later) tree. This tree is designed -only for compilation on 2.6 or later environment. The latter, -alsa-driver, contains many subtle files for compiling ALSA drivers -outside of the Linux kernel tree, wrapper functions for older 2.2 and -2.4 kernels, to adapt the latest kernel API, and additional drivers -which are still in development or in tests. The drivers in alsa-driver -tree will be moved to alsa-kernel (and eventually to the 2.6 kernel -tree) when they are finished and confirmed to work fine. - -The file tree structure of ALSA driver is depicted below. Both -alsa-kernel and alsa-driver have almost the same file structure, except -for “core” directory. It's named as “acore” in alsa-driver tree. +The file tree structure of ALSA driver is depicted below. :: @@ -61,14 +37,11 @@ for “core” directory. It's named as “acore” in alsa-driver tree. /oss /seq /oss - /instr - /ioctl32 /include /drivers /mpu401 /opl3 /i2c - /l3 /synth /emux /pci @@ -80,6 +53,7 @@ for “core” directory. It's named as “acore” in alsa-driver tree. /sparc /usb /pcmcia /(cards) + /soc /oss @@ -99,13 +73,6 @@ directory. The rawmidi OSS emulation is included in the ALSA rawmidi code since it's quite small. The sequencer code is stored in ``core/seq/oss`` directory (see `below <#core-seq-oss>`__). -core/ioctl32 -~~~~~~~~~~~~ - -This directory contains the 32bit-ioctl wrappers for 64bit architectures -such like x86-64, ppc64 and sparc64. For 32bit and alpha architectures, -these are not compiled. - core/seq ~~~~~~~~ @@ -119,11 +86,6 @@ core/seq/oss This contains the OSS sequencer emulation codes. -core/seq/instr -~~~~~~~~~~~~~~ - -This directory contains the modules for the sequencer instrument layer. - include directory ----------------- @@ -161,11 +123,6 @@ Although there is a standard i2c layer on Linux, ALSA has its own i2c code for some cards, because the soundcard needs only a simple operation and the standard i2c API is too complicated for such a purpose. -i2c/l3 -~~~~~~ - -This is a sub-directory for ARM L3 i2c. - synth directory --------------- @@ -209,11 +166,19 @@ The PCMCIA, especially PCCard drivers will go here. CardBus drivers will be in the pci directory, because their API is identical to that of standard PCI cards. +soc directory +------------- + +This directory contains the codes for ASoC (ALSA System on Chip) +layer including ASoC core, codec and machine drivers. + oss directory ------------- -The OSS/Lite source files are stored here in Linux 2.6 (or later) tree. -In the ALSA driver tarball, this directory is empty, of course :) +Here contains OSS/Lite codes. +All codes have been deprecated except for dmasound on m68k as of +writing this. + Basic Flow for PCI Drivers ========================== @@ -352,10 +317,8 @@ to details explained in the following section. /* (3) */ err = snd_mychip_create(card, pci, &chip); - if (err < 0) { - snd_card_free(card); - return err; - } + if (err < 0) + goto error; /* (4) */ strcpy(card->driver, "My Chip"); @@ -368,22 +331,23 @@ to details explained in the following section. /* (6) */ err = snd_card_register(card); - if (err < 0) { - snd_card_free(card); - return err; - } + if (err < 0) + goto error; /* (7) */ pci_set_drvdata(pci, card); dev++; return 0; + + error: + snd_card_free(card); + return err; } /* destructor -- see the "Destructor" sub-section */ static void snd_mychip_remove(struct pci_dev *pci) { snd_card_free(pci_get_drvdata(pci)); - pci_set_drvdata(pci, NULL); } @@ -445,14 +409,26 @@ In this part, the PCI resources are allocated. struct mychip *chip; .... err = snd_mychip_create(card, pci, &chip); - if (err < 0) { - snd_card_free(card); - return err; - } + if (err < 0) + goto error; The details will be explained in the section `PCI Resource Management`_. +When something goes wrong, the probe function needs to deal with the +error. In this example, we have a single error handling path placed +at the end of the function. + +:: + + error: + snd_card_free(card); + return err; + +Since each component can be properly freed, the single +:c:func:`snd_card_free()` call should suffice in most cases. + + 4) Set the driver ID and name strings. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -486,10 +462,8 @@ too. :: err = snd_card_register(card); - if (err < 0) { - snd_card_free(card); - return err; - } + if (err < 0) + goto error; Will be explained in the section `Management of Cards and Components`_, too. @@ -513,14 +487,13 @@ The destructor, remove callback, simply releases the card instance. Then the ALSA middle layer will release all the attached components automatically. -It would be typically like the following: +It would be typically just :c:func:`calling snd_card_free()`: :: static void snd_mychip_remove(struct pci_dev *pci) { snd_card_free(pci_get_drvdata(pci)); - pci_set_drvdata(pci, NULL); } @@ -546,7 +519,7 @@ in the source file. If the code is split into several files, the files without module options don't need them. In addition to these headers, you'll need ``<linux/interrupt.h>`` for -interrupt handling, and ``<asm/io.h>`` for I/O access. If you use the +interrupt handling, and ``<linux/io.h>`` for I/O access. If you use the :c:func:`mdelay()` or :c:func:`udelay()` functions, you'll need to include ``<linux/delay.h>`` too. @@ -720,6 +693,13 @@ function, which will call the real destructor. where :c:func:`snd_mychip_free()` is the real destructor. +The demerit of this method is the obviously more amount of codes. +The merit is, however, you can trigger the own callback at registering +and disconnecting the card via setting in snd_device_ops. +About the registering and disconnecting the card, see the subsections +below. + + Registration and Release ------------------------ @@ -905,10 +885,8 @@ Resource Allocation ------------------- The allocation of I/O ports and irqs is done via standard kernel -functions. Unlike ALSA ver.0.5.x., there are no helpers for that. And -these resources must be released in the destructor function (see below). -Also, on ALSA 0.9.x, you don't need to allocate (pseudo-)DMA for PCI -like in ALSA 0.5.x. +functions. These resources must be released in the destructor +function (see below). Now assume that the PCI device has an I/O port with 8 bytes and an interrupt. Then :c:type:`struct mychip <mychip>` will have the @@ -1064,7 +1042,8 @@ and the allocation would be like below: :: - if ((err = pci_request_regions(pci, "My Chip")) < 0) { + err = pci_request_regions(pci, "My Chip"); + if (err < 0) { kfree(chip); return err; } @@ -1086,6 +1065,21 @@ and the corresponding destructor would be: .... } +Of course, a modern way with :c:func:`pci_iomap()` will make things a +bit easier, too. + +:: + + err = pci_request_regions(pci, "My Chip"); + if (err < 0) { + kfree(chip); + return err; + } + chip->iobase_virt = pci_iomap(pci, 0, 0); + +which is paired with :c:func:`pci_iounmap()` at destructor. + + PCI Entries ----------- @@ -1154,13 +1148,6 @@ And at last, the module entries: Note that these module entries are tagged with ``__init`` and ``__exit`` prefixes. -Oh, one thing was forgotten. If you have no exported symbols, you need -to declare it in 2.2 or 2.4 kernels (it's not necessary in 2.6 kernels). - -:: - - EXPORT_NO_SYMBOLS; - That's all! PCM Interface @@ -2113,6 +2100,16 @@ non-contiguous buffers. The mmap calls this callback to get the page address. Some examples will be explained in the later section `Buffer and Memory Management`_, too. +mmap calllback +~~~~~~~~~~~~~~ + +This is another optional callback for controlling mmap behavior. +Once when defined, PCM core calls this callback when a page is +memory-mapped instead of dealing via the standard helper. +If you need special handling (due to some architecture or +device-specific issues), implement everything here as you like. + + PCM Interrupt Handler --------------------- @@ -2370,6 +2367,27 @@ to define the inverse rule: hw_rule_format_by_channels, NULL, SNDRV_PCM_HW_PARAM_CHANNELS, -1); +One typical usage of the hw constraints is to align the buffer size +with the period size. As default, ALSA PCM core doesn't enforce the +buffer size to be aligned with the period size. For example, it'd be +possible to have a combination like 256 period bytes with 999 buffer +bytes. + +Many device chips, however, require the buffer to be a multiple of +periods. In such a case, call +:c:func:`snd_pcm_hw_constraint_integer()` for +``SNDRV_PCM_HW_PARAM_PERIODS``. + +:: + + snd_pcm_hw_constraint_integer(substream->runtime, + SNDRV_PCM_HW_PARAM_PERIODS); + +This assures that the number of periods is integer, hence the buffer +size is aligned with the period size. + +The hw constraint is a very much powerful mechanism to define the +preferred PCM configuration, and there are relevant helpers. I won't give more details here, rather I would like to say, “Luke, use the source.” @@ -3712,7 +3730,14 @@ example, for an intermediate buffer. Since the allocated pages are not contiguous, you need to set the ``page`` callback to obtain the physical address at every offset. -The implementation of ``page`` callback would be like this: +The easiest way to achieve it would be to use +:c:func:`snd_pcm_lib_alloc_vmalloc_buffer()` for allocating the buffer +via :c:func:`vmalloc()`, and set :c:func:`snd_pcm_sgbuf_ops_page()` to +the ``page`` callback. At release, you need to call +:c:func:`snd_pcm_lib_free_vmalloc_buffer()`. + +If you want to implementation the ``page`` manually, it would be like +this: :: @@ -3848,7 +3873,9 @@ Power Management If the chip is supposed to work with suspend/resume functions, you need to add power-management code to the driver. The additional code for -power-management should be ifdef-ed with ``CONFIG_PM``. +power-management should be ifdef-ed with ``CONFIG_PM``, or annotated +with __maybe_unused attribute; otherwise the compiler will complain +you. If the driver *fully* supports suspend/resume that is, the device can be properly resumed to its state when suspend was called, you can set the @@ -3879,18 +3906,16 @@ the case of PCI drivers, the callbacks look like below: :: - #ifdef CONFIG_PM - static int snd_my_suspend(struct pci_dev *pci, pm_message_t state) + static int __maybe_unused snd_my_suspend(struct device *dev) { .... /* do things for suspend */ return 0; } - static int snd_my_resume(struct pci_dev *pci) + static int __maybe_unused snd_my_resume(struct device *dev) { .... /* do things for suspend */ return 0; } - #endif The scheme of the real suspend job is as follows. @@ -3909,18 +3934,14 @@ The scheme of the real suspend job is as follows. 6. Stop the hardware if necessary. -7. Disable the PCI device by calling - :c:func:`pci_disable_device()`. Then, call - :c:func:`pci_save_state()` at last. - A typical code would be like: :: - static int mychip_suspend(struct pci_dev *pci, pm_message_t state) + static int __maybe_unused mychip_suspend(struct device *dev) { /* (1) */ - struct snd_card *card = pci_get_drvdata(pci); + struct snd_card *card = dev_get_drvdata(dev); struct mychip *chip = card->private_data; /* (2) */ snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); @@ -3932,9 +3953,6 @@ A typical code would be like: snd_mychip_save_registers(chip); /* (6) */ snd_mychip_stop_hardware(chip); - /* (7) */ - pci_disable_device(pci); - pci_save_state(pci); return 0; } @@ -3943,44 +3961,35 @@ The scheme of the real resume job is as follows. 1. Retrieve the card and the chip data. -2. Set up PCI. First, call :c:func:`pci_restore_state()`. Then - enable the pci device again by calling - :c:func:`pci_enable_device()`. Call - :c:func:`pci_set_master()` if necessary, too. +2. Re-initialize the chip. -3. Re-initialize the chip. +3. Restore the saved registers if necessary. -4. Restore the saved registers if necessary. +4. Resume the mixer, e.g. calling :c:func:`snd_ac97_resume()`. -5. Resume the mixer, e.g. calling :c:func:`snd_ac97_resume()`. +5. Restart the hardware (if any). -6. Restart the hardware (if any). - -7. Call :c:func:`snd_power_change_state()` with +6. Call :c:func:`snd_power_change_state()` with ``SNDRV_CTL_POWER_D0`` to notify the processes. A typical code would be like: :: - static int mychip_resume(struct pci_dev *pci) + static int __maybe_unused mychip_resume(struct pci_dev *pci) { /* (1) */ - struct snd_card *card = pci_get_drvdata(pci); + struct snd_card *card = dev_get_drvdata(dev); struct mychip *chip = card->private_data; /* (2) */ - pci_restore_state(pci); - pci_enable_device(pci); - pci_set_master(pci); - /* (3) */ snd_mychip_reinit_chip(chip); - /* (4) */ + /* (3) */ snd_mychip_restore_registers(chip); - /* (5) */ + /* (4) */ snd_ac97_resume(chip->ac97); - /* (6) */ + /* (5) */ snd_mychip_restart_chip(chip); - /* (7) */ + /* (6) */ snd_power_change_state(card, SNDRV_CTL_POWER_D0); return 0; } @@ -4046,15 +4055,14 @@ And next, set suspend/resume callbacks to the pci_driver. :: + static SIMPLE_DEV_PM_OPS(snd_my_pm_ops, mychip_suspend, mychip_resume); + static struct pci_driver driver = { .name = KBUILD_MODNAME, .id_table = snd_my_ids, .probe = snd_my_probe, .remove = snd_my_remove, - #ifdef CONFIG_PM - .suspend = snd_my_suspend, - .resume = snd_my_resume, - #endif + .driver.pm = &snd_my_pm_ops, }; Module Parameters @@ -4078,7 +4086,7 @@ variables, instead. ``enable`` option is not always necessary in this case, but it would be better to have a dummy option for compatibility. The module parameters must be declared with the standard -``module_param()()``, ``module_param_array()()`` and +``module_param()``, ``module_param_array()`` and :c:func:`MODULE_PARM_DESC()` macros. The typical coding would be like below: @@ -4094,15 +4102,14 @@ The typical coding would be like below: module_param_array(enable, bool, NULL, 0444); MODULE_PARM_DESC(enable, "Enable " CARD_NAME " soundcard."); -Also, don't forget to define the module description, classes, license -and devices. Especially, the recent modprobe requires to define the +Also, don't forget to define the module description and the license. +Especially, the recent modprobe requires to define the module license as GPL, etc., otherwise the system is shown as “tainted”. :: - MODULE_DESCRIPTION("My Chip"); + MODULE_DESCRIPTION("Sound driver for My Chip"); MODULE_LICENSE("GPL"); - MODULE_SUPPORTED_DEVICE("{{Vendor,My Chip Name}}"); How To Put Your Driver Into ALSA Tree @@ -4117,21 +4124,17 @@ a question now: how to put my own driver into the ALSA driver tree? Here Suppose that you create a new PCI driver for the card “xyz”. The card module name would be snd-xyz. The new driver is usually put into the -alsa-driver tree, ``alsa-driver/pci`` directory in the case of PCI -cards. Then the driver is evaluated, audited and tested by developers -and users. After a certain time, the driver will go to the alsa-kernel -tree (to the corresponding directory, such as ``alsa-kernel/pci``) and -eventually will be integrated into the Linux 2.6 tree (the directory -would be ``linux/sound/pci``). +alsa-driver tree, ``sound/pci`` directory in the case of PCI +cards. In the following sections, the driver code is supposed to be put into -alsa-driver tree. The two cases are covered: a driver consisting of a +Linux kernel tree. The two cases are covered: a driver consisting of a single source file and one consisting of several source files. Driver with A Single Source File -------------------------------- -1. Modify alsa-driver/pci/Makefile +1. Modify sound/pci/Makefile Suppose you have a file xyz.c. Add the following two lines @@ -4160,52 +4163,43 @@ Driver with A Single Source File For the details of Kconfig script, refer to the kbuild documentation. -3. Run cvscompile script to re-generate the configure script and build - the whole stuff again. - Drivers with Several Source Files --------------------------------- Suppose that the driver snd-xyz have several source files. They are -located in the new subdirectory, pci/xyz. +located in the new subdirectory, sound/pci/xyz. -1. Add a new directory (``xyz``) in ``alsa-driver/pci/Makefile`` as - below +1. Add a new directory (``sound/pci/xyz``) in ``sound/pci/Makefile`` + as below :: - obj-$(CONFIG_SND) += xyz/ + obj-$(CONFIG_SND) += sound/pci/xyz/ -2. Under the directory ``xyz``, create a Makefile +2. Under the directory ``sound/pci/xyz``, create a Makefile :: - ifndef SND_TOPDIR - SND_TOPDIR=../.. - endif - - include $(SND_TOPDIR)/toplevel.config - include $(SND_TOPDIR)/Makefile.conf - snd-xyz-objs := xyz.o abc.o def.o - obj-$(CONFIG_SND_XYZ) += snd-xyz.o - include $(SND_TOPDIR)/Rules.make - 3. Create the Kconfig entry This procedure is as same as in the last section. -4. Run cvscompile script to re-generate the configure script and build - the whole stuff again. Useful Functions ================ :c:func:`snd_printk()` and friends ---------------------------------------- +---------------------------------- + +.. note:: This subsection describes a few helper functions for + decorating a bit more on the standard :c:func:`printk()` & co. + However, in general, the use of such helpers is no longer recommended. + If possible, try to stick with the standard functions like + :c:func:`dev_err()` or :c:func:`pr_err()`. ALSA provides a verbose version of the :c:func:`printk()` function. If a kernel config ``CONFIG_SND_VERBOSE_PRINTK`` is set, this function @@ -4221,13 +4215,10 @@ just like :c:func:`snd_printk()`. If the ALSA is compiled without the debugging flag, it's ignored. :c:func:`snd_printdd()` is compiled in only when -``CONFIG_SND_DEBUG_VERBOSE`` is set. Please note that -``CONFIG_SND_DEBUG_VERBOSE`` is not set as default even if you configure -the alsa-driver with ``--with-debug=full`` option. You need to give -explicitly ``--with-debug=detect`` option instead. +``CONFIG_SND_DEBUG_VERBOSE`` is set. :c:func:`snd_BUG()` ------------------------- +------------------- It shows the ``BUG?`` message and stack trace as well as :c:func:`snd_BUG_ON()` at the point. It's useful to show that a @@ -4236,7 +4227,7 @@ fatal error happens there. When no debug flag is set, this macro is ignored. :c:func:`snd_BUG_ON()` ----------------------------- +---------------------- :c:func:`snd_BUG_ON()` macro is similar with :c:func:`WARN_ON()` macro. For example, snd_BUG_ON(!pointer); or diff --git a/Documentation/sphinx-static/theme_overrides.css b/Documentation/sphinx-static/theme_overrides.css index 522b6d4c49d4..e21e36cd6761 100644 --- a/Documentation/sphinx-static/theme_overrides.css +++ b/Documentation/sphinx-static/theme_overrides.css @@ -4,6 +4,44 @@ * */ +/* Improve contrast and increase size for easier reading. */ + +body { + font-family: serif; + color: black; + font-size: 100%; +} + +h1, h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend { + font-family: sans-serif; +} + +.wy-menu-vertical li.current a { + color: #505050; +} + +.wy-menu-vertical li.on a, .wy-menu-vertical li.current > a { + color: #303030; +} + +div[class^="highlight"] pre { + font-family: monospace; + color: black; + font-size: 100%; +} + +.wy-menu-vertical { + font-family: sans-serif; +} + +.c { + font-style: normal; +} + +p { + font-size: 100%; +} + /* Interim: Code-blocks with line nos - lines and line numbers don't line up. * see: https://github.com/rtfd/sphinx_rtd_theme/issues/419 */ diff --git a/Documentation/spi/00-INDEX b/Documentation/spi/00-INDEX deleted file mode 100644 index 8e4bb17d70eb..000000000000 --- a/Documentation/spi/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -butterfly - - AVR Butterfly SPI driver overview and pin configuration. -ep93xx_spi - - Basic EP93xx SPI driver configuration. -pxa2xx - - PXA2xx SPI master controller build by spi_message fifo wq -spidev - - Intro to the userspace API for spi devices -spi-lm70llp - - Connecting an LM70-LLP sensor to the kernel via the SPI subsys. -spi-sc18is602 - - NXP SC18IS602/603 I2C-bus to SPI bridge -spi-summary - - (Linux) SPI overview. If unsure about SPI or SPI in Linux, start here. diff --git a/Documentation/switchtec.txt b/Documentation/switchtec.txt index f788264921ff..30d6a64e53f7 100644 --- a/Documentation/switchtec.txt +++ b/Documentation/switchtec.txt @@ -23,7 +23,7 @@ The primary means of communicating with the Switchtec management firmware is through the Memory-mapped Remote Procedure Call (MRPC) interface. Commands are submitted to the interface with a 4-byte command identifier and up to 1KB of command specific data. The firmware will -respond with a 4 bytes return code and up to 1KB of command specific +respond with a 4-byte return code and up to 1KB of command-specific data. The interface only processes a single command at a time. @@ -36,8 +36,8 @@ device: /dev/switchtec#, one for each management endpoint in the system. The char device has the following semantics: * A write must consist of at least 4 bytes and no more than 1028 bytes. - The first four bytes will be interpreted as the command to run and - the remainder will be used as the input data. A write will send the + The first 4 bytes will be interpreted as the Command ID and the + remainder will be used as the input data. A write will send the command to the firmware to begin processing. * Each write must be followed by exactly one read. Any double write will @@ -45,9 +45,9 @@ The char device has the following semantics: produce an error. * A read will block until the firmware completes the command and return - the four bytes of status plus up to 1024 bytes of output data. (The - length will be specified by the size parameter of the read call -- - reading less than 4 bytes will produce an error. + the 4-byte Command Return Value plus up to 1024 bytes of output + data. (The length will be specified by the size parameter of the read + call -- reading less than 4 bytes will produce an error.) * The poll call will also be supported for userspace applications that need to do other things while waiting for the command to complete. @@ -83,10 +83,20 @@ The following IOCTLs are also supported by the device: Non-Transparent Bridge (NTB) Driver =================================== -An NTB driver is provided for the switchtec hardware in switchtec_ntb. -Currently, it only supports switches configured with exactly 2 -partitions. It also requires the following configuration settings: +An NTB hardware driver is provided for the Switchtec hardware in +ntb_hw_switchtec. Currently, it only supports switches configured with +exactly 2 NT partitions and zero or more non-NT partitions. It also requires +the following configuration settings: -* Both partitions must be able to access each other's GAS spaces. +* Both NT partitions must be able to access each other's GAS spaces. Thus, the bits in the GAS Access Vector under Management Settings must be set to support this. +* Kernel configuration MUST include support for NTB (CONFIG_NTB needs + to be set) + +NT EP BAR 2 will be dynamically configured as a Direct Window, and +the configuration file does not need to configure it explicitly. + +Please refer to Documentation/ntb.txt in Linux source tree for an overall +understanding of the Linux NTB stack. ntb_hw_switchtec works as an NTB +Hardware Driver in this stack. diff --git a/Documentation/sysctl/00-INDEX b/Documentation/sysctl/00-INDEX deleted file mode 100644 index 8cf5d493fd03..000000000000 --- a/Documentation/sysctl/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -README - - general information about /proc/sys/ sysctl files. -abi.txt - - documentation for /proc/sys/abi/*. -fs.txt - - documentation for /proc/sys/fs/*. -kernel.txt - - documentation for /proc/sys/kernel/*. -net.txt - - documentation for /proc/sys/net/*. -sunrpc.txt - - documentation for /proc/sys/sunrpc/*. -vm.txt - - documentation for /proc/sys/vm/*. diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX deleted file mode 100644 index 3be05fe0f1f9..000000000000 --- a/Documentation/timers/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file -highres.txt - - High resolution timers and dynamic ticks design notes -hpet.txt - - High Precision Event Timer Driver for Linux -hrtimers.txt - - subsystem for high-resolution kernel timers -NO_HZ.txt - - Summary of the different methods for the scheduler clock-interrupts management. -timekeeping.txt - - Clock sources, clock events, sched_clock() and delay timer notes -timers-howto.txt - - how to insert delays in the kernel the right (tm) way. -timer_stats.txt - - timer usage statistics diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index 7ea16a0ceffc..f82434f2795e 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -2987,6 +2987,9 @@ The following commands are supported: command, it only prints out the contents of the ring buffer for the CPU that executed the function that triggered the dump. +- stacktrace: + When the function is hit, a stack trace is recorded. + trace_pipe ---------- diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst index 5ac724baea7d..7dda76503127 100644 --- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -1765,7 +1765,7 @@ For example, here's how a latency can be calculated:: # echo 'hist:keys=pid,prio:ts0=common_timestamp ...' >> event1/trigger # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...' >> event2/trigger -In the first line above, the event's timetamp is saved into the +In the first line above, the event's timestamp is saved into the variable ts0. In the next line, ts0 is subtracted from the second event's timestamp to produce the latency, which is then assigned into yet another variable, 'wakeup_lat'. The hist trigger below in turn @@ -1811,7 +1811,7 @@ the command that defined it with a '!':: /sys/kernel/debug/tracing/synthetic_events At this point, there isn't yet an actual 'wakeup_latency' event -instantiated in the event subsytem - for this to happen, a 'hist +instantiated in the event subsystem - for this to happen, a 'hist trigger action' needs to be instantiated and bound to actual fields and variables defined on other events (see Section 2.2.3 below on how that is done using hist trigger 'onmatch' action). Once that is @@ -1837,7 +1837,7 @@ output can be displayed by reading the event's 'hist' file. A hist trigger 'action' is a function that's executed whenever a histogram entry is added or updated. -The default 'action' if no special function is explicity specified is +The default 'action' if no special function is explicitly specified is as it always has been, to simply update the set of values associated with an entry. Some applications, however, may want to perform additional actions at that point, such as generate another event, or diff --git a/Documentation/trace/stm.rst b/Documentation/trace/stm.rst index 2c22ddb7fd3e..99f99963e5e7 100644 --- a/Documentation/trace/stm.rst +++ b/Documentation/trace/stm.rst @@ -1,3 +1,5 @@ +.. SPDX-License-Identifier: GPL-2.0 + =================== System Trace Module =================== @@ -53,12 +55,30 @@ under "user" directory from the example above and this new rule will be used for trace sources with the id string of "user/dummy". Trace sources have to open the stm class device's node and write their -trace data into its file descriptor. In order to identify themselves -to the policy, they need to do a STP_POLICY_ID_SET ioctl on this file -descriptor providing their id string. Otherwise, they will be -automatically allocated a master/channel pair upon first write to this -file descriptor according to the "default" rule of the policy, if such -exists. +trace data into its file descriptor. + +In order to find an appropriate policy node for a given trace source, +several mechanisms can be used. First, a trace source can explicitly +identify itself by calling an STP_POLICY_ID_SET ioctl on the character +device's file descriptor, providing their id string, before they write +any data there. Secondly, if they chose not to perform the explicit +identification (because you may not want to patch existing software +to do this), they can just start writing the data, at which point the +stm core will try to find a policy node with the name matching the +task's name (e.g., "syslogd") and if one exists, it will be used. +Thirdly, if the task name can't be found among the policy nodes, the +catch-all entry "default" will be used, if it exists. This entry also +needs to be created and configured by the system administrator or +whatever tools are taking care of the policy configuration. Finally, +if all the above steps failed, the write() to an stm file descriptor +will return a error (EINVAL). + +Previously, if no policy nodes were found for a trace source, the stm +class would silently fall back to allocating the first available +contiguous range of master/channels from the beginning of the device's +master/channel range. The new requirement for a policy node to exist +will help programmers and sysadmins identify gaps in configuration +and have better control over the un-identified sources. Some STM devices may allow direct mapping of the channel mmio regions to userspace for zero-copy writing. One mappable page (in terms of @@ -92,9 +112,9 @@ allocated for the device according to the policy configuration. If there's a node in the root of the policy directory that matches the stm_source device's name (for example, "console"), this node will be used to allocate master and channel numbers. If there's no such policy -node, the stm core will pick the first contiguous chunk of channels -within the first available master. Note that the node must exist -before the stm_source device is connected to its stm device. +node, the stm core will use the catch-all entry "default", if one +exists. If neither policy nodes exist, the write() to stm_source_link +will return an error. stm_console =========== diff --git a/Documentation/trace/sys-t.rst b/Documentation/trace/sys-t.rst new file mode 100644 index 000000000000..3d8eb92735e9 --- /dev/null +++ b/Documentation/trace/sys-t.rst @@ -0,0 +1,62 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================== +MIPI SyS-T over STP +=================== + +The MIPI SyS-T protocol driver can be used with STM class devices to +generate standardized trace stream. Aside from being a standard, it +provides better trace source identification and timestamp correlation. + +In order to use the MIPI SyS-T protocol driver with your STM device, +first, you'll need CONFIG_STM_PROTO_SYS_T. + +Now, you can select which protocol driver you want to use when you create +a policy for your STM device, by specifying it in the policy name: + +# mkdir /config/stp-policy/dummy_stm.0:p_sys-t.my-policy/ + +In other words, the policy name format is extended like this: + + <device_name>:<protocol_name>.<policy_name> + +With Intel TH, therefore it can look like "0-sth:p_sys-t.my-policy". + +If the protocol name is omitted, the STM class will chose whichever +protocol driver was loaded first. + +You can also double check that everything is working as expected by + +# cat /config/stp-policy/dummy_stm.0:p_sys-t.my-policy/protocol +p_sys-t + +Now, with the MIPI SyS-T protocol driver, each policy node in the +configfs gets a few additional attributes, which determine per-source +parameters specific to the protocol: + +# mkdir /config/stp-policy/dummy_stm.0:p_sys-t.my-policy/default +# ls /config/stp-policy/dummy_stm.0:p_sys-t.my-policy/default +channels +clocksync_interval +do_len +masters +ts_interval +uuid + +The most important one here is the "uuid", which determines the UUID +that will be used to tag all data coming from this source. It is +automatically generated when a new node is created, but it is likely +that you would want to change it. + +do_len switches on/off the additional "payload length" field in the +MIPI SyS-T message header. It is off by default as the STP already +marks message boundaries. + +ts_interval and clocksync_interval determine how much time in milliseconds +can pass before we need to include a protocol (not transport, aka STP) +timestamp in a message header or send a CLOCKSYNC packet, respectively. + +See Documentation/ABI/testing/configfs-stp-policy-p_sys-t for more +details. + +* [1] https://www.mipi.org/specifications/sys-t diff --git a/Documentation/virtual/00-INDEX b/Documentation/virtual/00-INDEX deleted file mode 100644 index af0d23968ee7..000000000000 --- a/Documentation/virtual/00-INDEX +++ /dev/null @@ -1,11 +0,0 @@ -Virtualization support in the Linux kernel. - -00-INDEX - - this file. - -paravirt_ops.txt - - Describes the Linux kernel pv_ops to support different hypervisors -kvm/ - - Kernel Virtual Machine. See also http://linux-kvm.org -uml/ - - User Mode Linux, builds/runs Linux kernel as a userspace program. diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX deleted file mode 100644 index 3492458a4ae8..000000000000 --- a/Documentation/virtual/kvm/00-INDEX +++ /dev/null @@ -1,35 +0,0 @@ -00-INDEX - - this file. -amd-memory-encryption.rst - - notes on AMD Secure Encrypted Virtualization feature and SEV firmware - command description -api.txt - - KVM userspace API. -arm - - internal ABI between the kernel and HYP (for arm/arm64) -cpuid.txt - - KVM-specific cpuid leaves (x86). -devices/ - - KVM_CAP_DEVICE_CTRL userspace API. -halt-polling.txt - - notes on halt-polling -hypercalls.txt - - KVM hypercalls. -locking.txt - - notes on KVM locks. -mmu.txt - - the x86 kvm shadow mmu. -msr.txt - - KVM-specific MSRs (x86). -nested-vmx.txt - - notes on nested virtualization for Intel x86 processors. -ppc-pv.txt - - the paravirtualization interface on PowerPC. -review-checklist.txt - - review checklist for KVM patches. -s390-diag.txt - - Diagnose hypercall description (for IBM S/390) -timekeeping.txt - - timekeeping virtualization for x86-based architectures. -vcpu-requests.rst - - internal VCPU request API diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 647f94128a85..cd209f7730af 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -123,6 +123,37 @@ memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the flag KVM_VM_MIPS_VZ. +On arm64, the physical address size for a VM (IPA Size limit) is limited +to 40bits by default. The limit can be configured if the host supports the +extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use +KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type +identifier, where IPA_Bits is the maximum width of any physical +address used by the VM. The IPA_Bits is encoded in bits[7-0] of the +machine type identifier. + +e.g, to configure a guest to use 48bit physical address size : + + vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48)); + +The requested size (IPA_Bits) must be : + 0 - Implies default size, 40bits (for backward compatibility) + + or + + N - Implies N bits, where N is a positive integer such that, + 32 <= N <= Host_IPA_Limit + +Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and +is dependent on the CPU capability and the kernel configuration. The limit can +be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION +ioctl() at run-time. + +Please note that configuring the IPA size does not affect the capability +exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects +size of the address translated by the stage2 level (guest physical to +host physical address translations). + + 4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST @@ -850,7 +881,7 @@ struct kvm_vcpu_events { __u8 injected; __u8 nr; __u8 has_error_code; - __u8 pad; + __u8 pending; __u32 error_code; } exception; struct { @@ -873,15 +904,23 @@ struct kvm_vcpu_events { __u8 smm_inside_nmi; __u8 latched_init; } smi; + __u8 reserved[27]; + __u8 exception_has_payload; + __u64 exception_payload; }; -Only two fields are defined in the flags field: +The following bits are defined in the flags field: -- KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that +- KVM_VCPUEVENT_VALID_SHADOW may be set to signal that interrupt.shadow contains a valid state. -- KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that - smi contains a valid state. +- KVM_VCPUEVENT_VALID_SMM may be set to signal that smi contains a + valid state. + +- KVM_VCPUEVENT_VALID_PAYLOAD may be set to signal that the + exception_has_payload, exception_payload, and exception.pending + fields contain a valid state. This bit will be set whenever + KVM_CAP_EXCEPTION_PAYLOAD is enabled. ARM/ARM64: @@ -961,6 +1000,11 @@ shall be written into the VCPU. KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available. +If KVM_CAP_EXCEPTION_PAYLOAD is enabled, KVM_VCPUEVENT_VALID_PAYLOAD +can be set in the flags field to signal that the +exception_has_payload, exception_payload, and exception.pending fields +contain a valid state and shall be written into the VCPU. + ARM/ARM64: Set the pending SError exception state for this VCPU. It is not possible to @@ -1922,6 +1966,7 @@ registers, find a list below: PPC | KVM_REG_PPC_TIDR | 64 PPC | KVM_REG_PPC_PSSCR | 64 PPC | KVM_REG_PPC_DEC_EXPIRY | 64 + PPC | KVM_REG_PPC_PTCR | 64 PPC | KVM_REG_PPC_TM_GPR0 | 64 ... PPC | KVM_REG_PPC_TM_GPR31 | 64 @@ -2269,6 +2314,10 @@ The supported flags are: The emulated MMU supports 1T segments in addition to the standard 256M ones. + - KVM_PPC_NO_HASH + This flag indicates that HPT guests are not supported by KVM, + thus all guests must use radix MMU mode. + The "slb_size" field indicates how many SLB entries are supported The "sps" array contains 8 entries indicating the supported base @@ -3676,6 +3725,34 @@ Returns: 0 on success, -1 on error This copies the vcpu's kvm_nested_state struct from userspace to the kernel. For the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE. +4.116 KVM_(UN)REGISTER_COALESCED_MMIO + +Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio) + KVM_CAP_COALESCED_PIO (for coalesced pio) +Architectures: all +Type: vm ioctl +Parameters: struct kvm_coalesced_mmio_zone +Returns: 0 on success, < 0 on error + +Coalesced I/O is a performance optimization that defers hardware +register write emulation so that userspace exits are avoided. It is +typically used to reduce the overhead of emulating frequently accessed +hardware registers. + +When a hardware register is configured for coalesced I/O, write accesses +do not exit to userspace and their value is recorded in a ring buffer +that is shared between kernel and userspace. + +Coalesced I/O is used if one or more write accesses to a hardware +register can be deferred until a read or a write to another hardware +register on the same device. This last access will cause a vmexit and +userspace will process accesses from the ring buffer before emulating +it. That will avoid exiting to userspace on repeated writes. + +Coalesced pio is based on coalesced mmio. There is little difference +between coalesced mmio and pio except that coalesced pio records accesses +to I/O ports. + 5. The kvm_run structure ------------------------ @@ -4522,7 +4599,7 @@ hpage module parameter is not set to 1, -EINVAL is returned. While it is generally possible to create a huge page backed VM without this capability, the VM will not be able to run. -7.14 KVM_CAP_MSR_PLATFORM_INFO +7.15 KVM_CAP_MSR_PLATFORM_INFO Architectures: x86 Parameters: args[0] whether feature should be enabled or not @@ -4531,6 +4608,45 @@ With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise, a #GP would be raised when the guest tries to access. Currently, this capability does not enable write permissions of this MSR for the guest. +7.16 KVM_CAP_PPC_NESTED_HV + +Architectures: ppc +Parameters: none +Returns: 0 on success, -EINVAL when the implementation doesn't support + nested-HV virtualization. + +HV-KVM on POWER9 and later systems allows for "nested-HV" +virtualization, which provides a way for a guest VM to run guests that +can run using the CPU's supervisor mode (privileged non-hypervisor +state). Enabling this capability on a VM depends on the CPU having +the necessary functionality and on the facility being enabled with a +kvm-hv module parameter. + +7.17 KVM_CAP_EXCEPTION_PAYLOAD + +Architectures: x86 +Parameters: args[0] whether feature should be enabled or not + +With this capability enabled, CR2 will not be modified prior to the +emulated VM-exit when L1 intercepts a #PF exception that occurs in +L2. Similarly, for kvm-intel only, DR6 will not be modified prior to +the emulated VM-exit when L1 intercepts a #DB exception that occurs in +L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or +#DB) exception for L2, exception.has_payload will be set and the +faulting address (or the new DR6 bits*) will be reported in the +exception_payload field. Similarly, when userspace injects a #PF (or +#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set +exception.has_payload and to put the faulting address (or the new DR6 +bits*) in the exception_payload field. + +This capability also enables exception.pending in struct +kvm_vcpu_events, which allows userspace to distinguish between pending +and injected exceptions. + + +* For the new DR6 bits, note that bit 16 is set iff the #DB exception + will clear DR6.RTM. + 8. Other capabilities. ---------------------- @@ -4772,3 +4888,10 @@ CPU when the exception is taken. If this virtual SError is taken to EL1 using AArch64, this value will be reported in the ISS field of ESR_ELx. See KVM_CAP_VCPU_EVENTS for more details. +8.20 KVM_CAP_HYPERV_SEND_IPI + +Architectures: x86 + +This capability indicates that KVM supports paravirtualized Hyper-V IPI send +hypercalls: +HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx. diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX deleted file mode 100644 index f4a4f3e884cf..000000000000 --- a/Documentation/vm/00-INDEX +++ /dev/null @@ -1,50 +0,0 @@ -00-INDEX - - this file. -active_mm.rst - - An explanation from Linus about tsk->active_mm vs tsk->mm. -balance.rst - - various information on memory balancing. -cleancache.rst - - Intro to cleancache and page-granularity victim cache. -frontswap.rst - - Outline frontswap, part of the transcendent memory frontend. -highmem.rst - - Outline of highmem and common issues. -hmm.rst - - Documentation of heterogeneous memory management -hugetlbfs_reserv.rst - - A brief overview of hugetlbfs reservation design/implementation. -hwpoison.rst - - explains what hwpoison is -ksm.rst - - how to use the Kernel Samepage Merging feature. -mmu_notifier.rst - - a note about clearing pte/pmd and mmu notifications -numa.rst - - information about NUMA specific code in the Linux vm. -overcommit-accounting.rst - - description of the Linux kernels overcommit handling modes. -page_frags.rst - - description of page fragments allocator -page_migration.rst - - description of page migration in NUMA systems. -page_owner.rst - - tracking about who allocated each page -remap_file_pages.rst - - a note about remap_file_pages() system call -slub.rst - - a short users guide for SLUB. -split_page_table_lock.rst - - Separate per-table lock to improve scalability of the old page_table_lock. -swap_numa.rst - - automatic binding of swap device to numa node -transhuge.rst - - Transparent Hugepage Support, alternative way of using hugepages. -unevictable-lru.rst - - Unevictable LRU infrastructure -z3fold.txt - - outline of z3fold allocator for storing compressed pages -zsmalloc.rst - - outline of zsmalloc allocator for storing compressed pages -zswap.rst - - Intro to compressed cache for swap pages diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index cdf3911582c8..44205f0b671f 100644 --- a/Documentation/vm/hmm.rst +++ b/Documentation/vm/hmm.rst @@ -194,13 +194,13 @@ use either:: unsigned long start, unsigned long end, hmm_pfn_t *pfns); - int hmm_vma_fault(struct vm_area_struct *vma, - struct hmm_range *range, - unsigned long start, - unsigned long end, - hmm_pfn_t *pfns, - bool write, - bool block); + int hmm_vma_fault(struct vm_area_struct *vma, + struct hmm_range *range, + unsigned long start, + unsigned long end, + hmm_pfn_t *pfns, + bool write, + bool block); The first one (hmm_vma_get_pfns()) will only fetch present CPU page table entries and will not trigger a page fault on missing or non-present entries. diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst index 3a775fd64e2d..195928808bac 100644 --- a/Documentation/vm/slub.rst +++ b/Documentation/vm/slub.rst @@ -36,9 +36,10 @@ debugging is enabled. Format: slub_debug=<Debug-Options> Enable options for all slabs -slub_debug=<Debug-Options>,<slab name> - Enable options only for select slabs +slub_debug=<Debug-Options>,<slab name1>,<slab name2>,... + Enable options only for select slabs (no spaces + after a comma) Possible debug options are:: @@ -62,7 +63,12 @@ Trying to find an issue in the dentry cache? Try:: slub_debug=,dentry -to only enable debugging on the dentry cache. +to only enable debugging on the dentry cache. You may use an asterisk at the +end of the slab name, in order to cover all slabs with the same prefix. For +example, here's how you can poison the dentry cache as well as all kmalloc +slabs: + + slub_debug=P,kmalloc-*,dentry Red zoning and tracking may realign the slab. We can just apply sanity checks to the dentry cache with:: diff --git a/Documentation/w1/00-INDEX b/Documentation/w1/00-INDEX deleted file mode 100644 index cb49802745dc..000000000000 --- a/Documentation/w1/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - This file -slaves/ - - Drivers that provide support for specific family codes. -masters/ - - Individual chips providing 1-wire busses. -w1.generic - - The 1-wire (w1) bus -w1.netlink - - Userspace communication protocol over connector [1]. diff --git a/Documentation/w1/masters/00-INDEX b/Documentation/w1/masters/00-INDEX deleted file mode 100644 index 8330cf9325f0..000000000000 --- a/Documentation/w1/masters/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -00-INDEX - - This file -ds2482 - - The Maxim/Dallas Semiconductor DS2482 provides 1-wire busses. -ds2490 - - The Maxim/Dallas Semiconductor DS2490 builds USB <-> W1 bridges. -mxc-w1 - - W1 master controller driver found on Freescale MX2/MX3 SoCs -omap-hdq - - HDQ/1-wire module of TI OMAP 2430/3430. -w1-gpio - - GPIO 1-wire bus master driver. diff --git a/Documentation/w1/slaves/00-INDEX b/Documentation/w1/slaves/00-INDEX deleted file mode 100644 index 68946f83e579..000000000000 --- a/Documentation/w1/slaves/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - This file -w1_therm - - The Maxim/Dallas Semiconductor ds18*20 temperature sensor. -w1_ds2413 - - The Maxim/Dallas Semiconductor ds2413 dual channel addressable switch. -w1_ds2423 - - The Maxim/Dallas Semiconductor ds2423 counter device. -w1_ds2438 - - The Maxim/Dallas Semiconductor ds2438 smart battery monitor. -w1_ds28e04 - - The Maxim/Dallas Semiconductor ds28e04 eeprom. -w1_ds28e17 - - The Maxim/Dallas Semiconductor ds28e17 1-Wire-to-I2C Master Bridge. diff --git a/Documentation/x86/00-INDEX b/Documentation/x86/00-INDEX deleted file mode 100644 index 3bb2ee3edcd1..000000000000 --- a/Documentation/x86/00-INDEX +++ /dev/null @@ -1,20 +0,0 @@ -00-INDEX - - this file -boot.txt - - List of boot protocol versions -earlyprintk.txt - - Using earlyprintk with a USB2 debug port key. -entry_64.txt - - Describe (some of the) kernel entry points for x86. -exception-tables.txt - - why and how Linux kernel uses exception tables on x86 -microcode.txt - - How to load microcode from an initrd-CPIO archive early to fix CPU issues. -mtrr.txt - - how to use x86 Memory Type Range Registers to increase performance -pat.txt - - Page Attribute Table intro and API -usb-legacy-support.txt - - how to fix/avoid quirks when using emulated PS/2 mouse/keyboard. -zero-page.txt - - layout of the first page of memory. diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt index 2a4ee6302122..481d8d8536ac 100644 --- a/Documentation/x86/pat.txt +++ b/Documentation/x86/pat.txt @@ -90,12 +90,12 @@ pci proc | -- | -- | WC | Advanced APIs for drivers ------------------------- A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range, -vm_insert_pfn +vmf_insert_pfn Drivers wanting to export some pages to userspace do it by using mmap interface and a combination of 1) pgprot_noncached() -2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn() +2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn() With PAT support, a new API pgprot_writecombine is being added. So, drivers can continue to use the above sequence, with either pgprot_noncached() or diff --git a/Documentation/x86/x86_64/00-INDEX b/Documentation/x86/x86_64/00-INDEX deleted file mode 100644 index 92fc20ab5f0e..000000000000 --- a/Documentation/x86/x86_64/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - This file -boot-options.txt - - AMD64-specific boot options. -cpu-hotplug-spec - - Firmware support for CPU hotplug under Linux/x86-64 -fake-numa-for-cpusets - - Using numa=fake and CPUSets for Resource Management -kernel-stacks - - Context-specific per-processor interrupt stacks. -machinecheck - - Configurable sysfs parameters for the x86-64 machine check code. -mm.txt - - Memory layout of x86-64 (4 level page tables, 46 bits physical). -uefi.txt - - Booting Linux via Unified Extensible Firmware Interface. |