summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/obsolete/sysfs-cpuidle9
-rw-r--r--Documentation/ABI/obsolete/sysfs-driver-intel_pmc_bxt22
-rw-r--r--Documentation/ABI/stable/sysfs-devices-node2
-rw-r--r--Documentation/ABI/testing/debugfs-hisi-hpre89
-rw-r--r--Documentation/ABI/testing/debugfs-hisi-sec94
-rw-r--r--Documentation/ABI/testing/debugfs-hisi-zip70
-rw-r--r--Documentation/ABI/testing/dev-kmsg5
-rw-r--r--Documentation/ABI/testing/procfs-smaps_rollup2
-rw-r--r--Documentation/ABI/testing/sysfs-class-net13
-rw-r--r--Documentation/ABI/testing/sysfs-devices-system-cpu24
-rw-r--r--Documentation/ABI/testing/sysfs-platform-dptf62
-rw-r--r--Documentation/ABI/testing/sysfs-platform-intel-wmi-sbl-fw-update12
-rw-r--r--Documentation/Makefile6
-rw-r--r--Documentation/PCI/boot-interrupts.rst34
-rw-r--r--Documentation/RCU/Design/Requirements/Requirements.rst61
-rw-r--r--Documentation/admin-guide/acpi/ssdt-overlays.rst2
-rw-r--r--Documentation/admin-guide/bug-hunting.rst53
-rw-r--r--Documentation/admin-guide/cgroup-v1/memory.rst19
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst24
-rw-r--r--Documentation/admin-guide/cpu-load.rst2
-rw-r--r--Documentation/admin-guide/device-mapper/dm-integrity.rst15
-rw-r--r--Documentation/admin-guide/hw-vuln/l1tf.rst2
-rw-r--r--Documentation/admin-guide/init.rst76
-rw-r--r--Documentation/admin-guide/kdump/vmcoreinfo.rst6
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt109
-rw-r--r--Documentation/admin-guide/kernel-per-CPU-kthreads.rst2
-rw-r--r--Documentation/admin-guide/mm/hugetlbpage.rst35
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst7
-rw-r--r--Documentation/admin-guide/mm/userfaultfd.rst211
-rw-r--r--Documentation/admin-guide/nfs/nfsroot.rst2
-rw-r--r--Documentation/admin-guide/numastat.rst31
-rw-r--r--Documentation/admin-guide/perf-security.rst86
-rw-r--r--Documentation/admin-guide/pm/cpuidle.rst20
-rw-r--r--Documentation/admin-guide/pm/intel-speed-select.rst917
-rw-r--r--Documentation/admin-guide/pm/intel_pstate.rst32
-rw-r--r--Documentation/admin-guide/pm/working-state.rst1
-rw-r--r--Documentation/admin-guide/pstore-blk.rst243
-rw-r--r--Documentation/admin-guide/ramoops.rst14
-rw-r--r--Documentation/admin-guide/ras.rst28
-rw-r--r--Documentation/admin-guide/serial-console.rst2
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst173
-rw-r--r--Documentation/admin-guide/sysctl/net.rst8
-rw-r--r--Documentation/admin-guide/sysctl/vm.rst23
-rw-r--r--Documentation/arm64/amu.rst5
-rw-r--r--Documentation/arm64/booting.rst39
-rw-r--r--Documentation/arm64/cpu-feature-registers.rst2
-rw-r--r--Documentation/arm64/elf_hwcaps.rst5
-rw-r--r--Documentation/arm64/silicon-errata.rst8
-rw-r--r--Documentation/block/biovecs.rst2
-rw-r--r--Documentation/block/index.rst1
-rw-r--r--Documentation/block/inline-encryption.rst263
-rw-r--r--Documentation/bpf/bpf_devel_QA.rst15
-rw-r--r--Documentation/bpf/index.rst4
-rw-r--r--Documentation/bpf/ringbuf.rst209
-rw-r--r--Documentation/conf.py38
-rw-r--r--Documentation/core-api/cachetlb.rst2
-rw-r--r--Documentation/core-api/debugging-via-ohci1394.rst (renamed from Documentation/debugging-via-ohci1394.txt)0
-rw-r--r--Documentation/core-api/dma-api-howto.rst (renamed from Documentation/DMA-API-HOWTO.txt)0
-rw-r--r--Documentation/core-api/dma-api.rst (renamed from Documentation/DMA-API.txt)0
-rw-r--r--Documentation/core-api/dma-attributes.rst (renamed from Documentation/DMA-attributes.txt)0
-rw-r--r--Documentation/core-api/dma-isa-lpc.rst (renamed from Documentation/DMA-ISA-LPC.txt)0
-rw-r--r--Documentation/core-api/index.rst9
-rw-r--r--Documentation/core-api/irq/concepts.rst (renamed from Documentation/IRQ.txt)0
-rw-r--r--Documentation/core-api/irq/index.rst11
-rw-r--r--Documentation/core-api/irq/irq-affinity.rst (renamed from Documentation/IRQ-affinity.txt)0
-rw-r--r--Documentation/core-api/irq/irq-domain.rst (renamed from Documentation/IRQ-domain.txt)3
-rw-r--r--Documentation/core-api/irq/irqflags-tracing.rst (renamed from Documentation/irqflags-tracing.txt)0
-rw-r--r--Documentation/core-api/kobject.rst28
-rw-r--r--Documentation/core-api/kref.rst (renamed from Documentation/kref.txt)0
-rw-r--r--Documentation/core-api/padata.rst41
-rw-r--r--Documentation/core-api/printk-basics.rst115
-rw-r--r--Documentation/core-api/printk-formats.rst38
-rw-r--r--Documentation/core-api/protection-keys.rst5
-rw-r--r--Documentation/core-api/rbtree.rst (renamed from Documentation/rbtree.txt)0
-rw-r--r--Documentation/dev-tools/kgdb.rst24
-rw-r--r--Documentation/dev-tools/kselftest.rst3
-rw-r--r--Documentation/devicetree/bindings/Makefile21
-rw-r--r--Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt36
-rw-r--r--Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml64
-rw-r--r--Documentation/devicetree/bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml2
-rw-r--r--Documentation/devicetree/bindings/display/bridge/adi,adv7123.txt50
-rw-r--r--Documentation/devicetree/bindings/display/bridge/anx6345.yaml8
-rw-r--r--Documentation/devicetree/bindings/display/bridge/chrontel,ch7033.yaml77
-rw-r--r--Documentation/devicetree/bindings/display/bridge/dumb-vga-dac.txt50
-rw-r--r--Documentation/devicetree/bindings/display/bridge/dw_mipi_dsi.txt32
-rw-r--r--Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml91
-rw-r--r--Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml8
-rw-r--r--Documentation/devicetree/bindings/display/bridge/nwl-dsi.yaml226
-rw-r--r--Documentation/devicetree/bindings/display/bridge/ps8640.yaml8
-rw-r--r--Documentation/devicetree/bindings/display/bridge/simple-bridge.yaml99
-rw-r--r--Documentation/devicetree/bindings/display/bridge/snps,dw-mipi-dsi.yaml68
-rw-r--r--Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.txt66
-rw-r--r--Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.yaml121
-rw-r--r--Documentation/devicetree/bindings/display/bridge/ti,ths813x.txt51
-rw-r--r--Documentation/devicetree/bindings/display/dsi-controller.yaml4
-rw-r--r--Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.txt6
-rw-r--r--Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt10
-rw-r--r--Documentation/devicetree/bindings/display/panel/arm,versatile-tft-panel.txt31
-rw-r--r--Documentation/devicetree/bindings/display/panel/arm,versatile-tft-panel.yaml54
-rw-r--r--Documentation/devicetree/bindings/display/panel/asus,z00t-tm5p5-nt35596.yaml56
-rw-r--r--Documentation/devicetree/bindings/display/panel/boe,himax8279d.txt24
-rw-r--r--Documentation/devicetree/bindings/display/panel/boe,himax8279d.yaml59
-rw-r--r--Documentation/devicetree/bindings/display/panel/boe,tv101wum-nl6.yaml2
-rw-r--r--Documentation/devicetree/bindings/display/panel/display-timings.yaml8
-rw-r--r--Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.txt20
-rw-r--r--Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.yaml58
-rw-r--r--Documentation/devicetree/bindings/display/panel/ilitek,ili9322.txt49
-rw-r--r--Documentation/devicetree/bindings/display/panel/ilitek,ili9322.yaml71
-rw-r--r--Documentation/devicetree/bindings/display/panel/ilitek,ili9881c.txt20
-rw-r--r--Documentation/devicetree/bindings/display/panel/ilitek,ili9881c.yaml50
-rw-r--r--Documentation/devicetree/bindings/display/panel/innolux,p097pfg.txt24
-rw-r--r--Documentation/devicetree/bindings/display/panel/innolux,p097pfg.yaml56
-rw-r--r--Documentation/devicetree/bindings/display/panel/innolux,p120zdg-bf1.txt22
-rw-r--r--Documentation/devicetree/bindings/display/panel/innolux,p120zdg-bf1.yaml43
-rw-r--r--Documentation/devicetree/bindings/display/panel/jdi,lt070me05000.txt31
-rw-r--r--Documentation/devicetree/bindings/display/panel/jdi,lt070me05000.yaml69
-rw-r--r--Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.txt42
-rw-r--r--Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.yaml65
-rw-r--r--Documentation/devicetree/bindings/display/panel/kingdisplay,kd097d04.txt22
-rw-r--r--Documentation/devicetree/bindings/display/panel/leadtek,ltk050h3146w.yaml51
-rw-r--r--Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml1
-rw-r--r--Documentation/devicetree/bindings/display/panel/lg,acx467akm-7.txt7
-rw-r--r--Documentation/devicetree/bindings/display/panel/lg,ld070wx3-sl01.txt7
-rw-r--r--Documentation/devicetree/bindings/display/panel/lg,lg4573.txt19
-rw-r--r--Documentation/devicetree/bindings/display/panel/lg,lg4573.yaml45
-rw-r--r--Documentation/devicetree/bindings/display/panel/lg,lh500wx1-sd03.txt7
-rw-r--r--Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.txt33
-rw-r--r--Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.yaml59
-rw-r--r--Documentation/devicetree/bindings/display/panel/lvds.yaml10
-rw-r--r--Documentation/devicetree/bindings/display/panel/olimex,lcd-olinuxino.txt42
-rw-r--r--Documentation/devicetree/bindings/display/panel/olimex,lcd-olinuxino.yaml70
-rw-r--r--Documentation/devicetree/bindings/display/panel/osddisplays,osd101t2587-53ts.txt14
-rw-r--r--Documentation/devicetree/bindings/display/panel/panel-common.yaml17
-rw-r--r--Documentation/devicetree/bindings/display/panel/panel-simple-dsi.yaml14
-rw-r--r--Documentation/devicetree/bindings/display/panel/panel-simple.yaml22
-rw-r--r--Documentation/devicetree/bindings/display/panel/raydium,rm67191.txt41
-rw-r--r--Documentation/devicetree/bindings/display/panel/raydium,rm67191.yaml75
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,amoled-mipi-dsi.yaml65
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,ld9040.txt66
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,ld9040.yaml107
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,s6d16d0.txt30
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,s6d16d0.yaml56
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,s6e3ha2.txt31
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,s6e63j0x03.txt24
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.txt33
-rw-r--r--Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.yaml60
-rw-r--r--Documentation/devicetree/bindings/display/panel/seiko,43wvf1g.txt23
-rw-r--r--Documentation/devicetree/bindings/display/panel/seiko,43wvf1g.yaml50
-rw-r--r--Documentation/devicetree/bindings/display/panel/sharp,lq150x1lg11.txt36
-rw-r--r--Documentation/devicetree/bindings/display/panel/sharp,lq150x1lg11.yaml58
-rw-r--r--Documentation/devicetree/bindings/display/panel/sharp,ls037v7dw01.txt43
-rw-r--r--Documentation/devicetree/bindings/display/panel/sharp,ls037v7dw01.yaml68
-rw-r--r--Documentation/devicetree/bindings/display/panel/sharp,ls043t1le01.txt22
-rw-r--r--Documentation/devicetree/bindings/display/panel/sharp,ls043t1le01.yaml51
-rw-r--r--Documentation/devicetree/bindings/display/panel/simple-panel.txt1
-rw-r--r--Documentation/devicetree/bindings/display/panel/sitronix,st7701.txt30
-rw-r--r--Documentation/devicetree/bindings/display/panel/sitronix,st7701.yaml69
-rw-r--r--Documentation/devicetree/bindings/display/panel/sitronix,st7789v.txt37
-rw-r--r--Documentation/devicetree/bindings/display/panel/sitronix,st7789v.yaml63
-rw-r--r--Documentation/devicetree/bindings/display/panel/sony,acx565akm.txt30
-rw-r--r--Documentation/devicetree/bindings/display/panel/sony,acx565akm.yaml57
-rw-r--r--Documentation/devicetree/bindings/display/panel/startek,startek-kd050c.txt4
-rw-r--r--Documentation/devicetree/bindings/display/panel/startek,startek-kd050c.yaml33
-rw-r--r--Documentation/devicetree/bindings/display/panel/tpo,td.yaml65
-rw-r--r--Documentation/devicetree/bindings/display/panel/tpo,td028ttec1.txt32
-rw-r--r--Documentation/devicetree/bindings/display/panel/tpo,td043mtea1.txt33
-rw-r--r--Documentation/devicetree/bindings/display/panel/visionox,rm69299.yaml57
-rw-r--r--Documentation/devicetree/bindings/display/panel/xinpeng,xpp055c272.yaml1
-rw-r--r--Documentation/devicetree/bindings/display/renesas,du.txt10
-rw-r--r--Documentation/devicetree/bindings/display/rockchip/rockchip,rk3066-hdmi.txt72
-rw-r--r--Documentation/devicetree/bindings/display/rockchip/rockchip,rk3066-hdmi.yaml140
-rw-r--r--Documentation/devicetree/bindings/display/rockchip/rockchip-vop.txt74
-rw-r--r--Documentation/devicetree/bindings/display/rockchip/rockchip-vop.yaml134
-rw-r--r--Documentation/devicetree/bindings/dma/fsl-edma.txt3
-rw-r--r--Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml7
-rw-r--r--Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml107
-rw-r--r--Documentation/devicetree/bindings/iio/adc/st,stm32-adc.yaml2
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/loongson,htvec.yaml57
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/loongson,pch-msi.yaml62
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/loongson,pch-pic.yaml56
-rw-r--r--Documentation/devicetree/bindings/mfd/gateworks-gsc.yaml196
-rw-r--r--Documentation/devicetree/bindings/mfd/max8998.txt8
-rw-r--r--Documentation/devicetree/bindings/mfd/st,stpmic1.yaml2
-rw-r--r--Documentation/devicetree/bindings/mips/loongson/rs780e-acpi.yaml40
-rw-r--r--Documentation/devicetree/bindings/mmc/amlogic,meson-mx-sdhc.yaml68
-rw-r--r--Documentation/devicetree/bindings/mmc/arasan,sdhci.txt57
-rw-r--r--Documentation/devicetree/bindings/mmc/renesas,mmcif.txt5
-rw-r--r--Documentation/devicetree/bindings/mmc/renesas,sdhi.txt1
-rw-r--r--Documentation/devicetree/bindings/mmc/sdhci-msm.txt14
-rw-r--r--Documentation/devicetree/bindings/mmc/sdhci-pxa.txt50
-rw-r--r--Documentation/devicetree/bindings/mmc/sdhci-pxa.yaml102
-rw-r--r--Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml23
-rw-r--r--Documentation/devicetree/bindings/net/dsa/b53.txt3
-rw-r--r--Documentation/devicetree/bindings/net/ethernet-phy.yaml3
-rw-r--r--Documentation/devicetree/bindings/net/fsl-fec.txt8
-rw-r--r--Documentation/devicetree/bindings/net/imx-dwmac.txt56
-rw-r--r--Documentation/devicetree/bindings/net/mdio.yaml50
-rw-r--r--Documentation/devicetree/bindings/net/mediatek,star-emac.yaml89
-rw-r--r--Documentation/devicetree/bindings/net/nxp,tja11xx.yaml61
-rw-r--r--Documentation/devicetree/bindings/net/qca,ar71xx.txt45
-rw-r--r--Documentation/devicetree/bindings/net/qca,ar71xx.yaml216
-rw-r--r--Documentation/devicetree/bindings/net/qcom,ipa.yaml10
-rw-r--r--Documentation/devicetree/bindings/net/qcom,ipq4019-mdio.yaml61
-rw-r--r--Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt6
-rw-r--r--Documentation/devicetree/bindings/net/realtek-bluetooth.yaml54
-rw-r--r--Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt64
-rw-r--r--Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml111
-rw-r--r--Documentation/devicetree/bindings/net/ti,dp83867.txt68
-rw-r--r--Documentation/devicetree/bindings/net/ti,dp83867.yaml127
-rw-r--r--Documentation/devicetree/bindings/net/ti,dp83869.yaml2
-rw-r--r--Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml20
-rw-r--r--Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml145
-rw-r--r--Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt3
-rw-r--r--Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt14
-rw-r--r--Documentation/devicetree/bindings/pci/loongson.yaml62
-rw-r--r--Documentation/devicetree/bindings/phy/qcom,qusb2-phy.yaml6
-rw-r--r--Documentation/devicetree/bindings/regulator/anatop-regulator.txt40
-rw-r--r--Documentation/devicetree/bindings/regulator/anatop-regulator.yaml94
-rw-r--r--Documentation/devicetree/bindings/regulator/maxim,max77826.yaml68
-rw-r--r--Documentation/devicetree/bindings/regulator/mps,mp5416.yaml1
-rw-r--r--Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml3
-rw-r--r--Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml6
-rw-r--r--Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.yaml6
-rw-r--r--Documentation/devicetree/bindings/regulator/rohm,bd71847-regulator.yaml6
-rw-r--r--Documentation/devicetree/bindings/reserved-memory/ramoops.txt13
-rw-r--r--Documentation/devicetree/bindings/rng/arm-cctrng.yaml54
-rw-r--r--Documentation/devicetree/bindings/rtc/dw-apb.txt32
-rw-r--r--Documentation/devicetree/bindings/sound/rockchip-i2s.yaml3
-rw-r--r--Documentation/devicetree/bindings/sound/rockchip-spdif.txt45
-rw-r--r--Documentation/devicetree/bindings/sound/rockchip-spdif.yaml101
-rw-r--r--Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt10
-rw-r--r--Documentation/devicetree/bindings/spi/mikrotik,rb4xx-spi.yaml36
-rw-r--r--Documentation/devicetree/bindings/spi/renesas,rspi.yaml144
-rw-r--r--Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt41
-rw-r--r--Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml133
-rw-r--r--Documentation/devicetree/bindings/spi/socionext,uniphier-spi.yaml57
-rw-r--r--Documentation/devicetree/bindings/spi/spi-dw.txt24
-rw-r--r--Documentation/devicetree/bindings/spi/spi-rspi.txt73
-rw-r--r--Documentation/devicetree/bindings/spi/spi-uniphier.txt28
-rw-r--r--Documentation/devicetree/bindings/spi/ti_qspi.txt2
-rw-r--r--Documentation/devicetree/bindings/timer/renesas,em-sti.yaml46
-rw-r--r--Documentation/devicetree/bindings/timer/snps,dw-apb-timer.yaml88
-rw-r--r--Documentation/devicetree/bindings/usb/renesas,usb3-peri.yaml1
-rw-r--r--Documentation/devicetree/bindings/usb/renesas,usbhs.yaml1
-rw-r--r--Documentation/devicetree/bindings/usb/usb-xhci.txt3
-rw-r--r--Documentation/devicetree/bindings/vendor-prefixes.yaml10
-rw-r--r--Documentation/doc-guide/maintainer-profile.rst2
-rw-r--r--Documentation/driver-api/dma-buf.rst4
-rw-r--r--Documentation/driver-api/driver-model/device.rst4
-rw-r--r--Documentation/driver-api/driver-model/devres.rst5
-rw-r--r--Documentation/driver-api/driver-model/overview.rst2
-rw-r--r--Documentation/driver-api/index.rst1
-rw-r--r--Documentation/driver-api/ipmi.rst (renamed from Documentation/IPMI.txt)0
-rw-r--r--Documentation/driver-api/nvdimm/nvdimm.rst4
-rw-r--r--Documentation/driver-api/pm/cpuidle.rst5
-rw-r--r--Documentation/driver-api/pm/devices.rst199
-rw-r--r--Documentation/driver-api/thermal/cpu-idle-cooling.rst3
-rw-r--r--Documentation/driver-api/thermal/index.rst1
-rw-r--r--Documentation/fb/efifb.rst38
-rw-r--r--Documentation/features/core/eBPF-JIT/arch-support.txt2
-rw-r--r--Documentation/features/debug/KASAN/arch-support.txt6
-rw-r--r--Documentation/features/debug/gcov-profile-all/arch-support.txt2
-rw-r--r--Documentation/features/debug/kprobes-on-ftrace/arch-support.txt2
-rw-r--r--Documentation/features/debug/kprobes/arch-support.txt4
-rw-r--r--Documentation/features/debug/kretprobes/arch-support.txt2
-rw-r--r--Documentation/features/debug/stackprotector/arch-support.txt2
-rw-r--r--Documentation/features/debug/uprobes/arch-support.txt2
-rw-r--r--Documentation/features/io/dma-contiguous/arch-support.txt2
-rw-r--r--Documentation/features/locking/lockdep/arch-support.txt2
-rw-r--r--Documentation/features/perf/kprobes-event/arch-support.txt4
-rw-r--r--Documentation/features/perf/perf-regs/arch-support.txt4
-rw-r--r--Documentation/features/perf/perf-stackdump/arch-support.txt4
-rw-r--r--Documentation/features/seccomp/seccomp-filter/arch-support.txt2
-rw-r--r--Documentation/features/vm/huge-vmap/arch-support.txt2
-rw-r--r--Documentation/features/vm/numa-memblock/arch-support.txt34
-rw-r--r--Documentation/features/vm/pte_special/arch-support.txt2
-rw-r--r--Documentation/filesystems/9p.rst2
-rw-r--r--Documentation/filesystems/afs.rst2
-rw-r--r--Documentation/filesystems/automount-support.rst (renamed from Documentation/filesystems/automount-support.txt)23
-rw-r--r--Documentation/filesystems/caching/backend-api.rst (renamed from Documentation/filesystems/caching/backend-api.txt)165
-rw-r--r--Documentation/filesystems/caching/cachefiles.rst (renamed from Documentation/filesystems/caching/cachefiles.txt)139
-rw-r--r--Documentation/filesystems/caching/fscache.rst565
-rw-r--r--Documentation/filesystems/caching/fscache.txt448
-rw-r--r--Documentation/filesystems/caching/index.rst14
-rw-r--r--Documentation/filesystems/caching/netfs-api.rst (renamed from Documentation/filesystems/caching/netfs-api.txt)172
-rw-r--r--Documentation/filesystems/caching/object.rst (renamed from Documentation/filesystems/caching/object.txt)43
-rw-r--r--Documentation/filesystems/caching/operations.rst (renamed from Documentation/filesystems/caching/operations.txt)45
-rw-r--r--Documentation/filesystems/cifs/cifsroot.rst (renamed from Documentation/filesystems/cifs/cifsroot.txt)56
-rw-r--r--Documentation/filesystems/coda.rst1670
-rw-r--r--Documentation/filesystems/coda.txt1676
-rw-r--r--Documentation/filesystems/configfs.rst (renamed from Documentation/filesystems/configfs/configfs.txt)131
-rw-r--r--Documentation/filesystems/dax.txt144
-rw-r--r--Documentation/filesystems/debugfs.rst9
-rw-r--r--Documentation/filesystems/devpts.rst36
-rw-r--r--Documentation/filesystems/devpts.txt26
-rw-r--r--Documentation/filesystems/dnotify.rst (renamed from Documentation/filesystems/dnotify.txt)13
-rw-r--r--Documentation/filesystems/efivarfs.rst17
-rw-r--r--Documentation/filesystems/f2fs.rst6
-rw-r--r--Documentation/filesystems/fiemap.rst (renamed from Documentation/filesystems/fiemap.txt)135
-rw-r--r--Documentation/filesystems/files.rst (renamed from Documentation/filesystems/files.txt)15
-rw-r--r--Documentation/filesystems/fscrypt.rst33
-rw-r--r--Documentation/filesystems/fuse-io.rst (renamed from Documentation/filesystems/fuse-io.txt)6
-rw-r--r--Documentation/filesystems/index.rst23
-rw-r--r--Documentation/filesystems/locking.rst6
-rw-r--r--Documentation/filesystems/locks.rst (renamed from Documentation/filesystems/locks.txt)14
-rw-r--r--Documentation/filesystems/mandatory-locking.rst (renamed from Documentation/filesystems/mandatory-locking.txt)25
-rw-r--r--Documentation/filesystems/mount_api.rst (renamed from Documentation/filesystems/mount_api.txt)329
-rw-r--r--Documentation/filesystems/orangefs.rst4
-rw-r--r--Documentation/filesystems/proc.rst7
-rw-r--r--Documentation/filesystems/quota.rst (renamed from Documentation/filesystems/quota.txt)41
-rw-r--r--Documentation/filesystems/ramfs-rootfs-initramfs.rst2
-rw-r--r--Documentation/filesystems/seq_file.rst (renamed from Documentation/filesystems/seq_file.txt)61
-rw-r--r--Documentation/filesystems/sharedsubtree.rst (renamed from Documentation/filesystems/sharedsubtree.txt)398
-rw-r--r--Documentation/filesystems/spufs/index.rst13
-rw-r--r--Documentation/filesystems/spufs/spu_create.rst131
-rw-r--r--Documentation/filesystems/spufs/spu_run.rst138
-rw-r--r--Documentation/filesystems/spufs/spufs.rst (renamed from Documentation/filesystems/spufs.txt)304
-rw-r--r--Documentation/filesystems/sysfs-pci.rst (renamed from Documentation/filesystems/sysfs-pci.txt)23
-rw-r--r--Documentation/filesystems/sysfs-tagging.rst (renamed from Documentation/filesystems/sysfs-tagging.txt)22
-rw-r--r--Documentation/filesystems/sysfs.rst2
-rw-r--r--Documentation/filesystems/vfs.rst15
-rw-r--r--Documentation/filesystems/xfs-delayed-logging-design.rst (renamed from Documentation/filesystems/xfs-delayed-logging-design.txt)65
-rw-r--r--Documentation/filesystems/xfs-self-describing-metadata.rst (renamed from Documentation/filesystems/xfs-self-describing-metadata.txt)200
-rw-r--r--Documentation/gpu/amdgpu.rst88
-rw-r--r--Documentation/gpu/drm-internals.rst12
-rw-r--r--Documentation/gpu/drm-kms.rst5
-rw-r--r--Documentation/gpu/drm-mm.rst9
-rw-r--r--Documentation/gpu/i915.rst52
-rw-r--r--Documentation/gpu/todo.rst12
-rw-r--r--Documentation/hwmon/amd_energy.rst109
-rw-r--r--Documentation/hwmon/bcm54140.rst45
-rw-r--r--Documentation/hwmon/bt1-pvt.rst117
-rw-r--r--Documentation/hwmon/gsc-hwmon.rst53
-rw-r--r--Documentation/hwmon/ina2xx.rst19
-rw-r--r--Documentation/hwmon/index.rst5
-rw-r--r--Documentation/hwmon/lm90.rst23
-rw-r--r--Documentation/hwmon/max16601.rst159
-rw-r--r--Documentation/i2c/i2c_bus.svg (renamed from Documentation/i2c/i2c.svg)2
-rw-r--r--Documentation/i2c/summary.rst2
-rw-r--r--Documentation/ia64/irq-redir.rst2
-rw-r--r--Documentation/iio/iio_configfs.rst2
-rw-r--r--Documentation/kbuild/makefiles.rst3
-rw-r--r--Documentation/locking/futex-requeue-pi.rst (renamed from Documentation/futex-requeue-pi.txt)0
-rw-r--r--Documentation/locking/hwspinlock.rst (renamed from Documentation/hwspinlock.txt)0
-rw-r--r--Documentation/locking/index.rst7
-rw-r--r--Documentation/locking/locktorture.rst2
-rw-r--r--Documentation/locking/locktypes.rst215
-rw-r--r--Documentation/locking/percpu-rw-semaphore.rst (renamed from Documentation/percpu-rw-semaphore.txt)0
-rw-r--r--Documentation/locking/pi-futex.rst (renamed from Documentation/pi-futex.txt)0
-rw-r--r--Documentation/locking/preempt-locking.rst (renamed from Documentation/preempt-locking.txt)0
-rw-r--r--Documentation/locking/robust-futex-ABI.rst (renamed from Documentation/robust-futex-ABI.txt)0
-rw-r--r--Documentation/locking/robust-futexes.rst (renamed from Documentation/robust-futexes.txt)0
-rw-r--r--Documentation/locking/rt-mutex.rst2
-rw-r--r--Documentation/maintainer/maintainer-entry-profile.rst12
-rw-r--r--Documentation/memory-barriers.txt2
-rw-r--r--Documentation/misc-devices/index.rst1
-rw-r--r--Documentation/networking/6pack.rst (renamed from Documentation/networking/6pack.txt)46
-rw-r--r--Documentation/networking/altera_tse.rst (renamed from Documentation/networking/altera_tse.txt)89
-rw-r--r--Documentation/networking/arcnet-hardware.rst (renamed from Documentation/networking/arcnet-hardware.txt)2227
-rw-r--r--Documentation/networking/arcnet.rst (renamed from Documentation/networking/arcnet.txt)348
-rw-r--r--Documentation/networking/atm.rst (renamed from Documentation/networking/atm.txt)6
-rw-r--r--Documentation/networking/ax25.rst (renamed from Documentation/networking/ax25.txt)6
-rw-r--r--Documentation/networking/baycom.rst (renamed from Documentation/networking/baycom.txt)120
-rw-r--r--Documentation/networking/bonding.rst (renamed from Documentation/networking/bonding.txt)1317
-rw-r--r--Documentation/networking/caif/caif.rst2
-rw-r--r--Documentation/networking/caif/index.rst13
-rw-r--r--Documentation/networking/caif/linux_caif.rst (renamed from Documentation/networking/caif/Linux-CAIF.txt)54
-rw-r--r--Documentation/networking/caif/spi_porting.rst229
-rw-r--r--Documentation/networking/caif/spi_porting.txt208
-rw-r--r--Documentation/networking/can.rst2
-rw-r--r--Documentation/networking/cdc_mbim.rst (renamed from Documentation/networking/cdc_mbim.txt)76
-rw-r--r--Documentation/networking/checksum-offloads.rst2
-rw-r--r--Documentation/networking/cops.rst80
-rw-r--r--Documentation/networking/cops.txt63
-rw-r--r--Documentation/networking/cxacru.rst (renamed from Documentation/networking/cxacru.txt)86
-rw-r--r--Documentation/networking/dccp.rst (renamed from Documentation/networking/dccp.txt)39
-rw-r--r--Documentation/networking/dctcp.rst (renamed from Documentation/networking/dctcp.txt)14
-rw-r--r--Documentation/networking/decnet.rst (renamed from Documentation/networking/decnet.txt)77
-rw-r--r--Documentation/networking/defza.rst (renamed from Documentation/networking/defza.txt)8
-rw-r--r--Documentation/networking/device_drivers/3com/3c509.rst (renamed from Documentation/networking/device_drivers/3com/3c509.txt)162
-rw-r--r--Documentation/networking/device_drivers/3com/vortex.rst (renamed from Documentation/networking/device_drivers/3com/vortex.txt)227
-rw-r--r--Documentation/networking/device_drivers/amazon/ena.rst (renamed from Documentation/networking/device_drivers/amazon/ena.txt)144
-rw-r--r--Documentation/networking/device_drivers/aquantia/atlantic.rst (renamed from Documentation/networking/device_drivers/aquantia/atlantic.txt)369
-rw-r--r--Documentation/networking/device_drivers/chelsio/cxgb.rst (renamed from Documentation/networking/device_drivers/chelsio/cxgb.txt)183
-rw-r--r--Documentation/networking/device_drivers/cirrus/cs89x0.rst (renamed from Documentation/networking/device_drivers/cirrus/cs89x0.txt)559
-rw-r--r--Documentation/networking/device_drivers/davicom/dm9000.rst (renamed from Documentation/networking/device_drivers/davicom/dm9000.txt)24
-rw-r--r--Documentation/networking/device_drivers/dec/de4x5.rst (renamed from Documentation/networking/device_drivers/dec/de4x5.txt)105
-rw-r--r--Documentation/networking/device_drivers/dec/dmfe.rst (renamed from Documentation/networking/device_drivers/dec/dmfe.txt)35
-rw-r--r--Documentation/networking/device_drivers/dlink/dl2k.rst (renamed from Documentation/networking/device_drivers/dlink/dl2k.txt)228
-rw-r--r--Documentation/networking/device_drivers/freescale/dpaa.rst (renamed from Documentation/networking/device_drivers/freescale/dpaa.txt)141
-rw-r--r--Documentation/networking/device_drivers/freescale/gianfar.rst (renamed from Documentation/networking/device_drivers/freescale/gianfar.txt)21
-rw-r--r--Documentation/networking/device_drivers/index.rst24
-rw-r--r--Documentation/networking/device_drivers/intel/e100.rst2
-rw-r--r--Documentation/networking/device_drivers/intel/ipw2100.rst (renamed from Documentation/networking/device_drivers/intel/ipw2100.txt)240
-rw-r--r--Documentation/networking/device_drivers/intel/ipw2200.rst (renamed from Documentation/networking/device_drivers/intel/ipw2200.txt)414
-rw-r--r--Documentation/networking/device_drivers/intel/ixgb.rst2
-rw-r--r--Documentation/networking/device_drivers/microsoft/netvsc.rst (renamed from Documentation/networking/device_drivers/microsoft/netvsc.txt)57
-rw-r--r--Documentation/networking/device_drivers/neterion/s2io.rst196
-rw-r--r--Documentation/networking/device_drivers/neterion/s2io.txt141
-rw-r--r--Documentation/networking/device_drivers/neterion/vxge.rst (renamed from Documentation/networking/device_drivers/neterion/vxge.txt)60
-rw-r--r--Documentation/networking/device_drivers/pensando/ionic.rst231
-rw-r--r--Documentation/networking/device_drivers/qualcomm/rmnet.rst (renamed from Documentation/networking/device_drivers/qualcomm/rmnet.txt)43
-rw-r--r--Documentation/networking/device_drivers/sb1000.rst222
-rw-r--r--Documentation/networking/device_drivers/sb1000.txt207
-rw-r--r--Documentation/networking/device_drivers/smsc/smc9.rst48
-rw-r--r--Documentation/networking/device_drivers/smsc/smc9.txt42
-rw-r--r--Documentation/networking/device_drivers/ti/cpsw.rst587
-rw-r--r--Documentation/networking/device_drivers/ti/cpsw.txt541
-rw-r--r--Documentation/networking/device_drivers/ti/cpsw_switchdev.rst (renamed from Documentation/networking/device_drivers/ti/cpsw_switchdev.txt)243
-rw-r--r--Documentation/networking/device_drivers/ti/tlan.rst (renamed from Documentation/networking/device_drivers/ti/tlan.txt)73
-rw-r--r--Documentation/networking/device_drivers/toshiba/spider_net.rst (renamed from Documentation/networking/device_drivers/toshiba/spider_net.txt)60
-rw-r--r--Documentation/networking/devlink-params-sja1105.txt27
-rw-r--r--Documentation/networking/devlink/devlink-region.rst11
-rw-r--r--Documentation/networking/devlink/devlink-trap.rst219
-rw-r--r--Documentation/networking/devlink/ice.rst15
-rw-r--r--Documentation/networking/dns_resolver.rst (renamed from Documentation/networking/dns_resolver.txt)52
-rw-r--r--Documentation/networking/driver.rst (renamed from Documentation/networking/driver.txt)22
-rw-r--r--Documentation/networking/dsa/sja1105.rst327
-rw-r--r--Documentation/networking/eql.rst (renamed from Documentation/networking/eql.txt)443
-rw-r--r--Documentation/networking/ethtool-netlink.rst195
-rw-r--r--Documentation/networking/fib_trie.rst (renamed from Documentation/networking/fib_trie.txt)16
-rw-r--r--Documentation/networking/filter.rst (renamed from Documentation/networking/filter.txt)868
-rw-r--r--Documentation/networking/fore200e.rst (renamed from Documentation/networking/fore200e.txt)8
-rw-r--r--Documentation/networking/framerelay.rst (renamed from Documentation/networking/framerelay.txt)21
-rw-r--r--Documentation/networking/gen_stats.rst (renamed from Documentation/networking/gen_stats.txt)98
-rw-r--r--Documentation/networking/generic-hdlc.rst (renamed from Documentation/networking/generic-hdlc.txt)86
-rw-r--r--Documentation/networking/generic_netlink.rst (renamed from Documentation/networking/generic_netlink.txt)6
-rw-r--r--Documentation/networking/gtp.rst (renamed from Documentation/networking/gtp.txt)97
-rw-r--r--Documentation/networking/hinic.rst (renamed from Documentation/networking/hinic.txt)5
-rw-r--r--Documentation/networking/ila.rst (renamed from Documentation/networking/ila.txt)89
-rw-r--r--Documentation/networking/index.rst87
-rw-r--r--Documentation/networking/ip-sysctl.rst (renamed from Documentation/networking/ip-sysctl.txt)862
-rw-r--r--Documentation/networking/ip_dynaddr.rst (renamed from Documentation/networking/ip_dynaddr.txt)29
-rw-r--r--Documentation/networking/ipddp.rst (renamed from Documentation/networking/ipddp.txt)13
-rw-r--r--Documentation/networking/iphase.rst (renamed from Documentation/networking/iphase.txt)187
-rw-r--r--Documentation/networking/ipsec.rst (renamed from Documentation/networking/ipsec.txt)14
-rw-r--r--Documentation/networking/ipv6.rst (renamed from Documentation/networking/ipv6.txt)8
-rw-r--r--Documentation/networking/ipvlan.rst (renamed from Documentation/networking/ipvlan.txt)159
-rw-r--r--Documentation/networking/ipvs-sysctl.rst (renamed from Documentation/networking/ipvs-sysctl.txt)188
-rw-r--r--Documentation/networking/kcm.rst (renamed from Documentation/networking/kcm.txt)85
-rw-r--r--Documentation/networking/l2tp.rst (renamed from Documentation/networking/l2tp.txt)159
-rw-r--r--Documentation/networking/lapb-module.rst (renamed from Documentation/networking/lapb-module.txt)122
-rw-r--r--Documentation/networking/ltpc.rst (renamed from Documentation/networking/ltpc.txt)47
-rw-r--r--Documentation/networking/mac80211-injection.rst (renamed from Documentation/networking/mac80211-injection.txt)41
-rw-r--r--Documentation/networking/mpls-sysctl.rst (renamed from Documentation/networking/mpls-sysctl.txt)17
-rw-r--r--Documentation/networking/multiqueue.rst (renamed from Documentation/networking/multiqueue.txt)41
-rw-r--r--Documentation/networking/netconsole.rst (renamed from Documentation/networking/netconsole.txt)125
-rw-r--r--Documentation/networking/netdev-features.rst (renamed from Documentation/networking/netdev-features.txt)19
-rw-r--r--Documentation/networking/netdevices.rst (renamed from Documentation/networking/netdevices.txt)21
-rw-r--r--Documentation/networking/netfilter-sysctl.rst (renamed from Documentation/networking/netfilter-sysctl.txt)11
-rw-r--r--Documentation/networking/netif-msg.rst95
-rw-r--r--Documentation/networking/netif-msg.txt79
-rw-r--r--Documentation/networking/nf_conntrack-sysctl.rst (renamed from Documentation/networking/nf_conntrack-sysctl.txt)51
-rw-r--r--Documentation/networking/nf_flowtable.rst (renamed from Documentation/networking/nf_flowtable.txt)55
-rw-r--r--Documentation/networking/openvswitch.rst (renamed from Documentation/networking/openvswitch.txt)23
-rw-r--r--Documentation/networking/operstates.rst (renamed from Documentation/networking/operstates.txt)45
-rw-r--r--Documentation/networking/packet_mmap.rst1084
-rw-r--r--Documentation/networking/packet_mmap.txt1061
-rw-r--r--Documentation/networking/phonet.rst (renamed from Documentation/networking/phonet.txt)56
-rw-r--r--Documentation/networking/pktgen.rst (renamed from Documentation/networking/pktgen.txt)320
-rw-r--r--Documentation/networking/plip.rst (renamed from Documentation/networking/PLIP.txt)43
-rw-r--r--Documentation/networking/ppp_generic.rst (renamed from Documentation/networking/ppp_generic.txt)52
-rw-r--r--Documentation/networking/proc_net_tcp.rst (renamed from Documentation/networking/proc_net_tcp.txt)23
-rw-r--r--Documentation/networking/radiotap-headers.rst (renamed from Documentation/networking/radiotap-headers.txt)99
-rw-r--r--Documentation/networking/ray_cs.rst (renamed from Documentation/networking/ray_cs.txt)105
-rw-r--r--Documentation/networking/rds.rst (renamed from Documentation/networking/rds.txt)305
-rw-r--r--Documentation/networking/regulatory.rst (renamed from Documentation/networking/regulatory.txt)29
-rw-r--r--Documentation/networking/rxrpc.rst (renamed from Documentation/networking/rxrpc.txt)319
-rw-r--r--Documentation/networking/scaling.rst4
-rw-r--r--Documentation/networking/sctp.rst (renamed from Documentation/networking/sctp.txt)37
-rw-r--r--Documentation/networking/secid.rst (renamed from Documentation/networking/secid.txt)6
-rw-r--r--Documentation/networking/seg6-sysctl.rst26
-rw-r--r--Documentation/networking/seg6-sysctl.txt18
-rw-r--r--Documentation/networking/skfp.rst (renamed from Documentation/networking/skfp.txt)153
-rw-r--r--Documentation/networking/snmp_counter.rst2
-rw-r--r--Documentation/networking/strparser.rst (renamed from Documentation/networking/strparser.txt)85
-rw-r--r--Documentation/networking/switchdev.rst (renamed from Documentation/networking/switchdev.txt)116
-rw-r--r--Documentation/networking/tc-actions-env-rules.rst29
-rw-r--r--Documentation/networking/tc-actions-env-rules.txt24
-rw-r--r--Documentation/networking/tcp-thin.rst (renamed from Documentation/networking/tcp-thin.txt)5
-rw-r--r--Documentation/networking/team.rst (renamed from Documentation/networking/team.txt)6
-rw-r--r--Documentation/networking/timestamping.rst (renamed from Documentation/networking/timestamping.txt)166
-rw-r--r--Documentation/networking/tproxy.rst (renamed from Documentation/networking/tproxy.txt)57
-rw-r--r--Documentation/networking/tuntap.rst (renamed from Documentation/networking/tuntap.txt)200
-rw-r--r--Documentation/networking/udplite.rst (renamed from Documentation/networking/udplite.txt)175
-rw-r--r--Documentation/networking/vrf.rst451
-rw-r--r--Documentation/networking/vrf.txt418
-rw-r--r--Documentation/networking/vxlan.rst (renamed from Documentation/networking/vxlan.txt)33
-rw-r--r--Documentation/networking/x25-iface.rst (renamed from Documentation/networking/x25-iface.txt)10
-rw-r--r--Documentation/networking/x25.rst (renamed from Documentation/networking/x25.txt)4
-rw-r--r--Documentation/networking/xfrm_device.rst (renamed from Documentation/networking/xfrm_device.txt)33
-rw-r--r--Documentation/networking/xfrm_proc.rst (renamed from Documentation/networking/xfrm_proc.txt)31
-rw-r--r--Documentation/networking/xfrm_sync.rst (renamed from Documentation/networking/xfrm_sync.txt)66
-rw-r--r--Documentation/networking/xfrm_sysctl.rst (renamed from Documentation/networking/xfrm_sysctl.txt)7
-rw-r--r--Documentation/networking/z8530drv.rst (renamed from Documentation/networking/z8530drv.txt)629
-rw-r--r--Documentation/nvdimm/maintainer-entry-profile.rst14
-rw-r--r--Documentation/power/pci.rst58
-rw-r--r--Documentation/power/suspend-and-cpuhotplug.rst6
-rw-r--r--Documentation/powerpc/cxl.rst2
-rw-r--r--Documentation/powerpc/firmware-assisted-dump.rst2
-rw-r--r--Documentation/process/adding-syscalls.rst2
-rw-r--r--Documentation/process/coding-style.rst23
-rw-r--r--Documentation/process/index.rst1
-rw-r--r--Documentation/process/submit-checklist.rst2
-rw-r--r--Documentation/process/unaligned-memory-access.rst (renamed from Documentation/unaligned-memory-access.txt)0
-rw-r--r--Documentation/s390/vfio-ap.rst2
-rw-r--r--Documentation/scheduler/sched-domains.rst10
-rw-r--r--Documentation/security/digsig.rst (renamed from Documentation/digsig.txt)0
-rw-r--r--Documentation/security/index.rst1
-rw-r--r--Documentation/security/lsm.rst202
-rw-r--r--Documentation/security/siphash.rst2
-rw-r--r--Documentation/sphinx/requirements.txt2
-rw-r--r--Documentation/timers/timers-howto.rst3
-rw-r--r--Documentation/trace/coresight/coresight-ect.rst1
-rw-r--r--Documentation/trace/events.rst28
-rw-r--r--Documentation/trace/ftrace-design.rst8
-rw-r--r--Documentation/translations/it_IT/doc-guide/kernel-doc.rst25
-rw-r--r--Documentation/translations/it_IT/kernel-hacking/hacking.rst18
-rw-r--r--Documentation/translations/it_IT/kernel-hacking/locking.rst172
-rw-r--r--Documentation/translations/it_IT/process/2.Process.rst95
-rw-r--r--Documentation/translations/it_IT/process/adding-syscalls.rst2
-rw-r--r--Documentation/translations/it_IT/process/coding-style.rst6
-rw-r--r--Documentation/translations/it_IT/process/deprecated.rst130
-rw-r--r--Documentation/translations/it_IT/process/email-clients.rst332
-rw-r--r--Documentation/translations/it_IT/process/index.rst1
-rw-r--r--Documentation/translations/it_IT/process/management-style.rst293
-rw-r--r--Documentation/translations/it_IT/process/submit-checklist.rst2
-rw-r--r--Documentation/translations/it_IT/riscv/patch-acceptance.rst40
-rw-r--r--Documentation/translations/ko_KR/memory-barriers.txt2
-rw-r--r--Documentation/translations/zh_CN/IRQ.txt4
-rw-r--r--Documentation/translations/zh_CN/filesystems/debugfs.rst221
-rw-r--r--Documentation/translations/zh_CN/filesystems/index.rst1
-rw-r--r--Documentation/translations/zh_CN/filesystems/sysfs.txt8
-rw-r--r--Documentation/translations/zh_CN/process/submit-checklist.rst2
-rw-r--r--Documentation/translations/zh_CN/video4linux/v4l2-framework.txt2
-rw-r--r--Documentation/usb/gadget_configfs.rst4
-rw-r--r--Documentation/usb/raw-gadget.rst37
-rw-r--r--Documentation/userspace-api/ioctl/ioctl-number.rst1
-rw-r--r--Documentation/virt/kvm/amd-memory-encryption.rst2
-rw-r--r--Documentation/virt/kvm/api.rst53
-rw-r--r--Documentation/virt/kvm/arm/pvtime.rst2
-rw-r--r--Documentation/virt/kvm/cpuid.rst8
-rw-r--r--Documentation/virt/kvm/devices/vcpu.rst2
-rw-r--r--Documentation/virt/kvm/hypercalls.rst4
-rw-r--r--Documentation/virt/kvm/index.rst2
-rw-r--r--Documentation/virt/kvm/mmu.rst2
-rw-r--r--Documentation/virt/kvm/msr.rst119
-rw-r--r--Documentation/virt/kvm/nested-vmx.rst5
-rw-r--r--Documentation/virt/kvm/review-checklist.rst2
-rw-r--r--Documentation/virt/kvm/running-nested-guests.rst276
-rw-r--r--Documentation/vm/hmm.rst30
-rw-r--r--Documentation/vm/index.rst1
-rw-r--r--Documentation/vm/memory-model.rst9
-rw-r--r--Documentation/vm/page_frags.rst2
-rw-r--r--Documentation/vm/page_owner.rst3
-rw-r--r--Documentation/vm/slub.rst2
-rw-r--r--Documentation/vm/zswap.rst4
-rw-r--r--Documentation/watchdog/convert_drivers_to_kernel_api.rst4
-rw-r--r--Documentation/watchdog/watchdog-kernel-api.rst2
-rw-r--r--Documentation/x86/x86_64/uefi.rst2
560 files changed, 28329 insertions, 16016 deletions
diff --git a/Documentation/ABI/obsolete/sysfs-cpuidle b/Documentation/ABI/obsolete/sysfs-cpuidle
new file mode 100644
index 000000000000..e398fb5e542f
--- /dev/null
+++ b/Documentation/ABI/obsolete/sysfs-cpuidle
@@ -0,0 +1,9 @@
+What: /sys/devices/system/cpu/cpuidle/current_governor_ro
+Date: April, 2020
+Contact: linux-pm@vger.kernel.org
+Description:
+ current_governor_ro shows current using cpuidle governor, but read only.
+ with the update that cpuidle governor can be changed at runtime in default,
+ both current_governor and current_governor_ro co-exist under
+ /sys/devices/system/cpu/cpuidle/ file, it's duplicate so make
+ current_governor_ro obselete.
diff --git a/Documentation/ABI/obsolete/sysfs-driver-intel_pmc_bxt b/Documentation/ABI/obsolete/sysfs-driver-intel_pmc_bxt
new file mode 100644
index 000000000000..39d5659f388b
--- /dev/null
+++ b/Documentation/ABI/obsolete/sysfs-driver-intel_pmc_bxt
@@ -0,0 +1,22 @@
+These files allow sending arbitrary IPC commands to the PMC/SCU which
+may be dangerous. These will be removed eventually and should not be
+used in any new applications.
+
+What: /sys/bus/platform/devices/INT34D2:00/simplecmd
+Date: Jun 2015
+KernelVersion: 4.1
+Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
+Description: This interface allows userspace to send an arbitrary
+ IPC command to the PMC/SCU.
+
+ Format: %d %d where first number is command and
+ second number is subcommand.
+
+What: /sys/bus/platform/devices/INT34D2:00/northpeak
+Date: Jun 2015
+KernelVersion: 4.1
+Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
+Description: This interface allows userspace to enable and disable
+ Northpeak through the PMC/SCU.
+
+ Format: %u.
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index df8413cf1468..484fc04bcc25 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -54,7 +54,7 @@ Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Provides information about the node's distribution and memory
- utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt
+ utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.rst
What: /sys/devices/system/node/nodeX/numastat
Date: October 2002
diff --git a/Documentation/ABI/testing/debugfs-hisi-hpre b/Documentation/ABI/testing/debugfs-hisi-hpre
index ec4a79e3a807..b4be5f1db4b7 100644
--- a/Documentation/ABI/testing/debugfs-hisi-hpre
+++ b/Documentation/ABI/testing/debugfs-hisi-hpre
@@ -33,7 +33,7 @@ Contact: linux-crypto@vger.kernel.org
Description: Dump debug registers from the HPRE.
Only available for PF.
-What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/qm_regs
+What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/regs
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: Dump debug registers from the QM.
@@ -44,14 +44,97 @@ What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/current_q
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
Description: One QM may contain multiple queues. Select specific queue to
- show its debug registers in above qm_regs.
+ show its debug registers in above regs.
Only available for PF.
What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/clear_enable
Date: Sep 2019
Contact: linux-crypto@vger.kernel.org
-Description: QM debug registers(qm_regs) read clear control. 1 means enable
+Description: QM debug registers(regs) read clear control. 1 means enable
register read clear, otherwise 0.
Writing to this file has no functional effect, only enable or
disable counters clear after reading of these registers.
Only available for PF.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/err_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of invalid interrupts for
+ QM task completion.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/aeq_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of QM async event queue interrupts.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/abnormal_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of interrupts for QM abnormal event.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/create_qp_err
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of queue allocation errors.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/mb_err
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of failed QM mailbox commands.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/status
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the status of the QM.
+ Four states: initiated, started, stopped and closed.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/send_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of sent requests.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/recv_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of received requests.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/send_busy_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of requests sent
+ with returning busy.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/send_fail_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of completed but error requests.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/invalid_req_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of invalid requests being received.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/overtime_thrhld
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Set the threshold time for counting the request which is
+ processed longer than the threshold.
+ 0: disable(default), 1: 1 microsecond.
+ Available for both PF and VF, and take no other effect on HPRE.
+
+What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/over_thrhld_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of time out requests.
+ Available for both PF and VF, and take no other effect on HPRE.
diff --git a/Documentation/ABI/testing/debugfs-hisi-sec b/Documentation/ABI/testing/debugfs-hisi-sec
index 06adb899495e..85feb4408e0f 100644
--- a/Documentation/ABI/testing/debugfs-hisi-sec
+++ b/Documentation/ABI/testing/debugfs-hisi-sec
@@ -1,10 +1,4 @@
-What: /sys/kernel/debug/hisi_sec/<bdf>/sec_dfx
-Date: Oct 2019
-Contact: linux-crypto@vger.kernel.org
-Description: Dump the debug registers of SEC cores.
- Only available for PF.
-
-What: /sys/kernel/debug/hisi_sec/<bdf>/clear_enable
+What: /sys/kernel/debug/hisi_sec2/<bdf>/clear_enable
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: Enabling/disabling of clear action after reading
@@ -12,7 +6,7 @@ Description: Enabling/disabling of clear action after reading
0: disable, 1: enable.
Only available for PF, and take no other effect on SEC.
-What: /sys/kernel/debug/hisi_sec/<bdf>/current_qm
+What: /sys/kernel/debug/hisi_sec2/<bdf>/current_qm
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: One SEC controller has one PF and multiple VFs, each function
@@ -20,24 +14,100 @@ Description: One SEC controller has one PF and multiple VFs, each function
qm refers to.
Only available for PF.
-What: /sys/kernel/debug/hisi_sec/<bdf>/qm/qm_regs
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/qm_regs
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: Dump of QM related debug registers.
Available for PF and VF in host. VF in guest currently only
has one debug register.
-What: /sys/kernel/debug/hisi_sec/<bdf>/qm/current_q
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/current_q
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: One QM of SEC may contain multiple queues. Select specific
- queue to show its debug registers in above 'qm_regs'.
+ queue to show its debug registers in above 'regs'.
Only available for PF.
-What: /sys/kernel/debug/hisi_sec/<bdf>/qm/clear_enable
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/clear_enable
Date: Oct 2019
Contact: linux-crypto@vger.kernel.org
Description: Enabling/disabling of clear action after reading
the SEC's QM debug registers.
0: disable, 1: enable.
Only available for PF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/err_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of invalid interrupts for
+ QM task completion.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/aeq_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of QM async event queue interrupts.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/abnormal_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of interrupts for QM abnormal event.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/create_qp_err
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of queue allocation errors.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/mb_err
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of failed QM mailbox commands.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/status
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the status of the QM.
+ Four states: initiated, started, stopped and closed.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/send_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of sent requests.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/recv_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of received requests.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/send_busy_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of requests sent with returning busy.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/err_bd_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of BD type error requests
+ to be received.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/invalid_req_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of invalid requests being received.
+ Available for both PF and VF, and take no other effect on SEC.
+
+What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/done_flag_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of completed but marked error requests
+ to be received.
+ Available for both PF and VF, and take no other effect on SEC.
diff --git a/Documentation/ABI/testing/debugfs-hisi-zip b/Documentation/ABI/testing/debugfs-hisi-zip
index a7c63e6c4bc3..3034a2bf99ca 100644
--- a/Documentation/ABI/testing/debugfs-hisi-zip
+++ b/Documentation/ABI/testing/debugfs-hisi-zip
@@ -26,7 +26,7 @@ Description: One ZIP controller has one PF and multiple VFs, each function
has a QM. Select the QM which below qm refers to.
Only available for PF.
-What: /sys/kernel/debug/hisi_zip/<bdf>/qm/qm_regs
+What: /sys/kernel/debug/hisi_zip/<bdf>/qm/regs
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: Dump of QM related debug registers.
@@ -37,14 +37,78 @@ What: /sys/kernel/debug/hisi_zip/<bdf>/qm/current_q
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
Description: One QM may contain multiple queues. Select specific queue to
- show its debug registers in above qm_regs.
+ show its debug registers in above regs.
Only available for PF.
What: /sys/kernel/debug/hisi_zip/<bdf>/qm/clear_enable
Date: Nov 2018
Contact: linux-crypto@vger.kernel.org
-Description: QM debug registers(qm_regs) read clear control. 1 means enable
+Description: QM debug registers(regs) read clear control. 1 means enable
register read clear, otherwise 0.
Writing to this file has no functional effect, only enable or
disable counters clear after reading of these registers.
Only available for PF.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/qm/err_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of invalid interrupts for
+ QM task completion.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/qm/aeq_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of QM async event queue interrupts.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/qm/abnormal_irq
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of interrupts for QM abnormal event.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/qm/create_qp_err
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of queue allocation errors.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/qm/mb_err
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the number of failed QM mailbox commands.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/qm/status
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the status of the QM.
+ Four states: initiated, started, stopped and closed.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/zip_dfx/send_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of sent requests.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/zip_dfx/recv_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of received requests.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/zip_dfx/send_busy_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of requests received
+ with returning busy.
+ Available for both PF and VF, and take no other effect on ZIP.
+
+What: /sys/kernel/debug/hisi_zip/<bdf>/zip_dfx/err_bd_cnt
+Date: Apr 2020
+Contact: linux-crypto@vger.kernel.org
+Description: Dump the total number of BD type error requests
+ to be received.
+ Available for both PF and VF, and take no other effect on ZIP.
diff --git a/Documentation/ABI/testing/dev-kmsg b/Documentation/ABI/testing/dev-kmsg
index f307506eb54c..1e6c28b1942b 100644
--- a/Documentation/ABI/testing/dev-kmsg
+++ b/Documentation/ABI/testing/dev-kmsg
@@ -56,6 +56,11 @@ Description: The /dev/kmsg character device node provides userspace access
seek after the last record available at the time
the last SYSLOG_ACTION_CLEAR was issued.
+ Due to the record nature of this interface with a "read all"
+ behavior and the specific positions each seek operation sets,
+ SEEK_CUR is not supported, returning -ESPIPE (invalid seek) to
+ errno whenever requested.
+
The output format consists of a prefix carrying the syslog
prefix including priority and facility, the 64 bit message
sequence number and the monotonic timestamp in microseconds,
diff --git a/Documentation/ABI/testing/procfs-smaps_rollup b/Documentation/ABI/testing/procfs-smaps_rollup
index 274df44d8b1b..046978193368 100644
--- a/Documentation/ABI/testing/procfs-smaps_rollup
+++ b/Documentation/ABI/testing/procfs-smaps_rollup
@@ -11,7 +11,7 @@ Description:
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps. These fields represent
the sum of the Pss field of each type (anon, file, shmem).
- For more details, see Documentation/filesystems/proc.txt
+ For more details, see Documentation/filesystems/proc.rst
and the procfs man page.
Typical output looks like this:
diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index 664a8f6a634f..3b404577f380 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -124,6 +124,19 @@ Description:
authentication is performed (e.g: 802.1x). 'link_mode' attribute
will also reflect the dormant state.
+What: /sys/class/net/<iface>/testing
+Date: April 2002
+KernelVersion: 5.8
+Contact: netdev@vger.kernel.org
+Description:
+ Indicates whether the interface is under test. Possible
+ values are:
+ 0: interface is not being tested
+ 1: interface is being tested
+
+ When an interface is under test, it cannot be expected
+ to pass packets as normal.
+
What: /sys/clas/net/<iface>/duplex
Date: October 2009
KernelVersion: 2.6.33
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 2e0e3b45d02a..6b5dafab950c 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -106,10 +106,10 @@ Description: CPU topology files that describe a logical CPU's relationship
See Documentation/admin-guide/cputopology.rst for more information.
-What: /sys/devices/system/cpu/cpuidle/current_driver
- /sys/devices/system/cpu/cpuidle/current_governer_ro
- /sys/devices/system/cpu/cpuidle/available_governors
+What: /sys/devices/system/cpu/cpuidle/available_governors
+ /sys/devices/system/cpu/cpuidle/current_driver
/sys/devices/system/cpu/cpuidle/current_governor
+ /sys/devices/system/cpu/cpuidle/current_governer_ro
Date: September 2007
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Discover cpuidle policy and mechanism
@@ -119,24 +119,18 @@ Description: Discover cpuidle policy and mechanism
consumption during idle.
Idle policy (governor) is differentiated from idle mechanism
- (driver)
-
- current_driver: (RO) displays current idle mechanism
-
- current_governor_ro: (RO) displays current idle policy
-
- With the cpuidle_sysfs_switch boot option enabled (meant for
- developer testing), the following three attributes are visible
- instead:
-
- current_driver: same as described above
+ (driver).
available_governors: (RO) displays a space separated list of
- available governors
+ available governors.
+
+ current_driver: (RO) displays current idle mechanism.
current_governor: (RW) displays current idle policy. Users can
switch the governor at runtime by writing to this file.
+ current_governor_ro: (RO) displays current idle policy.
+
See Documentation/admin-guide/pm/cpuidle.rst and
Documentation/driver-api/pm/cpuidle.rst for more information.
diff --git a/Documentation/ABI/testing/sysfs-platform-dptf b/Documentation/ABI/testing/sysfs-platform-dptf
index 325dc0667dbb..eeed81ca6949 100644
--- a/Documentation/ABI/testing/sysfs-platform-dptf
+++ b/Documentation/ABI/testing/sysfs-platform-dptf
@@ -27,10 +27,12 @@ KernelVersion: v4.10
Contact: linux-acpi@vger.kernel.org
Description:
(RO) Display the platform power source
- 0x00 = DC
- 0x01 = AC
- 0x02 = USB
- 0x03 = Wireless Charger
+ bits[3:0] Current power source
+ 0x00 = DC
+ 0x01 = AC
+ 0x02 = USB
+ 0x03 = Wireless Charger
+ bits[7:4] Power source sequence number
What: /sys/bus/platform/devices/INT3407:00/dptf_power/battery_steady_power
Date: Jul, 2016
@@ -38,3 +40,55 @@ KernelVersion: v4.10
Contact: linux-acpi@vger.kernel.org
Description:
(RO) The maximum sustained power for battery in milliwatts.
+
+What: /sys/bus/platform/devices/INT3407:00/dptf_power/rest_of_platform_power_mw
+Date: June, 2020
+KernelVersion: v5.8
+Contact: linux-acpi@vger.kernel.org
+Description:
+ (RO) Shows the rest (outside of SoC) of worst-case platform power.
+
+What: /sys/bus/platform/devices/INT3407:00/dptf_power/prochot_confirm
+Date: June, 2020
+KernelVersion: v5.8
+Contact: linux-acpi@vger.kernel.org
+Description:
+ (WO) Confirm embedded controller about a prochot notification.
+
+What: /sys/bus/platform/devices/INT3532:00/dptf_battery/max_platform_power_mw
+Date: June, 2020
+KernelVersion: v5.8
+Contact: linux-acpi@vger.kernel.org
+Description:
+ (RO) The maximum platform power that can be supported by the battery in milli watts.
+
+What: /sys/bus/platform/devices/INT3532:00/dptf_battery/max_steady_state_power_mw
+Date: June, 2020
+KernelVersion: v5.8
+Contact: linux-acpi@vger.kernel.org
+Description:
+ (RO) The maximum sustained power for battery in milli watts.
+
+What: /sys/bus/platform/devices/INT3532:00/dptf_battery/high_freq_impedance_mohm
+Date: June, 2020
+KernelVersion: v5.8
+Contact: linux-acpi@vger.kernel.org
+Description:
+ (RO) The high frequency impedance value that can be obtained from battery
+ fuel gauge in milli Ohms.
+
+What: /sys/bus/platform/devices/INT3532:00/dptf_battery/no_load_voltage_mv
+Date: June, 2020
+KernelVersion: v5.8
+Contact: linux-acpi@vger.kernel.org
+Description:
+ (RO) The no-load voltage that can be obtained from battery fuel gauge in
+ milli volts.
+
+What: /sys/bus/platform/devices/INT3532:00/dptf_battery/current_discharge_capbility_ma
+Date: June, 2020
+KernelVersion: v5.8
+Contact: linux-acpi@vger.kernel.org
+Description:
+ (RO) The battery discharge current capability obtained from battery fuel gauge in
+ milli Amps.
diff --git a/Documentation/ABI/testing/sysfs-platform-intel-wmi-sbl-fw-update b/Documentation/ABI/testing/sysfs-platform-intel-wmi-sbl-fw-update
new file mode 100644
index 000000000000..5aa618987cad
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-platform-intel-wmi-sbl-fw-update
@@ -0,0 +1,12 @@
+What: /sys/bus/wmi/devices/44FADEB1-B204-40F2-8581-394BBDC1B651/firmware_update_request
+Date: April 2020
+KernelVersion: 5.7
+Contact: "Jithu Joseph" <jithu.joseph@intel.com>
+Description:
+ Allow user space entities to trigger update of Slim
+ Bootloader (SBL). This attribute normally has a value
+ of 0 and userspace can signal SBL to update firmware,
+ on next reboot, by writing a value of 1.
+ There are two available states:
+ * 0 -> Skip firmware update while rebooting
+ * 1 -> Attempt firmware update on next reboot
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 7e12d10ca8e0..6b12dd82f712 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -98,7 +98,11 @@ else # HAVE_PDFLATEX
pdfdocs: latexdocs
@$(srctree)/scripts/sphinx-pre-install --version-check
- $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
+ $(foreach var,$(SPHINXDIRS), \
+ $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit; \
+ mkdir -p $(BUILDDIR)/$(var)/pdf; \
+ mv $(subst .tex,.pdf,$(wildcard $(BUILDDIR)/$(var)/latex/*.tex)) $(BUILDDIR)/$(var)/pdf/; \
+ )
endif # HAVE_PDFLATEX
diff --git a/Documentation/PCI/boot-interrupts.rst b/Documentation/PCI/boot-interrupts.rst
index d078ef3eb192..2ec70121bfca 100644
--- a/Documentation/PCI/boot-interrupts.rst
+++ b/Documentation/PCI/boot-interrupts.rst
@@ -32,12 +32,13 @@ interrupt goes unhandled over time, they are tracked by the Linux kernel as
Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
reaches a specific count with the error "nobody cared". This disabled IRQ
now prevents valid usage by an existing interrupt which may happen to share
-the IRQ line.
+the IRQ line::
irq 19: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
Call Trace:
+
<IRQ>
? dump_stack+0x46/0x5e
? __report_bad_irq+0x2e/0xb0
@@ -85,15 +86,18 @@ Mitigations
The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be
-added.[1]
+added. [1]_
- Intel® 6300ESB I/O Controller Hub
+Intel® 6300ESB I/O Controller Hub
Alternate Base Address Register:
BIE: Boot Interrupt Enable
- 0 = Boot interrupt is enabled.
- 1 = Boot interrupt is disabled.
- Intel® Sandy Bridge through Sky Lake based Xeon servers:
+ == ===========================
+ 0 Boot interrupt is enabled.
+ 1 Boot interrupt is disabled.
+ == ===========================
+
+Intel® Sandy Bridge through Sky Lake based Xeon servers:
Coherent Interface Protocol Interrupt Control
dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
When this bit is set. Local INTx messages received from the
@@ -109,12 +113,12 @@ line by default. Therefore, on chipsets where this INTx routing cannot be
disabled, the Linux kernel will reroute the valid interrupt to its legacy
interrupt. This redirection of the handler will prevent the occurrence of
the spurious interrupt detection which would ordinarily disable the IRQ
-line due to excessive unhandled counts.[2]
+line due to excessive unhandled counts. [2]_
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt
line. The option can be overridden by either pci=ioapicreroute or
-pci=noioapicreroute.[3]
+pci=noioapicreroute. [3]_
More Documentation
@@ -127,19 +131,19 @@ into the evolution of its handling with chipsets.
Example of disabling of the boot interrupt
------------------------------------------
-Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
+ - Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
5.7.3 Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
-Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
-Datasheet - Volume 2: Registers (Document # 330784-003)
+ - Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
+ Datasheet - Volume 2: Registers (Document # 330784-003)
6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Example of handler rerouting
----------------------------
-Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
+ - Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
@@ -150,6 +154,6 @@ Cheers,
Sean V Kelley
sean.v.kelley@linux.intel.com
-[1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
-[2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
-[3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
+.. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
+.. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
+.. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index fd5e2cbc4935..75b8ca007a11 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1943,56 +1943,27 @@ invoked from a CPU-hotplug notifier.
Scheduler and RCU
~~~~~~~~~~~~~~~~~
-RCU depends on the scheduler, and the scheduler uses RCU to protect some
-of its data structures. The preemptible-RCU ``rcu_read_unlock()``
-implementation must therefore be written carefully to avoid deadlocks
-involving the scheduler's runqueue and priority-inheritance locks. In
-particular, ``rcu_read_unlock()`` must tolerate an interrupt where the
-interrupt handler invokes both ``rcu_read_lock()`` and
-``rcu_read_unlock()``. This possibility requires ``rcu_read_unlock()``
-to use negative nesting levels to avoid destructive recursion via
-interrupt handler's use of RCU.
-
-This scheduler-RCU requirement came as a `complete
-surprise <https://lwn.net/Articles/453002/>`__.
-
-As noted above, RCU makes use of kthreads, and it is necessary to avoid
-excessive CPU-time accumulation by these kthreads. This requirement was
-no surprise, but RCU's violation of it when running context-switch-heavy
-workloads when built with ``CONFIG_NO_HZ_FULL=y`` `did come as a
-surprise
+RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
+accumulation by these kthreads. This requirement was no surprise, but
+RCU's violation of it when running context-switch-heavy workloads when
+built with ``CONFIG_NO_HZ_FULL=y`` `did come as a surprise
[PDF] <http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf>`__.
RCU has made good progress towards meeting this requirement, even for
context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is
room for further improvement.
-It is forbidden to hold any of scheduler's runqueue or
-priority-inheritance spinlocks across an ``rcu_read_unlock()`` unless
-interrupts have been disabled across the entire RCU read-side critical
-section, that is, up to and including the matching ``rcu_read_lock()``.
-Violating this restriction can result in deadlocks involving these
-scheduler spinlocks. There was hope that this restriction might be
-lifted when interrupt-disabled calls to ``rcu_read_unlock()`` started
-deferring the reporting of the resulting RCU-preempt quiescent state
-until the end of the corresponding interrupts-disabled region.
-Unfortunately, timely reporting of the corresponding quiescent state to
-expedited grace periods requires a call to ``raise_softirq()``, which
-can acquire these scheduler spinlocks. In addition, real-time systems
-using RCU priority boosting need this restriction to remain in effect
-because deferred quiescent-state reporting would also defer deboosting,
-which in turn would degrade real-time latencies.
-
-In theory, if a given RCU read-side critical section could be guaranteed
-to be less than one second in duration, holding a scheduler spinlock
-across that critical section's ``rcu_read_unlock()`` would require only
-that preemption be disabled across the entire RCU read-side critical
-section, not interrupts. Unfortunately, given the possibility of vCPU
-preemption, long-running interrupts, and so on, it is not possible in
-practice to guarantee that a given RCU read-side critical section will
-complete in less than one second. Therefore, as noted above, if
-scheduler spinlocks are held across a given call to
-``rcu_read_unlock()``, interrupts must be disabled across the entire RCU
-read-side critical section.
+There is no longer any prohibition against holding any of
+scheduler's runqueue or priority-inheritance spinlocks across an
+``rcu_read_unlock()``, even if interrupts and preemption were enabled
+somewhere within the corresponding RCU read-side critical section.
+Therefore, it is now perfectly legal to execute ``rcu_read_lock()``
+with preemption enabled, acquire one of the scheduler locks, and hold
+that lock across the matching ``rcu_read_unlock()``.
+
+Similarly, the RCU flavor consolidation has removed the need for negative
+nesting. The fact that interrupt-disabled regions of code act as RCU
+read-side critical sections implicitly avoids earlier issues that used
+to result in destructive recursion via interrupt handler's use of RCU.
Tracing and RCU
~~~~~~~~~~~~~~~
diff --git a/Documentation/admin-guide/acpi/ssdt-overlays.rst b/Documentation/admin-guide/acpi/ssdt-overlays.rst
index da37455f96c9..5d7e25988085 100644
--- a/Documentation/admin-guide/acpi/ssdt-overlays.rst
+++ b/Documentation/admin-guide/acpi/ssdt-overlays.rst
@@ -63,7 +63,7 @@ which can then be compiled to AML binary format::
ASL Input: minnomax.asl - 30 lines, 614 bytes, 7 keywords
AML Output: minnowmax.aml - 165 bytes, 6 named objects, 1 executable opcodes
-[1] http://wiki.minnowboard.org/MinnowBoard_MAX#Low_Speed_Expansion_Connector_.28Top.29
+[1] https://www.elinux.org/Minnowboard:MinnowMax#Low_Speed_Expansion_.28Top.29
The resulting AML code can then be loaded by the kernel using one of the methods
below.
diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index 44b8a4edd348..f7c80f4649fc 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -49,15 +49,19 @@ the issue, it may also contain the word **Oops**, as on this one::
Despite being an **Oops** or some other sort of stack trace, the offended
line is usually required to identify and handle the bug. Along this chapter,
-we'll refer to "Oops" for all kinds of stack traces that need to be analized.
+we'll refer to "Oops" for all kinds of stack traces that need to be analyzed.
-.. note::
+If the kernel is compiled with ``CONFIG_DEBUG_INFO``, you can enhance the
+quality of the stack trace by using file:`scripts/decode_stacktrace.sh`.
+
+Modules linked in
+-----------------
+
+Modules that are tainted or are being loaded or unloaded are marked with
+"(...)", where the taint flags are described in
+file:`Documentation/admin-guide/tainted-kernels.rst`, "being loaded" is
+annotated with "+", and "being unloaded" is annotated with "-".
- ``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original
- format (from ``dmesg``, etc). Ignore any references in this or other docs to
- "decoding the Oops" or "running it through ksymoops".
- If you post an Oops from 2.6+ that has been run through ``ksymoops``,
- people will just tell you to repost it.
Where is the Oops message is located?
-------------------------------------
@@ -71,7 +75,7 @@ by running ``journalctl`` command.
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it. Or you can
``cat /proc/kmsg > file``, however you have to break in to stop the transfer,
-``kmsg`` is a "never ending file".
+since ``kmsg`` is a "never ending file".
If the machine has crashed so badly that you cannot enter commands or
the disk is not available then you have three options:
@@ -81,9 +85,9 @@ the disk is not available then you have three options:
planned for a crash. Alternatively, you can take a picture of
the screen with a digital camera - not nice, but better than
nothing. If the messages scroll off the top of the console, you
- may find that booting with a higher resolution (eg, ``vga=791``)
+ may find that booting with a higher resolution (e.g., ``vga=791``)
will allow you to read more of the text. (Caveat: This needs ``vesafb``,
- so won't help for 'early' oopses)
+ so won't help for 'early' oopses.)
(2) Boot with a serial console (see
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
@@ -104,7 +108,7 @@ Kernel source file. There are two methods for doing that. Usually, using
gdb
^^^
-The GNU debug (``gdb``) is the best way to figure out the exact file and line
+The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file.
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
@@ -165,7 +169,7 @@ If you have a call trace, such as::
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
...
-this shows the problem likely in the :jbd: module. You can load that module
+this shows the problem likely is in the :jbd: module. You can load that module
in gdb and list the relevant code::
$ gdb fs/jbd/jbd.ko
@@ -199,8 +203,9 @@ in the kernel hacking menu of the menu configuration.) For example::
You need to be at the top level of the kernel tree for this to pick up
your C files.
-If you don't have access to the code you can also debug on some crash dumps
-e.g. crash dump output as shown by Dave Miller::
+If you don't have access to the source code you can still debug some crash
+dumps using the following method (example crash dump output as shown by
+Dave Miller)::
EIP is at +0x14/0x4c0
...
@@ -230,6 +235,9 @@ e.g. crash dump output as shown by Dave Miller::
mov 0x8(%ebp), %ebx ! %ebx = skb->sk
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
+file:`scripts/decodecode` can be used to automate most of this, depending
+on what CPU architecture is being debugged.
+
Reporting the bug
-----------------
@@ -241,7 +249,7 @@ used for the development of the affected code. This can be done by using
the ``get_maintainer.pl`` script.
For example, if you find a bug at the gspca's sonixj.c file, you can get
-their maintainers with::
+its maintainers with::
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
@@ -253,16 +261,17 @@ their maintainers with::
Please notice that it will point to:
-- The last developers that touched on the source code. On the above example,
- Tejun and Bhaktipriya (in this specific case, none really envolved on the
- development of this file);
+- The last developers that touched the source code (if this is done inside
+ a git tree). On the above example, Tejun and Bhaktipriya (in this
+ specific case, none really envolved on the development of this file);
- The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
Usually, the fastest way to have your bug fixed is to report it to mailing
-list used for the development of the code (linux-media ML) copying the driver maintainer (Hans).
+list used for the development of the code (linux-media ML) copying the
+driver maintainer (Hans).
If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to
@@ -303,9 +312,9 @@ protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.
Two types of address resolution are performed by ``klogd``. The first is
-static translation and the second is dynamic translation. Static
-translation uses the System.map file in much the same manner that
-ksymoops does. In order to do static translation the ``klogd`` daemon
+static translation and the second is dynamic translation.
+Static translation uses the System.map file.
+In order to do static translation the ``klogd`` daemon
must be able to find a system map file at daemon initialization time.
See the klogd man page for information on how ``klogd`` searches for map
files.
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 0ae4f564c2d6..12757e63b26c 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -199,11 +199,11 @@ An RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
+A swapped-in page is accounted after adding into swapcache.
Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
+Since page's memcg recorded into swap whatever memsw enabled, the page will
+be accounted after swapin.
At page migration, accounting information is kept.
@@ -222,18 +222,13 @@ the cgroup that brought it in -- this will happen on memory pressure).
But see section 8.2: when moving a task to another cgroup, its pages may
be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and make swapped-out pages of shmem(tmpfs) to
-be backed into memory in force, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.
-
-2.4 Swap Extension (CONFIG_MEMCG_SWAP)
+2.4 Swap Extension
--------------------------------------
-Swap Extension allows you to record charge for swap. A swapped-in page is
-charged back to original page allocator if possible.
+Swap usage is always recorded for each of cgroup. Swap Extension allows you to
+read and limit it.
-When swap is accounted, following files are added.
+When CONFIG_SWAP is enabled, following files are added.
- memory.memsw.usage_in_bytes.
- memory.memsw.limit_in_bytes.
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index bcc80269bb6a..b8c0460730f3 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1329,6 +1329,10 @@ PAGE_SIZE multiple when read back.
workingset_activate
Number of refaulted pages that were immediately activated
+ workingset_restore
+ Number of restored pages which have been detected as an active
+ workingset before they got reclaimed.
+
workingset_nodereclaim
Number of times a shadow node has been reclaimed
@@ -1370,6 +1374,22 @@ PAGE_SIZE multiple when read back.
The total amount of swap currently being used by the cgroup
and its descendants.
+ memory.swap.high
+ A read-write single value file which exists on non-root
+ cgroups. The default is "max".
+
+ Swap usage throttle limit. If a cgroup's swap usage exceeds
+ this limit, all its further allocations will be throttled to
+ allow userspace to implement custom out-of-memory procedures.
+
+ This limit marks a point of no return for the cgroup. It is NOT
+ designed to manage the amount of swapping a workload does
+ during regular operation. Compare to memory.swap.max, which
+ prohibits swapping past a set amount, but lets the cgroup
+ continue unimpeded as long as other memory can be reclaimed.
+
+ Healthy workloads are not expected to reach this limit.
+
memory.swap.max
A read-write single value file which exists on non-root
cgroups. The default is "max".
@@ -1383,6 +1403,10 @@ PAGE_SIZE multiple when read back.
otherwise, a value change in this file generates a file
modified event.
+ high
+ The number of times the cgroup's swap usage was over
+ the high threshold.
+
max
The number of times the cgroup's swap usage was about
to go over the max boundary and swap allocation
diff --git a/Documentation/admin-guide/cpu-load.rst b/Documentation/admin-guide/cpu-load.rst
index 2d01ce43d2a2..ebdecf864080 100644
--- a/Documentation/admin-guide/cpu-load.rst
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -105,7 +105,7 @@ References
----------
- http://lkml.org/lkml/2007/2/12/6
-- Documentation/filesystems/proc.txt (1.8)
+- Documentation/filesystems/proc.rst (1.8)
Thanks
diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst
index c00f9f11e3f3..8439d2ae689b 100644
--- a/Documentation/admin-guide/device-mapper/dm-integrity.rst
+++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -182,12 +182,15 @@ fix_padding
space-efficient. If this option is not present, large padding is
used - that is for compatibility with older kernels.
-
-The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
-be changed when reloading the target (load an inactive table and swap the
-tables with suspend and resume). The other arguments should not be changed
-when reloading the target because the layout of disk data depend on them
-and the reloaded target would be non-functional.
+allow_discards
+ Allow block discard requests (a.k.a. TRIM) for the integrity device.
+ Discards are only allowed to devices using internal hash.
+
+The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
+allow_discards can be changed when reloading the target (load an inactive
+table and swap the tables with suspend and resume). The other arguments
+should not be changed when reloading the target because the layout of disk
+data depend on them and the reloaded target would be non-functional.
The layout of the formatted block device:
diff --git a/Documentation/admin-guide/hw-vuln/l1tf.rst b/Documentation/admin-guide/hw-vuln/l1tf.rst
index f83212fae4d5..3eeeb488d955 100644
--- a/Documentation/admin-guide/hw-vuln/l1tf.rst
+++ b/Documentation/admin-guide/hw-vuln/l1tf.rst
@@ -268,7 +268,7 @@ Guest mitigation mechanisms
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
available at:
- https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
+ https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
.. _smt_control:
diff --git a/Documentation/admin-guide/init.rst b/Documentation/admin-guide/init.rst
index e89d97f31eaf..41f06a09152e 100644
--- a/Documentation/admin-guide/init.rst
+++ b/Documentation/admin-guide/init.rst
@@ -1,52 +1,48 @@
-Explaining the dreaded "No init found." boot hang message
+Explaining the "No working init found." boot hang message
=========================================================
+:Authors: Andreas Mohr <andi at lisas period de>
+ Cristian Souza <cristianmsbr at gmail period com>
-OK, so you've got this pretty unintuitive message (currently located
-in init/main.c) and are wondering what the H*** went wrong.
-Some high-level reasons for failure (listed roughly in order of execution)
-to load the init binary are:
-
-A) Unable to mount root FS
-B) init binary doesn't exist on rootfs
-C) broken console device
-D) binary exists but dependencies not available
-E) binary cannot be loaded
-
-Detailed explanations:
-
-A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
- to get more detailed kernel messages.
-B) make sure you have the correct root FS type
- (and ``root=`` kernel parameter points to the correct partition),
- required drivers such as storage hardware (such as SCSI or USB!)
- and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
- to be pre-loaded by an initrd)
-C) Possibly a conflict in ``console= setup`` --> initial console unavailable.
- E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
- missing interrupt-based configuration).
+This document provides some high-level reasons for failure
+(listed roughly in order of execution) to load the init binary.
+
+1) **Unable to mount root FS**: Set "debug" kernel parameter (in bootloader
+ config file or CONFIG_CMDLINE) to get more detailed kernel messages.
+
+2) **init binary doesn't exist on rootfs**: Make sure you have the correct
+ root FS type (and ``root=`` kernel parameter points to the correct
+ partition), required drivers such as storage hardware (such as SCSI or
+ USB!) and filesystem (ext3, jffs2, etc.) are builtin (alternatively as
+ modules, to be pre-loaded by an initrd).
+
+3) **Broken console device**: Possibly a conflict in ``console= setup``
+ --> initial console unavailable. E.g. some serial consoles are unreliable
+ due to serial IRQ issues (e.g. missing interrupt-based configuration).
Try using a different ``console= device`` or e.g. ``netconsole=``.
-D) e.g. required library dependencies of the init binary such as
- ``/lib/ld-linux.so.2`` missing or broken. Use
- ``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required.
-E) make sure the binary's architecture matches your hardware.
- E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
- In case you tried loading a non-binary file here (shell script?),
- you should make sure that the script specifies an interpreter in its shebang
- header line (``#!/...``) that is fully working (including its library
- dependencies). And before tackling scripts, better first test a simple
- non-script binary such as ``/bin/sh`` and confirm its successful execution.
- To find out more, add code ``to init/main.c`` to display kernel_execve()s
- return values.
+
+4) **Binary exists but dependencies not available**: E.g. required library
+ dependencies of the init binary such as ``/lib/ld-linux.so.2`` missing or
+ broken. Use ``readelf -d <INIT>|grep NEEDED`` to find out which libraries
+ are required.
+
+5) **Binary cannot be loaded**: Make sure the binary's architecture matches
+ your hardware. E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM
+ hardware. In case you tried loading a non-binary file here (shell script?),
+ you should make sure that the script specifies an interpreter in its
+ shebang header line (``#!/...``) that is fully working (including its
+ library dependencies). And before tackling scripts, better first test a
+ simple non-script binary such as ``/bin/sh`` and confirm its successful
+ execution. To find out more, add code ``to init/main.c`` to display
+ kernel_execve()s return values.
Please extend this explanation whenever you find new failure causes
(after all loading the init binary is a CRITICAL and hard transition step
-which needs to be made as painless as possible), then submit patch to LKML.
+which needs to be made as painless as possible), then submit a patch to LKML.
Further TODOs:
- Implement the various ``run_init_process()`` invocations via a struct array
which can then store the ``kernel_execve()`` result value and on failure
log it all by iterating over **all** results (very important usability fix).
-- try to make the implementation itself more helpful in general,
- e.g. by providing additional error messages at affected places.
+- Try to make the implementation itself more helpful in general, e.g. by
+ providing additional error messages at affected places.
-Andreas Mohr <andi at lisas period de>
diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 007a6b86e0ee..e4ee8b2db604 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -393,6 +393,12 @@ KERNELOFFSET
The kernel randomization offset. Used to compute the page offset. If
KASLR is disabled, this value is zero.
+KERNELPACMASK
+-------------
+
+The mask to extract the Pointer Authentication Code from a kernel virtual
+address.
+
arm
===
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3ed0b42ca6f0..89386f6f3ab6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -356,7 +356,7 @@
shot down by NMI
autoconf= [IPV6]
- See Documentation/networking/ipv6.txt.
+ See Documentation/networking/ipv6.rst.
show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
Limit apic dumping. The parameter defines the maximal
@@ -638,7 +638,7 @@
See Documentation/admin-guide/serial-console.rst for more
information. See
- Documentation/networking/netconsole.txt for an
+ Documentation/networking/netconsole.rst for an
alternative.
uart[8250],io,<addr>[,options]
@@ -831,15 +831,18 @@
decnet.addr= [HW,NET]
Format: <area>[,<node>]
- See also Documentation/networking/decnet.txt.
+ See also Documentation/networking/decnet.rst.
default_hugepagesz=
- [same as hugepagesz=] The size of the default
- HugeTLB page size. This is the size represented by
- the legacy /proc/ hugepages APIs, used for SHM, and
- default size when mounting hugetlbfs filesystems.
- Defaults to the default architecture's huge page size
- if not specified.
+ [HW] The size of the default HugeTLB page. This is
+ the size represented by the legacy /proc/ hugepages
+ APIs. In addition, this is the default hugetlb size
+ used for shmget(), mmap() and mounting hugetlbfs
+ filesystems. If not specified, defaults to the
+ architecture's default huge page size. Huge page
+ sizes are architecture dependent. See also
+ Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: size[KMG]
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
@@ -872,7 +875,7 @@
miss to occur.
disable= [IPV6]
- See Documentation/networking/ipv6.txt.
+ See Documentation/networking/ipv6.rst.
hardened_usercopy=
[KNL] Under CONFIG_HARDENED_USERCOPY, whether
@@ -912,7 +915,7 @@
to workaround buggy firmware.
disable_ipv6= [IPV6]
- See Documentation/networking/ipv6.txt.
+ See Documentation/networking/ipv6.rst.
disable_mtrr_cleanup [X86]
The kernel tries to adjust MTRR layout from continuous
@@ -1190,6 +1193,11 @@
This is designed to be used in conjunction with
the boot argument: earlyprintk=vga
+ This parameter works in place of the kgdboc parameter
+ but can only be used if the backing tty is available
+ very early in the boot process. For early debugging
+ via a serial port see kgdboc_earlycon instead.
+
edd= [EDD]
Format: {"off" | "on" | "skip[mbr]"}
@@ -1479,13 +1487,24 @@
hugepages using the cma allocator. If enabled, the
boot-time allocation of gigantic hugepages is skipped.
- hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
- hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
- On x86-64 and powerpc, this option can be specified
- multiple times interleaved with hugepages= to reserve
- huge pages of different sizes. Valid pages sizes on
- x86-64 are 2M (when the CPU supports "pse") and 1G
- (when the CPU supports the "pdpe1gb" cpuinfo flag).
+ hugepages= [HW] Number of HugeTLB pages to allocate at boot.
+ If this follows hugepagesz (below), it specifies
+ the number of pages of hugepagesz to be allocated.
+ If this is the first HugeTLB parameter on the command
+ line, it specifies the number of pages to allocate for
+ the default huge page size. See also
+ Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: <integer>
+
+ hugepagesz=
+ [HW] The size of the HugeTLB pages. This is used in
+ conjunction with hugepages (above) to allocate huge
+ pages of a specific size at boot. The pair
+ hugepagesz=X hugepages=Y can be specified once for
+ each supported huge page size. Huge page sizes are
+ architecture dependent. See also
+ Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: size[KMG]
hung_task_panic=
[KNL] Should the hung task detector generate panics.
@@ -1748,6 +1767,13 @@
initrd= [BOOT] Specify the location of the initial ramdisk
+ initrdmem= [KNL] Specify a physical address and size from which to
+ load the initrd. If an initrd is compiled in or
+ specified in the bootparams, it takes priority over this
+ setting.
+ Format: ss[KMG],nn[KMG]
+ Default is 0, 0
+
init_on_alloc= [MM] Fill newly allocated pages and heap objects with
zeroes.
Format: 0 | 1
@@ -2105,6 +2131,21 @@
kms, kbd format: kms,kbd
kms, kbd and serial format: kms,kbd,<ser_dev>[,baud]
+ kgdboc_earlycon= [KGDB,HW]
+ If the boot console provides the ability to read
+ characters and can work in polling mode, you can use
+ this parameter to tell kgdb to use it as a backend
+ until the normal console is registered. Intended to
+ be used together with the kgdboc parameter which
+ specifies the normal console to transition to.
+
+ The name of the early console should be specified
+ as the value of this parameter. Note that the name of
+ the early console might be different than the tty
+ name passed to kgdboc. It's OK to leave the value
+ blank and the first boot console that implements
+ read() will be picked.
+
kgdbwait [KGDB] Stop kernel execution and enter the
kernel debugger at the earliest opportunity.
@@ -3329,7 +3370,7 @@
See Documentation/admin-guide/sysctl/vm.rst for details.
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
- See Documentation/debugging-via-ohci1394.txt for more
+ See Documentation/core-api/debugging-via-ohci1394.rst for more
info.
olpc_ec_timeout= [OLPC] ms delay when issuing EC commands
@@ -4210,12 +4251,24 @@
Duration of CPU stall (s) to test RCU CPU stall
warnings, zero to disable.
+ rcutorture.stall_cpu_block= [KNL]
+ Sleep while stalling if set. This will result
+ in warnings from preemptible RCU in addition
+ to any other stall-related activity.
+
rcutorture.stall_cpu_holdoff= [KNL]
Time to wait (s) after boot before inducing stall.
rcutorture.stall_cpu_irqsoff= [KNL]
Disable interrupts while stalling if set.
+ rcutorture.stall_gp_kthread= [KNL]
+ Duration (s) of forced sleep within RCU
+ grace-period kthread to test RCU CPU stall
+ warnings, zero to disable. If both stall_cpu
+ and stall_gp_kthread are specified, the
+ kthread is starved first, then the CPU.
+
rcutorture.stat_interval= [KNL]
Time (s) between statistics printk()s.
@@ -4286,6 +4339,13 @@
only normal grace-period primitives. No effect
on CONFIG_TINY_RCU kernels.
+ rcupdate.rcu_task_ipi_delay= [KNL]
+ Set time in jiffies during which RCU tasks will
+ avoid sending IPIs, starting with the beginning
+ of a given grace period. Setting a large
+ number avoids disturbing real-time workloads,
+ but lengthens grace periods.
+
rcupdate.rcu_task_stall_timeout= [KNL]
Set timeout in jiffies for RCU task stall warning
messages. Disable with a value less than or equal
@@ -4910,7 +4970,7 @@
Set the number of tcp_metrics_hash slots.
Default value is 8192 or 16384 depending on total
ram pages. This is used to specify the TCP metrics
- cache size. See Documentation/networking/ip-sysctl.txt
+ cache size. See Documentation/networking/ip-sysctl.rst
"tcp_no_metrics_save" section for more details.
tdfx= [HW,DRM]
@@ -5067,6 +5127,12 @@
interruptions from clocksource watchdog are not
acceptable).
+ tsc_early_khz= [X86] Skip early TSC calibration and use the given
+ value instead. Useful when the early TSC frequency discovery
+ procedure is not reliable, such as on overclocked systems
+ with CPUID.16h support and partial CPUID.15h support.
+ Format: <unsigned int>
+
tsx= [X86] Control Transactional Synchronization
Extensions (TSX) feature in Intel processors that
support TSX control.
@@ -5187,8 +5253,7 @@
usbcore.old_scheme_first=
[USB] Start with the old device initialization
- scheme, applies only to low and full-speed devices
- (default 0 = off).
+ scheme (default 0 = off).
usbcore.usbfs_memory_mb=
[USB] Memory limit (in MB) for buffers allocated by
diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
index 21818aca4708..dc36aeb65d0a 100644
--- a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
+++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
@@ -10,7 +10,7 @@ them to a "housekeeping" CPU dedicated to such work.
References
==========
-- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
+- Documentation/core-api/irq/irq-affinity.rst: Binding interrupts to sets of CPUs.
- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 1cc0bc78d10e..5026e58826e2 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -100,6 +100,41 @@ with a huge page size selection parameter "hugepagesz=<size>". <size> must
be specified in bytes with optional scale suffix [kKmMgG]. The default huge
page size may be selected with the "default_hugepagesz=<size>" boot parameter.
+Hugetlb boot command line parameter semantics
+hugepagesz - Specify a huge page size. Used in conjunction with hugepages
+ parameter to preallocate a number of huge pages of the specified
+ size. Hence, hugepagesz and hugepages are typically specified in
+ pairs such as:
+ hugepagesz=2M hugepages=512
+ hugepagesz can only be specified once on the command line for a
+ specific huge page size. Valid huge page sizes are architecture
+ dependent.
+hugepages - Specify the number of huge pages to preallocate. This typically
+ follows a valid hugepagesz or default_hugepagesz parameter. However,
+ if hugepages is the first or only hugetlb command line parameter it
+ implicitly specifies the number of huge pages of default size to
+ allocate. If the number of huge pages of default size is implicitly
+ specified, it can not be overwritten by a hugepagesz,hugepages
+ parameter pair for the default size.
+ For example, on an architecture with 2M default huge page size:
+ hugepages=256 hugepagesz=2M hugepages=512
+ will result in 256 2M huge pages being allocated and a warning message
+ indicating that the hugepages=512 parameter is ignored. If a hugepages
+ parameter is preceded by an invalid hugepagesz parameter, it will
+ be ignored.
+default_hugepagesz - Specify the default huge page size. This parameter can
+ only be specified once on the command line. default_hugepagesz can
+ optionally be followed by the hugepages parameter to preallocate a
+ specific number of huge pages of default size. The number of default
+ sized huge pages to preallocate can also be implicitly specified as
+ mentioned in the hugepages section above. Therefore, on an
+ architecture with 2M default huge page size:
+ hugepages=256
+ default_hugepagesz=2M hugepages=256
+ hugepages=256 default_hugepagesz=2M
+ will all result in 256 2M huge pages being allocated. Valid default
+ huge page size is architecture dependent.
+
When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
indicates the current number of pre-allocated huge pages of the default size.
Thus, one can use the following command to dynamically allocate/deallocate
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 2f31de8f7c74..6a233e42be08 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -220,6 +220,13 @@ memory. A lower value can prevent THPs from being
collapsed, resulting fewer pages being collapsed into
THPs, and lower memory access performance.
+``max_ptes_shared`` specifies how many pages can be shared across multiple
+processes. Exceeding the number would block the collapse::
+
+ /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
+
+A higher value may increase memory footprint for some workloads.
+
Boot parameter
==============
diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index c30176e67900..0bf49d7313ad 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -12,107 +12,107 @@ and more generally they allow userland to take control of various
memory page faults, something otherwise only the kernel code could do.
For example userfaults allows a proper and more optimal implementation
-of the PROT_NONE+SIGSEGV trick.
+of the ``PROT_NONE+SIGSEGV`` trick.
Design
======
-Userfaults are delivered and resolved through the userfaultfd syscall.
+Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
-The userfaultfd (aside from registering and unregistering virtual
+The ``userfaultfd`` (aside from registering and unregistering virtual
memory ranges) provides two primary functionalities:
-1) read/POLLIN protocol to notify a userland thread of the faults
+1) ``read/POLLIN`` protocol to notify a userland thread of the faults
happening
-2) various UFFDIO_* ioctls that can manage the virtual memory regions
- registered in the userfaultfd that allows userland to efficiently
+2) various ``UFFDIO_*`` ioctls that can manage the virtual memory regions
+ registered in the ``userfaultfd`` that allows userland to efficiently
resolve the userfaults it receives via 1) or to manage the virtual
memory in the background
The real advantage of userfaults if compared to regular virtual memory
management of mremap/mprotect is that the userfaults in all their
operations never involve heavyweight structures like vmas (in fact the
-userfaultfd runtime load never takes the mmap_sem for writing).
+``userfaultfd`` runtime load never takes the mmap_sem for writing).
Vmas are not suitable for page- (or hugepage) granular fault tracking
when dealing with virtual address spaces that could span
Terabytes. Too many vmas would be needed for that.
-The userfaultfd once opened by invoking the syscall, can also be
+The ``userfaultfd`` once opened by invoking the syscall, can also be
passed using unix domain sockets to a manager process, so the same
manager process could handle the userfaults of a multitude of
different processes without them being aware about what is going on
-(well of course unless they later try to use the userfaultfd
+(well of course unless they later try to use the ``userfaultfd``
themselves on the same region the manager is already tracking, which
-is a corner case that would currently return -EBUSY).
+is a corner case that would currently return ``-EBUSY``).
API
===
-When first opened the userfaultfd must be enabled invoking the
-UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or
-a later API version) which will specify the read/POLLIN protocol
-userland intends to speak on the UFFD and the uffdio_api.features
-userland requires. The UFFDIO_API ioctl if successful (i.e. if the
-requested uffdio_api.api is spoken also by the running kernel and the
+When first opened the ``userfaultfd`` must be enabled invoking the
+``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
+a later API version) which will specify the ``read/POLLIN`` protocol
+userland intends to speak on the ``UFFD`` and the ``uffdio_api.features``
+userland requires. The ``UFFDIO_API`` ioctl if successful (i.e. if the
+requested ``uffdio_api.api`` is spoken also by the running kernel and the
requested features are going to be enabled) will return into
-uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of
+``uffdio_api.features`` and ``uffdio_api.ioctls`` two 64bit bitmasks of
respectively all the available features of the read(2) protocol and
the generic ioctl available.
-The uffdio_api.features bitmask returned by the UFFDIO_API ioctl
-defines what memory types are supported by the userfaultfd and what
+The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
+defines what memory types are supported by the ``userfaultfd`` and what
events, except page fault notifications, may be generated.
-If the kernel supports registering userfaultfd ranges on hugetlbfs
-virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
-uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be
-set if the kernel supports registering userfaultfd ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero
-MAP_SHARED, memfd_create, etc).
+If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
+virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
+``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
+set if the kernel supports registering ``userfaultfd`` ranges on shared
+memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
+``MAP_SHARED``, ``memfd_create``, etc).
-The userland application that wants to use userfaultfd with hugetlbfs
+The userland application that wants to use ``userfaultfd`` with hugetlbfs
or shared memory need to set the corresponding flag in
-uffdio_api.features to enable those features.
+``uffdio_api.features`` to enable those features.
If the userland desires to receive notifications for events other than
-page faults, it has to verify that uffdio_api.features has appropriate
-UFFD_FEATURE_EVENT_* bits set. These events are described in more
-detail below in "Non-cooperative userfaultfd" section.
-
-Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
-be invoked (if present in the returned uffdio_api.ioctls bitmask) to
-register a memory range in the userfaultfd by setting the
-uffdio_register structure accordingly. The uffdio_register.mode
+page faults, it has to verify that ``uffdio_api.features`` has appropriate
+``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
+detail below in `Non-cooperative userfaultfd`_ section.
+
+Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
+be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
+register a memory range in the ``userfaultfd`` by setting the
+uffdio_register structure accordingly. The ``uffdio_register.mode``
bitmask will specify to the kernel which kind of faults to track for
-the range (UFFDIO_REGISTER_MODE_MISSING would track missing
-pages). The UFFDIO_REGISTER ioctl will return the
-uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
+the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
+pages). The ``UFFDIO_REGISTER`` ioctl will return the
+``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
userfaults on the range registered. Not all ioctls will necessarily be
supported for all memory types depending on the underlying virtual
memory backend (anonymous memory vs tmpfs vs real filebacked
mappings).
-Userland can use the uffdio_register.ioctls to manage the virtual
+Userland can use the ``uffdio_register.ioctls`` to manage the virtual
address space in the background (to add or potentially also remove
-memory from the userfaultfd registered range). This means a userfault
+memory from the ``userfaultfd`` registered range). This means a userfault
could be triggering just before userland maps in the background the
user-faulted page.
-The primary ioctl to resolve userfaults is UFFDIO_COPY. That
+The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults (unless uffdio_copy.mode &
-UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
-UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
-half copied page since it'll keep userfaulting until the copy has
-finished.
+up the blocked userfaults
+(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
+Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
+guaranteeing that nothing can see an half copied page since it'll
+keep userfaulting until the copy has finished.
Notes:
-- If you requested UFFDIO_REGISTER_MODE_MISSING when registering then
+- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
you must provide some kind of page in your thread after reading from
- the uffd. You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE.
+ the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
The normal behavior of the OS automatically providing a zero page on
an annonymous mmaping is not in place.
@@ -122,13 +122,13 @@ Notes:
- You get the address of the access that triggered the missing page
event out of a struct uffd_msg that you read in the thread from the
- uffd. You can supply as many pages as you want with UFFDIO_COPY or
- UFFDIO_ZEROPAGE. Keep in mind that unless you used DONTWAKE then
+ uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
+ ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
the first of any of those IOCTLs wakes up the faulting thread.
-- Be sure to test for all errors including (pollfd[0].revents &
- POLLERR). This can happen, e.g. when ranges supplied were
- incorrect.
+- Be sure to test for all errors including
+ (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
+ supplied were incorrect.
Write Protect Notifications
---------------------------
@@ -136,41 +136,42 @@ Write Protect Notifications
This is equivalent to (but faster than) using mprotect and a SIGSEGV
signal handler.
-Firstly you need to register a range with UFFDIO_REGISTER_MODE_WP.
-Instead of using mprotect(2) you use ioctl(uffd, UFFDIO_WRITEPROTECT,
-struct *uffdio_writeprotect) while mode = UFFDIO_WRITEPROTECT_MODE_WP
+Firstly you need to register a range with ``UFFDIO_REGISTER_MODE_WP``.
+Instead of using mprotect(2) you use
+``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)``
+while ``mode = UFFDIO_WRITEPROTECT_MODE_WP``
in the struct passed in. The range does not default to and does not
have to be identical to the range you registered with. You can write
protect as many ranges as you like (inside the registered range).
Then, in the thread reading from uffd the struct will have
-msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send
-ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect) again
-while pagefault.mode does not have UFFDIO_WRITEPROTECT_MODE_WP set.
-This wakes up the thread which will continue to run with writes. This
+``msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP`` set. Now you send
+``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)``
+again while ``pagefault.mode`` does not have ``UFFDIO_WRITEPROTECT_MODE_WP``
+set. This wakes up the thread which will continue to run with writes. This
allows you to do the bookkeeping about the write in the uffd reading
thread before the ioctl.
-If you registered with both UFFDIO_REGISTER_MODE_MISSING and
-UFFDIO_REGISTER_MODE_WP then you need to think about the sequence in
+If you registered with both ``UFFDIO_REGISTER_MODE_MISSING`` and
+``UFFDIO_REGISTER_MODE_WP`` then you need to think about the sequence in
which you supply a page and undo write protect. Note that there is a
difference between writes into a WP area and into a !WP area. The
-former will have UFFD_PAGEFAULT_FLAG_WP set, the latter
-UFFD_PAGEFAULT_FLAG_WRITE. The latter did not fail on protection but
-you still need to supply a page when UFFDIO_REGISTER_MODE_MISSING was
+former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
+``UFFD_PAGEFAULT_FLAG_WRITE``. The latter did not fail on protection but
+you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
used.
QEMU/KVM
========
-QEMU/KVM is using the userfaultfd syscall to implement postcopy live
+QEMU/KVM is using the ``userfaultfd`` syscall to implement postcopy live
migration. Postcopy live migration is one form of memory
externalization consisting of a virtual machine running with part or
all of its memory residing on a different node in the cloud. The
-userfaultfd abstraction is generic enough that not a single line of
+``userfaultfd`` abstraction is generic enough that not a single line of
KVM kernel code had to be modified in order to add postcopy live
migration to QEMU.
-Guest async page faults, FOLL_NOWAIT and all other GUP features work
+Guest async page faults, ``FOLL_NOWAIT`` and all other ``GUP*`` features work
just fine in combination with userfaults. Userfaults trigger async
page faults in the guest scheduler so those guest processes that
aren't waiting for userfaults (i.e. network bound) can keep running in
@@ -183,19 +184,19 @@ generating userfaults for readonly guest regions.
The implementation of postcopy live migration currently uses one
single bidirectional socket but in the future two different sockets
will be used (to reduce the latency of the userfaults to the minimum
-possible without having to decrease /proc/sys/net/ipv4/tcp_wmem).
+possible without having to decrease ``/proc/sys/net/ipv4/tcp_wmem``).
The QEMU in the source node writes all pages that it knows are missing
in the destination node, into the socket, and the migration thread of
-the QEMU running in the destination node runs UFFDIO_COPY|ZEROPAGE
-ioctls on the userfaultfd in order to map the received pages into the
-guest (UFFDIO_ZEROCOPY is used if the source page was a zero page).
+the QEMU running in the destination node runs ``UFFDIO_COPY|ZEROPAGE``
+ioctls on the ``userfaultfd`` in order to map the received pages into the
+guest (``UFFDIO_ZEROCOPY`` is used if the source page was a zero page).
A different postcopy thread in the destination node listens with
-poll() to the userfaultfd in parallel. When a POLLIN event is
+poll() to the ``userfaultfd`` in parallel. When a ``POLLIN`` event is
generated after a userfault triggers, the postcopy thread read() from
-the userfaultfd and receives the fault address (or -EAGAIN in case the
-userfault was already resolved and waken by a UFFDIO_COPY|ZEROPAGE run
+the ``userfaultfd`` and receives the fault address (or ``-EAGAIN`` in case the
+userfault was already resolved and waken by a ``UFFDIO_COPY|ZEROPAGE`` run
by the parallel QEMU migration thread).
After the QEMU postcopy thread (running in the destination node) gets
@@ -206,7 +207,7 @@ remaining missing pages from that new page offset. Soon after that
(just the time to flush the tcp_wmem queue through the network) the
migration thread in the QEMU running in the destination node will
receive the page that triggered the userfault and it'll map it as
-usual with the UFFDIO_COPY|ZEROPAGE (without actually knowing if it
+usual with the ``UFFDIO_COPY|ZEROPAGE`` (without actually knowing if it
was spontaneously sent by the source or if it was an urgent page
requested through a userfault).
@@ -219,74 +220,74 @@ checked to find which missing pages to send in round robin and we seek
over it when receiving incoming userfaults. After sending each page of
course the bitmap is updated accordingly. It's also useful to avoid
sending the same page twice (in case the userfault is read by the
-postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration
+postcopy thread just before ``UFFDIO_COPY|ZEROPAGE`` runs in the migration
thread).
Non-cooperative userfaultfd
===========================
-When the userfaultfd is monitored by an external manager, the manager
+When the ``userfaultfd`` is monitored by an external manager, the manager
must be able to track changes in the process virtual memory
layout. Userfaultfd can notify the manager about such changes using
the same read(2) protocol as for the page fault notifications. The
manager has to explicitly enable these events by setting appropriate
-bits in uffdio_api.features passed to UFFDIO_API ioctl:
+bits in ``uffdio_api.features`` passed to ``UFFDIO_API`` ioctl:
-UFFD_FEATURE_EVENT_FORK
- enable userfaultfd hooks for fork(). When this feature is
- enabled, the userfaultfd context of the parent process is
+``UFFD_FEATURE_EVENT_FORK``
+ enable ``userfaultfd`` hooks for fork(). When this feature is
+ enabled, the ``userfaultfd`` context of the parent process is
duplicated into the newly created process. The manager
- receives UFFD_EVENT_FORK with file descriptor of the new
- userfaultfd context in the uffd_msg.fork.
+ receives ``UFFD_EVENT_FORK`` with file descriptor of the new
+ ``userfaultfd`` context in the ``uffd_msg.fork``.
-UFFD_FEATURE_EVENT_REMAP
+``UFFD_FEATURE_EVENT_REMAP``
enable notifications about mremap() calls. When the
non-cooperative process moves a virtual memory area to a
different location, the manager will receive
- UFFD_EVENT_REMAP. The uffd_msg.remap will contain the old and
+ ``UFFD_EVENT_REMAP``. The ``uffd_msg.remap`` will contain the old and
new addresses of the area and its original length.
-UFFD_FEATURE_EVENT_REMOVE
+``UFFD_FEATURE_EVENT_REMOVE``
enable notifications about madvise(MADV_REMOVE) and
- madvise(MADV_DONTNEED) calls. The event UFFD_EVENT_REMOVE will
- be generated upon these calls to madvise. The uffd_msg.remove
+ madvise(MADV_DONTNEED) calls. The event ``UFFD_EVENT_REMOVE`` will
+ be generated upon these calls to madvise(). The ``uffd_msg.remove``
will contain start and end addresses of the removed area.
-UFFD_FEATURE_EVENT_UNMAP
+``UFFD_FEATURE_EVENT_UNMAP``
enable notifications about memory unmapping. The manager will
- get UFFD_EVENT_UNMAP with uffd_msg.remove containing start and
+ get ``UFFD_EVENT_UNMAP`` with ``uffd_msg.remove`` containing start and
end addresses of the unmapped area.
-Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP
+Although the ``UFFD_FEATURE_EVENT_REMOVE`` and ``UFFD_FEATURE_EVENT_UNMAP``
are pretty similar, they quite differ in the action expected from the
-userfaultfd manager. In the former case, the virtual memory is
+``userfaultfd`` manager. In the former case, the virtual memory is
removed, but the area is not, the area remains monitored by the
-userfaultfd, and if a page fault occurs in that area it will be
+``userfaultfd``, and if a page fault occurs in that area it will be
delivered to the manager. The proper resolution for such page fault is
to zeromap the faulting address. However, in the latter case, when an
area is unmapped, either explicitly (with munmap() system call), or
implicitly (e.g. during mremap()), the area is removed and in turn the
-userfaultfd context for such area disappears too and the manager will
+``userfaultfd`` context for such area disappears too and the manager will
not get further userland page faults from the removed area. Still, the
notification is required in order to prevent manager from using
-UFFDIO_COPY on the unmapped area.
+``UFFDIO_COPY`` on the unmapped area.
Unlike userland page faults which have to be synchronous and require
explicit or implicit wakeup, all the events are delivered
asynchronously and the non-cooperative process resumes execution as
-soon as manager executes read(). The userfaultfd manager should
-carefully synchronize calls to UFFDIO_COPY with the events
-processing. To aid the synchronization, the UFFDIO_COPY ioctl will
-return -ENOSPC when the monitored process exits at the time of
-UFFDIO_COPY, and -ENOENT, when the non-cooperative process has changed
-its virtual memory layout simultaneously with outstanding UFFDIO_COPY
+soon as manager executes read(). The ``userfaultfd`` manager should
+carefully synchronize calls to ``UFFDIO_COPY`` with the events
+processing. To aid the synchronization, the ``UFFDIO_COPY`` ioctl will
+return ``-ENOSPC`` when the monitored process exits at the time of
+``UFFDIO_COPY``, and ``-ENOENT``, when the non-cooperative process has changed
+its virtual memory layout simultaneously with outstanding ``UFFDIO_COPY``
operation.
The current asynchronous model of the event delivery is optimal for
-single threaded non-cooperative userfaultfd manager implementations. A
+single threaded non-cooperative ``userfaultfd`` manager implementations. A
synchronous event delivery model can be added later as a new
-userfaultfd feature to facilitate multithreading enhancements of the
-non cooperative manager, for example to allow UFFDIO_COPY ioctls to
+``userfaultfd`` feature to facilitate multithreading enhancements of the
+non cooperative manager, for example to allow ``UFFDIO_COPY`` ioctls to
run in parallel to the event reception. Single threaded
implementations should continue to use the current async event
delivery model instead.
diff --git a/Documentation/admin-guide/nfs/nfsroot.rst b/Documentation/admin-guide/nfs/nfsroot.rst
index 82a4fda057f9..c6772075c80c 100644
--- a/Documentation/admin-guide/nfs/nfsroot.rst
+++ b/Documentation/admin-guide/nfs/nfsroot.rst
@@ -18,7 +18,7 @@ Mounting the root filesystem via NFS (nfsroot)
In order to use a diskless system, such as an X-terminal or printer server for
example, it is necessary for the root filesystem to be present on a non-disk
device. This may be an initramfs (see
-Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see
+Documentation/filesystems/ramfs-rootfs-initramfs.rst), a ramdisk (see
Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The
following text describes on how to use NFS for the root filesystem. For the rest
of this text 'client' means the diskless system, and 'server' means the NFS
diff --git a/Documentation/admin-guide/numastat.rst b/Documentation/admin-guide/numastat.rst
index aaf1667489f8..08ec2c2bdce3 100644
--- a/Documentation/admin-guide/numastat.rst
+++ b/Documentation/admin-guide/numastat.rst
@@ -6,6 +6,21 @@ Numa policy hit/miss statistics
All units are pages. Hugepages have separate counters.
+The numa_hit, numa_miss and numa_foreign counters reflect how well processes
+are able to allocate memory from nodes they prefer. If they succeed, numa_hit
+is incremented on the preferred node, otherwise numa_foreign is incremented on
+the preferred node and numa_miss on the node where allocation succeeded.
+
+Usually preferred node is the one local to the CPU where the process executes,
+but restrictions such as mempolicies can change that, so there are also two
+counters based on CPU local node. local_node is similar to numa_hit and is
+incremented on allocation from a node by CPU on the same node. other_node is
+similar to numa_miss and is incremented on the node where allocation succeeds
+from a CPU from a different node. Note there is no counter analogical to
+numa_foreign.
+
+In more detail:
+
=============== ============================================================
numa_hit A process wanted to allocate memory from this node,
and succeeded.
@@ -14,11 +29,13 @@ numa_miss A process wanted to allocate memory from another node,
but ended up with memory from this node.
numa_foreign A process wanted to allocate on this node,
- but ended up with memory from another one.
+ but ended up with memory from another node.
-local_node A process ran on this node and got memory from it.
+local_node A process ran on this node's CPU,
+ and got memory from this node.
-other_node A process ran on this node and got memory from another node.
+other_node A process ran on a different node's CPU
+ and got memory from this node.
interleave_hit Interleaving wanted to allocate from this node
and succeeded.
@@ -28,3 +45,11 @@ For easier reading you can use the numastat utility from the numactl package
(http://oss.sgi.com/projects/libnuma/). Note that it only works
well right now on machines with a small number of CPUs.
+Note that on systems with memoryless nodes (where a node has CPUs but no
+memory) the numa_hit, numa_miss and numa_foreign statistics can be skewed
+heavily. In the current kernel implementation, if a process prefers a
+memoryless node (i.e. because it is running on one of its local CPU), the
+implementation actually treats one of the nearest nodes with memory as the
+preferred node. As a result, such allocation will not increase the numa_foreign
+counter on the memoryless node, and will skew the numa_hit, numa_miss and
+numa_foreign statistics of the nearest node.
diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst
index 72effa7c23b9..1307b5274a0f 100644
--- a/Documentation/admin-guide/perf-security.rst
+++ b/Documentation/admin-guide/perf-security.rst
@@ -1,6 +1,6 @@
.. _perf_security:
-Perf Events and tool security
+Perf events and tool security
=============================
Overview
@@ -42,11 +42,11 @@ categories:
Data that belong to the fourth category can potentially contain
sensitive process data. If PMUs in some monitoring modes capture values
of execution context registers or data from process memory then access
-to such monitoring capabilities requires to be ordered and secured
-properly. So, perf_events/Perf performance monitoring is the subject for
-security access control management [5]_ .
+to such monitoring modes requires to be ordered and secured properly.
+So, perf_events performance monitoring and observability operations are
+the subject for security access control management [5]_ .
-perf_events/Perf access control
+perf_events access control
-------------------------------
To perform security checks, the Linux implementation splits processes
@@ -66,11 +66,25 @@ into distinct units, known as capabilities [6]_ , which can be
independently enabled and disabled on per-thread basis for processes and
files of unprivileged users.
-Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated
+Unprivileged processes with enabled CAP_PERFMON capability are treated
as privileged processes with respect to perf_events performance
-monitoring and bypass *scope* permissions checks in the kernel.
-
-Unprivileged processes using perf_events system call API is also subject
+monitoring and observability operations, thus, bypass *scope* permissions
+checks in the kernel. CAP_PERFMON implements the principle of least
+privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance monitoring and
+observability operations in the kernel and provides a secure approach to
+perfomance monitoring and observability in the system.
+
+For backward compatibility reasons the access to perf_events monitoring and
+observability operations is also open for CAP_SYS_ADMIN privileged
+processes but CAP_SYS_ADMIN usage for secure monitoring and observability
+use cases is discouraged with respect to the CAP_PERFMON capability.
+If system audit records [14]_ for a process using perf_events system call
+API contain denial records of acquiring both CAP_PERFMON and CAP_SYS_ADMIN
+capabilities then providing the process with CAP_PERFMON capability singly
+is recommended as the preferred secure approach to resolve double access
+denial logging related to usage of performance monitoring and observability.
+
+Unprivileged processes using perf_events system call are also subject
for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose
outcome determines whether monitoring is permitted. So unprivileged
processes provided with CAP_SYS_PTRACE capability are effectively
@@ -82,14 +96,14 @@ performance analysis of monitored processes or a system. For example,
CAP_SYSLOG capability permits reading kernel space memory addresses from
/proc/kallsyms file.
-perf_events/Perf privileged users
+Privileged Perf users groups
---------------------------------
Mechanisms of capabilities, privileged capability-dumb files [6]_ and
-file system ACLs [10]_ can be used to create a dedicated group of
-perf_events/Perf privileged users who are permitted to execute
-performance monitoring without scope limits. The following steps can be
-taken to create such a group of privileged Perf users.
+file system ACLs [10]_ can be used to create dedicated groups of
+privileged Perf users who are permitted to execute performance monitoring
+and observability without scope limits. The following steps can be
+taken to create such groups of privileged Perf users.
1. Create perf_users group of privileged Perf users, assign perf_users
group to Perf tool executable and limit access to the executable for
@@ -108,30 +122,51 @@ taken to create such a group of privileged Perf users.
-rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf
2. Assign the required capabilities to the Perf tool executable file and
- enable members of perf_users group with performance monitoring
+ enable members of perf_users group with monitoring and observability
privileges [6]_ :
::
- # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
- # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
+ # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
+ # setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
perf: OK
# getcap perf
- perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep
+ perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
+
+If the libcap installed doesn't yet support "cap_perfmon", use "38" instead,
+i.e.:
+
+::
+
+ # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
+
+Note that you may need to have 'cap_ipc_lock' in the mix for tools such as
+'perf top', alternatively use 'perf top -m N', to reduce the memory that
+it uses for the perf ring buffer, see the memory allocation section below.
+
+Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38,
+CAP_EFFECTIVE, &val) fail, which will lead the default event to be 'cycles:u',
+so as a workaround explicitly ask for the 'cycles' event, i.e.:
+
+::
+
+ # perf top -e cycles
+
+To get kernel and user samples with a perf binary with just CAP_PERFMON.
As a result, members of perf_users group are capable of conducting
-performance monitoring by using functionality of the configured Perf
-tool executable that, when executes, passes perf_events subsystem scope
-checks.
+performance monitoring and observability by using functionality of the
+configured Perf tool executable that, when executes, passes perf_events
+subsystem scope checks.
This specific access control management is only available to superuser
or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_
capabilities.
-perf_events/Perf unprivileged users
+Unprivileged users
-----------------------------------
-perf_events/Perf *scope* and *access* control for unprivileged processes
+perf_events *scope* and *access* control for unprivileged processes
is governed by perf_event_paranoid [2]_ setting:
-1:
@@ -166,7 +201,7 @@ is governed by perf_event_paranoid [2]_ setting:
perf_event_mlock_kb locking limit is imposed but ignored for
unprivileged processes with CAP_IPC_LOCK capability.
-perf_events/Perf resource control
+Resource control
---------------------------------
Open file descriptors
@@ -227,4 +262,5 @@ Bibliography
.. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
.. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
.. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_
-
+.. [13] `<https://sites.google.com/site/fullycapable>`_
+.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_
diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index 5605cc6f9560..a96a423e3779 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -159,17 +159,15 @@ governor uses that information depends on what algorithm is implemented by it
and that is the primary reason for having more than one governor in the
``CPUIdle`` subsystem.
-There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
-and ``ladder``. Which of them is used by default depends on the configuration
-of the kernel and in particular on whether or not the scheduler tick can be
-`stopped by the idle loop <idle-cpus-and-tick_>`_. It is possible to change the
-governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
-been passed to the kernel, but that is not safe in general, so it should not be
-done on production systems (that may change in the future, though). The name of
-the ``CPUIdle`` governor currently used by the kernel can be read from the
-:file:`current_governor_ro` (or :file:`current_governor` if
-``cpuidle_sysfs_switch`` is present in the kernel command line) file under
-:file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
+There are four ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_,
+``ladder`` and ``haltpoll``. Which of them is used by default depends on the
+configuration of the kernel and in particular on whether or not the scheduler
+tick can be `stopped by the idle loop <idle-cpus-and-tick_>`_. Available
+governors can be read from the :file:`available_governors`, and the governor
+can be changed at runtime. The name of the ``CPUIdle`` governor currently
+used by the kernel can be read from the :file:`current_governor_ro` or
+:file:`current_governor` file under :file:`/sys/devices/system/cpu/cpuidle/`
+in ``sysfs``.
Which ``CPUIdle`` driver is used, on the other hand, usually depends on the
platform the kernel is running on, but there are platforms with more than one
diff --git a/Documentation/admin-guide/pm/intel-speed-select.rst b/Documentation/admin-guide/pm/intel-speed-select.rst
new file mode 100644
index 000000000000..b2ca601c21c6
--- /dev/null
+++ b/Documentation/admin-guide/pm/intel-speed-select.rst
@@ -0,0 +1,917 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================================
+Intel(R) Speed Select Technology User Guide
+============================================================
+
+The Intel(R) Speed Select Technology (Intel(R) SST) provides a powerful new
+collection of features that give more granular control over CPU performance.
+With Intel(R) SST, one server can be configured for power and performance for a
+variety of diverse workload requirements.
+
+Refer to the links below for an overview of the technology:
+
+- https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html
+- https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf
+
+These capabilities are further enhanced in some of the newer generations of
+server platforms where these features can be enumerated and controlled
+dynamically without pre-configuring via BIOS setup options. This dynamic
+configuration is done via mailbox commands to the hardware. One way to enumerate
+and configure these features is by using the Intel Speed Select utility.
+
+This document explains how to use the Intel Speed Select tool to enumerate and
+control Intel(R) SST features. This document gives example commands and explains
+how these commands change the power and performance profile of the system under
+test. Using this tool as an example, customers can replicate the messaging
+implemented in the tool in their production software.
+
+intel-speed-select configuration tool
+======================================
+
+Most Linux distribution packages may include the "intel-speed-select" tool. If not,
+it can be built by downloading the Linux kernel tree from kernel.org. Once
+downloaded, the tool can be built without building the full kernel.
+
+From the kernel tree, run the following commands::
+
+# cd tools/power/x86/intel-speed-select/
+# make
+# make install
+
+Getting Help
+------------
+
+To get help with the tool, execute the command below::
+
+# intel-speed-select --help
+
+The top-level help describes arguments and features. Notice that there is a
+multi-level help structure in the tool. For example, to get help for the feature "perf-profile"::
+
+# intel-speed-select perf-profile --help
+
+To get help on a command, another level of help is provided. For example for the command info "info"::
+
+# intel-speed-select perf-profile info --help
+
+Summary of platform capability
+------------------------------
+To check the current platform and driver capaibilities, execute::
+
+#intel-speed-select --info
+
+For example on a test system::
+
+ # intel-speed-select --info
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Platform: API version : 1
+ Platform: Driver version : 1
+ Platform: mbox supported : 1
+ Platform: mmio supported : 1
+ Intel(R) SST-PP (feature perf-profile) is supported
+ TDP level change control is unlocked, max level: 4
+ Intel(R) SST-TF (feature turbo-freq) is supported
+ Intel(R) SST-BF (feature base-freq) is not supported
+ Intel(R) SST-CP (feature core-power) is supported
+
+Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
+------------------------------------------------------------------------
+
+This feature allows configuration of a server dynamically based on workload
+performance requirements. This helps users during deployment as they do not have
+to choose a specific server configuration statically. This Intel(R) Speed Select
+Technology - Performance Profile (Intel(R) SST-PP) feature introduces a mechanism
+that allows multiple optimized performance profiles per system. Each profile
+defines a set of CPUs that need to be online and rest offline to sustain a
+guaranteed base frequency. Once the user issues a command to use a specific
+performance profile and meet CPU online/offline requirement, the user can expect
+a change in the base frequency dynamically. This feature is called
+"perf-profile" when using the Intel Speed Select tool.
+
+Number or performance levels
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There can be multiple performance profiles on a system. To get the number of
+profiles, execute the command below::
+
+ # intel-speed-select perf-profile get-config-levels
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-config-levels:4
+ package-1
+ die-0
+ cpu-14
+ get-config-levels:4
+
+On this system under test, there are 4 performance profiles in addition to the
+base performance profile (which is performance level 0).
+
+Lock/Unlock status
+~~~~~~~~~~~~~~~~~~
+
+Even if there are multiple performance profiles, it is possible that that they
+are locked. If they are locked, users cannot issue a command to change the
+performance state. It is possible that there is a BIOS setup to unlock or check
+with your system vendor.
+
+To check if the system is locked, execute the following command::
+
+ # intel-speed-select perf-profile get-lock-status
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-lock-status:0
+ package-1
+ die-0
+ cpu-14
+ get-lock-status:0
+
+In this case, lock status is 0, which means that the system is unlocked.
+
+Properties of a performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get properties of a specific performance level (For example for the level 0, below), execute the command below::
+
+ # intel-speed-select perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ cpu-count:28
+ enable-cpu-mask:000003ff,f0003fff
+ enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,28,29,30,31,32,33,34,35,36,37,38,39,40,41
+ thermal-design-power-ratio:26
+ base-frequency(MHz):2600
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:disabled
+ ...
+ ...
+
+Here -l option is used to specify a performance level.
+
+If the option -l is omitted, then this command will print information about all
+the performance levels. The above command is printing properties of the
+performance level 0.
+
+For this performance profile, the list of CPUs displayed by the
+"enable-cpu-mask/enable-cpu-list" at the max can be "online." When that
+condition is met, then base frequency of 2600 MHz can be maintained. To
+understand more, execute "intel-speed-select perf-profile info" for performance
+level 4::
+
+ # intel-speed-select perf-profile info -l 4
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-4
+ cpu-count:28
+ enable-cpu-mask:000000fa,f0000faf
+ enable-cpu-list:0,1,2,3,5,7,8,9,10,11,28,29,30,31,33,35,36,37,38,39
+ thermal-design-power-ratio:28
+ base-frequency(MHz):2800
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:unsupported
+ ...
+ ...
+
+There are fewer CPUs in the "enable-cpu-mask/enable-cpu-list". Consequently, if
+the user only keeps these CPUs online and the rest "offline," then the base
+frequency is increased to 2.8 GHz compared to 2.6 GHz at performance level 0.
+
+Get current performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the current performance level, execute::
+
+ # intel-speed-select perf-profile get-config-current-level
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-config-current_level:0
+
+First verify that the base_frequency displayed by the cpufreq sysfs is correct::
+
+ # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
+ 2600000
+
+This matches the base-frequency (MHz) field value displayed from the
+"perf-profile info" command for performance level 0(cpufreq frequency is in
+KHz).
+
+To check if the average frequency is equal to the base frequency for a 100% busy
+workload, disable turbo::
+
+# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
+
+Then runs a busy workload on all CPUs, for example::
+
+#stress -c 64
+
+To verify the base frequency, run turbostat::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+
+ Package Core CPU Bzy_MHz
+ - - 2600
+ 0 0 0 2600
+ 0 1 1 2600
+ 0 2 2 2600
+ 0 3 3 2600
+ 0 4 4 2600
+ . . . .
+
+
+Changing performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To the change the performance level to 4, execute::
+
+ # intel-speed-select -d perf-profile set-config-level -l 4 -o
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile
+ set_tdp_level:success
+
+In the command above, "-o" is optional. If it is specified, then it will also
+offline CPUs which are not present in the enable_cpu_mask for this performance
+level.
+
+Now if the base_frequency is checked::
+
+ #cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
+ 2800000
+
+Which shows that the base frequency now increased from 2600 MHz at performance
+level 0 to 2800 MHz at performance level 4. As a result, any workload, which can
+use fewer CPUs, can see a boost of 200 MHz compared to performance level 0.
+
+Check presence of other Intel(R) SST features
+---------------------------------------------
+
+Each of the performance profiles also specifies weather there is support of
+other two Intel(R) SST features (Intel(R) Speed Select Technology - Base Frequency
+(Intel(R) SST-BF) and Intel(R) Speed Select Technology - Turbo Frequency (Intel
+SST-TF)).
+
+For example, from the output of "perf-profile info" above, for level 0 and level
+4:
+
+For level 0::
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:disabled
+
+For level 4::
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:unsupported
+
+Given these results, the "speed-select-base-freq" (Intel(R) SST-BF) in level 4
+changed from "disabled" to "unsupported" compared to performance level 0.
+
+This means that at performance level 4, the "speed-select-base-freq" feature is
+not supported. However, at performance level 0, this feature is "supported", but
+currently "disabled", meaning the user has not activated this feature. Whereas
+"speed-select-turbo-freq" (Intel(R) SST-TF) is supported at both performance
+levels, but currently not activated by the user.
+
+The Intel(R) SST-BF and the Intel(R) SST-TF features are built on a foundation
+technology called Intel(R) Speed Select Technology - Core Power (Intel(R) SST-CP).
+The platform firmware enables this feature when Intel(R) SST-BF or Intel(R) SST-TF
+is supported on a platform.
+
+Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP)
+---------------------------------------------------------------
+
+Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) is an interface that
+allows users to define per core priority. This defines a mechanism to distribute
+power among cores when there is a power constrained scenario. This defines a
+class of service (CLOS) configuration.
+
+The user can configure up to 4 class of service configurations. Each CLOS group
+configuration allows definitions of parameters, which affects how the frequency
+can be limited and power is distributed. Each CPU core can be tied to a class of
+service and hence an associated priority. The granularity is at core level not
+at per CPU level.
+
+Enable CLOS based prioritization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To use CLOS based prioritization feature, firmware must be informed to enable
+and use a priority type. There is a default per platform priority type, which
+can be changed with optional command line parameter.
+
+To enable and check the options, execute::
+
+ # intel-speed-select core-power enable --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Enable core-power for a package/die
+ Clos Enable: Specify priority type with [--priority|-p]
+ 0: Proportional, 1: Ordered
+
+There are two types of priority types:
+
+- Ordered
+
+Priority for ordered throttling is defined based on the index of the assigned
+CLOS group. Where CLOS0 gets highest priority (throttled last).
+
+Priority order is:
+CLOS0 > CLOS1 > CLOS2 > CLOS3.
+
+- Proportional
+
+When proportional priority is used, there is an additional parameter called
+frequency_weight, which can be specified per CLOS group. The goal of
+proportional priority is to provide each core with the requested min., then
+distribute all remaining (excess/deficit) budgets in proportion to a defined
+weight. This proportional priority can be configured using "core-power config"
+command.
+
+To enable with the platform default priority type, execute::
+
+ # intel-speed-select core-power enable
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ enable:success
+ package-1
+ die-0
+ cpu-6
+ core-power
+ enable:success
+
+The scope of this enable is per package or die scoped when a package contains
+multiple dies. To check if CLOS is enabled and get priority type, "core-power
+info" command can be used. For example to check the status of core-power feature
+on CPU 0, execute::
+
+ # intel-speed-select -c 0 core-power info
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ support-status:supported
+ enable-status:enabled
+ clos-enable-status:enabled
+ priority-type:proportional
+ package-1
+ die-0
+ cpu-24
+ core-power
+ support-status:supported
+ enable-status:enabled
+ clos-enable-status:enabled
+ priority-type:proportional
+
+Configuring CLOS groups
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Each CLOS group has its own attributes including min, max, freq_weight and
+desired. These parameters can be configured with "core-power config" command.
+Defaults will be used if user skips setting a parameter except clos id, which is
+mandatory. To check core-power config options, execute::
+
+ # intel-speed-select core-power config --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Set core-power configuration for one of the four clos ids
+ Specify targeted clos id with [--clos|-c]
+ Specify clos Proportional Priority [--weight|-w]
+ Specify clos min in MHz with [--min|-n]
+ Specify clos max in MHz with [--max|-m]
+
+For example::
+
+ # intel-speed-select core-power config -c 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ clos epp is not specified, default: 0
+ clos frequency weight is not specified, default: 0
+ clos min is not specified, default: 0 MHz
+ clos max is not specified, default: 25500 MHz
+ clos desired is not specified, default: 0
+ package-0
+ die-0
+ cpu-0
+ core-power
+ config:success
+ package-1
+ die-0
+ cpu-6
+ core-power
+ config:success
+
+The user has the option to change defaults. For example, the user can change the
+"min" and set the base frequency to always get guaranteed base frequency.
+
+Get the current CLOS configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To check the current configuration, "core-power get-config" can be used. For
+example, to get the configuration of CLOS 0::
+
+ # intel-speed-select core-power get-config -c 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ clos:0
+ epp:0
+ clos-proportional-priority:0
+ clos-min:0 MHz
+ clos-max:Max Turbo frequency
+ clos-desired:0 MHz
+ package-1
+ die-0
+ cpu-24
+ core-power
+ clos:0
+ epp:0
+ clos-proportional-priority:0
+ clos-min:0 MHz
+ clos-max:Max Turbo frequency
+ clos-desired:0 MHz
+
+Associating a CPU with a CLOS group
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To associate a CPU to a CLOS group "core-power assoc" command can be used::
+
+ # intel-speed-select core-power assoc --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Associate a clos id to a CPU
+ Specify targeted clos id with [--clos|-c]
+
+
+For example to associate CPU 10 to CLOS group 3, execute::
+
+ # intel-speed-select -c 10 core-power assoc -c 3
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-10
+ core-power
+ assoc:success
+
+Once a CPU is associated, its sibling CPUs are also associated to a CLOS group.
+Once associated, avoid changing Linux "cpufreq" subsystem scaling frequency
+limits.
+
+To check the existing association for a CPU, "core-power get-assoc" command can
+be used. For example, to get association of CPU 10, execute::
+
+ # intel-speed-select -c 10 core-power get-assoc
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-1
+ die-0
+ cpu-10
+ get-assoc
+ clos:3
+
+This shows that CPU 10 is part of a CLOS group 3.
+
+
+Disable CLOS based prioritization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To disable, execute::
+
+# intel-speed-select core-power disable
+
+Some features like Intel(R) SST-TF can only be enabled when CLOS based prioritization
+is enabled. For this reason, disabling while Intel(R) SST-TF is enabled can cause
+Intel(R) SST-TF to fail. This will cause the "disable" command to display an error
+if Intel(R) SST-TF is already enabled. In turn, to disable, the Intel(R) SST-TF
+feature must be disabled first.
+
+Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF)
+-------------------------------------------------------------------
+
+The Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) feature lets
+the user control base frequency. If some critical workload threads demand
+constant high guaranteed performance, then this feature can be used to execute
+the thread at higher base frequency on specific sets of CPUs (high priority
+CPUs) at the cost of lower base frequency (low priority CPUs) on other CPUs.
+This feature does not require offline of the low priority CPUs.
+
+The support of Intel(R) SST-BF depends on the Intel(R) Speed Select Technology -
+Performance Profile (Intel(R) SST-PP) performance level configuration. It is
+possible that only certain performance levels support Intel(R) SST-BF. It is also
+possible that only base performance level (level = 0) has support of Intel
+SST-BF. Consequently, first select the desired performance level to enable this
+feature.
+
+In the system under test here, Intel(R) SST-BF is supported at the base
+performance level 0, but currently disabled. For example for the level 0::
+
+ # intel-speed-select -c 0 perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+
+ speed-select-base-freq:disabled
+ ...
+
+Before enabling Intel(R) SST-BF and measuring its impact on a workload
+performance, execute some workload and measure performance and get a baseline
+performance to compare against.
+
+Here the user wants more guaranteed performance. For this reason, it is likely
+that turbo is disabled. To disable turbo, execute::
+
+#echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
+
+Based on the output of the "intel-speed-select perf-profile info -l 0" base
+frequency of guaranteed frequency 2600 MHz.
+
+
+Measure baseline performance for comparison
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To compare, pick a multi-threaded workload where each thread can be scheduled on
+separate CPUs. "Hackbench pipe" test is a good example on how to improve
+performance using Intel(R) SST-BF.
+
+Below, the workload is measuring average scheduler wakeup latency, so a lower
+number means better performance::
+
+ # taskset -c 3,4 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 6.102 [sec]
+ 6.102445 usecs/op
+ 163868 ops/sec
+
+While running the above test, if we take turbostat output, it will show us that
+2 of the CPUs are busy and reaching max. frequency (which would be the base
+frequency as the turbo is disabled). The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 1000
+ 0 1 1 1005
+ 0 2 2 1000
+ 0 3 3 2600
+ 0 4 4 2600
+ 0 5 5 1000
+ 0 6 6 1000
+ 0 7 7 1005
+ 0 8 8 1005
+ 0 9 9 1000
+ 0 10 10 1000
+ 0 11 11 995
+ 0 12 12 1000
+ 0 13 13 1000
+
+From the above turbostat output, both CPU 3 and 4 are very busy and reaching
+full guaranteed frequency of 2600 MHz.
+
+Intel(R) SST-BF Capabilities
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get capabilities of Intel(R) SST-BF for the current performance level 0,
+execute::
+
+ # intel-speed-select base-freq info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ speed-select-base-freq
+ high-priority-base-frequency(MHz):3000
+ high-priority-cpu-mask:00000216,00002160
+ high-priority-cpu-list:5,6,8,13,33,34,36,41
+ low-priority-base-frequency(MHz):2400
+ tjunction-temperature(C):125
+ thermal-design-power(W):205
+
+The above capabilities show that there are some CPUs on this system that can
+offer base frequency of 3000 MHz compared to the standard base frequency at this
+performance levels. Nevertheless, these CPUs are fixed, and they are presented
+via high-priority-cpu-list/high-priority-cpu-mask. But if this Intel(R) SST-BF
+feature is selected, the low priorities CPUs (which are not in
+high-priority-cpu-list) can only offer up to 2400 MHz. As a result, if this
+clipping of low priority CPUs is acceptable, then the user can enable Intel
+SST-BF feature particularly for the above "sched pipe" workload since only two
+CPUs are used, they can be scheduled on high priority CPUs and can get boost of
+400 MHz.
+
+Enable Intel(R) SST-BF
+~~~~~~~~~~~~~~~~~~~~~~
+
+To enable Intel(R) SST-BF feature, execute::
+
+ # intel-speed-select base-freq enable -a
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ base-freq
+ enable:success
+ package-1
+ die-0
+ cpu-14
+ base-freq
+ enable:success
+
+In this case, -a option is optional. This not only enables Intel(R) SST-BF, but it
+also adjusts the priority of cores using Intel(R) Speed Select Technology Core
+Power (Intel(R) SST-CP) features. This option sets the minimum performance of each
+Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP) class to
+maximum performance so that the hardware will give maximum performance possible
+for each CPU.
+
+If -a option is not used, then the following steps are required before enabling
+Intel(R) SST-BF:
+
+- Discover Intel(R) SST-BF and note low and high priority base frequency
+- Note the high prioity CPU list
+- Enable CLOS using core-power feature set
+- Configure CLOS parameters. Use CLOS.min to set to minimum performance
+- Subscribe desired CPUs to CLOS groups
+
+With this configuration, if the same workload is executed by pinning the
+workload to high priority CPUs (CPU 5 and 6 in this case)::
+
+ #taskset -c 5,6 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.627 [sec]
+ 5.627922 usecs/op
+ 177685 ops/sec
+
+This way, by enabling Intel(R) SST-BF, the performance of this benchmark is
+improved (latency reduced) by 7.79%. From the turbostat output, it can be
+observed that the high priority CPUs reached 3000 MHz compared to 2600 MHz.
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 2151
+ 0 1 1 2166
+ 0 2 2 2175
+ 0 3 3 2175
+ 0 4 4 2175
+ 0 5 5 3000
+ 0 6 6 3000
+ 0 7 7 2180
+ 0 8 8 2662
+ 0 9 9 2176
+ 0 10 10 2175
+ 0 11 11 2176
+ 0 12 12 2176
+ 0 13 13 2661
+
+Disable Intel(R) SST-BF
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To disable the Intel(R) SST-BF feature, execute::
+
+# intel-speed-select base-freq disable -a
+
+
+Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
+--------------------------------------------------------------------
+
+This feature enables the ability to set different "All core turbo ratio limits"
+to cores based on the priority. By using this feature, some cores can be
+configured to get higher turbo frequency by designating them as high priority at
+the cost of lower or no turbo frequency on the low priority cores.
+
+For this reason, this feature is only useful when system is busy utilizing all
+CPUs, but the user wants some configurable option to get high performance on
+some CPUs.
+
+The support of Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
+depends on the Intel(R) Speed Select Technology - Performance Profile (Intel
+SST-PP) performance level configuration. It is possible that only a certain
+performance level supports Intel(R) SST-TF. It is also possible that only the base
+performance level (level = 0) has the support of Intel(R) SST-TF. Hence, first
+select the desired performance level to enable this feature.
+
+In the system under test here, Intel(R) SST-TF is supported at the base
+performance level 0, but currently disabled::
+
+ # intel-speed-select -c 0 perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+ ...
+ speed-select-turbo-freq:disabled
+ ...
+ ...
+
+
+To check if performance can be improved using Intel(R) SST-TF feature, get the turbo
+frequency properties with Intel(R) SST-TF enabled and compare to the base turbo
+capability of this system.
+
+Get Base turbo capability
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the base turbo capability of performance level 0, execute::
+
+ # intel-speed-select perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+ ...
+ turbo-ratio-limits-sse
+ bucket-0
+ core-count:2
+ max-turbo-frequency(MHz):3200
+ bucket-1
+ core-count:4
+ max-turbo-frequency(MHz):3100
+ bucket-2
+ core-count:6
+ max-turbo-frequency(MHz):3100
+ bucket-3
+ core-count:8
+ max-turbo-frequency(MHz):3100
+ bucket-4
+ core-count:10
+ max-turbo-frequency(MHz):3100
+ bucket-5
+ core-count:12
+ max-turbo-frequency(MHz):3100
+ bucket-6
+ core-count:14
+ max-turbo-frequency(MHz):3100
+ bucket-7
+ core-count:16
+ max-turbo-frequency(MHz):3100
+
+Based on the data above, when all the CPUS are busy, the max. frequency of 3100
+MHz can be achieved. If there is some busy workload on cpu 0 - 11 (e.g. stress)
+and on CPU 12 and 13, execute "hackbench pipe" workload::
+
+ # taskset -c 12,13 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.705 [sec]
+ 5.705488 usecs/op
+ 175269 ops/sec
+
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 3000
+ 0 1 1 3000
+ 0 2 2 3000
+ 0 3 3 3000
+ 0 4 4 3000
+ 0 5 5 3100
+ 0 6 6 3100
+ 0 7 7 3000
+ 0 8 8 3100
+ 0 9 9 3000
+ 0 10 10 3000
+ 0 11 11 3000
+ 0 12 12 3100
+ 0 13 13 3100
+
+Based on turbostat output, the performance is limited by frequency cap of 3100
+MHz. To check if the hackbench performance can be improved for CPU 12 and CPU
+13, first check the capability of the Intel(R) SST-TF feature for this performance
+level.
+
+Get Intel(R) SST-TF Capability
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the capability, the "turbo-freq info" command can be used::
+
+ # intel-speed-select turbo-freq info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ speed-select-turbo-freq
+ bucket-0
+ high-priority-cores-count:2
+ high-priority-max-frequency(MHz):3200
+ high-priority-max-avx2-frequency(MHz):3200
+ high-priority-max-avx512-frequency(MHz):3100
+ bucket-1
+ high-priority-cores-count:4
+ high-priority-max-frequency(MHz):3100
+ high-priority-max-avx2-frequency(MHz):3000
+ high-priority-max-avx512-frequency(MHz):2900
+ bucket-2
+ high-priority-cores-count:6
+ high-priority-max-frequency(MHz):3100
+ high-priority-max-avx2-frequency(MHz):3000
+ high-priority-max-avx512-frequency(MHz):2900
+ speed-select-turbo-freq-clip-frequencies
+ low-priority-max-frequency(MHz):2600
+ low-priority-max-avx2-frequency(MHz):2400
+ low-priority-max-avx512-frequency(MHz):2100
+
+Based on the output above, there is an Intel(R) SST-TF bucket for which there are
+two high priority cores. If only two high priority cores are set, then max.
+turbo frequency on those cores can be increased to 3200 MHz. This is 100 MHz
+more than the base turbo capability for all cores.
+
+In turn, for the hackbench workload, two CPUs can be set as high priority and
+rest as low priority. One side effect is that once enabled, the low priority
+cores will be clipped to a lower frequency of 2600 MHz.
+
+Enable Intel(R) SST-TF
+~~~~~~~~~~~~~~~~~~~~~~
+
+To enable Intel(R) SST-TF, execute::
+
+ # intel-speed-select -c 12,13 turbo-freq enable -a
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-12
+ turbo-freq
+ enable:success
+ package-0
+ die-0
+ cpu-13
+ turbo-freq
+ enable:success
+ package--1
+ die-0
+ cpu-63
+ turbo-freq --auto
+ enable:success
+
+In this case, the option "-a" is optional. If set, it enables Intel(R) SST-TF
+feature and also sets the CPUs to high and and low priority using Intel Speed
+Select Technology Core Power (Intel(R) SST-CP) features. The CPU numbers passed
+with "-c" arguments are marked as high priority, including its siblings.
+
+If -a option is not used, then the following steps are required before enabling
+Intel(R) SST-TF:
+
+- Discover Intel(R) SST-TF and note buckets of high priority cores and maximum frequency
+
+- Enable CLOS using core-power feature set - Configure CLOS parameters
+
+- Subscribe desired CPUs to CLOS groups making sure that high priority cores are set to the maximum frequency
+
+If the same hackbench workload is executed, schedule hackbench threads on high
+priority CPUs::
+
+ #taskset -c 12,13 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.510 [sec]
+ 5.510165 usecs/op
+ 180826 ops/sec
+
+This improved performance by around 3.3% improvement on a busy system. Here the
+turbostat output will show that the CPU 12 and CPU 13 are getting 100 MHz boost.
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ ...
+ 0 12 12 3200
+ 0 13 13 3200
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index ad392f3aee06..39d80bc29ccd 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -62,9 +62,10 @@ on the capabilities of the processor.
Active Mode
-----------
-This is the default operation mode of ``intel_pstate``. If it works in this
-mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
-policies contains the string "intel_pstate".
+This is the default operation mode of ``intel_pstate`` for processors with
+hardware-managed P-states (HWP) support. If it works in this mode, the
+``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
+contains the string "intel_pstate".
In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection. Those algorithms
@@ -138,12 +139,13 @@ internal P-state selection logic to be less performance-focused.
Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~
-This is the default operation mode for processors that do not support the HWP
-feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
-in the kernel command line. However, in this mode ``intel_pstate`` may refuse
-to work with the given processor if it does not recognize it. [Note that
-``intel_pstate`` will never refuse to work with any processor with the HWP
-feature enabled.]
+This operation mode is optional for processors that do not support the HWP
+feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
+the command line. The active mode is used in those cases if the
+``intel_pstate=active`` argument is passed to the kernel in the command line.
+In this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it. [Note that ``intel_pstate`` will never refuse to work with
+any processor with the HWP feature enabled.]
In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
@@ -188,10 +190,14 @@ is not set.
Passive Mode
------------
-This mode is used if the ``intel_pstate=passive`` argument is passed to the
-kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too).
-Like in the active mode without HWP support, in this mode ``intel_pstate`` may
-refuse to work with the given processor if it does not recognize it.
+This is the default operation mode of ``intel_pstate`` for processors without
+hardware-managed P-states (HWP) support. It is always used if the
+``intel_pstate=passive`` argument is passed to the kernel in the command line
+regardless of whether or not the given processor supports HWP. [Note that the
+``intel_pstate=no_hwp`` setting implies ``intel_pstate=passive`` if it is used
+without ``intel_pstate=active``.] Like in the active mode without HWP support,
+in this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it.
If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst
index 0a38cdf39df1..f40994c422dc 100644
--- a/Documentation/admin-guide/pm/working-state.rst
+++ b/Documentation/admin-guide/pm/working-state.rst
@@ -13,3 +13,4 @@ Working-State Power Management
intel_pstate
cpufreq_drivers
intel_epb
+ intel-speed-select
diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
new file mode 100644
index 000000000000..296d5027787a
--- /dev/null
+++ b/Documentation/admin-guide/pstore-blk.rst
@@ -0,0 +1,243 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+pstore block oops/panic logger
+==============================
+
+Introduction
+------------
+
+pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
+block device and non-block device before the system crashes. You can get
+these log files by mounting pstore filesystem like::
+
+ mount -t pstore pstore /sys/fs/pstore
+
+
+pstore block concepts
+---------------------
+
+pstore/blk provides efficient configuration method for pstore/blk, which
+divides all configurations into two parts, configurations for user and
+configurations for driver.
+
+Configurations for user determine how pstore/blk works, such as pmsg_size,
+kmsg_size and so on. All of them support both Kconfig and module parameters,
+but module parameters have priority over Kconfig.
+
+Configurations for driver are all about block device and non-block device,
+such as total_size of block device and read/write operations.
+
+Configurations for user
+-----------------------
+
+All of these configurations support both Kconfig and module parameters, but
+module parameters have priority over Kconfig.
+
+Here is an example for module parameters::
+
+ pstore_blk.blkdev=179:7 pstore_blk.kmsg_size=64
+
+The detail of each configurations may be of interest to you.
+
+blkdev
+~~~~~~
+
+The block device to use. Most of the time, it is a partition of block device.
+It's required for pstore/blk. It is also used for MTD device.
+
+It accepts the following variants for block device:
+
+1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
+ leading 0x, for example b302.
+#. /dev/<disk_name> represents the device number of disk
+#. /dev/<disk_name><decimal> represents the device number of partition - device
+ number of disk plus the partition number
+#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
+ name of partitioned disk ends with a digit.
+#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
+ a partition if the partition table provides it. The UUID may be either an
+ EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
+ where SSSSSSSS is a zero-filled hex representation of the 32-bit
+ "NT disk signature", and PP is a zero-filled hex representation of the
+ 1-based partition number.
+#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
+ partition with a known unique id.
+#. <major>:<minor> major and minor number of the device separated by a colon.
+
+It accepts the following variants for MTD device:
+
+1. <device name> MTD device name. "pstore" is recommended.
+#. <device number> MTD device number.
+
+kmsg_size
+~~~~~~~~~
+
+The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care oops/panic log.
+
+There are multiple chunks for oops/panic front-end depending on the remaining
+space except other pstore front-ends.
+
+pstore/blk will log to oops/panic chunks one by one, and always overwrite the
+oldest chunk if there is no more free chunk.
+
+pmsg_size
+~~~~~~~~~
+
+The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care pmsg log.
+
+Unlike oops/panic front-end, there is only one chunk for pmsg front-end.
+
+Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
+appended to the chunk. On reboot the contents are available in
+*/sys/fs/pstore/pmsg-pstore-blk-0*.
+
+console_size
+~~~~~~~~~~~~
+
+The chunk size in KB for console front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care console log.
+
+Similar to pmsg front-end, there is only one chunk for console front-end.
+
+All log of console will be appended to the chunk. On reboot the contents are
+available in */sys/fs/pstore/console-pstore-blk-0*.
+
+ftrace_size
+~~~~~~~~~~~
+
+The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care console log.
+
+Similar to oops front-end, there are multiple chunks for ftrace front-end
+depending on the count of cpu processors. Each chunk size is equal to
+ftrace_size / processors_count.
+
+All log of ftrace will be appended to the chunk. On reboot the contents are
+combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*.
+
+Persistent function tracing might be useful for debugging software or hardware
+related hangs. Here is an example of usage::
+
+ # mount -t pstore pstore /sys/fs/pstore
+ # mount -t debugfs debugfs /sys/kernel/debug/
+ # echo 1 > /sys/kernel/debug/pstore/record_ftrace
+ # reboot -f
+ [...]
+ # mount -t pstore pstore /sys/fs/pstore
+ # tail /sys/fs/pstore/ftrace-pstore-blk-0
+ CPU:0 ts:5914676 c0063828 c0063b94 call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
+ CPU:0 ts:5914678 c039ecdc c006385c cpuidle_enter_state <- call_cpuidle+0x44/0x48
+ CPU:0 ts:5914680 c039e9a0 c039ecf0 cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314
+ CPU:0 ts:5914681 c0063870 c039ea30 sched_idle_set_state <- cpuidle_enter_state+0x44/0x314
+ CPU:1 ts:5916720 c0160f59 c015ee04 kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204
+ CPU:1 ts:5916721 c05ca625 c015ee0c __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204
+ CPU:1 ts:5916723 c05c813d c05ca630 yield_to <- __mutex_lock_slowpath+0x314/0x358
+ CPU:1 ts:5916724 c05ca2d1 c05ca638 __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358
+
+max_reason
+~~~~~~~~~~
+
+Limiting which kinds of kmsg dumps are stored can be controlled via
+the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's
+``enum kmsg_dump_reason``. For example, to store both Oopses and Panics,
+``max_reason`` should be set to 2 (KMSG_DUMP_OOPS), to store only Panics
+``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0
+(KMSG_DUMP_UNDEF), means the reason filtering will be controlled by the
+``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS,
+otherwise KMSG_DUMP_MAX.
+
+Configurations for driver
+-------------------------
+
+Only a block device driver cares about these configurations. A block device
+driver uses ``register_pstore_blk`` to register to pstore/blk.
+
+.. kernel-doc:: fs/pstore/blk.c
+ :identifiers: register_pstore_blk
+
+A non-block device driver uses ``register_pstore_device`` with
+``struct pstore_device_info`` to register to pstore/blk.
+
+.. kernel-doc:: fs/pstore/blk.c
+ :identifiers: register_pstore_device
+
+.. kernel-doc:: include/linux/pstore_blk.h
+ :identifiers: pstore_device_info
+
+Compression and header
+----------------------
+
+Block device is large enough for uncompressed oops data. Actually we do not
+recommend data compression because pstore/blk will insert some information into
+the first line of oops/panic data. For example::
+
+ Panic: Total 16 times
+
+It means that it's OOPS|Panic for the 16th time since the first booting.
+Sometimes the number of occurrences of oops|panic since the first booting is
+important to judge whether the system is stable.
+
+The following line is inserted by pstore filesystem. For example::
+
+ Oops#2 Part1
+
+It means that it's OOPS for the 2nd time on the last boot.
+
+Reading the data
+----------------
+
+The dump data can be read from the pstore filesystem. The format for these
+files is ``dmesg-pstore-blk-[N]`` for oops/panic front-end,
+``pmsg-pstore-blk-0`` for pmsg front-end and so on. The timestamp of the
+dump file records the trigger time. To delete a stored record from block
+device, simply unlink the respective pstore file.
+
+Attentions in panic read/write APIs
+-----------------------------------
+
+If on panic, the kernel is not going to run for much longer, the tasks will not
+be scheduled and most kernel resources will be out of service. It
+looks like a single-threaded program running on a single-core computer.
+
+The following points require special attention for panic read/write APIs:
+
+1. Can **NOT** allocate any memory.
+ If you need memory, just allocate while the block driver is initializing
+ rather than waiting until the panic.
+#. Must be polled, **NOT** interrupt driven.
+ No task schedule any more. The block driver should delay to ensure the write
+ succeeds, but NOT sleep.
+#. Can **NOT** take any lock.
+ There is no other task, nor any shared resource; you are safe to break all
+ locks.
+#. Just use CPU to transfer.
+ Do not use DMA to transfer unless you are sure that DMA will not keep lock.
+#. Control registers directly.
+ Please control registers directly rather than use Linux kernel resources.
+ Do I/O map while initializing rather than wait until a panic occurs.
+#. Reset your block device and controller if necessary.
+ If you are not sure of the state of your block device and controller when
+ a panic occurs, you are safe to stop and reset them.
+
+pstore/blk supports psblk_blkdev_info(), which is defined in
+*linux/pstore_blk.h*, to get information of using block device, such as the
+device number, sector count and start sector of the whole disk.
+
+pstore block internals
+----------------------
+
+For developer reference, here are all the important structures and APIs:
+
+.. kernel-doc:: fs/pstore/zone.c
+ :internal:
+
+.. kernel-doc:: include/linux/pstore_zone.h
+ :internal:
+
+.. kernel-doc:: fs/pstore/blk.c
+ :export:
+
+.. kernel-doc:: include/linux/pstore_blk.h
+ :internal:
diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
index 6dbcc5481000..a60a96218ba9 100644
--- a/Documentation/admin-guide/ramoops.rst
+++ b/Documentation/admin-guide/ramoops.rst
@@ -32,11 +32,17 @@ memory to be mapped strongly ordered, and atomic operations on strongly ordered
memory are implementation defined, and won't work on many ARMs such as omaps.
The memory area is divided into ``record_size`` chunks (also rounded down to
-power of two) and each oops/panic writes a ``record_size`` chunk of
+power of two) and each kmesg dump writes a ``record_size`` chunk of
information.
-Dumping both oopses and panics can be done by setting 1 in the ``dump_oops``
-variable while setting 0 in that variable dumps only the panics.
+Limiting which kinds of kmsg dumps are stored can be controlled via
+the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's
+``enum kmsg_dump_reason``. For example, to store both Oopses and Panics,
+``max_reason`` should be set to 2 (KMSG_DUMP_OOPS), to store only Panics
+``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0
+(KMSG_DUMP_UNDEF), means the reason filtering will be controlled by the
+``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS,
+otherwise KMSG_DUMP_MAX.
The module uses a counter to record multiple dumps but the counter gets reset
on restart (i.e. new dumps after the restart will overwrite old ones).
@@ -90,7 +96,7 @@ Setting the ramoops parameters can be done in several different manners:
.mem_address = <...>,
.mem_type = <...>,
.record_size = <...>,
- .dump_oops = <...>,
+ .max_reason = <...>,
.ecc = <...>,
};
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index 0310db624964..7b481b2a368e 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -156,11 +156,11 @@ the labels provided by the BIOS won't match the real ones.
ECC memory
----------
-As mentioned on the previous section, ECC memory has extra bits to be
-used for error correction. So, on 64 bit systems, a memory module
-has 64 bits of *data width*, and 74 bits of *total width*. So, there are
-8 bits extra bits to be used for the error detection and correction
-mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_.
+As mentioned in the previous section, ECC memory has extra bits to be
+used for error correction. In the above example, a memory module has
+64 bits of *data width*, and 72 bits of *total width*. The extra 8
+bits which are used for the error detection and correction mechanisms
+are referred to as the *syndrome*\ [#f1]_\ [#f2]_.
So, when the cpu requests the memory controller to write a word with
*data width*, the memory controller calculates the *syndrome* in real time,
@@ -212,7 +212,7 @@ EDAC - Error Detection And Correction
purposes.
When the subsystem was pushed upstream for the first time, on
- Kernel 2.6.16, for the first time, it was renamed to ``EDAC``.
+ Kernel 2.6.16, it was renamed to ``EDAC``.
Purpose
-------
@@ -351,15 +351,17 @@ controllers. The following example will assume 2 channels:
+------------+-----------+-----------+
| | ``ch0`` | ``ch1`` |
+============+===========+===========+
- | ``csrow0`` | DIMM_A0 | DIMM_B0 |
- | | rank0 | rank0 |
- +------------+ - | - |
+ | |**DIMM_A0**|**DIMM_B0**|
+ +------------+-----------+-----------+
+ | ``csrow0`` | rank0 | rank0 |
+ +------------+-----------+-----------+
| ``csrow1`` | rank1 | rank1 |
+------------+-----------+-----------+
- | ``csrow2`` | DIMM_A1 | DIMM_B1 |
- | | rank0 | rank0 |
- +------------+ - | - |
- | ``csrow3`` | rank1 | rank1 |
+ | |**DIMM_A1**|**DIMM_B1**|
+ +------------+-----------+-----------+
+ | ``csrow2`` | rank0 | rank0 |
+ +------------+-----------+-----------+
+ | ``csrow3`` | rank1 | rank1 |
+------------+-----------+-----------+
In the above example, there are 4 physical slots on the motherboard
diff --git a/Documentation/admin-guide/serial-console.rst b/Documentation/admin-guide/serial-console.rst
index a8d1e36b627a..58b32832e50a 100644
--- a/Documentation/admin-guide/serial-console.rst
+++ b/Documentation/admin-guide/serial-console.rst
@@ -54,7 +54,7 @@ You will need to create a new device to use ``/dev/console``. The official
``/dev/console`` is now character device 5,1.
(You can also use a network device as a console. See
-``Documentation/networking/netconsole.txt`` for information on that.)
+``Documentation/networking/netconsole.rst`` for information on that.)
Here's an example that will use ``/dev/ttyS1`` (COM2) as the console.
Replace the sample values as needed.
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0d427fd10941..1ebf68d01141 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -102,6 +102,30 @@ See the ``type_of_loader`` and ``ext_loader_ver`` fields in
:doc:`/x86/boot` for additional information.
+bpf_stats_enabled
+=================
+
+Controls whether the kernel should collect statistics on BPF programs
+(total time spent running, number of times run...). Enabling
+statistics causes a slight reduction in performance on each program
+run. The statistics can be seen using ``bpftool``.
+
+= ===================================
+0 Don't collect statistics (default).
+1 Collect statistics.
+= ===================================
+
+
+cad_pid
+=======
+
+This is the pid which will be signalled on reboot (notably, by
+Ctrl-Alt-Delete). Writing a value to this file which doesn't
+correspond to a running process will result in ``-ESRCH``.
+
+See also `ctrl-alt-del`_.
+
+
cap_last_cap
============
@@ -241,6 +265,40 @@ domain names are in general different. For a detailed discussion
see the ``hostname(1)`` man page.
+firmware_config
+===============
+
+See :doc:`/driver-api/firmware/fallback-mechanisms`.
+
+The entries in this directory allow the firmware loader helper
+fallback to be controlled:
+
+* ``force_sysfs_fallback``, when set to 1, forces the use of the
+ fallback;
+* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.
+
+
+ftrace_dump_on_oops
+===================
+
+Determines whether ``ftrace_dump()`` should be called on an oops (or
+kernel panic). This will output the contents of the ftrace buffers to
+the console. This is very useful for capturing traces that lead to
+crashes and outputting them to a serial console.
+
+= ===================================================
+0 Disabled (default).
+1 Dump buffers of all CPUs.
+2 Dump the buffer of the CPU that triggered the oops.
+= ===================================================
+
+
+ftrace_enabled, stack_tracer_enabled
+====================================
+
+See :doc:`/trace/ftrace`.
+
+
hardlockup_all_cpu_backtrace
============================
@@ -344,6 +402,25 @@ Controls whether the panic kmsg data should be reported to Hyper-V.
= =========================================================
+ignore-unaligned-usertrap
+=========================
+
+On architectures where unaligned accesses cause traps, and where this
+feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
+currently, ``arc`` and ``ia64``), controls whether all unaligned traps
+are logged.
+
+= =============================================================
+0 Log all unaligned accesses.
+1 Only warn the first time a process traps. This is the default
+ setting.
+= =============================================================
+
+See also `unaligned-trap`_ and `unaligned-dump-stack`_. On ``ia64``,
+this allows system administrators to override the
+``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
+
+
kexec_load_disabled
===================
@@ -459,6 +536,15 @@ Notes:
successful IPC object allocation. If an IPC object allocation syscall
fails, it is undefined if the value remains unmodified or is reset to -1.
+
+ngroups_max
+===========
+
+Maximum number of supplementary groups, _i.e._ the maximum size which
+``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.
+
+
+
nmi_watchdog
============
@@ -721,7 +807,13 @@ perf_event_paranoid
===================
Controls use of the performance events system by unprivileged
-users (without CAP_SYS_ADMIN). The default value is 2.
+users (without CAP_PERFMON). The default value is 2.
+
+For backward compatibility reasons access to system performance
+monitoring and observability remains open for CAP_SYS_ADMIN
+privileged processes but CAP_SYS_ADMIN usage for secure system
+performance monitoring and observability operations is discouraged
+with respect to CAP_PERFMON use cases.
=== ==================================================================
-1 Allow use of (almost) all events by all users.
@@ -730,13 +822,13 @@ users (without CAP_SYS_ADMIN). The default value is 2.
``CAP_IPC_LOCK``.
>=0 Disallow ftrace function tracepoint by users without
- ``CAP_SYS_ADMIN``.
+ ``CAP_PERFMON``.
- Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``.
+ Disallow raw tracepoint access by users without ``CAP_PERFMON``.
->=1 Disallow CPU event access by users without ``CAP_SYS_ADMIN``.
+>=1 Disallow CPU event access by users without ``CAP_PERFMON``.
->=2 Disallow kernel profiling by users without ``CAP_SYS_ADMIN``.
+>=2 Disallow kernel profiling by users without ``CAP_PERFMON``.
=== ==================================================================
@@ -871,7 +963,7 @@ this sysctl interface anymore.
pty
===
-See Documentation/filesystems/devpts.txt.
+See Documentation/filesystems/devpts.rst.
randomize_va_space
@@ -1167,6 +1259,65 @@ If a value outside of this range is written to ``threads-max`` an
``EINVAL`` error occurs.
+traceoff_on_warning
+===================
+
+When set, disables tracing (see :doc:`/trace/ftrace`) when a
+``WARN()`` is hit.
+
+
+tracepoint_printk
+=================
+
+When tracepoints are sent to printk() (enabled by the ``tp_printk``
+boot parameter), this entry provides runtime control::
+
+ echo 0 > /proc/sys/kernel/tracepoint_printk
+
+will stop tracepoints from being sent to printk(), and::
+
+ echo 1 > /proc/sys/kernel/tracepoint_printk
+
+will send them to printk() again.
+
+This only works if the kernel was booted with ``tp_printk`` enabled.
+
+See :doc:`/admin-guide/kernel-parameters` and
+:doc:`/trace/boottime-trace`.
+
+
+.. _unaligned-dump-stack:
+
+unaligned-dump-stack (ia64)
+===========================
+
+When logging unaligned accesses, controls whether the stack is
+dumped.
+
+= ===================================================
+0 Do not dump the stack. This is the default setting.
+1 Dump the stack.
+= ===================================================
+
+See also `ignore-unaligned-usertrap`_.
+
+
+unaligned-trap
+==============
+
+On architectures where unaligned accesses cause traps, and where this
+feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
+``arc`` and ``parisc``), controls whether unaligned traps are caught
+and emulated (instead of failing).
+
+= ========================================================
+0 Do not emulate unaligned accesses.
+1 Emulate unaligned accesses. This is the default setting.
+= ========================================================
+
+See also `ignore-unaligned-usertrap`_.
+
+
unknown_nmi_panic
=================
@@ -1178,6 +1329,16 @@ NMI switch that most IA32 servers have fires unknown NMI up, for
example. If a system hangs up, try pressing the NMI switch.
+unprivileged_bpf_disabled
+=========================
+
+Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
+once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` will return
+``-EPERM``.
+
+Once set, this can't be cleared.
+
+
watchdog
========
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index e043c9213388..42cd04bca548 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -339,7 +339,9 @@ settings from init_net and for IPv6 we reset all settings to default.
If set to 1, both IPv4 and IPv6 settings are forced to inherit from
current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
-forced to reset to their default values.
+forced to reset to their default values. If set to 3, both IPv4 and IPv6
+settings are forced to inherit from current ones in the netns where this
+new netns has been created.
Default : 0 (for compatibility reasons)
@@ -353,8 +355,8 @@ socket's buffer. It will not take effect unless PF_UNIX flag is specified.
3. /proc/sys/net/ipv4 - IPV4 settings
-------------------------------------
-Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
-descriptions of these entries.
+Please see: Documentation/networking/ip-sysctl.rst and
+Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
4. Appletalk
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 0329a4d3fa9e..d46d5b7013c6 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -831,14 +831,27 @@ tooling to work, you can do::
swappiness
==========
-This control is used to define how aggressive the kernel will swap
-memory pages. Higher values will increase aggressiveness, lower values
-decrease the amount of swap. A value of 0 instructs the kernel not to
-initiate swap until the amount of free and file-backed pages is less
-than the high water mark in a zone.
+This control is used to define the rough relative IO cost of swapping
+and filesystem paging, as a value between 0 and 200. At 100, the VM
+assumes equal IO cost and will thus apply memory pressure to the page
+cache and swap-backed pages equally; lower values signify more
+expensive swap IO, higher values indicates cheaper.
+
+Keep in mind that filesystem IO patterns under memory pressure tend to
+be more efficient than swap's random IO. An optimal value will require
+experimentation and will also be workload-dependent.
The default value is 60.
+For in-memory swap, like zram or zswap, as well as hybrid setups that
+have swap on faster devices than the filesystem, values beyond 100 can
+be considered. For example, if the random IO against the swap device
+is on average 2x faster than IO from the filesystem, swappiness should
+be 133 (x + 2x = 200, 2x = 133.33).
+
+At 0, the kernel will not initiate swap until the amount of free and
+file-backed pages is less than the high watermark in a zone.
+
unprivileged_userfaultfd
========================
diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst
index 5057b11100ed..452ec8b115c2 100644
--- a/Documentation/arm64/amu.rst
+++ b/Documentation/arm64/amu.rst
@@ -23,6 +23,7 @@ optional external memory-mapped interface.
Version 1 of the Activity Monitors architecture implements a counter group
of four fixed and architecturally defined 64-bit event counters.
+
- CPU cycle counter: increments at the frequency of the CPU.
- Constant counter: increments at the fixed frequency of the system
clock.
@@ -57,6 +58,7 @@ counters, only the presence of the extension.
Firmware (code running at higher exception levels, e.g. arm-tf) support is
needed to:
+
- Enable access for lower exception levels (EL2 and EL1) to the AMU
registers.
- Enable the counters. If not enabled these will read as 0.
@@ -78,6 +80,7 @@ are not trapped in EL2/EL3.
The fixed counters of AMUv1 are accessible though the following system
register definitions:
+
- SYS_AMEVCNTR0_CORE_EL0
- SYS_AMEVCNTR0_CONST_EL0
- SYS_AMEVCNTR0_INST_RET_EL0
@@ -93,6 +96,7 @@ Userspace access
----------------
Currently, access from userspace to the AMU registers is disabled due to:
+
- Security reasons: they might expose information about code executed in
secure mode.
- Purpose: AMU counters are intended for system management use.
@@ -105,6 +109,7 @@ Virtualization
Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM
guest side is disabled due to:
+
- Security reasons: they might expose information about code executed
by other guests or the host.
diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index a3f1a47b6f1c..7552dbc1cc54 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -173,7 +173,10 @@ Before jumping into the kernel, the following conditions must be met:
- Caches, MMUs
The MMU must be off.
- Instruction cache may be on or off.
+
+ The instruction cache may be on or off, and must not hold any stale
+ entries corresponding to the loaded kernel image.
+
The address range corresponding to the loaded kernel image must be
cleaned to the PoC. In the presence of a system cache or other
coherent masters with caches enabled, this will typically require
@@ -238,6 +241,7 @@ Before jumping into the kernel, the following conditions must be met:
- The DT or ACPI tables must describe a GICv2 interrupt controller.
For CPUs with pointer authentication functionality:
+
- If EL3 is present:
- SCR_EL3.APK (bit 16) must be initialised to 0b1
@@ -249,18 +253,22 @@ Before jumping into the kernel, the following conditions must be met:
- HCR_EL2.API (bit 41) must be initialised to 0b1
For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
+
- If EL3 is present:
- CPTR_EL3.TAM (bit 30) must be initialised to 0b0
- CPTR_EL2.TAM (bit 30) must be initialised to 0b0
- AMCNTENSET0_EL0 must be initialised to 0b1111
- AMCNTENSET1_EL0 must be initialised to a platform specific value
- having 0b1 set for the corresponding bit for each of the auxiliary
- counters present.
+
+ - CPTR_EL3.TAM (bit 30) must be initialised to 0b0
+ - CPTR_EL2.TAM (bit 30) must be initialised to 0b0
+ - AMCNTENSET0_EL0 must be initialised to 0b1111
+ - AMCNTENSET1_EL0 must be initialised to a platform specific value
+ having 0b1 set for the corresponding bit for each of the auxiliary
+ counters present.
+
- If the kernel is entered at EL1:
- AMCNTENSET0_EL0 must be initialised to 0b1111
- AMCNTENSET1_EL0 must be initialised to a platform specific value
- having 0b1 set for the corresponding bit for each of the auxiliary
- counters present.
+
+ - AMCNTENSET0_EL0 must be initialised to 0b1111
+ - AMCNTENSET1_EL0 must be initialised to a platform specific value
+ having 0b1 set for the corresponding bit for each of the auxiliary
+ counters present.
The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
@@ -304,7 +312,8 @@ following manner:
Documentation/devicetree/bindings/arm/psci.yaml.
- Secondary CPU general-purpose register settings
- x0 = 0 (reserved for future use)
- x1 = 0 (reserved for future use)
- x2 = 0 (reserved for future use)
- x3 = 0 (reserved for future use)
+
+ - x0 = 0 (reserved for future use)
+ - x1 = 0 (reserved for future use)
+ - x2 = 0 (reserved for future use)
+ - x3 = 0 (reserved for future use)
diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index 41937a8091aa..314fa5bc2655 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -176,6 +176,8 @@ infrastructure:
+------------------------------+---------+---------+
| SSBS | [7-4] | y |
+------------------------------+---------+---------+
+ | BT | [3-0] | y |
+ +------------------------------+---------+---------+
4) MIDR_EL1 - Main ID Register
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index 7dfb97dfe416..84a9fd2d41b4 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -236,6 +236,11 @@ HWCAP2_RNG
Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001.
+HWCAP2_BTI
+
+ Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001.
+
+
4. Unused AT_HWCAP bits
-----------------------
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 2c08c628febd..936cf2a59ca4 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -64,6 +64,10 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A57 | #852523 | N/A |
@@ -78,8 +82,6 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
+----------------+-----------------+-----------------+-----------------------------+
-| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
-+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A76 | #1165522 | ARM64_ERRATUM_1165522 |
@@ -88,8 +90,6 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 |
+----------------+-----------------+-----------------+-----------------------------+
-| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 |
-+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1349291 | N/A |
diff --git a/Documentation/block/biovecs.rst b/Documentation/block/biovecs.rst
index ad303a2569d3..36771a131b56 100644
--- a/Documentation/block/biovecs.rst
+++ b/Documentation/block/biovecs.rst
@@ -129,6 +129,7 @@ Usage of helpers:
::
bio_for_each_segment_all()
+ bio_for_each_bvec_all()
bio_first_bvec_all()
bio_first_page_all()
bio_last_bvec_all()
@@ -143,4 +144,5 @@ Usage of helpers:
bio_vec' will contain a multi-page IO vector during the iteration::
bio_for_each_bvec()
+ bio_for_each_bvec_all()
rq_for_each_bvec()
diff --git a/Documentation/block/index.rst b/Documentation/block/index.rst
index 3fa7a52fafa4..026addfc69bc 100644
--- a/Documentation/block/index.rst
+++ b/Documentation/block/index.rst
@@ -14,6 +14,7 @@ Block
cmdline-partition
data-integrity
deadline-iosched
+ inline-encryption
ioprio
kyber-iosched
null_blk
diff --git a/Documentation/block/inline-encryption.rst b/Documentation/block/inline-encryption.rst
new file mode 100644
index 000000000000..354817b80887
--- /dev/null
+++ b/Documentation/block/inline-encryption.rst
@@ -0,0 +1,263 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Inline Encryption
+=================
+
+Background
+==========
+
+Inline encryption hardware sits logically between memory and the disk, and can
+en/decrypt data as it goes in/out of the disk. Inline encryption hardware has a
+fixed number of "keyslots" - slots into which encryption contexts (i.e. the
+encryption key, encryption algorithm, data unit size) can be programmed by the
+kernel at any time. Each request sent to the disk can be tagged with the index
+of a keyslot (and also a data unit number to act as an encryption tweak), and
+the inline encryption hardware will en/decrypt the data in the request with the
+encryption context programmed into that keyslot. This is very different from
+full disk encryption solutions like self encrypting drives/TCG OPAL/ATA
+Security standards, since with inline encryption, any block on disk could be
+encrypted with any encryption context the kernel chooses.
+
+
+Objective
+=========
+
+We want to support inline encryption (IE) in the kernel.
+To allow for testing, we also want a crypto API fallback when actual
+IE hardware is absent. We also want IE to work with layered devices
+like dm and loopback (i.e. we want to be able to use the IE hardware
+of the underlying devices if present, or else fall back to crypto API
+en/decryption).
+
+
+Constraints and notes
+=====================
+
+- IE hardware has a limited number of "keyslots" that can be programmed
+ with an encryption context (key, algorithm, data unit size, etc.) at any time.
+ One can specify a keyslot in a data request made to the device, and the
+ device will en/decrypt the data using the encryption context programmed into
+ that specified keyslot. When possible, we want to make multiple requests with
+ the same encryption context share the same keyslot.
+
+- We need a way for upper layers like filesystems to specify an encryption
+ context to use for en/decrypting a struct bio, and a device driver (like UFS)
+ needs to be able to use that encryption context when it processes the bio.
+
+- We need a way for device drivers to expose their inline encryption
+ capabilities in a unified way to the upper layers.
+
+
+Design
+======
+
+We add a :c:type:`struct bio_crypt_ctx` to :c:type:`struct bio` that can
+represent an encryption context, because we need to be able to pass this
+encryption context from the upper layers (like the fs layer) to the
+device driver to act upon.
+
+While IE hardware works on the notion of keyslots, the FS layer has no
+knowledge of keyslots - it simply wants to specify an encryption context to
+use while en/decrypting a bio.
+
+We introduce a keyslot manager (KSM) that handles the translation from
+encryption contexts specified by the FS to keyslots on the IE hardware.
+This KSM also serves as the way IE hardware can expose its capabilities to
+upper layers. The generic mode of operation is: each device driver that wants
+to support IE will construct a KSM and set it up in its struct request_queue.
+Upper layers that want to use IE on this device can then use this KSM in
+the device's struct request_queue to translate an encryption context into
+a keyslot. The presence of the KSM in the request queue shall be used to mean
+that the device supports IE.
+
+The KSM uses refcounts to track which keyslots are idle (either they have no
+encryption context programmed, or there are no in-flight struct bios
+referencing that keyslot). When a new encryption context needs a keyslot, it
+tries to find a keyslot that has already been programmed with the same
+encryption context, and if there is no such keyslot, it evicts the least
+recently used idle keyslot and programs the new encryption context into that
+one. If no idle keyslots are available, then the caller will sleep until there
+is at least one.
+
+
+blk-mq changes, other block layer changes and blk-crypto-fallback
+=================================================================
+
+We add a pointer to a ``bi_crypt_context`` and ``keyslot`` to
+:c:type:`struct request`. These will be referred to as the ``crypto fields``
+for the request. This ``keyslot`` is the keyslot into which the
+``bi_crypt_context`` has been programmed in the KSM of the ``request_queue``
+that this request is being sent to.
+
+We introduce ``block/blk-crypto-fallback.c``, which allows upper layers to remain
+blissfully unaware of whether or not real inline encryption hardware is present
+underneath. When a bio is submitted with a target ``request_queue`` that doesn't
+support the encryption context specified with the bio, the block layer will
+en/decrypt the bio with the blk-crypto-fallback.
+
+If the bio is a ``WRITE`` bio, a bounce bio is allocated, and the data in the bio
+is encrypted stored in the bounce bio - blk-mq will then proceed to process the
+bounce bio as if it were not encrypted at all (except when blk-integrity is
+concerned). ``blk-crypto-fallback`` sets the bounce bio's ``bi_end_io`` to an
+internal function that cleans up the bounce bio and ends the original bio.
+
+If the bio is a ``READ`` bio, the bio's ``bi_end_io`` (and also ``bi_private``)
+is saved and overwritten by ``blk-crypto-fallback`` to
+``bio_crypto_fallback_decrypt_bio``. The bio's ``bi_crypt_context`` is also
+overwritten with ``NULL``, so that to the rest of the stack, the bio looks
+as if it was a regular bio that never had an encryption context specified.
+``bio_crypto_fallback_decrypt_bio`` will decrypt the bio, restore the original
+``bi_end_io`` (and also ``bi_private``) and end the bio again.
+
+Regardless of whether real inline encryption hardware is used or the
+blk-crypto-fallback is used, the ciphertext written to disk (and hence the
+on-disk format of data) will be the same (assuming the hardware's implementation
+of the algorithm being used adheres to spec and functions correctly).
+
+If a ``request queue``'s inline encryption hardware claimed to support the
+encryption context specified with a bio, then it will not be handled by the
+``blk-crypto-fallback``. We will eventually reach a point in blk-mq when a
+:c:type:`struct request` needs to be allocated for that bio. At that point,
+blk-mq tries to program the encryption context into the ``request_queue``'s
+keyslot_manager, and obtain a keyslot, which it stores in its newly added
+``keyslot`` field. This keyslot is released when the request is completed.
+
+When the first bio is added to a request, ``blk_crypto_rq_bio_prep`` is called,
+which sets the request's ``crypt_ctx`` to a copy of the bio's
+``bi_crypt_context``. bio_crypt_do_front_merge is called whenever a subsequent
+bio is merged to the front of the request, which updates the ``crypt_ctx`` of
+the request so that it matches the newly merged bio's ``bi_crypt_context``. In particular, the request keeps a copy of the ``bi_crypt_context`` of the first
+bio in its bio-list (blk-mq needs to be careful to maintain this invariant
+during bio and request merges).
+
+To make it possible for inline encryption to work with request queue based
+layered devices, when a request is cloned, its ``crypto fields`` are cloned as
+well. When the cloned request is submitted, blk-mq programs the
+``bi_crypt_context`` of the request into the clone's request_queue's keyslot
+manager, and stores the returned keyslot in the clone's ``keyslot``.
+
+
+API presented to users of the block layer
+=========================================
+
+``struct blk_crypto_key`` represents a crypto key (the raw key, size of the
+key, the crypto algorithm to use, the data unit size to use, and the number of
+bytes required to represent data unit numbers that will be specified with the
+``bi_crypt_context``).
+
+``blk_crypto_init_key`` allows upper layers to initialize such a
+``blk_crypto_key``.
+
+``bio_crypt_set_ctx`` should be called on any bio that a user of
+the block layer wants en/decrypted via inline encryption (or the
+blk-crypto-fallback, if hardware support isn't available for the desired
+crypto configuration). This function takes the ``blk_crypto_key`` and the
+data unit number (DUN) to use when en/decrypting the bio.
+
+``blk_crypto_config_supported`` allows upper layers to query whether or not the
+an encryption context passed to request queue can be handled by blk-crypto
+(either by real inline encryption hardware, or by the blk-crypto-fallback).
+This is useful e.g. when blk-crypto-fallback is disabled, and the upper layer
+wants to use an algorithm that may not supported by hardware - this function
+lets the upper layer know ahead of time that the algorithm isn't supported,
+and the upper layer can fallback to something else if appropriate.
+
+``blk_crypto_start_using_key`` - Upper layers must call this function on
+``blk_crypto_key`` and a ``request_queue`` before using the key with any bio
+headed for that ``request_queue``. This function ensures that either the
+hardware supports the key's crypto settings, or the crypto API fallback has
+transforms for the needed mode allocated and ready to go. Note that this
+function may allocate an ``skcipher``, and must not be called from the data
+path, since allocating ``skciphers`` from the data path can deadlock.
+
+``blk_crypto_evict_key`` *must* be called by upper layers before a
+``blk_crypto_key`` is freed. Further, it *must* only be called only once
+there are no more in-flight requests that use that ``blk_crypto_key``.
+``blk_crypto_evict_key`` will ensure that a key is removed from any keyslots in
+inline encryption hardware that the key might have been programmed into (or the blk-crypto-fallback).
+
+API presented to device drivers
+===============================
+
+A :c:type:``struct blk_keyslot_manager`` should be set up by device drivers in
+the ``request_queue`` of the device. The device driver needs to call
+``blk_ksm_init`` on the ``blk_keyslot_manager``, which specifying the number of
+keyslots supported by the hardware.
+
+The device driver also needs to tell the KSM how to actually manipulate the
+IE hardware in the device to do things like programming the crypto key into
+the IE hardware into a particular keyslot. All this is achieved through the
+:c:type:`struct blk_ksm_ll_ops` field in the KSM that the device driver
+must fill up after initing the ``blk_keyslot_manager``.
+
+The KSM also handles runtime power management for the device when applicable
+(e.g. when it wants to program a crypto key into the IE hardware, the device
+must be runtime powered on) - so the device driver must also set the ``dev``
+field in the ksm to point to the `struct device` for the KSM to use for runtime
+power management.
+
+``blk_ksm_reprogram_all_keys`` can be called by device drivers if the device
+needs each and every of its keyslots to be reprogrammed with the key it
+"should have" at the point in time when the function is called. This is useful
+e.g. if a device loses all its keys on runtime power down/up.
+
+``blk_ksm_destroy`` should be called to free up all resources used by a keyslot
+manager upon ``blk_ksm_init``, once the ``blk_keyslot_manager`` is no longer
+needed.
+
+
+Layered Devices
+===============
+
+Request queue based layered devices like dm-rq that wish to support IE need to
+create their own keyslot manager for their request queue, and expose whatever
+functionality they choose. When a layered device wants to pass a clone of that
+request to another ``request_queue``, blk-crypto will initialize and prepare the
+clone as necessary - see ``blk_crypto_insert_cloned_request`` in
+``blk-crypto.c``.
+
+
+Future Optimizations for layered devices
+========================================
+
+Creating a keyslot manager for a layered device uses up memory for each
+keyslot, and in general, a layered device merely passes the request on to a
+"child" device, so the keyslots in the layered device itself are completely
+unused, and don't need any refcounting or keyslot programming. We can instead
+define a new type of KSM; the "passthrough KSM", that layered devices can use
+to advertise an unlimited number of keyslots, and support for any encryption
+algorithms they choose, while not actually using any memory for each keyslot.
+Another use case for the "passthrough KSM" is for IE devices that do not have a
+limited number of keyslots.
+
+
+Interaction between inline encryption and blk integrity
+=======================================================
+
+At the time of this patch, there is no real hardware that supports both these
+features. However, these features do interact with each other, and it's not
+completely trivial to make them both work together properly. In particular,
+when a WRITE bio wants to use inline encryption on a device that supports both
+features, the bio will have an encryption context specified, after which
+its integrity information is calculated (using the plaintext data, since
+the encryption will happen while data is being written), and the data and
+integrity info is sent to the device. Obviously, the integrity info must be
+verified before the data is encrypted. After the data is encrypted, the device
+must not store the integrity info that it received with the plaintext data
+since that might reveal information about the plaintext data. As such, it must
+re-generate the integrity info from the ciphertext data and store that on disk
+instead. Another issue with storing the integrity info of the plaintext data is
+that it changes the on disk format depending on whether hardware inline
+encryption support is present or the kernel crypto API fallback is used (since
+if the fallback is used, the device will receive the integrity info of the
+ciphertext, not that of the plaintext).
+
+Because there isn't any real hardware yet, it seems prudent to assume that
+hardware implementations might not implement both features together correctly,
+and disallow the combination for now. Whenever a device supports integrity, the
+kernel will pretend that the device does not support hardware inline encryption
+(by essentially setting the keyslot manager in the request_queue of the device
+to NULL). When the crypto API fallback is enabled, this means that all bios with
+and encryption context will use the fallback, and IO will complete as usual.
+When the fallback is disabled, a bio with an encryption context will be failed.
diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst
index 38c15c6fcb14..0b3db91dc100 100644
--- a/Documentation/bpf/bpf_devel_QA.rst
+++ b/Documentation/bpf/bpf_devel_QA.rst
@@ -437,6 +437,21 @@ needed::
See the kernels selftest `Documentation/dev-tools/kselftest.rst`_
document for further documentation.
+To maximize the number of tests passing, the .config of the kernel
+under test should match the config file fragment in
+tools/testing/selftests/bpf as closely as possible.
+
+Finally to ensure support for latest BPF Type Format features -
+discussed in `Documentation/bpf/btf.rst`_ - pahole version 1.16
+is required for kernels built with CONFIG_DEBUG_INFO_BTF=y.
+pahole is delivered in the dwarves package or can be built
+from source at
+
+https://github.com/acmel/dwarves
+
+Some distros have pahole version 1.16 packaged already, e.g.
+Fedora, Gentoo.
+
Q: Which BPF kernel selftests version should I run my kernel against?
---------------------------------------------------------------------
A: If you run a kernel ``xyz``, then always run the BPF kernel selftests
diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index f99677f3572f..38b4db8be7a2 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -7,7 +7,7 @@ Filter) facility, with a focus on the extended BPF version (eBPF).
This kernel side documentation is still work in progress. The main
textual documentation is (for historical reasons) described in
-`Documentation/networking/filter.txt`_, which describe both classical
+`Documentation/networking/filter.rst`_, which describe both classical
and extended BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture.
@@ -59,7 +59,7 @@ Testing and debugging BPF
.. Links:
-.. _Documentation/networking/filter.txt: ../networking/filter.txt
+.. _Documentation/networking/filter.rst: ../networking/filter.txt
.. _man-pages: https://www.kernel.org/doc/man-pages/
.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html
.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/
diff --git a/Documentation/bpf/ringbuf.rst b/Documentation/bpf/ringbuf.rst
new file mode 100644
index 000000000000..75f943f0009d
--- /dev/null
+++ b/Documentation/bpf/ringbuf.rst
@@ -0,0 +1,209 @@
+===============
+BPF ring buffer
+===============
+
+This document describes BPF ring buffer design, API, and implementation details.
+
+.. contents::
+ :local:
+ :depth: 2
+
+Motivation
+----------
+
+There are two distinctive motivators for this work, which are not satisfied by
+existing perf buffer, which prompted creation of a new ring buffer
+implementation.
+
+- more efficient memory utilization by sharing ring buffer across CPUs;
+- preserving ordering of events that happen sequentially in time, even across
+ multiple CPUs (e.g., fork/exec/exit events for a task).
+
+These two problems are independent, but perf buffer fails to satisfy both.
+Both are a result of a choice to have per-CPU perf ring buffer. Both can be
+also solved by having an MPSC implementation of ring buffer. The ordering
+problem could technically be solved for perf buffer with some in-kernel
+counting, but given the first one requires an MPSC buffer, the same solution
+would solve the second problem automatically.
+
+Semantics and APIs
+------------------
+
+Single ring buffer is presented to BPF programs as an instance of BPF map of
+type ``BPF_MAP_TYPE_RINGBUF``. Two other alternatives considered, but
+ultimately rejected.
+
+One way would be to, similar to ``BPF_MAP_TYPE_PERF_EVENT_ARRAY``, make
+``BPF_MAP_TYPE_RINGBUF`` could represent an array of ring buffers, but not
+enforce "same CPU only" rule. This would be more familiar interface compatible
+with existing perf buffer use in BPF, but would fail if application needed more
+advanced logic to lookup ring buffer by arbitrary key.
+``BPF_MAP_TYPE_HASH_OF_MAPS`` addresses this with current approach.
+Additionally, given the performance of BPF ringbuf, many use cases would just
+opt into a simple single ring buffer shared among all CPUs, for which current
+approach would be an overkill.
+
+Another approach could introduce a new concept, alongside BPF map, to represent
+generic "container" object, which doesn't necessarily have key/value interface
+with lookup/update/delete operations. This approach would add a lot of extra
+infrastructure that has to be built for observability and verifier support. It
+would also add another concept that BPF developers would have to familiarize
+themselves with, new syntax in libbpf, etc. But then would really provide no
+additional benefits over the approach of using a map. ``BPF_MAP_TYPE_RINGBUF``
+doesn't support lookup/update/delete operations, but so doesn't few other map
+types (e.g., queue and stack; array doesn't support delete, etc).
+
+The approach chosen has an advantage of re-using existing BPF map
+infrastructure (introspection APIs in kernel, libbpf support, etc), being
+familiar concept (no need to teach users a new type of object in BPF program),
+and utilizing existing tooling (bpftool). For common scenario of using a single
+ring buffer for all CPUs, it's as simple and straightforward, as would be with
+a dedicated "container" object. On the other hand, by being a map, it can be
+combined with ``ARRAY_OF_MAPS`` and ``HASH_OF_MAPS`` map-in-maps to implement
+a wide variety of topologies, from one ring buffer for each CPU (e.g., as
+a replacement for perf buffer use cases), to a complicated application
+hashing/sharding of ring buffers (e.g., having a small pool of ring buffers
+with hashed task's tgid being a look up key to preserve order, but reduce
+contention).
+
+Key and value sizes are enforced to be zero. ``max_entries`` is used to specify
+the size of ring buffer and has to be a power of 2 value.
+
+There are a bunch of similarities between perf buffer
+(``BPF_MAP_TYPE_PERF_EVENT_ARRAY``) and new BPF ring buffer semantics:
+
+- variable-length records;
+- if there is no more space left in ring buffer, reservation fails, no
+ blocking;
+- memory-mappable data area for user-space applications for ease of
+ consumption and high performance;
+- epoll notifications for new incoming data;
+- but still the ability to do busy polling for new data to achieve the
+ lowest latency, if necessary.
+
+BPF ringbuf provides two sets of APIs to BPF programs:
+
+- ``bpf_ringbuf_output()`` allows to *copy* data from one place to a ring
+ buffer, similarly to ``bpf_perf_event_output()``;
+- ``bpf_ringbuf_reserve()``/``bpf_ringbuf_commit()``/``bpf_ringbuf_discard()``
+ APIs split the whole process into two steps. First, a fixed amount of space
+ is reserved. If successful, a pointer to a data inside ring buffer data
+ area is returned, which BPF programs can use similarly to a data inside
+ array/hash maps. Once ready, this piece of memory is either committed or
+ discarded. Discard is similar to commit, but makes consumer ignore the
+ record.
+
+``bpf_ringbuf_output()`` has disadvantage of incurring extra memory copy,
+because record has to be prepared in some other place first. But it allows to
+submit records of the length that's not known to verifier beforehand. It also
+closely matches ``bpf_perf_event_output()``, so will simplify migration
+significantly.
+
+``bpf_ringbuf_reserve()`` avoids the extra copy of memory by providing a memory
+pointer directly to ring buffer memory. In a lot of cases records are larger
+than BPF stack space allows, so many programs have use extra per-CPU array as
+a temporary heap for preparing sample. bpf_ringbuf_reserve() avoid this needs
+completely. But in exchange, it only allows a known constant size of memory to
+be reserved, such that verifier can verify that BPF program can't access memory
+outside its reserved record space. bpf_ringbuf_output(), while slightly slower
+due to extra memory copy, covers some use cases that are not suitable for
+``bpf_ringbuf_reserve()``.
+
+The difference between commit and discard is very small. Discard just marks
+a record as discarded, and such records are supposed to be ignored by consumer
+code. Discard is useful for some advanced use-cases, such as ensuring
+all-or-nothing multi-record submission, or emulating temporary
+``malloc()``/``free()`` within single BPF program invocation.
+
+Each reserved record is tracked by verifier through existing
+reference-tracking logic, similar to socket ref-tracking. It is thus
+impossible to reserve a record, but forget to submit (or discard) it.
+
+``bpf_ringbuf_query()`` helper allows to query various properties of ring
+buffer. Currently 4 are supported:
+
+- ``BPF_RB_AVAIL_DATA`` returns amount of unconsumed data in ring buffer;
+- ``BPF_RB_RING_SIZE`` returns the size of ring buffer;
+- ``BPF_RB_CONS_POS``/``BPF_RB_PROD_POS`` returns current logical possition
+ of consumer/producer, respectively.
+
+Returned values are momentarily snapshots of ring buffer state and could be
+off by the time helper returns, so this should be used only for
+debugging/reporting reasons or for implementing various heuristics, that take
+into account highly-changeable nature of some of those characteristics.
+
+One such heuristic might involve more fine-grained control over poll/epoll
+notifications about new data availability in ring buffer. Together with
+``BPF_RB_NO_WAKEUP``/``BPF_RB_FORCE_WAKEUP`` flags for output/commit/discard
+helpers, it allows BPF program a high degree of control and, e.g., more
+efficient batched notifications. Default self-balancing strategy, though,
+should be adequate for most applications and will work reliable and efficiently
+already.
+
+Design and Implementation
+-------------------------
+
+This reserve/commit schema allows a natural way for multiple producers, either
+on different CPUs or even on the same CPU/in the same BPF program, to reserve
+independent records and work with them without blocking other producers. This
+means that if BPF program was interruped by another BPF program sharing the
+same ring buffer, they will both get a record reserved (provided there is
+enough space left) and can work with it and submit it independently. This
+applies to NMI context as well, except that due to using a spinlock during
+reservation, in NMI context, ``bpf_ringbuf_reserve()`` might fail to get
+a lock, in which case reservation will fail even if ring buffer is not full.
+
+The ring buffer itself internally is implemented as a power-of-2 sized
+circular buffer, with two logical and ever-increasing counters (which might
+wrap around on 32-bit architectures, that's not a problem):
+
+- consumer counter shows up to which logical position consumer consumed the
+ data;
+- producer counter denotes amount of data reserved by all producers.
+
+Each time a record is reserved, producer that "owns" the record will
+successfully advance producer counter. At that point, data is still not yet
+ready to be consumed, though. Each record has 8 byte header, which contains the
+length of reserved record, as well as two extra bits: busy bit to denote that
+record is still being worked on, and discard bit, which might be set at commit
+time if record is discarded. In the latter case, consumer is supposed to skip
+the record and move on to the next one. Record header also encodes record's
+relative offset from the beginning of ring buffer data area (in pages). This
+allows ``bpf_ringbuf_commit()``/``bpf_ringbuf_discard()`` to accept only the
+pointer to the record itself, without requiring also the pointer to ring buffer
+itself. Ring buffer memory location will be restored from record metadata
+header. This significantly simplifies verifier, as well as improving API
+usability.
+
+Producer counter increments are serialized under spinlock, so there is
+a strict ordering between reservations. Commits, on the other hand, are
+completely lockless and independent. All records become available to consumer
+in the order of reservations, but only after all previous records where
+already committed. It is thus possible for slow producers to temporarily hold
+off submitted records, that were reserved later.
+
+Reservation/commit/consumer protocol is verified by litmus tests in
+Documentation/litmus_tests/bpf-rb/_.
+
+One interesting implementation bit, that significantly simplifies (and thus
+speeds up as well) implementation of both producers and consumers is how data
+area is mapped twice contiguously back-to-back in the virtual memory. This
+allows to not take any special measures for samples that have to wrap around
+at the end of the circular buffer data area, because the next page after the
+last data page would be first data page again, and thus the sample will still
+appear completely contiguous in virtual memory. See comment and a simple ASCII
+diagram showing this visually in ``bpf_ringbuf_area_alloc()``.
+
+Another feature that distinguishes BPF ringbuf from perf ring buffer is
+a self-pacing notifications of new data being availability.
+``bpf_ringbuf_commit()`` implementation will send a notification of new record
+being available after commit only if consumer has already caught up right up to
+the record being committed. If not, consumer still has to catch up and thus
+will see new data anyways without needing an extra poll notification.
+Benchmarks (see tools/testing/selftests/bpf/benchs/bench_ringbuf.c_) show that
+this allows to achieve a very high throughput without having to resort to
+tricks like "notify only every Nth sample", which are necessary with perf
+buffer. For extreme cases, when BPF program wants more manual control of
+notifications, commit/discard/output helpers accept ``BPF_RB_NO_WAKEUP`` and
+``BPF_RB_FORCE_WAKEUP`` flags, which give full control over notifications of
+data availability, but require extra caution and diligence in using this API.
diff --git a/Documentation/conf.py b/Documentation/conf.py
index 9ae8e9abf846..f6a1bc07c410 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -388,44 +388,6 @@ if major == 1 and minor < 6:
# author, documentclass [howto, manual, or own class]).
# Sorted in alphabetical order
latex_documents = [
- ('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation',
- 'The kernel development community', 'manual'),
- ('core-api/index', 'core-api.tex', 'The kernel core API manual',
- 'The kernel development community', 'manual'),
- ('crypto/index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
- 'The kernel development community', 'manual'),
- ('dev-tools/index', 'dev-tools.tex', 'Development tools for the Kernel',
- 'The kernel development community', 'manual'),
- ('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
- 'The kernel development community', 'manual'),
- ('driver-api/index', 'driver-api.tex', 'The kernel driver API manual',
- 'The kernel development community', 'manual'),
- ('filesystems/index', 'filesystems.tex', 'Linux Filesystems API',
- 'The kernel development community', 'manual'),
- ('admin-guide/ext4', 'ext4-admin-guide.tex', 'ext4 Administration Guide',
- 'ext4 Community', 'manual'),
- ('filesystems/ext4/index', 'ext4-data-structures.tex',
- 'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
- ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
- 'The kernel development community', 'manual'),
- ('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
- 'The kernel development community', 'manual'),
- ('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To Hacking The Linux Kernel',
- 'The kernel development community', 'manual'),
- ('media/index', 'media.tex', 'Linux Media Subsystem Documentation',
- 'The kernel development community', 'manual'),
- ('networking/index', 'networking.tex', 'Linux Networking Documentation',
- 'The kernel development community', 'manual'),
- ('process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
- 'The kernel development community', 'manual'),
- ('security/index', 'security.tex', 'The kernel security subsystem manual',
- 'The kernel development community', 'manual'),
- ('sh/index', 'sh.tex', 'SuperH architecture implementation manual',
- 'The kernel development community', 'manual'),
- ('sound/index', 'sound.tex', 'Linux Sound Subsystem Documentation',
- 'The kernel development community', 'manual'),
- ('userspace-api/index', 'userspace-api.tex', 'The Linux kernel user-space API guide',
- 'The kernel development community', 'manual'),
]
# Add all other index files from Documentation/ subdirectories
diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index 93cb65d52720..a1582cc79f0f 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -213,7 +213,7 @@ Here are the routines, one by one:
there will be no entries in the cache for the kernel address
space for virtual addresses in the range 'start' to 'end-1'.
- The first of these two routines is invoked after map_vm_area()
+ The first of these two routines is invoked after map_kernel_range()
has installed the page table entries. The second is invoked
before unmap_kernel_range() deletes the page table entries.
diff --git a/Documentation/debugging-via-ohci1394.txt b/Documentation/core-api/debugging-via-ohci1394.rst
index 981ad4f89fd3..981ad4f89fd3 100644
--- a/Documentation/debugging-via-ohci1394.txt
+++ b/Documentation/core-api/debugging-via-ohci1394.rst
diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/core-api/dma-api-howto.rst
index 358d495456d1..358d495456d1 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/core-api/dma-api-howto.rst
diff --git a/Documentation/DMA-API.txt b/Documentation/core-api/dma-api.rst
index 2d8d2fed7317..2d8d2fed7317 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/core-api/dma-api.rst
diff --git a/Documentation/DMA-attributes.txt b/Documentation/core-api/dma-attributes.rst
index 29dcbe8826e8..29dcbe8826e8 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/core-api/dma-attributes.rst
diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/core-api/dma-isa-lpc.rst
index b1ec7b16c21f..b1ec7b16c21f 100644
--- a/Documentation/DMA-ISA-LPC.txt
+++ b/Documentation/core-api/dma-isa-lpc.rst
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 0897ad12c119..15ab86112627 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -18,6 +18,7 @@ it.
kernel-api
workqueue
+ printk-basics
printk-formats
symbol-namespaces
@@ -30,10 +31,12 @@ Library functionality that is used throughout the kernel.
:maxdepth: 1
kobject
+ kref
assoc_array
xarray
idr
circular-buffers
+ rbtree
generic-radix-tree
packing
timekeeping
@@ -50,6 +53,7 @@ How Linux keeps everything from happening at the same time. See
atomic_ops
refcount-vs-atomic
+ irq/index
local_ops
padata
../RCU/index
@@ -78,6 +82,10 @@ more memory-management documentation in :doc:`/vm/index`.
:maxdepth: 1
memory-allocation
+ dma-api
+ dma-api-howto
+ dma-attributes
+ dma-isa-lpc
mm-api
genalloc
pin_user_pages
@@ -92,6 +100,7 @@ Interfaces for kernel debugging
debug-objects
tracepoint
+ debugging-via-ohci1394
Everything else
===============
diff --git a/Documentation/IRQ.txt b/Documentation/core-api/irq/concepts.rst
index 4273806a606b..4273806a606b 100644
--- a/Documentation/IRQ.txt
+++ b/Documentation/core-api/irq/concepts.rst
diff --git a/Documentation/core-api/irq/index.rst b/Documentation/core-api/irq/index.rst
new file mode 100644
index 000000000000..0d65d11e5420
--- /dev/null
+++ b/Documentation/core-api/irq/index.rst
@@ -0,0 +1,11 @@
+====
+IRQs
+====
+
+.. toctree::
+ :maxdepth: 1
+
+ concepts
+ irq-affinity
+ irq-domain
+ irqflags-tracing
diff --git a/Documentation/IRQ-affinity.txt b/Documentation/core-api/irq/irq-affinity.rst
index 29da5000836a..29da5000836a 100644
--- a/Documentation/IRQ-affinity.txt
+++ b/Documentation/core-api/irq/irq-affinity.rst
diff --git a/Documentation/IRQ-domain.txt b/Documentation/core-api/irq/irq-domain.rst
index 507775cce753..096db12f32d5 100644
--- a/Documentation/IRQ-domain.txt
+++ b/Documentation/core-api/irq/irq-domain.rst
@@ -263,7 +263,8 @@ needs to:
Hierarchy irq_domain is in no way x86 specific, and is heavily used to
support other architectures, such as ARM, ARM64 etc.
-=== Debugging ===
+Debugging
+=========
Most of the internals of the IRQ subsystem are exposed in debugfs by
turning CONFIG_GENERIC_IRQ_DEBUGFS on.
diff --git a/Documentation/irqflags-tracing.txt b/Documentation/core-api/irq/irqflags-tracing.rst
index bdd208259fb3..bdd208259fb3 100644
--- a/Documentation/irqflags-tracing.txt
+++ b/Documentation/core-api/irq/irqflags-tracing.rst
diff --git a/Documentation/core-api/kobject.rst b/Documentation/core-api/kobject.rst
index 1f62d4d7d966..e93dc8cf52dd 100644
--- a/Documentation/core-api/kobject.rst
+++ b/Documentation/core-api/kobject.rst
@@ -80,11 +80,11 @@ what is the pointer to the containing structure? You must avoid tricks
(such as assuming that the kobject is at the beginning of the structure)
and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
- container_of(pointer, type, member)
+ container_of(ptr, type, member)
where:
- * ``pointer`` is the pointer to the embedded kobject,
+ * ``ptr`` is the pointer to the embedded kobject,
* ``type`` is the type of the containing structure, and
* ``member`` is the name of the structure field to which ``pointer`` points.
@@ -140,7 +140,7 @@ the name of the kobject, call kobject_rename()::
int kobject_rename(struct kobject *kobj, const char *new_name);
-kobject_rename does not perform any locking or have a solid notion of
+kobject_rename() does not perform any locking or have a solid notion of
what names are valid so the caller must provide their own sanity checking
and serialization.
@@ -210,7 +210,7 @@ statically and will warn the developer of this improper usage.
If all that you want to use a kobject for is to provide a reference counter
for your structure, please use the struct kref instead; a kobject would be
overkill. For more information on how to use struct kref, please see the
-file Documentation/kref.txt in the Linux kernel source tree.
+file Documentation/core-api/kref.rst in the Linux kernel source tree.
Creating "simple" kobjects
@@ -222,17 +222,17 @@ ksets, show and store functions, and other details. This is the one
exception where a single kobject should be created. To create such an
entry, use the function::
- struct kobject *kobject_create_and_add(char *name, struct kobject *parent);
+ struct kobject *kobject_create_and_add(const char *name, struct kobject *parent);
This function will create a kobject and place it in sysfs in the location
underneath the specified parent kobject. To create simple attributes
associated with this kobject, use::
- int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
+ int sysfs_create_file(struct kobject *kobj, const struct attribute *attr);
or::
- int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
+ int sysfs_create_group(struct kobject *kobj, const struct attribute_group *grp);
Both types of attributes used here, with a kobject that has been created
with the kobject_create_and_add(), can be of type kobj_attribute, so no
@@ -300,8 +300,10 @@ kobj_type::
void (*release)(struct kobject *kobj);
const struct sysfs_ops *sysfs_ops;
struct attribute **default_attrs;
+ const struct attribute_group **default_groups;
const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
const void *(*namespace)(struct kobject *kobj);
+ void (*get_ownership)(struct kobject *kobj, kuid_t *uid, kgid_t *gid);
};
This structure is used to describe a particular type of kobject (or, more
@@ -352,12 +354,12 @@ created and never declared statically or on the stack. To create a new
kset use::
struct kset *kset_create_and_add(const char *name,
- struct kset_uevent_ops *u,
- struct kobject *parent);
+ const struct kset_uevent_ops *uevent_ops,
+ struct kobject *parent_kobj);
When you are finished with the kset, call::
- void kset_unregister(struct kset *kset);
+ void kset_unregister(struct kset *k);
to destroy it. This removes the kset from sysfs and decrements its reference
count. When the reference count goes to zero, the kset will be released.
@@ -371,9 +373,9 @@ If a kset wishes to control the uevent operations of the kobjects
associated with it, it can use the struct kset_uevent_ops to handle it::
struct kset_uevent_ops {
- int (*filter)(struct kset *kset, struct kobject *kobj);
- const char *(*name)(struct kset *kset, struct kobject *kobj);
- int (*uevent)(struct kset *kset, struct kobject *kobj,
+ int (* const filter)(struct kset *kset, struct kobject *kobj);
+ const char *(* const name)(struct kset *kset, struct kobject *kobj);
+ int (* const uevent)(struct kset *kset, struct kobject *kobj,
struct kobj_uevent_env *env);
};
diff --git a/Documentation/kref.txt b/Documentation/core-api/kref.rst
index c61eea6f1bf2..c61eea6f1bf2 100644
--- a/Documentation/kref.txt
+++ b/Documentation/core-api/kref.rst
diff --git a/Documentation/core-api/padata.rst b/Documentation/core-api/padata.rst
index 9a24c111781d..0830e5b0e821 100644
--- a/Documentation/core-api/padata.rst
+++ b/Documentation/core-api/padata.rst
@@ -4,23 +4,26 @@
The padata parallel execution mechanism
=======================================
-:Date: December 2019
+:Date: May 2020
Padata is a mechanism by which the kernel can farm jobs out to be done in
-parallel on multiple CPUs while retaining their ordering. It was developed for
-use with the IPsec code, which needs to be able to perform encryption and
-decryption on large numbers of packets without reordering those packets. The
-crypto developers made a point of writing padata in a sufficiently general
-fashion that it could be put to other uses as well.
+parallel on multiple CPUs while optionally retaining their ordering.
-Usage
-=====
+It was originally developed for IPsec, which needs to perform encryption and
+decryption on large numbers of packets without reordering those packets. This
+is currently the sole consumer of padata's serialized job support.
+
+Padata also supports multithreaded jobs, splitting up the job evenly while load
+balancing and coordinating between threads.
+
+Running Serialized Jobs
+=======================
Initializing
------------
-The first step in using padata is to set up a padata_instance structure for
-overall control of how jobs are to be run::
+The first step in using padata to run serialized jobs is to set up a
+padata_instance structure for overall control of how jobs are to be run::
#include <linux/padata.h>
@@ -162,6 +165,24 @@ functions that correspond to the allocation in reverse::
It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.
+Running Multithreaded Jobs
+==========================
+
+A multithreaded job has a main thread and zero or more helper threads, with the
+main thread participating in the job and then waiting until all helpers have
+finished. padata splits the job into units called chunks, where a chunk is a
+piece of the job that one thread completes in one call to the thread function.
+
+A user has to do three things to run a multithreaded job. First, describe the
+job by defining a padata_mt_job structure, which is explained in the Interface
+section. This includes a pointer to the thread function, which padata will
+call each time it assigns a job chunk to a thread. Then, define the thread
+function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
+the first two delimit the range that the thread operates on and the last is a
+pointer to the job's shared state, if any. Prepare the shared state, which is
+typically allocated on the main thread's stack. Last, call
+padata_do_multithreaded(), which will return once the job is finished.
+
Interface
=========
diff --git a/Documentation/core-api/printk-basics.rst b/Documentation/core-api/printk-basics.rst
new file mode 100644
index 000000000000..563a9ce5fe1d
--- /dev/null
+++ b/Documentation/core-api/printk-basics.rst
@@ -0,0 +1,115 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+Message logging with printk
+===========================
+
+printk() is one of the most widely known functions in the Linux kernel. It's the
+standard tool we have for printing messages and usually the most basic way of
+tracing and debugging. If you're familiar with printf(3) you can tell printk()
+is based on it, although it has some functional differences:
+
+ - printk() messages can specify a log level.
+
+ - the format string, while largely compatible with C99, doesn't follow the
+ exact same specification. It has some extensions and a few limitations
+ (no ``%n`` or floating point conversion specifiers). See :ref:`How to get
+ printk format specifiers right <printk-specifiers>`.
+
+All printk() messages are printed to the kernel log buffer, which is a ring
+buffer exported to userspace through /dev/kmsg. The usual way to read it is
+using ``dmesg``.
+
+printk() is typically used like this::
+
+ printk(KERN_INFO "Message: %s\n", arg);
+
+where ``KERN_INFO`` is the log level (note that it's concatenated to the format
+string, the log level is not a separate argument). The available log levels are:
+
++----------------+--------+-----------------------------------------------+
+| Name | String | Alias function |
++================+========+===============================================+
+| KERN_EMERG | "0" | pr_emerg() |
++----------------+--------+-----------------------------------------------+
+| KERN_ALERT | "1" | pr_alert() |
++----------------+--------+-----------------------------------------------+
+| KERN_CRIT | "2" | pr_crit() |
++----------------+--------+-----------------------------------------------+
+| KERN_ERR | "3" | pr_err() |
++----------------+--------+-----------------------------------------------+
+| KERN_WARNING | "4" | pr_warn() |
++----------------+--------+-----------------------------------------------+
+| KERN_NOTICE | "5" | pr_notice() |
++----------------+--------+-----------------------------------------------+
+| KERN_INFO | "6" | pr_info() |
++----------------+--------+-----------------------------------------------+
+| KERN_DEBUG | "7" | pr_debug() and pr_devel() if DEBUG is defined |
++----------------+--------+-----------------------------------------------+
+| KERN_DEFAULT | "" | |
++----------------+--------+-----------------------------------------------+
+| KERN_CONT | "c" | pr_cont() |
++----------------+--------+-----------------------------------------------+
+
+
+The log level specifies the importance of a message. The kernel decides whether
+to show the message immediately (printing it to the current console) depending
+on its log level and the current *console_loglevel* (a kernel variable). If the
+message priority is higher (lower log level value) than the *console_loglevel*
+the message will be printed to the console.
+
+If the log level is omitted, the message is printed with ``KERN_DEFAULT``
+level.
+
+You can check the current *console_loglevel* with::
+
+ $ cat /proc/sys/kernel/printk
+ 4 4 1 7
+
+The result shows the *current*, *default*, *minimum* and *boot-time-default* log
+levels.
+
+To change the current console_loglevel simply write the the desired level to
+``/proc/sys/kernel/printk``. For example, to print all messages to the console::
+
+ # echo 8 > /proc/sys/kernel/printk
+
+Another way, using ``dmesg``::
+
+ # dmesg -n 5
+
+sets the console_loglevel to print KERN_WARNING (4) or more severe messages to
+console. See ``dmesg(1)`` for more information.
+
+As an alternative to printk() you can use the ``pr_*()`` aliases for
+logging. This family of macros embed the log level in the macro names. For
+example::
+
+ pr_info("Info message no. %d\n", msg_num);
+
+prints a ``KERN_INFO`` message.
+
+Besides being more concise than the equivalent printk() calls, they can use a
+common definition for the format string through the pr_fmt() macro. For
+instance, defining this at the top of a source file (before any ``#include``
+directive)::
+
+ #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+would prefix every pr_*() message in that file with the module and function name
+that originated the message.
+
+For debugging purposes there are also two conditionally-compiled macros:
+pr_debug() and pr_devel(), which are compiled-out unless ``DEBUG`` (or
+also ``CONFIG_DYNAMIC_DEBUG`` in the case of pr_debug()) is defined.
+
+
+Function reference
+==================
+
+.. kernel-doc:: kernel/printk/printk.c
+ :functions: printk
+
+.. kernel-doc:: include/linux/printk.h
+ :functions: pr_emerg pr_alert pr_crit pr_err pr_warn pr_notice pr_info
+ pr_fmt pr_debug pr_devel pr_cont
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 8ebe46b1af39..8c9aba262b1e 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -2,6 +2,8 @@
How to get printk format specifiers right
=========================================
+.. _printk-specifiers:
+
:Author: Randy Dunlap <rdunlap@infradead.org>
:Author: Andrew Murray <amurray@mpc-data.co.uk>
@@ -112,6 +114,20 @@ used when printing stack backtraces. The specifier takes into
consideration the effect of compiler optimisations which may occur
when tail-calls are used and marked with the noreturn GCC attribute.
+Probed Pointers from BPF / tracing
+----------------------------------
+
+::
+
+ %pks kernel string
+ %pus user string
+
+The ``k`` and ``u`` specifiers are used for printing prior probed memory from
+either kernel memory (k) or user memory (u). The subsequent ``s`` specifier
+results in printing a string. For direct use in regular vsnprintf() the (k)
+and (u) annotation is ignored, however, when used out of BPF's bpf_trace_printk(),
+for example, it reads the memory it is pointing to without faulting.
+
Kernel Pointers
---------------
@@ -468,21 +484,23 @@ Examples (OF)::
%pfwf /ocp@68000000/i2c@48072000/camera@10/port/endpoint - Full name
%pfwP endpoint - Node name
-Time and date (struct rtc_time)
--------------------------------
+Time and date
+-------------
::
- %ptR YYYY-mm-ddTHH:MM:SS
- %ptRd YYYY-mm-dd
- %ptRt HH:MM:SS
- %ptR[dt][r]
+ %pt[RT] YYYY-mm-ddTHH:MM:SS
+ %pt[RT]d YYYY-mm-dd
+ %pt[RT]t HH:MM:SS
+ %pt[RT][dt][r]
-For printing date and time as represented by struct rtc_time structure in
-human readable format.
+For printing date and time as represented by
+ R struct rtc_time structure
+ T time64_t type
+in human readable format.
-By default year will be incremented by 1900 and month by 1. Use %ptRr (raw)
-to suppress this behaviour.
+By default year will be incremented by 1900 and month by 1.
+Use %pt[RT]r (raw) to suppress this behaviour.
Passed by reference.
diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index 49d9833af871..ec575e72d0b2 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -5,8 +5,9 @@ Memory Protection Keys
======================
Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake "Scalable Processor" Server CPUs.
-It will be avalable in future non-server parts.
+which is found on Intel's Skylake (and later) "Scalable Processor"
+Server CPUs. It will be available in future non-server Intel parts
+and future AMD processors.
For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
diff --git a/Documentation/rbtree.txt b/Documentation/core-api/rbtree.rst
index 523d54b60087..523d54b60087 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/core-api/rbtree.rst
diff --git a/Documentation/dev-tools/kgdb.rst b/Documentation/dev-tools/kgdb.rst
index d38be58f872a..61293f40bc6e 100644
--- a/Documentation/dev-tools/kgdb.rst
+++ b/Documentation/dev-tools/kgdb.rst
@@ -274,6 +274,30 @@ don't like this are to hack gdb to send the :kbd:`SysRq-G` for you as well as
on the initial connect, or to use a debugger proxy that allows an
unmodified gdb to do the debugging.
+Kernel parameter: ``kgdboc_earlycon``
+-------------------------------------
+
+If you specify the kernel parameter ``kgdboc_earlycon`` and your serial
+driver registers a boot console that supports polling (doesn't need
+interrupts and implements a nonblocking read() function) kgdb will attempt
+to work using the boot console until it can transition to the regular
+tty driver specified by the ``kgdboc`` parameter.
+
+Normally there is only one boot console (especially that implements the
+read() function) so just adding ``kgdboc_earlycon`` on its own is
+sufficient to make this work. If you have more than one boot console you
+can add the boot console's name to differentiate. Note that names that
+are registered through the boot console layer and the tty layer are not
+the same for the same port.
+
+For instance, on one board to be explicit you might do::
+
+ kgdboc_earlycon=qcom_geni kgdboc=ttyMSM0
+
+If the only boot console on the device was "qcom_geni", you could simplify::
+
+ kgdboc_earlycon kgdboc=ttyMSM0
+
Kernel parameter: ``kgdbwait``
------------------------------
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index 61ae13c44f91..5d1f56fcd2e7 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -301,7 +301,8 @@ Helpers
.. kernel-doc:: tools/testing/selftests/kselftest_harness.h
:functions: TH_LOG TEST TEST_SIGNAL FIXTURE FIXTURE_DATA FIXTURE_SETUP
- FIXTURE_TEARDOWN TEST_F TEST_HARNESS_MAIN
+ FIXTURE_TEARDOWN TEST_F TEST_HARNESS_MAIN FIXTURE_VARIANT
+ FIXTURE_VARIANT_ADD
Operators
---------
diff --git a/Documentation/devicetree/bindings/Makefile b/Documentation/devicetree/bindings/Makefile
index 1df680d07461..7782d9985082 100644
--- a/Documentation/devicetree/bindings/Makefile
+++ b/Documentation/devicetree/bindings/Makefile
@@ -2,6 +2,7 @@
DT_DOC_CHECKER ?= dt-doc-validate
DT_EXTRACT_EX ?= dt-extract-example
DT_MK_SCHEMA ?= dt-mk-schema
+DT_MK_SCHEMA_USERONLY_FLAG := $(if $(DT_SCHEMA_FILES), -u)
quiet_cmd_chk_binding = CHKDT $(patsubst $(srctree)/%,%,$<)
cmd_chk_binding = $(DT_DOC_CHECKER) -u $(srctree)/$(src) $< ; \
@@ -13,16 +14,18 @@ $(obj)/%.example.dts: $(src)/%.yaml FORCE
# Use full schemas when checking %.example.dts
DT_TMP_SCHEMA := $(obj)/processed-schema-examples.yaml
+find_cmd = find $(srctree)/$(src) \( -name '*.yaml' ! \
+ -name 'processed-schema*' ! \
+ -name '*.example.dt.yaml' \)
+
quiet_cmd_mk_schema = SCHEMA $@
- cmd_mk_schema = $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $(real-prereqs)
+ cmd_mk_schema = rm -f $@ ; \
+ $(if $(DT_MK_SCHEMA_FLAGS), \
+ echo $(real-prereqs), \
+ $(find_cmd)) | \
+ xargs $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) >> $@
-DT_DOCS = $(addprefix $(src)/, \
- $(shell \
- cd $(srctree)/$(src) && \
- find * \( -name '*.yaml' ! \
- -name 'processed-schema*' ! \
- -name '*.example.dt.yaml' \) \
- ))
+DT_DOCS = $(shell $(find_cmd) | sed -e 's|^$(srctree)/||')
DT_SCHEMA_FILES ?= $(DT_DOCS)
@@ -37,7 +40,7 @@ override DTC_FLAGS := \
$(obj)/processed-schema-examples.yaml: $(DT_DOCS) FORCE
$(call if_changed,mk_schema)
-$(obj)/processed-schema.yaml: DT_MK_SCHEMA_FLAGS := -u
+$(obj)/processed-schema.yaml: DT_MK_SCHEMA_FLAGS := $(DT_MK_SCHEMA_USERONLY_FLAG)
$(obj)/processed-schema.yaml: $(DT_SCHEMA_FILES) FORCE
$(call if_changed,mk_schema)
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt
deleted file mode 100644
index ecf027a9003a..000000000000
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt
+++ /dev/null
@@ -1,36 +0,0 @@
-Mediatek pericfg controller
-===========================
-
-The Mediatek pericfg controller provides various clocks and reset
-outputs to the system.
-
-Required Properties:
-
-- compatible: Should be one of:
- - "mediatek,mt2701-pericfg", "syscon"
- - "mediatek,mt2712-pericfg", "syscon"
- - "mediatek,mt7622-pericfg", "syscon"
- - "mediatek,mt7623-pericfg", "mediatek,mt2701-pericfg", "syscon"
- - "mediatek,mt7629-pericfg", "syscon"
- - "mediatek,mt8135-pericfg", "syscon"
- - "mediatek,mt8173-pericfg", "syscon"
- - "mediatek,mt8183-pericfg", "syscon"
-- #clock-cells: Must be 1
-- #reset-cells: Must be 1
-
-The pericfg controller uses the common clk binding from
-Documentation/devicetree/bindings/clock/clock-bindings.txt
-The available clocks are defined in dt-bindings/clock/mt*-clk.h.
-Also it uses the common reset controller binding from
-Documentation/devicetree/bindings/reset/reset.txt.
-The available reset outputs are defined in
-dt-bindings/reset/mt*-resets.h
-
-Example:
-
-pericfg: power-controller@10003000 {
- compatible = "mediatek,mt8173-pericfg", "syscon";
- reg = <0 0x10003000 0 0x1000>;
- #clock-cells = <1>;
- #reset-cells = <1>;
-};
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml
new file mode 100644
index 000000000000..55209a2baedc
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml
@@ -0,0 +1,64 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/arm/mediatek/mediatek,pericfg.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: MediaTek Peripheral Configuration Controller
+
+maintainers:
+ - Bartosz Golaszewski <bgolaszewski@baylibre.com>
+
+description:
+ The Mediatek pericfg controller provides various clocks and reset outputs
+ to the system.
+
+properties:
+ compatible:
+ oneOf:
+ - items:
+ - enum:
+ - mediatek,mt2701-pericfg
+ - mediatek,mt2712-pericfg
+ - mediatek,mt7622-pericfg
+ - mediatek,mt7629-pericfg
+ - mediatek,mt8135-pericfg
+ - mediatek,mt8173-pericfg
+ - mediatek,mt8183-pericfg
+ - mediatek,mt8516-pericfg
+ - const: syscon
+ - items:
+ # Special case for mt7623 for backward compatibility
+ - const: mediatek,mt7623-pericfg
+ - const: mediatek,mt2701-pericfg
+ - const: syscon
+
+ reg:
+ maxItems: 1
+
+ '#clock-cells':
+ const: 1
+
+ '#reset-cells':
+ const: 1
+
+required:
+ - compatible
+ - reg
+
+examples:
+ - |
+ pericfg@10003000 {
+ compatible = "mediatek,mt8173-pericfg", "syscon";
+ reg = <0x10003000 0x1000>;
+ #clock-cells = <1>;
+ #reset-cells = <1>;
+ };
+
+ - |
+ pericfg@10003000 {
+ compatible = "mediatek,mt7623-pericfg", "mediatek,mt2701-pericfg", "syscon";
+ reg = <0x10003000 0x1000>;
+ #clock-cells = <1>;
+ #reset-cells = <1>;
+ };
diff --git a/Documentation/devicetree/bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml b/Documentation/devicetree/bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml
index 9e90c2b00960..e73662c8d339 100644
--- a/Documentation/devicetree/bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml
+++ b/Documentation/devicetree/bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml
@@ -119,7 +119,7 @@ examples:
panel@0 {
compatible = "bananapi,lhr050h41", "ilitek,ili9881c";
reg = <0>;
- power-gpios = <&pio 1 7 0>; /* PB07 */
+ power-supply = <&reg_display>;
reset-gpios = <&r_pio 0 5 1>; /* PL05 */
backlight = <&pwm_bl>;
};
diff --git a/Documentation/devicetree/bindings/display/bridge/adi,adv7123.txt b/Documentation/devicetree/bindings/display/bridge/adi,adv7123.txt
deleted file mode 100644
index d3c2a4914ea2..000000000000
--- a/Documentation/devicetree/bindings/display/bridge/adi,adv7123.txt
+++ /dev/null
@@ -1,50 +0,0 @@
-Analog Devices ADV7123 Video DAC
---------------------------------
-
-The ADV7123 is a digital-to-analog converter that outputs VGA signals from a
-parallel video input.
-
-Required properties:
-
-- compatible: Should be "adi,adv7123"
-
-Optional properties:
-
-- psave-gpios: Power save control GPIO
-
-Required nodes:
-
-The ADV7123 has two video ports. Their connections are modeled using the OF
-graph bindings specified in Documentation/devicetree/bindings/graph.txt.
-
-- Video port 0 for DPI input
-- Video port 1 for VGA output
-
-
-Example
--------
-
- adv7123: encoder@0 {
- compatible = "adi,adv7123";
-
- ports {
- #address-cells = <1>;
- #size-cells = <0>;
-
- port@0 {
- reg = <0>;
-
- adv7123_in: endpoint@0 {
- remote-endpoint = <&dpi_out>;
- };
- };
-
- port@1 {
- reg = <1>;
-
- adv7123_out: endpoint@0 {
- remote-endpoint = <&vga_connector_in>;
- };
- };
- };
- };
diff --git a/Documentation/devicetree/bindings/display/bridge/anx6345.yaml b/Documentation/devicetree/bindings/display/bridge/anx6345.yaml
index c21103869923..8c0e4f285fbc 100644
--- a/Documentation/devicetree/bindings/display/bridge/anx6345.yaml
+++ b/Documentation/devicetree/bindings/display/bridge/anx6345.yaml
@@ -37,6 +37,12 @@ properties:
type: object
properties:
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
port@0:
type: object
description: |
@@ -51,6 +57,8 @@ properties:
required:
- port@0
+ additionalProperties: false
+
required:
- compatible
- reg
diff --git a/Documentation/devicetree/bindings/display/bridge/chrontel,ch7033.yaml b/Documentation/devicetree/bindings/display/bridge/chrontel,ch7033.yaml
new file mode 100644
index 000000000000..9f38f55fc990
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/bridge/chrontel,ch7033.yaml
@@ -0,0 +1,77 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright (C) 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/bridge/chrontel,ch7033.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Chrontel CH7033 Video Encoder Device Tree Bindings
+
+maintainers:
+ - Lubomir Rintel <lkundrak@v3.sk>
+
+properties:
+ compatible:
+ const: chrontel,ch7033
+
+ reg:
+ maxItems: 1
+ description: I2C address of the device
+
+ ports:
+ type: object
+
+ properties:
+ port@0:
+ type: object
+ description: |
+ Video port for RGB input.
+
+ port@1:
+ type: object
+ description: |
+ DVI port, should be connected to a node compatible with the
+ dvi-connector binding.
+
+ required:
+ - port@0
+ - port@1
+
+required:
+ - compatible
+ - reg
+ - ports
+
+additionalProperties: false
+
+examples:
+ - |
+ i2c {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ vga-dvi-encoder@76 {
+ compatible = "chrontel,ch7033";
+ reg = <0x76>;
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ endpoint {
+ remote-endpoint = <&lcd0_rgb_out>;
+ };
+ };
+
+ port@1 {
+ reg = <1>;
+ endpoint {
+ remote-endpoint = <&dvi_in>;
+ };
+ };
+
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/display/bridge/dumb-vga-dac.txt b/Documentation/devicetree/bindings/display/bridge/dumb-vga-dac.txt
deleted file mode 100644
index 164cbb15f04c..000000000000
--- a/Documentation/devicetree/bindings/display/bridge/dumb-vga-dac.txt
+++ /dev/null
@@ -1,50 +0,0 @@
-Dumb RGB to VGA DAC bridge
----------------------------
-
-This binding is aimed for dumb RGB to VGA DAC based bridges that do not require
-any configuration.
-
-Required properties:
-
-- compatible: Must be "dumb-vga-dac"
-
-Required nodes:
-
-This device has two video ports. Their connections are modelled using the OF
-graph bindings specified in Documentation/devicetree/bindings/graph.txt.
-
-- Video port 0 for RGB input
-- Video port 1 for VGA output
-
-Optional properties:
-- vdd-supply: Power supply for DAC
-
-Example
--------
-
-bridge {
- compatible = "dumb-vga-dac";
- #address-cells = <1>;
- #size-cells = <0>;
-
- ports {
- #address-cells = <1>;
- #size-cells = <0>;
-
- port@0 {
- reg = <0>;
-
- vga_bridge_in: endpoint {
- remote-endpoint = <&tcon0_out_vga>;
- };
- };
-
- port@1 {
- reg = <1>;
-
- vga_bridge_out: endpoint {
- remote-endpoint = <&vga_con_in>;
- };
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/bridge/dw_mipi_dsi.txt b/Documentation/devicetree/bindings/display/bridge/dw_mipi_dsi.txt
deleted file mode 100644
index b13adf30b8d3..000000000000
--- a/Documentation/devicetree/bindings/display/bridge/dw_mipi_dsi.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-Synopsys DesignWare MIPI DSI host controller
-============================================
-
-This document defines device tree properties for the Synopsys DesignWare MIPI
-DSI host controller. It doesn't constitue a device tree binding specification
-by itself but is meant to be referenced by platform-specific device tree
-bindings.
-
-When referenced from platform device tree bindings the properties defined in
-this document are defined as follows. The platform device tree bindings are
-responsible for defining whether each optional property is used or not.
-
-- reg: Memory mapped base address and length of the DesignWare MIPI DSI
- host controller registers. (mandatory)
-
-- clocks: References to all the clocks specified in the clock-names property
- as specified in [1]. (mandatory)
-
-- clock-names:
- - "pclk" is the peripheral clock for either AHB and APB. (mandatory)
- - "px_clk" is the pixel clock for the DPI/RGB input. (optional)
-
-- resets: References to all the resets specified in the reset-names property
- as specified in [2]. (optional)
-
-- reset-names: string reset name, must be "apb" if used. (optional)
-
-- panel or bridge node: see [3]. (mandatory)
-
-[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
-[2] Documentation/devicetree/bindings/reset/reset.txt
-[3] Documentation/devicetree/bindings/display/mipi-dsi-bus.txt
diff --git a/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml b/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml
new file mode 100644
index 000000000000..2c500166c65d
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml
@@ -0,0 +1,91 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/bridge/ite,it6505.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: ITE it6505 Device Tree Bindings
+
+maintainers:
+ - Allen Chen <allen.chen@ite.com.tw>
+
+description: |
+ The IT6505 is a high-performance DisplayPort 1.1a transmitter,
+ fully compliant with DisplayPort 1.1a, HDCP 1.3 specifications.
+ The IT6505 supports color depth of up to 36 bits (12 bits/color)
+ and ensures robust transmission of high-quality uncompressed video
+ content, along with uncompressed and compressed digital audio content.
+
+ Aside from the various video output formats supported, the IT6505
+ also encodes and transmits up to 8 channels of I2S digital audio,
+ with sampling rate up to 192kHz and sample size up to 24 bits.
+ In addition, an S/PDIF input port takes in compressed audio of up to
+ 192kHz frame rate.
+
+ Each IT6505 chip comes preprogrammed with an unique HDCP key,
+ in compliance with the HDCP 1.3 standard so as to provide secure
+ transmission of high-definition content. Users of the IT6505 need not
+ purchase any HDCP keys or ROMs.
+
+properties:
+ compatible:
+ const: ite,it6505
+
+ ovdd-supply:
+ maxItems: 1
+ description: I/O voltage
+
+ pwr18-supply:
+ maxItems: 1
+ description: core voltage
+
+ interrupts:
+ maxItems: 1
+ description: interrupt specifier of INT pin
+
+ reset-gpios:
+ maxItems: 1
+ description: gpio specifier of RESET pin
+
+ extcon:
+ maxItems: 1
+ description: extcon specifier for the Power Delivery
+
+ port:
+ type: object
+ description: A port node pointing to DPI host port node
+
+required:
+ - compatible
+ - ovdd-supply
+ - pwr18-supply
+ - interrupts
+ - reset-gpios
+ - extcon
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ i2c {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ dp-bridge@5c {
+ compatible = "ite,it6505";
+ interrupts = <152 IRQ_TYPE_EDGE_FALLING 152 0>;
+ reg = <0x5c>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&it6505_pins>;
+ ovdd-supply = <&mt6358_vsim1_reg>;
+ pwr18-supply = <&it6505_pp18_reg>;
+ reset-gpios = <&pio 179 1>;
+ extcon = <&usbc_extcon>;
+
+ port {
+ it6505_in: endpoint {
+ remote-endpoint = <&dpi_out>;
+ };
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml b/Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml
index 8f373029f5d2..800c63764e71 100644
--- a/Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml
+++ b/Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml
@@ -50,6 +50,12 @@ properties:
This device has two video ports. Their connections are modeled using the
OF graph bindings specified in Documentation/devicetree/bindings/graph.txt
properties:
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
port@0:
type: object
description: |
@@ -66,6 +72,8 @@ properties:
- port@0
- port@1
+ additionalProperties: false
+
powerdown-gpios:
description:
The GPIO used to control the power down line of this device.
diff --git a/Documentation/devicetree/bindings/display/bridge/nwl-dsi.yaml b/Documentation/devicetree/bindings/display/bridge/nwl-dsi.yaml
new file mode 100644
index 000000000000..8aff2d68fc33
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/bridge/nwl-dsi.yaml
@@ -0,0 +1,226 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/bridge/nwl-dsi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Northwest Logic MIPI-DSI controller on i.MX SoCs
+
+maintainers:
+ - Guido Gúnther <agx@sigxcpu.org>
+ - Robert Chiras <robert.chiras@nxp.com>
+
+description: |
+ NWL MIPI-DSI host controller found on i.MX8 platforms. This is a dsi bridge for
+ the SOCs NWL MIPI-DSI host controller.
+
+properties:
+ compatible:
+ const: fsl,imx8mq-nwl-dsi
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
+ clocks:
+ items:
+ - description: DSI core clock
+ - description: RX_ESC clock (used in escape mode)
+ - description: TX_ESC clock (used in escape mode)
+ - description: PHY_REF clock
+ - description: LCDIF clock
+
+ clock-names:
+ items:
+ - const: core
+ - const: rx_esc
+ - const: tx_esc
+ - const: phy_ref
+ - const: lcdif
+
+ mux-controls:
+ description:
+ mux controller node to use for operating the input mux
+
+ phys:
+ maxItems: 1
+ description:
+ A phandle to the phy module representing the DPHY
+
+ phy-names:
+ items:
+ - const: dphy
+
+ power-domains:
+ maxItems: 1
+
+ resets:
+ items:
+ - description: dsi byte reset line
+ - description: dsi dpi reset line
+ - description: dsi esc reset line
+ - description: dsi pclk reset line
+
+ reset-names:
+ items:
+ - const: byte
+ - const: dpi
+ - const: esc
+ - const: pclk
+
+ ports:
+ type: object
+ description:
+ A node containing DSI input & output port nodes with endpoint
+ definitions as documented in
+ Documentation/devicetree/bindings/graph.txt.
+ properties:
+ port@0:
+ type: object
+ description:
+ Input port node to receive pixel data from the
+ display controller. Exactly one endpoint must be
+ specified.
+ properties:
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
+ endpoint@0:
+ description: sub-node describing the input from LCDIF
+ type: object
+
+ endpoint@1:
+ description: sub-node describing the input from DCSS
+ type: object
+
+ reg:
+ const: 0
+
+ required:
+ - '#address-cells'
+ - '#size-cells'
+ - reg
+
+ oneOf:
+ - required:
+ - endpoint@0
+ - required:
+ - endpoint@1
+
+ additionalProperties: false
+
+ port@1:
+ type: object
+ description:
+ DSI output port node to the panel or the next bridge
+ in the chain
+
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
+ required:
+ - '#address-cells'
+ - '#size-cells'
+ - port@0
+ - port@1
+
+ additionalProperties: false
+
+patternProperties:
+ "^panel@[0-9]+$":
+ type: object
+
+required:
+ - '#address-cells'
+ - '#size-cells'
+ - clock-names
+ - clocks
+ - compatible
+ - interrupts
+ - mux-controls
+ - phy-names
+ - phys
+ - ports
+ - reg
+ - reset-names
+ - resets
+
+additionalProperties: false
+
+examples:
+ - |
+
+ #include <dt-bindings/clock/imx8mq-clock.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/reset/imx8mq-reset.h>
+
+ mipi_dsi: mipi_dsi@30a00000 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "fsl,imx8mq-nwl-dsi";
+ reg = <0x30A00000 0x300>;
+ clocks = <&clk IMX8MQ_CLK_DSI_CORE>,
+ <&clk IMX8MQ_CLK_DSI_AHB>,
+ <&clk IMX8MQ_CLK_DSI_IPG_DIV>,
+ <&clk IMX8MQ_CLK_DSI_PHY_REF>,
+ <&clk IMX8MQ_CLK_LCDIF_PIXEL>;
+ clock-names = "core", "rx_esc", "tx_esc", "phy_ref", "lcdif";
+ interrupts = <GIC_SPI 34 IRQ_TYPE_LEVEL_HIGH>;
+ mux-controls = <&mux 0>;
+ power-domains = <&pgc_mipi>;
+ resets = <&src IMX8MQ_RESET_MIPI_DSI_RESET_BYTE_N>,
+ <&src IMX8MQ_RESET_MIPI_DSI_DPI_RESET_N>,
+ <&src IMX8MQ_RESET_MIPI_DSI_ESC_RESET_N>,
+ <&src IMX8MQ_RESET_MIPI_DSI_PCLK_RESET_N>;
+ reset-names = "byte", "dpi", "esc", "pclk";
+ phys = <&dphy>;
+ phy-names = "dphy";
+
+ panel@0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "rocktech,jh057n00900";
+ reg = <0>;
+ port@0 {
+ reg = <0>;
+ panel_in: endpoint {
+ remote-endpoint = <&mipi_dsi_out>;
+ };
+ };
+ };
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ #size-cells = <0>;
+ #address-cells = <1>;
+ reg = <0>;
+ mipi_dsi_in: endpoint@0 {
+ reg = <0>;
+ remote-endpoint = <&lcdif_mipi_dsi>;
+ };
+ };
+ port@1 {
+ reg = <1>;
+ mipi_dsi_out: endpoint {
+ remote-endpoint = <&panel_in>;
+ };
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/display/bridge/ps8640.yaml b/Documentation/devicetree/bindings/display/bridge/ps8640.yaml
index 5dff93641bea..7e27cfcf770d 100644
--- a/Documentation/devicetree/bindings/display/bridge/ps8640.yaml
+++ b/Documentation/devicetree/bindings/display/bridge/ps8640.yaml
@@ -50,6 +50,12 @@ properties:
Documentation/devicetree/bindings/media/video-interfaces.txt
Documentation/devicetree/bindings/graph.txt
properties:
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
port@0:
type: object
description: |
@@ -63,6 +69,8 @@ properties:
required:
- port@0
+ additionalProperties: false
+
required:
- compatible
- reg
diff --git a/Documentation/devicetree/bindings/display/bridge/simple-bridge.yaml b/Documentation/devicetree/bindings/display/bridge/simple-bridge.yaml
new file mode 100644
index 000000000000..0880cbf217d5
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/bridge/simple-bridge.yaml
@@ -0,0 +1,99 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/bridge/simple-bridge.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Transparent non-programmable DRM bridges
+
+maintainers:
+ - Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
+ - Maxime Ripard <mripard@kernel.org>
+
+description: |
+ This binding supports transparent non-programmable bridges that don't require
+ any configuration, with a single input and a single output.
+
+properties:
+ compatible:
+ oneOf:
+ - items:
+ - enum:
+ - ti,ths8134a
+ - ti,ths8134b
+ - const: ti,ths8134
+ - enum:
+ - adi,adv7123
+ - dumb-vga-dac
+ - ti,opa362
+ - ti,ths8134
+ - ti,ths8135
+
+ ports:
+ type: object
+ description: |
+ This device has two video ports. Their connections are modeled using the
+ OF graph bindings specified in Documentation/devicetree/bindings/graph.txt.
+ properties:
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
+ port@0:
+ type: object
+ description: The bridge input
+
+ port@1:
+ type: object
+ description: The bridge output
+
+ required:
+ - port@0
+ - port@1
+
+ additionalProperties: false
+
+ enable-gpios:
+ maxItems: 1
+ description: GPIO controlling bridge enable
+
+ vdd-supply:
+ maxItems: 1
+ description: Power supply for the bridge
+
+required:
+ - compatible
+ - ports
+
+additionalProperties: false
+
+examples:
+ - |
+ bridge {
+ compatible = "ti,ths8134a", "ti,ths8134";
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+
+ vga_bridge_in: endpoint {
+ remote-endpoint = <&tcon0_out_vga>;
+ };
+ };
+
+ port@1 {
+ reg = <1>;
+
+ vga_bridge_out: endpoint {
+ remote-endpoint = <&vga_con_in>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/bridge/snps,dw-mipi-dsi.yaml b/Documentation/devicetree/bindings/display/bridge/snps,dw-mipi-dsi.yaml
new file mode 100644
index 000000000000..012aa8e7cb8c
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/bridge/snps,dw-mipi-dsi.yaml
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/bridge/snps,dw-mipi-dsi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Synopsys DesignWare MIPI DSI host controller
+
+maintainers:
+ - Philippe CORNU <philippe.cornu@st.com>
+
+description: |
+ This document defines device tree properties for the Synopsys DesignWare MIPI
+ DSI host controller. It doesn't constitue a device tree binding specification
+ by itself but is meant to be referenced by platform-specific device tree
+ bindings.
+
+ When referenced from platform device tree bindings the properties defined in
+ this document are defined as follows. The platform device tree bindings are
+ responsible for defining whether each property is required or optional.
+
+allOf:
+ - $ref: ../dsi-controller.yaml#
+
+properties:
+ reg:
+ maxItems: 1
+
+ clocks:
+ items:
+ - description: Module clock
+ - description: DSI bus clock for either AHB and APB
+ - description: Pixel clock for the DPI/RGB input
+ minItems: 2
+
+ clock-names:
+ items:
+ - const: ref
+ - const: pclk
+ - const: px_clk
+ minItems: 2
+
+ resets:
+ maxItems: 1
+
+ reset-names:
+ const: apb
+
+ ports:
+ type: object
+
+ properties:
+ port@0:
+ type: object
+ description: Input node to receive pixel data.
+ port@1:
+ type: object
+ description: DSI output node to panel.
+
+ required:
+ - port@0
+ - port@1
+
+required:
+ - clock-names
+ - clocks
+ - ports
+ - reg
diff --git a/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.txt b/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.txt
deleted file mode 100644
index d17d1e5820d7..000000000000
--- a/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-Thine Electronics THC63LVD1024 LVDS decoder
--------------------------------------------
-
-The THC63LVD1024 is a dual link LVDS receiver designed to convert LVDS streams
-to parallel data outputs. The chip supports single/dual input/output modes,
-handling up to two LVDS input streams and up to two digital CMOS/TTL outputs.
-
-Single or dual operation mode, output data mapping and DDR output modes are
-configured through input signals and the chip does not expose any control bus.
-
-Required properties:
-- compatible: Shall be "thine,thc63lvd1024"
-- vcc-supply: Power supply for TTL output, TTL CLOCKOUT signal, LVDS input,
- PPL and digital circuitry
-
-Optional properties:
-- powerdown-gpios: Power down GPIO signal, pin name "/PDWN". Active low
-- oe-gpios: Output enable GPIO signal, pin name "OE". Active high
-
-The THC63LVD1024 video port connections are modeled according
-to OF graph bindings specified by Documentation/devicetree/bindings/graph.txt
-
-Required video port nodes:
-- port@0: First LVDS input port
-- port@2: First digital CMOS/TTL parallel output
-
-Optional video port nodes:
-- port@1: Second LVDS input port
-- port@3: Second digital CMOS/TTL parallel output
-
-The device can operate in single-link mode or dual-link mode. In single-link
-mode, all pixels are received on port@0, and port@1 shall not contain any
-endpoint. In dual-link mode, even-numbered pixels are received on port@0 and
-odd-numbered pixels on port@1, and both port@0 and port@1 shall contain
-endpoints.
-
-Example:
---------
-
- thc63lvd1024: lvds-decoder {
- compatible = "thine,thc63lvd1024";
-
- vcc-supply = <&reg_lvds_vcc>;
- powerdown-gpios = <&gpio4 15 GPIO_ACTIVE_LOW>;
-
- ports {
- #address-cells = <1>;
- #size-cells = <0>;
-
- port@0 {
- reg = <0>;
-
- lvds_dec_in_0: endpoint {
- remote-endpoint = <&lvds_out>;
- };
- };
-
- port@2{
- reg = <2>;
-
- lvds_dec_out_2: endpoint {
- remote-endpoint = <&adv7511_in>;
- };
- };
- };
- };
diff --git a/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.yaml b/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.yaml
new file mode 100644
index 000000000000..469ac4a34273
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.yaml
@@ -0,0 +1,121 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/bridge/thine,thc63lvd1024.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Thine Electronics THC63LVD1024 LVDS Decoder
+
+maintainers:
+ - Jacopo Mondi <jacopo+renesas@jmondi.org>
+ - Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
+
+description: |
+ The THC63LVD1024 is a dual link LVDS receiver designed to convert LVDS
+ streams to parallel data outputs. The chip supports single/dual input/output
+ modes, handling up to two LVDS input streams and up to two digital CMOS/TTL
+ outputs.
+
+ Single or dual operation mode, output data mapping and DDR output modes are
+ configured through input signals and the chip does not expose any control
+ bus.
+
+properties:
+ compatible:
+ const: thine,thc63lvd1024
+
+ ports:
+ type: object
+ description: |
+ This device has four video ports. Their connections are modeled using the
+ OF graph bindings specified in Documentation/devicetree/bindings/graph.txt.
+
+ The device can operate in single-link mode or dual-link mode. In
+ single-link mode, all pixels are received on port@0, and port@1 shall not
+ contain any endpoint. In dual-link mode, even-numbered pixels are
+ received on port@0 and odd-numbered pixels on port@1, and both port@0 and
+ port@1 shall contain endpoints.
+
+ properties:
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
+ port@0:
+ type: object
+ description: First LVDS input port
+
+ port@1:
+ type: object
+ description: Second LVDS input port
+
+ port@2:
+ type: object
+ description: First digital CMOS/TTL parallel output
+
+ port@3:
+ type: object
+ description: Second digital CMOS/TTL parallel output
+
+ required:
+ - port@0
+ - port@2
+
+ additionalProperties: false
+
+ oe-gpios:
+ maxItems: 1
+ description: Output enable GPIO signal, pin name "OE", active high.
+
+ powerdown-gpios:
+ maxItems: 1
+ description: Power down GPIO signal, pin name "/PDWN", active low.
+
+ vcc-supply:
+ maxItems: 1
+ description:
+ Power supply for the TTL output, TTL CLOCKOUT signal, LVDS input, PLL and
+ digital circuitry.
+
+required:
+ - compatible
+ - ports
+ - vcc-supply
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ lvds-decoder {
+ compatible = "thine,thc63lvd1024";
+
+ vcc-supply = <&reg_lvds_vcc>;
+ powerdown-gpios = <&gpio4 15 GPIO_ACTIVE_LOW>;
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+
+ lvds_dec_in_0: endpoint {
+ remote-endpoint = <&lvds_out>;
+ };
+ };
+
+ port@2 {
+ reg = <2>;
+
+ lvds_dec_out_2: endpoint {
+ remote-endpoint = <&adv7511_in>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/bridge/ti,ths813x.txt b/Documentation/devicetree/bindings/display/bridge/ti,ths813x.txt
deleted file mode 100644
index df3d7c1ac09e..000000000000
--- a/Documentation/devicetree/bindings/display/bridge/ti,ths813x.txt
+++ /dev/null
@@ -1,51 +0,0 @@
-THS8134 and THS8135 Video DAC
------------------------------
-
-This is the binding for Texas Instruments THS8134, THS8134A, THS8134B and
-THS8135 Video DAC bridges.
-
-Required properties:
-
-- compatible: Must be one of
- "ti,ths8134"
- "ti,ths8134a," "ti,ths8134"
- "ti,ths8134b", "ti,ths8134"
- "ti,ths8135"
-
-Required nodes:
-
-This device has two video ports. Their connections are modelled using the OF
-graph bindings specified in Documentation/devicetree/bindings/graph.txt.
-
-- Video port 0 for RGB input
-- Video port 1 for VGA output
-
-Example
--------
-
-vga-bridge {
- compatible = "ti,ths8135";
- #address-cells = <1>;
- #size-cells = <0>;
-
- ports {
- #address-cells = <1>;
- #size-cells = <0>;
-
- port@0 {
- reg = <0>;
-
- vga_bridge_in: endpoint {
- remote-endpoint = <&lcdc_out_vga>;
- };
- };
-
- port@1 {
- reg = <1>;
-
- vga_bridge_out: endpoint {
- remote-endpoint = <&vga_con_in>;
- };
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/dsi-controller.yaml b/Documentation/devicetree/bindings/display/dsi-controller.yaml
index fd986c36c737..85b71b1fd28a 100644
--- a/Documentation/devicetree/bindings/display/dsi-controller.yaml
+++ b/Documentation/devicetree/bindings/display/dsi-controller.yaml
@@ -28,7 +28,7 @@ description: |
properties:
$nodename:
- pattern: "^dsi-controller(@.*)?$"
+ pattern: "^dsi(@.*)?$"
"#address-cells":
const: 1
@@ -76,7 +76,7 @@ patternProperties:
examples:
- |
#include <dt-bindings/gpio/gpio.h>
- dsi-controller@a0351000 {
+ dsi@a0351000 {
reg = <0xa0351000 0x1000>;
#address-cells = <1>;
#size-cells = <0>;
diff --git a/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.txt b/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.txt
index 58914cf681b8..77def4456706 100644
--- a/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.txt
+++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.txt
@@ -17,6 +17,9 @@ Required properties:
Documentation/devicetree/bindings/graph.txt. This port should be connected
to the input port of an attached HDMI or LVDS encoder chip.
+Optional properties:
+- pinctrl-names: Contain "default" and "sleep".
+
Example:
dpi0: dpi@1401d000 {
@@ -27,6 +30,9 @@ dpi0: dpi@1401d000 {
<&mmsys CLK_MM_DPI_ENGINE>,
<&apmixedsys CLK_APMIXED_TVDPLL>;
clock-names = "pixel", "engine", "pll";
+ pinctrl-names = "default", "sleep";
+ pinctrl-0 = <&dpi_pin_func>;
+ pinctrl-1 = <&dpi_pin_idle>;
port {
dpi0_out: endpoint {
diff --git a/Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt b/Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt
index a19a6cc375ed..8e4729de8c85 100644
--- a/Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt
+++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt
@@ -33,6 +33,13 @@ Required properties:
- #clock-cells: must be <0>;
- #phy-cells: must be <0>.
+Optional properties:
+- drive-strength-microamp: adjust driving current, should be 3000 ~ 6000. And
+ the step is 200.
+- nvmem-cells: A phandle to the calibration data provided by a nvmem device. If
+ unspecified default values shall be used.
+- nvmem-cell-names: Should be "calibration-data"
+
Example:
mipi_tx0: mipi-dphy@10215000 {
@@ -42,6 +49,9 @@ mipi_tx0: mipi-dphy@10215000 {
clock-output-names = "mipi_tx0_pll";
#clock-cells = <0>;
#phy-cells = <0>;
+ drive-strength-microamp = <4600>;
+ nvmem-cells= <&mipi_tx_calibration>;
+ nvmem-cell-names = "calibration-data";
};
dsi0: dsi@1401b000 {
diff --git a/Documentation/devicetree/bindings/display/panel/arm,versatile-tft-panel.txt b/Documentation/devicetree/bindings/display/panel/arm,versatile-tft-panel.txt
deleted file mode 100644
index 0601a9e34703..000000000000
--- a/Documentation/devicetree/bindings/display/panel/arm,versatile-tft-panel.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-ARM Versatile TFT Panels
-
-These panels are connected to the daughterboards found on the
-ARM Versatile reference designs.
-
-This device node must appear as a child to a "syscon"-compatible
-node.
-
-Required properties:
-- compatible: should be "arm,versatile-tft-panel"
-
-Required subnodes:
-- port: see display/panel/panel-common.yaml, graph.txt
-
-
-Example:
-
-sysreg@0 {
- compatible = "arm,versatile-sysreg", "syscon", "simple-mfd";
- reg = <0x00000 0x1000>;
-
- panel: display@0 {
- compatible = "arm,versatile-tft-panel";
-
- port {
- panel_in: endpoint {
- remote-endpoint = <&foo>;
- };
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/arm,versatile-tft-panel.yaml b/Documentation/devicetree/bindings/display/panel/arm,versatile-tft-panel.yaml
new file mode 100644
index 000000000000..41fd5713c156
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/arm,versatile-tft-panel.yaml
@@ -0,0 +1,54 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/arm,versatile-tft-panel.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: ARM Versatile TFT Panels
+
+maintainers:
+ - Linus Walleij <linus.walleij@linaro.org>
+
+description: |
+ These panels are connected to the daughterboards found on the
+ ARM Versatile reference designs.
+
+ This device node must appear as a child to a "syscon"-compatible
+ node.
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: arm,versatile-tft-panel
+
+ port: true
+
+required:
+ - compatible
+ - port
+
+additionalProperties: false
+
+examples:
+ - |
+ sysreg {
+ compatible = "arm,versatile-sysreg", "syscon", "simple-mfd";
+ reg = <0x00000 0x1000>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel {
+ compatible = "arm,versatile-tft-panel";
+
+ port {
+ panel_in: endpoint {
+ remote-endpoint = <&foo>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/asus,z00t-tm5p5-nt35596.yaml b/Documentation/devicetree/bindings/display/panel/asus,z00t-tm5p5-nt35596.yaml
new file mode 100644
index 000000000000..083d2b9d0c69
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/asus,z00t-tm5p5-nt35596.yaml
@@ -0,0 +1,56 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/asus,z00t-tm5p5-nt35596.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: ASUS Z00T TM5P5 NT35596 5.5" 1080×1920 LCD Panel
+
+maintainers:
+ - Konrad Dybcio <konradybcio@gmail.com>
+
+description: |+
+ This panel seems to only be found in the Asus Z00T
+ smartphone and we have no straightforward way of
+ actually getting the correct model number,
+ as no schematics are released publicly.
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: asus,z00t-tm5p5-n35596
+ reg: true
+ reset-gpios: true
+ vdd-supply:
+ description: core voltage supply
+ vddio-supply:
+ description: vddio supply
+
+required:
+ - compatible
+ - reg
+ - vdd-supply
+ - vddio-supply
+ - reset-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ panel@0 {
+ reg = <0>;
+
+ compatible = "asus,z00t-tm5p5-n35596";
+
+ vdd-supply = <&pm8916_l8>;
+ vddio-supply = <&pm8916_l6>;
+ reset-gpios = <&msmgpio 25 GPIO_ACTIVE_HIGH>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/display/panel/boe,himax8279d.txt b/Documentation/devicetree/bindings/display/panel/boe,himax8279d.txt
deleted file mode 100644
index 3caea2172b1b..000000000000
--- a/Documentation/devicetree/bindings/display/panel/boe,himax8279d.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-Boe Himax8279d 1200x1920 TFT LCD panel
-
-Required properties:
-- compatible: should be "boe,himax8279d8p" and one of: "boe,himax8279d10p"
-- reg: DSI virtual channel of the peripheral
-- enable-gpios: panel enable gpio
-- pp33-gpios: a GPIO phandle for the 3.3v pin that provides the supply voltage
-- pp18-gpios: a GPIO phandle for the 1.8v pin that provides the supply voltage
-
-Optional properties:
-- backlight: phandle of the backlight device attached to the panel
-
-Example:
-
- &mipi_dsi {
- panel {
- compatible = "boe,himax8279d8p", "boe,himax8279d10p";
- reg = <0>;
- backlight = <&backlight>;
- enable-gpios = <&gpio 45 GPIO_ACTIVE_HIGH>;
- pp33-gpios = <&gpio 35 GPIO_ACTIVE_HIGH>;
- pp18-gpios = <&gpio 36 GPIO_ACTIVE_HIGH>;
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/boe,himax8279d.yaml b/Documentation/devicetree/bindings/display/panel/boe,himax8279d.yaml
new file mode 100644
index 000000000000..272a3a018a33
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/boe,himax8279d.yaml
@@ -0,0 +1,59 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/boe,himax8279d.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Boe Himax8279d 1200x1920 TFT LCD panel
+
+maintainers:
+ - Jerry Han <jerry.han.hq@gmail.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ items:
+ - const: boe,himax8279d8p
+ - const: boe,himax8279d10p
+
+ backlight: true
+ enable-gpios: true
+ reg: true
+
+ pp33-gpios:
+ maxItems: 1
+ description: GPIO for the 3.3v pin that provides the supply voltage
+
+ pp18-gpios:
+ maxItems: 1
+ description: GPIO for the 1.8v pin that provides the supply voltage
+
+required:
+ - compatible
+ - reg
+ - enable-gpios
+ - pp33-gpios
+ - pp18-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ panel@0 {
+ compatible = "boe,himax8279d8p", "boe,himax8279d10p";
+ reg = <0>;
+ backlight = <&backlight>;
+ enable-gpios = <&gpio 45 GPIO_ACTIVE_HIGH>;
+ pp33-gpios = <&gpio 35 GPIO_ACTIVE_HIGH>;
+ pp18-gpios = <&gpio 36 GPIO_ACTIVE_HIGH>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/boe,tv101wum-nl6.yaml b/Documentation/devicetree/bindings/display/panel/boe,tv101wum-nl6.yaml
index 740213459134..7f5df5851017 100644
--- a/Documentation/devicetree/bindings/display/panel/boe,tv101wum-nl6.yaml
+++ b/Documentation/devicetree/bindings/display/panel/boe,tv101wum-nl6.yaml
@@ -24,6 +24,8 @@ properties:
- boe,tv101wum-n53
# AUO B101UAN08.3 10.1" WUXGA TFT LCD panel
- auo,b101uan08.3
+ # BOE TV105WUM-NW0 10.5" WUXGA TFT LCD panel
+ - boe,tv105wum-nw0
reg:
description: the virtual channel number of a DSI peripheral
diff --git a/Documentation/devicetree/bindings/display/panel/display-timings.yaml b/Documentation/devicetree/bindings/display/panel/display-timings.yaml
index c8c0c9cb0492..56903ded005e 100644
--- a/Documentation/devicetree/bindings/display/panel/display-timings.yaml
+++ b/Documentation/devicetree/bindings/display/panel/display-timings.yaml
@@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/display/panel/display-timings.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: display timing bindings
+title: display timings bindings
maintainers:
- Thierry Reding <thierry.reding@gmail.com>
@@ -14,7 +14,7 @@ maintainers:
description: |
A display panel may be able to handle several display timings,
with different resolutions.
- The display-timings node makes it possible to specify the timing
+ The display-timings node makes it possible to specify the timings
and to specify the timing that is native for the display.
properties:
@@ -25,8 +25,8 @@ properties:
$ref: /schemas/types.yaml#/definitions/phandle
description: |
The default display timing is the one specified as native-mode.
- If no native-mode is specified then the first node is assumed the
- native mode.
+ If no native-mode is specified then the first node is assumed
+ to be the native mode.
patternProperties:
"^timing":
diff --git a/Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.txt b/Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.txt
deleted file mode 100644
index 82caa7b65ae8..000000000000
--- a/Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-Feiyang FY07024DI26A30-D 7" MIPI-DSI LCD Panel
-
-Required properties:
-- compatible: must be "feiyang,fy07024di26a30d"
-- reg: DSI virtual channel used by that screen
-- avdd-supply: analog regulator dc1 switch
-- dvdd-supply: 3v3 digital regulator
-- reset-gpios: a GPIO phandle for the reset pin
-
-Optional properties:
-- backlight: phandle for the backlight control.
-
-panel@0 {
- compatible = "feiyang,fy07024di26a30d";
- reg = <0>;
- avdd-supply = <&reg_dc1sw>;
- dvdd-supply = <&reg_dldo2>;
- reset-gpios = <&pio 3 24 GPIO_ACTIVE_HIGH>; /* LCD-RST: PD24 */
- backlight = <&backlight>;
-};
diff --git a/Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.yaml b/Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.yaml
new file mode 100644
index 000000000000..95acf9e96f1c
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/feiyang,fy07024di26a30d.yaml
@@ -0,0 +1,58 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/feiyang,fy07024di26a30d.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Feiyang FY07024DI26A30-D 7" MIPI-DSI LCD Panel
+
+maintainers:
+ - Jagan Teki <jagan@amarulasolutions.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: feiyang,fy07024di26a30d
+
+ reg:
+ description: DSI virtual channel used by that screen
+ maxItems: 1
+
+ avdd-supply:
+ description: analog regulator dc1 switch
+
+ dvdd-supply:
+ description: 3v3 digital regulator
+
+ reset-gpios: true
+
+ backlight: true
+
+required:
+ - compatible
+ - reg
+ - avdd-supply
+ - dvdd-supply
+ - reset-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "feiyang,fy07024di26a30d";
+ reg = <0>;
+ avdd-supply = <&reg_dc1sw>;
+ dvdd-supply = <&reg_dldo2>;
+ reset-gpios = <&pio 3 24 GPIO_ACTIVE_HIGH>; /* LCD-RST: PD24 */
+ backlight = <&backlight>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/display/panel/ilitek,ili9322.txt b/Documentation/devicetree/bindings/display/panel/ilitek,ili9322.txt
deleted file mode 100644
index 3d5ce6ad6ec7..000000000000
--- a/Documentation/devicetree/bindings/display/panel/ilitek,ili9322.txt
+++ /dev/null
@@ -1,49 +0,0 @@
-Ilitek ILI9322 TFT panel driver with SPI control bus
-
-This is a driver for 320x240 TFT panels, accepting a variety of input
-streams that get adapted and scaled to the panel. The panel output has
-960 TFT source driver pins and 240 TFT gate driver pins, VCOM, VCOML and
-VCOMH outputs.
-
-Required properties:
- - compatible: "dlink,dir-685-panel", "ilitek,ili9322"
- (full system-specific compatible is always required to look up configuration)
- - reg: address of the panel on the SPI bus
-
-Optional properties:
- - vcc-supply: core voltage supply, see regulator/regulator.txt
- - iovcc-supply: voltage supply for the interface input/output signals,
- see regulator/regulator.txt
- - vci-supply: voltage supply for analog parts, see regulator/regulator.txt
- - reset-gpios: a GPIO spec for the reset pin, see gpio/gpio.txt
-
- The following optional properties only apply to RGB and YUV input modes and
- can be omitted for BT.656 input modes:
-
- - pixelclk-active: see display/panel/display-timing.txt
- - de-active: see display/panel/display-timing.txt
- - hsync-active: see display/panel/display-timing.txt
- - vsync-active: see display/panel/display-timing.txt
-
-The panel must obey the rules for a SPI slave device as specified in
-spi/spi-bus.txt
-
-The device node can contain one 'port' child node with one child
-'endpoint' node, according to the bindings defined in
-media/video-interfaces.txt. This node should describe panel's video bus.
-
-Example:
-
-panel: display@0 {
- compatible = "dlink,dir-685-panel", "ilitek,ili9322";
- reg = <0>;
- vcc-supply = <&vdisp>;
- iovcc-supply = <&vdisp>;
- vci-supply = <&vdisp>;
-
- port {
- panel_in: endpoint {
- remote-endpoint = <&display_out>;
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/ilitek,ili9322.yaml b/Documentation/devicetree/bindings/display/panel/ilitek,ili9322.yaml
new file mode 100644
index 000000000000..177d48c5bd97
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/ilitek,ili9322.yaml
@@ -0,0 +1,71 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/ilitek,ili9322.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Ilitek ILI9322 TFT panel driver with SPI control bus
+
+maintainers:
+ - Linus Walleij <linus.walleij@linaro.org>
+
+description: |
+ This is a driver for 320x240 TFT panels, accepting a variety of input
+ streams that get adapted and scaled to the panel. The panel output has
+ 960 TFT source driver pins and 240 TFT gate driver pins, VCOM, VCOML and
+ VCOMH outputs.
+
+ The panel must obey the rules for a SPI slave device as specified in
+ spi/spi-controller.yaml
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ items:
+ - enum:
+ - dlink,dir-685-panel
+
+ - const: ilitek,ili9322
+
+ reset-gpios: true
+ port: true
+
+ vcc-supply:
+ description: Core voltage supply
+
+ iovcc-supply:
+ description: Voltage supply for the interface input/output signals
+
+ vci-supply:
+ description: Voltage supply for analog parts
+
+required:
+ - compatible
+ - reg
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel: display@0 {
+ compatible = "dlink,dir-685-panel", "ilitek,ili9322";
+ reg = <0>;
+ vcc-supply = <&vdisp>;
+ iovcc-supply = <&vdisp>;
+ vci-supply = <&vdisp>;
+
+ port {
+ panel_in: endpoint {
+ remote-endpoint = <&display_out>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/ilitek,ili9881c.txt b/Documentation/devicetree/bindings/display/panel/ilitek,ili9881c.txt
deleted file mode 100644
index 4a041acb4e18..000000000000
--- a/Documentation/devicetree/bindings/display/panel/ilitek,ili9881c.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-Ilitek ILI9881c based MIPI-DSI panels
-
-Required properties:
- - compatible: must be "ilitek,ili9881c" and one of:
- * "bananapi,lhr050h41"
- - reg: DSI virtual channel used by that screen
- - power-supply: phandle to the power regulator
- - reset-gpios: a GPIO phandle for the reset pin
-
-Optional properties:
- - backlight: phandle to the backlight used
-
-Example:
-panel@0 {
- compatible = "bananapi,lhr050h41", "ilitek,ili9881c";
- reg = <0>;
- power-supply = <&reg_display>;
- reset-gpios = <&r_pio 0 5 GPIO_ACTIVE_LOW>; /* PL05 */
- backlight = <&pwm_bl>;
-};
diff --git a/Documentation/devicetree/bindings/display/panel/ilitek,ili9881c.yaml b/Documentation/devicetree/bindings/display/panel/ilitek,ili9881c.yaml
new file mode 100644
index 000000000000..a39332276bab
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/ilitek,ili9881c.yaml
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/ilitek,ili9881c.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Ilitek ILI9881c based MIPI-DSI panels
+
+maintainers:
+ - Maxime Ripard <mripard@kernel.org>
+
+properties:
+ compatible:
+ items:
+ - enum:
+ - bananapi,lhr050h41
+
+ - const: ilitek,ili9881c
+
+ backlight: true
+ power-supply: true
+ reg: true
+ reset-gpios: true
+
+required:
+ - compatible
+ - power-supply
+ - reg
+ - reset-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "bananapi,lhr050h41", "ilitek,ili9881c";
+ reg = <0>;
+ power-supply = <&reg_display>;
+ reset-gpios = <&r_pio 0 5 GPIO_ACTIVE_LOW>; /* PL05 */
+ backlight = <&pwm_bl>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/innolux,p097pfg.txt b/Documentation/devicetree/bindings/display/panel/innolux,p097pfg.txt
deleted file mode 100644
index d1cab3a8f0fb..000000000000
--- a/Documentation/devicetree/bindings/display/panel/innolux,p097pfg.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-Innolux P097PFG 9.7" 1536x2048 TFT LCD panel
-
-Required properties:
-- compatible: should be "innolux,p097pfg"
-- reg: DSI virtual channel of the peripheral
-- avdd-supply: phandle of the regulator that provides positive voltage
-- avee-supply: phandle of the regulator that provides negative voltage
-- enable-gpios: panel enable gpio
-
-Optional properties:
-- backlight: phandle of the backlight device attached to the panel
-
-Example:
-
- &mipi_dsi {
- panel@0 {
- compatible = "innolux,p079zca";
- reg = <0>;
- avdd-supply = <...>;
- avee-supply = <...>;
- backlight = <&backlight>;
- enable-gpios = <&gpio1 13 GPIO_ACTIVE_HIGH>;
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/innolux,p097pfg.yaml b/Documentation/devicetree/bindings/display/panel/innolux,p097pfg.yaml
new file mode 100644
index 000000000000..5a5f071627fb
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/innolux,p097pfg.yaml
@@ -0,0 +1,56 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/innolux,p097pfg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Innolux P097PFG 9.7" 1536x2048 TFT LCD panel
+
+maintainers:
+ - Lin Huang <hl@rock-chips.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: innolux,p097pfg
+
+ backlight: true
+ enable-gpios: true
+ reg: true
+
+ avdd-supply:
+ description: The regulator that provides positive voltage
+
+ avee-supply:
+ description: The regulator that provides negative voltage
+
+required:
+ - compatible
+ - reg
+ - avdd-supply
+ - avee-supply
+ - enable-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "innolux,p097pfg";
+ reg = <0>;
+ avdd-supply = <&avdd>;
+ avee-supply = <&avee>;
+ backlight = <&backlight>;
+ enable-gpios = <&gpio1 13 GPIO_ACTIVE_HIGH>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/innolux,p120zdg-bf1.txt b/Documentation/devicetree/bindings/display/panel/innolux,p120zdg-bf1.txt
deleted file mode 100644
index 513f03466aba..000000000000
--- a/Documentation/devicetree/bindings/display/panel/innolux,p120zdg-bf1.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-Innolux P120ZDG-BF1 12.02 inch eDP 2K display panel
-
-This binding is compatible with the simple-panel binding, which is specified
-in simple-panel.txt in this directory.
-
-Required properties:
-- compatible: should be "innolux,p120zdg-bf1"
-- power-supply: regulator to provide the supply voltage
-
-Optional properties:
-- enable-gpios: GPIO pin to enable or disable the panel
-- backlight: phandle of the backlight device attached to the panel
-- no-hpd: If HPD isn't hooked up; add this property.
-
-Example:
- panel_edp: panel-edp {
- compatible = "innolux,p120zdg-bf1";
- enable-gpios = <&msmgpio 31 GPIO_ACTIVE_LOW>;
- power-supply = <&pm8916_l2>;
- backlight = <&backlight>;
- no-hpd;
- };
diff --git a/Documentation/devicetree/bindings/display/panel/innolux,p120zdg-bf1.yaml b/Documentation/devicetree/bindings/display/panel/innolux,p120zdg-bf1.yaml
new file mode 100644
index 000000000000..243dac2416f3
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/innolux,p120zdg-bf1.yaml
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/innolux,p120zdg-bf1.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Innolux P120ZDG-BF1 12.02 inch eDP 2K display panel
+
+maintainers:
+ - Sandeep Panda <spanda@codeaurora.org>
+ - Douglas Anderson <dianders@chromium.org>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: innolux,p120zdg-bf1
+
+ enable-gpios: true
+ power-supply: true
+ backlight: true
+ no-hpd: true
+
+required:
+ - compatible
+ - power-supply
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ panel_edp: panel-edp {
+ compatible = "innolux,p120zdg-bf1";
+ enable-gpios = <&msmgpio 31 GPIO_ACTIVE_LOW>;
+ power-supply = <&pm8916_l2>;
+ backlight = <&backlight>;
+ no-hpd;
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/jdi,lt070me05000.txt b/Documentation/devicetree/bindings/display/panel/jdi,lt070me05000.txt
deleted file mode 100644
index 4989c91d505f..000000000000
--- a/Documentation/devicetree/bindings/display/panel/jdi,lt070me05000.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-JDI model LT070ME05000 1200x1920 7" DSI Panel
-
-Required properties:
-- compatible: should be "jdi,lt070me05000"
-- vddp-supply: phandle of the regulator that provides the supply voltage
- Power IC supply (3-5V)
-- iovcc-supply: phandle of the regulator that provides the supply voltage
- IOVCC , power supply for LCM (1.8V)
-- enable-gpios: phandle of gpio for enable line
- LED_EN, LED backlight enable, High active
-- reset-gpios: phandle of gpio for reset line
- This should be 8mA, gpio can be configured using mux, pinctrl, pinctrl-names
- XRES, Reset, Low active
-- dcdc-en-gpios: phandle of the gpio for power ic line
- Power IC supply enable, High active
-
-Example:
-
- dsi0: qcom,mdss_dsi@4700000 {
- panel@0 {
- compatible = "jdi,lt070me05000";
- reg = <0>;
-
- vddp-supply = <&pm8921_l17>;
- iovcc-supply = <&pm8921_lvs7>;
-
- enable-gpios = <&pm8921_gpio 36 GPIO_ACTIVE_HIGH>;
- reset-gpios = <&tlmm_pinmux 54 GPIO_ACTIVE_LOW>;
- dcdc-en-gpios = <&pm8921_gpio 23 GPIO_ACTIVE_HIGH>;
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/jdi,lt070me05000.yaml b/Documentation/devicetree/bindings/display/panel/jdi,lt070me05000.yaml
new file mode 100644
index 000000000000..b8b9435e464c
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/jdi,lt070me05000.yaml
@@ -0,0 +1,69 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/jdi,lt070me05000.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: JDI model LT070ME05000 1200x1920 7" DSI Panel
+
+maintainers:
+ - Vinay Simha BN <simhavcs@gmail.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: jdi,lt070me05000
+
+ enable-gpios: true
+ reg: true
+ reset-gpios: true
+
+ vddp-supply:
+ description: |
+ The regulator that provides the supply voltage Power IC supply (3-5V)
+
+ iovcc-supply:
+ description: |
+ The regulator that provides the supply voltage IOVCC,
+ power supply for LCM (1.8V)
+
+ dcdc-en-gpios:
+ description: |
+ phandle of the gpio for power ic line
+ Power IC supply enable, High active
+
+required:
+ - compatible
+ - reg
+ - vddp-supply
+ - iovcc-supply
+ - enable-gpios
+ - reset-gpios
+ - dcdc-en-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "jdi,lt070me05000";
+ reg = <0>;
+
+ vddp-supply = <&pm8921_l17>;
+ iovcc-supply = <&pm8921_lvs7>;
+
+ enable-gpios = <&pm8921_gpio 36 GPIO_ACTIVE_HIGH>;
+ reset-gpios = <&tlmm_pinmux 54 GPIO_ACTIVE_LOW>;
+ dcdc-en-gpios = <&pm8921_gpio 23 GPIO_ACTIVE_HIGH>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.txt b/Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.txt
deleted file mode 100644
index fa9596082e44..000000000000
--- a/Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.txt
+++ /dev/null
@@ -1,42 +0,0 @@
-King Display KD035G6-54NT 3.5" (320x240 pixels) 24-bit TFT LCD panel
-
-Required properties:
-- compatible: should be "kingdisplay,kd035g6-54nt"
-- power-supply: See panel-common.txt
-- reset-gpios: See panel-common.txt
-
-Optional properties:
-- backlight: see panel-common.txt
-
-The generic bindings for the SPI slaves documented in [1] also apply.
-
-The device node can contain one 'port' child node with one child
-'endpoint' node, according to the bindings defined in [2]. This
-node should describe panel's video bus.
-
-[1]: Documentation/devicetree/bindings/spi/spi-bus.txt
-[2]: Documentation/devicetree/bindings/graph.txt
-
-Example:
-
-&spi {
- panel@0 {
- compatible = "kingdisplay,kd035g6-54nt";
- reg = <0>;
-
- spi-max-frequency = <3125000>;
- spi-3wire;
- spi-cs-high;
-
- reset-gpios = <&gpe 2 GPIO_ACTIVE_LOW>;
-
- backlight = <&backlight>;
- power-supply = <&ldo6>;
-
- port {
- panel_input: endpoint {
- remote-endpoint = <&panel_output>;
- };
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.yaml b/Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.yaml
new file mode 100644
index 000000000000..6960036975fa
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/kingdisplay,kd035g6-54nt.yaml
@@ -0,0 +1,65 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/kingdisplay,kd035g6-54nt.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: King Display KD035G6-54NT 3.5" (320x240 pixels) 24-bit TFT LCD panel
+
+description: |
+ The panel must obey the rules for a SPI slave device as specified in
+ spi/spi-controller.yaml
+
+maintainers:
+ - Paul Cercueil <paul@crapouillou.net>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: kingdisplay,kd035g6-54nt
+
+ backlight: true
+ port: true
+ power-supply: true
+ reg: true
+ reset-gpios: true
+
+required:
+ - compatible
+ - power-supply
+ - reset-gpios
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "kingdisplay,kd035g6-54nt";
+ reg = <0>;
+
+ spi-max-frequency = <3125000>;
+ spi-3wire;
+ spi-cs-high;
+
+ reset-gpios = <&gpe 2 GPIO_ACTIVE_LOW>;
+
+ backlight = <&backlight>;
+ power-supply = <&ldo6>;
+
+ port {
+ panel_input: endpoint {
+ remote-endpoint = <&panel_output>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/kingdisplay,kd097d04.txt b/Documentation/devicetree/bindings/display/panel/kingdisplay,kd097d04.txt
deleted file mode 100644
index cfefff688614..000000000000
--- a/Documentation/devicetree/bindings/display/panel/kingdisplay,kd097d04.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-Kingdisplay KD097D04 9.7" 1536x2048 TFT LCD panel
-
-Required properties:
-- compatible: should be "kingdisplay,kd097d04"
-- reg: DSI virtual channel of the peripheral
-- power-supply: phandle of the regulator that provides the supply voltage
-- enable-gpios: panel enable gpio
-
-Optional properties:
-- backlight: phandle of the backlight device attached to the panel
-
-Example:
-
- &mipi_dsi {
- panel@0 {
- compatible = "kingdisplay,kd097d04";
- reg = <0>;
- power-supply = <...>;
- backlight = <&backlight>;
- enable-gpios = <&gpio1 13 GPIO_ACTIVE_HIGH>;
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/leadtek,ltk050h3146w.yaml b/Documentation/devicetree/bindings/display/panel/leadtek,ltk050h3146w.yaml
new file mode 100644
index 000000000000..a372bdc5bde1
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/leadtek,ltk050h3146w.yaml
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/leadtek,ltk050h3146w.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Leadtek LTK050H3146W 5.0in 720x1280 DSI panel
+
+maintainers:
+ - Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ enum:
+ - leadtek,ltk050h3146w
+ - leadtek,ltk050h3146w-a2
+ reg: true
+ backlight: true
+ reset-gpios: true
+ iovcc-supply:
+ description: regulator that supplies the iovcc voltage
+ vci-supply:
+ description: regulator that supplies the vci voltage
+
+required:
+ - compatible
+ - reg
+ - backlight
+ - iovcc-supply
+ - vci-supply
+
+additionalProperties: false
+
+examples:
+ - |
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ panel@0 {
+ compatible = "leadtek,ltk050h3146w";
+ reg = <0>;
+ backlight = <&backlight>;
+ iovcc-supply = <&vcc_1v8>;
+ vci-supply = <&vcc3v3_lcd>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml b/Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml
index fd931b293816..b900973b5f7b 100644
--- a/Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml
+++ b/Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml
@@ -37,7 +37,6 @@ examples:
dsi {
#address-cells = <1>;
#size-cells = <0>;
- reg = <0xff450000 0x1000>;
panel@0 {
compatible = "leadtek,ltk500hd1829";
diff --git a/Documentation/devicetree/bindings/display/panel/lg,acx467akm-7.txt b/Documentation/devicetree/bindings/display/panel/lg,acx467akm-7.txt
deleted file mode 100644
index fc1e1b325e49..000000000000
--- a/Documentation/devicetree/bindings/display/panel/lg,acx467akm-7.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-LG ACX467AKM-7 4.95" 1080×1920 LCD Panel
-
-Required properties:
-- compatible: must be "lg,acx467akm-7"
-
-This binding is compatible with the simple-panel binding, which is specified
-in simple-panel.txt in this directory.
diff --git a/Documentation/devicetree/bindings/display/panel/lg,ld070wx3-sl01.txt b/Documentation/devicetree/bindings/display/panel/lg,ld070wx3-sl01.txt
deleted file mode 100644
index 5e649cb9aa1a..000000000000
--- a/Documentation/devicetree/bindings/display/panel/lg,ld070wx3-sl01.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-LG Corporation 7" WXGA TFT LCD panel
-
-Required properties:
-- compatible: should be "lg,ld070wx3-sl01"
-
-This binding is compatible with the simple-panel binding, which is specified
-in simple-panel.txt in this directory.
diff --git a/Documentation/devicetree/bindings/display/panel/lg,lg4573.txt b/Documentation/devicetree/bindings/display/panel/lg,lg4573.txt
deleted file mode 100644
index 824441f4e95a..000000000000
--- a/Documentation/devicetree/bindings/display/panel/lg,lg4573.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-LG LG4573 TFT Liquid Crystal Display with SPI control bus
-
-Required properties:
- - compatible: "lg,lg4573"
- - reg: address of the panel on the SPI bus
-
-The panel must obey rules for SPI slave device specified in document [1].
-
-[1]: Documentation/devicetree/bindings/spi/spi-bus.txt
-
-Example:
-
- lcd_panel: display@0 {
- #address-cells = <1>;
- #size-cells = <1>;
- compatible = "lg,lg4573";
- spi-max-frequency = <10000000>;
- reg = <0>;
- };
diff --git a/Documentation/devicetree/bindings/display/panel/lg,lg4573.yaml b/Documentation/devicetree/bindings/display/panel/lg,lg4573.yaml
new file mode 100644
index 000000000000..b4314ce7b411
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/lg,lg4573.yaml
@@ -0,0 +1,45 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/lg,lg4573.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: LG LG4573 TFT Liquid Crystal Display with SPI control bus
+
+description: |
+ The panel must obey the rules for a SPI slave device as specified in
+ spi/spi-controller.yaml
+
+maintainers:
+ - Heiko Schocher <hs@denx.de>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: lg,lg4573
+
+ reg: true
+ spi-max-frequency: true
+
+required:
+ - compatible
+ - reg
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ lcd_panel: display@0 {
+ compatible = "lg,lg4573";
+ spi-max-frequency = <10000000>;
+ reg = <0>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/lg,lh500wx1-sd03.txt b/Documentation/devicetree/bindings/display/panel/lg,lh500wx1-sd03.txt
deleted file mode 100644
index a04fd2b2e73d..000000000000
--- a/Documentation/devicetree/bindings/display/panel/lg,lh500wx1-sd03.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-LG Corporation 5" HD TFT LCD panel
-
-Required properties:
-- compatible: should be "lg,lh500wx1-sd03"
-
-This binding is compatible with the simple-panel binding, which is specified
-in simple-panel.txt in this directory.
diff --git a/Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.txt b/Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.txt
deleted file mode 100644
index 1a1e653e5407..000000000000
--- a/Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-LG.Philips LB035Q02 Panel
-=========================
-
-Required properties:
-- compatible: "lgphilips,lb035q02"
-- enable-gpios: panel enable gpio
-
-Optional properties:
-- label: a symbolic name for the panel
-
-Required nodes:
-- Video port for DPI input
-
-Example
--------
-
-lcd-panel: panel@0 {
- compatible = "lgphilips,lb035q02";
- reg = <0>;
- spi-max-frequency = <100000>;
- spi-cpol;
- spi-cpha;
-
- label = "lcd";
-
- enable-gpios = <&gpio7 7 0>;
-
- port {
- lcd_in: endpoint {
- remote-endpoint = <&dpi_out>;
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.yaml b/Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.yaml
new file mode 100644
index 000000000000..830e335ddb53
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/lgphilips,lb035q02.yaml
@@ -0,0 +1,59 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/lgphilips,lb035q02.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: LG.Philips LB035Q02 Panel
+
+description: |
+ The panel must obey the rules for a SPI slave device as specified in
+ spi/spi-controller.yaml
+
+maintainers:
+ - Tomi Valkeinen <tomi.valkeinen@ti.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: lgphilips,lb035q02
+
+ label: true
+ enable-gpios: true
+ port: true
+
+required:
+ - compatible
+ - enable-gpios
+ - port
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel: panel@0 {
+ compatible = "lgphilips,lb035q02";
+ reg = <0>;
+ spi-max-frequency = <100000>;
+ spi-cpol;
+ spi-cpha;
+
+ label = "lcd";
+
+ enable-gpios = <&gpio7 7 0>;
+
+ port {
+ lcd_in: endpoint {
+ remote-endpoint = <&dpi_out>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/lvds.yaml b/Documentation/devicetree/bindings/display/panel/lvds.yaml
index d0083301acbe..946dd354256c 100644
--- a/Documentation/devicetree/bindings/display/panel/lvds.yaml
+++ b/Documentation/devicetree/bindings/display/panel/lvds.yaml
@@ -96,12 +96,20 @@ properties:
If set, reverse the bit order described in the data mappings below on all
data lanes, transmitting bits for slots 6 to 0 instead of 0 to 6.
+ port: true
+ ports: true
+
required:
- compatible
- data-mapping
- width-mm
- height-mm
- panel-timing
- - port
+
+oneOf:
+ - required:
+ - port
+ - required:
+ - ports
...
diff --git a/Documentation/devicetree/bindings/display/panel/olimex,lcd-olinuxino.txt b/Documentation/devicetree/bindings/display/panel/olimex,lcd-olinuxino.txt
deleted file mode 100644
index a89f9c830a85..000000000000
--- a/Documentation/devicetree/bindings/display/panel/olimex,lcd-olinuxino.txt
+++ /dev/null
@@ -1,42 +0,0 @@
-Binding for Olimex Ltd. LCD-OLinuXino bridge panel.
-
-This device can be used as bridge between a host controller and LCD panels.
-Currently supported LCDs are:
- - LCD-OLinuXino-4.3TS
- - LCD-OLinuXino-5
- - LCD-OLinuXino-7
- - LCD-OLinuXino-10
-
-The panel itself contains:
- - AT24C16C EEPROM holding panel identification and timing requirements
- - AR1021 resistive touch screen controller (optional)
- - FT5x6 capacitive touch screnn controller (optional)
- - GT911/GT928 capacitive touch screen controller (optional)
-
-The above chips share same I2C bus. The EEPROM is factory preprogrammed with
-device information (id, serial, etc.) and timing requirements.
-
-Touchscreen bingings can be found in these files:
- - input/touchscreen/goodix.txt
- - input/touchscreen/edt-ft5x06.txt
- - input/touchscreen/ar1021.txt
-
-Required properties:
- - compatible: should be "olimex,lcd-olinuxino"
- - reg: address of the configuration EEPROM, should be <0x50>
- - power-supply: phandle of the regulator that provides the supply voltage
-
-Optional properties:
- - enable-gpios: GPIO pin to enable or disable the panel
- - backlight: phandle of the backlight device attacked to the panel
-
-Example:
-&i2c2 {
- panel@50 {
- compatible = "olimex,lcd-olinuxino";
- reg = <0x50>;
- power-supply = <&reg_vcc5v0>;
- enable-gpios = <&pio 7 8 GPIO_ACTIVE_HIGH>;
- backlight = <&backlight>;
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/olimex,lcd-olinuxino.yaml b/Documentation/devicetree/bindings/display/panel/olimex,lcd-olinuxino.yaml
new file mode 100644
index 000000000000..2329d9610f83
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/olimex,lcd-olinuxino.yaml
@@ -0,0 +1,70 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/olimex,lcd-olinuxino.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Binding for Olimex Ltd. LCD-OLinuXino bridge panel.
+
+maintainers:
+ - Stefan Mavrodiev <stefan@olimex.com>
+
+description: |
+ This device can be used as bridge between a host controller and LCD panels.
+ Currently supported LCDs are:
+ - LCD-OLinuXino-4.3TS
+ - LCD-OLinuXino-5
+ - LCD-OLinuXino-7
+ - LCD-OLinuXino-10
+
+ The panel itself contains:
+ - AT24C16C EEPROM holding panel identification and timing requirements
+ - AR1021 resistive touch screen controller (optional)
+ - FT5x6 capacitive touch screnn controller (optional)
+ - GT911/GT928 capacitive touch screen controller (optional)
+
+ The above chips share same I2C bus. The EEPROM is factory preprogrammed with
+ device information (id, serial, etc.) and timing requirements.
+
+ Touchscreen bingings can be found in these files:
+ - input/touchscreen/goodix.yaml
+ - input/touchscreen/edt-ft5x06.txt
+ - input/touchscreen/ar1021.txt
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: olimex,lcd-olinuxino
+
+ backlight: true
+ enable-gpios: true
+ power-supply: true
+ reg: true
+
+required:
+ - compatible
+ - reg
+ - power-supply
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ i2c {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@50 {
+ compatible = "olimex,lcd-olinuxino";
+ reg = <0x50>;
+ power-supply = <&reg_vcc5v0>;
+ enable-gpios = <&pio 7 8 GPIO_ACTIVE_HIGH>;
+ backlight = <&backlight>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/osddisplays,osd101t2587-53ts.txt b/Documentation/devicetree/bindings/display/panel/osddisplays,osd101t2587-53ts.txt
deleted file mode 100644
index 9d88e96003fc..000000000000
--- a/Documentation/devicetree/bindings/display/panel/osddisplays,osd101t2587-53ts.txt
+++ /dev/null
@@ -1,14 +0,0 @@
-One Stop Displays OSD101T2587-53TS 10.1" 1920x1200 panel
-
-The panel is similar to OSD101T2045-53TS, but it needs additional
-MIPI_DSI_TURN_ON_PERIPHERAL message from the host.
-
-Required properties:
-- compatible: should be "osddisplays,osd101t2587-53ts"
-- power-supply: as specified in the base binding
-
-Optional properties:
-- backlight: as specified in the base binding
-
-This binding is compatible with the simple-panel binding, which is specified
-in simple-panel.txt in this directory.
diff --git a/Documentation/devicetree/bindings/display/panel/panel-common.yaml b/Documentation/devicetree/bindings/display/panel/panel-common.yaml
index ed051ba12084..a747b755ad06 100644
--- a/Documentation/devicetree/bindings/display/panel/panel-common.yaml
+++ b/Documentation/devicetree/bindings/display/panel/panel-common.yaml
@@ -63,9 +63,9 @@ properties:
display-timings:
description:
- Some display panels supports several resolutions with different timing.
+ Some display panels support several resolutions with different timings.
The display-timings bindings supports specifying several timings and
- optional specify which is the native mode.
+ optionally specifying which is the native mode.
allOf:
- $ref: display-timings.yaml#
@@ -96,6 +96,12 @@ properties:
(hot plug detect) signal, but the signal isn't hooked up so we should
hardcode the max delay from the panel spec when powering up the panel.
+ hpd-gpios:
+ maxItems: 1
+ description:
+ If Hot Plug Detect (HPD) is connected to a GPIO in the system rather
+ than a dedicated HPD pin the pin can be specified here.
+
# Control I/Os
# Many display panels can be controlled through pins driven by GPIOs. The nature
@@ -124,6 +130,13 @@ properties:
while active. Active high reset signals can be supported by inverting the
GPIO specifier polarity flag.
+ te-gpios:
+ maxItems: 1
+ description:
+ GPIO spec for the tearing effect synchronization signal.
+ The tearing effect signal is active high. Active low signals can be
+ supported by inverting the GPIO specifier polarity flag.
+
# Power
power-supply:
description:
diff --git a/Documentation/devicetree/bindings/display/panel/panel-simple-dsi.yaml b/Documentation/devicetree/bindings/display/panel/panel-simple-dsi.yaml
index b2e8742fd6af..16778ce782fc 100644
--- a/Documentation/devicetree/bindings/display/panel/panel-simple-dsi.yaml
+++ b/Documentation/devicetree/bindings/display/panel/panel-simple-dsi.yaml
@@ -29,6 +29,20 @@ properties:
# compatible must be listed in alphabetical order, ordered by compatible.
# The description in the comment is mandatory for each compatible.
+ # AU Optronics Corporation 8.0" WUXGA TFT LCD panel
+ - auo,b080uan01
+ # Boe Corporation 8.0" WUXGA TFT LCD panel
+ - boe,tv080wum-nl0
+ # Kingdisplay KD097D04 9.7" 1536x2048 TFT LCD panel
+ - kingdisplay,kd097d04
+ # LG ACX467AKM-7 4.95" 1080×1920 LCD Panel
+ - lg,acx467akm-7
+ # LG Corporation 7" WXGA TFT LCD panel
+ - lg,ld070wx3-sl01
+ # One Stop Displays OSD101T2587-53TS 10.1" 1920x1200 panel
+ - osddisplays,osd101t2587-53ts
+ # Panasonic 10" WUXGA TFT LCD panel
+ - panasonic,vvx10f004b00
# Panasonic 10" WUXGA TFT LCD panel
- panasonic,vvx10f034n00
diff --git a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
index 393ffc6acbba..d6cca1479633 100644
--- a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
+++ b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
@@ -33,8 +33,6 @@ properties:
- ampire,am-480272h3tmqw-t01h
# Ampire AM-800480R3TMQW-A1H 7.0" WVGA TFT LCD panel
- ampire,am800480r3tmqwa1h
- # AU Optronics Corporation 8.0" WUXGA TFT LCD panel
- - auo,b080uan01
# AU Optronics Corporation 10.1" WSVGA TFT LCD panel
- auo,b101aw03
# AU Optronics Corporation 10.1" WSVGA TFT LCD panel
@@ -55,10 +53,16 @@ properties:
- auo,g101evn010
# AU Optronics Corporation 10.4" (800x600) color TFT LCD panel
- auo,g104sn02
+ # AU Optronics Corporation 12.1" (1280x800) TFT LCD panel
+ - auo,g121ean01
# AU Optronics Corporation 13.3" FHD (1920x1080) TFT LCD panel
- auo,g133han01
+ # AU Optronics Corporation 15.6" (1366x768) TFT LCD panel
+ - auo,g156xtn01
# AU Optronics Corporation 18.5" FHD (1920x1080) TFT LCD panel
- auo,g185han01
+ # AU Optronics Corporation 19.0" (1280x1024) TFT LCD panel
+ - auo,g190ean01
# AU Optronics Corporation 31.5" FHD (1920x1080) TFT LCD panel
- auo,p320hvn03
# AU Optronics Corporation 21.5" FHD (1920x1080) color TFT LCD panel
@@ -69,10 +73,12 @@ properties:
- boe,hv070wsa-100
# BOE OPTOELECTRONICS TECHNOLOGY 10.1" WXGA TFT LCD panel
- boe,nv101wxmn51
+ # BOE NV133FHM-N61 13.3" FHD (1920x1080) TFT LCD Panel
+ - boe,nv133fhm-n61
+ # BOE NV133FHM-N62 13.3" FHD (1920x1080) TFT LCD Panel
+ - boe,nv133fhm-n62
# BOE NV140FHM-N49 14.0" FHD a-Si FT panel
- boe,nv140fhmn49
- # Boe Corporation 8.0" WUXGA TFT LCD panel
- - boe,tv080wum-nl0
# CDTech(H.K.) Electronics Limited 4.3" 480x272 color TFT-LCD panel
- cdtech,s043wq26h-ct7
# CDTech(H.K.) Electronics Limited 7" 800x480 color TFT-LCD panel
@@ -82,6 +88,8 @@ properties:
# Chunghwa Picture Tubes Ltd. 10.1" WXGA TFT LCD panel
- chunghwa,claa101wa01a
# Chunghwa Picture Tubes Ltd. 10.1" WXGA TFT LCD panel
+ - chunghwa,claa101wb01
+ # Chunghwa Picture Tubes Ltd. 10.1" WXGA TFT LCD panel
- chunghwa,claa101wb03
# DataImage, Inc. 7" WVGA (800x480) TFT LCD panel with 24-bit parallel interface.
- dataimage,scf0700c48ggu18
@@ -127,6 +135,8 @@ properties:
- hannstar,hsd100pxn1
# Hitachi Ltd. Corporation 9" WVGA (800x480) TFT LCD panel
- hit,tx23d38vm0caa
+ # InfoVision Optoelectronics M133NWF4 R0 13.3" FHD (1920x1080) TFT LCD panel
+ - ivo,m133nwf4-r0
# Innolux AT043TN24 4.3" WQVGA TFT LCD panel
- innolux,at043tn24
# Innolux AT070TN92 7.0" WQVGA TFT LCD panel
@@ -155,6 +165,8 @@ properties:
- lemaker,bl035-rgb-002
# LG 7" (800x480 pixels) TFT LCD panel
- lg,lb070wv8
+ # LG Corporation 5" HD TFT LCD panel
+ - lg,lh500wx1-sd03
# LG LP079QX1-SP0V 7.9" (1536x2048 pixels) TFT LCD panel
- lg,lp079qx1-sp0v
# LG 9.7" (2048x1536 pixels) TFT LCD panel
@@ -227,6 +239,8 @@ properties:
- sharp,ls020b1dd01d
# Shelly SCA07010-BFN-LNN 7.0" WVGA TFT LCD panel
- shelly,sca07010-bfn-lnn
+ # Starry KR070PE2T 7" WVGA TFT LCD panel
+ - starry,kr070pe2t
# Starry 12.2" (1920x1200 pixels) TFT LCD panel
- starry,kr122ea0sra
# Tianma Micro-electronics TM070JDHG30 7.0" WXGA TFT LCD panel
diff --git a/Documentation/devicetree/bindings/display/panel/raydium,rm67191.txt b/Documentation/devicetree/bindings/display/panel/raydium,rm67191.txt
deleted file mode 100644
index 10424695aa02..000000000000
--- a/Documentation/devicetree/bindings/display/panel/raydium,rm67191.txt
+++ /dev/null
@@ -1,41 +0,0 @@
-Raydium RM67171 OLED LCD panel with MIPI-DSI protocol
-
-Required properties:
-- compatible: "raydium,rm67191"
-- reg: virtual channel for MIPI-DSI protocol
- must be <0>
-- dsi-lanes: number of DSI lanes to be used
- must be <3> or <4>
-- port: input port node with endpoint definition as
- defined in Documentation/devicetree/bindings/graph.txt;
- the input port should be connected to a MIPI-DSI device
- driver
-
-Optional properties:
-- reset-gpios: a GPIO spec for the RST_B GPIO pin
-- v3p3-supply: phandle to 3.3V regulator that powers the VDD_3V3 pin
-- v1p8-supply: phandle to 1.8V regulator that powers the VDD_1V8 pin
-- width-mm: see panel-common.txt
-- height-mm: see panel-common.txt
-- video-mode: 0 - burst-mode
- 1 - non-burst with sync event
- 2 - non-burst with sync pulse
-
-Example:
-
- panel@0 {
- compatible = "raydium,rm67191";
- reg = <0>;
- pinctrl-0 = <&pinctrl_mipi_dsi_0_1_en>;
- pinctrl-names = "default";
- reset-gpios = <&gpio1 7 GPIO_ACTIVE_LOW>;
- dsi-lanes = <4>;
- width-mm = <68>;
- height-mm = <121>;
-
- port {
- panel_in: endpoint {
- remote-endpoint = <&mipi_out>;
- };
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/raydium,rm67191.yaml b/Documentation/devicetree/bindings/display/panel/raydium,rm67191.yaml
new file mode 100644
index 000000000000..745dd247c409
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/raydium,rm67191.yaml
@@ -0,0 +1,75 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/raydium,rm67191.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Raydium RM67171 OLED LCD panel with MIPI-DSI protocol
+
+maintainers:
+ - Robert Chiras <robert.chiras@nxp.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: raydium,rm67191
+
+ reg: true
+ port: true
+ reset-gpios: true
+ width-mm: true
+ height-mm: true
+
+ dsi-lanes:
+ description: Number of DSI lanes to be used must be <3> or <4>
+ enum: [3, 4]
+
+ v3p3-supply:
+ description: phandle to 3.3V regulator that powers the VDD_3V3 pin
+
+ v1p8-supply:
+ description: phandle to 1.8V regulator that powers the VDD_1V8 pin
+
+ video-mode:
+ description: |
+ 0 - burst-mode
+ 1 - non-burst with sync event
+ 2 - non-burst with sync pulse
+ enum: [0, 1, 2]
+
+required:
+ - compatible
+ - reg
+ - dsi-lanes
+ - port
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "raydium,rm67191";
+ reg = <0>;
+ reset-gpios = <&gpio1 7 GPIO_ACTIVE_LOW>;
+ dsi-lanes = <4>;
+ width-mm = <68>;
+ height-mm = <121>;
+ video-mode = <1>;
+
+ port {
+ panel_in: endpoint {
+ remote-endpoint = <&mipi_out>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,amoled-mipi-dsi.yaml b/Documentation/devicetree/bindings/display/panel/samsung,amoled-mipi-dsi.yaml
new file mode 100644
index 000000000000..96bdde9298e0
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/samsung,amoled-mipi-dsi.yaml
@@ -0,0 +1,65 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/samsung,amoled-mipi-dsi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Samsung AMOLED MIPI-DSI panels
+
+maintainers:
+ - Hoegeun Kwon <hoegeun.kwon@samsung.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ enum:
+ # Samsung S6E63J0X03 1.63" 320x320 AMOLED panel
+ - samsung,s6e63j0x03
+ # Samsung S6E3HA2 5.7" 1440x2560 AMOLED panel
+ - samsung,s6e3ha2
+ # Samsung S6E3HF2 5.65" 1600x2560 AMOLED panel
+ - samsung,s6e3hf2
+
+ reg: true
+ reset-gpios: true
+ enable-gpios: true
+ te-gpios: true
+
+ vdd3-supply:
+ description: I/O voltage supply
+
+ vci-supply:
+ description: voltage supply for analog circuits
+
+required:
+ - compatible
+ - reg
+ - vdd3-supply
+ - vci-supply
+ - reset-gpios
+ - enable-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "samsung,s6e3ha2";
+ reg = <0>;
+ vdd3-supply = <&ldo27_reg>;
+ vci-supply = <&ldo28_reg>;
+ reset-gpios = <&gpg0 0 GPIO_ACTIVE_LOW>;
+ enable-gpios = <&gpf1 5 GPIO_ACTIVE_HIGH>;
+ te-gpios = <&gpf1 3 GPIO_ACTIVE_HIGH>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,ld9040.txt b/Documentation/devicetree/bindings/display/panel/samsung,ld9040.txt
deleted file mode 100644
index 354d4d1df4ff..000000000000
--- a/Documentation/devicetree/bindings/display/panel/samsung,ld9040.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-Samsung LD9040 AMOLED LCD parallel RGB panel with SPI control bus
-
-Required properties:
- - compatible: "samsung,ld9040"
- - reg: address of the panel on SPI bus
- - vdd3-supply: core voltage supply
- - vci-supply: voltage supply for analog circuits
- - reset-gpios: a GPIO spec for the reset pin
- - display-timings: timings for the connected panel according to [1]
-
-The panel must obey rules for SPI slave device specified in document [2].
-
-Optional properties:
- - power-on-delay: delay after turning regulators on [ms]
- - reset-delay: delay after reset sequence [ms]
- - panel-width-mm: physical panel width [mm]
- - panel-height-mm: physical panel height [mm]
-
-The device node can contain one 'port' child node with one child
-'endpoint' node, according to the bindings defined in [3]. This
-node should describe panel's video bus.
-
-[1]: Documentation/devicetree/bindings/display/panel/display-timing.txt
-[2]: Documentation/devicetree/bindings/spi/spi-bus.txt
-[3]: Documentation/devicetree/bindings/media/video-interfaces.txt
-
-Example:
-
- lcd@0 {
- compatible = "samsung,ld9040";
- reg = <0>;
- vdd3-supply = <&ldo7_reg>;
- vci-supply = <&ldo17_reg>;
- reset-gpios = <&gpy4 5 0>;
- spi-max-frequency = <1200000>;
- spi-cpol;
- spi-cpha;
- power-on-delay = <10>;
- reset-delay = <10>;
- panel-width-mm = <90>;
- panel-height-mm = <154>;
-
- display-timings {
- timing {
- clock-frequency = <23492370>;
- hactive = <480>;
- vactive = <800>;
- hback-porch = <16>;
- hfront-porch = <16>;
- vback-porch = <2>;
- vfront-porch = <28>;
- hsync-len = <2>;
- vsync-len = <1>;
- hsync-active = <0>;
- vsync-active = <0>;
- de-active = <0>;
- pixelclk-active = <0>;
- };
- };
-
- port {
- lcd_ep: endpoint {
- remote-endpoint = <&fimd_dpi_ep>;
- };
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,ld9040.yaml b/Documentation/devicetree/bindings/display/panel/samsung,ld9040.yaml
new file mode 100644
index 000000000000..060ee27a4749
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/samsung,ld9040.yaml
@@ -0,0 +1,107 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/samsung,ld9040.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Samsung LD9040 AMOLED LCD parallel RGB panel with SPI control bus
+
+description: |
+ The panel must obey the rules for a SPI slave device as specified in
+ spi/spi-controller.yaml
+
+maintainers:
+ - Andrzej Hajda <a.hajda@samsung.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: samsung,ld9040
+
+ display-timings: true
+ port: true
+ reg: true
+ reset-gpios: true
+
+ vdd3-supply:
+ description: core voltage supply
+
+ vci-supply:
+ description: voltage supply for analog circuits
+
+ power-on-delay:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: delay after turning regulators on [ms]
+
+ reset-delay:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: delay after reset sequence [ms]
+
+ panel-width-mm:
+ description: physical panel width [mm]
+
+ panel-height-mm:
+ description: physical panel height [mm]
+
+required:
+ - compatible
+ - reg
+ - vdd3-supply
+ - vci-supply
+ - reset-gpios
+ - display-timings
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ lcd@0 {
+ compatible = "samsung,ld9040";
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ reg = <0>;
+ vdd3-supply = <&ldo7_reg>;
+ vci-supply = <&ldo17_reg>;
+ reset-gpios = <&gpy4 5 0>;
+ spi-max-frequency = <1200000>;
+ spi-cpol;
+ spi-cpha;
+ power-on-delay = <10>;
+ reset-delay = <10>;
+ panel-width-mm = <90>;
+ panel-height-mm = <154>;
+
+ display-timings {
+ timing {
+ clock-frequency = <23492370>;
+ hactive = <480>;
+ vactive = <800>;
+ hback-porch = <16>;
+ hfront-porch = <16>;
+ vback-porch = <2>;
+ vfront-porch = <28>;
+ hsync-len = <2>;
+ vsync-len = <1>;
+ hsync-active = <0>;
+ vsync-active = <0>;
+ de-active = <0>;
+ pixelclk-active = <0>;
+ };
+ };
+
+ port {
+ lcd_ep: endpoint {
+ remote-endpoint = <&fimd_dpi_ep>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,s6d16d0.txt b/Documentation/devicetree/bindings/display/panel/samsung,s6d16d0.txt
deleted file mode 100644
index b94e366f451b..000000000000
--- a/Documentation/devicetree/bindings/display/panel/samsung,s6d16d0.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-Samsung S6D16D0 4" 864x480 AMOLED panel
-
-Required properties:
- - compatible: should be:
- "samsung,s6d16d0",
- - reg: the virtual channel number of a DSI peripheral
- - vdd1-supply: I/O voltage supply
- - reset-gpios: a GPIO spec for the reset pin (active low)
-
-The device node can contain one 'port' child node with one child
-'endpoint' node, according to the bindings defined in
-media/video-interfaces.txt. This node should describe panel's video bus.
-
-Example:
-&dsi {
- ...
-
- panel@0 {
- compatible = "samsung,s6d16d0";
- reg = <0>;
- vdd1-supply = <&foo>;
- reset-gpios = <&foo_gpio 0 GPIO_ACTIVE_LOW>;
-
- port {
- panel_in: endpoint {
- remote-endpoint = <&dsi_out>;
- };
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,s6d16d0.yaml b/Documentation/devicetree/bindings/display/panel/samsung,s6d16d0.yaml
new file mode 100644
index 000000000000..66d147496bc3
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/samsung,s6d16d0.yaml
@@ -0,0 +1,56 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/samsung,s6d16d0.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Samsung S6D16D0 4" 864x480 AMOLED panel
+
+maintainers:
+ - Linus Walleij <linus.walleij@linaro.org>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: samsung,s6d16d0
+
+ port: true
+ reg: true
+ reset-gpios: true
+
+ vdd1-supply:
+ description: I/O voltage supply
+
+required:
+ - compatible
+ - reg
+ - vdd1-supply
+ - reset-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "samsung,s6d16d0";
+ reg = <0>;
+ vdd1-supply = <&foo>;
+ reset-gpios = <&foo_gpio 0 GPIO_ACTIVE_LOW>;
+
+ port {
+ panel_in: endpoint {
+ remote-endpoint = <&dsi_out>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,s6e3ha2.txt b/Documentation/devicetree/bindings/display/panel/samsung,s6e3ha2.txt
deleted file mode 100644
index 4acea25c244b..000000000000
--- a/Documentation/devicetree/bindings/display/panel/samsung,s6e3ha2.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-Samsung S6E3HA2 5.7" 1440x2560 AMOLED panel
-Samsung S6E3HF2 5.65" 1600x2560 AMOLED panel
-
-Required properties:
- - compatible: should be one of:
- "samsung,s6e3ha2",
- "samsung,s6e3hf2".
- - reg: the virtual channel number of a DSI peripheral
- - vdd3-supply: I/O voltage supply
- - vci-supply: voltage supply for analog circuits
- - reset-gpios: a GPIO spec for the reset pin (active low)
- - enable-gpios: a GPIO spec for the panel enable pin (active high)
-
-Optional properties:
- - te-gpios: a GPIO spec for the tearing effect synchronization signal
- gpio pin (active high)
-
-Example:
-&dsi {
- ...
-
- panel@0 {
- compatible = "samsung,s6e3ha2";
- reg = <0>;
- vdd3-supply = <&ldo27_reg>;
- vci-supply = <&ldo28_reg>;
- reset-gpios = <&gpg0 0 GPIO_ACTIVE_LOW>;
- enable-gpios = <&gpf1 5 GPIO_ACTIVE_HIGH>;
- te-gpios = <&gpf1 3 GPIO_ACTIVE_HIGH>;
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,s6e63j0x03.txt b/Documentation/devicetree/bindings/display/panel/samsung,s6e63j0x03.txt
deleted file mode 100644
index 3f1a8392af7f..000000000000
--- a/Documentation/devicetree/bindings/display/panel/samsung,s6e63j0x03.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-Samsung S6E63J0X03 1.63" 320x320 AMOLED panel (interface: MIPI-DSI command mode)
-
-Required properties:
- - compatible: "samsung,s6e63j0x03"
- - reg: the virtual channel number of a DSI peripheral
- - vdd3-supply: I/O voltage supply
- - vci-supply: voltage supply for analog circuits
- - reset-gpios: a GPIO spec for the reset pin (active low)
- - te-gpios: a GPIO spec for the tearing effect synchronization signal
- gpio pin (active high)
-
-Example:
-&dsi {
- ...
-
- panel@0 {
- compatible = "samsung,s6e63j0x03";
- reg = <0>;
- vdd3-supply = <&ldo16_reg>;
- vci-supply = <&ldo20_reg>;
- reset-gpios = <&gpe0 1 GPIO_ACTIVE_LOW>;
- te-gpios = <&gpx0 6 GPIO_ACTIVE_HIGH>;
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.txt b/Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.txt
deleted file mode 100644
index 9fb9ebeef8e4..000000000000
--- a/Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-Samsung s6e63m0 AMOLED LCD panel
-
-Required properties:
- - compatible: "samsung,s6e63m0"
- - reset-gpios: GPIO spec for reset pin
- - vdd3-supply: VDD regulator
- - vci-supply: VCI regulator
-
-The panel must obey rules for SPI slave device specified in document [1].
-
-The device node can contain one 'port' child node with one child
-'endpoint' node, according to the bindings defined in [2]. This
-node should describe panel's video bus.
-
-[1]: Documentation/devicetree/bindings/spi/spi-bus.txt
-[2]: Documentation/devicetree/bindings/media/video-interfaces.txt
-
-Example:
-
- s6e63m0: display@0 {
- compatible = "samsung,s6e63m0";
- reg = <0>;
- reset-gpio = <&mp05 5 1>;
- vdd3-supply = <&ldo12_reg>;
- vci-supply = <&ldo11_reg>;
- spi-max-frequency = <1200000>;
-
- port {
- lcd_ep: endpoint {
- remote-endpoint = <&fimd_ep>;
- };
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.yaml b/Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.yaml
new file mode 100644
index 000000000000..1dab80ae1d0a
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/samsung,s6e63m0.yaml
@@ -0,0 +1,60 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/samsung,s6e63m0.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Samsung s6e63m0 AMOLED LCD panel
+
+maintainers:
+ - Jonathan Bakker <xc-racer2@live.ca>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: samsung,s6e63m0
+
+ reg: true
+ reset-gpios: true
+ port: true
+
+ vdd3-supply:
+ description: VDD regulator
+
+ vci-supply:
+ description: VCI regulator
+
+required:
+ - compatible
+ - reset-gpios
+ - vdd3-supply
+ - vci-supply
+ - port
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ display@0 {
+ compatible = "samsung,s6e63m0";
+ reg = <0>;
+ reset-gpios = <&mp05 5 1>;
+ vdd3-supply = <&ldo12_reg>;
+ vci-supply = <&ldo11_reg>;
+ spi-max-frequency = <1200000>;
+
+ port {
+ lcd_ep: endpoint {
+ remote-endpoint = <&fimd_ep>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/seiko,43wvf1g.txt b/Documentation/devicetree/bindings/display/panel/seiko,43wvf1g.txt
deleted file mode 100644
index aae57ef36cdd..000000000000
--- a/Documentation/devicetree/bindings/display/panel/seiko,43wvf1g.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-Seiko Instruments Inc. 4.3" WVGA (800 x RGB x 480) TFT with Touch-Panel
-
-Required properties:
-- compatible: should be "sii,43wvf1g".
-- "dvdd-supply": 3v3 digital regulator.
-- "avdd-supply": 5v analog regulator.
-
-Optional properties:
-- backlight: phandle for the backlight control.
-
-Example:
-
- panel {
- compatible = "sii,43wvf1g";
- backlight = <&backlight_display>;
- dvdd-supply = <&reg_lcd_3v3>;
- avdd-supply = <&reg_lcd_5v>;
- port {
- panel_in: endpoint {
- remote-endpoint = <&display_out>;
- };
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/seiko,43wvf1g.yaml b/Documentation/devicetree/bindings/display/panel/seiko,43wvf1g.yaml
new file mode 100644
index 000000000000..cfaa50cf5f5d
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/seiko,43wvf1g.yaml
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/seiko,43wvf1g.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Seiko Instruments Inc. 4.3" WVGA (800 x RGB x 480) TFT with Touch-Panel
+
+maintainers:
+ - Marco Franchi <marco.franchi@nxp.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: sii,43wvf1g
+
+ backlight: true
+ port: true
+
+ dvdd-supply:
+ description: 3v3 digital regulator
+
+ avdd-supply:
+ description: 5v analog regulator
+
+required:
+ - compatible
+ - dvdd-supply
+ - avdd-supply
+
+additionalProperties: false
+
+examples:
+ - |
+ panel {
+ compatible = "sii,43wvf1g";
+
+ backlight = <&backlight_display>;
+ dvdd-supply = <&reg_lcd_3v3>;
+ avdd-supply = <&reg_lcd_5v>;
+ port {
+ panel_in: endpoint {
+ remote-endpoint = <&display_out>;
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/sharp,lq150x1lg11.txt b/Documentation/devicetree/bindings/display/panel/sharp,lq150x1lg11.txt
deleted file mode 100644
index 0f57c3143506..000000000000
--- a/Documentation/devicetree/bindings/display/panel/sharp,lq150x1lg11.txt
+++ /dev/null
@@ -1,36 +0,0 @@
-Sharp 15" LQ150X1LG11 XGA TFT LCD panel
-
-Required properties:
-- compatible: should be "sharp,lq150x1lg11"
-- power-supply: regulator to provide the VCC supply voltage (3.3 volts)
-
-Optional properties:
-- backlight: phandle of the backlight device
-- rlud-gpios: a single GPIO for the RL/UD (rotate 180 degrees) pin.
-- sellvds-gpios: a single GPIO for the SELLVDS pin.
-
-If rlud-gpios and/or sellvds-gpios are not specified, the RL/UD and/or SELLVDS
-pins are assumed to be handled appropriately by the hardware.
-
-Example:
-
- backlight: backlight {
- compatible = "pwm-backlight";
- pwms = <&pwm 0 100000>; /* VBR */
-
- brightness-levels = <0 20 40 60 80 100>;
- default-brightness-level = <2>;
-
- power-supply = <&vdd_12v_reg>; /* VDD */
- enable-gpios = <&gpio 42 GPIO_ACTIVE_HIGH>; /* XSTABY */
- };
-
- panel {
- compatible = "sharp,lq150x1lg11";
-
- power-supply = <&vcc_3v3_reg>; /* VCC */
-
- backlight = <&backlight>;
- rlud-gpios = <&gpio 17 GPIO_ACTIVE_HIGH>; /* RL/UD */
- sellvds-gpios = <&gpio 18 GPIO_ACTIVE_HIGH>; /* SELLVDS */
- };
diff --git a/Documentation/devicetree/bindings/display/panel/sharp,lq150x1lg11.yaml b/Documentation/devicetree/bindings/display/panel/sharp,lq150x1lg11.yaml
new file mode 100644
index 000000000000..92f2d12f4f4c
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/sharp,lq150x1lg11.yaml
@@ -0,0 +1,58 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/sharp,lq150x1lg11.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Sharp 15" LQ150X1LG11 XGA TFT LCD panel
+
+maintainers:
+ - Peter Rosin <peda@axentia.se>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: sharp,lq150x1lg11
+
+ power-supply: true
+ backlight: true
+
+ rlud-gpios:
+ maxItems: 1
+ description: |
+ GPIO for the RL/UD (rotate 180 degrees) pin.
+ If rlud-gpios and/or sellvds-gpios are not specified,
+ the RL/UD and/or SELLVDS pins are assumed to be handled
+ appropriately by the hardware.
+
+ sellvds-gpios:
+ maxItems: 1
+ description: |
+ GPIO for the SELLVDS pin.
+ If rlud-gpios and/or sellvds-gpios are not specified,
+ the RL/UD and/or SELLVDS pins are assumed to be handled
+ appropriately by the hardware.
+
+required:
+ - compatible
+ - power-supply
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ panel {
+ compatible = "sharp,lq150x1lg11";
+
+ power-supply = <&vcc_3v3_reg>; /* VCC */
+
+ backlight = <&backlight>;
+ rlud-gpios = <&gpio 17 GPIO_ACTIVE_HIGH>; /* RL/UD */
+ sellvds-gpios = <&gpio 18 GPIO_ACTIVE_HIGH>; /* SELLVDS */
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/sharp,ls037v7dw01.txt b/Documentation/devicetree/bindings/display/panel/sharp,ls037v7dw01.txt
deleted file mode 100644
index 0cc8981e9d49..000000000000
--- a/Documentation/devicetree/bindings/display/panel/sharp,ls037v7dw01.txt
+++ /dev/null
@@ -1,43 +0,0 @@
-SHARP LS037V7DW01 TFT-LCD panel
-===================================
-
-Required properties:
-- compatible: "sharp,ls037v7dw01"
-
-Optional properties:
-- label: a symbolic name for the panel
-- enable-gpios: a GPIO spec for the optional enable pin.
- This pin is the INI pin as specified in the LS037V7DW01.pdf file.
-- reset-gpios: a GPIO spec for the optional reset pin.
- This pin is the RESB pin as specified in the LS037V7DW01.pdf file.
-- mode-gpios: a GPIO
- ordered MO, LR, and UD as specified in the LS037V7DW01.pdf file.
-
-Required nodes:
-- Video port for DPI input
-
-This panel can have zero to five GPIOs to configure to change configuration
-between QVGA and VGA mode and the scan direction. As these pins can be also
-configured with external pulls, all the GPIOs are considered optional with holes
-in the array.
-
-Example
--------
-
-Example when connected to a omap2+ based device:
-
-lcd0: display {
- compatible = "sharp,ls037v7dw01";
- power-supply = <&lcd_3v3>;
- enable-gpios = <&gpio5 24 GPIO_ACTIVE_HIGH>; /* gpio152, lcd INI */
- reset-gpios = <&gpio5 27 GPIO_ACTIVE_HIGH>; /* gpio155, lcd RESB */
- mode-gpios = <&gpio5 26 GPIO_ACTIVE_HIGH /* gpio154, lcd MO */
- &gpio1 2 GPIO_ACTIVE_HIGH /* gpio2, lcd LR */
- &gpio1 3 GPIO_ACTIVE_HIGH>; /* gpio3, lcd UD */
-
- port {
- lcd_in: endpoint {
- remote-endpoint = <&dpi_out>;
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/sharp,ls037v7dw01.yaml b/Documentation/devicetree/bindings/display/panel/sharp,ls037v7dw01.yaml
new file mode 100644
index 000000000000..8c47a9b0b507
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/sharp,ls037v7dw01.yaml
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/sharp,ls037v7dw01.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: SHARP LS037V7DW01 TFT-LCD panel
+
+description: |
+ This panel can have zero to five GPIOs to configure to change configuration
+ between QVGA and VGA mode and the scan direction. As these pins can be also
+ configured with external pulls, all the GPIOs are considered optional with holes
+ in the array.
+
+maintainers:
+ - Tony Lindgren <tony@atomide.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: sharp,ls037v7dw01
+
+ label: true
+ enable-gpios: true
+ reset-gpios: true
+ port: true
+ power-supply: true
+
+ mode-gpios:
+ minItems: 1
+ maxItems: 3
+ description: |
+ GPIO ordered MO, LR, and UD as specified in LS037V7DW01.pdf
+ This panel can have zero to three GPIOs to configure to
+ change configuration between QVGA and VGA mode and the
+ scan direction. As these pins can be also configured
+ with external pulls, all the GPIOs are considered
+ optional with holes in the array.
+
+required:
+ - compatible
+ - port
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ lcd0: display {
+ compatible = "sharp,ls037v7dw01";
+ power-supply = <&lcd_3v3>;
+ enable-gpios = <&gpio5 24 GPIO_ACTIVE_HIGH>; /* gpio152, lcd INI */
+ reset-gpios = <&gpio5 27 GPIO_ACTIVE_HIGH>; /* gpio155, lcd RESB */
+ mode-gpios = <&gpio5 26 GPIO_ACTIVE_HIGH /* gpio154, lcd MO */
+ &gpio1 2 GPIO_ACTIVE_HIGH /* gpio2, lcd LR */
+ &gpio1 3 GPIO_ACTIVE_HIGH>; /* gpio3, lcd UD */
+
+ port {
+ lcd_in: endpoint {
+ remote-endpoint = <&dpi_out>;
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/sharp,ls043t1le01.txt b/Documentation/devicetree/bindings/display/panel/sharp,ls043t1le01.txt
deleted file mode 100644
index 3770a111968b..000000000000
--- a/Documentation/devicetree/bindings/display/panel/sharp,ls043t1le01.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-Sharp Microelectronics 4.3" qHD TFT LCD panel
-
-Required properties:
-- compatible: should be "sharp,ls043t1le01-qhd"
-- reg: DSI virtual channel of the peripheral
-- power-supply: phandle of the regulator that provides the supply voltage
-
-Optional properties:
-- backlight: phandle of the backlight device attached to the panel
-- reset-gpios: a GPIO spec for the reset pin
-
-Example:
-
- mdss_dsi@fd922800 {
- panel@0 {
- compatible = "sharp,ls043t1le01-qhd";
- reg = <0>;
- avdd-supply = <&pm8941_l22>;
- backlight = <&pm8941_wled>;
- reset-gpios = <&pm8941_gpios 19 GPIO_ACTIVE_HIGH>;
- };
- };
diff --git a/Documentation/devicetree/bindings/display/panel/sharp,ls043t1le01.yaml b/Documentation/devicetree/bindings/display/panel/sharp,ls043t1le01.yaml
new file mode 100644
index 000000000000..a90d0d8bf7c9
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/sharp,ls043t1le01.yaml
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/sharp,ls043t1le01.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Sharp Microelectronics 4.3" qHD TFT LCD panel
+
+maintainers:
+ - Werner Johansson <werner.johansson@sonymobile.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: sharp,ls043t1le01-qhd
+
+ reg: true
+ backlight: true
+ reset-gpios: true
+ port: true
+
+ avdd-supply:
+ description: handle of the regulator that provides the supply voltage
+
+required:
+ - compatible
+ - reg
+ - avdd-supply
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "sharp,ls043t1le01-qhd";
+ reg = <0>;
+ avdd-supply = <&pm8941_l22>;
+ backlight = <&pm8941_wled>;
+ reset-gpios = <&pm8941_gpios 19 GPIO_ACTIVE_HIGH>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/simple-panel.txt b/Documentation/devicetree/bindings/display/panel/simple-panel.txt
deleted file mode 100644
index e11208fb7da8..000000000000
--- a/Documentation/devicetree/bindings/display/panel/simple-panel.txt
+++ /dev/null
@@ -1 +0,0 @@
-See panel-common.yaml in this directory.
diff --git a/Documentation/devicetree/bindings/display/panel/sitronix,st7701.txt b/Documentation/devicetree/bindings/display/panel/sitronix,st7701.txt
deleted file mode 100644
index ccd17597f1f6..000000000000
--- a/Documentation/devicetree/bindings/display/panel/sitronix,st7701.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-Sitronix ST7701 based LCD panels
-
-ST7701 designed for small and medium sizes of TFT LCD display, is
-capable of supporting up to 480RGBX864 in resolution. It provides
-several system interfaces like MIPI/RGB/SPI.
-
-Techstar TS8550B is 480x854, 2-lane MIPI DSI LCD panel which has
-inbuilt ST7701 chip.
-
-Required properties:
-- compatible: must be "sitronix,st7701" and one of
- * "techstar,ts8550b"
-- reset-gpios: a GPIO phandle for the reset pin
-
-Required properties for techstar,ts8550b:
-- reg: DSI virtual channel used by that screen
-- VCC-supply: analog regulator for MIPI circuit
-- IOVCC-supply: I/O system regulator
-
-Optional properties:
-- backlight: phandle for the backlight control.
-
-panel@0 {
- compatible = "techstar,ts8550b", "sitronix,st7701";
- reg = <0>;
- VCC-supply = <&reg_dldo2>;
- IOVCC-supply = <&reg_dldo2>;
- reset-gpios = <&pio 3 24 GPIO_ACTIVE_HIGH>; /* LCD-RST: PD24 */
- backlight = <&backlight>;
-};
diff --git a/Documentation/devicetree/bindings/display/panel/sitronix,st7701.yaml b/Documentation/devicetree/bindings/display/panel/sitronix,st7701.yaml
new file mode 100644
index 000000000000..6dff59fe4be1
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/sitronix,st7701.yaml
@@ -0,0 +1,69 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/sitronix,st7701.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Sitronix ST7701 based LCD panels
+
+maintainers:
+ - Jagan Teki <jagan@amarulasolutions.com>
+
+description: |
+ ST7701 designed for small and medium sizes of TFT LCD display, is
+ capable of supporting up to 480RGBX864 in resolution. It provides
+ several system interfaces like MIPI/RGB/SPI.
+
+ Techstar TS8550B is 480x854, 2-lane MIPI DSI LCD panel which has
+ inbuilt ST7701 chip.
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ items:
+ - enum:
+ - techstar,ts8550b
+ - const: sitronix,st7701
+
+ reg:
+ description: DSI virtual channel used by that screen
+ maxItems: 1
+
+ VCC-supply:
+ description: analog regulator for MIPI circuit
+
+ IOVCC-supply:
+ description: I/O system regulator
+
+ reset-gpios: true
+
+ backlight: true
+
+required:
+ - compatible
+ - reg
+ - VCC-supply
+ - IOVCC-supply
+ - reset-gpios
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ dsi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "techstar,ts8550b", "sitronix,st7701";
+ reg = <0>;
+ VCC-supply = <&reg_dldo2>;
+ IOVCC-supply = <&reg_dldo2>;
+ reset-gpios = <&pio 3 24 GPIO_ACTIVE_HIGH>; /* LCD-RST: PD24 */
+ backlight = <&backlight>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/display/panel/sitronix,st7789v.txt b/Documentation/devicetree/bindings/display/panel/sitronix,st7789v.txt
deleted file mode 100644
index c6995dde641b..000000000000
--- a/Documentation/devicetree/bindings/display/panel/sitronix,st7789v.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-Sitronix ST7789V RGB panel with SPI control bus
-
-Required properties:
- - compatible: "sitronix,st7789v"
- - reg: Chip select of the panel on the SPI bus
- - reset-gpios: a GPIO phandle for the reset pin
- - power-supply: phandle of the regulator that provides the supply voltage
-
-Optional properties:
- - backlight: phandle to the backlight used
-
-The generic bindings for the SPI slaves documented in [1] also applies
-
-The device node can contain one 'port' child node with one child
-'endpoint' node, according to the bindings defined in [2]. This
-node should describe panel's video bus.
-
-[1]: Documentation/devicetree/bindings/spi/spi-bus.txt
-[2]: Documentation/devicetree/bindings/graph.txt
-
-Example:
-
-panel@0 {
- compatible = "sitronix,st7789v";
- reg = <0>;
- reset-gpios = <&pio 6 11 GPIO_ACTIVE_LOW>;
- backlight = <&pwm_bl>;
- spi-max-frequency = <100000>;
- spi-cpol;
- spi-cpha;
-
- port {
- panel_input: endpoint {
- remote-endpoint = <&tcon0_out_panel>;
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/sitronix,st7789v.yaml b/Documentation/devicetree/bindings/display/panel/sitronix,st7789v.yaml
new file mode 100644
index 000000000000..fa46d151e7b3
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/sitronix,st7789v.yaml
@@ -0,0 +1,63 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/sitronix,st7789v.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Sitronix ST7789V RGB panel with SPI control bus
+
+description: |
+ The panel must obey the rules for a SPI slave device as specified in
+ spi/spi-controller.yaml
+
+maintainers:
+ - Maxime Ripard <mripard@kernel.org>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: sitronix,st7789v
+
+ reg: true
+ reset-gpios: true
+ power-supply: true
+ backlight: true
+ port: true
+
+required:
+ - compatible
+ - reg
+ - reset-gpios
+ - power-supply
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@0 {
+ compatible = "sitronix,st7789v";
+ reg = <0>;
+ reset-gpios = <&pio 6 11 GPIO_ACTIVE_LOW>;
+ backlight = <&pwm_bl>;
+ power-supply = <&power>;
+ spi-max-frequency = <100000>;
+ spi-cpol;
+ spi-cpha;
+
+ port {
+ panel_input: endpoint {
+ remote-endpoint = <&tcon0_out_panel>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/sony,acx565akm.txt b/Documentation/devicetree/bindings/display/panel/sony,acx565akm.txt
deleted file mode 100644
index e12333280749..000000000000
--- a/Documentation/devicetree/bindings/display/panel/sony,acx565akm.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-Sony ACX565AKM SDI Panel
-========================
-
-Required properties:
-- compatible: "sony,acx565akm"
-
-Optional properties:
-- label: a symbolic name for the panel
-- reset-gpios: panel reset gpio
-
-Required nodes:
-- Video port for SDI input
-
-Example
--------
-
-acx565akm@2 {
- compatible = "sony,acx565akm";
- spi-max-frequency = <6000000>;
- reg = <2>;
-
- label = "lcd";
- reset-gpios = <&gpio3 26 GPIO_ACTIVE_HIGH>; /* 90 */
-
- port {
- lcd_in: endpoint {
- remote-endpoint = <&sdi_out>;
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/sony,acx565akm.yaml b/Documentation/devicetree/bindings/display/panel/sony,acx565akm.yaml
new file mode 100644
index 000000000000..95d053c548ab
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/sony,acx565akm.yaml
@@ -0,0 +1,57 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/sony,acx565akm.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Sony ACX565AKM SDI Panel
+
+description: |
+ The panel must obey the rules for a SPI slave device as specified in
+ spi/spi-controller.yaml
+
+maintainers:
+ - Tomi Valkeinen <tomi.valkeinen@ti.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: sony,acx565akm
+
+ label: true
+ reset-gpios: true
+ port: true
+
+required:
+ - compatible
+ - port
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel@2 {
+ compatible = "sony,acx565akm";
+ spi-max-frequency = <6000000>;
+ reg = <2>;
+
+ label = "lcd";
+ reset-gpios = <&gpio3 26 GPIO_ACTIVE_HIGH>; /* 90 */
+
+ port {
+ lcd_in: endpoint {
+ remote-endpoint = <&sdi_out>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/startek,startek-kd050c.txt b/Documentation/devicetree/bindings/display/panel/startek,startek-kd050c.txt
deleted file mode 100644
index 70cd8d18d841..000000000000
--- a/Documentation/devicetree/bindings/display/panel/startek,startek-kd050c.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-Startek Electronic Technology Co. KD050C 5.0" WVGA TFT LCD panel
-
-Required properties:
-- compatible: should be "startek,startek-kd050c"
diff --git a/Documentation/devicetree/bindings/display/panel/startek,startek-kd050c.yaml b/Documentation/devicetree/bindings/display/panel/startek,startek-kd050c.yaml
new file mode 100644
index 000000000000..fd668640afd1
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/startek,startek-kd050c.yaml
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/startek,startek-kd050c.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Startek Electronic Technology Co. KD050C 5.0" WVGA TFT LCD panel
+
+maintainers:
+ - Nikita Kiryanov <nikita@compulab.co.il>
+
+allOf:
+ - $ref: panel-dpi.yaml#
+
+properties:
+ compatible:
+ items:
+ - const: startek,startek-kd050c
+ - {} # panel-dpi, but not listed here to avoid false select
+
+ backlight: true
+ enable-gpios: true
+ height-mm: true
+ label: true
+ panel-timing: true
+ port: true
+ power-supply: true
+ reset-gpios: true
+ width-mm: true
+
+additionalProperties: false
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/tpo,td.yaml b/Documentation/devicetree/bindings/display/panel/tpo,td.yaml
new file mode 100644
index 000000000000..4aa605613445
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/tpo,td.yaml
@@ -0,0 +1,65 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/tpo,td.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Toppoly TD Panels
+
+description: |
+ The panel must obey the rules for a SPI slave device as specified in
+ spi/spi-controller.yaml
+
+maintainers:
+ - Marek Belisko <marek@goldelico.com>
+ - H. Nikolaus Schaller <hns@goldelico.com>
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ enum:
+ # Toppoly TD028TTEC1 Panel
+ - tpo,td028ttec1
+ # Toppoly TD043MTEA1 Panel
+ - tpo,td043mtea1
+
+ reg: true
+ label: true
+ reset-gpios: true
+ backlight: true
+ port: true
+
+required:
+ - compatible
+ - port
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ panel: panel@0 {
+ compatible = "tpo,td043mtea1";
+ reg = <0>;
+ spi-max-frequency = <100000>;
+ spi-cpol;
+ spi-cpha;
+
+ label = "lcd";
+
+ reset-gpios = <&gpio7 7 0>;
+
+ port {
+ lcd_in: endpoint {
+ remote-endpoint = <&dpi_out>;
+ };
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/display/panel/tpo,td028ttec1.txt b/Documentation/devicetree/bindings/display/panel/tpo,td028ttec1.txt
deleted file mode 100644
index 898e06ecf4ef..000000000000
--- a/Documentation/devicetree/bindings/display/panel/tpo,td028ttec1.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-Toppoly TD028TTEC1 Panel
-========================
-
-Required properties:
-- compatible: "tpo,td028ttec1"
-
-Optional properties:
-- label: a symbolic name for the panel
-- backlight: phandle of the backlight device
-
-Required nodes:
-- Video port for DPI input
-
-Example
--------
-
-lcd-panel: td028ttec1@0 {
- compatible = "tpo,td028ttec1";
- reg = <0>;
- spi-max-frequency = <100000>;
- spi-cpol;
- spi-cpha;
-
- label = "lcd";
- backlight = <&backlight>;
- port {
- lcd_in: endpoint {
- remote-endpoint = <&dpi_out>;
- };
- };
-};
-
diff --git a/Documentation/devicetree/bindings/display/panel/tpo,td043mtea1.txt b/Documentation/devicetree/bindings/display/panel/tpo,td043mtea1.txt
deleted file mode 100644
index ec6d62975162..000000000000
--- a/Documentation/devicetree/bindings/display/panel/tpo,td043mtea1.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-TPO TD043MTEA1 Panel
-====================
-
-Required properties:
-- compatible: "tpo,td043mtea1"
-- reset-gpios: panel reset gpio
-
-Optional properties:
-- label: a symbolic name for the panel
-
-Required nodes:
-- Video port for DPI input
-
-Example
--------
-
-lcd-panel: panel@0 {
- compatible = "tpo,td043mtea1";
- reg = <0>;
- spi-max-frequency = <100000>;
- spi-cpol;
- spi-cpha;
-
- label = "lcd";
-
- reset-gpios = <&gpio7 7 0>;
-
- port {
- lcd_in: endpoint {
- remote-endpoint = <&dpi_out>;
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/panel/visionox,rm69299.yaml b/Documentation/devicetree/bindings/display/panel/visionox,rm69299.yaml
new file mode 100644
index 000000000000..b36f39f6b233
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/visionox,rm69299.yaml
@@ -0,0 +1,57 @@
+# SPDX-License-Identifier: GPL-2.0-only or BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/visionox,rm69299.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Visionox model RM69299 Panels Device Tree Bindings.
+
+maintainers:
+ - Harigovindan P <harigovi@codeaurora.org>
+
+description: |
+ This binding is for display panels using a Visionox RM692999 panel.
+
+allOf:
+ - $ref: panel-common.yaml#
+
+properties:
+ compatible:
+ const: visionox,rm69299-1080p-display
+
+ vdda-supply:
+ description: |
+ Phandle of the regulator that provides the vdda supply voltage.
+
+ vdd3p3-supply:
+ description: |
+ Phandle of the regulator that provides the vdd3p3 supply voltage.
+
+ port: true
+ reset-gpios: true
+
+additionalProperties: false
+
+required:
+ - compatible
+ - vdda-supply
+ - vdd3p3-supply
+ - reset-gpios
+ - port
+
+examples:
+ - |
+ panel {
+ compatible = "visionox,rm69299-1080p-display";
+
+ vdda-supply = <&src_pp1800_l8c>;
+ vdd3p3-supply = <&src_pp2800_l18a>;
+
+ reset-gpios = <&pm6150l_gpio 3 0>;
+ port {
+ panel0_in: endpoint {
+ remote-endpoint = <&dsi0_out>;
+ };
+ };
+ };
+...
diff --git a/Documentation/devicetree/bindings/display/panel/xinpeng,xpp055c272.yaml b/Documentation/devicetree/bindings/display/panel/xinpeng,xpp055c272.yaml
index d9fdb58e06b4..6913923df569 100644
--- a/Documentation/devicetree/bindings/display/panel/xinpeng,xpp055c272.yaml
+++ b/Documentation/devicetree/bindings/display/panel/xinpeng,xpp055c272.yaml
@@ -37,7 +37,6 @@ examples:
dsi {
#address-cells = <1>;
#size-cells = <0>;
- reg = <0xff450000 0x1000>;
panel@0 {
compatible = "xinpeng,xpp055c272";
diff --git a/Documentation/devicetree/bindings/display/renesas,du.txt b/Documentation/devicetree/bindings/display/renesas,du.txt
index eb4ae41fe41f..51cd4d162770 100644
--- a/Documentation/devicetree/bindings/display/renesas,du.txt
+++ b/Documentation/devicetree/bindings/display/renesas,du.txt
@@ -50,6 +50,14 @@ Required Properties:
VSP instance that serves the DU channel, and the channel index identifies
the LIF instance in that VSP.
+Optional properties:
+ - resets: A list of phandle + reset-specifier pairs, one for each entry in
+ the reset-names property.
+ - reset-names: Names of the resets. This property is model-dependent.
+ - All but R8A7779 use one reset for a group of one or more successive
+ channels. The resets must be named "du.x" with "x" being the numerical
+ index of the lowest channel in the group.
+
Required nodes:
The connections to the DU output video ports are modeled using the OF graph
@@ -96,6 +104,8 @@ Example: R8A7795 (R-Car H3) ES2.0 DU
<&cpg CPG_MOD 722>,
<&cpg CPG_MOD 721>;
clock-names = "du.0", "du.1", "du.2", "du.3";
+ resets = <&cpg 724>, <&cpg 722>;
+ reset-names = "du.0", "du.2";
renesas,cmms = <&cmm0>, <&cmm1>, <&cmm2>, <&cmm3>;
renesas,vsps = <&vspd0 0>, <&vspd1 0>, <&vspd2 0>, <&vspd0 1>;
diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3066-hdmi.txt b/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3066-hdmi.txt
deleted file mode 100644
index d1ad31bca8d9..000000000000
--- a/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3066-hdmi.txt
+++ /dev/null
@@ -1,72 +0,0 @@
-Rockchip specific extensions for rk3066 HDMI
-============================================
-
-Required properties:
-- compatible:
- "rockchip,rk3066-hdmi";
-- reg:
- Physical base address and length of the controller's registers.
-- clocks, clock-names:
- Phandle to HDMI controller clock, name should be "hclk".
-- interrupts:
- HDMI interrupt number.
-- power-domains:
- Phandle to the RK3066_PD_VIO power domain.
-- rockchip,grf:
- This soc uses GRF regs to switch the HDMI TX input between vop0 and vop1.
-- ports:
- Contains one port node with two endpoints, numbered 0 and 1,
- connected respectively to vop0 and vop1.
- Contains one port node with one endpoint
- connected to a hdmi-connector node.
-- pinctrl-0, pinctrl-name:
- Switch the iomux for the HPD/I2C pins to HDMI function.
-
-Example:
- hdmi: hdmi@10116000 {
- compatible = "rockchip,rk3066-hdmi";
- reg = <0x10116000 0x2000>;
- interrupts = <GIC_SPI 64 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&cru HCLK_HDMI>;
- clock-names = "hclk";
- power-domains = <&power RK3066_PD_VIO>;
- rockchip,grf = <&grf>;
- pinctrl-names = "default";
- pinctrl-0 = <&hdmii2c_xfer>, <&hdmi_hpd>;
-
- ports {
- #address-cells = <1>;
- #size-cells = <0>;
- hdmi_in: port@0 {
- reg = <0>;
- #address-cells = <1>;
- #size-cells = <0>;
- hdmi_in_vop0: endpoint@0 {
- reg = <0>;
- remote-endpoint = <&vop0_out_hdmi>;
- };
- hdmi_in_vop1: endpoint@1 {
- reg = <1>;
- remote-endpoint = <&vop1_out_hdmi>;
- };
- };
- hdmi_out: port@1 {
- reg = <1>;
- hdmi_out_con: endpoint {
- remote-endpoint = <&hdmi_con_in>;
- };
- };
- };
- };
-
-&pinctrl {
- hdmi {
- hdmi_hpd: hdmi-hpd {
- rockchip,pins = <0 RK_PA0 1 &pcfg_pull_default>;
- };
- hdmii2c_xfer: hdmii2c-xfer {
- rockchip,pins = <0 RK_PA1 1 &pcfg_pull_none>,
- <0 RK_PA2 1 &pcfg_pull_none>;
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3066-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3066-hdmi.yaml
new file mode 100644
index 000000000000..4110d003ce1f
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3066-hdmi.yaml
@@ -0,0 +1,140 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/rockchip/rockchip,rk3066-hdmi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Rockchip rk3066 HDMI controller
+
+maintainers:
+ - Sandy Huang <hjc@rock-chips.com>
+ - Heiko Stuebner <heiko@sntech.de>
+
+properties:
+ compatible:
+ const: rockchip,rk3066-hdmi
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ clock-names:
+ const: hclk
+
+ pinctrl-0:
+ maxItems: 2
+
+ pinctrl-names:
+ const: default
+ description:
+ Switch the iomux for the HPD/I2C pins to HDMI function.
+
+ power-domains:
+ maxItems: 1
+
+ rockchip,grf:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ This soc uses GRF regs to switch the HDMI TX input between vop0 and vop1.
+
+ ports:
+ type: object
+
+ properties:
+ "#address-cells":
+ const: 1
+
+ "#size-cells":
+ const: 0
+
+ port@0:
+ type: object
+ description:
+ Port node with two endpoints, numbered 0 and 1,
+ connected respectively to vop0 and vop1.
+
+ port@1:
+ type: object
+ description:
+ Port node with one endpoint connected to a hdmi-connector node.
+
+ required:
+ - "#address-cells"
+ - "#size-cells"
+ - port@0
+ - port@1
+
+ additionalProperties: false
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+ - pinctrl-0
+ - pinctrl-names
+ - power-domains
+ - rockchip,grf
+ - ports
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/rk3066a-cru.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/pinctrl/rockchip.h>
+ #include <dt-bindings/power/rk3066-power.h>
+ hdmi: hdmi@10116000 {
+ compatible = "rockchip,rk3066-hdmi";
+ reg = <0x10116000 0x2000>;
+ interrupts = <GIC_SPI 64 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&cru HCLK_HDMI>;
+ clock-names = "hclk";
+ pinctrl-0 = <&hdmii2c_xfer>, <&hdmi_hpd>;
+ pinctrl-names = "default";
+ power-domains = <&power RK3066_PD_VIO>;
+ rockchip,grf = <&grf>;
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ hdmi_in: port@0 {
+ reg = <0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ hdmi_in_vop0: endpoint@0 {
+ reg = <0>;
+ remote-endpoint = <&vop0_out_hdmi>;
+ };
+ hdmi_in_vop1: endpoint@1 {
+ reg = <1>;
+ remote-endpoint = <&vop1_out_hdmi>;
+ };
+ };
+ hdmi_out: port@1 {
+ reg = <1>;
+ hdmi_out_con: endpoint {
+ remote-endpoint = <&hdmi_con_in>;
+ };
+ };
+ };
+ };
+
+ pinctrl {
+ hdmi {
+ hdmi_hpd: hdmi-hpd {
+ rockchip,pins = <0 RK_PA0 1 &pcfg_pull_default>;
+ };
+ hdmii2c_xfer: hdmii2c-xfer {
+ rockchip,pins = <0 RK_PA1 1 &pcfg_pull_none>,
+ <0 RK_PA2 1 &pcfg_pull_none>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip-vop.txt b/Documentation/devicetree/bindings/display/rockchip/rockchip-vop.txt
deleted file mode 100644
index 8b3a5f514205..000000000000
--- a/Documentation/devicetree/bindings/display/rockchip/rockchip-vop.txt
+++ /dev/null
@@ -1,74 +0,0 @@
-device-tree bindings for rockchip soc display controller (vop)
-
-VOP (Visual Output Processor) is the Display Controller for the Rockchip
-series of SoCs which transfers the image data from a video memory
-buffer to an external LCD interface.
-
-Required properties:
-- compatible: value should be one of the following
- "rockchip,rk3036-vop";
- "rockchip,rk3126-vop";
- "rockchip,px30-vop-lit";
- "rockchip,px30-vop-big";
- "rockchip,rk3066-vop";
- "rockchip,rk3188-vop";
- "rockchip,rk3288-vop";
- "rockchip,rk3368-vop";
- "rockchip,rk3366-vop";
- "rockchip,rk3399-vop-big";
- "rockchip,rk3399-vop-lit";
- "rockchip,rk3228-vop";
- "rockchip,rk3328-vop";
-
-- reg: Must contain one entry corresponding to the base address and length
- of the register space. Can optionally contain a second entry
- corresponding to the CRTC gamma LUT address.
-
-- interrupts: should contain a list of all VOP IP block interrupts in the
- order: VSYNC, LCD_SYSTEM. The interrupt specifier
- format depends on the interrupt controller used.
-
-- clocks: must include clock specifiers corresponding to entries in the
- clock-names property.
-
-- clock-names: Must contain
- aclk_vop: for ddr buffer transfer.
- hclk_vop: for ahb bus to R/W the phy regs.
- dclk_vop: pixel clock.
-
-- resets: Must contain an entry for each entry in reset-names.
- See ../reset/reset.txt for details.
-- reset-names: Must include the following entries:
- - axi
- - ahb
- - dclk
-
-- iommus: required a iommu node
-
-- port: A port node with endpoint definitions as defined in
- Documentation/devicetree/bindings/media/video-interfaces.txt.
-
-Example:
-SoC specific DT entry:
- vopb: vopb@ff930000 {
- compatible = "rockchip,rk3288-vop";
- reg = <0x0 0xff930000 0x0 0x19c>, <0x0 0xff931000 0x0 0x1000>;
- interrupts = <GIC_SPI 15 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&cru ACLK_VOP0>, <&cru DCLK_VOP0>, <&cru HCLK_VOP0>;
- clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
- resets = <&cru SRST_LCDC1_AXI>, <&cru SRST_LCDC1_AHB>, <&cru SRST_LCDC1_DCLK>;
- reset-names = "axi", "ahb", "dclk";
- iommus = <&vopb_mmu>;
- vopb_out: port {
- #address-cells = <1>;
- #size-cells = <0>;
- vopb_out_edp: endpoint@0 {
- reg = <0>;
- remote-endpoint=<&edp_in_vopb>;
- };
- vopb_out_hdmi: endpoint@1 {
- reg = <1>;
- remote-endpoint=<&hdmi_in_vopb>;
- };
- };
- };
diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip-vop.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip-vop.yaml
new file mode 100644
index 000000000000..1695e3e4bcec
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/rockchip/rockchip-vop.yaml
@@ -0,0 +1,134 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/rockchip/rockchip-vop.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Rockchip SoC display controller (VOP)
+
+description:
+ VOP (Video Output Processor) is the display controller for the Rockchip
+ series of SoCs which transfers the image data from a video memory
+ buffer to an external LCD interface.
+
+maintainers:
+ - Sandy Huang <hjc@rock-chips.com>
+ - Heiko Stuebner <heiko@sntech.de>
+
+properties:
+ compatible:
+ enum:
+ - rockchip,px30-vop-big
+ - rockchip,px30-vop-lit
+ - rockchip,rk3036-vop
+ - rockchip,rk3066-vop
+ - rockchip,rk3126-vop
+ - rockchip,rk3188-vop
+ - rockchip,rk3228-vop
+ - rockchip,rk3288-vop
+ - rockchip,rk3328-vop
+ - rockchip,rk3366-vop
+ - rockchip,rk3368-vop
+ - rockchip,rk3399-vop-big
+ - rockchip,rk3399-vop-lit
+
+ reg:
+ minItems: 1
+ items:
+ - description:
+ Must contain one entry corresponding to the base address and length
+ of the register space.
+ - description:
+ Can optionally contain a second entry corresponding to
+ the CRTC gamma LUT address.
+
+ interrupts:
+ maxItems: 1
+ description:
+ The VOP interrupt is shared by several interrupt sources, such as
+ frame start (VSYNC), line flag and other status interrupts.
+
+ clocks:
+ items:
+ - description: Clock for ddr buffer transfer.
+ - description: Pixel clock.
+ - description: Clock for the ahb bus to R/W the phy regs.
+
+ clock-names:
+ items:
+ - const: aclk_vop
+ - const: dclk_vop
+ - const: hclk_vop
+
+ resets:
+ maxItems: 3
+
+ reset-names:
+ items:
+ - const: axi
+ - const: ahb
+ - const: dclk
+
+ port:
+ type: object
+ description:
+ A port node with endpoint definitions as defined in
+ Documentation/devicetree/bindings/media/video-interfaces.txt.
+
+ assigned-clocks:
+ maxItems: 2
+
+ assigned-clock-rates:
+ maxItems: 2
+
+ iommus:
+ maxItems: 1
+
+ power-domains:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+ - resets
+ - reset-names
+ - port
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/rk3288-cru.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/power/rk3288-power.h>
+ vopb: vopb@ff930000 {
+ compatible = "rockchip,rk3288-vop";
+ reg = <0x0 0xff930000 0x0 0x19c>,
+ <0x0 0xff931000 0x0 0x1000>;
+ interrupts = <GIC_SPI 15 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&cru ACLK_VOP0>,
+ <&cru DCLK_VOP0>,
+ <&cru HCLK_VOP0>;
+ clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
+ power-domains = <&power RK3288_PD_VIO>;
+ resets = <&cru SRST_LCDC1_AXI>,
+ <&cru SRST_LCDC1_AHB>,
+ <&cru SRST_LCDC1_DCLK>;
+ reset-names = "axi", "ahb", "dclk";
+ iommus = <&vopb_mmu>;
+ vopb_out: port {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ vopb_out_edp: endpoint@0 {
+ reg = <0>;
+ remote-endpoint=<&edp_in_vopb>;
+ };
+ vopb_out_hdmi: endpoint@1 {
+ reg = <1>;
+ remote-endpoint=<&hdmi_in_vopb>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/dma/fsl-edma.txt b/Documentation/devicetree/bindings/dma/fsl-edma.txt
index e77b08ebcd06..ee1754739b4b 100644
--- a/Documentation/devicetree/bindings/dma/fsl-edma.txt
+++ b/Documentation/devicetree/bindings/dma/fsl-edma.txt
@@ -10,7 +10,8 @@ Required properties:
- compatible :
- "fsl,vf610-edma" for eDMA used similar to that on Vybrid vf610 SoC
- "fsl,imx7ulp-edma" for eDMA2 used similar to that on i.mx7ulp
- - "fsl,fsl,ls1028a-edma" for eDMA used similar to that on Vybrid vf610 SoC
+ - "fsl,ls1028a-edma" followed by "fsl,vf610-edma" for eDMA used on the
+ LS1028A SoC.
- reg : Specifies base physical address(s) and size of the eDMA registers.
The 1st region is eDMA control register's address and size.
The 2nd and the 3rd regions are programmable channel multiplexing
diff --git a/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml b/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml
index 86cfb599256e..371f18773198 100644
--- a/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml
+++ b/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml
@@ -22,9 +22,7 @@ properties:
const: socionext,uniphier-xdmac
reg:
- items:
- - description: XDMAC base register region (offset and length)
- - description: XDMAC extension register region (offset and length)
+ maxItems: 1
interrupts:
maxItems: 1
@@ -49,12 +47,13 @@ required:
- reg
- interrupts
- "#dma-cells"
+ - dma-channels
examples:
- |
xdmac: dma-controller@5fc10000 {
compatible = "socionext,uniphier-xdmac";
- reg = <0x5fc10000 0x1000>, <0x5fc20000 0x800>;
+ reg = <0x5fc10000 0x5300>;
interrupts = <0 188 4>;
#dma-cells = <2>;
dma-channels = <16>;
diff --git a/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml b/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml
new file mode 100644
index 000000000000..84ae4cdd08ed
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml
@@ -0,0 +1,107 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright (C) 2020 BAIKAL ELECTRONICS, JSC
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/hwmon/baikal,bt1-pvt.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Baikal-T1 PVT Sensor
+
+maintainers:
+ - Serge Semin <fancer.lancer@gmail.com>
+
+description: |
+ Baikal-T1 SoC provides an embedded process, voltage and temperature
+ sensor to monitor an internal SoC environment (chip temperature, supply
+ voltage and process monitor) and on time detect critical situations,
+ which may cause the system instability and even damages. The IP-block
+ is based on the Analog Bits PVT sensor, but is equipped with a dedicated
+ control wrapper, which provides a MMIO registers-based access to the
+ sensor core functionality (APB3-bus based) and exposes an additional
+ functions like thresholds/data ready interrupts, its status and masks,
+ measurements timeout. Its internal structure is depicted on the next
+ diagram:
+
+ Analog Bits core Bakal-T1 PVT control block
+ +--------------------+ +------------------------+
+ | Temperature sensor |-+ +------| Sensors control |
+ |--------------------| |<---En---| |------------------------|
+ | Voltage sensor |-|<--Mode--| +--->| Sampled data |
+ |--------------------| |<--Trim--+ | |------------------------|
+ | Low-Vt sensor |-| | +--| Thresholds comparator |
+ |--------------------| |---Data----| | |------------------------|
+ | High-Vt sensor |-| | +->| Interrupts status |
+ |--------------------| |--Valid--+-+ | |------------------------|
+ | Standard-Vt sensor |-+ +---+--| Interrupts mask |
+ +--------------------+ |------------------------|
+ ^ | Interrupts timeout |
+ | +------------------------+
+ | ^ ^
+ Rclk-----+----------------------------------------+ |
+ APB3-------------------------------------------------+
+
+ This bindings describes the external Baikal-T1 PVT control interfaces
+ like MMIO registers space, interrupt request number and clocks source.
+ These are then used by the corresponding hwmon device driver to
+ implement the sysfs files-based access to the sensors functionality.
+
+properties:
+ compatible:
+ const: baikal,bt1-pvt
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ items:
+ - description: PVT reference clock
+ - description: APB3 interface clock
+
+ clock-names:
+ items:
+ - const: ref
+ - const: pclk
+
+ "#thermal-sensor-cells":
+ description: Baikal-T1 can be referenced as the CPU thermal-sensor
+ const: 0
+
+ baikal,pvt-temp-offset-millicelsius:
+ description: |
+ Temperature sensor trimming factor. It can be used to manually adjust the
+ temperature measurements within 7.130 degrees Celsius.
+ maxItems: 1
+ items:
+ default: 0
+ minimum: 0
+ maximum: 7130
+
+unevaluatedProperties: false
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/mips-gic.h>
+
+ pvt@1f200000 {
+ compatible = "baikal,bt1-pvt";
+ reg = <0x1f200000 0x1000>;
+ #thermal-sensor-cells = <0>;
+
+ interrupts = <GIC_SHARED 31 IRQ_TYPE_LEVEL_HIGH>;
+
+ baikal,pvt-temp-trim-millicelsius = <1000>;
+
+ clocks = <&ccu_sys>, <&ccu_sys>;
+ clock-names = "ref", "pclk";
+ };
+...
diff --git a/Documentation/devicetree/bindings/iio/adc/st,stm32-adc.yaml b/Documentation/devicetree/bindings/iio/adc/st,stm32-adc.yaml
index 933ba37944d7..dd8eb15aeb63 100644
--- a/Documentation/devicetree/bindings/iio/adc/st,stm32-adc.yaml
+++ b/Documentation/devicetree/bindings/iio/adc/st,stm32-adc.yaml
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
%YAML 1.2
---
-$id: "http://devicetree.org/schemas/bindings/iio/adc/st,stm32-adc.yaml#"
+$id: "http://devicetree.org/schemas/iio/adc/st,stm32-adc.yaml#"
$schema: "http://devicetree.org/meta-schemas/core.yaml#"
title: STMicroelectronics STM32 ADC bindings
diff --git a/Documentation/devicetree/bindings/interrupt-controller/loongson,htvec.yaml b/Documentation/devicetree/bindings/interrupt-controller/loongson,htvec.yaml
new file mode 100644
index 000000000000..e865cd8f96a9
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/loongson,htvec.yaml
@@ -0,0 +1,57 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/interrupt-controller/loongson,htvec.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Loongson-3 HyperTransport Interrupt Vector Controller
+
+maintainers:
+ - Jiaxun Yang <jiaxun.yang@flygoat.com>
+
+description:
+ This interrupt controller is found in the Loongson-3 family of chips for
+ receiving vectorized interrupts from PCH's interrupt controller.
+
+properties:
+ compatible:
+ const: loongson,htvec-1.0
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ minItems: 1
+ maxItems: 4
+ description: Four parent interrupts that receive chained interrupts.
+
+ interrupt-controller: true
+
+ '#interrupt-cells':
+ const: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - interrupt-controller
+ - '#interrupt-cells'
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+ htvec: interrupt-controller@fb000080 {
+ compatible = "loongson,htvec-1.0";
+ reg = <0xfb000080 0x40>;
+ interrupt-controller;
+ #interrupt-cells = <1>;
+
+ interrupt-parent = <&liointc>;
+ interrupts = <24 IRQ_TYPE_LEVEL_HIGH>,
+ <25 IRQ_TYPE_LEVEL_HIGH>,
+ <26 IRQ_TYPE_LEVEL_HIGH>,
+ <27 IRQ_TYPE_LEVEL_HIGH>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/interrupt-controller/loongson,pch-msi.yaml b/Documentation/devicetree/bindings/interrupt-controller/loongson,pch-msi.yaml
new file mode 100644
index 000000000000..1a5ebbdd219a
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/loongson,pch-msi.yaml
@@ -0,0 +1,62 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/interrupt-controller/loongson,pch-msi.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Loongson PCH MSI Controller
+
+maintainers:
+ - Jiaxun Yang <jiaxun.yang@flygoat.com>
+
+description:
+ This interrupt controller is found in the Loongson LS7A family of PCH for
+ transforming interrupts from PCIe MSI into HyperTransport vectorized
+ interrupts.
+
+properties:
+ compatible:
+ const: loongson,pch-msi-1.0
+
+ reg:
+ maxItems: 1
+
+ loongson,msi-base-vec:
+ description:
+ u32 value of the base of parent HyperTransport vector allocated
+ to PCH MSI.
+ allOf:
+ - $ref: "/schemas/types.yaml#/definitions/uint32"
+ - minimum: 0
+ maximum: 255
+
+ loongson,msi-num-vecs:
+ description:
+ u32 value of the number of parent HyperTransport vectors allocated
+ to PCH MSI.
+ allOf:
+ - $ref: "/schemas/types.yaml#/definitions/uint32"
+ - minimum: 1
+ maximum: 256
+
+ msi-controller: true
+
+required:
+ - compatible
+ - reg
+ - msi-controller
+ - loongson,msi-base-vec
+ - loongson,msi-num-vecs
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+ msi: msi-controller@2ff00000 {
+ compatible = "loongson,pch-msi-1.0";
+ reg = <0x2ff00000 0x4>;
+ msi-controller;
+ loongson,msi-base-vec = <64>;
+ loongson,msi-num-vecs = <64>;
+ interrupt-parent = <&htvec>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/interrupt-controller/loongson,pch-pic.yaml b/Documentation/devicetree/bindings/interrupt-controller/loongson,pch-pic.yaml
new file mode 100644
index 000000000000..274adea13f33
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/loongson,pch-pic.yaml
@@ -0,0 +1,56 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/interrupt-controller/loongson,pch-pic.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Loongson PCH PIC Controller
+
+maintainers:
+ - Jiaxun Yang <jiaxun.yang@flygoat.com>
+
+description:
+ This interrupt controller is found in the Loongson LS7A family of PCH for
+ transforming interrupts from on-chip devices into HyperTransport vectorized
+ interrupts.
+
+properties:
+ compatible:
+ const: loongson,pch-pic-1.0
+
+ reg:
+ maxItems: 1
+
+ loongson,pic-base-vec:
+ description:
+ u32 value of the base of parent HyperTransport vector allocated
+ to PCH PIC.
+ allOf:
+ - $ref: "/schemas/types.yaml#/definitions/uint32"
+ - minimum: 0
+ maximum: 192
+
+ interrupt-controller: true
+
+ '#interrupt-cells':
+ const: 2
+
+required:
+ - compatible
+ - reg
+ - loongson,pic-base-vec
+ - interrupt-controller
+ - '#interrupt-cells'
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+ pic: interrupt-controller@10000000 {
+ compatible = "loongson,pch-pic-1.0";
+ reg = <0x10000000 0x400>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ loongson,pic-base-vec = <64>;
+ interrupt-parent = <&htvec>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/mfd/gateworks-gsc.yaml b/Documentation/devicetree/bindings/mfd/gateworks-gsc.yaml
new file mode 100644
index 000000000000..487a8445722e
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/gateworks-gsc.yaml
@@ -0,0 +1,196 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mfd/gateworks-gsc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Gateworks System Controller
+
+description: |
+ The Gateworks System Controller (GSC) is a device present across various
+ Gateworks product families that provides a set of system related features
+ such as the following (refer to the board hardware user manuals to see what
+ features are present)
+ - Watchdog Timer
+ - GPIO
+ - Pushbutton controller
+ - Hardware monitor with ADC's for temperature and voltage rails and
+ fan controller
+
+maintainers:
+ - Tim Harvey <tharvey@gateworks.com>
+ - Robert Jones <rjones@gateworks.com>
+
+properties:
+ $nodename:
+ pattern: "gsc@[0-9a-f]{1,2}"
+ compatible:
+ const: gw,gsc
+
+ reg:
+ description: I2C device address
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ interrupt-controller: true
+
+ "#interrupt-cells":
+ const: 1
+
+ "#address-cells":
+ const: 1
+
+ "#size-cells":
+ const: 0
+
+ adc:
+ type: object
+ description: Optional hardware monitoring module
+
+ properties:
+ compatible:
+ const: gw,gsc-adc
+
+ "#address-cells":
+ const: 1
+
+ "#size-cells":
+ const: 0
+
+ patternProperties:
+ "^channel@[0-9]+$":
+ type: object
+ description: |
+ Properties for a single ADC which can report cooked values
+ (i.e. temperature sensor based on thermister), raw values
+ (i.e. voltage rail with a pre-scaling resistor divider).
+
+ properties:
+ reg:
+ description: Register of the ADC
+ maxItems: 1
+
+ label:
+ description: Name of the ADC input
+
+ gw,mode:
+ description: |
+ conversion mode:
+ 0 - temperature, in C*10
+ 1 - pre-scaled voltage value
+ 2 - scaled voltage based on an optional resistor divider
+ and optional offset
+ $ref: /schemas/types.yaml#/definitions/uint32
+ enum: [0, 1, 2]
+
+ gw,voltage-divider-ohms:
+ description: Values of resistors for divider on raw ADC input
+ maxItems: 2
+ items:
+ minimum: 1000
+ maximum: 1000000
+
+ gw,voltage-offset-microvolt:
+ description: |
+ A positive voltage offset to apply to a raw ADC
+ (i.e. to compensate for a diode drop).
+ minimum: 0
+ maximum: 1000000
+
+ required:
+ - gw,mode
+ - reg
+ - label
+
+ required:
+ - compatible
+ - "#address-cells"
+ - "#size-cells"
+
+patternProperties:
+ "^fan-controller@[0-9a-f]+$":
+ type: object
+ description: Optional fan controller
+
+ properties:
+ compatible:
+ const: gw,gsc-fan
+
+ "#address-cells":
+ const: 1
+
+ "#size-cells":
+ const: 0
+
+ reg:
+ description: The fan controller base address
+ maxItems: 1
+
+ required:
+ - compatible
+ - reg
+ - "#address-cells"
+ - "#size-cells"
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - interrupt-controller
+ - "#interrupt-cells"
+ - "#address-cells"
+ - "#size-cells"
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+ i2c {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ gsc@20 {
+ compatible = "gw,gsc";
+ reg = <0x20>;
+ interrupt-parent = <&gpio1>;
+ interrupts = <4 GPIO_ACTIVE_LOW>;
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ adc {
+ compatible = "gw,gsc-adc";
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ channel@0 { /* A0: Board Temperature */
+ reg = <0x00>;
+ label = "temp";
+ gw,mode = <0>;
+ };
+
+ channel@2 { /* A1: Input Voltage (raw ADC) */
+ reg = <0x02>;
+ label = "vdd_vin";
+ gw,mode = <1>;
+ gw,voltage-divider-ohms = <22100 1000>;
+ gw,voltage-offset-microvolt = <800000>;
+ };
+
+ channel@b { /* A2: Battery voltage */
+ reg = <0x0b>;
+ label = "vdd_bat";
+ gw,mode = <1>;
+ };
+ };
+
+ fan-controller@2c {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "gw,gsc-fan";
+ reg = <0x2c>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/mfd/max8998.txt b/Documentation/devicetree/bindings/mfd/max8998.txt
index 5f2f07c09c90..4ed52184d081 100644
--- a/Documentation/devicetree/bindings/mfd/max8998.txt
+++ b/Documentation/devicetree/bindings/mfd/max8998.txt
@@ -73,6 +73,8 @@ number as described in MAX8998 datasheet.
- ESAFEOUT1: (ldo19)
- ESAFEOUT2: (ld020)
+ - CHARGER: main battery charger current control
+
Standard regulator bindings are used inside regulator subnodes. Check
Documentation/devicetree/bindings/regulator/regulator.txt
for more details.
@@ -113,5 +115,11 @@ Example:
regulator-always-on;
regulator-boot-on;
};
+
+ charger_reg: CHARGER {
+ regulator-name = "CHARGER";
+ regulator-min-microamp = <90000>;
+ regulator-max-microamp = <800000>;
+ };
};
};
diff --git a/Documentation/devicetree/bindings/mfd/st,stpmic1.yaml b/Documentation/devicetree/bindings/mfd/st,stpmic1.yaml
index f88d13d70441..be7faa6dc055 100644
--- a/Documentation/devicetree/bindings/mfd/st,stpmic1.yaml
+++ b/Documentation/devicetree/bindings/mfd/st,stpmic1.yaml
@@ -259,8 +259,6 @@ properties:
additionalProperties: false
- additionalProperties: false
-
additionalProperties: false
required:
diff --git a/Documentation/devicetree/bindings/mips/loongson/rs780e-acpi.yaml b/Documentation/devicetree/bindings/mips/loongson/rs780e-acpi.yaml
new file mode 100644
index 000000000000..d317897e1115
--- /dev/null
+++ b/Documentation/devicetree/bindings/mips/loongson/rs780e-acpi.yaml
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/mips/loongson/rs780e-acpi.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Loongson RS780E PCH ACPI Controller
+
+maintainers:
+ - Jiaxun Yang <jiaxun.yang@flygoat.com>
+
+description: |
+ This controller can be found in Loongson-3 systems with RS780E PCH.
+
+properties:
+ compatible:
+ const: loongson,rs780e-acpi
+
+ reg:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+
+examples:
+ - |
+ isa@0 {
+ compatible = "isa";
+ #address-cells = <2>;
+ #size-cells = <1>;
+ ranges = <1 0 0 0x1000>;
+
+ acpi@800 {
+ compatible = "loongson,rs780e-acpi";
+ reg = <1 0x800 0x100>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/mmc/amlogic,meson-mx-sdhc.yaml b/Documentation/devicetree/bindings/mmc/amlogic,meson-mx-sdhc.yaml
new file mode 100644
index 000000000000..7a386a5b8fcb
--- /dev/null
+++ b/Documentation/devicetree/bindings/mmc/amlogic,meson-mx-sdhc.yaml
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mmc/amlogic,meson-mx-sdhc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Amlogic Meson SDHC controller Device Tree Bindings
+
+allOf:
+ - $ref: "mmc-controller.yaml"
+
+maintainers:
+ - Martin Blumenstingl <martin.blumenstingl@googlemail.com>
+
+description: |
+ The SDHC MMC host controller on Amlogic SoCs provides an eMMC and MMC
+ card interface with 1/4/8-bit bus width.
+ It supports eMMC spec 4.4x/4.5x including HS200 (up to 100MHz clock).
+
+properties:
+ compatible:
+ items:
+ - enum:
+ - amlogic,meson8-sdhc
+ - amlogic,meson8b-sdhc
+ - amlogic,meson8m2-sdhc
+ - const: amlogic,meson-mx-sdhc
+
+ reg:
+ minItems: 1
+
+ interrupts:
+ minItems: 1
+
+ clocks:
+ minItems: 5
+
+ clock-names:
+ items:
+ - const: clkin0
+ - const: clkin1
+ - const: clkin2
+ - const: clkin3
+ - const: pclk
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ sdhc: mmc@8e00 {
+ compatible = "amlogic,meson8-sdhc", "amlogic,meson-mx-sdhc";
+ reg = <0x8e00 0x42>;
+ interrupts = <GIC_SPI 78 IRQ_TYPE_EDGE_RISING>;
+ clocks = <&xtal>,
+ <&fclk_div4>,
+ <&fclk_div3>,
+ <&fclk_div5>,
+ <&sdhc_pclk>;
+ clock-names = "clkin0", "clkin1", "clkin2", "clkin3", "pclk";
+ };
diff --git a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
index 428685eb2ded..f29bf7dd2ece 100644
--- a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
+++ b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
@@ -18,12 +18,21 @@ Required Properties:
- "xlnx,zynqmp-8.9a": ZynqMP SDHCI 8.9a PHY
For this device it is strongly suggested to include clock-output-names and
#clock-cells.
+ - "xlnx,versal-8.9a": Versal SDHCI 8.9a PHY
+ For this device it is strongly suggested to include clock-output-names and
+ #clock-cells.
- "ti,am654-sdhci-5.1", "arasan,sdhci-5.1": TI AM654 MMC PHY
Note: This binding has been deprecated and moved to [5].
- "intel,lgm-sdhci-5.1-emmc", "arasan,sdhci-5.1": Intel LGM eMMC PHY
For this device it is strongly suggested to include arasan,soc-ctl-syscon.
- "intel,lgm-sdhci-5.1-sdxc", "arasan,sdhci-5.1": Intel LGM SDXC PHY
For this device it is strongly suggested to include arasan,soc-ctl-syscon.
+ - "intel,keembay-sdhci-5.1-emmc", "arasan,sdhci-5.1": Intel Keem Bay eMMC
+ For this device it is strongly suggested to include arasan,soc-ctl-syscon.
+ - "intel,keembay-sdhci-5.1-sd": Intel Keem Bay SD controller
+ For this device it is strongly suggested to include arasan,soc-ctl-syscon.
+ - "intel,keembay-sdhci-5.1-sdio": Intel Keem Bay SDIO controller
+ For this device it is strongly suggested to include arasan,soc-ctl-syscon.
[5] Documentation/devicetree/bindings/mmc/sdhci-am654.txt
@@ -104,6 +113,18 @@ Example:
clk-phase-sd-hs = <63>, <72>;
};
+ sdhci: mmc@f1040000 {
+ compatible = "xlnx,versal-8.9a", "arasan,sdhci-8.9a";
+ interrupt-parent = <&gic>;
+ interrupts = <0 126 4>;
+ reg = <0x0 0xf1040000 0x0 0x10000>;
+ clocks = <&clk200>, <&clk200>;
+ clock-names = "clk_xin", "clk_ahb";
+ clock-output-names = "clk_out_sd0", "clk_in_sd0";
+ #clock-cells = <1>;
+ clk-phase-sd-hs = <132>, <60>;
+ };
+
emmc: sdhci@ec700000 {
compatible = "intel,lgm-sdhci-5.1-emmc", "arasan,sdhci-5.1";
reg = <0xec700000 0x300>;
@@ -133,3 +154,39 @@ Example:
phy-names = "phy_arasan";
arasan,soc-ctl-syscon = <&sysconf>;
};
+
+ mmc: mmc@33000000 {
+ compatible = "intel,keembay-sdhci-5.1-emmc", "arasan,sdhci-5.1";
+ interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
+ reg = <0x0 0x33000000 0x0 0x300>;
+ clock-names = "clk_xin", "clk_ahb";
+ clocks = <&scmi_clk KEEM_BAY_PSS_AUX_EMMC>,
+ <&scmi_clk KEEM_BAY_PSS_EMMC>;
+ phys = <&emmc_phy>;
+ phy-names = "phy_arasan";
+ assigned-clocks = <&scmi_clk KEEM_BAY_PSS_AUX_EMMC>;
+ assigned-clock-rates = <200000000>;
+ clock-output-names = "emmc_cardclock";
+ #clock-cells = <0>;
+ arasan,soc-ctl-syscon = <&mmc_phy_syscon>;
+ };
+
+ sd0: mmc@31000000 {
+ compatible = "intel,keembay-sdhci-5.1-sd";
+ interrupts = <GIC_SPI 83 IRQ_TYPE_LEVEL_HIGH>;
+ reg = <0x0 0x31000000 0x0 0x300>;
+ clock-names = "clk_xin", "clk_ahb";
+ clocks = <&scmi_clk KEEM_BAY_PSS_AUX_SD0>,
+ <&scmi_clk KEEM_BAY_PSS_SD0>;
+ arasan,soc-ctl-syscon = <&sd0_phy_syscon>;
+ };
+
+ sd1: mmc@32000000 {
+ compatible = "intel,keembay-sdhci-5.1-sdio";
+ interrupts = <GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>;
+ reg = <0x0 0x32000000 0x0 0x300>;
+ clock-names = "clk_xin", "clk_ahb";
+ clocks = <&scmi_clk KEEM_BAY_PSS_AUX_SD1>,
+ <&scmi_clk KEEM_BAY_PSS_SD1>;
+ arasan,soc-ctl-syscon = <&sd1_phy_syscon>;
+ };
diff --git a/Documentation/devicetree/bindings/mmc/renesas,mmcif.txt b/Documentation/devicetree/bindings/mmc/renesas,mmcif.txt
index c064af5838aa..291532ac0446 100644
--- a/Documentation/devicetree/bindings/mmc/renesas,mmcif.txt
+++ b/Documentation/devicetree/bindings/mmc/renesas,mmcif.txt
@@ -11,6 +11,7 @@ Required properties:
- "renesas,mmcif-r7s72100" for the MMCIF found in r7s72100 SoCs
- "renesas,mmcif-r8a73a4" for the MMCIF found in r8a73a4 SoCs
- "renesas,mmcif-r8a7740" for the MMCIF found in r8a7740 SoCs
+ - "renesas,mmcif-r8a7742" for the MMCIF found in r8a7742 SoCs
- "renesas,mmcif-r8a7743" for the MMCIF found in r8a7743 SoCs
- "renesas,mmcif-r8a7744" for the MMCIF found in r8a7744 SoCs
- "renesas,mmcif-r8a7745" for the MMCIF found in r8a7745 SoCs
@@ -24,8 +25,8 @@ Required properties:
- interrupts: Some SoCs have only 1 shared interrupt, while others have either
2 or 3 individual interrupts (error, int, card detect). Below is the number
of interrupts for each SoC:
- 1: r8a73a4, r8a7743, r8a7744, r8a7745, r8a7778, r8a7790, r8a7791, r8a7793,
- r8a7794
+ 1: r8a73a4, r8a7742, r8a7743, r8a7744, r8a7745, r8a7778, r8a7790, r8a7791,
+ r8a7793, r8a7794
2: r8a7740, sh73a0
3: r7s72100
diff --git a/Documentation/devicetree/bindings/mmc/renesas,sdhi.txt b/Documentation/devicetree/bindings/mmc/renesas,sdhi.txt
index e6cc47844207..0ca9a622cce0 100644
--- a/Documentation/devicetree/bindings/mmc/renesas,sdhi.txt
+++ b/Documentation/devicetree/bindings/mmc/renesas,sdhi.txt
@@ -7,6 +7,7 @@ Required properties:
"renesas,sdhi-r7s9210" - SDHI IP on R7S9210 SoC
"renesas,sdhi-r8a73a4" - SDHI IP on R8A73A4 SoC
"renesas,sdhi-r8a7740" - SDHI IP on R8A7740 SoC
+ "renesas,sdhi-r8a7742" - SDHI IP on R8A7742 SoC
"renesas,sdhi-r8a7743" - SDHI IP on R8A7743 SoC
"renesas,sdhi-r8a7744" - SDHI IP on R8A7744 SoC
"renesas,sdhi-r8a7745" - SDHI IP on R8A7745 SoC
diff --git a/Documentation/devicetree/bindings/mmc/sdhci-msm.txt b/Documentation/devicetree/bindings/mmc/sdhci-msm.txt
index 5445931c5ab9..b8e1d2b7aea9 100644
--- a/Documentation/devicetree/bindings/mmc/sdhci-msm.txt
+++ b/Documentation/devicetree/bindings/mmc/sdhci-msm.txt
@@ -17,6 +17,7 @@ Required properties:
"qcom,msm8916-sdhci", "qcom,sdhci-msm-v4"
"qcom,msm8992-sdhci", "qcom,sdhci-msm-v4"
"qcom,msm8996-sdhci", "qcom,sdhci-msm-v4"
+ "qcom,sm8250-sdhci", "qcom,sdhci-msm-v5"
"qcom,sdm845-sdhci", "qcom,sdhci-msm-v5"
"qcom,qcs404-sdhci", "qcom,sdhci-msm-v5"
"qcom,sc7180-sdhci", "qcom,sdhci-msm-v5";
@@ -46,6 +47,13 @@ Required properties:
"cal" - reference clock for RCLK delay calibration (optional)
"sleep" - sleep clock for RCLK delay calibration (optional)
+- qcom,ddr-config: Certain chipsets and platforms require particular settings
+ for the DDR_CONFIG register. Use this field to specify the register
+ value as per the Hardware Programming Guide.
+
+- qcom,dll-config: Chipset and Platform specific value. Use this field to
+ specify the DLL_CONFIG register value as per Hardware Programming Guide.
+
Example:
sdhc_1: sdhci@f9824900 {
@@ -63,6 +71,9 @@ Example:
clocks = <&gcc GCC_SDCC1_APPS_CLK>, <&gcc GCC_SDCC1_AHB_CLK>;
clock-names = "core", "iface";
+
+ qcom,dll-config = <0x000f642c>;
+ qcom,ddr-config = <0x80040868>;
};
sdhc_2: sdhci@f98a4900 {
@@ -80,4 +91,7 @@ Example:
clocks = <&gcc GCC_SDCC2_APPS_CLK>, <&gcc GCC_SDCC2_AHB_CLK>;
clock-names = "core", "iface";
+
+ qcom,dll-config = <0x0007642c>;
+ qcom,ddr-config = <0x80040868>;
};
diff --git a/Documentation/devicetree/bindings/mmc/sdhci-pxa.txt b/Documentation/devicetree/bindings/mmc/sdhci-pxa.txt
deleted file mode 100644
index 3d1b449d6097..000000000000
--- a/Documentation/devicetree/bindings/mmc/sdhci-pxa.txt
+++ /dev/null
@@ -1,50 +0,0 @@
-* Marvell sdhci-pxa v2/v3 controller
-
-This file documents differences between the core properties in mmc.txt
-and the properties used by the sdhci-pxav2 and sdhci-pxav3 drivers.
-
-Required properties:
-- compatible: Should be "mrvl,pxav2-mmc", "mrvl,pxav3-mmc" or
- "marvell,armada-380-sdhci".
-- reg:
- * for "mrvl,pxav2-mmc" and "mrvl,pxav3-mmc", one register area for
- the SDHCI registers.
-
- * for "marvell,armada-380-sdhci", three register areas. The first
- one for the SDHCI registers themselves, the second one for the
- AXI/Mbus bridge registers of the SDHCI unit, the third one for the
- SDIO3 Configuration register
-- reg names: should be "sdhci", "mbus", "conf-sdio3". only mandatory
- for "marvell,armada-380-sdhci"
-- clocks: Array of clocks required for SDHCI; requires at least one for
- I/O clock.
-- clock-names: Array of names corresponding to clocks property; shall be
- "io" for I/O clock and "core" for optional core clock.
-
-Optional properties:
-- mrvl,clk-delay-cycles: Specify a number of cycles to delay for tuning.
-
-Example:
-
-sdhci@d4280800 {
- compatible = "mrvl,pxav3-mmc";
- reg = <0xd4280800 0x800>;
- bus-width = <8>;
- interrupts = <27>;
- clocks = <&chip CLKID_SDIO1XIN>, <&chip CLKID_SDIO1>;
- clock-names = "io", "core";
- non-removable;
- mrvl,clk-delay-cycles = <31>;
-};
-
-sdhci@d8000 {
- compatible = "marvell,armada-380-sdhci";
- reg-names = "sdhci", "mbus", "conf-sdio3";
- reg = <0xd8000 0x1000>,
- <0xdc000 0x100>;
- <0x18454 0x4>;
- interrupts = <0 25 0x4>;
- clocks = <&gateclk 17>;
- clock-names = "io";
- mrvl,clk-delay-cycles = <0x1F>;
-};
diff --git a/Documentation/devicetree/bindings/mmc/sdhci-pxa.yaml b/Documentation/devicetree/bindings/mmc/sdhci-pxa.yaml
new file mode 100644
index 000000000000..a58715c860b7
--- /dev/null
+++ b/Documentation/devicetree/bindings/mmc/sdhci-pxa.yaml
@@ -0,0 +1,102 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mmc/sdhci-pxa.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Marvell PXA SDHCI v2/v3 bindings
+
+maintainers:
+ - Ulf Hansson <ulf.hansson@linaro.org>
+
+allOf:
+ - $ref: mmc-controller.yaml#
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: marvell,armada-380-sdhci
+ then:
+ properties:
+ regs:
+ minItems: 3
+ reg-names:
+ minItems: 3
+ required:
+ - reg-names
+ else:
+ properties:
+ regs:
+ maxItems: 1
+ reg-names:
+ maxItems: 1
+
+properties:
+ compatible:
+ enum:
+ - mrvl,pxav2-mmc
+ - mrvl,pxav3-mmc
+ - marvell,armada-380-sdhci
+
+ reg:
+ minItems: 1
+ maxItems: 3
+
+ reg-names:
+ items:
+ - const: sdhci
+ - const: mbus
+ - const: conf-sdio3
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ minItems: 1
+ maxItems: 2
+
+ clock-names:
+ minItems: 1
+ maxItems: 2
+ items:
+ - const: io
+ - const: core
+
+ mrvl,clk-delay-cycles:
+ description: Specify a number of cycles to delay for tuning.
+ $ref: /schemas/types.yaml#/definitions/uint32
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+
+examples:
+ - |
+ #include <dt-bindings/clock/berlin2.h>
+ mmc@d4280800 {
+ compatible = "mrvl,pxav3-mmc";
+ reg = <0xd4280800 0x800>;
+ bus-width = <8>;
+ interrupts = <27>;
+ clocks = <&chip CLKID_SDIO1XIN>, <&chip CLKID_SDIO1>;
+ clock-names = "io", "core";
+ non-removable;
+ mrvl,clk-delay-cycles = <31>;
+ };
+ - |
+ mmc@d8000 {
+ compatible = "marvell,armada-380-sdhci";
+ reg-names = "sdhci", "mbus", "conf-sdio3";
+ reg = <0xd8000 0x1000>,
+ <0xdc000 0x100>,
+ <0x18454 0x4>;
+ interrupts = <0 25 0x4>;
+ clocks = <&gateclk 17>;
+ clock-names = "io";
+ mrvl,clk-delay-cycles = <0x1F>;
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml b/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
index ae91aa9d8616..64c20c92c07d 100644
--- a/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
@@ -40,18 +40,22 @@ allOf:
then:
properties:
clocks:
+ minItems: 3
+ maxItems: 4
items:
- description: GMAC main clock
- description: First parent clock of the internal mux
- description: Second parent clock of the internal mux
+ - description: The clock which drives the timing adjustment logic
clock-names:
minItems: 3
- maxItems: 3
+ maxItems: 4
items:
- const: stmmaceth
- const: clkin0
- const: clkin1
+ - const: timing-adjustment
amlogic,tx-delay-ns:
$ref: /schemas/types.yaml#definitions/uint32
@@ -67,6 +71,19 @@ allOf:
PHY and MAC are adding a delay).
Any configuration is ignored when the phy-mode is set to "rmii".
+ amlogic,rx-delay-ns:
+ enum:
+ - 0
+ - 2
+ default: 0
+ description:
+ The internal RGMII RX clock delay (provided by this IP block) in
+ nanoseconds. When phy-mode is set to "rgmii" then the RX delay
+ should be explicitly configured. When the phy-mode is set to
+ either "rgmii-id" or "rgmii-rxid" the RX clock delay is already
+ provided by the PHY. Any configuration is ignored when the
+ phy-mode is set to "rmii".
+
properties:
compatible:
additionalItems: true
@@ -107,7 +124,7 @@ examples:
reg = <0xc9410000 0x10000>, <0xc8834540 0x8>;
interrupts = <8>;
interrupt-names = "macirq";
- clocks = <&clk_eth>, <&clkc_fclk_div2>, <&clk_mpll2>;
- clock-names = "stmmaceth", "clkin0", "clkin1";
+ clocks = <&clk_eth>, <&clk_fclk_div2>, <&clk_mpll2>, <&clk_fclk_div2>;
+ clock-names = "stmmaceth", "clkin0", "clkin1", "timing-adjustment";
phy-mode = "rgmii";
};
diff --git a/Documentation/devicetree/bindings/net/dsa/b53.txt b/Documentation/devicetree/bindings/net/dsa/b53.txt
index 5201bc15fdd6..cfd1afdc6e94 100644
--- a/Documentation/devicetree/bindings/net/dsa/b53.txt
+++ b/Documentation/devicetree/bindings/net/dsa/b53.txt
@@ -110,6 +110,9 @@ Ethernet switch connected via MDIO to the host, CPU port wired to eth0:
#size-cells = <0>;
ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
port0@0 {
reg = <0>;
label = "lan1";
diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
index 5aa141ccc113..9b1f1147ca36 100644
--- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
@@ -81,7 +81,8 @@ properties:
$ref: /schemas/types.yaml#definitions/flag
description:
If set, indicates the PHY device does not correctly release
- the turn around line low at the end of a MDIO transaction.
+ the turn around line low at end of the control phase of the
+ MDIO transaction.
enet-phy-lane-swap:
$ref: /schemas/types.yaml#definitions/flag
diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt
index ff8b0f211aa1..9b543789cd52 100644
--- a/Documentation/devicetree/bindings/net/fsl-fec.txt
+++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
@@ -22,8 +22,11 @@ Optional properties:
- fsl,err006687-workaround-present: If present indicates that the system has
the hardware workaround for ERR006687 applied and does not need a software
workaround.
-- gpr: phandle of SoC general purpose register mode. Required for wake on LAN
- on some SoCs
+- fsl,stop-mode: register bits of stop mode control, the format is
+ <&gpr req_gpr req_bit>.
+ gpr is the phandle to general purpose register node.
+ req_gpr is the gpr register offset for ENET stop request.
+ req_bit is the gpr bit offset for ENET stop request.
-interrupt-names: names of the interrupts listed in interrupts property in
the same order. The defaults if not specified are
__Number of interrupts__ __Default__
@@ -82,6 +85,7 @@ ethernet@83fec000 {
phy-supply = <&reg_fec_supply>;
phy-handle = <&ethphy>;
mdio {
+ clock-frequency = <5000000>;
ethphy: ethernet-phy@6 {
compatible = "ethernet-phy-ieee802.3-c22";
reg = <6>;
diff --git a/Documentation/devicetree/bindings/net/imx-dwmac.txt b/Documentation/devicetree/bindings/net/imx-dwmac.txt
new file mode 100644
index 000000000000..921d522fe8d7
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/imx-dwmac.txt
@@ -0,0 +1,56 @@
+IMX8 glue layer controller, NXP imx8 families support Synopsys MAC 5.10a IP.
+
+This file documents platform glue layer for IMX.
+Please see stmmac.txt for the other unchanged properties.
+
+The device node has following properties.
+
+Required properties:
+- compatible: Should be "nxp,imx8mp-dwmac-eqos" to select glue layer
+ and "snps,dwmac-5.10a" to select IP version.
+- clocks: Must contain a phandle for each entry in clock-names.
+- clock-names: Should be "stmmaceth" for the host clock.
+ Should be "pclk" for the MAC apb clock.
+ Should be "ptp_ref" for the MAC timer clock.
+ Should be "tx" for the MAC RGMII TX clock:
+ Should be "mem" for EQOS MEM clock.
+ - "mem" clock is required for imx8dxl platform.
+ - "mem" clock is not required for imx8mp platform.
+- interrupt-names: Should contain a list of interrupt names corresponding to
+ the interrupts in the interrupts property, if available.
+ Should be "macirq" for the main MAC IRQ
+ Should be "eth_wake_irq" for the IT which wake up system
+- intf_mode: Should be phandle/offset pair. The phandle to the syscon node which
+ encompases the GPR register, and the offset of the GPR register.
+ - required for imx8mp platform.
+ - is optional for imx8dxl platform.
+
+Optional properties:
+- intf_mode: is optional for imx8dxl platform.
+- snps,rmii_refclk_ext: to select RMII reference clock from external.
+
+Example:
+ eqos: ethernet@30bf0000 {
+ compatible = "nxp,imx8mp-dwmac-eqos", "snps,dwmac-5.10a";
+ reg = <0x30bf0000 0x10000>;
+ interrupts = <GIC_SPI 134 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 135 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "eth_wake_irq", "macirq";
+ clocks = <&clk IMX8MP_CLK_ENET_QOS_ROOT>,
+ <&clk IMX8MP_CLK_QOS_ENET_ROOT>,
+ <&clk IMX8MP_CLK_ENET_QOS_TIMER>,
+ <&clk IMX8MP_CLK_ENET_QOS>;
+ clock-names = "stmmaceth", "pclk", "ptp_ref", "tx";
+ assigned-clocks = <&clk IMX8MP_CLK_ENET_AXI>,
+ <&clk IMX8MP_CLK_ENET_QOS_TIMER>,
+ <&clk IMX8MP_CLK_ENET_QOS>;
+ assigned-clock-parents = <&clk IMX8MP_SYS_PLL1_266M>,
+ <&clk IMX8MP_SYS_PLL2_100M>,
+ <&clk IMX8MP_SYS_PLL2_125M>;
+ assigned-clock-rates = <0>, <100000000>, <125000000>;
+ nvmem-cells = <&eth_mac0>;
+ nvmem-cell-names = "mac-address";
+ nvmem_macaddr_swap;
+ intf_mode = <&gpr 0x4>;
+ status = "disabled";
+ };
diff --git a/Documentation/devicetree/bindings/net/mdio.yaml b/Documentation/devicetree/bindings/net/mdio.yaml
index 50c3397a82bc..d6a3bf8550eb 100644
--- a/Documentation/devicetree/bindings/net/mdio.yaml
+++ b/Documentation/devicetree/bindings/net/mdio.yaml
@@ -31,13 +31,25 @@ properties:
maxItems: 1
description:
The phandle and specifier for the GPIO that controls the RESET
- lines of all PHYs on that MDIO bus.
+ lines of all devices on that MDIO bus.
reset-delay-us:
description:
- RESET pulse width in microseconds. It applies to all PHY devices
- and must therefore be appropriately determined based on all PHY
- requirements (maximum value of all per-PHY RESET pulse widths).
+ RESET pulse width in microseconds. It applies to all MDIO devices
+ and must therefore be appropriately determined based on all devices
+ requirements (maximum value of all per-device RESET pulse widths).
+
+ clock-frequency:
+ description:
+ Desired MDIO bus clock frequency in Hz. Values greater than IEEE 802.3
+ defined 2.5MHz should only be used when all devices on the bus support
+ the given clock speed.
+
+ suppress-preamble:
+ description:
+ The 32 bit preamble should be suppressed. In order for this to
+ work, all devices on the bus must support suppressed preamble.
+ type: boolean
patternProperties:
"^ethernet-phy@[0-9a-f]+$":
@@ -48,7 +60,35 @@ patternProperties:
minimum: 0
maximum: 31
description:
- The ID number for the PHY.
+ The ID number for the device.
+
+ broken-turn-around:
+ $ref: /schemas/types.yaml#definitions/flag
+ description:
+ If set, indicates the MDIO device does not correctly release
+ the turn around line low at end of the control phase of the
+ MDIO transaction.
+
+ resets:
+ maxItems: 1
+
+ reset-names:
+ const: phy
+
+ reset-gpios:
+ maxItems: 1
+ description:
+ The GPIO phandle and specifier for the MDIO reset signal.
+
+ reset-assert-us:
+ description:
+ Delay after the reset was asserted in microseconds. If this
+ property is missing the delay will be skipped.
+
+ reset-deassert-us:
+ description:
+ Delay after the reset was deasserted in microseconds. If
+ this property is missing the delay will be skipped.
required:
- reg
diff --git a/Documentation/devicetree/bindings/net/mediatek,star-emac.yaml b/Documentation/devicetree/bindings/net/mediatek,star-emac.yaml
new file mode 100644
index 000000000000..aea88e621792
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mediatek,star-emac.yaml
@@ -0,0 +1,89 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/mediatek,star-emac.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek STAR Ethernet MAC Controller
+
+maintainers:
+ - Bartosz Golaszewski <bgolaszewski@baylibre.com>
+
+description:
+ This Ethernet MAC is used on the MT8* family of SoCs from MediaTek.
+ It's compliant with 802.3 standards and supports half- and full-duplex
+ modes with flow-control as well as CRC offloading and VLAN tags.
+
+allOf:
+ - $ref: "ethernet-controller.yaml#"
+
+properties:
+ compatible:
+ enum:
+ - mediatek,mt8516-eth
+ - mediatek,mt8518-eth
+ - mediatek,mt8175-eth
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ minItems: 3
+ maxItems: 3
+
+ clock-names:
+ additionalItems: false
+ items:
+ - const: core
+ - const: reg
+ - const: trans
+
+ mediatek,pericfg:
+ $ref: /schemas/types.yaml#definitions/phandle
+ description:
+ Phandle to the device containing the PERICFG register range. This is used
+ to control the MII mode.
+
+ mdio:
+ type: object
+ description:
+ Creates and registers an MDIO bus.
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+ - mediatek,pericfg
+ - phy-handle
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/clock/mt8516-clk.h>
+
+ ethernet: ethernet@11180000 {
+ compatible = "mediatek,mt8516-eth";
+ reg = <0x11180000 0x1000>;
+ mediatek,pericfg = <&pericfg>;
+ interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_LOW>;
+ clocks = <&topckgen CLK_TOP_RG_ETH>,
+ <&topckgen CLK_TOP_66M_ETH>,
+ <&topckgen CLK_TOP_133M_ETH>;
+ clock-names = "core", "reg", "trans";
+ phy-handle = <&eth_phy>;
+ phy-mode = "rmii";
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ eth_phy: ethernet-phy@0 {
+ reg = <0>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml b/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml
new file mode 100644
index 000000000000..42be0255512b
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml
@@ -0,0 +1,61 @@
+# SPDX-License-Identifier: GPL-2.0+
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/nxp,tja11xx.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NXP TJA11xx PHY
+
+maintainers:
+ - Andrew Lunn <andrew@lunn.ch>
+ - Florian Fainelli <f.fainelli@gmail.com>
+ - Heiner Kallweit <hkallweit1@gmail.com>
+
+description:
+ Bindings for NXP TJA11xx automotive PHYs
+
+allOf:
+ - $ref: ethernet-phy.yaml#
+
+patternProperties:
+ "^ethernet-phy@[0-9a-f]+$":
+ type: object
+ description: |
+ Some packages have multiple PHYs. Secondary PHY should be defines as
+ subnode of the first (parent) PHY.
+
+ properties:
+ reg:
+ minimum: 0
+ maximum: 31
+ description:
+ The ID number for the child PHY. Should be +1 of parent PHY.
+
+ required:
+ - reg
+
+examples:
+ - |
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ tja1101_phy0: ethernet-phy@4 {
+ reg = <0x4>;
+ };
+ };
+ - |
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ tja1102_phy0: ethernet-phy@4 {
+ reg = <0x4>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ tja1102_phy1: ethernet-phy@5 {
+ reg = <0x5>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/qca,ar71xx.txt b/Documentation/devicetree/bindings/net/qca,ar71xx.txt
deleted file mode 100644
index 2a33e71ba72b..000000000000
--- a/Documentation/devicetree/bindings/net/qca,ar71xx.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-Required properties:
-- compatible: Should be "qca,<soc>-eth". Currently support compatibles are:
- qca,ar7100-eth - Atheros AR7100
- qca,ar7240-eth - Atheros AR7240
- qca,ar7241-eth - Atheros AR7241
- qca,ar7242-eth - Atheros AR7242
- qca,ar9130-eth - Atheros AR9130
- qca,ar9330-eth - Atheros AR9330
- qca,ar9340-eth - Atheros AR9340
- qca,qca9530-eth - Qualcomm Atheros QCA9530
- qca,qca9550-eth - Qualcomm Atheros QCA9550
- qca,qca9560-eth - Qualcomm Atheros QCA9560
-
-- reg : Address and length of the register set for the device
-- interrupts : Should contain eth interrupt
-- phy-mode : See ethernet.txt file in the same directory
-- clocks: the clock used by the core
-- clock-names: the names of the clock listed in the clocks property. These are
- "eth" and "mdio".
-- resets: Should contain phandles to the reset signals
-- reset-names: Should contain the names of reset signal listed in the resets
- property. These are "mac" and "mdio"
-
-Optional properties:
-- phy-handle : phandle to the PHY device connected to this device.
-- fixed-link : Assume a fixed link. See fixed-link.txt in the same directory.
- Use instead of phy-handle.
-
-Optional subnodes:
-- mdio : specifies the mdio bus, used as a container for phy nodes
- according to phy.txt in the same directory
-
-Example:
-
-ethernet@1a000000 {
- compatible = "qca,ar9330-eth";
- reg = <0x1a000000 0x200>;
- interrupts = <5>;
- resets = <&rst 13>, <&rst 23>;
- reset-names = "mac", "mdio";
- clocks = <&pll ATH79_CLK_AHB>, <&pll ATH79_CLK_MDIO>;
- clock-names = "eth", "mdio";
-
- phy-mode = "gmii";
-};
diff --git a/Documentation/devicetree/bindings/net/qca,ar71xx.yaml b/Documentation/devicetree/bindings/net/qca,ar71xx.yaml
new file mode 100644
index 000000000000..f99a5aabe923
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qca,ar71xx.yaml
@@ -0,0 +1,216 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/qca,ar71xx.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: QCA AR71XX MAC
+
+allOf:
+ - $ref: ethernet-controller.yaml#
+
+maintainers:
+ - Oleksij Rempel <o.rempel@pengutronix.de>
+
+properties:
+ compatible:
+ oneOf:
+ - items:
+ - enum:
+ - qca,ar7100-eth # Atheros AR7100
+ - qca,ar7240-eth # Atheros AR7240
+ - qca,ar7241-eth # Atheros AR7241
+ - qca,ar7242-eth # Atheros AR7242
+ - qca,ar9130-eth # Atheros AR9130
+ - qca,ar9330-eth # Atheros AR9330
+ - qca,ar9340-eth # Atheros AR9340
+ - qca,qca9530-eth # Qualcomm Atheros QCA9530
+ - qca,qca9550-eth # Qualcomm Atheros QCA9550
+ - qca,qca9560-eth # Qualcomm Atheros QCA9560
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ '#address-cells':
+ description: number of address cells for the MDIO bus
+ const: 1
+
+ '#size-cells':
+ description: number of size cells on the MDIO bus
+ const: 0
+
+ clocks:
+ items:
+ - description: MAC main clock
+ - description: MDIO clock
+
+ clock-names:
+ items:
+ - const: eth
+ - const: mdio
+
+ resets:
+ items:
+ - description: MAC reset
+ - description: MDIO reset
+
+ reset-names:
+ items:
+ - const: mac
+ - const: mdio
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - phy-mode
+ - clocks
+ - clock-names
+ - resets
+ - reset-names
+
+examples:
+ # Lager board
+ - |
+ eth0: ethernet@19000000 {
+ compatible = "qca,ar9330-eth";
+ reg = <0x19000000 0x200>;
+ interrupts = <4>;
+ resets = <&rst 9>, <&rst 22>;
+ reset-names = "mac", "mdio";
+ clocks = <&pll 1>, <&pll 2>;
+ clock-names = "eth", "mdio";
+ qca,ethcfg = <&ethcfg>;
+ phy-mode = "mii";
+ phy-handle = <&phy_port4>;
+ };
+
+ eth1: ethernet@1a000000 {
+ compatible = "qca,ar9330-eth";
+ reg = <0x1a000000 0x200>;
+ interrupts = <5>;
+ resets = <&rst 13>, <&rst 23>;
+ reset-names = "mac", "mdio";
+ clocks = <&pll 1>, <&pll 2>;
+ clock-names = "eth", "mdio";
+
+ phy-mode = "gmii";
+
+ status = "disabled";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ switch10: switch@10 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ compatible = "qca,ar9331-switch";
+ reg = <0x10>;
+ resets = <&rst 8>;
+ reset-names = "switch";
+
+ interrupt-parent = <&miscintc>;
+ interrupts = <12>;
+
+ interrupt-controller;
+ #interrupt-cells = <1>;
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ switch_port0: port@0 {
+ reg = <0x0>;
+ label = "cpu";
+ ethernet = <&eth1>;
+
+ phy-mode = "gmii";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+ };
+
+ switch_port1: port@1 {
+ reg = <0x1>;
+ phy-handle = <&phy_port0>;
+ phy-mode = "internal";
+
+ status = "disabled";
+ };
+
+ switch_port2: port@2 {
+ reg = <0x2>;
+ phy-handle = <&phy_port1>;
+ phy-mode = "internal";
+
+ status = "disabled";
+ };
+
+ switch_port3: port@3 {
+ reg = <0x3>;
+ phy-handle = <&phy_port2>;
+ phy-mode = "internal";
+
+ status = "disabled";
+ };
+
+ switch_port4: port@4 {
+ reg = <0x4>;
+ phy-handle = <&phy_port3>;
+ phy-mode = "internal";
+
+ status = "disabled";
+ };
+ };
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ interrupt-parent = <&switch10>;
+
+ phy_port0: phy@0 {
+ reg = <0x0>;
+ interrupts = <0>;
+ status = "disabled";
+ };
+
+ phy_port1: phy@1 {
+ reg = <0x1>;
+ interrupts = <0>;
+ status = "disabled";
+ };
+
+ phy_port2: phy@2 {
+ reg = <0x2>;
+ interrupts = <0>;
+ status = "disabled";
+ };
+
+ phy_port3: phy@3 {
+ reg = <0x3>;
+ interrupts = <0>;
+ status = "disabled";
+ };
+
+ phy_port4: phy@4 {
+ reg = <0x4>;
+ interrupts = <0>;
+ status = "disabled";
+ };
+ };
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/qcom,ipa.yaml b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
index 140f15245654..7b749fc04c32 100644
--- a/Documentation/devicetree/bindings/net/qcom,ipa.yaml
+++ b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
@@ -20,7 +20,10 @@ description:
The GSI is an integral part of the IPA, but it is logically isolated
and has a distinct interrupt and a separately-defined address space.
- See also soc/qcom/qcom,smp2p.txt and interconnect/interconnect.txt.
+ See also soc/qcom/qcom,smp2p.txt and interconnect/interconnect.txt. See
+ iommu/iommu.txt and iommu/arm,smmu.yaml for more information about SMMU
+ bindings.
+
- |
-------- ---------
@@ -54,6 +57,9 @@ properties:
- const: ipa-shared
- const: gsi
+ iommus:
+ maxItems: 1
+
clocks:
maxItems: 1
@@ -126,6 +132,7 @@ properties:
required:
- compatible
+ - iommus
- reg
- clocks
- interrupts
@@ -164,6 +171,7 @@ examples:
modem-init;
modem-remoteproc = <&mss_pil>;
+ iommus = <&apps_smmu 0x720 0x3>;
reg = <0 0x1e40000 0 0x7000>,
<0 0x1e47000 0 0x2000>,
<0 0x1e04000 0 0x2c000>;
diff --git a/Documentation/devicetree/bindings/net/qcom,ipq4019-mdio.yaml b/Documentation/devicetree/bindings/net/qcom,ipq4019-mdio.yaml
new file mode 100644
index 000000000000..13555a89975f
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qcom,ipq4019-mdio.yaml
@@ -0,0 +1,61 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/qcom,ipq4019-mdio.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm IPQ40xx MDIO Controller Device Tree Bindings
+
+maintainers:
+ - Robert Marko <robert.marko@sartura.hr>
+
+allOf:
+ - $ref: "mdio.yaml#"
+
+properties:
+ compatible:
+ const: qcom,ipq4019-mdio
+
+ "#address-cells":
+ const: 1
+
+ "#size-cells":
+ const: 0
+
+ reg:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - "#address-cells"
+ - "#size-cells"
+
+examples:
+ - |
+ mdio@90000 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "qcom,ipq4019-mdio";
+ reg = <0x90000 0x64>;
+
+ ethphy0: ethernet-phy@0 {
+ reg = <0>;
+ };
+
+ ethphy1: ethernet-phy@1 {
+ reg = <1>;
+ };
+
+ ethphy2: ethernet-phy@2 {
+ reg = <2>;
+ };
+
+ ethphy3: ethernet-phy@3 {
+ reg = <3>;
+ };
+
+ ethphy4: ethernet-phy@4 {
+ reg = <4>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
index d2202791c1d4..709ca6d51650 100644
--- a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
+++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
@@ -10,9 +10,11 @@ device the slave device is attached to.
Required properties:
- compatible: should contain one of the following:
* "qcom,qca6174-bt"
+ * "qcom,qca9377-bt"
* "qcom,wcn3990-bt"
* "qcom,wcn3991-bt"
* "qcom,wcn3998-bt"
+ * "qcom,qca6390-bt"
Optional properties for compatible string qcom,qca6174-bt:
@@ -20,6 +22,10 @@ Optional properties for compatible string qcom,qca6174-bt:
- clocks: clock provided to the controller (SUSCLK_32KHZ)
- firmware-name: specify the name of nvm firmware to load
+Optional properties for compatible string qcom,qca9377-bt:
+
+ - max-speed: see Documentation/devicetree/bindings/serial/serial.yaml
+
Required properties for compatible string qcom,wcn399x-bt:
- vddio-supply: VDD_IO supply regulator handle.
diff --git a/Documentation/devicetree/bindings/net/realtek-bluetooth.yaml b/Documentation/devicetree/bindings/net/realtek-bluetooth.yaml
new file mode 100644
index 000000000000..f15a5e5e4859
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/realtek-bluetooth.yaml
@@ -0,0 +1,54 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/realtek-bluetooth.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RTL8723BS/RTL8723CS/RTL8822CS Bluetooth Device Tree Bindings
+
+maintainers:
+ - Vasily Khoruzhick <anarsoul@gmail.com>
+ - Alistair Francis <alistair@alistair23.me>
+
+description:
+ RTL8723CS/RTL8723CS/RTL8822CS is WiFi + BT chip. WiFi part is connected over
+ SDIO, while BT is connected over serial. It speaks H5 protocol with few
+ extra commands to upload firmware and change module speed.
+
+properties:
+ compatible:
+ oneOf:
+ - const: "realtek,rtl8723bs-bt"
+ - const: "realtek,rtl8723cs-bt"
+ - const: "realtek,rtl8822cs-bt"
+
+ device-wake-gpios:
+ maxItems: 1
+ description: GPIO specifier, used to wakeup the BT module
+
+ enable-gpios:
+ maxItems: 1
+ description: GPIO specifier, used to enable the BT module
+
+ host-wake-gpios:
+ maxItems: 1
+ description: GPIO specifier, used to wakeup the host processor
+
+required:
+ - compatible
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ uart1 {
+ pinctrl-names = "default";
+ pinctrl-0 = <&uart1_pins>, <&uart1_rts_cts_pins>;
+ uart-has-rtscts = <1>;
+
+ bluetooth {
+ compatible = "realtek,rtl8723bs-bt";
+ device-wake-gpios = <&r_pio 0 5 GPIO_ACTIVE_HIGH>; /* PL5 */
+ host-wakeup-gpios = <&r_pio 0 6 GPIO_ACTIVE_HIGH>; /* PL6 */
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
deleted file mode 100644
index 4e85fc495e87..000000000000
--- a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
+++ /dev/null
@@ -1,64 +0,0 @@
-* Socionext AVE ethernet controller
-
-This describes the devicetree bindings for AVE ethernet controller
-implemented on Socionext UniPhier SoCs.
-
-Required properties:
- - compatible: Should be
- - "socionext,uniphier-pro4-ave4" : for Pro4 SoC
- - "socionext,uniphier-pxs2-ave4" : for PXs2 SoC
- - "socionext,uniphier-ld11-ave4" : for LD11 SoC
- - "socionext,uniphier-ld20-ave4" : for LD20 SoC
- - "socionext,uniphier-pxs3-ave4" : for PXs3 SoC
- - reg: Address where registers are mapped and size of region.
- - interrupts: Should contain the MAC interrupt.
- - phy-mode: See ethernet.txt in the same directory. Allow to choose
- "rgmii", "rmii", "mii", or "internal" according to the PHY.
- The acceptable mode is SoC-dependent.
- - phy-handle: Should point to the external phy device.
- See ethernet.txt file in the same directory.
- - clocks: A phandle to the clock for the MAC.
- For Pro4 SoC, that is "socionext,uniphier-pro4-ave4",
- another MAC clock, GIO bus clock and PHY clock are also required.
- - clock-names: Should contain
- - "ether", "ether-gb", "gio", "ether-phy" for Pro4 SoC
- - "ether" for others
- - resets: A phandle to the reset control for the MAC. For Pro4 SoC,
- GIO bus reset is also required.
- - reset-names: Should contain
- - "ether", "gio" for Pro4 SoC
- - "ether" for others
- - socionext,syscon-phy-mode: A phandle to syscon with one argument
- that configures phy mode. The argument is the ID of MAC instance.
-
-The MAC address will be determined using the optional properties
-defined in ethernet.txt.
-
-Required subnode:
- - mdio: A container for child nodes representing phy nodes.
- See phy.txt in the same directory.
-
-Example:
-
- ether: ethernet@65000000 {
- compatible = "socionext,uniphier-ld20-ave4";
- reg = <0x65000000 0x8500>;
- interrupts = <0 66 4>;
- phy-mode = "rgmii";
- phy-handle = <&ethphy>;
- clock-names = "ether";
- clocks = <&sys_clk 6>;
- reset-names = "ether";
- resets = <&sys_rst 6>;
- socionext,syscon-phy-mode = <&soc_glue 0>;
- local-mac-address = [00 00 00 00 00 00];
-
- mdio {
- #address-cells = <1>;
- #size-cells = <0>;
-
- ethphy: ethphy@1 {
- reg = <1>;
- };
- };
- };
diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml
new file mode 100644
index 000000000000..7d84a863b9b9
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml
@@ -0,0 +1,111 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/socionext,uniphier-ave4.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Socionext AVE ethernet controller
+
+maintainers:
+ - Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
+
+description: |
+ This describes the devicetree bindings for AVE ethernet controller
+ implemented on Socionext UniPhier SoCs.
+
+allOf:
+ - $ref: ethernet-controller.yaml#
+
+properties:
+ compatible:
+ enum:
+ - socionext,uniphier-pro4-ave4
+ - socionext,uniphier-pxs2-ave4
+ - socionext,uniphier-ld11-ave4
+ - socionext,uniphier-ld20-ave4
+ - socionext,uniphier-pxs3-ave4
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ phy-mode: true
+
+ phy-handle: true
+
+ mac-address: true
+
+ local-mac-address: true
+
+ clocks:
+ minItems: 1
+ maxItems: 4
+
+ clock-names:
+ oneOf:
+ - items: # for Pro4
+ - const: gio
+ - const: ether
+ - const: ether-gb
+ - const: ether-phy
+ - const: ether # for others
+
+ resets:
+ minItems: 1
+ maxItems: 2
+
+ reset-names:
+ oneOf:
+ - items: # for Pro4
+ - const: gio
+ - const: ether
+ - const: ether # for others
+
+ socionext,syscon-phy-mode:
+ $ref: /schemas/types.yaml#definitions/phandle-array
+ description:
+ A phandle to syscon with one argument that configures phy mode.
+ The argument is the ID of MAC instance.
+
+ mdio:
+ $ref: mdio.yaml#
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - phy-mode
+ - phy-handle
+ - clocks
+ - clock-names
+ - resets
+ - reset-names
+ - mdio
+
+additionalProperties: false
+
+examples:
+ - |
+ ether: ethernet@65000000 {
+ compatible = "socionext,uniphier-ld20-ave4";
+ reg = <0x65000000 0x8500>;
+ interrupts = <0 66 4>;
+ phy-mode = "rgmii";
+ phy-handle = <&ethphy>;
+ clock-names = "ether";
+ clocks = <&sys_clk 6>;
+ reset-names = "ether";
+ resets = <&sys_rst 6>;
+ socionext,syscon-phy-mode = <&soc_glue 0>;
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ethphy: ethernet-phy@1 {
+ reg = <1>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/ti,dp83867.txt b/Documentation/devicetree/bindings/net/ti,dp83867.txt
deleted file mode 100644
index 44e2a4fab29e..000000000000
--- a/Documentation/devicetree/bindings/net/ti,dp83867.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-* Texas Instruments - dp83867 Giga bit ethernet phy
-
-Required properties:
- - reg - The ID number for the phy, usually a small integer
- - ti,rx-internal-delay - RGMII Receive Clock Delay - see dt-bindings/net/ti-dp83867.h
- for applicable values. Required only if interface type is
- PHY_INTERFACE_MODE_RGMII_ID or PHY_INTERFACE_MODE_RGMII_RXID
- - ti,tx-internal-delay - RGMII Transmit Clock Delay - see dt-bindings/net/ti-dp83867.h
- for applicable values. Required only if interface type is
- PHY_INTERFACE_MODE_RGMII_ID or PHY_INTERFACE_MODE_RGMII_TXID
-
-Note: If the interface type is PHY_INTERFACE_MODE_RGMII the TX/RX clock delays
- will be left at their default values, as set by the PHY's pin strapping.
- The default strapping will use a delay of 2.00 ns. Thus
- PHY_INTERFACE_MODE_RGMII, by default, does not behave as RGMII with no
- internal delay, but as PHY_INTERFACE_MODE_RGMII_ID. The device tree
- should use "rgmii-id" if internal delays are desired as this may be
- changed in future to cause "rgmii" mode to disable delays.
-
-Optional property:
- - ti,min-output-impedance - MAC Interface Impedance control to set
- the programmable output impedance to
- minimum value (35 ohms).
- - ti,max-output-impedance - MAC Interface Impedance control to set
- the programmable output impedance to
- maximum value (70 ohms).
- - ti,dp83867-rxctrl-strap-quirk - This denotes the fact that the
- board has RX_DV/RX_CTRL pin strapped in
- mode 1 or 2. To ensure PHY operation,
- there are specific actions that
- software needs to take when this pin is
- strapped in these modes. See data manual
- for details.
- - ti,clk-output-sel - Muxing option for CLK_OUT pin. See dt-bindings/net/ti-dp83867.h
- for applicable values. The CLK_OUT pin can also
- be disabled by this property. When omitted, the
- PHY's default will be left as is.
- - ti,sgmii-ref-clock-output-enable - This denotes which
- SGMII configuration is used (4 or 6-wire modes).
- Some MACs work with differential SGMII clock.
- See data manual for details.
-
- - ti,fifo-depth - Transmitt FIFO depth- see dt-bindings/net/ti-dp83867.h
- for applicable values (deprecated)
-
- -tx-fifo-depth - As defined in the ethernet-controller.yaml. Values for
- the depth can be found in dt-bindings/net/ti-dp83867.h
- -rx-fifo-depth - As defined in the ethernet-controller.yaml. Values for
- the depth can be found in dt-bindings/net/ti-dp83867.h
-
-Note: ti,min-output-impedance and ti,max-output-impedance are mutually
- exclusive. When both properties are present ti,max-output-impedance
- takes precedence.
-
-Default child nodes are standard Ethernet PHY device
-nodes as described in Documentation/devicetree/bindings/net/phy.txt
-
-Example:
-
- ethernet-phy@0 {
- reg = <0>;
- ti,rx-internal-delay = <DP83867_RGMIIDCTL_2_25_NS>;
- ti,tx-internal-delay = <DP83867_RGMIIDCTL_2_75_NS>;
- tx-fifo-depth = <DP83867_PHYCR_FIFO_DEPTH_4_B_NIB>;
- };
-
-Datasheet can be found:
-http://www.ti.com/product/DP83867IR/datasheet
diff --git a/Documentation/devicetree/bindings/net/ti,dp83867.yaml b/Documentation/devicetree/bindings/net/ti,dp83867.yaml
new file mode 100644
index 000000000000..554dcd7a40a9
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ti,dp83867.yaml
@@ -0,0 +1,127 @@
+# SPDX-License-Identifier: (GPL-2.0+ OR BSD-2-Clause)
+# Copyright (C) 2019 Texas Instruments Incorporated
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/net/ti,dp83867.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: TI DP83867 ethernet PHY
+
+allOf:
+ - $ref: "ethernet-controller.yaml#"
+
+maintainers:
+ - Dan Murphy <dmurphy@ti.com>
+
+description: |
+ The DP83867 device is a robust, low power, fully featured Physical Layer
+ transceiver with integrated PMD sublayers to support 10BASE-Te, 100BASE-TX
+ and 1000BASE-T Ethernet protocols.
+
+ The DP83867 is designed for easy implementation of 10/100/1000 Mbps Ethernet
+ LANs. It interfaces directly to twisted pair media via an external
+ transformer. This device interfaces directly to the MAC layer through the
+ IEEE 802.3 Standard Media Independent Interface (MII), the IEEE 802.3 Gigabit
+ Media Independent Interface (GMII) or Reduced GMII (RGMII).
+
+ Specifications about the charger can be found at:
+ https://www.ti.com/lit/gpn/dp83867ir
+
+properties:
+ reg:
+ maxItems: 1
+
+ ti,min-output-impedance:
+ type: boolean
+ description: |
+ MAC Interface Impedance control to set the programmable output impedance
+ to a minimum value (35 ohms).
+
+ ti,max-output-impedance:
+ type: boolean
+ description: |
+ MAC Interface Impedance control to set the programmable output impedance
+ to a maximum value (70 ohms).
+ Note: ti,min-output-impedance and ti,max-output-impedance are mutually
+ exclusive. When both properties are present ti,max-output-impedance
+ takes precedence.
+
+ tx-fifo-depth:
+ $ref: /schemas/types.yaml#definitions/uint32
+ description: |
+ Transmitt FIFO depth see dt-bindings/net/ti-dp83867.h for values
+
+ rx-fifo-depth:
+ $ref: /schemas/types.yaml#definitions/uint32
+ description: |
+ Receive FIFO depth see dt-bindings/net/ti-dp83867.h for values
+
+ ti,clk-output-sel:
+ $ref: /schemas/types.yaml#definitions/uint32
+ description: |
+ Muxing option for CLK_OUT pin. See dt-bindings/net/ti-dp83867.h
+ for applicable values. The CLK_OUT pin can also be disabled by this
+ property. When omitted, the PHY's default will be left as is.
+
+ ti,rx-internal-delay:
+ $ref: /schemas/types.yaml#definitions/uint32
+ description: |
+ RGMII Receive Clock Delay - see dt-bindings/net/ti-dp83867.h
+ for applicable values. Required only if interface type is
+ PHY_INTERFACE_MODE_RGMII_ID or PHY_INTERFACE_MODE_RGMII_RXID.
+
+ ti,tx-internal-delay:
+ $ref: /schemas/types.yaml#definitions/uint32
+ description: |
+ RGMII Transmit Clock Delay - see dt-bindings/net/ti-dp83867.h
+ for applicable values. Required only if interface type is
+ PHY_INTERFACE_MODE_RGMII_ID or PHY_INTERFACE_MODE_RGMII_TXID.
+
+ Note: If the interface type is PHY_INTERFACE_MODE_RGMII the TX/RX clock
+ delays will be left at their default values, as set by the PHY's pin
+ strapping. The default strapping will use a delay of 2.00 ns. Thus
+ PHY_INTERFACE_MODE_RGMII, by default, does not behave as RGMII with no
+ internal delay, but as PHY_INTERFACE_MODE_RGMII_ID. The device tree
+ should use "rgmii-id" if internal delays are desired as this may be
+ changed in future to cause "rgmii" mode to disable delays.
+
+ ti,dp83867-rxctrl-strap-quirk:
+ type: boolean
+ description: |
+ This denotes the fact that the board has RX_DV/RX_CTRL pin strapped in
+ mode 1 or 2. To ensure PHY operation, there are specific actions that
+ software needs to take when this pin is strapped in these modes.
+ See data manual for details.
+
+ ti,sgmii-ref-clock-output-enable:
+ type: boolean
+ description: |
+ This denotes which SGMII configuration is used (4 or 6-wire modes).
+ Some MACs work with differential SGMII clock. See data manual for details.
+
+ ti,fifo-depth:
+ deprecated: true
+ $ref: /schemas/types.yaml#definitions/uint32
+ description: |
+ Transmitt FIFO depth- see dt-bindings/net/ti-dp83867.h for applicable
+ values.
+
+required:
+ - reg
+
+examples:
+ - |
+ #include <dt-bindings/net/ti-dp83867.h>
+ mdio0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ ethphy0: ethernet-phy@0 {
+ reg = <0>;
+ tx-fifo-depth = <DP83867_PHYCR_FIFO_DEPTH_4_B_NIB>;
+ rx-fifo-depth = <DP83867_PHYCR_FIFO_DEPTH_4_B_NIB>;
+ ti,max-output-impedance;
+ ti,clk-output-sel = <DP83867_CLK_O_SEL_CHN_A_RCLK>;
+ ti,rx-internal-delay = <DP83867_RGMIIDCTL_2_25_NS>;
+ ti,tx-internal-delay = <DP83867_RGMIIDCTL_2_75_NS>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/ti,dp83869.yaml b/Documentation/devicetree/bindings/net/ti,dp83869.yaml
index 6fe3e451da8a..5b69ef03bbf7 100644
--- a/Documentation/devicetree/bindings/net/ti,dp83869.yaml
+++ b/Documentation/devicetree/bindings/net/ti,dp83869.yaml
@@ -1,4 +1,4 @@
-# SPDX-License-Identifier: GPL-2.0
+# SPDX-License-Identifier: (GPL-2.0+ OR BSD-2-Clause)
# Copyright (C) 2019 Texas Instruments Incorporated
%YAML 1.2
---
diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml
index 78bf511e2892..c87395f360a6 100644
--- a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml
+++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml
@@ -144,6 +144,13 @@ patternProperties:
description:
CPSW MDIO bus.
+ "^cpts@[0-9a-f]+":
+ type: object
+ allOf:
+ - $ref: "ti,k3-am654-cpts.yaml#"
+ description:
+ CPSW Common Platform Time Sync (CPTS) module.
+
required:
- compatible
- reg
@@ -164,6 +171,8 @@ examples:
#include <dt-bindings/pinctrl/k3.h>
#include <dt-bindings/soc/ti,sci_pm_domain.h>
#include <dt-bindings/net/ti-dp83867.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
mcu_cpsw: ethernet@46000000 {
compatible = "ti,am654-cpsw-nuss";
@@ -222,4 +231,15 @@ examples:
ti,fifo-depth = <DP83867_PHYCR_FIFO_DEPTH_4_B_NIB>;
};
};
+
+ cpts@3d000 {
+ compatible = "ti,am65-cpts";
+ reg = <0x0 0x3d000 0x0 0x400>;
+ clocks = <&k3_clks 18 2>;
+ clock-names = "cpts";
+ interrupts-extended = <&gic500 GIC_SPI 858 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "cpts";
+ ti,cpts-ext-ts-inputs = <4>;
+ ti,cpts-periodic-outputs = <2>;
+ };
};
diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml
new file mode 100644
index 000000000000..50e027911dd4
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml
@@ -0,0 +1,145 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/ti,k3-am654-cpts.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: The TI AM654x/J721E Common Platform Time Sync (CPTS) module Device Tree Bindings
+
+maintainers:
+ - Grygorii Strashko <grygorii.strashko@ti.com>
+ - Sekhar Nori <nsekhar@ti.com>
+
+description: |+
+ The TI AM654x/J721E CPTS module is used to facilitate host control of time
+ sync operations.
+ Main features of CPTS module are
+ - selection of multiple external clock sources
+ - Software control of time sync events via interrupt or polling
+ - 64-bit timestamp mode in ns with PPM and nudge adjustment.
+ - hardware timestamp push inputs (HWx_TS_PUSH)
+ - timestamp counter compare output (TS_COMP)
+ - timestamp counter bit output (TS_SYNC)
+ - periodic Generator function outputs (TS_GENFx)
+ - Ethernet Enhanced Scheduled Traffic Operations (CPTS_ESTFn) (TSN)
+ - external hardware timestamp push inputs (HWx_TS_PUSH) timestamping
+
+ Depending on integration it enables compliance with the IEEE 1588-2008
+ standard for a precision clock synchronization protocol, Ethernet Enhanced
+ Scheduled Traffic Operations (CPTS_ESTFn) and PCIe Subsystem Precision Time
+ Measurement (PTM).
+
+ TI AM654x/J721E SoCs has several similar CPTS modules integrated into the
+ different parts of the system which could be synchronized with each other
+ - Main CPTS
+ - MCU CPSW CPTS with IEEE 1588-2008 support
+ - PCIe subsystem CPTS for PTM support
+
+ Depending on CPTS module integration and when CPTS is integral part of
+ another module (MCU CPSW for example) "compatible" and "reg" can
+ be omitted - parent module is fully responsible for CPTS enabling and
+ configuration.
+
+properties:
+ $nodename:
+ pattern: "^cpts@[0-9a-f]+$"
+
+ compatible:
+ oneOf:
+ - const: ti,am65-cpts
+ - const: ti,j721e-cpts
+
+ reg:
+ maxItems: 1
+ description:
+ The physical base address and size of CPTS IO range
+
+ reg-names:
+ items:
+ - const: cpts
+
+ clocks:
+ description: CPTS reference clock
+
+ clock-names:
+ items:
+ - const: cpts
+
+ interrupts:
+ items:
+ - description: CPTS events interrupt
+
+ interrupt-names:
+ items:
+ - const: cpts
+
+ ti,cpts-ext-ts-inputs:
+ allOf:
+ - $ref: /schemas/types.yaml#/definitions/uint32
+ maximum: 8
+ description:
+ Number of hardware timestamp push inputs (HWx_TS_PUSH)
+
+ ti,cpts-periodic-outputs:
+ allOf:
+ - $ref: /schemas/types.yaml#/definitions/uint32
+ maximum: 8
+ description:
+ Number of timestamp Generator function outputs (TS_GENFx)
+
+ refclk-mux:
+ type: object
+ description: CPTS reference clock multiplexer clock
+ properties:
+ '#clock-cells':
+ const: 0
+
+ clocks:
+ maxItems: 8
+
+ assigned-clocks:
+ maxItems: 1
+
+ assigned-clocks-parents:
+ maxItems: 1
+
+ required:
+ - clocks
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - clock-names
+ - interrupts
+ - interrupt-names
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ cpts@310d0000 {
+ compatible = "ti,am65-cpts";
+ reg = <0x0 0x310d0000 0x0 0x400>;
+ reg-names = "cpts";
+ clocks = <&main_cpts_mux>;
+ clock-names = "cpts";
+ interrupts-extended = <&k3_irq 163 0 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "cpts";
+ ti,cpts-periodic-outputs = <6>;
+ ti,cpts-ext-ts-inputs = <8>;
+
+ main_cpts_mux: refclk-mux {
+ #clock-cells = <0>;
+ clocks = <&k3_clks 118 5>, <&k3_clks 118 11>,
+ <&k3_clks 157 91>, <&k3_clks 157 77>,
+ <&k3_clks 157 102>, <&k3_clks 157 80>,
+ <&k3_clks 120 3>, <&k3_clks 121 3>;
+ assigned-clocks = <&main_cpts_mux>;
+ assigned-clock-parents = <&k3_clks 118 11>;
+ };
+ };
+
diff --git a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt
index 3a76d8faaaed..ab7e7a00e534 100644
--- a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt
+++ b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt
@@ -25,6 +25,9 @@ Optional properties:
- mediatek,mtd-eeprom: Specify a MTD partition + offset containing EEPROM data
- big-endian: if the radio eeprom partition is written in big-endian, specify
this property
+- mediatek,eeprom-merge-otp: Merge EEPROM data with OTP data. Can be used on
+ boards where the flash calibration data is generic and specific calibration
+ data should be pulled from the OTP ROM
The MAC address can as well be set with corresponding optional properties
defined in net/ethernet.txt.
diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
index 71bf91f97386..65ee68efd574 100644
--- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
+++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
@@ -96,6 +96,17 @@ Optional properties:
- qcom,coexist-gpio-pin : gpio pin number information to support coex
which will be used by wifi firmware.
+* Subnodes
+The ath10k wifi node can contain one optional firmware subnode.
+Firmware subnode is needed when the platform does not have TustZone.
+The firmware subnode must have:
+
+- iommus:
+ Usage: required
+ Value type: <prop-encoded-array>
+ Definition: A list of phandle and IOMMU specifier pairs.
+
+
Example (to supply PCI based wifi block details):
In this example, the node is defined as child node of the PCI controller.
@@ -196,4 +207,7 @@ wifi@18000000 {
memory-region = <&wifi_msa_mem>;
iommus = <&apps_smmu 0x0040 0x1>;
qcom,msa-fixed-perm;
+ wifi-firmware {
+ iommus = <&apps_iommu 0xc22 0x1>;
+ };
};
diff --git a/Documentation/devicetree/bindings/pci/loongson.yaml b/Documentation/devicetree/bindings/pci/loongson.yaml
new file mode 100644
index 000000000000..30e7cf1aeb87
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/loongson.yaml
@@ -0,0 +1,62 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/pci/loongson.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Loongson PCI Host Controller
+
+maintainers:
+ - Jiaxun Yang <jiaxun.yang@flygoat.com>
+
+description: |+
+ PCI host controller found on Loongson PCHs and SoCs.
+
+allOf:
+ - $ref: /schemas/pci/pci-bus.yaml#
+
+properties:
+ compatible:
+ oneOf:
+ - const: loongson,ls2k-pci
+ - const: loongson,ls7a-pci
+ - const: loongson,rs780e-pci
+
+ reg:
+ minItems: 1
+ maxItems: 2
+ items:
+ - description: CFG0 standard config space register
+ - description: CFG1 extended config space register
+
+ ranges:
+ minItems: 1
+ maxItems: 3
+
+
+required:
+ - compatible
+ - reg
+ - ranges
+
+examples:
+ - |
+
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ pcie@1a000000 {
+ compatible = "loongson,rs780e-pci";
+ device_type = "pci";
+ #address-cells = <3>;
+ #size-cells = <2>;
+
+ // CPU_PHYSICAL(2) SIZE(2)
+ reg = <0x0 0x1a000000 0x0 0x2000000>;
+
+ // BUS_ADDRESS(3) CPU_PHYSICAL(2) SIZE(2)
+ ranges = <0x01000000 0x0 0x00004000 0x0 0x00004000 0x0 0x00004000>,
+ <0x02000000 0x0 0x40000000 0x0 0x40000000 0x0 0x40000000>;
+ };
+ };
+...
diff --git a/Documentation/devicetree/bindings/phy/qcom,qusb2-phy.yaml b/Documentation/devicetree/bindings/phy/qcom,qusb2-phy.yaml
index 144ae29e7141..f8bd28ff31c1 100644
--- a/Documentation/devicetree/bindings/phy/qcom,qusb2-phy.yaml
+++ b/Documentation/devicetree/bindings/phy/qcom,qusb2-phy.yaml
@@ -97,7 +97,7 @@ then:
- $ref: /schemas/types.yaml#/definitions/uint32
- minimum: 0
maximum: 63
- default: 0
+ default: 32
qcom,charge-ctrl-value:
description:
@@ -130,7 +130,7 @@ then:
- $ref: /schemas/types.yaml#/definitions/uint32
- minimum: 0
maximum: 3
- default: 2
+ default: 0
qcom,preemphasis-width:
description:
@@ -152,7 +152,7 @@ then:
- $ref: /schemas/types.yaml#/definitions/uint32
- minimum: 0
maximum: 3
- default: 0
+ default: 1
required:
- compatible
diff --git a/Documentation/devicetree/bindings/regulator/anatop-regulator.txt b/Documentation/devicetree/bindings/regulator/anatop-regulator.txt
deleted file mode 100644
index a3106c72fbea..000000000000
--- a/Documentation/devicetree/bindings/regulator/anatop-regulator.txt
+++ /dev/null
@@ -1,40 +0,0 @@
-Anatop Voltage regulators
-
-Required properties:
-- compatible: Must be "fsl,anatop-regulator"
-- regulator-name: A string used as a descriptive name for regulator outputs
-- anatop-reg-offset: Anatop MFD register offset
-- anatop-vol-bit-shift: Bit shift for the register
-- anatop-vol-bit-width: Number of bits used in the register
-- anatop-min-bit-val: Minimum value of this register
-- anatop-min-voltage: Minimum voltage of this regulator
-- anatop-max-voltage: Maximum voltage of this regulator
-
-Optional properties:
-- anatop-delay-reg-offset: Anatop MFD step time register offset
-- anatop-delay-bit-shift: Bit shift for the step time register
-- anatop-delay-bit-width: Number of bits used in the step time register
-- vin-supply: The supply for this regulator
-- anatop-enable-bit: Regulator enable bit offset
-
-Any property defined as part of the core regulator
-binding, defined in regulator.txt, can also be used.
-
-Example:
-
- regulator-vddpu {
- compatible = "fsl,anatop-regulator";
- regulator-name = "vddpu";
- regulator-min-microvolt = <725000>;
- regulator-max-microvolt = <1300000>;
- regulator-always-on;
- anatop-reg-offset = <0x140>;
- anatop-vol-bit-shift = <9>;
- anatop-vol-bit-width = <5>;
- anatop-delay-reg-offset = <0x170>;
- anatop-delay-bit-shift = <24>;
- anatop-delay-bit-width = <2>;
- anatop-min-bit-val = <1>;
- anatop-min-voltage = <725000>;
- anatop-max-voltage = <1300000>;
- };
diff --git a/Documentation/devicetree/bindings/regulator/anatop-regulator.yaml b/Documentation/devicetree/bindings/regulator/anatop-regulator.yaml
new file mode 100644
index 000000000000..e7b3abe30363
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/anatop-regulator.yaml
@@ -0,0 +1,94 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/regulator/anatop-regulator.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Freescale Anatop Voltage Regulators
+
+maintainers:
+ - Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
+
+allOf:
+ - $ref: "regulator.yaml#"
+
+properties:
+ compatible:
+ const: fsl,anatop-regulator
+
+ regulator-name: true
+
+ anatop-reg-offset:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the anatop MFD register offset.
+
+ anatop-vol-bit-shift:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the bit shift for the register.
+
+ anatop-vol-bit-width:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the number of bits used in the register.
+
+ anatop-min-bit-val:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the minimum value of this register.
+
+ anatop-min-voltage:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the minimum voltage of this regulator.
+
+ anatop-max-voltage:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the maximum voltage of this regulator.
+
+ anatop-delay-reg-offset:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the anatop MFD step time register offset.
+
+ anatop-delay-bit-shift:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the bit shift for the step time register.
+
+ anatop-delay-bit-width:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing the number of bits used in the step time register.
+
+ anatop-enable-bit:
+ $ref: '/schemas/types.yaml#/definitions/uint32'
+ description: u32 value representing regulator enable bit offset.
+
+ vin-supply:
+ $ref: '/schemas/types.yaml#/definitions/phandle'
+ description: input supply phandle.
+
+required:
+ - compatible
+ - regulator-name
+ - anatop-reg-offset
+ - anatop-vol-bit-shift
+ - anatop-vol-bit-width
+ - anatop-min-bit-val
+ - anatop-min-voltage
+ - anatop-max-voltage
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ regulator-vddpu {
+ compatible = "fsl,anatop-regulator";
+ regulator-name = "vddpu";
+ regulator-min-microvolt = <725000>;
+ regulator-max-microvolt = <1300000>;
+ regulator-always-on;
+ anatop-reg-offset = <0x140>;
+ anatop-vol-bit-shift = <9>;
+ anatop-vol-bit-width = <5>;
+ anatop-delay-reg-offset = <0x170>;
+ anatop-delay-bit-shift = <24>;
+ anatop-delay-bit-width = <2>;
+ anatop-min-bit-val = <1>;
+ anatop-min-voltage = <725000>;
+ anatop-max-voltage = <1300000>;
+ };
diff --git a/Documentation/devicetree/bindings/regulator/maxim,max77826.yaml b/Documentation/devicetree/bindings/regulator/maxim,max77826.yaml
new file mode 100644
index 000000000000..19cbd5eb2897
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/maxim,max77826.yaml
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/regulator/maxim,max77826.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Maxim Integrated MAX77826 PMIC
+
+maintainers:
+ - Iskren Chernev <iskren.chernev@gmail.com>
+
+properties:
+ $nodename:
+ pattern: "pmic@[0-9a-f]{1,2}"
+ compatible:
+ enum:
+ - maxim,max77826
+
+ reg:
+ maxItems: 1
+
+ regulators:
+ type: object
+ allOf:
+ - $ref: regulator.yaml#
+ description: |
+ list of regulators provided by this controller, must be named
+ after their hardware counterparts LDO[1-15], BUCK and BUCKBOOST
+
+ patternProperties:
+ "^LDO([1-9]|1[0-5])$":
+ type: object
+ allOf:
+ - $ref: regulator.yaml#
+
+ "^BUCK|BUCKBOOST$":
+ type: object
+ allOf:
+ - $ref: regulator.yaml#
+
+ additionalProperties: false
+
+required:
+ - compatible
+ - reg
+ - regulators
+
+additionalProperties: false
+
+examples:
+ - |
+ i2c {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ pmic@69 {
+ compatible = "maxim,max77826";
+ reg = <0x69>;
+
+ regulators {
+ LDO2 {
+ regulator-min-microvolt = <650000>;
+ regulator-max-microvolt = <3587500>;
+ };
+ };
+ };
+ };
+...
diff --git a/Documentation/devicetree/bindings/regulator/mps,mp5416.yaml b/Documentation/devicetree/bindings/regulator/mps,mp5416.yaml
index f0acce2029fd..3b019fa6db31 100644
--- a/Documentation/devicetree/bindings/regulator/mps,mp5416.yaml
+++ b/Documentation/devicetree/bindings/regulator/mps,mp5416.yaml
@@ -37,7 +37,6 @@ properties:
type: object
additionalProperties: false
- additionalProperties: false
required:
- compatible
diff --git a/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml b/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml
index a682af0dc67e..ae6e7ab36c58 100644
--- a/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml
+++ b/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml
@@ -75,7 +75,8 @@ properties:
description: |
disables over voltage protection of this buck
- additionalProperties: false
+ unevaluatedProperties: false
+
additionalProperties: false
required:
diff --git a/Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml b/Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml
index 71ce032b8cf8..1e52dafcb5c9 100644
--- a/Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml
+++ b/Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml
@@ -35,6 +35,8 @@ patternProperties:
description:
should be "ldo1", ..., "ldo7"
+ unevaluatedProperties: false
+
"^BUCK[1-7]$":
type: object
allOf:
@@ -103,5 +105,7 @@ patternProperties:
required:
- regulator-name
- additionalProperties: false
+
+ unevaluatedProperties: false
+
additionalProperties: false
diff --git a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.yaml b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.yaml
index a323b1696eee..543d4b52397e 100644
--- a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.yaml
+++ b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.yaml
@@ -41,6 +41,8 @@ patternProperties:
description:
should be "ldo1", ..., "ldo7"
+ unevaluatedProperties: false
+
"^BUCK[1-8]$":
type: object
allOf:
@@ -99,5 +101,7 @@ patternProperties:
required:
- regulator-name
- additionalProperties: false
+
+ unevaluatedProperties: false
+
additionalProperties: false
diff --git a/Documentation/devicetree/bindings/regulator/rohm,bd71847-regulator.yaml b/Documentation/devicetree/bindings/regulator/rohm,bd71847-regulator.yaml
index 526fd00bcb16..d797cc23406f 100644
--- a/Documentation/devicetree/bindings/regulator/rohm,bd71847-regulator.yaml
+++ b/Documentation/devicetree/bindings/regulator/rohm,bd71847-regulator.yaml
@@ -40,6 +40,8 @@ patternProperties:
description:
should be "ldo1", ..., "ldo6"
+ unevaluatedProperties: false
+
"^BUCK[1-6]$":
type: object
allOf:
@@ -93,5 +95,7 @@ patternProperties:
required:
- regulator-name
- additionalProperties: false
+
+ unevaluatedProperties: false
+
additionalProperties: false
diff --git a/Documentation/devicetree/bindings/reserved-memory/ramoops.txt b/Documentation/devicetree/bindings/reserved-memory/ramoops.txt
index 0eba562fe5c6..b7886fea368c 100644
--- a/Documentation/devicetree/bindings/reserved-memory/ramoops.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/ramoops.txt
@@ -30,7 +30,7 @@ Optional properties:
- ecc-size: enables ECC support and specifies ECC buffer size in bytes
(defaults to 0: no ECC)
-- record-size: maximum size in bytes of each dump done on oops/panic
+- record-size: maximum size in bytes of each kmsg dump.
(defaults to 0: disabled)
- console-size: size in bytes of log buffer reserved for kernel messages
@@ -45,7 +45,16 @@ Optional properties:
- unbuffered: if present, use unbuffered mappings to map the reserved region
(defaults to buffered mappings)
-- no-dump-oops: if present, only dump panics (defaults to panics and oops)
+- max-reason: if present, sets maximum type of kmsg dump reasons to store
+ (defaults to 2: log Oopses and Panics). This can be set to INT_MAX to
+ store all kmsg dumps. See include/linux/kmsg_dump.h KMSG_DUMP_* for other
+ kmsg dump reason values. Setting this to 0 (KMSG_DUMP_UNDEF), means the
+ reason filtering will be controlled by the printk.always_kmsg_dump boot
+ param: if unset, it will be KMSG_DUMP_OOPS, otherwise KMSG_DUMP_MAX.
+
+- no-dump-oops: deprecated, use max_reason instead. If present, and
+ max_reason is not specified, it is equivalent to max_reason = 1
+ (KMSG_DUMP_PANIC).
- flags: if present, pass ramoops behavioral flags (defaults to 0,
see include/linux/pstore_ram.h RAMOOPS_FLAG_* for flag values).
diff --git a/Documentation/devicetree/bindings/rng/arm-cctrng.yaml b/Documentation/devicetree/bindings/rng/arm-cctrng.yaml
new file mode 100644
index 000000000000..ca6aad19b6ba
--- /dev/null
+++ b/Documentation/devicetree/bindings/rng/arm-cctrng.yaml
@@ -0,0 +1,54 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/rng/arm-cctrng.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm TrustZone CryptoCell TRNG engine
+
+maintainers:
+ - Hadar Gat <hadar.gat@arm.com>
+
+description: |+
+ Arm TrustZone CryptoCell TRNG (True Random Number Generator) engine.
+
+properties:
+ compatible:
+ enum:
+ - arm,cryptocell-713-trng
+ - arm,cryptocell-703-trng
+
+ interrupts:
+ maxItems: 1
+
+ reg:
+ maxItems: 1
+
+ arm,rosc-ratio:
+ description:
+ Arm TrustZone CryptoCell TRNG engine has 4 ring oscillators.
+ Sampling ratio values for these 4 ring oscillators. (from calibration)
+ allOf:
+ - $ref: /schemas/types.yaml#/definitions/uint32-array
+ - items:
+ maxItems: 4
+
+ clocks:
+ maxItems: 1
+
+required:
+ - compatible
+ - interrupts
+ - reg
+ - arm,rosc-ratio
+
+additionalProperties: false
+
+examples:
+ - |
+ arm_cctrng: rng@60000000 {
+ compatible = "arm,cryptocell-713-trng";
+ interrupts = <0 29 4>;
+ reg = <0x60000000 0x10000>;
+ arm,rosc-ratio = <5000 1000 500 0>;
+ };
diff --git a/Documentation/devicetree/bindings/rtc/dw-apb.txt b/Documentation/devicetree/bindings/rtc/dw-apb.txt
deleted file mode 100644
index c703d51abb6c..000000000000
--- a/Documentation/devicetree/bindings/rtc/dw-apb.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-* Designware APB timer
-
-Required properties:
-- compatible: One of:
- "snps,dw-apb-timer"
- "snps,dw-apb-timer-sp" <DEPRECATED>
- "snps,dw-apb-timer-osc" <DEPRECATED>
-- reg: physical base address of the controller and length of memory mapped
- region.
-- interrupts: IRQ line for the timer.
-- either clocks+clock-names or clock-frequency properties
-
-Optional properties:
-- clocks : list of clock specifiers, corresponding to entries in
- the clock-names property;
-- clock-names : should contain "timer" and "pclk" entries, matching entries
- in the clocks property.
-- clock-frequency: The frequency in HZ of the timer.
-- clock-freq: For backwards compatibility with picoxcell
-
-If using the clock specifiers, the pclk clock is optional, as not all
-systems may use one.
-
-
-Example:
- timer@ffe00000 {
- compatible = "snps,dw-apb-timer";
- interrupts = <0 170 4>;
- reg = <0xffe00000 0x1000>;
- clocks = <&timer_clk>, <&timer_pclk>;
- clock-names = "timer", "pclk";
- };
diff --git a/Documentation/devicetree/bindings/sound/rockchip-i2s.yaml b/Documentation/devicetree/bindings/sound/rockchip-i2s.yaml
index 7cd0e278ed85..a3ba2186d6a1 100644
--- a/Documentation/devicetree/bindings/sound/rockchip-i2s.yaml
+++ b/Documentation/devicetree/bindings/sound/rockchip-i2s.yaml
@@ -56,6 +56,9 @@ properties:
- const: tx
- const: rx
+ power-domains:
+ maxItems: 1
+
rockchip,capture-channels:
allOf:
- $ref: /schemas/types.yaml#/definitions/uint32
diff --git a/Documentation/devicetree/bindings/sound/rockchip-spdif.txt b/Documentation/devicetree/bindings/sound/rockchip-spdif.txt
deleted file mode 100644
index ec20c1271e92..000000000000
--- a/Documentation/devicetree/bindings/sound/rockchip-spdif.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-* Rockchip SPDIF transceiver
-
-The S/PDIF audio block is a stereo transceiver that allows the
-processor to receive and transmit digital audio via an coaxial cable or
-a fibre cable.
-
-Required properties:
-
-- compatible: should be one of the following:
- - "rockchip,rk3066-spdif"
- - "rockchip,rk3188-spdif"
- - "rockchip,rk3228-spdif"
- - "rockchip,rk3288-spdif"
- - "rockchip,rk3328-spdif"
- - "rockchip,rk3366-spdif"
- - "rockchip,rk3368-spdif"
- - "rockchip,rk3399-spdif"
-- reg: physical base address of the controller and length of memory mapped
- region.
-- interrupts: should contain the SPDIF interrupt.
-- dmas: DMA specifiers for tx dma. See the DMA client binding,
- Documentation/devicetree/bindings/dma/dma.txt
-- dma-names: should be "tx"
-- clocks: a list of phandle + clock-specifier pairs, one for each entry
- in clock-names.
-- clock-names: should contain following:
- - "hclk": clock for SPDIF controller
- - "mclk" : clock for SPDIF bus
-
-Required properties on RK3288:
- - rockchip,grf: the phandle of the syscon node for the general register
- file (GRF)
-
-Example for the rk3188 SPDIF controller:
-
-spdif: spdif@1011e000 {
- compatible = "rockchip,rk3188-spdif", "rockchip,rk3066-spdif";
- reg = <0x1011e000 0x2000>;
- interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>;
- dmas = <&dmac1_s 8>;
- dma-names = "tx";
- clock-names = "hclk", "mclk";
- clocks = <&cru HCLK_SPDIF>, <&cru SCLK_SPDIF>;
- #sound-dai-cells = <0>;
-};
diff --git a/Documentation/devicetree/bindings/sound/rockchip-spdif.yaml b/Documentation/devicetree/bindings/sound/rockchip-spdif.yaml
new file mode 100644
index 000000000000..c467152656f7
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/rockchip-spdif.yaml
@@ -0,0 +1,101 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/sound/rockchip-spdif.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Rockchip SPDIF transceiver
+
+description:
+ The S/PDIF audio block is a stereo transceiver that allows the
+ processor to receive and transmit digital audio via a coaxial or
+ fibre cable.
+
+maintainers:
+ - Heiko Stuebner <heiko@sntech.de>
+
+properties:
+ compatible:
+ oneOf:
+ - const: rockchip,rk3066-spdif
+ - const: rockchip,rk3228-spdif
+ - const: rockchip,rk3328-spdif
+ - const: rockchip,rk3366-spdif
+ - const: rockchip,rk3368-spdif
+ - const: rockchip,rk3399-spdif
+ - items:
+ - enum:
+ - rockchip,rk3188-spdif
+ - rockchip,rk3288-spdif
+ - const: rockchip,rk3066-spdif
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ items:
+ - description: clock for SPDIF bus
+ - description: clock for SPDIF controller
+
+ clock-names:
+ items:
+ - const: mclk
+ - const: hclk
+
+ dmas:
+ maxItems: 1
+
+ dma-names:
+ const: tx
+
+ power-domains:
+ maxItems: 1
+
+ rockchip,grf:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ The phandle of the syscon node for the GRF register.
+ Required property on RK3288.
+
+ "#sound-dai-cells":
+ const: 0
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+ - dmas
+ - dma-names
+ - "#sound-dai-cells"
+
+if:
+ properties:
+ compatible:
+ contains:
+ const: rockchip,rk3288-spdif
+
+then:
+ required:
+ - rockchip,grf
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/rk3188-cru.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ spdif: spdif@1011e000 {
+ compatible = "rockchip,rk3188-spdif", "rockchip,rk3066-spdif";
+ reg = <0x1011e000 0x2000>;
+ interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&cru SCLK_SPDIF>, <&cru HCLK_SPDIF>;
+ clock-names = "mclk", "hclk";
+ dmas = <&dmac1_s 8>;
+ dma-names = "tx";
+ #sound-dai-cells = <0>;
+ };
diff --git a/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt b/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt
index ad7ac80a3841..f5e518d099f2 100644
--- a/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt
+++ b/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt
@@ -26,6 +26,16 @@ Required properties:
"brcm,spi-bcm-qspi", "brcm,spi-brcmstb-qspi" : MSPI+BSPI on BRCMSTB SoCs
"brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI
BRCMSTB SoCs
+ "brcm,spi-bcm7425-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI
+ BRCMSTB SoCs
+ "brcm,spi-bcm7429-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI
+ BRCMSTB SoCs
+ "brcm,spi-bcm7435-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI
+ BRCMSTB SoCs
+ "brcm,spi-bcm7216-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI
+ BRCMSTB SoCs
+ "brcm,spi-bcm7278-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI
+ BRCMSTB SoCs
"brcm,spi-bcm-qspi", "brcm,spi-nsp-qspi" : MSPI+BSPI on Cygnus, NSP
"brcm,spi-bcm-qspi", "brcm,spi-ns2-qspi" : NS2 SoCs
diff --git a/Documentation/devicetree/bindings/spi/mikrotik,rb4xx-spi.yaml b/Documentation/devicetree/bindings/spi/mikrotik,rb4xx-spi.yaml
new file mode 100644
index 000000000000..4ddb42a4ae05
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/mikrotik,rb4xx-spi.yaml
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/spi/mikrotik,rb4xx-spi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MikroTik RB4xx series SPI master
+
+maintainers:
+ - Gabor Juhos <juhosg@openwrt.org>
+ - Bert Vermeulen <bert@biot.com>
+
+allOf:
+ - $ref: "spi-controller.yaml#"
+
+properties:
+ compatible:
+ const: mikrotik,rb4xx-spi
+
+ reg:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+
+examples:
+ - |
+ spi: spi@1f000000 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "mikrotik,rb4xx-spi";
+ reg = <0x1f000000 0x10>;
+ };
+
+... \ No newline at end of file
diff --git a/Documentation/devicetree/bindings/spi/renesas,rspi.yaml b/Documentation/devicetree/bindings/spi/renesas,rspi.yaml
new file mode 100644
index 000000000000..c54ac059043f
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/renesas,rspi.yaml
@@ -0,0 +1,144 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/spi/renesas,rspi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Renesas (Quad) Serial Peripheral Interface (RSPI/QSPI)
+
+maintainers:
+ - Geert Uytterhoeven <geert+renesas@glider.be>
+
+properties:
+ compatible:
+ oneOf:
+ - items:
+ - enum:
+ - renesas,rspi-sh7757 # SH7757
+ - const: renesas,rspi # Legacy SH
+
+ - items:
+ - enum:
+ - renesas,rspi-r7s72100 # RZ/A1H
+ - renesas,rspi-r7s9210 # RZ/A2
+ - const: renesas,rspi-rz # RZ/A
+
+ - items:
+ - enum:
+ - renesas,qspi-r8a7743 # RZ/G1M
+ - renesas,qspi-r8a7744 # RZ/G1N
+ - renesas,qspi-r8a7745 # RZ/G1E
+ - renesas,qspi-r8a77470 # RZ/G1C
+ - renesas,qspi-r8a7790 # R-Car H2
+ - renesas,qspi-r8a7791 # R-Car M2-W
+ - renesas,qspi-r8a7792 # R-Car V2H
+ - renesas,qspi-r8a7793 # R-Car M2-N
+ - renesas,qspi-r8a7794 # R-Car E2
+ - const: renesas,qspi # R-Car Gen2 and RZ/G1
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ oneOf:
+ - items:
+ - description: A combined interrupt
+ - items:
+ - description: Error interrupt (SPEI)
+ - description: Receive Interrupt (SPRI)
+ - description: Transmit Interrupt (SPTI)
+
+ interrupt-names:
+ oneOf:
+ - items:
+ - const: mux
+ - items:
+ - const: error
+ - const: rx
+ - const: tx
+
+ clocks:
+ maxItems: 1
+
+ power-domains:
+ maxItems: 1
+
+ resets:
+ maxItems: 1
+
+ dmas:
+ description:
+ Must contain a list of pairs of references to DMA specifiers, one for
+ transmission, and one for reception.
+
+ dma-names:
+ minItems: 2
+ maxItems: 4
+ items:
+ enum:
+ - tx
+ - rx
+
+ num-cs:
+ description: |
+ Total number of native chip selects.
+ Hardware limitations related to chip selects:
+ - When using GPIO chip selects, at least one native chip select must
+ be left unused, as it will be driven anyway.
+ minimum: 1
+ maximum: 2
+ default: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - power-domains
+ - '#address-cells'
+ - '#size-cells'
+
+allOf:
+ - $ref: spi-controller.yaml#
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - renesas,rspi-rz
+ then:
+ properties:
+ interrupts:
+ minItems: 3
+ required:
+ - interrupt-names
+
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - renesas,qspi
+ then:
+ required:
+ - resets
+
+examples:
+ - |
+ #include <dt-bindings/clock/r8a7791-cpg-mssr.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/power/r8a7791-sysc.h>
+
+ qspi: spi@e6b10000 {
+ compatible = "renesas,qspi-r8a7791", "renesas,qspi";
+ reg = <0xe6b10000 0x2c>;
+ interrupts = <GIC_SPI 184 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&cpg CPG_MOD 917>;
+ dmas = <&dmac0 0x17>, <&dmac0 0x18>, <&dmac1 0x17>, <&dmac1 0x18>;
+ dma-names = "tx", "rx", "tx", "rx";
+ power-domains = <&sysc R8A7791_PD_ALWAYS_ON>;
+ resets = <&cpg 917>;
+ num-cs = <1>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
diff --git a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt
deleted file mode 100644
index 3ed08ee9feba..000000000000
--- a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt
+++ /dev/null
@@ -1,41 +0,0 @@
-Synopsys DesignWare AMBA 2.0 Synchronous Serial Interface.
-
-Required properties:
-- compatible : "snps,dw-apb-ssi" or "mscc,<soc>-spi", where soc is "ocelot" or
- "jaguar2", or "amazon,alpine-dw-apb-ssi"
-- reg : The register base for the controller. For "mscc,<soc>-spi", a second
- register set is required (named ICPU_CFG:SPI_MST)
-- interrupts : One interrupt, used by the controller.
-- #address-cells : <1>, as required by generic SPI binding.
-- #size-cells : <0>, also as required by generic SPI binding.
-- clocks : phandles for the clocks, see the description of clock-names below.
- The phandle for the "ssi_clk" is required. The phandle for the "pclk" clock
- is optional. If a single clock is specified but no clock-name, it is the
- "ssi_clk" clock. If both clocks are listed, the "ssi_clk" must be first.
-
-Optional properties:
-- clock-names : Contains the names of the clocks:
- "ssi_clk", for the core clock used to generate the external SPI clock.
- "pclk", the interface clock, required for register access. If a clock domain
- used to enable this clock then it should be named "pclk_clkdomain".
-- cs-gpios : Specifies the gpio pins to be used for chipselects.
-- num-cs : The number of chipselects. If omitted, this will default to 4.
-- reg-io-width : The I/O register width (in bytes) implemented by this
- device. Supported values are 2 or 4 (the default).
-
-Child nodes as per the generic SPI binding.
-
-Example:
-
- spi@fff00000 {
- compatible = "snps,dw-apb-ssi";
- reg = <0xfff00000 0x1000>;
- interrupts = <0 154 4>;
- #address-cells = <1>;
- #size-cells = <0>;
- clocks = <&spi_m_clk>;
- num-cs = <2>;
- cs-gpios = <&gpio0 13 0>,
- <&gpio0 14 0>;
- };
-
diff --git a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml
new file mode 100644
index 000000000000..c62cbe79f00d
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml
@@ -0,0 +1,133 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/spi/snps,dw-apb-ssi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Synopsys DesignWare AMBA 2.0 Synchronous Serial Interface
+
+maintainers:
+ - Mark Brown <broonie@kernel.org>
+
+allOf:
+ - $ref: "spi-controller.yaml#"
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - mscc,ocelot-spi
+ - mscc,jaguar2-spi
+ then:
+ properties:
+ reg:
+ minItems: 2
+
+properties:
+ compatible:
+ oneOf:
+ - description: Generic DW SPI Controller
+ enum:
+ - snps,dw-apb-ssi
+ - snps,dwc-ssi-1.01a
+ - description: Microsemi Ocelot/Jaguar2 SoC SPI Controller
+ items:
+ - enum:
+ - mscc,ocelot-spi
+ - mscc,jaguar2-spi
+ - const: snps,dw-apb-ssi
+ - description: Amazon Alpine SPI Controller
+ const: amazon,alpine-dw-apb-ssi
+ - description: Renesas RZ/N1 SPI Controller
+ items:
+ - const: renesas,rzn1-spi
+ - const: snps,dw-apb-ssi
+ - description: Intel Keem Bay SPI Controller
+ const: intel,keembay-ssi
+
+ reg:
+ minItems: 1
+ items:
+ - description: DW APB SSI controller memory mapped registers
+ - description: SPI MST region map
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ minItems: 1
+ items:
+ - description: SPI Controller reference clock source
+ - description: APB interface clock source
+
+ clock-names:
+ minItems: 1
+ items:
+ - const: ssi_clk
+ - const: pclk
+
+ resets:
+ maxItems: 1
+
+ reset-names:
+ const: spi
+
+ reg-io-width:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: I/O register width (in bytes) implemented by this device
+ default: 4
+ enum: [ 2, 4 ]
+
+ num-cs:
+ default: 4
+ minimum: 1
+ maximum: 4
+
+ dmas:
+ items:
+ - description: TX DMA Channel
+ - description: RX DMA Channel
+
+ dma-names:
+ items:
+ - const: tx
+ - const: rx
+
+patternProperties:
+ "^.*@[0-9a-f]+$":
+ type: object
+ properties:
+ reg:
+ minimum: 0
+ maximum: 3
+
+ spi-rx-bus-width:
+ const: 1
+
+ spi-tx-bus-width:
+ const: 1
+
+unevaluatedProperties: false
+
+required:
+ - compatible
+ - reg
+ - "#address-cells"
+ - "#size-cells"
+ - interrupts
+ - clocks
+
+examples:
+ - |
+ spi@fff00000 {
+ compatible = "snps,dw-apb-ssi";
+ reg = <0xfff00000 0x1000>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ interrupts = <0 154 4>;
+ clocks = <&spi_m_clk>;
+ num-cs = <2>;
+ cs-gpios = <&gpio0 13 0>,
+ <&gpio0 14 0>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/spi/socionext,uniphier-spi.yaml b/Documentation/devicetree/bindings/spi/socionext,uniphier-spi.yaml
new file mode 100644
index 000000000000..c25409298bdf
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/socionext,uniphier-spi.yaml
@@ -0,0 +1,57 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/spi/socionext,uniphier-spi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Socionext UniPhier SPI controller
+
+description: |
+ UniPhier SoCs have SCSSI which supports SPI single channel.
+
+maintainers:
+ - Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
+ - Keiji Hayashibara <hayashibara.keiji@socionext.com>
+
+allOf:
+ - $ref: spi-controller.yaml#
+
+properties:
+ "#address-cells": true
+ "#size-cells": true
+
+ compatible:
+ const: socionext,uniphier-scssi
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ resets:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - resets
+ - "#address-cells"
+ - "#size-cells"
+
+examples:
+ - |
+ spi0: spi@54006000 {
+ compatible = "socionext,uniphier-scssi";
+ reg = <0x54006000 0x100>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ interrupts = <0 39 4>;
+ clocks = <&peri_clk 11>;
+ resets = <&peri_rst 11>;
+ };
diff --git a/Documentation/devicetree/bindings/spi/spi-dw.txt b/Documentation/devicetree/bindings/spi/spi-dw.txt
deleted file mode 100644
index 7b63ed601990..000000000000
--- a/Documentation/devicetree/bindings/spi/spi-dw.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-Synopsys DesignWare SPI master
-
-Required properties:
-- compatible: should be "snps,designware-spi"
-- #address-cells: see spi-bus.txt
-- #size-cells: see spi-bus.txt
-- reg: address and length of the spi master registers
-- interrupts: should contain one interrupt
-- clocks: spi clock phandle
-- num-cs: see spi-bus.txt
-
-Optional properties:
-- cs-gpios: see spi-bus.txt
-
-Example:
-
-spi: spi@4020a000 {
- compatible = "snps,designware-spi";
- interrupts = <11 1>;
- reg = <0x4020a000 0x1000>;
- clocks = <&pclk>;
- num-cs = <2>;
- cs-gpios = <&banka 0 0>;
-};
diff --git a/Documentation/devicetree/bindings/spi/spi-rspi.txt b/Documentation/devicetree/bindings/spi/spi-rspi.txt
deleted file mode 100644
index 421722b93992..000000000000
--- a/Documentation/devicetree/bindings/spi/spi-rspi.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-Device tree configuration for Renesas RSPI/QSPI driver
-
-Required properties:
-- compatible : For Renesas Serial Peripheral Interface on legacy SH:
- "renesas,rspi-<soctype>", "renesas,rspi" as fallback.
- For Renesas Serial Peripheral Interface on RZ/A:
- "renesas,rspi-<soctype>", "renesas,rspi-rz" as fallback.
- For Quad Serial Peripheral Interface on R-Car Gen2 and
- RZ/G1 devices:
- "renesas,qspi-<soctype>", "renesas,qspi" as fallback.
- Examples with soctypes are:
- - "renesas,rspi-sh7757" (SH)
- - "renesas,rspi-r7s72100" (RZ/A1H)
- - "renesas,rspi-r7s9210" (RZ/A2)
- - "renesas,qspi-r8a7743" (RZ/G1M)
- - "renesas,qspi-r8a7744" (RZ/G1N)
- - "renesas,qspi-r8a7745" (RZ/G1E)
- - "renesas,qspi-r8a77470" (RZ/G1C)
- - "renesas,qspi-r8a7790" (R-Car H2)
- - "renesas,qspi-r8a7791" (R-Car M2-W)
- - "renesas,qspi-r8a7792" (R-Car V2H)
- - "renesas,qspi-r8a7793" (R-Car M2-N)
- - "renesas,qspi-r8a7794" (R-Car E2)
-- reg : Address start and address range size of the device
-- interrupts : A list of interrupt-specifiers, one for each entry in
- interrupt-names.
- If interrupt-names is not present, an interrupt specifier
- for a single muxed interrupt.
-- interrupt-names : A list of interrupt names. Should contain (if present):
- - "error" for SPEI,
- - "rx" for SPRI,
- - "tx" to SPTI,
- - "mux" for a single muxed interrupt.
-- num-cs : Number of chip selects. Some RSPI cores have more than 1.
-- #address-cells : Must be <1>
-- #size-cells : Must be <0>
-
-Optional properties:
-- clocks : Must contain a reference to the functional clock.
-- dmas : Must contain a list of two references to DMA specifiers,
- one for transmission, and one for reception.
-- dma-names : Must contain a list of two DMA names, "tx" and "rx".
-
-Pinctrl properties might be needed, too. See
-Documentation/devicetree/bindings/pinctrl/renesas,*.
-
-Examples:
-
- spi0: spi@e800c800 {
- compatible = "renesas,rspi-r7s72100", "renesas,rspi-rz";
- reg = <0xe800c800 0x24>;
- interrupts = <0 238 IRQ_TYPE_LEVEL_HIGH>,
- <0 239 IRQ_TYPE_LEVEL_HIGH>,
- <0 240 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "error", "rx", "tx";
- interrupt-parent = <&gic>;
- num-cs = <1>;
- #address-cells = <1>;
- #size-cells = <0>;
- };
-
- spi: spi@e6b10000 {
- compatible = "renesas,qspi-r8a7791", "renesas,qspi";
- reg = <0 0xe6b10000 0 0x2c>;
- interrupt-parent = <&gic>;
- interrupts = <0 184 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&mstp9_clks R8A7791_CLK_QSPI_MOD>;
- num-cs = <1>;
- #address-cells = <1>;
- #size-cells = <0>;
- dmas = <&dmac0 0x17>, <&dmac0 0x18>;
- dma-names = "tx", "rx";
- };
diff --git a/Documentation/devicetree/bindings/spi/spi-uniphier.txt b/Documentation/devicetree/bindings/spi/spi-uniphier.txt
deleted file mode 100644
index e1201573a29a..000000000000
--- a/Documentation/devicetree/bindings/spi/spi-uniphier.txt
+++ /dev/null
@@ -1,28 +0,0 @@
-Socionext UniPhier SPI controller driver
-
-UniPhier SoCs have SCSSI which supports SPI single channel.
-
-Required properties:
- - compatible: should be "socionext,uniphier-scssi"
- - reg: address and length of the spi master registers
- - #address-cells: must be <1>, see spi-bus.txt
- - #size-cells: must be <0>, see spi-bus.txt
- - interrupts: a single interrupt specifier
- - pinctrl-names: should be "default"
- - pinctrl-0: pin control state for the default mode
- - clocks: a phandle to the clock for the device
- - resets: a phandle to the reset control for the device
-
-Example:
-
-spi0: spi@54006000 {
- compatible = "socionext,uniphier-scssi";
- reg = <0x54006000 0x100>;
- #address-cells = <1>;
- #size-cells = <0>;
- interrupts = <0 39 4>;
- pinctrl-names = "default";
- pinctrl-0 = <&pinctrl_spi0>;
- clocks = <&peri_clk 11>;
- resets = <&peri_rst 11>;
-};
diff --git a/Documentation/devicetree/bindings/spi/ti_qspi.txt b/Documentation/devicetree/bindings/spi/ti_qspi.txt
index e65fde4a7388..47b184bce414 100644
--- a/Documentation/devicetree/bindings/spi/ti_qspi.txt
+++ b/Documentation/devicetree/bindings/spi/ti_qspi.txt
@@ -29,7 +29,7 @@ modification to bootloader.
Example:
For am4372:
-qspi: qspi@4b300000 {
+qspi: qspi@47900000 {
compatible = "ti,am4372-qspi";
reg = <0x47900000 0x100>, <0x30000000 0x4000000>;
reg-names = "qspi_base", "qspi_mmap";
diff --git a/Documentation/devicetree/bindings/timer/renesas,em-sti.yaml b/Documentation/devicetree/bindings/timer/renesas,em-sti.yaml
new file mode 100644
index 000000000000..233d74d5402c
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/renesas,em-sti.yaml
@@ -0,0 +1,46 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/renesas,em-sti.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Renesas EMMA Mobile System Timer
+
+maintainers:
+ - Magnus Damm <magnus.damm@gmail.com>
+
+properties:
+ compatible:
+ const: renesas,em-sti
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ clock-names:
+ const: sclk
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ timer@e0180000 {
+ compatible = "renesas,em-sti";
+ reg = <0xe0180000 0x54>;
+ interrupts = <GIC_SPI 125 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&sti_sclk>;
+ clock-names = "sclk";
+ };
diff --git a/Documentation/devicetree/bindings/timer/snps,dw-apb-timer.yaml b/Documentation/devicetree/bindings/timer/snps,dw-apb-timer.yaml
new file mode 100644
index 000000000000..5d300efdf0ca
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/snps,dw-apb-timer.yaml
@@ -0,0 +1,88 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/snps,dw-apb-timer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Synopsys DesignWare APB Timer
+
+maintainers:
+ - Daniel Lezcano <daniel.lezcano@linaro.org>
+
+properties:
+ compatible:
+ oneOf:
+ - const: snps,dw-apb-timer
+ - enum:
+ - snps,dw-apb-timer-sp
+ - snps,dw-apb-timer-osc
+ deprecated: true
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ minItems: 1
+ items:
+ - description: Timer ticks reference clock source
+ - description: APB interface clock source
+
+ clock-names:
+ minItems: 1
+ items:
+ - const: timer
+ - const: pclk
+
+ clock-frequency: true
+
+ clock-freq:
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ description: |
+ Has the same meaning as the 'clock-frequency' property - timer clock
+ frequency in HZ, but is defined only for the backwards compatibility
+ with the picoxcell platform.
+
+unevaluatedProperties: false
+
+required:
+ - compatible
+ - reg
+ - interrupts
+
+oneOf:
+ - required:
+ - clocks
+ - clock-names
+ - required:
+ - clock-frequency
+ - required:
+ - clock-freq
+
+examples:
+ - |
+ timer@ffe00000 {
+ compatible = "snps,dw-apb-timer";
+ interrupts = <0 170 4>;
+ reg = <0xffe00000 0x1000>;
+ clocks = <&timer_clk>, <&timer_pclk>;
+ clock-names = "timer", "pclk";
+ };
+ - |
+ timer@ffe00000 {
+ compatible = "snps,dw-apb-timer";
+ interrupts = <0 170 4>;
+ reg = <0xffe00000 0x1000>;
+ clocks = <&timer_clk>;
+ clock-names = "timer";
+ };
+ - |
+ timer@ffe00000 {
+ compatible = "snps,dw-apb-timer";
+ interrupts = <0 170 4>;
+ reg = <0xffe00000 0x1000>;
+ clock-frequency = <25000000>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/usb/renesas,usb3-peri.yaml b/Documentation/devicetree/bindings/usb/renesas,usb3-peri.yaml
index 92d8631b9aa6..031452aa25bc 100644
--- a/Documentation/devicetree/bindings/usb/renesas,usb3-peri.yaml
+++ b/Documentation/devicetree/bindings/usb/renesas,usb3-peri.yaml
@@ -18,6 +18,7 @@ properties:
- renesas,r8a774c0-usb3-peri # RZ/G2E
- renesas,r8a7795-usb3-peri # R-Car H3
- renesas,r8a7796-usb3-peri # R-Car M3-W
+ - renesas,r8a77961-usb3-peri # R-Car M3-W+
- renesas,r8a77965-usb3-peri # R-Car M3-N
- renesas,r8a77990-usb3-peri # R-Car E3
- const: renesas,rcar-gen3-usb3-peri
diff --git a/Documentation/devicetree/bindings/usb/renesas,usbhs.yaml b/Documentation/devicetree/bindings/usb/renesas,usbhs.yaml
index 469affa872d3..a7ae95598ccb 100644
--- a/Documentation/devicetree/bindings/usb/renesas,usbhs.yaml
+++ b/Documentation/devicetree/bindings/usb/renesas,usbhs.yaml
@@ -40,6 +40,7 @@ properties:
- renesas,usbhs-r8a774c0 # RZ/G2E
- renesas,usbhs-r8a7795 # R-Car H3
- renesas,usbhs-r8a7796 # R-Car M3-W
+ - renesas,usbhs-r8a77961 # R-Car M3-W+
- renesas,usbhs-r8a77965 # R-Car M3-N
- renesas,usbhs-r8a77990 # R-Car E3
- renesas,usbhs-r8a77995 # R-Car D3
diff --git a/Documentation/devicetree/bindings/usb/usb-xhci.txt b/Documentation/devicetree/bindings/usb/usb-xhci.txt
index 3f378951d624..dc025f126d71 100644
--- a/Documentation/devicetree/bindings/usb/usb-xhci.txt
+++ b/Documentation/devicetree/bindings/usb/usb-xhci.txt
@@ -16,7 +16,8 @@ Required properties:
- "renesas,xhci-r8a7791" for r8a7791 SoC
- "renesas,xhci-r8a7793" for r8a7793 SoC
- "renesas,xhci-r8a7795" for r8a7795 SoC
- - "renesas,xhci-r8a7796" for r8a7796 SoC
+ - "renesas,xhci-r8a7796" for r8a77960 SoC
+ - "renesas,xhci-r8a77961" for r8a77961 SoC
- "renesas,xhci-r8a77965" for r8a77965 SoC
- "renesas,xhci-r8a77990" for r8a77990 SoC
- "renesas,rcar-gen2-xhci" for a generic R-Car Gen2 or RZ/G1 compatible
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index d3891386d671..997934c58f9a 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -187,6 +187,8 @@ patternProperties:
description: ChipOne
"^chipspark,.*":
description: ChipSPARK
+ "^chrontel,.*":
+ description: Chrontel, Inc.
"^chrp,.*":
description: Common Hardware Reference Platform
"^chunghwa,.*":
@@ -463,6 +465,8 @@ patternProperties:
description: Infineon Technologies
"^inforce,.*":
description: Inforce Computing
+ "^ivo,.*":
+ description: InfoVision Optoelectronics Kunshan Co. Ltd.
"^ingenic,.*":
description: Ingenic Semiconductor
"^innolux,.*":
@@ -488,7 +492,7 @@ patternProperties:
"^issi,.*":
description: Integrated Silicon Solutions Inc.
"^ite,.*":
- description: ITE Tech, Inc.
+ description: ITE Tech. Inc.
"^itead,.*":
description: ITEAD Intelligent Systems Co.Ltd
"^iwave,.*":
@@ -633,6 +637,8 @@ patternProperties:
description: Microsoft Corporation
"^mikroe,.*":
description: MikroElektronika d.o.o.
+ "^mikrotik,.*":
+ description: MikroTik
"^miniand,.*":
description: Miniand Tech
"^minix,.*":
@@ -1039,6 +1045,8 @@ patternProperties:
description: Tronsmart
"^truly,.*":
description: Truly Semiconductors Limited
+ "^visionox,.*":
+ description: Visionox
"^tsd,.*":
description: Theobroma Systems Design und Consulting GmbH
"^tyan,.*":
diff --git a/Documentation/doc-guide/maintainer-profile.rst b/Documentation/doc-guide/maintainer-profile.rst
index 5afc0ddba40a..755d39f0d407 100644
--- a/Documentation/doc-guide/maintainer-profile.rst
+++ b/Documentation/doc-guide/maintainer-profile.rst
@@ -6,7 +6,7 @@ Documentation subsystem maintainer entry profile
The documentation "subsystem" is the central coordinating point for the
kernel's documentation and associated infrastructure. It covers the
hierarchy under Documentation/ (with the exception of
-Documentation/device-tree), various utilities under scripts/ and, at least
+Documentation/devicetree), various utilities under scripts/ and, at least
some of the time, LICENSES/.
It's worth noting, though, that the boundaries of this subsystem are rather
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index c78db28519f7..63dec76d1d8d 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -11,7 +11,7 @@ course not limited to GPU use cases.
The three main components of this are: (1) dma-buf, representing a
sg_table and exposed to userspace as a file descriptor to allow passing
between devices, (2) fence, which provides a mechanism to signal when
-one device as finished access, and (3) reservation, which manages the
+one device has finished access, and (3) reservation, which manages the
shared or exclusive fence(s) associated with the buffer.
Shared DMA Buffers
@@ -31,7 +31,7 @@ The exporter
- implements and manages operations in :c:type:`struct dma_buf_ops
<dma_buf_ops>` for the buffer,
- allows other users to share the buffer by using dma_buf sharing APIs,
- - manages the details of buffer allocation, wrapped int a :c:type:`struct
+ - manages the details of buffer allocation, wrapped in a :c:type:`struct
dma_buf <dma_buf>`,
- decides about the actual backing storage where this allocation happens,
- and takes care of any migration of scatterlist - for all (shared) users of
diff --git a/Documentation/driver-api/driver-model/device.rst b/Documentation/driver-api/driver-model/device.rst
index 2b868d49d349..b9b022371e85 100644
--- a/Documentation/driver-api/driver-model/device.rst
+++ b/Documentation/driver-api/driver-model/device.rst
@@ -50,10 +50,10 @@ Attributes
Attributes of devices can be exported by a device driver through sysfs.
-Please see Documentation/filesystems/sysfs.txt for more information
+Please see Documentation/filesystems/sysfs.rst for more information
on how sysfs works.
-As explained in Documentation/kobject.txt, device attributes must be
+As explained in Documentation/core-api/kobject.rst, device attributes must be
created before the KOBJ_ADD uevent is generated. The only way to realize
that is by defining an attribute group.
diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst
index 46c13780994c..fc242ed4bde5 100644
--- a/Documentation/driver-api/driver-model/devres.rst
+++ b/Documentation/driver-api/driver-model/devres.rst
@@ -372,6 +372,11 @@ MUX
devm_mux_chip_register()
devm_mux_control_get()
+NET
+ devm_alloc_etherdev()
+ devm_alloc_etherdev_mqs()
+ devm_register_netdev()
+
PER-CPU MEM
devm_alloc_percpu()
devm_free_percpu()
diff --git a/Documentation/driver-api/driver-model/overview.rst b/Documentation/driver-api/driver-model/overview.rst
index d4d1e9b40e0c..e98d0ab4a9b6 100644
--- a/Documentation/driver-api/driver-model/overview.rst
+++ b/Documentation/driver-api/driver-model/overview.rst
@@ -121,4 +121,4 @@ device-specific data or tunable interfaces.
More information about the sysfs directory layout can be found in
the other documents in this directory and in the file
-Documentation/filesystems/sysfs.txt.
+Documentation/filesystems/sysfs.rst.
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index dcc47c029f8e..6567187e7687 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -39,6 +39,7 @@ available subsections can be seen below.
spi
i2c
ipmb
+ ipmi
i3c/index
interconnect
devfreq
diff --git a/Documentation/IPMI.txt b/Documentation/driver-api/ipmi.rst
index 5ef1047e2e66..5ef1047e2e66 100644
--- a/Documentation/IPMI.txt
+++ b/Documentation/driver-api/ipmi.rst
diff --git a/Documentation/driver-api/nvdimm/nvdimm.rst b/Documentation/driver-api/nvdimm/nvdimm.rst
index 08f855cbb4e6..79c0fd39f2af 100644
--- a/Documentation/driver-api/nvdimm/nvdimm.rst
+++ b/Documentation/driver-api/nvdimm/nvdimm.rst
@@ -278,8 +278,8 @@ by a region device with a dynamically assigned id (REGION0 - REGION5).
be contiguous in DPA-space.
This bus is provided by the kernel under the device
- /sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
- the nfit_test.ko module is loaded. This not only test LIBNVDIMM but the
+ /sys/devices/platform/nfit_test.0 when the nfit_test.ko module from
+ tools/testing/nvdimm is loaded. This not only test LIBNVDIMM but the
acpi_nfit.ko driver as well.
diff --git a/Documentation/driver-api/pm/cpuidle.rst b/Documentation/driver-api/pm/cpuidle.rst
index 006cf6db40c6..3588bf078566 100644
--- a/Documentation/driver-api/pm/cpuidle.rst
+++ b/Documentation/driver-api/pm/cpuidle.rst
@@ -68,9 +68,8 @@ only one in the list (that is, the list was empty before) or the value of its
governor currently in use, or the name of the new governor was passed to the
kernel as the value of the ``cpuidle.governor=`` command line parameter, the new
governor will be used from that point on (there can be only one ``CPUIdle``
-governor in use at a time). Also, if ``cpuidle_sysfs_switch`` is passed to the
-kernel in the command line, user space can choose the ``CPUIdle`` governor to
-use at run time via ``sysfs``.
+governor in use at a time). Also, user space can choose the ``CPUIdle``
+governor to use at run time via ``sysfs``.
Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
practical to put them into loadable kernel modules.
diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst
index f66c7b9126ea..946ad0b94e31 100644
--- a/Documentation/driver-api/pm/devices.rst
+++ b/Documentation/driver-api/pm/devices.rst
@@ -349,7 +349,7 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
PM core will skip the ``suspend``, ``suspend_late`` and
``suspend_noirq`` phases as well as all of the corresponding phases of
the subsequent device resume for all of these devices. In that case,
- the ``->complete`` callback will be invoked directly after the
+ the ``->complete`` callback will be the next one invoked after the
``->prepare`` callback and is entirely responsible for putting the
device into a consistent state as appropriate.
@@ -361,9 +361,9 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
runtime PM disabled.
This feature also can be controlled by device drivers by using the
- ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
- management flags. [Typically, they are set at the time the driver is
- probed against the device in question by passing them to the
+ ``DPM_FLAG_NO_DIRECT_COMPLETE`` and ``DPM_FLAG_SMART_PREPARE`` driver
+ power management flags. [Typically, they are set at the time the driver
+ is probed against the device in question by passing them to the
:c:func:`dev_pm_set_driver_flags` helper function.] If the first of
these flags is set, the PM core will not apply the direct-complete
procedure described above to the given device and, consequenty, to any
@@ -383,11 +383,15 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
``->suspend`` methods provided by subsystems (bus types and PM domains
in particular) must follow an additional rule regarding what can be done
to the devices before their drivers' ``->suspend`` methods are called.
- Namely, they can only resume the devices from runtime suspend by
- calling :c:func:`pm_runtime_resume` for them, if that is necessary, and
+ Namely, they may resume the devices from runtime suspend by
+ calling :c:func:`pm_runtime_resume` for them, if that is necessary, but
they must not update the state of the devices in any other way at that
time (in case the drivers need to resume the devices from runtime
- suspend in their ``->suspend`` methods).
+ suspend in their ``->suspend`` methods). In fact, the PM core prevents
+ subsystems or drivers from putting devices into runtime suspend at
+ these times by calling :c:func:`pm_runtime_get_noresume` before issuing
+ the ``->prepare`` callback (and calling :c:func:`pm_runtime_put` after
+ issuing the ``->complete`` callback).
3. For a number of devices it is convenient to split suspend into the
"quiesce device" and "save device state" phases, in which cases
@@ -459,22 +463,22 @@ When resuming from freeze, standby or memory sleep, the phases are:
Note, however, that new children may be registered below the device as
soon as the ``->resume`` callbacks occur; it's not necessary to wait
- until the ``complete`` phase with that.
+ until the ``complete`` phase runs.
Moreover, if the preceding ``->prepare`` callback returned a positive
number, the device may have been left in runtime suspend throughout the
- whole system suspend and resume (the ``suspend``, ``suspend_late``,
- ``suspend_noirq`` phases of system suspend and the ``resume_noirq``,
- ``resume_early``, ``resume`` phases of system resume may have been
- skipped for it). In that case, the ``->complete`` callback is entirely
+ whole system suspend and resume (its ``->suspend``, ``->suspend_late``,
+ ``->suspend_noirq``, ``->resume_noirq``,
+ ``->resume_early``, and ``->resume`` callbacks may have been
+ skipped). In that case, the ``->complete`` callback is entirely
responsible for putting the device into a consistent state after system
suspend if necessary. [For example, it may need to queue up a runtime
resume request for the device for this purpose.] To check if that is
the case, the ``->complete`` callback can consult the device's
- ``power.direct_complete`` flag. Namely, if that flag is set when the
- ``->complete`` callback is being run, it has been called directly after
- the preceding ``->prepare`` and special actions may be required
- to make the device work correctly afterward.
+ ``power.direct_complete`` flag. If that flag is set when the
+ ``->complete`` callback is being run then the direct-complete mechanism
+ was used, and special actions may be required to make the device work
+ correctly afterward.
At the end of these phases, drivers should be as functional as they were before
suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
@@ -575,10 +579,12 @@ and the phases are similar.
The ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks
should do essentially the same things as the ``->suspend``, ``->suspend_late``
-and ``->suspend_noirq`` callbacks, respectively. The only notable difference is
+and ``->suspend_noirq`` callbacks, respectively. A notable difference is
that they need not store the device register values, because the registers
should already have been stored during the ``freeze``, ``freeze_late`` or
-``freeze_noirq`` phases.
+``freeze_noirq`` phases. Also, on many machines the firmware will power-down
+the entire system, so it is not necessary for the callback to put the device in
+a low-power state.
Leaving Hibernation
@@ -764,70 +770,119 @@ device driver in question.
If it is necessary to resume a device from runtime suspend during a system-wide
transition into a sleep state, that can be done by calling
-:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its
-couterpart for transitions related to hibernation) of either the device's driver
-or a subsystem responsible for it (for example, a bus type or a PM domain).
-That is guaranteed to work by the requirement that subsystems must not change
-the state of devices (possibly except for resuming them from runtime suspend)
+:c:func:`pm_runtime_resume` from the ``->suspend`` callback (or the ``->freeze``
+or ``->poweroff`` callback for transitions related to hibernation) of either the
+device's driver or its subsystem (for example, a bus type or a PM domain).
+However, subsystems must not otherwise change the runtime status of devices
from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
invoking device drivers' ``->suspend`` callbacks (or equivalent).
+.. _smart_suspend_flag:
+
+The ``DPM_FLAG_SMART_SUSPEND`` Driver Flag
+------------------------------------------
+
Some bus types and PM domains have a policy to resume all devices from runtime
suspend upfront in their ``->suspend`` callbacks, but that may not be really
-necessary if the driver of the device can cope with runtime-suspended devices.
-The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
-:c:member:`power.driver_flags` at the probe time, by passing it to the
-:c:func:`dev_pm_set_driver_flags` helper. That also may cause middle-layer code
+necessary if the device's driver can cope with runtime-suspended devices.
+The driver can indicate this by setting ``DPM_FLAG_SMART_SUSPEND`` in
+:c:member:`power.driver_flags` at probe time, with the assistance of the
+:c:func:`dev_pm_set_driver_flags` helper routine.
+
+Setting that flag causes the PM core and middle-layer code
(bus types, PM domains etc.) to skip the ``->suspend_late`` and
``->suspend_noirq`` callbacks provided by the driver if the device remains in
-runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
-suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
-has been disabled for it, under the assumption that its state should not change
-after that point until the system-wide transition is over (the PM core itself
-does that for devices whose "noirq", "late" and "early" system-wide PM callbacks
-are executed directly by it). If that happens, the driver's system-wide resume
-callbacks, if present, may still be invoked during the subsequent system-wide
-resume transition and the device's runtime power management status may be set
-to "active" before enabling runtime PM for it, so the driver must be prepared to
-cope with the invocation of its system-wide resume callbacks back-to-back with
-its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
-so on) and the final state of the device must reflect the "active" runtime PM
-status in that case.
+runtime suspend throughout those phases of the system-wide suspend (and
+similarly for the "freeze" and "poweroff" parts of system hibernation).
+[Otherwise the same driver
+callback might be executed twice in a row for the same device, which would not
+be valid in general.] If the middle-layer system-wide PM callbacks are present
+for the device then they are responsible for skipping these driver callbacks;
+if not then the PM core skips them. The subsystem callback routines can
+determine whether they need to skip the driver callbacks by testing the return
+value from the :c:func:`dev_pm_skip_suspend` helper function.
+
+In addition, with ``DPM_FLAG_SMART_SUSPEND`` set, the driver's ``->thaw_noirq``
+and ``->thaw_early`` callbacks are skipped in hibernation if the device remained
+in runtime suspend throughout the preceding "freeze" transition. Again, if the
+middle-layer callbacks are present for the device, they are responsible for
+doing this, otherwise the PM core takes care of it.
+
+
+The ``DPM_FLAG_MAY_SKIP_RESUME`` Driver Flag
+--------------------------------------------
During system-wide resume from a sleep state it's easiest to put devices into
the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
[Refer to that document for more information regarding this particular issue as
well as for information on the device runtime power management framework in
-general.]
-
-However, it often is desirable to leave devices in suspend after system
-transitions to the working state, especially if those devices had been in
+general.] However, it often is desirable to leave devices in suspend after
+system transitions to the working state, especially if those devices had been in
runtime suspend before the preceding system-wide suspend (or analogous)
-transition. Device drivers can use the ``DPM_FLAG_LEAVE_SUSPENDED`` flag to
-indicate to the PM core (and middle-layer code) that they prefer the specific
-devices handled by them to be left suspended and they have no problems with
-skipping their system-wide resume callbacks for this reason. Whether or not the
-devices will actually be left in suspend may depend on their state before the
-given system suspend-resume cycle and on the type of the system transition under
-way. In particular, devices are not left suspended if that transition is a
-restore from hibernation, as device states are not guaranteed to be reflected
-by the information stored in the hibernation image in that case.
-
-The middle-layer code involved in the handling of the device is expected to
-indicate to the PM core if the device may be left in suspend by setting its
-:c:member:`power.may_skip_resume` status bit which is checked by the PM core
-during the "noirq" phase of the preceding system-wide suspend (or analogous)
-transition. The middle layer is then responsible for handling the device as
-appropriate in its "noirq" resume callback, which is executed regardless of
-whether or not the device is left suspended, but the other resume callbacks
-(except for ``->complete``) will be skipped automatically by the PM core if the
-device really can be left in suspend.
-
-For devices whose "noirq", "late" and "early" driver callbacks are invoked
-directly by the PM core, all of the system-wide resume callbacks are skipped if
-``DPM_FLAG_LEAVE_SUSPENDED`` is set and the device is in runtime suspend during
-the ``suspend_noirq`` (or analogous) phase or the transition under way is a
-proper system suspend (rather than anything related to hibernation) and the
-device's wakeup settings are suitable for runtime PM (that is, it cannot
-generate wakeup signals at all or it is allowed to wake up the system from
-sleep).
+transition.
+
+To that end, device drivers can use the ``DPM_FLAG_MAY_SKIP_RESUME`` flag to
+indicate to the PM core and middle-layer code that they allow their "noirq" and
+"early" resume callbacks to be skipped if the device can be left in suspend
+after system-wide PM transitions to the working state. Whether or not that is
+the case generally depends on the state of the device before the given system
+suspend-resume cycle and on the type of the system transition under way.
+In particular, the "thaw" and "restore" transitions related to hibernation are
+not affected by ``DPM_FLAG_MAY_SKIP_RESUME`` at all. [All callbacks are
+issued during the "restore" transition regardless of the flag settings,
+and whether or not any driver callbacks
+are skipped during the "thaw" transition depends whether or not the
+``DPM_FLAG_SMART_SUSPEND`` flag is set (see `above <smart_suspend_flag_>`_).
+In addition, a device is not allowed to remain in runtime suspend if any of its
+children will be returned to full power.]
+
+The ``DPM_FLAG_MAY_SKIP_RESUME`` flag is taken into account in combination with
+the :c:member:`power.may_skip_resume` status bit set by the PM core during the
+"suspend" phase of suspend-type transitions. If the driver or the middle layer
+has a reason to prevent the driver's "noirq" and "early" resume callbacks from
+being skipped during the subsequent system resume transition, it should
+clear :c:member:`power.may_skip_resume` in its ``->suspend``, ``->suspend_late``
+or ``->suspend_noirq`` callback. [Note that the drivers setting
+``DPM_FLAG_SMART_SUSPEND`` need to clear :c:member:`power.may_skip_resume` in
+their ``->suspend`` callback in case the other two are skipped.]
+
+Setting the :c:member:`power.may_skip_resume` status bit along with the
+``DPM_FLAG_MAY_SKIP_RESUME`` flag is necessary, but generally not sufficient,
+for the driver's "noirq" and "early" resume callbacks to be skipped. Whether or
+not they should be skipped can be determined by evaluating the
+:c:func:`dev_pm_skip_resume` helper function.
+
+If that function returns ``true``, the driver's "noirq" and "early" resume
+callbacks should be skipped and the device's runtime PM status will be set to
+"suspended" by the PM core. Otherwise, if the device was runtime-suspended
+during the preceding system-wide suspend transition and its
+``DPM_FLAG_SMART_SUSPEND`` is set, its runtime PM status will be set to
+"active" by the PM core. [Hence, the drivers that do not set
+``DPM_FLAG_SMART_SUSPEND`` should not expect the runtime PM status of their
+devices to be changed from "suspended" to "active" by the PM core during
+system-wide resume-type transitions.]
+
+If the ``DPM_FLAG_MAY_SKIP_RESUME`` flag is not set for a device, but
+``DPM_FLAG_SMART_SUSPEND`` is set and the driver's "late" and "noirq" suspend
+callbacks are skipped, its system-wide "noirq" and "early" resume callbacks, if
+present, are invoked as usual and the device's runtime PM status is set to
+"active" by the PM core before enabling runtime PM for it. In that case, the
+driver must be prepared to cope with the invocation of its system-wide resume
+callbacks back-to-back with its ``->runtime_suspend`` one (without the
+intervening ``->runtime_resume`` and system-wide suspend callbacks) and the
+final state of the device must reflect the "active" runtime PM status in that
+case. [Note that this is not a problem at all if the driver's
+``->suspend_late`` callback pointer points to the same function as its
+``->runtime_suspend`` one and its ``->resume_early`` callback pointer points to
+the same function as the ``->runtime_resume`` one, while none of the other
+system-wide suspend-resume callbacks of the driver are present, for example.]
+
+Likewise, if ``DPM_FLAG_MAY_SKIP_RESUME`` is set for a device, its driver's
+system-wide "noirq" and "early" resume callbacks may be skipped while its "late"
+and "noirq" suspend callbacks may have been executed (in principle, regardless
+of whether or not ``DPM_FLAG_SMART_SUSPEND`` is set). In that case, the driver
+needs to be able to cope with the invocation of its ``->runtime_resume``
+callback back-to-back with its "late" and "noirq" suspend ones. [For instance,
+that is not a concern if the driver sets both ``DPM_FLAG_SMART_SUSPEND`` and
+``DPM_FLAG_MAY_SKIP_RESUME`` and uses the same pair of suspend/resume callback
+functions for runtime PM and system-wide suspend/resume.]
diff --git a/Documentation/driver-api/thermal/cpu-idle-cooling.rst b/Documentation/driver-api/thermal/cpu-idle-cooling.rst
index a1c3edecae00..b9f34ceb2a38 100644
--- a/Documentation/driver-api/thermal/cpu-idle-cooling.rst
+++ b/Documentation/driver-api/thermal/cpu-idle-cooling.rst
@@ -1,3 +1,6 @@
+================
+CPU Idle Cooling
+================
Situation:
----------
diff --git a/Documentation/driver-api/thermal/index.rst b/Documentation/driver-api/thermal/index.rst
index 5ba61d19c6ae..4cb0b9b6bfb8 100644
--- a/Documentation/driver-api/thermal/index.rst
+++ b/Documentation/driver-api/thermal/index.rst
@@ -8,6 +8,7 @@ Thermal
:maxdepth: 1
cpu-cooling-api
+ cpu-idle-cooling
sysfs-api
power_allocator
diff --git a/Documentation/fb/efifb.rst b/Documentation/fb/efifb.rst
index 04840331a00e..6badff64756f 100644
--- a/Documentation/fb/efifb.rst
+++ b/Documentation/fb/efifb.rst
@@ -2,8 +2,10 @@
What is efifb?
==============
-This is a generic EFI platform driver for Intel based Apple computers.
-efifb is only for EFI booted Intel Macs.
+This is a generic EFI platform driver for systems with UEFI firmware. The
+system must be booted via the EFI stub for this to be usable. efifb supports
+both firmware with Graphics Output Protocol (GOP) displays as well as older
+systems with only Universal Graphics Adapter (UGA) displays.
Supported Hardware
==================
@@ -12,11 +14,14 @@ Supported Hardware
- Macbook
- Macbook Pro 15"/17"
- MacMini
+- ARM/ARM64/X86 systems with UEFI firmware
How to use it?
==============
-efifb does not have any kind of autodetection of your machine.
+For UGA displays, efifb does not have any kind of autodetection of your
+machine.
+
You have to add the following kernel parameters in your elilo.conf::
Macbook :
@@ -28,6 +33,9 @@ You have to add the following kernel parameters in your elilo.conf::
Macbook Pro 17", iMac 20" :
video=efifb:i20
+For GOP displays, efifb can autodetect the display's resolution and framebuffer
+address, so these should work out of the box without any special parameters.
+
Accepted options:
======= ===========================================================
@@ -36,4 +44,28 @@ nowc Don't map the framebuffer write combined. This can be used
when large amounts of console data are written.
======= ===========================================================
+Options for GOP displays:
+
+mode=n
+ The EFI stub will set the mode of the display to mode number n if
+ possible.
+
+<xres>x<yres>[-(rgb|bgr|<bpp>)]
+ The EFI stub will search for a display mode that matches the specified
+ horizontal and vertical resolution, and optionally bit depth, and set
+ the mode of the display to it if one is found. The bit depth can either
+ "rgb" or "bgr" to match specifically those pixel formats, or a number
+ for a mode with matching bits per pixel.
+
+auto
+ The EFI stub will choose the mode with the highest resolution (product
+ of horizontal and vertical resolution). If there are multiple modes
+ with the highest resolution, it will choose one with the highest color
+ depth.
+
+list
+ The EFI stub will list out all the display modes that are available. A
+ specific mode can then be chosen using one of the above options for the
+ next boot.
+
Edgar Hucek <gimli@dark-green.com>
diff --git a/Documentation/features/core/eBPF-JIT/arch-support.txt b/Documentation/features/core/eBPF-JIT/arch-support.txt
index 9ae6e8d0d10d..9ed964f65224 100644
--- a/Documentation/features/core/eBPF-JIT/arch-support.txt
+++ b/Documentation/features/core/eBPF-JIT/arch-support.txt
@@ -23,7 +23,7 @@
| openrisc: | TODO |
| parisc: | TODO |
| powerpc: | ok |
- | riscv: | TODO |
+ | riscv: | ok |
| s390: | ok |
| sh: | TODO |
| sparc: | ok |
diff --git a/Documentation/features/debug/KASAN/arch-support.txt b/Documentation/features/debug/KASAN/arch-support.txt
index 304dcd461795..6ff38548923e 100644
--- a/Documentation/features/debug/KASAN/arch-support.txt
+++ b/Documentation/features/debug/KASAN/arch-support.txt
@@ -22,9 +22,9 @@
| nios2: | TODO |
| openrisc: | TODO |
| parisc: | TODO |
- | powerpc: | TODO |
- | riscv: | TODO |
- | s390: | TODO |
+ | powerpc: | ok |
+ | riscv: | ok |
+ | s390: | ok |
| sh: | TODO |
| sparc: | TODO |
| um: | TODO |
diff --git a/Documentation/features/debug/gcov-profile-all/arch-support.txt b/Documentation/features/debug/gcov-profile-all/arch-support.txt
index 6fb2b0671994..210256f6a4cf 100644
--- a/Documentation/features/debug/gcov-profile-all/arch-support.txt
+++ b/Documentation/features/debug/gcov-profile-all/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | TODO |
| ia64: | TODO |
diff --git a/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt b/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
index 32b297295fff..97cd7aa74905 100644
--- a/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
+++ b/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | TODO |
| arm64: | TODO |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | TODO |
| ia64: | TODO |
diff --git a/Documentation/features/debug/kprobes/arch-support.txt b/Documentation/features/debug/kprobes/arch-support.txt
index e68239b5d2f0..8b316c6e03d4 100644
--- a/Documentation/features/debug/kprobes/arch-support.txt
+++ b/Documentation/features/debug/kprobes/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | TODO |
| ia64: | ok |
@@ -23,7 +23,7 @@
| openrisc: | TODO |
| parisc: | ok |
| powerpc: | ok |
- | riscv: | ok |
+ | riscv: | TODO |
| s390: | ok |
| sh: | ok |
| sparc: | ok |
diff --git a/Documentation/features/debug/kretprobes/arch-support.txt b/Documentation/features/debug/kretprobes/arch-support.txt
index f17131b328e5..b805aada395e 100644
--- a/Documentation/features/debug/kretprobes/arch-support.txt
+++ b/Documentation/features/debug/kretprobes/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | TODO |
| ia64: | ok |
diff --git a/Documentation/features/debug/stackprotector/arch-support.txt b/Documentation/features/debug/stackprotector/arch-support.txt
index 32bbdfc64c32..12410f606edc 100644
--- a/Documentation/features/debug/stackprotector/arch-support.txt
+++ b/Documentation/features/debug/stackprotector/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | TODO |
| ia64: | TODO |
diff --git a/Documentation/features/debug/uprobes/arch-support.txt b/Documentation/features/debug/uprobes/arch-support.txt
index 1c577d0cfc7f..be8acbb95b54 100644
--- a/Documentation/features/debug/uprobes/arch-support.txt
+++ b/Documentation/features/debug/uprobes/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | TODO |
| ia64: | TODO |
diff --git a/Documentation/features/io/dma-contiguous/arch-support.txt b/Documentation/features/io/dma-contiguous/arch-support.txt
index eb28b5c97ca6..895c3b0f6492 100644
--- a/Documentation/features/io/dma-contiguous/arch-support.txt
+++ b/Documentation/features/io/dma-contiguous/arch-support.txt
@@ -16,7 +16,7 @@
| hexagon: | TODO |
| ia64: | TODO |
| m68k: | TODO |
- | microblaze: | TODO |
+ | microblaze: | ok |
| mips: | ok |
| nds32: | TODO |
| nios2: | TODO |
diff --git a/Documentation/features/locking/lockdep/arch-support.txt b/Documentation/features/locking/lockdep/arch-support.txt
index 941fd5b1094d..98cb9d85c55d 100644
--- a/Documentation/features/locking/lockdep/arch-support.txt
+++ b/Documentation/features/locking/lockdep/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | ok |
| ia64: | TODO |
diff --git a/Documentation/features/perf/kprobes-event/arch-support.txt b/Documentation/features/perf/kprobes-event/arch-support.txt
index d8278bf62b85..518f352fc727 100644
--- a/Documentation/features/perf/kprobes-event/arch-support.txt
+++ b/Documentation/features/perf/kprobes-event/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | ok |
| ia64: | TODO |
@@ -21,7 +21,7 @@
| nds32: | ok |
| nios2: | TODO |
| openrisc: | TODO |
- | parisc: | TODO |
+ | parisc: | ok |
| powerpc: | ok |
| riscv: | TODO |
| s390: | ok |
diff --git a/Documentation/features/perf/perf-regs/arch-support.txt b/Documentation/features/perf/perf-regs/arch-support.txt
index 687d049d9cee..c22cd6f8aa5e 100644
--- a/Documentation/features/perf/perf-regs/arch-support.txt
+++ b/Documentation/features/perf/perf-regs/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | TODO |
| ia64: | TODO |
@@ -23,7 +23,7 @@
| openrisc: | TODO |
| parisc: | TODO |
| powerpc: | ok |
- | riscv: | TODO |
+ | riscv: | ok |
| s390: | ok |
| sh: | TODO |
| sparc: | TODO |
diff --git a/Documentation/features/perf/perf-stackdump/arch-support.txt b/Documentation/features/perf/perf-stackdump/arch-support.txt
index 90996e3d18a8..527fe4d0b074 100644
--- a/Documentation/features/perf/perf-stackdump/arch-support.txt
+++ b/Documentation/features/perf/perf-stackdump/arch-support.txt
@@ -11,7 +11,7 @@
| arm: | ok |
| arm64: | ok |
| c6x: | TODO |
- | csky: | TODO |
+ | csky: | ok |
| h8300: | TODO |
| hexagon: | TODO |
| ia64: | TODO |
@@ -23,7 +23,7 @@
| openrisc: | TODO |
| parisc: | TODO |
| powerpc: | ok |
- | riscv: | TODO |
+ | riscv: | ok |
| s390: | ok |
| sh: | TODO |
| sparc: | TODO |
diff --git a/Documentation/features/seccomp/seccomp-filter/arch-support.txt b/Documentation/features/seccomp/seccomp-filter/arch-support.txt
index 4fe6c3c3be5c..c7b837f735b1 100644
--- a/Documentation/features/seccomp/seccomp-filter/arch-support.txt
+++ b/Documentation/features/seccomp/seccomp-filter/arch-support.txt
@@ -23,7 +23,7 @@
| openrisc: | TODO |
| parisc: | ok |
| powerpc: | ok |
- | riscv: | TODO |
+ | riscv: | ok |
| s390: | ok |
| sh: | TODO |
| sparc: | TODO |
diff --git a/Documentation/features/vm/huge-vmap/arch-support.txt b/Documentation/features/vm/huge-vmap/arch-support.txt
index 019131c5acce..8525f1981f19 100644
--- a/Documentation/features/vm/huge-vmap/arch-support.txt
+++ b/Documentation/features/vm/huge-vmap/arch-support.txt
@@ -22,7 +22,7 @@
| nios2: | TODO |
| openrisc: | TODO |
| parisc: | TODO |
- | powerpc: | TODO |
+ | powerpc: | ok |
| riscv: | TODO |
| s390: | TODO |
| sh: | TODO |
diff --git a/Documentation/features/vm/numa-memblock/arch-support.txt b/Documentation/features/vm/numa-memblock/arch-support.txt
deleted file mode 100644
index 3004beb0fd71..000000000000
--- a/Documentation/features/vm/numa-memblock/arch-support.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-#
-# Feature name: numa-memblock
-# Kconfig: HAVE_MEMBLOCK_NODE_MAP
-# description: arch supports NUMA aware memblocks
-#
- -----------------------
- | arch |status|
- -----------------------
- | alpha: | TODO |
- | arc: | .. |
- | arm: | .. |
- | arm64: | ok |
- | c6x: | .. |
- | csky: | .. |
- | h8300: | .. |
- | hexagon: | .. |
- | ia64: | ok |
- | m68k: | .. |
- | microblaze: | ok |
- | mips: | ok |
- | nds32: | TODO |
- | nios2: | .. |
- | openrisc: | .. |
- | parisc: | .. |
- | powerpc: | ok |
- | riscv: | ok |
- | s390: | ok |
- | sh: | ok |
- | sparc: | ok |
- | um: | .. |
- | unicore32: | .. |
- | x86: | ok |
- | xtensa: | .. |
- -----------------------
diff --git a/Documentation/features/vm/pte_special/arch-support.txt b/Documentation/features/vm/pte_special/arch-support.txt
index 3d492a34c8ee..2e017387e228 100644
--- a/Documentation/features/vm/pte_special/arch-support.txt
+++ b/Documentation/features/vm/pte_special/arch-support.txt
@@ -17,7 +17,7 @@
| ia64: | TODO |
| m68k: | TODO |
| microblaze: | TODO |
- | mips: | TODO |
+ | mips: | ok |
| nds32: | TODO |
| nios2: | TODO |
| openrisc: | TODO |
diff --git a/Documentation/filesystems/9p.rst b/Documentation/filesystems/9p.rst
index 671fef39a802..2995279ddc24 100644
--- a/Documentation/filesystems/9p.rst
+++ b/Documentation/filesystems/9p.rst
@@ -192,4 +192,4 @@ For more information on the Plan 9 Operating System check out
http://plan9.bell-labs.com/plan9
For information on Plan 9 from User Space (Plan 9 applications and libraries
-ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
+ported to Linux/BSD/OSX/etc) check out https://9fans.github.io/plan9port/
diff --git a/Documentation/filesystems/afs.rst b/Documentation/filesystems/afs.rst
index c4ec39a5966e..cada9464d6bd 100644
--- a/Documentation/filesystems/afs.rst
+++ b/Documentation/filesystems/afs.rst
@@ -70,7 +70,7 @@ list of volume location server IP addresses::
The first module is the AF_RXRPC network protocol driver. This provides the
RxRPC remote operation protocol and may also be accessed from userspace. See:
- Documentation/networking/rxrpc.txt
+ Documentation/networking/rxrpc.rst
The second module is the kerberos RxRPC security driver, and the third module
is the actual filesystem driver for the AFS filesystem.
diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.rst
index 7d9f82607562..430f0b40796b 100644
--- a/Documentation/filesystems/automount-support.txt
+++ b/Documentation/filesystems/automount-support.rst
@@ -1,3 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Automount Support
+=================
+
+
Support is available for filesystems that wish to do automounting
support (such as kAFS which can be found in fs/afs/ and NFS in
fs/nfs/). This facility includes allowing in-kernel mounts to be
@@ -5,13 +12,12 @@ performed and mountpoint degradation to be requested. The latter can
also be requested by userspace.
-======================
-IN-KERNEL AUTOMOUNTING
+In-Kernel Automounting
======================
See section "Mount Traps" of Documentation/filesystems/autofs.rst
-Then from userspace, you can just do something like:
+Then from userspace, you can just do something like::
[root@andromeda root]# mount -t afs \#root.afs. /afs
[root@andromeda root]# ls /afs
@@ -21,7 +27,7 @@ Then from userspace, you can just do something like:
[root@andromeda root]# ls /afs/cambridge/afsdoc/
ChangeLog html LICENSE pdf RELNOTES-1.2.2
-And then if you look in the mountpoint catalogue, you'll see something like:
+And then if you look in the mountpoint catalogue, you'll see something like::
[root@andromeda root]# cat /proc/mounts
...
@@ -30,8 +36,7 @@ And then if you look in the mountpoint catalogue, you'll see something like:
#afsdoc. /afs/cambridge.redhat.com/afsdoc afs rw 0 0
-===========================
-AUTOMATIC MOUNTPOINT EXPIRY
+Automatic Mountpoint Expiry
===========================
Automatic expiration of mountpoints is easy, provided you've mounted the
@@ -43,7 +48,8 @@ To do expiration, you need to follow these steps:
hung.
(2) When a new mountpoint is created in the ->d_automount method, add
- the mnt to the list using mnt_set_expiry()
+ the mnt to the list using mnt_set_expiry()::
+
mnt_set_expiry(newmnt, &afs_vfsmounts);
(3) When you want mountpoints to be expired, call mark_mounts_for_expiry()
@@ -70,8 +76,7 @@ and the copies of those that are on an expiration list will be added to the
same expiration list.
-=======================
-USERSPACE DRIVEN EXPIRY
+Userspace Driven Expiry
=======================
As an alternative, it is possible for userspace to request expiry of any
diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.rst
index c418280c915f..19fbf6b9aa36 100644
--- a/Documentation/filesystems/caching/backend-api.txt
+++ b/Documentation/filesystems/caching/backend-api.rst
@@ -1,6 +1,8 @@
- ==========================
- FS-CACHE CACHE BACKEND API
- ==========================
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+FS-Cache Cache backend API
+==========================
The FS-Cache system provides an API by which actual caches can be supplied to
FS-Cache for it to then serve out to network filesystems and other interested
@@ -9,15 +11,14 @@ parties.
This API is declared in <linux/fscache-cache.h>.
-====================================
-INITIALISING AND REGISTERING A CACHE
+Initialising and Registering a Cache
====================================
To start off, a cache definition must be initialised and registered for each
cache the backend wants to make available. For instance, CacheFS does this in
the fill_super() operation on mounting.
-The cache definition (struct fscache_cache) should be initialised by calling:
+The cache definition (struct fscache_cache) should be initialised by calling::
void fscache_init_cache(struct fscache_cache *cache,
struct fscache_cache_ops *ops,
@@ -26,17 +27,17 @@ The cache definition (struct fscache_cache) should be initialised by calling:
Where:
- (*) "cache" is a pointer to the cache definition;
+ * "cache" is a pointer to the cache definition;
- (*) "ops" is a pointer to the table of operations that the backend supports on
+ * "ops" is a pointer to the table of operations that the backend supports on
this cache; and
- (*) "idfmt" is a format and printf-style arguments for constructing a label
+ * "idfmt" is a format and printf-style arguments for constructing a label
for the cache.
The cache should then be registered with FS-Cache by passing a pointer to the
-previously initialised cache definition to:
+previously initialised cache definition to::
int fscache_add_cache(struct fscache_cache *cache,
struct fscache_object *fsdef,
@@ -44,12 +45,12 @@ previously initialised cache definition to:
Two extra arguments should also be supplied:
- (*) "fsdef" which should point to the object representation for the FS-Cache
+ * "fsdef" which should point to the object representation for the FS-Cache
master index in this cache. Netfs primary index entries will be created
here. FS-Cache keeps the caller's reference to the index object if
successful and will release it upon withdrawal of the cache.
- (*) "tagname" which, if given, should be a text string naming this cache. If
+ * "tagname" which, if given, should be a text string naming this cache. If
this is NULL, the identifier will be used instead. For CacheFS, the
identifier is set to name the underlying block device and the tag can be
supplied by mount.
@@ -58,20 +59,18 @@ This function may return -ENOMEM if it ran out of memory or -EEXIST if the tag
is already in use. 0 will be returned on success.
-=====================
-UNREGISTERING A CACHE
+Unregistering a Cache
=====================
A cache can be withdrawn from the system by calling this function with a
-pointer to the cache definition:
+pointer to the cache definition::
void fscache_withdraw_cache(struct fscache_cache *cache);
In CacheFS's case, this is called by put_super().
-========
-SECURITY
+Security
========
The cache methods are executed one of two contexts:
@@ -89,8 +88,7 @@ be masqueraded for the duration of the cache driver's access to the cache.
This is left to the cache to handle; FS-Cache makes no effort in this regard.
-===================================
-CONTROL AND STATISTICS PRESENTATION
+Control and Statistics Presentation
===================================
The cache may present data to the outside world through FS-Cache's interfaces
@@ -101,11 +99,10 @@ is enabled. This is accessible through the kobject struct fscache_cache::kobj
and is for use by the cache as it sees fit.
-========================
-RELEVANT DATA STRUCTURES
+Relevant Data Structures
========================
- (*) Index/Data file FS-Cache representation cookie:
+ * Index/Data file FS-Cache representation cookie::
struct fscache_cookie {
struct fscache_object_def *def;
@@ -121,7 +118,7 @@ RELEVANT DATA STRUCTURES
cache operations.
- (*) In-cache object representation:
+ * In-cache object representation::
struct fscache_object {
int debug_id;
@@ -150,7 +147,7 @@ RELEVANT DATA STRUCTURES
initialised by calling fscache_object_init(object).
- (*) FS-Cache operation record:
+ * FS-Cache operation record::
struct fscache_operation {
atomic_t usage;
@@ -173,7 +170,7 @@ RELEVANT DATA STRUCTURES
an operation needs more processing time, it should be enqueued again.
- (*) FS-Cache retrieval operation record:
+ * FS-Cache retrieval operation record::
struct fscache_retrieval {
struct fscache_operation op;
@@ -198,7 +195,7 @@ RELEVANT DATA STRUCTURES
it sees fit.
- (*) FS-Cache storage operation record:
+ * FS-Cache storage operation record::
struct fscache_storage {
struct fscache_operation op;
@@ -212,16 +209,17 @@ RELEVANT DATA STRUCTURES
storage.
-================
-CACHE OPERATIONS
+Cache Operations
================
The cache backend provides FS-Cache with a table of operations that can be
performed on the denizens of the cache. These are held in a structure of type:
- struct fscache_cache_ops
+ ::
+
+ struct fscache_cache_ops
- (*) Name of cache provider [mandatory]:
+ * Name of cache provider [mandatory]::
const char *name
@@ -229,7 +227,7 @@ performed on the denizens of the cache. These are held in a structure of type:
the backend.
- (*) Allocate a new object [mandatory]:
+ * Allocate a new object [mandatory]::
struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
struct fscache_cookie *cookie)
@@ -244,7 +242,7 @@ performed on the denizens of the cache. These are held in a structure of type:
form once lookup is complete or aborted.
- (*) Look up and create object [mandatory]:
+ * Look up and create object [mandatory]::
void (*lookup_object)(struct fscache_object *object)
@@ -263,7 +261,7 @@ performed on the denizens of the cache. These are held in a structure of type:
to abort the lookup of that object.
- (*) Release lookup data [mandatory]:
+ * Release lookup data [mandatory]::
void (*lookup_complete)(struct fscache_object *object)
@@ -271,7 +269,7 @@ performed on the denizens of the cache. These are held in a structure of type:
using to perform a lookup.
- (*) Increment object refcount [mandatory]:
+ * Increment object refcount [mandatory]::
struct fscache_object *(*grab_object)(struct fscache_object *object)
@@ -280,7 +278,7 @@ performed on the denizens of the cache. These are held in a structure of type:
It should return the object pointer if successful.
- (*) Lock/Unlock object [mandatory]:
+ * Lock/Unlock object [mandatory]::
void (*lock_object)(struct fscache_object *object)
void (*unlock_object)(struct fscache_object *object)
@@ -289,7 +287,7 @@ performed on the denizens of the cache. These are held in a structure of type:
to schedule with the lock held, so a spinlock isn't sufficient.
- (*) Pin/Unpin object [optional]:
+ * Pin/Unpin object [optional]::
int (*pin_object)(struct fscache_object *object)
void (*unpin_object)(struct fscache_object *object)
@@ -299,7 +297,7 @@ performed on the denizens of the cache. These are held in a structure of type:
enough space in the cache to permit this.
- (*) Check coherency state of an object [mandatory]:
+ * Check coherency state of an object [mandatory]::
int (*check_consistency)(struct fscache_object *object)
@@ -308,7 +306,7 @@ performed on the denizens of the cache. These are held in a structure of type:
if they're consistent and -ESTALE otherwise. -ENOMEM and -ERESTARTSYS
may also be returned.
- (*) Update object [mandatory]:
+ * Update object [mandatory]::
int (*update_object)(struct fscache_object *object)
@@ -317,7 +315,7 @@ performed on the denizens of the cache. These are held in a structure of type:
obtained by calling object->cookie->def->get_aux()/get_attr().
- (*) Invalidate data object [mandatory]:
+ * Invalidate data object [mandatory]::
int (*invalidate_object)(struct fscache_operation *op)
@@ -329,7 +327,7 @@ performed on the denizens of the cache. These are held in a structure of type:
fscache_op_complete() must be called on op before returning.
- (*) Discard object [mandatory]:
+ * Discard object [mandatory]::
void (*drop_object)(struct fscache_object *object)
@@ -341,7 +339,7 @@ performed on the denizens of the cache. These are held in a structure of type:
caller. The caller will invoke the put_object() method as appropriate.
- (*) Release object reference [mandatory]:
+ * Release object reference [mandatory]::
void (*put_object)(struct fscache_object *object)
@@ -349,7 +347,7 @@ performed on the denizens of the cache. These are held in a structure of type:
be freed when all the references to it are released.
- (*) Synchronise a cache [mandatory]:
+ * Synchronise a cache [mandatory]::
void (*sync)(struct fscache_cache *cache)
@@ -357,7 +355,7 @@ performed on the denizens of the cache. These are held in a structure of type:
device.
- (*) Dissociate a cache [mandatory]:
+ * Dissociate a cache [mandatory]::
void (*dissociate_pages)(struct fscache_cache *cache)
@@ -365,7 +363,7 @@ performed on the denizens of the cache. These are held in a structure of type:
cache withdrawal.
- (*) Notification that the attributes on a netfs file changed [mandatory]:
+ * Notification that the attributes on a netfs file changed [mandatory]::
int (*attr_changed)(struct fscache_object *object);
@@ -386,7 +384,7 @@ performed on the denizens of the cache. These are held in a structure of type:
execution of this operation.
- (*) Reserve cache space for an object's data [optional]:
+ * Reserve cache space for an object's data [optional]::
int (*reserve_space)(struct fscache_object *object, loff_t size);
@@ -404,7 +402,7 @@ performed on the denizens of the cache. These are held in a structure of type:
size if larger than that already.
- (*) Request page be read from cache [mandatory]:
+ * Request page be read from cache [mandatory]::
int (*read_or_alloc_page)(struct fscache_retrieval *op,
struct page *page,
@@ -446,7 +444,7 @@ performed on the denizens of the cache. These are held in a structure of type:
with. This will complete the operation when all pages are dealt with.
- (*) Request pages be read from cache [mandatory]:
+ * Request pages be read from cache [mandatory]::
int (*read_or_alloc_pages)(struct fscache_retrieval *op,
struct list_head *pages,
@@ -457,7 +455,7 @@ performed on the denizens of the cache. These are held in a structure of type:
of pages instead of one page. Any pages on which a read operation is
started must be added to the page cache for the specified mapping and also
to the LRU. Such pages must also be removed from the pages list and
- *nr_pages decremented per page.
+ ``*nr_pages`` decremented per page.
If there was an error such as -ENOMEM, then that should be returned; else
if one or more pages couldn't be read or allocated, then -ENOBUFS should
@@ -466,7 +464,7 @@ performed on the denizens of the cache. These are held in a structure of type:
returned.
- (*) Request page be allocated in the cache [mandatory]:
+ * Request page be allocated in the cache [mandatory]::
int (*allocate_page)(struct fscache_retrieval *op,
struct page *page,
@@ -482,7 +480,7 @@ performed on the denizens of the cache. These are held in a structure of type:
allocated, then the netfs page should be marked and 0 returned.
- (*) Request pages be allocated in the cache [mandatory]:
+ * Request pages be allocated in the cache [mandatory]::
int (*allocate_pages)(struct fscache_retrieval *op,
struct list_head *pages,
@@ -493,7 +491,7 @@ performed on the denizens of the cache. These are held in a structure of type:
nr_pages should be treated as for the read_or_alloc_pages() method.
- (*) Request page be written to cache [mandatory]:
+ * Request page be written to cache [mandatory]::
int (*write_page)(struct fscache_storage *op,
struct page *page);
@@ -514,7 +512,7 @@ performed on the denizens of the cache. These are held in a structure of type:
appropriately.
- (*) Discard retained per-page metadata [mandatory]:
+ * Discard retained per-page metadata [mandatory]::
void (*uncache_page)(struct fscache_object *object, struct page *page)
@@ -523,13 +521,12 @@ performed on the denizens of the cache. These are held in a structure of type:
maintains for this page.
-==================
-FS-CACHE UTILITIES
+FS-Cache Utilities
==================
FS-Cache provides some utilities that a cache backend may make use of:
- (*) Note occurrence of an I/O error in a cache:
+ * Note occurrence of an I/O error in a cache::
void fscache_io_error(struct fscache_cache *cache)
@@ -541,7 +538,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
This does not actually withdraw the cache. That must be done separately.
- (*) Invoke the retrieval I/O completion function:
+ * Invoke the retrieval I/O completion function::
void fscache_end_io(struct fscache_retrieval *op, struct page *page,
int error);
@@ -550,8 +547,8 @@ FS-Cache provides some utilities that a cache backend may make use of:
error value should be 0 if successful and an error otherwise.
- (*) Record that one or more pages being retrieved or allocated have been dealt
- with:
+ * Record that one or more pages being retrieved or allocated have been dealt
+ with::
void fscache_retrieval_complete(struct fscache_retrieval *op,
int n_pages);
@@ -562,7 +559,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
completed.
- (*) Record operation completion:
+ * Record operation completion::
void fscache_op_complete(struct fscache_operation *op);
@@ -571,7 +568,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
one or more pending operations to start running.
- (*) Set highest store limit:
+ * Set highest store limit::
void fscache_set_store_limit(struct fscache_object *object,
loff_t i_size);
@@ -581,7 +578,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
rejected by fscache_read_alloc_page() and co with -ENOBUFS.
- (*) Mark pages as being cached:
+ * Mark pages as being cached::
void fscache_mark_pages_cached(struct fscache_retrieval *op,
struct pagevec *pagevec);
@@ -590,7 +587,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
the netfs must call fscache_uncache_page() to unmark the pages.
- (*) Perform coherency check on an object:
+ * Perform coherency check on an object::
enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
const void *data,
@@ -603,29 +600,26 @@ FS-Cache provides some utilities that a cache backend may make use of:
One of three values will be returned:
- (*) FSCACHE_CHECKAUX_OKAY
-
+ FSCACHE_CHECKAUX_OKAY
The coherency data indicates the object is valid as is.
- (*) FSCACHE_CHECKAUX_NEEDS_UPDATE
-
+ FSCACHE_CHECKAUX_NEEDS_UPDATE
The coherency data needs updating, but otherwise the object is
valid.
- (*) FSCACHE_CHECKAUX_OBSOLETE
-
+ FSCACHE_CHECKAUX_OBSOLETE
The coherency data indicates that the object is obsolete and should
be discarded.
- (*) Initialise a freshly allocated object:
+ * Initialise a freshly allocated object::
void fscache_object_init(struct fscache_object *object);
This initialises all the fields in an object representation.
- (*) Indicate the destruction of an object:
+ * Indicate the destruction of an object::
void fscache_object_destroyed(struct fscache_cache *cache);
@@ -635,7 +629,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
all the objects.
- (*) Indicate negative lookup on an object:
+ * Indicate negative lookup on an object::
void fscache_object_lookup_negative(struct fscache_object *object);
@@ -650,7 +644,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
significant - all subsequent calls are ignored.
- (*) Indicate an object has been obtained:
+ * Indicate an object has been obtained::
void fscache_obtained_object(struct fscache_object *object);
@@ -667,7 +661,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
(2) that writes may now proceed against this object.
- (*) Indicate that object lookup failed:
+ * Indicate that object lookup failed::
void fscache_object_lookup_error(struct fscache_object *object);
@@ -676,7 +670,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
as possible.
- (*) Indicate that a stale object was found and discarded:
+ * Indicate that a stale object was found and discarded::
void fscache_object_retrying_stale(struct fscache_object *object);
@@ -685,7 +679,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
discarded from the cache and the lookup will be performed again.
- (*) Indicate that the caching backend killed an object:
+ * Indicate that the caching backend killed an object::
void fscache_object_mark_killed(struct fscache_object *object,
enum fscache_why_object_killed why);
@@ -693,13 +687,20 @@ FS-Cache provides some utilities that a cache backend may make use of:
This is called to indicate that the cache backend preemptively killed an
object. The why parameter should be set to indicate the reason:
- FSCACHE_OBJECT_IS_STALE - the object was stale and needs discarding.
- FSCACHE_OBJECT_NO_SPACE - there was insufficient cache space
- FSCACHE_OBJECT_WAS_RETIRED - the object was retired when relinquished.
- FSCACHE_OBJECT_WAS_CULLED - the object was culled to make space.
+ FSCACHE_OBJECT_IS_STALE
+ - the object was stale and needs discarding.
+
+ FSCACHE_OBJECT_NO_SPACE
+ - there was insufficient cache space
+
+ FSCACHE_OBJECT_WAS_RETIRED
+ - the object was retired when relinquished.
+
+ FSCACHE_OBJECT_WAS_CULLED
+ - the object was culled to make space.
- (*) Get and release references on a retrieval record:
+ * Get and release references on a retrieval record::
void fscache_get_retrieval(struct fscache_retrieval *op);
void fscache_put_retrieval(struct fscache_retrieval *op);
@@ -708,7 +709,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
asynchronous data retrieval and block allocation.
- (*) Enqueue a retrieval record for processing.
+ * Enqueue a retrieval record for processing::
void fscache_enqueue_retrieval(struct fscache_retrieval *op);
@@ -718,7 +719,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
within the callback function.
- (*) List of object state names:
+ * List of object state names::
const char *fscache_object_states[];
diff --git a/Documentation/filesystems/caching/cachefiles.txt b/Documentation/filesystems/caching/cachefiles.rst
index 28aefcbb1442..65d3db476765 100644
--- a/Documentation/filesystems/caching/cachefiles.txt
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -1,8 +1,10 @@
- ===============================================
- CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
- ===============================================
+.. SPDX-License-Identifier: GPL-2.0
-Contents:
+===============================================
+CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
+===============================================
+
+.. Contents:
(*) Overview.
@@ -27,8 +29,8 @@ Contents:
(*) Debugging.
-========
-OVERVIEW
+
+Overview
========
CacheFiles is a caching backend that's meant to use as a cache a directory on
@@ -58,8 +60,8 @@ spare space and automatically contract when the set of data requires more
space.
-============
-REQUIREMENTS
+
+Requirements
============
The use of CacheFiles and its daemon requires the following features to be
@@ -79,84 +81,70 @@ It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.
-=============
-CONFIGURATION
+Configuration
=============
The cache is configured by a script in /etc/cachefilesd.conf. These commands
set up cache ready for use. The following script commands are available:
- (*) brun <N>%
- (*) bcull <N>%
- (*) bstop <N>%
- (*) frun <N>%
- (*) fcull <N>%
- (*) fstop <N>%
-
+ brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>%
Configure the culling limits. Optional. See the section on culling
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
The commands beginning with a 'b' are file space (block) limits, those
beginning with an 'f' are file count limits.
- (*) dir <path>
-
+ dir <path>
Specify the directory containing the root of the cache. Mandatory.
- (*) tag <name>
-
+ tag <name>
Specify a tag to FS-Cache to use in distinguishing multiple caches.
Optional. The default is "CacheFiles".
- (*) debug <mask>
-
+ debug <mask>
Specify a numeric bitmask to control debugging in the kernel module.
Optional. The default is zero (all off). The following values can be
OR'd into the mask to collect various information:
+ == =================================================
1 Turn on trace of function entry (_enter() macros)
2 Turn on trace of function exit (_leave() macros)
4 Turn on trace of internal debug points (_debug())
+ == =================================================
- This mask can also be set through sysfs, eg:
+ This mask can also be set through sysfs, eg::
echo 5 >/sys/modules/cachefiles/parameters/debug
-==================
-STARTING THE CACHE
+Starting the Cache
==================
The cache is started by running the daemon. The daemon opens the cache device,
configures the cache and tells it to begin caching. At that point the cache
binds to fscache and the cache becomes live.
-The daemon is run as follows:
+The daemon is run as follows::
/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
The flags are:
- (*) -d
-
+ ``-d``
Increase the debugging level. This can be specified multiple times and
is cumulative with itself.
- (*) -s
-
+ ``-s``
Send messages to stderr instead of syslog.
- (*) -n
-
+ ``-n``
Don't daemonise and go into background.
- (*) -f <configfile>
-
+ ``-f <configfile>``
Use an alternative configuration file rather than the default one.
-===============
-THINGS TO AVOID
+Things to Avoid
===============
Do not mount other things within the cache as this will cause problems. The
@@ -179,8 +167,7 @@ Do not chmod files in the cache. The module creates things with minimal
permissions to prevent random users being able to access them directly.
-=============
-CACHE CULLING
+Cache Culling
=============
The cache may need culling occasionally to make space. This involves
@@ -192,27 +179,21 @@ Cache culling is done on the basis of the percentage of blocks and the
percentage of files available in the underlying filesystem. There are six
"limits":
- (*) brun
- (*) frun
-
+ brun, frun
If the amount of free space and the number of available files in the cache
rises above both these limits, then culling is turned off.
- (*) bcull
- (*) fcull
-
+ bcull, fcull
If the amount of available space or the number of available files in the
cache falls below either of these limits, then culling is started.
- (*) bstop
- (*) fstop
-
+ bstop, fstop
If the amount of available space or the number of available files in the
cache falls below either of these limits, then no further allocation of
disk space or files is permitted until culling has raised things above
these limits again.
-These must be configured thusly:
+These must be configured thusly::
0 <= bstop < bcull < brun < 100
0 <= fstop < fcull < frun < 100
@@ -226,16 +207,14 @@ started as soon as space is made in the table. Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.
-===============
-CACHE STRUCTURE
+Cache Structure
===============
The CacheFiles module will create two directories in the directory it was
given:
- (*) cache/
-
- (*) graveyard/
+ * cache/
+ * graveyard/
The active cache objects all reside in the first directory. The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
@@ -261,10 +240,10 @@ If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended. Into
this directory, if possible, will be placed the representations of the child
-objects:
+objects::
- INDEX INDEX INDEX DATA FILES
- ========= ========== ================================= ================
+ /INDEX /INDEX /INDEX /DATA FILES
+ /=========/==========/=================================/================
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
@@ -275,7 +254,7 @@ If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will be the objects
inside the last directory. The names of the intermediate directories will have
-'+' prepended:
+'+' prepended::
J1223/@23/+xy...z/+kl...m/Epqr
@@ -288,11 +267,13 @@ To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable. The two versions of
object filenames indicate the encoding:
+ =============== =============== ===============
OBJECT TYPE PRINTABLE ENCODED
=============== =============== ===============
Index "I..." "J..."
Data "D..." "E..."
Special "S..." "T..."
+ =============== =============== ===============
Intermediate directories are always "@" or "+" as appropriate.
@@ -307,8 +288,7 @@ Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).
-==========================
-SECURITY MODEL AND SELINUX
+Security Model and SELinux
==========================
CacheFiles is implemented to deal properly with the LSM security features of
@@ -331,26 +311,26 @@ When the CacheFiles module is asked to bind to its cache, it:
(1) Finds the security label attached to the root cache directory and uses
that as the security label with which it will create files. By default,
- this is:
+ this is::
cachefiles_var_t
(2) Finds the security label of the process which issued the bind request
- (presumed to be the cachefilesd daemon), which by default will be:
+ (presumed to be the cachefilesd daemon), which by default will be::
cachefilesd_t
and asks LSM to supply a security ID as which it should act given the
- daemon's label. By default, this will be:
+ daemon's label. By default, this will be::
cachefiles_kernel_t
SELinux transitions the daemon's security ID to the module's security ID
- based on a rule of this form in the policy.
+ based on a rule of this form in the policy::
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
- For instance:
+ For instance::
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
@@ -370,7 +350,7 @@ There are policy source files available in:
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
-and later versions. In that tarball, see the files:
+and later versions. In that tarball, see the files::
cachefilesd.te
cachefilesd.fc
@@ -379,7 +359,7 @@ and later versions. In that tarball, see the files:
They are built and installed directly by the RPM.
If a non-RPM based system is being used, then copy the above files to their own
-directory and run:
+directory and run::
make -f /usr/share/selinux/devel/Makefile
semodule -i cachefilesd.pp
@@ -394,7 +374,7 @@ an auxiliary policy must be installed to label the alternate location of the
cache.
For instructions on how to add an auxiliary policy to enable the cache to be
-located elsewhere when SELinux is in enforcing mode, please see:
+located elsewhere when SELinux is in enforcing mode, please see::
/usr/share/doc/cachefilesd-*/move-cache.txt
@@ -402,8 +382,7 @@ When the cachefilesd rpm is installed; alternatively, the document can be found
in the sources.
-==================
-A NOTE ON SECURITY
+A Note on Security
==================
CacheFiles makes use of the split security in the task_struct. It allocates
@@ -445,17 +424,18 @@ for CacheFiles to run in a context of a specific security label, or to create
files and directories with another security label.
-=======================
-STATISTICAL INFORMATION
+Statistical Information
=======================
-If FS-Cache is compiled with the following option enabled:
+If FS-Cache is compiled with the following option enabled::
CONFIG_CACHEFILES_HISTOGRAM=y
then it will gather certain statistics and display them through a proc file.
- (*) /proc/fs/cachefiles/histogram
+ /proc/fs/cachefiles/histogram
+
+ ::
cat /proc/fs/cachefiles/histogram
JIFS SECS LOOKUPS MKDIRS CREATES
@@ -465,36 +445,39 @@ then it will gather certain statistics and display them through a proc file.
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
columns are as follows:
+ ======= =======================================================
COLUMN TIME MEASUREMENT
======= =======================================================
LOOKUPS Length of time to perform a lookup on the backing fs
MKDIRS Length of time to perform a mkdir on the backing fs
CREATES Length of time to perform a create on the backing fs
+ ======= =======================================================
Each row shows the number of events that took a particular range of times.
Each step is 1 jiffy in size. The JIFS column indicates the particular
jiffy range covered, and the SECS field the equivalent number of seconds.
-=========
-DEBUGGING
+Debugging
=========
If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
-debugging enabled by adjusting the value in:
+debugging enabled by adjusting the value in::
/sys/module/cachefiles/parameters/debug
This is a bitmask of debugging streams to enable:
+ ======= ======= =============================== =======================
BIT VALUE STREAM POINT
======= ======= =============================== =======================
0 1 General Function entry trace
1 2 Function exit trace
2 4 General
+ ======= ======= =============================== =======================
The appropriate set of values should be OR'd together and the result written to
-the control file. For example:
+the control file. For example::
echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
diff --git a/Documentation/filesystems/caching/fscache.rst b/Documentation/filesystems/caching/fscache.rst
new file mode 100644
index 000000000000..70de86922b6a
--- /dev/null
+++ b/Documentation/filesystems/caching/fscache.rst
@@ -0,0 +1,565 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+General Filesystem Caching
+==========================
+
+Overview
+========
+
+This facility is a general purpose cache for network filesystems, though it
+could be used for caching other things such as ISO9660 filesystems too.
+
+FS-Cache mediates between cache backends (such as CacheFS) and network
+filesystems::
+
+ +---------+
+ | | +--------------+
+ | NFS |--+ | |
+ | | | +-->| CacheFS |
+ +---------+ | +----------+ | | /dev/hda5 |
+ | | | | +--------------+
+ +---------+ +-->| | |
+ | | | |--+
+ | AFS |----->| FS-Cache |
+ | | | |--+
+ +---------+ +-->| | |
+ | | | | +--------------+
+ +---------+ | +----------+ | | |
+ | | | +-->| CacheFiles |
+ | ISOFS |--+ | /var/cache |
+ | | +--------------+
+ +---------+
+
+Or to look at it another way, FS-Cache is a module that provides a caching
+facility to a network filesystem such that the cache is transparent to the
+user::
+
+ +---------+
+ | |
+ | Server |
+ | |
+ +---------+
+ | NETWORK
+ ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ |
+ | +----------+
+ V | |
+ +---------+ | |
+ | | | |
+ | NFS |----->| FS-Cache |
+ | | | |--+
+ +---------+ | | | +--------------+ +--------------+
+ | | | | | | | |
+ V +----------+ +-->| CacheFiles |-->| Ext3 |
+ +---------+ | /var/cache | | /dev/sda6 |
+ | | +--------------+ +--------------+
+ | VFS | ^ ^
+ | | | |
+ +---------+ +--------------+ |
+ | KERNEL SPACE | |
+ ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
+ | USER SPACE | |
+ V | |
+ +---------+ +--------------+
+ | | | |
+ | Process | | cachefilesd |
+ | | | |
+ +---------+ +--------------+
+
+
+FS-Cache does not follow the idea of completely loading every netfs file
+opened in its entirety into a cache before permitting it to be accessed and
+then serving the pages out of that cache rather than the netfs inode because:
+
+ (1) It must be practical to operate without a cache.
+
+ (2) The size of any accessible file must not be limited to the size of the
+ cache.
+
+ (3) The combined size of all opened files (this includes mapped libraries)
+ must not be limited to the size of the cache.
+
+ (4) The user should not be forced to download an entire file just to do a
+ one-off access of a small portion of it (such as might be done with the
+ "file" program).
+
+It instead serves the cache out in PAGE_SIZE chunks as and when requested by
+the netfs('s) using it.
+
+
+FS-Cache provides the following facilities:
+
+ (1) More than one cache can be used at once. Caches can be selected
+ explicitly by use of tags.
+
+ (2) Caches can be added / removed at any time.
+
+ (3) The netfs is provided with an interface that allows either party to
+ withdraw caching facilities from a file (required for (2)).
+
+ (4) The interface to the netfs returns as few errors as possible, preferring
+ rather to let the netfs remain oblivious.
+
+ (5) Cookies are used to represent indices, files and other objects to the
+ netfs. The simplest cookie is just a NULL pointer - indicating nothing
+ cached there.
+
+ (6) The netfs is allowed to propose - dynamically - any index hierarchy it
+ desires, though it must be aware that the index search function is
+ recursive, stack space is limited, and indices can only be children of
+ indices.
+
+ (7) Data I/O is done direct to and from the netfs's pages. The netfs
+ indicates that page A is at index B of the data-file represented by cookie
+ C, and that it should be read or written. The cache backend may or may
+ not start I/O on that page, but if it does, a netfs callback will be
+ invoked to indicate completion. The I/O may be either synchronous or
+ asynchronous.
+
+ (8) Cookies can be "retired" upon release. At this point FS-Cache will mark
+ them as obsolete and the index hierarchy rooted at that point will get
+ recycled.
+
+ (9) The netfs provides a "match" function for index searches. In addition to
+ saying whether a match was made or not, this can also specify that an
+ entry should be updated or deleted.
+
+(10) As much as possible is done asynchronously.
+
+
+FS-Cache maintains a virtual indexing tree in which all indices, files, objects
+and pages are kept. Bits of this tree may actually reside in one or more
+caches::
+
+ FSDEF
+ |
+ +------------------------------------+
+ | |
+ NFS AFS
+ | |
+ +--------------------------+ +-----------+
+ | | | |
+ homedir mirror afs.org redhat.com
+ | | |
+ +------------+ +---------------+ +----------+
+ | | | | | |
+ 00001 00002 00007 00125 vol00001 vol00002
+ | | | | |
+ +---+---+ +-----+ +---+ +------+------+ +-----+----+
+ | | | | | | | | | | | | |
+ PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
+ | |
+ PG0 +-------+
+ | |
+ 00001 00003
+ |
+ +---+---+
+ | | |
+ PG0 PG1 PG2
+
+In the example above, you can see two netfs's being backed: NFS and AFS. These
+have different index hierarchies:
+
+ * The NFS primary index contains per-server indices. Each server index is
+ indexed by NFS file handles to get data file objects. Each data file
+ objects can have an array of pages, but may also have further child
+ objects, such as extended attributes and directory entries. Extended
+ attribute objects themselves have page-array contents.
+
+ * The AFS primary index contains per-cell indices. Each cell index contains
+ per-logical-volume indices. Each of volume index contains up to three
+ indices for the read-write, read-only and backup mirrors of those volumes.
+ Each of these contains vnode data file objects, each of which contains an
+ array of pages.
+
+The very top index is the FS-Cache master index in which individual netfs's
+have entries.
+
+Any index object may reside in more than one cache, provided it only has index
+children. Any index with non-index object children will be assumed to only
+reside in one cache.
+
+
+The netfs API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/netfs-api.rst
+
+The cache backend API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/backend-api.rst
+
+A description of the internal representations and object state machine can be
+found in:
+
+ Documentation/filesystems/caching/object.rst
+
+
+Statistical Information
+=======================
+
+If FS-Cache is compiled with the following options enabled::
+
+ CONFIG_FSCACHE_STATS=y
+ CONFIG_FSCACHE_HISTOGRAM=y
+
+then it will gather certain statistics and display them through a number of
+proc files.
+
+/proc/fs/fscache/stats
+----------------------
+
+ This shows counts of a number of events that can happen in FS-Cache:
+
++--------------+-------+-------------------------------------------------------+
+|CLASS |EVENT |MEANING |
++==============+=======+=======================================================+
+|Cookies |idx=N |Number of index cookies allocated |
++ +-------+-------------------------------------------------------+
+| |dat=N |Number of data storage cookies allocated |
++ +-------+-------------------------------------------------------+
+| |spc=N |Number of special cookies allocated |
++--------------+-------+-------------------------------------------------------+
+|Objects |alc=N |Number of objects allocated |
++ +-------+-------------------------------------------------------+
+| |nal=N |Number of object allocation failures |
++ +-------+-------------------------------------------------------+
+| |avl=N |Number of objects that reached the available state |
++ +-------+-------------------------------------------------------+
+| |ded=N |Number of objects that reached the dead state |
++--------------+-------+-------------------------------------------------------+
+|ChkAux |non=N |Number of objects that didn't have a coherency check |
++ +-------+-------------------------------------------------------+
+| |ok=N |Number of objects that passed a coherency check |
++ +-------+-------------------------------------------------------+
+| |upd=N |Number of objects that needed a coherency data update |
++ +-------+-------------------------------------------------------+
+| |obs=N |Number of objects that were declared obsolete |
++--------------+-------+-------------------------------------------------------+
+|Pages |mrk=N |Number of pages marked as being cached |
+| |unc=N |Number of uncache page requests seen |
++--------------+-------+-------------------------------------------------------+
+|Acquire |n=N |Number of acquire cookie requests seen |
++ +-------+-------------------------------------------------------+
+| |nul=N |Number of acq reqs given a NULL parent |
++ +-------+-------------------------------------------------------+
+| |noc=N |Number of acq reqs rejected due to no cache available |
++ +-------+-------------------------------------------------------+
+| |ok=N |Number of acq reqs succeeded |
++ +-------+-------------------------------------------------------+
+| |nbf=N |Number of acq reqs rejected due to error |
++ +-------+-------------------------------------------------------+
+| |oom=N |Number of acq reqs failed on ENOMEM |
++--------------+-------+-------------------------------------------------------+
+|Lookups |n=N |Number of lookup calls made on cache backends |
++ +-------+-------------------------------------------------------+
+| |neg=N |Number of negative lookups made |
++ +-------+-------------------------------------------------------+
+| |pos=N |Number of positive lookups made |
++ +-------+-------------------------------------------------------+
+| |crt=N |Number of objects created by lookup |
++ +-------+-------------------------------------------------------+
+| |tmo=N |Number of lookups timed out and requeued |
++--------------+-------+-------------------------------------------------------+
+|Updates |n=N |Number of update cookie requests seen |
++ +-------+-------------------------------------------------------+
+| |nul=N |Number of upd reqs given a NULL parent |
++ +-------+-------------------------------------------------------+
+| |run=N |Number of upd reqs granted CPU time |
++--------------+-------+-------------------------------------------------------+
+|Relinqs |n=N |Number of relinquish cookie requests seen |
++ +-------+-------------------------------------------------------+
+| |nul=N |Number of rlq reqs given a NULL parent |
++ +-------+-------------------------------------------------------+
+| |wcr=N |Number of rlq reqs waited on completion of creation |
++--------------+-------+-------------------------------------------------------+
+|AttrChg |n=N |Number of attribute changed requests seen |
++ +-------+-------------------------------------------------------+
+| |ok=N |Number of attr changed requests queued |
++ +-------+-------------------------------------------------------+
+| |nbf=N |Number of attr changed rejected -ENOBUFS |
++ +-------+-------------------------------------------------------+
+| |oom=N |Number of attr changed failed -ENOMEM |
++ +-------+-------------------------------------------------------+
+| |run=N |Number of attr changed ops given CPU time |
++--------------+-------+-------------------------------------------------------+
+|Allocs |n=N |Number of allocation requests seen |
++ +-------+-------------------------------------------------------+
+| |ok=N |Number of successful alloc reqs |
++ +-------+-------------------------------------------------------+
+| |wt=N |Number of alloc reqs that waited on lookup completion |
++ +-------+-------------------------------------------------------+
+| |nbf=N |Number of alloc reqs rejected -ENOBUFS |
++ +-------+-------------------------------------------------------+
+| |int=N |Number of alloc reqs aborted -ERESTARTSYS |
++ +-------+-------------------------------------------------------+
+| |ops=N |Number of alloc reqs submitted |
++ +-------+-------------------------------------------------------+
+| |owt=N |Number of alloc reqs waited for CPU time |
++ +-------+-------------------------------------------------------+
+| |abt=N |Number of alloc reqs aborted due to object death |
++--------------+-------+-------------------------------------------------------+
+|Retrvls |n=N |Number of retrieval (read) requests seen |
++ +-------+-------------------------------------------------------+
+| |ok=N |Number of successful retr reqs |
++ +-------+-------------------------------------------------------+
+| |wt=N |Number of retr reqs that waited on lookup completion |
++ +-------+-------------------------------------------------------+
+| |nod=N |Number of retr reqs returned -ENODATA |
++ +-------+-------------------------------------------------------+
+| |nbf=N |Number of retr reqs rejected -ENOBUFS |
++ +-------+-------------------------------------------------------+
+| |int=N |Number of retr reqs aborted -ERESTARTSYS |
++ +-------+-------------------------------------------------------+
+| |oom=N |Number of retr reqs failed -ENOMEM |
++ +-------+-------------------------------------------------------+
+| |ops=N |Number of retr reqs submitted |
++ +-------+-------------------------------------------------------+
+| |owt=N |Number of retr reqs waited for CPU time |
++ +-------+-------------------------------------------------------+
+| |abt=N |Number of retr reqs aborted due to object death |
++--------------+-------+-------------------------------------------------------+
+|Stores |n=N |Number of storage (write) requests seen |
++ +-------+-------------------------------------------------------+
+| |ok=N |Number of successful store reqs |
++ +-------+-------------------------------------------------------+
+| |agn=N |Number of store reqs on a page already pending storage |
++ +-------+-------------------------------------------------------+
+| |nbf=N |Number of store reqs rejected -ENOBUFS |
++ +-------+-------------------------------------------------------+
+| |oom=N |Number of store reqs failed -ENOMEM |
++ +-------+-------------------------------------------------------+
+| |ops=N |Number of store reqs submitted |
++ +-------+-------------------------------------------------------+
+| |run=N |Number of store reqs granted CPU time |
++ +-------+-------------------------------------------------------+
+| |pgs=N |Number of pages given store req processing time |
++ +-------+-------------------------------------------------------+
+| |rxd=N |Number of store reqs deleted from tracking tree |
++ +-------+-------------------------------------------------------+
+| |olm=N |Number of store reqs over store limit |
++--------------+-------+-------------------------------------------------------+
+|VmScan |nos=N |Number of release reqs against pages with no |
+| | |pending store |
++ +-------+-------------------------------------------------------+
+| |gon=N |Number of release reqs against pages stored by |
+| | |time lock granted |
++ +-------+-------------------------------------------------------+
+| |bsy=N |Number of release reqs ignored due to in-progress store|
++ +-------+-------------------------------------------------------+
+| |can=N |Number of page stores cancelled due to release req |
++--------------+-------+-------------------------------------------------------+
+|Ops |pend=N |Number of times async ops added to pending queues |
++ +-------+-------------------------------------------------------+
+| |run=N |Number of times async ops given CPU time |
++ +-------+-------------------------------------------------------+
+| |enq=N |Number of times async ops queued for processing |
++ +-------+-------------------------------------------------------+
+| |can=N |Number of async ops cancelled |
++ +-------+-------------------------------------------------------+
+| |rej=N |Number of async ops rejected due to object |
+| | |lookup/create failure |
++ +-------+-------------------------------------------------------+
+| |ini=N |Number of async ops initialised |
++ +-------+-------------------------------------------------------+
+| |dfr=N |Number of async ops queued for deferred release |
++ +-------+-------------------------------------------------------+
+| |rel=N |Number of async ops released |
+| | |(should equal ini=N when idle) |
++ +-------+-------------------------------------------------------+
+| |gc=N |Number of deferred-release async ops garbage collected |
++--------------+-------+-------------------------------------------------------+
+|CacheOp |alo=N |Number of in-progress alloc_object() cache ops |
++ +-------+-------------------------------------------------------+
+| |luo=N |Number of in-progress lookup_object() cache ops |
++ +-------+-------------------------------------------------------+
+| |luc=N |Number of in-progress lookup_complete() cache ops |
++ +-------+-------------------------------------------------------+
+| |gro=N |Number of in-progress grab_object() cache ops |
++ +-------+-------------------------------------------------------+
+| |upo=N |Number of in-progress update_object() cache ops |
++ +-------+-------------------------------------------------------+
+| |dro=N |Number of in-progress drop_object() cache ops |
++ +-------+-------------------------------------------------------+
+| |pto=N |Number of in-progress put_object() cache ops |
++ +-------+-------------------------------------------------------+
+| |syn=N |Number of in-progress sync_cache() cache ops |
++ +-------+-------------------------------------------------------+
+| |atc=N |Number of in-progress attr_changed() cache ops |
++ +-------+-------------------------------------------------------+
+| |rap=N |Number of in-progress read_or_alloc_page() cache ops |
++ +-------+-------------------------------------------------------+
+| |ras=N |Number of in-progress read_or_alloc_pages() cache ops |
++ +-------+-------------------------------------------------------+
+| |alp=N |Number of in-progress allocate_page() cache ops |
++ +-------+-------------------------------------------------------+
+| |als=N |Number of in-progress allocate_pages() cache ops |
++ +-------+-------------------------------------------------------+
+| |wrp=N |Number of in-progress write_page() cache ops |
++ +-------+-------------------------------------------------------+
+| |ucp=N |Number of in-progress uncache_page() cache ops |
++ +-------+-------------------------------------------------------+
+| |dsp=N |Number of in-progress dissociate_pages() cache ops |
++--------------+-------+-------------------------------------------------------+
+|CacheEv |nsp=N |Number of object lookups/creations rejected due to |
+| | |lack of space |
++ +-------+-------------------------------------------------------+
+| |stl=N |Number of stale objects deleted |
++ +-------+-------------------------------------------------------+
+| |rtr=N |Number of objects retired when relinquished |
++ +-------+-------------------------------------------------------+
+| |cul=N |Number of objects culled |
++--------------+-------+-------------------------------------------------------+
+
+
+
+/proc/fs/fscache/histogram
+--------------------------
+
+ ::
+
+ cat /proc/fs/fscache/histogram
+ JIFS SECS OBJ INST OP RUNS OBJ RUNS RETRV DLY RETRIEVLS
+ ===== ===== ========= ========= ========= ========= =========
+
+ This shows the breakdown of the number of times each amount of time
+ between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
+ columns are as follows:
+
+ ========= =======================================================
+ COLUMN TIME MEASUREMENT
+ ========= =======================================================
+ OBJ INST Length of time to instantiate an object
+ OP RUNS Length of time a call to process an operation took
+ OBJ RUNS Length of time a call to process an object event took
+ RETRV DLY Time between an requesting a read and lookup completing
+ RETRIEVLS Time between beginning and end of a retrieval
+ ========= =======================================================
+
+ Each row shows the number of events that took a particular range of times.
+ Each step is 1 jiffy in size. The JIFS column indicates the particular
+ jiffy range covered, and the SECS field the equivalent number of seconds.
+
+
+
+Object List
+===========
+
+If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
+list of all the objects currently allocated and allow them to be viewed
+through::
+
+ /proc/fs/fscache/objects
+
+This will look something like::
+
+ [root@andromeda ~]# head /proc/fs/fscache/objects
+ OBJECT PARENT STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA OBJECT_KEY, AUX_DATA
+ ======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
+ 17e4b 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
+ 1693a 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a
+
+where the first set of columns before the '|' describe the object:
+
+ ======= ===============================================================
+ COLUMN DESCRIPTION
+ ======= ===============================================================
+ OBJECT Object debugging ID (appears as OBJ%x in some debug messages)
+ PARENT Debugging ID of parent object
+ STAT Object state
+ CHLDN Number of child objects of this object
+ OPS Number of outstanding operations on this object
+ OOP Number of outstanding child object management operations
+ IPR
+ EX Number of outstanding exclusive operations
+ READS Number of outstanding read operations
+ EM Object's event mask
+ EV Events raised on this object
+ F Object flags
+ S Object work item busy state mask (1:pending 2:running)
+ ======= ===============================================================
+
+and the second set of columns describe the object's cookie, if present:
+
+ ================ ======================================================
+ COLUMN DESCRIPTION
+ ================ ======================================================
+ NETFS_COOKIE_DEF Name of netfs cookie definition
+ TY Cookie type (IX - index, DT - data, hex - special)
+ FL Cookie flags
+ NETFS_DATA Netfs private data stored in the cookie
+ OBJECT_KEY Object key } 1 column, with separating comma
+ AUX_DATA Object aux data } presence may be configured
+ ================ ======================================================
+
+The data shown may be filtered by attaching the a key to an appropriate keyring
+before viewing the file. Something like::
+
+ keyctl add user fscache:objlist <restrictions> @s
+
+where <restrictions> are a selection of the following letters:
+
+ == =========================================================
+ K Show hexdump of object key (don't show if not given)
+ A Show hexdump of object aux data (don't show if not given)
+ == =========================================================
+
+and the following paired letters:
+
+ == =========================================================
+ C Show objects that have a cookie
+ c Show objects that don't have a cookie
+ B Show objects that are busy
+ b Show objects that aren't busy
+ W Show objects that have pending writes
+ w Show objects that don't have pending writes
+ R Show objects that have outstanding reads
+ r Show objects that don't have outstanding reads
+ S Show objects that have work queued
+ s Show objects that don't have work queued
+ == =========================================================
+
+If neither side of a letter pair is given, then both are implied. For example:
+
+ keyctl add user fscache:objlist KB @s
+
+shows objects that are busy, and lists their object keys, but does not dump
+their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is
+not implied.
+
+By default all objects and all fields will be shown.
+
+
+Debugging
+=========
+
+If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
+debugging enabled by adjusting the value in::
+
+ /sys/module/fscache/parameters/debug
+
+This is a bitmask of debugging streams to enable:
+
+ ======= ======= =============================== =======================
+ BIT VALUE STREAM POINT
+ ======= ======= =============================== =======================
+ 0 1 Cache management Function entry trace
+ 1 2 Function exit trace
+ 2 4 General
+ 3 8 Cookie management Function entry trace
+ 4 16 Function exit trace
+ 5 32 General
+ 6 64 Page handling Function entry trace
+ 7 128 Function exit trace
+ 8 256 General
+ 9 512 Operation management Function entry trace
+ 10 1024 Function exit trace
+ 11 2048 General
+ ======= ======= =============================== =======================
+
+The appropriate set of values should be OR'd together and the result written to
+the control file. For example::
+
+ echo $((1|8|64)) >/sys/module/fscache/parameters/debug
+
+will turn on all function entry debugging.
diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
deleted file mode 100644
index 50f0a5757f48..000000000000
--- a/Documentation/filesystems/caching/fscache.txt
+++ /dev/null
@@ -1,448 +0,0 @@
- ==========================
- General Filesystem Caching
- ==========================
-
-========
-OVERVIEW
-========
-
-This facility is a general purpose cache for network filesystems, though it
-could be used for caching other things such as ISO9660 filesystems too.
-
-FS-Cache mediates between cache backends (such as CacheFS) and network
-filesystems:
-
- +---------+
- | | +--------------+
- | NFS |--+ | |
- | | | +-->| CacheFS |
- +---------+ | +----------+ | | /dev/hda5 |
- | | | | +--------------+
- +---------+ +-->| | |
- | | | |--+
- | AFS |----->| FS-Cache |
- | | | |--+
- +---------+ +-->| | |
- | | | | +--------------+
- +---------+ | +----------+ | | |
- | | | +-->| CacheFiles |
- | ISOFS |--+ | /var/cache |
- | | +--------------+
- +---------+
-
-Or to look at it another way, FS-Cache is a module that provides a caching
-facility to a network filesystem such that the cache is transparent to the
-user:
-
- +---------+
- | |
- | Server |
- | |
- +---------+
- | NETWORK
- ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- |
- | +----------+
- V | |
- +---------+ | |
- | | | |
- | NFS |----->| FS-Cache |
- | | | |--+
- +---------+ | | | +--------------+ +--------------+
- | | | | | | | |
- V +----------+ +-->| CacheFiles |-->| Ext3 |
- +---------+ | /var/cache | | /dev/sda6 |
- | | +--------------+ +--------------+
- | VFS | ^ ^
- | | | |
- +---------+ +--------------+ |
- | KERNEL SPACE | |
- ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
- | USER SPACE | |
- V | |
- +---------+ +--------------+
- | | | |
- | Process | | cachefilesd |
- | | | |
- +---------+ +--------------+
-
-
-FS-Cache does not follow the idea of completely loading every netfs file
-opened in its entirety into a cache before permitting it to be accessed and
-then serving the pages out of that cache rather than the netfs inode because:
-
- (1) It must be practical to operate without a cache.
-
- (2) The size of any accessible file must not be limited to the size of the
- cache.
-
- (3) The combined size of all opened files (this includes mapped libraries)
- must not be limited to the size of the cache.
-
- (4) The user should not be forced to download an entire file just to do a
- one-off access of a small portion of it (such as might be done with the
- "file" program).
-
-It instead serves the cache out in PAGE_SIZE chunks as and when requested by
-the netfs('s) using it.
-
-
-FS-Cache provides the following facilities:
-
- (1) More than one cache can be used at once. Caches can be selected
- explicitly by use of tags.
-
- (2) Caches can be added / removed at any time.
-
- (3) The netfs is provided with an interface that allows either party to
- withdraw caching facilities from a file (required for (2)).
-
- (4) The interface to the netfs returns as few errors as possible, preferring
- rather to let the netfs remain oblivious.
-
- (5) Cookies are used to represent indices, files and other objects to the
- netfs. The simplest cookie is just a NULL pointer - indicating nothing
- cached there.
-
- (6) The netfs is allowed to propose - dynamically - any index hierarchy it
- desires, though it must be aware that the index search function is
- recursive, stack space is limited, and indices can only be children of
- indices.
-
- (7) Data I/O is done direct to and from the netfs's pages. The netfs
- indicates that page A is at index B of the data-file represented by cookie
- C, and that it should be read or written. The cache backend may or may
- not start I/O on that page, but if it does, a netfs callback will be
- invoked to indicate completion. The I/O may be either synchronous or
- asynchronous.
-
- (8) Cookies can be "retired" upon release. At this point FS-Cache will mark
- them as obsolete and the index hierarchy rooted at that point will get
- recycled.
-
- (9) The netfs provides a "match" function for index searches. In addition to
- saying whether a match was made or not, this can also specify that an
- entry should be updated or deleted.
-
-(10) As much as possible is done asynchronously.
-
-
-FS-Cache maintains a virtual indexing tree in which all indices, files, objects
-and pages are kept. Bits of this tree may actually reside in one or more
-caches.
-
- FSDEF
- |
- +------------------------------------+
- | |
- NFS AFS
- | |
- +--------------------------+ +-----------+
- | | | |
- homedir mirror afs.org redhat.com
- | | |
- +------------+ +---------------+ +----------+
- | | | | | |
- 00001 00002 00007 00125 vol00001 vol00002
- | | | | |
- +---+---+ +-----+ +---+ +------+------+ +-----+----+
- | | | | | | | | | | | | |
-PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
- | |
- PG0 +-------+
- | |
- 00001 00003
- |
- +---+---+
- | | |
- PG0 PG1 PG2
-
-In the example above, you can see two netfs's being backed: NFS and AFS. These
-have different index hierarchies:
-
- (*) The NFS primary index contains per-server indices. Each server index is
- indexed by NFS file handles to get data file objects. Each data file
- objects can have an array of pages, but may also have further child
- objects, such as extended attributes and directory entries. Extended
- attribute objects themselves have page-array contents.
-
- (*) The AFS primary index contains per-cell indices. Each cell index contains
- per-logical-volume indices. Each of volume index contains up to three
- indices for the read-write, read-only and backup mirrors of those volumes.
- Each of these contains vnode data file objects, each of which contains an
- array of pages.
-
-The very top index is the FS-Cache master index in which individual netfs's
-have entries.
-
-Any index object may reside in more than one cache, provided it only has index
-children. Any index with non-index object children will be assumed to only
-reside in one cache.
-
-
-The netfs API to FS-Cache can be found in:
-
- Documentation/filesystems/caching/netfs-api.txt
-
-The cache backend API to FS-Cache can be found in:
-
- Documentation/filesystems/caching/backend-api.txt
-
-A description of the internal representations and object state machine can be
-found in:
-
- Documentation/filesystems/caching/object.txt
-
-
-=======================
-STATISTICAL INFORMATION
-=======================
-
-If FS-Cache is compiled with the following options enabled:
-
- CONFIG_FSCACHE_STATS=y
- CONFIG_FSCACHE_HISTOGRAM=y
-
-then it will gather certain statistics and display them through a number of
-proc files.
-
- (*) /proc/fs/fscache/stats
-
- This shows counts of a number of events that can happen in FS-Cache:
-
- CLASS EVENT MEANING
- ======= ======= =======================================================
- Cookies idx=N Number of index cookies allocated
- dat=N Number of data storage cookies allocated
- spc=N Number of special cookies allocated
- Objects alc=N Number of objects allocated
- nal=N Number of object allocation failures
- avl=N Number of objects that reached the available state
- ded=N Number of objects that reached the dead state
- ChkAux non=N Number of objects that didn't have a coherency check
- ok=N Number of objects that passed a coherency check
- upd=N Number of objects that needed a coherency data update
- obs=N Number of objects that were declared obsolete
- Pages mrk=N Number of pages marked as being cached
- unc=N Number of uncache page requests seen
- Acquire n=N Number of acquire cookie requests seen
- nul=N Number of acq reqs given a NULL parent
- noc=N Number of acq reqs rejected due to no cache available
- ok=N Number of acq reqs succeeded
- nbf=N Number of acq reqs rejected due to error
- oom=N Number of acq reqs failed on ENOMEM
- Lookups n=N Number of lookup calls made on cache backends
- neg=N Number of negative lookups made
- pos=N Number of positive lookups made
- crt=N Number of objects created by lookup
- tmo=N Number of lookups timed out and requeued
- Updates n=N Number of update cookie requests seen
- nul=N Number of upd reqs given a NULL parent
- run=N Number of upd reqs granted CPU time
- Relinqs n=N Number of relinquish cookie requests seen
- nul=N Number of rlq reqs given a NULL parent
- wcr=N Number of rlq reqs waited on completion of creation
- AttrChg n=N Number of attribute changed requests seen
- ok=N Number of attr changed requests queued
- nbf=N Number of attr changed rejected -ENOBUFS
- oom=N Number of attr changed failed -ENOMEM
- run=N Number of attr changed ops given CPU time
- Allocs n=N Number of allocation requests seen
- ok=N Number of successful alloc reqs
- wt=N Number of alloc reqs that waited on lookup completion
- nbf=N Number of alloc reqs rejected -ENOBUFS
- int=N Number of alloc reqs aborted -ERESTARTSYS
- ops=N Number of alloc reqs submitted
- owt=N Number of alloc reqs waited for CPU time
- abt=N Number of alloc reqs aborted due to object death
- Retrvls n=N Number of retrieval (read) requests seen
- ok=N Number of successful retr reqs
- wt=N Number of retr reqs that waited on lookup completion
- nod=N Number of retr reqs returned -ENODATA
- nbf=N Number of retr reqs rejected -ENOBUFS
- int=N Number of retr reqs aborted -ERESTARTSYS
- oom=N Number of retr reqs failed -ENOMEM
- ops=N Number of retr reqs submitted
- owt=N Number of retr reqs waited for CPU time
- abt=N Number of retr reqs aborted due to object death
- Stores n=N Number of storage (write) requests seen
- ok=N Number of successful store reqs
- agn=N Number of store reqs on a page already pending storage
- nbf=N Number of store reqs rejected -ENOBUFS
- oom=N Number of store reqs failed -ENOMEM
- ops=N Number of store reqs submitted
- run=N Number of store reqs granted CPU time
- pgs=N Number of pages given store req processing time
- rxd=N Number of store reqs deleted from tracking tree
- olm=N Number of store reqs over store limit
- VmScan nos=N Number of release reqs against pages with no pending store
- gon=N Number of release reqs against pages stored by time lock granted
- bsy=N Number of release reqs ignored due to in-progress store
- can=N Number of page stores cancelled due to release req
- Ops pend=N Number of times async ops added to pending queues
- run=N Number of times async ops given CPU time
- enq=N Number of times async ops queued for processing
- can=N Number of async ops cancelled
- rej=N Number of async ops rejected due to object lookup/create failure
- ini=N Number of async ops initialised
- dfr=N Number of async ops queued for deferred release
- rel=N Number of async ops released (should equal ini=N when idle)
- gc=N Number of deferred-release async ops garbage collected
- CacheOp alo=N Number of in-progress alloc_object() cache ops
- luo=N Number of in-progress lookup_object() cache ops
- luc=N Number of in-progress lookup_complete() cache ops
- gro=N Number of in-progress grab_object() cache ops
- upo=N Number of in-progress update_object() cache ops
- dro=N Number of in-progress drop_object() cache ops
- pto=N Number of in-progress put_object() cache ops
- syn=N Number of in-progress sync_cache() cache ops
- atc=N Number of in-progress attr_changed() cache ops
- rap=N Number of in-progress read_or_alloc_page() cache ops
- ras=N Number of in-progress read_or_alloc_pages() cache ops
- alp=N Number of in-progress allocate_page() cache ops
- als=N Number of in-progress allocate_pages() cache ops
- wrp=N Number of in-progress write_page() cache ops
- ucp=N Number of in-progress uncache_page() cache ops
- dsp=N Number of in-progress dissociate_pages() cache ops
- CacheEv nsp=N Number of object lookups/creations rejected due to lack of space
- stl=N Number of stale objects deleted
- rtr=N Number of objects retired when relinquished
- cul=N Number of objects culled
-
-
- (*) /proc/fs/fscache/histogram
-
- cat /proc/fs/fscache/histogram
- JIFS SECS OBJ INST OP RUNS OBJ RUNS RETRV DLY RETRIEVLS
- ===== ===== ========= ========= ========= ========= =========
-
- This shows the breakdown of the number of times each amount of time
- between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
- columns are as follows:
-
- COLUMN TIME MEASUREMENT
- ======= =======================================================
- OBJ INST Length of time to instantiate an object
- OP RUNS Length of time a call to process an operation took
- OBJ RUNS Length of time a call to process an object event took
- RETRV DLY Time between an requesting a read and lookup completing
- RETRIEVLS Time between beginning and end of a retrieval
-
- Each row shows the number of events that took a particular range of times.
- Each step is 1 jiffy in size. The JIFS column indicates the particular
- jiffy range covered, and the SECS field the equivalent number of seconds.
-
-
-===========
-OBJECT LIST
-===========
-
-If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
-list of all the objects currently allocated and allow them to be viewed
-through:
-
- /proc/fs/fscache/objects
-
-This will look something like:
-
- [root@andromeda ~]# head /proc/fs/fscache/objects
- OBJECT PARENT STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA OBJECT_KEY, AUX_DATA
- ======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
- 17e4b 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
- 1693a 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a
-
-where the first set of columns before the '|' describe the object:
-
- COLUMN DESCRIPTION
- ======= ===============================================================
- OBJECT Object debugging ID (appears as OBJ%x in some debug messages)
- PARENT Debugging ID of parent object
- STAT Object state
- CHLDN Number of child objects of this object
- OPS Number of outstanding operations on this object
- OOP Number of outstanding child object management operations
- IPR
- EX Number of outstanding exclusive operations
- READS Number of outstanding read operations
- EM Object's event mask
- EV Events raised on this object
- F Object flags
- S Object work item busy state mask (1:pending 2:running)
-
-and the second set of columns describe the object's cookie, if present:
-
- COLUMN DESCRIPTION
- =============== =======================================================
- NETFS_COOKIE_DEF Name of netfs cookie definition
- TY Cookie type (IX - index, DT - data, hex - special)
- FL Cookie flags
- NETFS_DATA Netfs private data stored in the cookie
- OBJECT_KEY Object key } 1 column, with separating comma
- AUX_DATA Object aux data } presence may be configured
-
-The data shown may be filtered by attaching the a key to an appropriate keyring
-before viewing the file. Something like:
-
- keyctl add user fscache:objlist <restrictions> @s
-
-where <restrictions> are a selection of the following letters:
-
- K Show hexdump of object key (don't show if not given)
- A Show hexdump of object aux data (don't show if not given)
-
-and the following paired letters:
-
- C Show objects that have a cookie
- c Show objects that don't have a cookie
- B Show objects that are busy
- b Show objects that aren't busy
- W Show objects that have pending writes
- w Show objects that don't have pending writes
- R Show objects that have outstanding reads
- r Show objects that don't have outstanding reads
- S Show objects that have work queued
- s Show objects that don't have work queued
-
-If neither side of a letter pair is given, then both are implied. For example:
-
- keyctl add user fscache:objlist KB @s
-
-shows objects that are busy, and lists their object keys, but does not dump
-their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is
-not implied.
-
-By default all objects and all fields will be shown.
-
-
-=========
-DEBUGGING
-=========
-
-If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
-debugging enabled by adjusting the value in:
-
- /sys/module/fscache/parameters/debug
-
-This is a bitmask of debugging streams to enable:
-
- BIT VALUE STREAM POINT
- ======= ======= =============================== =======================
- 0 1 Cache management Function entry trace
- 1 2 Function exit trace
- 2 4 General
- 3 8 Cookie management Function entry trace
- 4 16 Function exit trace
- 5 32 General
- 6 64 Page handling Function entry trace
- 7 128 Function exit trace
- 8 256 General
- 9 512 Operation management Function entry trace
- 10 1024 Function exit trace
- 11 2048 General
-
-The appropriate set of values should be OR'd together and the result written to
-the control file. For example:
-
- echo $((1|8|64)) >/sys/module/fscache/parameters/debug
-
-will turn on all function entry debugging.
diff --git a/Documentation/filesystems/caching/index.rst b/Documentation/filesystems/caching/index.rst
new file mode 100644
index 000000000000..033da7ac7c6e
--- /dev/null
+++ b/Documentation/filesystems/caching/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Filesystem Caching
+==================
+
+.. toctree::
+ :maxdepth: 2
+
+ fscache
+ object
+ backend-api
+ cachefiles
+ netfs-api
+ operations
diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.rst
index ba968e8f5704..d9f14b8610ba 100644
--- a/Documentation/filesystems/caching/netfs-api.txt
+++ b/Documentation/filesystems/caching/netfs-api.rst
@@ -1,6 +1,8 @@
- ===============================
- FS-CACHE NETWORK FILESYSTEM API
- ===============================
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+FS-Cache Network Filesystem API
+===============================
There's an API by which a network filesystem can make use of the FS-Cache
facilities. This is based around a number of principles:
@@ -19,7 +21,7 @@ facilities. This is based around a number of principles:
This API is declared in <linux/fscache.h>.
-This document contains the following sections:
+.. This document contains the following sections:
(1) Network filesystem definition
(2) Index definition
@@ -41,12 +43,11 @@ This document contains the following sections:
(18) FS-Cache specific page flags.
-=============================
-NETWORK FILESYSTEM DEFINITION
+Network Filesystem Definition
=============================
FS-Cache needs a description of the network filesystem. This is specified
-using a record of the following structure:
+using a record of the following structure::
struct fscache_netfs {
uint32_t version;
@@ -71,7 +72,7 @@ The fields are:
another parameter passed into the registration function.
For example, kAFS (linux/fs/afs/) uses the following definitions to describe
-itself:
+itself::
struct fscache_netfs afs_cache_netfs = {
.version = 0,
@@ -79,8 +80,7 @@ itself:
};
-================
-INDEX DEFINITION
+Index Definition
================
Indices are used for two purposes:
@@ -114,11 +114,10 @@ There are some limits on indices:
function is recursive. Too many layers will run the kernel out of stack.
-=================
-OBJECT DEFINITION
+Object Definition
=================
-To define an object, a structure of the following type should be filled out:
+To define an object, a structure of the following type should be filled out::
struct fscache_cookie_def
{
@@ -149,16 +148,13 @@ This has the following fields:
This is one of the following values:
- (*) FSCACHE_COOKIE_TYPE_INDEX
-
+ FSCACHE_COOKIE_TYPE_INDEX
This defines an index, which is a special FS-Cache type.
- (*) FSCACHE_COOKIE_TYPE_DATAFILE
-
+ FSCACHE_COOKIE_TYPE_DATAFILE
This defines an ordinary data file.
- (*) Any other value between 2 and 255
-
+ Any other value between 2 and 255
This defines an extraordinary object such as an XATTR.
(2) The name of the object type (NUL terminated unless all 16 chars are used)
@@ -192,9 +188,14 @@ This has the following fields:
If present, the function should return one of the following values:
- (*) FSCACHE_CHECKAUX_OKAY - the entry is okay as is
- (*) FSCACHE_CHECKAUX_NEEDS_UPDATE - the entry requires update
- (*) FSCACHE_CHECKAUX_OBSOLETE - the entry should be deleted
+ FSCACHE_CHECKAUX_OKAY
+ - the entry is okay as is
+
+ FSCACHE_CHECKAUX_NEEDS_UPDATE
+ - the entry requires update
+
+ FSCACHE_CHECKAUX_OBSOLETE
+ - the entry should be deleted
This function can also be used to extract data from the auxiliary data in
the cache and copy it into the netfs's structures.
@@ -236,32 +237,30 @@ This has the following fields:
This function is not required for indices as they're not permitted data.
-===================================
-NETWORK FILESYSTEM (UN)REGISTRATION
+Network Filesystem (Un)registration
===================================
The first step is to declare the network filesystem to the cache. This also
involves specifying the layout of the primary index (for AFS, this would be the
"cell" level).
-The registration function is:
+The registration function is::
int fscache_register_netfs(struct fscache_netfs *netfs);
It just takes a pointer to the netfs definition. It returns 0 or an error as
appropriate.
-For kAFS, registration is done as follows:
+For kAFS, registration is done as follows::
ret = fscache_register_netfs(&afs_cache_netfs);
-The last step is, of course, unregistration:
+The last step is, of course, unregistration::
void fscache_unregister_netfs(struct fscache_netfs *netfs);
-================
-CACHE TAG LOOKUP
+Cache Tag Lookup
================
FS-Cache permits the use of more than one cache. To permit particular index
@@ -270,7 +269,7 @@ representation tags. This step is optional; it can be left entirely up to
FS-Cache as to which cache should be used. The problem with doing that is that
FS-Cache will always pick the first cache that was registered.
-To get the representation for a named tag:
+To get the representation for a named tag::
struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
@@ -278,7 +277,7 @@ This takes a text string as the name and returns a representation of a tag. It
will never return an error. It may return a dummy tag, however, if it runs out
of memory; this will inhibit caching with this tag.
-Any representation so obtained must be released by passing it to this function:
+Any representation so obtained must be released by passing it to this function::
void fscache_release_cache_tag(struct fscache_cache_tag *tag);
@@ -286,13 +285,12 @@ The tag will be retrieved by FS-Cache when it calls the object definition
operation select_cache().
-==================
-INDEX REGISTRATION
+Index Registration
==================
The third step is to inform FS-Cache about part of an index hierarchy that can
be used to locate files. This is done by requesting a cookie for each index in
-the path to the file:
+the path to the file::
struct fscache_cookie *
fscache_acquire_cookie(struct fscache_cookie *parent,
@@ -339,7 +337,7 @@ must be enabled to do anything with it. A disabled cookie can be enabled by
calling fscache_enable_cookie() (see below).
For example, with AFS, a cell would be added to the primary index. This index
-entry would have a dependent inode containing volume mappings within this cell:
+entry would have a dependent inode containing volume mappings within this cell::
cell->cache =
fscache_acquire_cookie(afs_cache_netfs.primary_index,
@@ -349,7 +347,7 @@ entry would have a dependent inode containing volume mappings within this cell:
cell, 0, true);
And then a particular volume could be added to that index by ID, creating
-another index for vnodes (AFS inode equivalents):
+another index for vnodes (AFS inode equivalents)::
volume->cache =
fscache_acquire_cookie(volume->cell->cache,
@@ -359,13 +357,12 @@ another index for vnodes (AFS inode equivalents):
volume, 0, true);
-======================
-DATA FILE REGISTRATION
+Data File Registration
======================
The fourth step is to request a data file be created in the cache. This is
identical to index cookie acquisition. The only difference is that the type in
-the object definition should be something other than index type.
+the object definition should be something other than index type::
vnode->cache =
fscache_acquire_cookie(volume->cache,
@@ -375,15 +372,14 @@ the object definition should be something other than index type.
vnode, vnode->status.size, true);
-=================================
-MISCELLANEOUS OBJECT REGISTRATION
+Miscellaneous Object Registration
=================================
An optional step is to request an object of miscellaneous type be created in
the cache. This is almost identical to index cookie acquisition. The only
difference is that the type in the object definition should be something other
than index type. While the parent object could be an index, it's more likely
-it would be some other type of object such as a data file.
+it would be some other type of object such as a data file::
xattr->cache =
fscache_acquire_cookie(vnode->cache,
@@ -396,13 +392,12 @@ Miscellaneous objects might be used to store extended attributes or directory
entries for example.
-==========================
-SETTING THE DATA FILE SIZE
+Setting the Data File Size
==========================
The fifth step is to set the physical attributes of the file, such as its size.
This doesn't automatically reserve any space in the cache, but permits the
-cache to adjust its metadata for data tracking appropriately:
+cache to adjust its metadata for data tracking appropriately::
int fscache_attr_changed(struct fscache_cookie *cookie);
@@ -417,8 +412,7 @@ some point in the future, and as such, it may happen after the function returns
to the caller. The attribute adjustment excludes read and write operations.
-=====================
-PAGE ALLOC/READ/WRITE
+Page alloc/read/write
=====================
And the sixth step is to store and retrieve pages in the cache. There are
@@ -441,7 +435,7 @@ PAGE READ
Firstly, the netfs should ask FS-Cache to examine the caches and read the
contents cached for a particular page of a particular file if present, or else
-allocate space to store the contents if not:
+allocate space to store the contents if not::
typedef
void (*fscache_rw_complete_t)(struct page *page,
@@ -474,14 +468,14 @@ Else if there's a copy of the page resident in the cache:
(4) When the read is complete, end_io_func() will be invoked with:
- (*) The netfs data supplied when the cookie was created.
+ * The netfs data supplied when the cookie was created.
- (*) The page descriptor.
+ * The page descriptor.
- (*) The context argument passed to the above function. This will be
+ * The context argument passed to the above function. This will be
maintained with the get_context/put_context functions mentioned above.
- (*) An argument that's 0 on success or negative for an error code.
+ * An argument that's 0 on success or negative for an error code.
If an error occurs, it should be assumed that the page contains no usable
data. fscache_readpages_cancel() may need to be called.
@@ -504,11 +498,11 @@ This function may also return -ENOMEM or -EINTR, in which case it won't have
read any data from the cache.
-PAGE ALLOCATE
+Page Allocate
-------------
Alternatively, if there's not expected to be any data in the cache for a page
-because the file has been extended, a block can simply be allocated instead:
+because the file has been extended, a block can simply be allocated instead::
int fscache_alloc_page(struct fscache_cookie *cookie,
struct page *page,
@@ -523,12 +517,12 @@ The mark_pages_cached() cookie operation will be called on the page if
successful.
-PAGE WRITE
+Page Write
----------
Secondly, if the netfs changes the contents of the page (either due to an
initial download or if a user performs a write), then the page should be
-written back to the cache:
+written back to the cache::
int fscache_write_page(struct fscache_cookie *cookie,
struct page *page,
@@ -566,11 +560,11 @@ place if unforeseen circumstances arose (such as a disk error).
Writing takes place asynchronously.
-MULTIPLE PAGE READ
+Multiple Page Read
------------------
A facility is provided to read several pages at once, as requested by the
-readpages() address space operation:
+readpages() address space operation::
int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
struct address_space *mapping,
@@ -598,7 +592,7 @@ This works in a similar way to fscache_read_or_alloc_page(), except:
be returned.
Otherwise, if all pages had reads dispatched, then 0 will be returned, the
- list will be empty and *nr_pages will be 0.
+ list will be empty and ``*nr_pages`` will be 0.
(4) end_io_func will be called once for each page being read as the reads
complete. It will be called in process context if error != 0, but it may
@@ -609,13 +603,13 @@ some of the pages being read and some being allocated. Those pages will have
been marked appropriately and will need uncaching.
-CANCELLATION OF UNREAD PAGES
+Cancellation of Unread Pages
----------------------------
If one or more pages are passed to fscache_read_or_alloc_pages() but not then
read from the cache and also not read from the underlying filesystem then
those pages will need to have any marks and reservations removed. This can be
-done by calling:
+done by calling::
void fscache_readpages_cancel(struct fscache_cookie *cookie,
struct list_head *pages);
@@ -625,11 +619,10 @@ fscache_read_or_alloc_pages(). Every page in the pages list will be examined
and any that have PG_fscache set will be uncached.
-==============
-PAGE UNCACHING
+Page Uncaching
==============
-To uncache a page, this function should be called:
+To uncache a page, this function should be called::
void fscache_uncache_page(struct fscache_cookie *cookie,
struct page *page);
@@ -644,12 +637,12 @@ data file must be retired (see the relinquish cookie function below).
Furthermore, note that this does not cancel the asynchronous read or write
operation started by the read/alloc and write functions, so the page
-invalidation functions must use:
+invalidation functions must use::
bool fscache_check_page_write(struct fscache_cookie *cookie,
struct page *page);
-to see if a page is being written to the cache, and:
+to see if a page is being written to the cache, and::
void fscache_wait_on_page_write(struct fscache_cookie *cookie,
struct page *page);
@@ -660,7 +653,7 @@ to wait for it to finish if it is.
When releasepage() is being implemented, a special FS-Cache function exists to
manage the heuristics of coping with vmscan trying to eject pages, which may
conflict with the cache trying to write pages to the cache (which may itself
-need to allocate memory):
+need to allocate memory)::
bool fscache_maybe_release_page(struct fscache_cookie *cookie,
struct page *page,
@@ -676,12 +669,12 @@ storage request to complete, or it may attempt to cancel the storage request -
in which case the page will not be stored in the cache this time.
-BULK INODE PAGE UNCACHE
+Bulk Image Page Uncache
-----------------------
A convenience routine is provided to perform an uncache on all the pages
attached to an inode. This assumes that the pages on the inode correspond on a
-1:1 basis with the pages in the cache.
+1:1 basis with the pages in the cache::
void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
struct inode *inode);
@@ -692,12 +685,11 @@ written to the cache and for the cache to finish with the page generally. No
error is returned.
-===============================
-INDEX AND DATA FILE CONSISTENCY
+Index and Data File consistency
===============================
To find out whether auxiliary data for an object is up to data within the
-cache, the following function can be called:
+cache, the following function can be called::
int fscache_check_consistency(struct fscache_cookie *cookie,
const void *aux_data);
@@ -708,7 +700,7 @@ data buffer first. It returns 0 if it is and -ESTALE if it isn't; it may also
return -ENOMEM and -ERESTARTSYS.
To request an update of the index data for an index or other object, the
-following function should be called:
+following function should be called::
void fscache_update_cookie(struct fscache_cookie *cookie,
const void *aux_data);
@@ -721,8 +713,7 @@ Note that partial updates may happen automatically at other times, such as when
data blocks are added to a data file object.
-=================
-COOKIE ENABLEMENT
+Cookie Enablement
=================
Cookies exist in one of two states: enabled and disabled. If a cookie is
@@ -731,7 +722,7 @@ invalidate its state; allocate, read or write backing pages - though it is
still possible to uncache pages and relinquish the cookie.
The initial enablement state is set by fscache_acquire_cookie(), but the cookie
-can be enabled or disabled later. To disable a cookie, call:
+can be enabled or disabled later. To disable a cookie, call::
void fscache_disable_cookie(struct fscache_cookie *cookie,
const void *aux_data,
@@ -746,7 +737,7 @@ All possible failures are handled internally. The caller should consider
calling fscache_uncache_all_inode_pages() afterwards to make sure all page
markings are cleared up.
-Cookies can be enabled or reenabled with:
+Cookies can be enabled or reenabled with::
void fscache_enable_cookie(struct fscache_cookie *cookie,
const void *aux_data,
@@ -771,13 +762,12 @@ In both cases, the cookie's auxiliary data buffer is updated from aux_data if
that is non-NULL inside the enablement lock before proceeding.
-===============================
-MISCELLANEOUS COOKIE OPERATIONS
+Miscellaneous Cookie operations
===============================
There are a number of operations that can be used to control cookies:
- (*) Cookie pinning:
+ * Cookie pinning::
int fscache_pin_cookie(struct fscache_cookie *cookie);
void fscache_unpin_cookie(struct fscache_cookie *cookie);
@@ -790,7 +780,7 @@ There are a number of operations that can be used to control cookies:
-ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
-EIO if there's any other problem.
- (*) Data space reservation:
+ * Data space reservation::
int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
@@ -809,11 +799,10 @@ There are a number of operations that can be used to control cookies:
make space if it's not in use.
-=====================
-COOKIE UNREGISTRATION
+Cookie Unregistration
=====================
-To get rid of a cookie, this function should be called.
+To get rid of a cookie, this function should be called::
void fscache_relinquish_cookie(struct fscache_cookie *cookie,
const void *aux_data,
@@ -835,16 +824,14 @@ the cookies for "child" indices, objects and pages have been relinquished
first.
-==================
-INDEX INVALIDATION
+Index Invalidation
==================
There is no direct way to invalidate an index subtree. To do this, the caller
should relinquish and retire the cookie they have, and then acquire a new one.
-======================
-DATA FILE INVALIDATION
+Data File Invalidation
======================
Sometimes it will be necessary to invalidate an object that contains data.
@@ -853,7 +840,7 @@ change - at which point the netfs has to throw away all the state it had for an
inode and reload from the server.
To indicate that a cache object should be invalidated, the following function
-can be called:
+can be called::
void fscache_invalidate(struct fscache_cookie *cookie);
@@ -868,13 +855,12 @@ auxiliary data update operation as it is very likely these will have changed.
Using the following function, the netfs can wait for the invalidation operation
to have reached a point at which it can start submitting ordinary operations
-once again:
+once again::
void fscache_wait_on_invalidate(struct fscache_cookie *cookie);
-===========================
-FS-CACHE SPECIFIC PAGE FLAG
+FS-cache Specific Page Flag
===========================
FS-Cache makes use of a page flag, PG_private_2, for its own purpose. This is
@@ -898,7 +884,7 @@ was given under certain circumstances.
This bit does not overlap with such as PG_private. This means that FS-Cache
can be used with a filesystem that uses the block buffering code.
-There are a number of operations defined on this flag:
+There are a number of operations defined on this flag::
int PageFsCache(struct page *page);
void SetPageFsCache(struct page *page)
diff --git a/Documentation/filesystems/caching/object.txt b/Documentation/filesystems/caching/object.rst
index 100ff41127e4..ce0e043ccd33 100644
--- a/Documentation/filesystems/caching/object.txt
+++ b/Documentation/filesystems/caching/object.rst
@@ -1,10 +1,12 @@
- ====================================================
- IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
- ====================================================
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================================
+In-Kernel Cache Object Representation and Management
+====================================================
By: David Howells <dhowells@redhat.com>
-Contents:
+.. Contents:
(*) Representation
@@ -18,8 +20,7 @@ Contents:
(*) The set of events.
-==============
-REPRESENTATION
+Representation
==============
FS-Cache maintains an in-kernel representation of each object that a netfs is
@@ -38,7 +39,7 @@ or even by no objects (it may not be cached).
Furthermore, both cookies and objects are hierarchical. The two hierarchies
correspond, but the cookies tree is a superset of the union of the object trees
-of multiple caches:
+of multiple caches::
NETFS INDEX TREE : CACHE 1 : CACHE 2
: :
@@ -89,8 +90,7 @@ pointers to the cookies. The cookies themselves and any objects attached to
those cookies are hidden from it.
-===============================
-OBJECT MANAGEMENT STATE MACHINE
+Object Management State Machine
===============================
Within FS-Cache, each active object is managed by its own individual state
@@ -124,7 +124,7 @@ is not masked, the object will be queued for processing (by calling
fscache_enqueue_object()).
-PROVISION OF CPU TIME
+Provision of CPU Time
---------------------
The work to be done by the various states was given CPU time by the threads of
@@ -141,7 +141,7 @@ because:
workqueues don't necessarily have the right numbers of threads.
-LOCKING SIMPLIFICATION
+Locking Simplification
----------------------
Because only one worker thread may be operating on any particular object's
@@ -151,8 +151,7 @@ from the cache backend's representation (fscache_object) - which may be
requested from either end.
-=================
-THE SET OF STATES
+The Set of States
=================
The object state machine has a set of states that it can be in. There are
@@ -275,19 +274,17 @@ memory and potentially deletes stuff from disk:
this state.
-THE SET OF EVENTS
+The Set of Events
-----------------
There are a number of events that can be raised to an object state machine:
- (*) FSCACHE_OBJECT_EV_UPDATE
-
+ FSCACHE_OBJECT_EV_UPDATE
The netfs requested that an object be updated. The state machine will ask
the cache backend to update the object, and the cache backend will ask the
netfs for details of the change through its cookie definition ops.
- (*) FSCACHE_OBJECT_EV_CLEARED
-
+ FSCACHE_OBJECT_EV_CLEARED
This is signalled in two circumstances:
(a) when an object's last child object is dropped and
@@ -296,20 +293,16 @@ There are a number of events that can be raised to an object state machine:
This is used to proceed from the dying state.
- (*) FSCACHE_OBJECT_EV_ERROR
-
+ FSCACHE_OBJECT_EV_ERROR
This is signalled when an I/O error occurs during the processing of some
object.
- (*) FSCACHE_OBJECT_EV_RELEASE
- (*) FSCACHE_OBJECT_EV_RETIRE
-
+ FSCACHE_OBJECT_EV_RELEASE, FSCACHE_OBJECT_EV_RETIRE
These are signalled when the netfs relinquishes a cookie it was using.
The event selected depends on whether the netfs asks for the backing
object to be retired (deleted) or retained.
- (*) FSCACHE_OBJECT_EV_WITHDRAW
-
+ FSCACHE_OBJECT_EV_WITHDRAW
This is signalled when the cache backend wants to withdraw an object.
This means that the object will have to be detached from the netfs's
cookie.
diff --git a/Documentation/filesystems/caching/operations.txt b/Documentation/filesystems/caching/operations.rst
index d8976c434718..f7ddcc028939 100644
--- a/Documentation/filesystems/caching/operations.txt
+++ b/Documentation/filesystems/caching/operations.rst
@@ -1,10 +1,12 @@
- ================================
- ASYNCHRONOUS OPERATIONS HANDLING
- ================================
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+Asynchronous Operations Handling
+================================
By: David Howells <dhowells@redhat.com>
-Contents:
+.. Contents:
(*) Overview.
@@ -17,8 +19,7 @@ Contents:
(*) Asynchronous callback.
-========
-OVERVIEW
+Overview
========
FS-Cache has an asynchronous operations handling facility that it uses for its
@@ -33,11 +34,10 @@ backend for completion.
To make use of this facility, <linux/fscache-cache.h> should be #included.
-===============================
-OPERATION RECORD INITIALISATION
+Operation Record Initialisation
===============================
-An operation is recorded in an fscache_operation struct:
+An operation is recorded in an fscache_operation struct::
struct fscache_operation {
union {
@@ -50,7 +50,7 @@ An operation is recorded in an fscache_operation struct:
};
Someone wanting to issue an operation should allocate something with this
-struct embedded in it. They should initialise it by calling:
+struct embedded in it. They should initialise it by calling::
void fscache_operation_init(struct fscache_operation *op,
fscache_operation_release_t release);
@@ -67,8 +67,7 @@ FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
operation and waited for afterwards.
-==========
-PARAMETERS
+Parameters
==========
There are a number of parameters that can be set in the operation record's flag
@@ -87,7 +86,7 @@ operations:
If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
before submitting the operation, and the operating thread must wait for it
- to be cleared before proceeding:
+ to be cleared before proceeding::
wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
TASK_UNINTERRUPTIBLE);
@@ -101,7 +100,7 @@ operations:
page to a netfs page after the backing fs has read the page in.
If this option is used, op->fast_work and op->processor must be
- initialised before submitting the operation:
+ initialised before submitting the operation::
INIT_WORK(&op->fast_work, do_some_work);
@@ -114,7 +113,7 @@ operations:
pages that have just been fetched from a remote server.
If this option is used, op->slow_work and op->processor must be
- initialised before submitting the operation:
+ initialised before submitting the operation::
fscache_operation_init_slow(op, processor)
@@ -132,8 +131,7 @@ Furthermore, operations may be one of two types:
operations running at the same time.
-=========
-PROCEDURE
+Procedure
=========
Operations are used through the following procedure:
@@ -143,7 +141,7 @@ Operations are used through the following procedure:
generic op embedded within.
(2) The submitting thread must then submit the operation for processing using
- one of the following two functions:
+ one of the following two functions::
int fscache_submit_op(struct fscache_object *object,
struct fscache_operation *op);
@@ -164,7 +162,7 @@ Operations are used through the following procedure:
operation of conflicting exclusivity is in progress on the object.
If the operation is asynchronous, the manager will retain a reference to
- it, so the caller should put their reference to it by passing it to:
+ it, so the caller should put their reference to it by passing it to::
void fscache_put_operation(struct fscache_operation *op);
@@ -179,12 +177,12 @@ Operations are used through the following procedure:
(4) The operation holds an effective lock upon the object, preventing other
exclusive ops conflicting until it is released. The operation can be
enqueued for further immediate asynchronous processing by adjusting the
- CPU time provisioning option if necessary, eg:
+ CPU time provisioning option if necessary, eg::
op->flags &= ~FSCACHE_OP_TYPE;
op->flags |= ~FSCACHE_OP_FAST;
- and calling:
+ and calling::
void fscache_enqueue_operation(struct fscache_operation *op)
@@ -192,13 +190,12 @@ Operations are used through the following procedure:
pools.
-=====================
-ASYNCHRONOUS CALLBACK
+Asynchronous Callback
=====================
When used in asynchronous mode, the worker thread pool will invoke the
processor method with a pointer to the operation. This should then get at the
-container struct by using container_of():
+container struct by using container_of()::
static void fscache_write_op(struct fscache_operation *_op)
{
diff --git a/Documentation/filesystems/cifs/cifsroot.txt b/Documentation/filesystems/cifs/cifsroot.rst
index 947b7ec6ce9e..4930bb443134 100644
--- a/Documentation/filesystems/cifs/cifsroot.txt
+++ b/Documentation/filesystems/cifs/cifsroot.rst
@@ -1,7 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
Mounting root file system via SMB (cifs.ko)
===========================================
Written 2019 by Paulo Alcantara <palcantara@suse.de>
+
Written 2019 by Aurelien Aptel <aaptel@suse.com>
The CONFIG_CIFS_ROOT option enables experimental root file system
@@ -32,7 +36,7 @@ Server configuration
====================
To enable SMB1+UNIX extensions you will need to set these global
-settings in Samba smb.conf:
+settings in Samba smb.conf::
[global]
server min protocol = NT1
@@ -41,12 +45,16 @@ settings in Samba smb.conf:
Kernel command line
===================
-root=/dev/cifs
+::
+
+ root=/dev/cifs
This is just a virtual device that basically tells the kernel to mount
the root file system via SMB protocol.
-cifsroot=//<server-ip>/<share>[,options]
+::
+
+ cifsroot=//<server-ip>/<share>[,options]
Enables the kernel to mount the root file system via SMB that are
located in the <server-ip> and <share> specified in this option.
@@ -65,33 +73,33 @@ options
Examples
========
-Export root file system as a Samba share in smb.conf file.
+Export root file system as a Samba share in smb.conf file::
-...
-[linux]
- path = /path/to/rootfs
- read only = no
- guest ok = yes
- force user = root
- force group = root
- browseable = yes
- writeable = yes
- admin users = root
- public = yes
- create mask = 0777
- directory mask = 0777
-...
+ ...
+ [linux]
+ path = /path/to/rootfs
+ read only = no
+ guest ok = yes
+ force user = root
+ force group = root
+ browseable = yes
+ writeable = yes
+ admin users = root
+ public = yes
+ create mask = 0777
+ directory mask = 0777
+ ...
-Restart smb service.
+Restart smb service::
-# systemctl restart smb
+ # systemctl restart smb
Test it under QEMU on a kernel built with CONFIG_CIFS_ROOT and
-CONFIG_IP_PNP options enabled.
+CONFIG_IP_PNP options enabled::
-# qemu-system-x86_64 -enable-kvm -cpu host -m 1024 \
- -kernel /path/to/linux/arch/x86/boot/bzImage -nographic \
- -append "root=/dev/cifs rw ip=dhcp cifsroot=//10.0.2.2/linux,username=foo,password=bar console=ttyS0 3"
+ # qemu-system-x86_64 -enable-kvm -cpu host -m 1024 \
+ -kernel /path/to/linux/arch/x86/boot/bzImage -nographic \
+ -append "root=/dev/cifs rw ip=dhcp cifsroot=//10.0.2.2/linux,username=foo,password=bar console=ttyS0 3"
1: https://wiki.samba.org/index.php/UNIX_Extensions
diff --git a/Documentation/filesystems/coda.rst b/Documentation/filesystems/coda.rst
new file mode 100644
index 000000000000..84c860c89887
--- /dev/null
+++ b/Documentation/filesystems/coda.rst
@@ -0,0 +1,1670 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+Coda Kernel-Venus Interface
+===========================
+
+.. Note::
+
+ This is one of the technical documents describing a component of
+ Coda -- this document describes the client kernel-Venus interface.
+
+For more information:
+
+ http://www.coda.cs.cmu.edu
+
+For user level software needed to run Coda:
+
+ ftp://ftp.coda.cs.cmu.edu
+
+To run Coda you need to get a user level cache manager for the client,
+named Venus, as well as tools to manipulate ACLs, to log in, etc. The
+client needs to have the Coda filesystem selected in the kernel
+configuration.
+
+The server needs a user level server and at present does not depend on
+kernel support.
+
+ The Venus kernel interface
+
+ Peter J. Braam
+
+ v1.0, Nov 9, 1997
+
+ This document describes the communication between Venus and kernel
+ level filesystem code needed for the operation of the Coda file sys-
+ tem. This document version is meant to describe the current interface
+ (version 1.0) as well as improvements we envisage.
+
+.. Table of Contents
+
+ 1. Introduction
+
+ 2. Servicing Coda filesystem calls
+
+ 3. The message layer
+
+ 3.1 Implementation details
+
+ 4. The interface at the call level
+
+ 4.1 Data structures shared by the kernel and Venus
+ 4.2 The pioctl interface
+ 4.3 root
+ 4.4 lookup
+ 4.5 getattr
+ 4.6 setattr
+ 4.7 access
+ 4.8 create
+ 4.9 mkdir
+ 4.10 link
+ 4.11 symlink
+ 4.12 remove
+ 4.13 rmdir
+ 4.14 readlink
+ 4.15 open
+ 4.16 close
+ 4.17 ioctl
+ 4.18 rename
+ 4.19 readdir
+ 4.20 vget
+ 4.21 fsync
+ 4.22 inactive
+ 4.23 rdwr
+ 4.24 odymount
+ 4.25 ody_lookup
+ 4.26 ody_expand
+ 4.27 prefetch
+ 4.28 signal
+
+ 5. The minicache and downcalls
+
+ 5.1 INVALIDATE
+ 5.2 FLUSH
+ 5.3 PURGEUSER
+ 5.4 ZAPFILE
+ 5.5 ZAPDIR
+ 5.6 ZAPVNODE
+ 5.7 PURGEFID
+ 5.8 REPLACE
+
+ 6. Initialization and cleanup
+
+ 6.1 Requirements
+
+1. Introduction
+===============
+
+ A key component in the Coda Distributed File System is the cache
+ manager, Venus.
+
+ When processes on a Coda enabled system access files in the Coda
+ filesystem, requests are directed at the filesystem layer in the
+ operating system. The operating system will communicate with Venus to
+ service the request for the process. Venus manages a persistent
+ client cache and makes remote procedure calls to Coda file servers and
+ related servers (such as authentication servers) to service these
+ requests it receives from the operating system. When Venus has
+ serviced a request it replies to the operating system with appropriate
+ return codes, and other data related to the request. Optionally the
+ kernel support for Coda may maintain a minicache of recently processed
+ requests to limit the number of interactions with Venus. Venus
+ possesses the facility to inform the kernel when elements from its
+ minicache are no longer valid.
+
+ This document describes precisely this communication between the
+ kernel and Venus. The definitions of so called upcalls and downcalls
+ will be given with the format of the data they handle. We shall also
+ describe the semantic invariants resulting from the calls.
+
+ Historically Coda was implemented in a BSD file system in Mach 2.6.
+ The interface between the kernel and Venus is very similar to the BSD
+ VFS interface. Similar functionality is provided, and the format of
+ the parameters and returned data is very similar to the BSD VFS. This
+ leads to an almost natural environment for implementing a kernel-level
+ filesystem driver for Coda in a BSD system. However, other operating
+ systems such as Linux and Windows 95 and NT have virtual filesystem
+ with different interfaces.
+
+ To implement Coda on these systems some reverse engineering of the
+ Venus/Kernel protocol is necessary. Also it came to light that other
+ systems could profit significantly from certain small optimizations
+ and modifications to the protocol. To facilitate this work as well as
+ to make future ports easier, communication between Venus and the
+ kernel should be documented in great detail. This is the aim of this
+ document.
+
+2. Servicing Coda filesystem calls
+===================================
+
+ The service of a request for a Coda file system service originates in
+ a process P which accessing a Coda file. It makes a system call which
+ traps to the OS kernel. Examples of such calls trapping to the kernel
+ are ``read``, ``write``, ``open``, ``close``, ``create``, ``mkdir``,
+ ``rmdir``, ``chmod`` in a Unix ontext. Similar calls exist in the Win32
+ environment, and are named ``CreateFile``.
+
+ Generally the operating system handles the request in a virtual
+ filesystem (VFS) layer, which is named I/O Manager in NT and IFS
+ manager in Windows 95. The VFS is responsible for partial processing
+ of the request and for locating the specific filesystem(s) which will
+ service parts of the request. Usually the information in the path
+ assists in locating the correct FS drivers. Sometimes after extensive
+ pre-processing, the VFS starts invoking exported routines in the FS
+ driver. This is the point where the FS specific processing of the
+ request starts, and here the Coda specific kernel code comes into
+ play.
+
+ The FS layer for Coda must expose and implement several interfaces.
+ First and foremost the VFS must be able to make all necessary calls to
+ the Coda FS layer, so the Coda FS driver must expose the VFS interface
+ as applicable in the operating system. These differ very significantly
+ among operating systems, but share features such as facilities to
+ read/write and create and remove objects. The Coda FS layer services
+ such VFS requests by invoking one or more well defined services
+ offered by the cache manager Venus. When the replies from Venus have
+ come back to the FS driver, servicing of the VFS call continues and
+ finishes with a reply to the kernel's VFS. Finally the VFS layer
+ returns to the process.
+
+ As a result of this design a basic interface exposed by the FS driver
+ must allow Venus to manage message traffic. In particular Venus must
+ be able to retrieve and place messages and to be notified of the
+ arrival of a new message. The notification must be through a mechanism
+ which does not block Venus since Venus must attend to other tasks even
+ when no messages are waiting or being processed.
+
+ **Interfaces of the Coda FS Driver**
+
+ Furthermore the FS layer provides for a special path of communication
+ between a user process and Venus, called the pioctl interface. The
+ pioctl interface is used for Coda specific services, such as
+ requesting detailed information about the persistent cache managed by
+ Venus. Here the involvement of the kernel is minimal. It identifies
+ the calling process and passes the information on to Venus. When
+ Venus replies the response is passed back to the caller in unmodified
+ form.
+
+ Finally Venus allows the kernel FS driver to cache the results from
+ certain services. This is done to avoid excessive context switches
+ and results in an efficient system. However, Venus may acquire
+ information, for example from the network which implies that cached
+ information must be flushed or replaced. Venus then makes a downcall
+ to the Coda FS layer to request flushes or updates in the cache. The
+ kernel FS driver handles such requests synchronously.
+
+ Among these interfaces the VFS interface and the facility to place,
+ receive and be notified of messages are platform specific. We will
+ not go into the calls exported to the VFS layer but we will state the
+ requirements of the message exchange mechanism.
+
+
+3. The message layer
+=====================
+
+ At the lowest level the communication between Venus and the FS driver
+ proceeds through messages. The synchronization between processes
+ requesting Coda file service and Venus relies on blocking and waking
+ up processes. The Coda FS driver processes VFS- and pioctl-requests
+ on behalf of a process P, creates messages for Venus, awaits replies
+ and finally returns to the caller. The implementation of the exchange
+ of messages is platform specific, but the semantics have (so far)
+ appeared to be generally applicable. Data buffers are created by the
+ FS Driver in kernel memory on behalf of P and copied to user memory in
+ Venus.
+
+ The FS Driver while servicing P makes upcalls to Venus. Such an
+ upcall is dispatched to Venus by creating a message structure. The
+ structure contains the identification of P, the message sequence
+ number, the size of the request and a pointer to the data in kernel
+ memory for the request. Since the data buffer is re-used to hold the
+ reply from Venus, there is a field for the size of the reply. A flags
+ field is used in the message to precisely record the status of the
+ message. Additional platform dependent structures involve pointers to
+ determine the position of the message on queues and pointers to
+ synchronization objects. In the upcall routine the message structure
+ is filled in, flags are set to 0, and it is placed on the *pending*
+ queue. The routine calling upcall is responsible for allocating the
+ data buffer; its structure will be described in the next section.
+
+ A facility must exist to notify Venus that the message has been
+ created, and implemented using available synchronization objects in
+ the OS. This notification is done in the upcall context of the process
+ P. When the message is on the pending queue, process P cannot proceed
+ in upcall. The (kernel mode) processing of P in the filesystem
+ request routine must be suspended until Venus has replied. Therefore
+ the calling thread in P is blocked in upcall. A pointer in the
+ message structure will locate the synchronization object on which P is
+ sleeping.
+
+ Venus detects the notification that a message has arrived, and the FS
+ driver allow Venus to retrieve the message with a getmsg_from_kernel
+ call. This action finishes in the kernel by putting the message on the
+ queue of processing messages and setting flags to READ. Venus is
+ passed the contents of the data buffer. The getmsg_from_kernel call
+ now returns and Venus processes the request.
+
+ At some later point the FS driver receives a message from Venus,
+ namely when Venus calls sendmsg_to_kernel. At this moment the Coda FS
+ driver looks at the contents of the message and decides if:
+
+
+ * the message is a reply for a suspended thread P. If so it removes
+ the message from the processing queue and marks the message as
+ WRITTEN. Finally, the FS driver unblocks P (still in the kernel
+ mode context of Venus) and the sendmsg_to_kernel call returns to
+ Venus. The process P will be scheduled at some point and continues
+ processing its upcall with the data buffer replaced with the reply
+ from Venus.
+
+ * The message is a ``downcall``. A downcall is a request from Venus to
+ the FS Driver. The FS driver processes the request immediately
+ (usually a cache eviction or replacement) and when it finishes
+ sendmsg_to_kernel returns.
+
+ Now P awakes and continues processing upcall. There are some
+ subtleties to take account of. First P will determine if it was woken
+ up in upcall by a signal from some other source (for example an
+ attempt to terminate P) or as is normally the case by Venus in its
+ sendmsg_to_kernel call. In the normal case, the upcall routine will
+ deallocate the message structure and return. The FS routine can proceed
+ with its processing.
+
+
+ **Sleeping and IPC arrangements**
+
+ In case P is woken up by a signal and not by Venus, it will first look
+ at the flags field. If the message is not yet READ, the process P can
+ handle its signal without notifying Venus. If Venus has READ, and
+ the request should not be processed, P can send Venus a signal message
+ to indicate that it should disregard the previous message. Such
+ signals are put in the queue at the head, and read first by Venus. If
+ the message is already marked as WRITTEN it is too late to stop the
+ processing. The VFS routine will now continue. (-- If a VFS request
+ involves more than one upcall, this can lead to complicated state, an
+ extra field "handle_signals" could be added in the message structure
+ to indicate points of no return have been passed.--)
+
+
+
+3.1. Implementation details
+----------------------------
+
+ The Unix implementation of this mechanism has been through the
+ implementation of a character device associated with Coda. Venus
+ retrieves messages by doing a read on the device, replies are sent
+ with a write and notification is through the select system call on the
+ file descriptor for the device. The process P is kept waiting on an
+ interruptible wait queue object.
+
+ In Windows NT and the DPMI Windows 95 implementation a DeviceIoControl
+ call is used. The DeviceIoControl call is designed to copy buffers
+ from user memory to kernel memory with OPCODES. The sendmsg_to_kernel
+ is issued as a synchronous call, while the getmsg_from_kernel call is
+ asynchronous. Windows EventObjects are used for notification of
+ message arrival. The process P is kept waiting on a KernelEvent
+ object in NT and a semaphore in Windows 95.
+
+
+4. The interface at the call level
+===================================
+
+
+ This section describes the upcalls a Coda FS driver can make to Venus.
+ Each of these upcalls make use of two structures: inputArgs and
+ outputArgs. In pseudo BNF form the structures take the following
+ form::
+
+
+ struct inputArgs {
+ u_long opcode;
+ u_long unique; /* Keep multiple outstanding msgs distinct */
+ u_short pid; /* Common to all */
+ u_short pgid; /* Common to all */
+ struct CodaCred cred; /* Common to all */
+
+ <union "in" of call dependent parts of inputArgs>
+ };
+
+ struct outputArgs {
+ u_long opcode;
+ u_long unique; /* Keep multiple outstanding msgs distinct */
+ u_long result;
+
+ <union "out" of call dependent parts of inputArgs>
+ };
+
+
+
+ Before going on let us elucidate the role of the various fields. The
+ inputArgs start with the opcode which defines the type of service
+ requested from Venus. There are approximately 30 upcalls at present
+ which we will discuss. The unique field labels the inputArg with a
+ unique number which will identify the message uniquely. A process and
+ process group id are passed. Finally the credentials of the caller
+ are included.
+
+ Before delving into the specific calls we need to discuss a variety of
+ data structures shared by the kernel and Venus.
+
+
+
+
+4.1. Data structures shared by the kernel and Venus
+----------------------------------------------------
+
+
+ The CodaCred structure defines a variety of user and group ids as
+ they are set for the calling process. The vuid_t and vgid_t are 32 bit
+ unsigned integers. It also defines group membership in an array. On
+ Unix the CodaCred has proven sufficient to implement good security
+ semantics for Coda but the structure may have to undergo modification
+ for the Windows environment when these mature::
+
+ struct CodaCred {
+ vuid_t cr_uid, cr_euid, cr_suid, cr_fsuid; /* Real, effective, set, fs uid */
+ vgid_t cr_gid, cr_egid, cr_sgid, cr_fsgid; /* same for groups */
+ vgid_t cr_groups[NGROUPS]; /* Group membership for caller */
+ };
+
+
+ .. Note::
+
+ It is questionable if we need CodaCreds in Venus. Finally Venus
+ doesn't know about groups, although it does create files with the
+ default uid/gid. Perhaps the list of group membership is superfluous.
+
+
+ The next item is the fundamental identifier used to identify Coda
+ files, the ViceFid. A fid of a file uniquely defines a file or
+ directory in the Coda filesystem within a cell [1]_::
+
+ typedef struct ViceFid {
+ VolumeId Volume;
+ VnodeId Vnode;
+ Unique_t Unique;
+ } ViceFid;
+
+ .. [1] A cell is agroup of Coda servers acting under the aegis of a single
+ system control machine or SCM. See the Coda Administration manual
+ for a detailed description of the role of the SCM.
+
+ Each of the constituent fields: VolumeId, VnodeId and Unique_t are
+ unsigned 32 bit integers. We envisage that a further field will need
+ to be prefixed to identify the Coda cell; this will probably take the
+ form of a Ipv6 size IP address naming the Coda cell through DNS.
+
+ The next important structure shared between Venus and the kernel is
+ the attributes of the file. The following structure is used to
+ exchange information. It has room for future extensions such as
+ support for device files (currently not present in Coda)::
+
+
+ struct coda_timespec {
+ int64_t tv_sec; /* seconds */
+ long tv_nsec; /* nanoseconds */
+ };
+
+ struct coda_vattr {
+ enum coda_vtype va_type; /* vnode type (for create) */
+ u_short va_mode; /* files access mode and type */
+ short va_nlink; /* number of references to file */
+ vuid_t va_uid; /* owner user id */
+ vgid_t va_gid; /* owner group id */
+ long va_fsid; /* file system id (dev for now) */
+ long va_fileid; /* file id */
+ u_quad_t va_size; /* file size in bytes */
+ long va_blocksize; /* blocksize preferred for i/o */
+ struct coda_timespec va_atime; /* time of last access */
+ struct coda_timespec va_mtime; /* time of last modification */
+ struct coda_timespec va_ctime; /* time file changed */
+ u_long va_gen; /* generation number of file */
+ u_long va_flags; /* flags defined for file */
+ dev_t va_rdev; /* device special file represents */
+ u_quad_t va_bytes; /* bytes of disk space held by file */
+ u_quad_t va_filerev; /* file modification number */
+ u_int va_vaflags; /* operations flags, see below */
+ long va_spare; /* remain quad aligned */
+ };
+
+
+4.2. The pioctl interface
+--------------------------
+
+
+ Coda specific requests can be made by application through the pioctl
+ interface. The pioctl is implemented as an ordinary ioctl on a
+ fictitious file /coda/.CONTROL. The pioctl call opens this file, gets
+ a file handle and makes the ioctl call. Finally it closes the file.
+
+ The kernel involvement in this is limited to providing the facility to
+ open and close and pass the ioctl message and to verify that a path in
+ the pioctl data buffers is a file in a Coda filesystem.
+
+ The kernel is handed a data packet of the form::
+
+ struct {
+ const char *path;
+ struct ViceIoctl vidata;
+ int follow;
+ } data;
+
+
+
+ where::
+
+
+ struct ViceIoctl {
+ caddr_t in, out; /* Data to be transferred in, or out */
+ short in_size; /* Size of input buffer <= 2K */
+ short out_size; /* Maximum size of output buffer, <= 2K */
+ };
+
+
+
+ The path must be a Coda file, otherwise the ioctl upcall will not be
+ made.
+
+ .. Note:: The data structures and code are a mess. We need to clean this up.
+
+
+**We now proceed to document the individual calls**:
+
+
+4.3. root
+----------
+
+
+ Arguments
+ in
+
+ empty
+
+ out::
+
+ struct cfs_root_out {
+ ViceFid VFid;
+ } cfs_root;
+
+
+
+ Description
+ This call is made to Venus during the initialization of
+ the Coda filesystem. If the result is zero, the cfs_root structure
+ contains the ViceFid of the root of the Coda filesystem. If a non-zero
+ result is generated, its value is a platform dependent error code
+ indicating the difficulty Venus encountered in locating the root of
+ the Coda filesystem.
+
+4.4. lookup
+------------
+
+
+ Summary
+ Find the ViceFid and type of an object in a directory if it exists.
+
+ Arguments
+ in::
+
+ struct cfs_lookup_in {
+ ViceFid VFid;
+ char *name; /* Place holder for data. */
+ } cfs_lookup;
+
+
+
+ out::
+
+ struct cfs_lookup_out {
+ ViceFid VFid;
+ int vtype;
+ } cfs_lookup;
+
+
+
+ Description
+ This call is made to determine the ViceFid and filetype of
+ a directory entry. The directory entry requested carries name name
+ and Venus will search the directory identified by cfs_lookup_in.VFid.
+ The result may indicate that the name does not exist, or that
+ difficulty was encountered in finding it (e.g. due to disconnection).
+ If the result is zero, the field cfs_lookup_out.VFid contains the
+ targets ViceFid and cfs_lookup_out.vtype the coda_vtype giving the
+ type of object the name designates.
+
+ The name of the object is an 8 bit character string of maximum length
+ CFS_MAXNAMLEN, currently set to 256 (including a 0 terminator.)
+
+ It is extremely important to realize that Venus bitwise ors the field
+ cfs_lookup.vtype with CFS_NOCACHE to indicate that the object should
+ not be put in the kernel name cache.
+
+ .. Note::
+
+ The type of the vtype is currently wrong. It should be
+ coda_vtype. Linux does not take note of CFS_NOCACHE. It should.
+
+
+4.5. getattr
+-------------
+
+
+ Summary Get the attributes of a file.
+
+ Arguments
+ in::
+
+ struct cfs_getattr_in {
+ ViceFid VFid;
+ struct coda_vattr attr; /* XXXXX */
+ } cfs_getattr;
+
+
+
+ out::
+
+ struct cfs_getattr_out {
+ struct coda_vattr attr;
+ } cfs_getattr;
+
+
+
+ Description
+ This call returns the attributes of the file identified by fid.
+
+ Errors
+ Errors can occur if the object with fid does not exist, is
+ unaccessible or if the caller does not have permission to fetch
+ attributes.
+
+ .. Note::
+
+ Many kernel FS drivers (Linux, NT and Windows 95) need to acquire
+ the attributes as well as the Fid for the instantiation of an internal
+ "inode" or "FileHandle". A significant improvement in performance on
+ such systems could be made by combining the lookup and getattr calls
+ both at the Venus/kernel interaction level and at the RPC level.
+
+ The vattr structure included in the input arguments is superfluous and
+ should be removed.
+
+
+4.6. setattr
+-------------
+
+
+ Summary
+ Set the attributes of a file.
+
+ Arguments
+ in::
+
+ struct cfs_setattr_in {
+ ViceFid VFid;
+ struct coda_vattr attr;
+ } cfs_setattr;
+
+
+
+
+ out
+
+ empty
+
+ Description
+ The structure attr is filled with attributes to be changed
+ in BSD style. Attributes not to be changed are set to -1, apart from
+ vtype which is set to VNON. Other are set to the value to be assigned.
+ The only attributes which the FS driver may request to change are the
+ mode, owner, groupid, atime, mtime and ctime. The return value
+ indicates success or failure.
+
+ Errors
+ A variety of errors can occur. The object may not exist, may
+ be inaccessible, or permission may not be granted by Venus.
+
+
+4.7. access
+------------
+
+
+ Arguments
+ in::
+
+ struct cfs_access_in {
+ ViceFid VFid;
+ int flags;
+ } cfs_access;
+
+
+
+ out
+
+ empty
+
+ Description
+ Verify if access to the object identified by VFid for
+ operations described by flags is permitted. The result indicates if
+ access will be granted. It is important to remember that Coda uses
+ ACLs to enforce protection and that ultimately the servers, not the
+ clients enforce the security of the system. The result of this call
+ will depend on whether a token is held by the user.
+
+ Errors
+ The object may not exist, or the ACL describing the protection
+ may not be accessible.
+
+
+4.8. create
+------------
+
+
+ Summary
+ Invoked to create a file
+
+ Arguments
+ in::
+
+ struct cfs_create_in {
+ ViceFid VFid;
+ struct coda_vattr attr;
+ int excl;
+ int mode;
+ char *name; /* Place holder for data. */
+ } cfs_create;
+
+
+
+
+ out::
+
+ struct cfs_create_out {
+ ViceFid VFid;
+ struct coda_vattr attr;
+ } cfs_create;
+
+
+
+ Description
+ This upcall is invoked to request creation of a file.
+ The file will be created in the directory identified by VFid, its name
+ will be name, and the mode will be mode. If excl is set an error will
+ be returned if the file already exists. If the size field in attr is
+ set to zero the file will be truncated. The uid and gid of the file
+ are set by converting the CodaCred to a uid using a macro CRTOUID
+ (this macro is platform dependent). Upon success the VFid and
+ attributes of the file are returned. The Coda FS Driver will normally
+ instantiate a vnode, inode or file handle at kernel level for the new
+ object.
+
+
+ Errors
+ A variety of errors can occur. Permissions may be insufficient.
+ If the object exists and is not a file the error EISDIR is returned
+ under Unix.
+
+ .. Note::
+
+ The packing of parameters is very inefficient and appears to
+ indicate confusion between the system call creat and the VFS operation
+ create. The VFS operation create is only called to create new objects.
+ This create call differs from the Unix one in that it is not invoked
+ to return a file descriptor. The truncate and exclusive options,
+ together with the mode, could simply be part of the mode as it is
+ under Unix. There should be no flags argument; this is used in open
+ (2) to return a file descriptor for READ or WRITE mode.
+
+ The attributes of the directory should be returned too, since the size
+ and mtime changed.
+
+
+4.9. mkdir
+-----------
+
+
+ Summary
+ Create a new directory.
+
+ Arguments
+ in::
+
+ struct cfs_mkdir_in {
+ ViceFid VFid;
+ struct coda_vattr attr;
+ char *name; /* Place holder for data. */
+ } cfs_mkdir;
+
+
+
+ out::
+
+ struct cfs_mkdir_out {
+ ViceFid VFid;
+ struct coda_vattr attr;
+ } cfs_mkdir;
+
+
+
+
+ Description
+ This call is similar to create but creates a directory.
+ Only the mode field in the input parameters is used for creation.
+ Upon successful creation, the attr returned contains the attributes of
+ the new directory.
+
+ Errors
+ As for create.
+
+ .. Note::
+
+ The input parameter should be changed to mode instead of
+ attributes.
+
+ The attributes of the parent should be returned since the size and
+ mtime changes.
+
+
+4.10. link
+-----------
+
+
+ Summary
+ Create a link to an existing file.
+
+ Arguments
+ in::
+
+ struct cfs_link_in {
+ ViceFid sourceFid; /* cnode to link *to* */
+ ViceFid destFid; /* Directory in which to place link */
+ char *tname; /* Place holder for data. */
+ } cfs_link;
+
+
+
+ out
+
+ empty
+
+ Description
+ This call creates a link to the sourceFid in the directory
+ identified by destFid with name tname. The source must reside in the
+ target's parent, i.e. the source must be have parent destFid, i.e. Coda
+ does not support cross directory hard links. Only the return value is
+ relevant. It indicates success or the type of failure.
+
+ Errors
+ The usual errors can occur.
+
+
+4.11. symlink
+--------------
+
+
+ Summary
+ create a symbolic link
+
+ Arguments
+ in::
+
+ struct cfs_symlink_in {
+ ViceFid VFid; /* Directory to put symlink in */
+ char *srcname;
+ struct coda_vattr attr;
+ char *tname;
+ } cfs_symlink;
+
+
+
+ out
+
+ none
+
+ Description
+ Create a symbolic link. The link is to be placed in the
+ directory identified by VFid and named tname. It should point to the
+ pathname srcname. The attributes of the newly created object are to
+ be set to attr.
+
+ .. Note::
+
+ The attributes of the target directory should be returned since
+ its size changed.
+
+
+4.12. remove
+-------------
+
+
+ Summary
+ Remove a file
+
+ Arguments
+ in::
+
+ struct cfs_remove_in {
+ ViceFid VFid;
+ char *name; /* Place holder for data. */
+ } cfs_remove;
+
+
+
+ out
+
+ none
+
+ Description
+ Remove file named cfs_remove_in.name in directory
+ identified by VFid.
+
+
+ .. Note::
+
+ The attributes of the directory should be returned since its
+ mtime and size may change.
+
+
+4.13. rmdir
+------------
+
+
+ Summary
+ Remove a directory
+
+ Arguments
+ in::
+
+ struct cfs_rmdir_in {
+ ViceFid VFid;
+ char *name; /* Place holder for data. */
+ } cfs_rmdir;
+
+
+
+ out
+
+ none
+
+ Description
+ Remove the directory with name name from the directory
+ identified by VFid.
+
+ .. Note:: The attributes of the parent directory should be returned since
+ its mtime and size may change.
+
+
+4.14. readlink
+---------------
+
+
+ Summary
+ Read the value of a symbolic link.
+
+ Arguments
+ in::
+
+ struct cfs_readlink_in {
+ ViceFid VFid;
+ } cfs_readlink;
+
+
+
+ out::
+
+ struct cfs_readlink_out {
+ int count;
+ caddr_t data; /* Place holder for data. */
+ } cfs_readlink;
+
+
+
+ Description
+ This routine reads the contents of symbolic link
+ identified by VFid into the buffer data. The buffer data must be able
+ to hold any name up to CFS_MAXNAMLEN (PATH or NAM??).
+
+ Errors
+ No unusual errors.
+
+
+4.15. open
+-----------
+
+
+ Summary
+ Open a file.
+
+ Arguments
+ in::
+
+ struct cfs_open_in {
+ ViceFid VFid;
+ int flags;
+ } cfs_open;
+
+
+
+ out::
+
+ struct cfs_open_out {
+ dev_t dev;
+ ino_t inode;
+ } cfs_open;
+
+
+
+ Description
+ This request asks Venus to place the file identified by
+ VFid in its cache and to note that the calling process wishes to open
+ it with flags as in open(2). The return value to the kernel differs
+ for Unix and Windows systems. For Unix systems the Coda FS Driver is
+ informed of the device and inode number of the container file in the
+ fields dev and inode. For Windows the path of the container file is
+ returned to the kernel.
+
+
+ .. Note::
+
+ Currently the cfs_open_out structure is not properly adapted to
+ deal with the Windows case. It might be best to implement two
+ upcalls, one to open aiming at a container file name, the other at a
+ container file inode.
+
+
+4.16. close
+------------
+
+
+ Summary
+ Close a file, update it on the servers.
+
+ Arguments
+ in::
+
+ struct cfs_close_in {
+ ViceFid VFid;
+ int flags;
+ } cfs_close;
+
+
+
+ out
+
+ none
+
+ Description
+ Close the file identified by VFid.
+
+ .. Note::
+
+ The flags argument is bogus and not used. However, Venus' code
+ has room to deal with an execp input field, probably this field should
+ be used to inform Venus that the file was closed but is still memory
+ mapped for execution. There are comments about fetching versus not
+ fetching the data in Venus vproc_vfscalls. This seems silly. If a
+ file is being closed, the data in the container file is to be the new
+ data. Here again the execp flag might be in play to create confusion:
+ currently Venus might think a file can be flushed from the cache when
+ it is still memory mapped. This needs to be understood.
+
+
+4.17. ioctl
+------------
+
+
+ Summary
+ Do an ioctl on a file. This includes the pioctl interface.
+
+ Arguments
+ in::
+
+ struct cfs_ioctl_in {
+ ViceFid VFid;
+ int cmd;
+ int len;
+ int rwflag;
+ char *data; /* Place holder for data. */
+ } cfs_ioctl;
+
+
+
+ out::
+
+
+ struct cfs_ioctl_out {
+ int len;
+ caddr_t data; /* Place holder for data. */
+ } cfs_ioctl;
+
+
+
+ Description
+ Do an ioctl operation on a file. The command, len and
+ data arguments are filled as usual. flags is not used by Venus.
+
+ .. Note::
+
+ Another bogus parameter. flags is not used. What is the
+ business about PREFETCHING in the Venus code?
+
+
+
+4.18. rename
+-------------
+
+
+ Summary
+ Rename a fid.
+
+ Arguments
+ in::
+
+ struct cfs_rename_in {
+ ViceFid sourceFid;
+ char *srcname;
+ ViceFid destFid;
+ char *destname;
+ } cfs_rename;
+
+
+
+ out
+
+ none
+
+ Description
+ Rename the object with name srcname in directory
+ sourceFid to destname in destFid. It is important that the names
+ srcname and destname are 0 terminated strings. Strings in Unix
+ kernels are not always null terminated.
+
+
+4.19. readdir
+--------------
+
+
+ Summary
+ Read directory entries.
+
+ Arguments
+ in::
+
+ struct cfs_readdir_in {
+ ViceFid VFid;
+ int count;
+ int offset;
+ } cfs_readdir;
+
+
+
+
+ out::
+
+ struct cfs_readdir_out {
+ int size;
+ caddr_t data; /* Place holder for data. */
+ } cfs_readdir;
+
+
+
+ Description
+ Read directory entries from VFid starting at offset and
+ read at most count bytes. Returns the data in data and returns
+ the size in size.
+
+
+ .. Note::
+
+ This call is not used. Readdir operations exploit container
+ files. We will re-evaluate this during the directory revamp which is
+ about to take place.
+
+
+4.20. vget
+-----------
+
+
+ Summary
+ instructs Venus to do an FSDB->Get.
+
+ Arguments
+ in::
+
+ struct cfs_vget_in {
+ ViceFid VFid;
+ } cfs_vget;
+
+
+
+ out::
+
+ struct cfs_vget_out {
+ ViceFid VFid;
+ int vtype;
+ } cfs_vget;
+
+
+
+ Description
+ This upcall asks Venus to do a get operation on an fsobj
+ labelled by VFid.
+
+ .. Note::
+
+ This operation is not used. However, it is extremely useful
+ since it can be used to deal with read/write memory mapped files.
+ These can be "pinned" in the Venus cache using vget and released with
+ inactive.
+
+
+4.21. fsync
+------------
+
+
+ Summary
+ Tell Venus to update the RVM attributes of a file.
+
+ Arguments
+ in::
+
+ struct cfs_fsync_in {
+ ViceFid VFid;
+ } cfs_fsync;
+
+
+
+ out
+
+ none
+
+ Description
+ Ask Venus to update RVM attributes of object VFid. This
+ should be called as part of kernel level fsync type calls. The
+ result indicates if the syncing was successful.
+
+ .. Note:: Linux does not implement this call. It should.
+
+
+4.22. inactive
+---------------
+
+
+ Summary
+ Tell Venus a vnode is no longer in use.
+
+ Arguments
+ in::
+
+ struct cfs_inactive_in {
+ ViceFid VFid;
+ } cfs_inactive;
+
+
+
+ out
+
+ none
+
+ Description
+ This operation returns EOPNOTSUPP.
+
+ .. Note:: This should perhaps be removed.
+
+
+4.23. rdwr
+-----------
+
+
+ Summary
+ Read or write from a file
+
+ Arguments
+ in::
+
+ struct cfs_rdwr_in {
+ ViceFid VFid;
+ int rwflag;
+ int count;
+ int offset;
+ int ioflag;
+ caddr_t data; /* Place holder for data. */
+ } cfs_rdwr;
+
+
+
+
+ out::
+
+ struct cfs_rdwr_out {
+ int rwflag;
+ int count;
+ caddr_t data; /* Place holder for data. */
+ } cfs_rdwr;
+
+
+
+ Description
+ This upcall asks Venus to read or write from a file.
+
+
+ .. Note::
+
+ It should be removed since it is against the Coda philosophy that
+ read/write operations never reach Venus. I have been told the
+ operation does not work. It is not currently used.
+
+
+
+4.24. odymount
+---------------
+
+
+ Summary
+ Allows mounting multiple Coda "filesystems" on one Unix mount point.
+
+ Arguments
+ in::
+
+ struct ody_mount_in {
+ char *name; /* Place holder for data. */
+ } ody_mount;
+
+
+
+ out::
+
+ struct ody_mount_out {
+ ViceFid VFid;
+ } ody_mount;
+
+
+
+ Description
+ Asks Venus to return the rootfid of a Coda system named
+ name. The fid is returned in VFid.
+
+ .. Note::
+
+ This call was used by David for dynamic sets. It should be
+ removed since it causes a jungle of pointers in the VFS mounting area.
+ It is not used by Coda proper. Call is not implemented by Venus.
+
+
+4.25. ody_lookup
+-----------------
+
+
+ Summary
+ Looks up something.
+
+ Arguments
+ in
+
+ irrelevant
+
+
+ out
+
+ irrelevant
+
+
+ .. Note:: Gut it. Call is not implemented by Venus.
+
+
+4.26. ody_expand
+-----------------
+
+
+ Summary
+ expands something in a dynamic set.
+
+ Arguments
+ in
+
+ irrelevant
+
+ out
+
+ irrelevant
+
+ .. Note:: Gut it. Call is not implemented by Venus.
+
+
+4.27. prefetch
+---------------
+
+
+ Summary
+ Prefetch a dynamic set.
+
+ Arguments
+
+ in
+
+ Not documented.
+
+ out
+
+ Not documented.
+
+ Description
+ Venus worker.cc has support for this call, although it is
+ noted that it doesn't work. Not surprising, since the kernel does not
+ have support for it. (ODY_PREFETCH is not a defined operation).
+
+
+ .. Note:: Gut it. It isn't working and isn't used by Coda.
+
+
+
+4.28. signal
+-------------
+
+
+ Summary
+ Send Venus a signal about an upcall.
+
+ Arguments
+ in
+
+ none
+
+ out
+
+ not applicable.
+
+ Description
+ This is an out-of-band upcall to Venus to inform Venus
+ that the calling process received a signal after Venus read the
+ message from the input queue. Venus is supposed to clean up the
+ operation.
+
+ Errors
+ No reply is given.
+
+ .. Note::
+
+ We need to better understand what Venus needs to clean up and if
+ it is doing this correctly. Also we need to handle multiple upcall
+ per system call situations correctly. It would be important to know
+ what state changes in Venus take place after an upcall for which the
+ kernel is responsible for notifying Venus to clean up (e.g. open
+ definitely is such a state change, but many others are maybe not).
+
+
+5. The minicache and downcalls
+===============================
+
+
+ The Coda FS Driver can cache results of lookup and access upcalls, to
+ limit the frequency of upcalls. Upcalls carry a price since a process
+ context switch needs to take place. The counterpart of caching the
+ information is that Venus will notify the FS Driver that cached
+ entries must be flushed or renamed.
+
+ The kernel code generally has to maintain a structure which links the
+ internal file handles (called vnodes in BSD, inodes in Linux and
+ FileHandles in Windows) with the ViceFid's which Venus maintains. The
+ reason is that frequent translations back and forth are needed in
+ order to make upcalls and use the results of upcalls. Such linking
+ objects are called cnodes.
+
+ The current minicache implementations have cache entries which record
+ the following:
+
+ 1. the name of the file
+
+ 2. the cnode of the directory containing the object
+
+ 3. a list of CodaCred's for which the lookup is permitted.
+
+ 4. the cnode of the object
+
+ The lookup call in the Coda FS Driver may request the cnode of the
+ desired object from the cache, by passing its name, directory and the
+ CodaCred's of the caller. The cache will return the cnode or indicate
+ that it cannot be found. The Coda FS Driver must be careful to
+ invalidate cache entries when it modifies or removes objects.
+
+ When Venus obtains information that indicates that cache entries are
+ no longer valid, it will make a downcall to the kernel. Downcalls are
+ intercepted by the Coda FS Driver and lead to cache invalidations of
+ the kind described below. The Coda FS Driver does not return an error
+ unless the downcall data could not be read into kernel memory.
+
+
+5.1. INVALIDATE
+----------------
+
+
+ No information is available on this call.
+
+
+5.2. FLUSH
+-----------
+
+
+
+ Arguments
+ None
+
+ Summary
+ Flush the name cache entirely.
+
+ Description
+ Venus issues this call upon startup and when it dies. This
+ is to prevent stale cache information being held. Some operating
+ systems allow the kernel name cache to be switched off dynamically.
+ When this is done, this downcall is made.
+
+
+5.3. PURGEUSER
+---------------
+
+
+ Arguments
+ ::
+
+ struct cfs_purgeuser_out {/* CFS_PURGEUSER is a venus->kernel call */
+ struct CodaCred cred;
+ } cfs_purgeuser;
+
+
+
+ Description
+ Remove all entries in the cache carrying the Cred. This
+ call is issued when tokens for a user expire or are flushed.
+
+
+5.4. ZAPFILE
+-------------
+
+
+ Arguments
+ ::
+
+ struct cfs_zapfile_out { /* CFS_ZAPFILE is a venus->kernel call */
+ ViceFid CodaFid;
+ } cfs_zapfile;
+
+
+
+ Description
+ Remove all entries which have the (dir vnode, name) pair.
+ This is issued as a result of an invalidation of cached attributes of
+ a vnode.
+
+ .. Note::
+
+ Call is not named correctly in NetBSD and Mach. The minicache
+ zapfile routine takes different arguments. Linux does not implement
+ the invalidation of attributes correctly.
+
+
+
+5.5. ZAPDIR
+------------
+
+
+ Arguments
+ ::
+
+ struct cfs_zapdir_out { /* CFS_ZAPDIR is a venus->kernel call */
+ ViceFid CodaFid;
+ } cfs_zapdir;
+
+
+
+ Description
+ Remove all entries in the cache lying in a directory
+ CodaFid, and all children of this directory. This call is issued when
+ Venus receives a callback on the directory.
+
+
+5.6. ZAPVNODE
+--------------
+
+
+
+ Arguments
+ ::
+
+ struct cfs_zapvnode_out { /* CFS_ZAPVNODE is a venus->kernel call */
+ struct CodaCred cred;
+ ViceFid VFid;
+ } cfs_zapvnode;
+
+
+
+ Description
+ Remove all entries in the cache carrying the cred and VFid
+ as in the arguments. This downcall is probably never issued.
+
+
+5.7. PURGEFID
+--------------
+
+
+ Arguments
+ ::
+
+ struct cfs_purgefid_out { /* CFS_PURGEFID is a venus->kernel call */
+ ViceFid CodaFid;
+ } cfs_purgefid;
+
+
+
+ Description
+ Flush the attribute for the file. If it is a dir (odd
+ vnode), purge its children from the namecache and remove the file from the
+ namecache.
+
+
+
+5.8. REPLACE
+-------------
+
+
+ Summary
+ Replace the Fid's for a collection of names.
+
+ Arguments
+ ::
+
+ struct cfs_replace_out { /* cfs_replace is a venus->kernel call */
+ ViceFid NewFid;
+ ViceFid OldFid;
+ } cfs_replace;
+
+
+
+ Description
+ This routine replaces a ViceFid in the name cache with
+ another. It is added to allow Venus during reintegration to replace
+ locally allocated temp fids while disconnected with global fids even
+ when the reference counts on those fids are not zero.
+
+
+6. Initialization and cleanup
+==============================
+
+
+ This section gives brief hints as to desirable features for the Coda
+ FS Driver at startup and upon shutdown or Venus failures. Before
+ entering the discussion it is useful to repeat that the Coda FS Driver
+ maintains the following data:
+
+
+ 1. message queues
+
+ 2. cnodes
+
+ 3. name cache entries
+
+ The name cache entries are entirely private to the driver, so they
+ can easily be manipulated. The message queues will generally have
+ clear points of initialization and destruction. The cnodes are
+ much more delicate. User processes hold reference counts in Coda
+ filesystems and it can be difficult to clean up the cnodes.
+
+ It can expect requests through:
+
+ 1. the message subsystem
+
+ 2. the VFS layer
+
+ 3. pioctl interface
+
+ Currently the pioctl passes through the VFS for Coda so we can
+ treat these similarly.
+
+
+6.1. Requirements
+------------------
+
+
+ The following requirements should be accommodated:
+
+ 1. The message queues should have open and close routines. On Unix
+ the opening of the character devices are such routines.
+
+ - Before opening, no messages can be placed.
+
+ - Opening will remove any old messages still pending.
+
+ - Close will notify any sleeping processes that their upcall cannot
+ be completed.
+
+ - Close will free all memory allocated by the message queues.
+
+
+ 2. At open the namecache shall be initialized to empty state.
+
+ 3. Before the message queues are open, all VFS operations will fail.
+ Fortunately this can be achieved by making sure than mounting the
+ Coda filesystem cannot succeed before opening.
+
+ 4. After closing of the queues, no VFS operations can succeed. Here
+ one needs to be careful, since a few operations (lookup,
+ read/write, readdir) can proceed without upcalls. These must be
+ explicitly blocked.
+
+ 5. Upon closing the namecache shall be flushed and disabled.
+
+ 6. All memory held by cnodes can be freed without relying on upcalls.
+
+ 7. Unmounting the file system can be done without relying on upcalls.
+
+ 8. Mounting the Coda filesystem should fail gracefully if Venus cannot
+ get the rootfid or the attributes of the rootfid. The latter is
+ best implemented by Venus fetching these objects before attempting
+ to mount.
+
+ .. Note::
+
+ NetBSD in particular but also Linux have not implemented the
+ above requirements fully. For smooth operation this needs to be
+ corrected.
+
+
+
diff --git a/Documentation/filesystems/coda.txt b/Documentation/filesystems/coda.txt
deleted file mode 100644
index 1711ad48e38a..000000000000
--- a/Documentation/filesystems/coda.txt
+++ /dev/null
@@ -1,1676 +0,0 @@
-NOTE:
-This is one of the technical documents describing a component of
-Coda -- this document describes the client kernel-Venus interface.
-
-For more information:
- http://www.coda.cs.cmu.edu
-For user level software needed to run Coda:
- ftp://ftp.coda.cs.cmu.edu
-
-To run Coda you need to get a user level cache manager for the client,
-named Venus, as well as tools to manipulate ACLs, to log in, etc. The
-client needs to have the Coda filesystem selected in the kernel
-configuration.
-
-The server needs a user level server and at present does not depend on
-kernel support.
-
-
-
-
-
-
-
- The Venus kernel interface
- Peter J. Braam
- v1.0, Nov 9, 1997
-
- This document describes the communication between Venus and kernel
- level filesystem code needed for the operation of the Coda file sys-
- tem. This document version is meant to describe the current interface
- (version 1.0) as well as improvements we envisage.
- ______________________________________________________________________
-
- Table of Contents
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 1. Introduction
-
- 2. Servicing Coda filesystem calls
-
- 3. The message layer
-
- 3.1 Implementation details
-
- 4. The interface at the call level
-
- 4.1 Data structures shared by the kernel and Venus
- 4.2 The pioctl interface
- 4.3 root
- 4.4 lookup
- 4.5 getattr
- 4.6 setattr
- 4.7 access
- 4.8 create
- 4.9 mkdir
- 4.10 link
- 4.11 symlink
- 4.12 remove
- 4.13 rmdir
- 4.14 readlink
- 4.15 open
- 4.16 close
- 4.17 ioctl
- 4.18 rename
- 4.19 readdir
- 4.20 vget
- 4.21 fsync
- 4.22 inactive
- 4.23 rdwr
- 4.24 odymount
- 4.25 ody_lookup
- 4.26 ody_expand
- 4.27 prefetch
- 4.28 signal
-
- 5. The minicache and downcalls
-
- 5.1 INVALIDATE
- 5.2 FLUSH
- 5.3 PURGEUSER
- 5.4 ZAPFILE
- 5.5 ZAPDIR
- 5.6 ZAPVNODE
- 5.7 PURGEFID
- 5.8 REPLACE
-
- 6. Initialization and cleanup
-
- 6.1 Requirements
-
-
- ______________________________________________________________________
- 0wpage
-
- 11.. IInnttrroodduuccttiioonn
-
-
-
- A key component in the Coda Distributed File System is the cache
- manager, _V_e_n_u_s.
-
-
- When processes on a Coda enabled system access files in the Coda
- filesystem, requests are directed at the filesystem layer in the
- operating system. The operating system will communicate with Venus to
- service the request for the process. Venus manages a persistent
- client cache and makes remote procedure calls to Coda file servers and
- related servers (such as authentication servers) to service these
- requests it receives from the operating system. When Venus has
- serviced a request it replies to the operating system with appropriate
- return codes, and other data related to the request. Optionally the
- kernel support for Coda may maintain a minicache of recently processed
- requests to limit the number of interactions with Venus. Venus
- possesses the facility to inform the kernel when elements from its
- minicache are no longer valid.
-
- This document describes precisely this communication between the
- kernel and Venus. The definitions of so called upcalls and downcalls
- will be given with the format of the data they handle. We shall also
- describe the semantic invariants resulting from the calls.
-
- Historically Coda was implemented in a BSD file system in Mach 2.6.
- The interface between the kernel and Venus is very similar to the BSD
- VFS interface. Similar functionality is provided, and the format of
- the parameters and returned data is very similar to the BSD VFS. This
- leads to an almost natural environment for implementing a kernel-level
- filesystem driver for Coda in a BSD system. However, other operating
- systems such as Linux and Windows 95 and NT have virtual filesystem
- with different interfaces.
-
- To implement Coda on these systems some reverse engineering of the
- Venus/Kernel protocol is necessary. Also it came to light that other
- systems could profit significantly from certain small optimizations
- and modifications to the protocol. To facilitate this work as well as
- to make future ports easier, communication between Venus and the
- kernel should be documented in great detail. This is the aim of this
- document.
-
- 0wpage
-
- 22.. SSeerrvviicciinngg CCooddaa ffiilleessyysstteemm ccaallllss
-
- The service of a request for a Coda file system service originates in
- a process PP which accessing a Coda file. It makes a system call which
- traps to the OS kernel. Examples of such calls trapping to the kernel
- are _r_e_a_d_, _w_r_i_t_e_, _o_p_e_n_, _c_l_o_s_e_, _c_r_e_a_t_e_, _m_k_d_i_r_, _r_m_d_i_r_, _c_h_m_o_d in a Unix
- context. Similar calls exist in the Win32 environment, and are named
- _C_r_e_a_t_e_F_i_l_e_, .
-
- Generally the operating system handles the request in a virtual
- filesystem (VFS) layer, which is named I/O Manager in NT and IFS
- manager in Windows 95. The VFS is responsible for partial processing
- of the request and for locating the specific filesystem(s) which will
- service parts of the request. Usually the information in the path
- assists in locating the correct FS drivers. Sometimes after extensive
- pre-processing, the VFS starts invoking exported routines in the FS
- driver. This is the point where the FS specific processing of the
- request starts, and here the Coda specific kernel code comes into
- play.
-
- The FS layer for Coda must expose and implement several interfaces.
- First and foremost the VFS must be able to make all necessary calls to
- the Coda FS layer, so the Coda FS driver must expose the VFS interface
- as applicable in the operating system. These differ very significantly
- among operating systems, but share features such as facilities to
- read/write and create and remove objects. The Coda FS layer services
- such VFS requests by invoking one or more well defined services
- offered by the cache manager Venus. When the replies from Venus have
- come back to the FS driver, servicing of the VFS call continues and
- finishes with a reply to the kernel's VFS. Finally the VFS layer
- returns to the process.
-
- As a result of this design a basic interface exposed by the FS driver
- must allow Venus to manage message traffic. In particular Venus must
- be able to retrieve and place messages and to be notified of the
- arrival of a new message. The notification must be through a mechanism
- which does not block Venus since Venus must attend to other tasks even
- when no messages are waiting or being processed.
-
-
-
-
-
-
- Interfaces of the Coda FS Driver
-
- Furthermore the FS layer provides for a special path of communication
- between a user process and Venus, called the pioctl interface. The
- pioctl interface is used for Coda specific services, such as
- requesting detailed information about the persistent cache managed by
- Venus. Here the involvement of the kernel is minimal. It identifies
- the calling process and passes the information on to Venus. When
- Venus replies the response is passed back to the caller in unmodified
- form.
-
- Finally Venus allows the kernel FS driver to cache the results from
- certain services. This is done to avoid excessive context switches
- and results in an efficient system. However, Venus may acquire
- information, for example from the network which implies that cached
- information must be flushed or replaced. Venus then makes a downcall
- to the Coda FS layer to request flushes or updates in the cache. The
- kernel FS driver handles such requests synchronously.
-
- Among these interfaces the VFS interface and the facility to place,
- receive and be notified of messages are platform specific. We will
- not go into the calls exported to the VFS layer but we will state the
- requirements of the message exchange mechanism.
-
- 0wpage
-
- 33.. TThhee mmeessssaaggee llaayyeerr
-
-
-
- At the lowest level the communication between Venus and the FS driver
- proceeds through messages. The synchronization between processes
- requesting Coda file service and Venus relies on blocking and waking
- up processes. The Coda FS driver processes VFS- and pioctl-requests
- on behalf of a process P, creates messages for Venus, awaits replies
- and finally returns to the caller. The implementation of the exchange
- of messages is platform specific, but the semantics have (so far)
- appeared to be generally applicable. Data buffers are created by the
- FS Driver in kernel memory on behalf of P and copied to user memory in
- Venus.
-
- The FS Driver while servicing P makes upcalls to Venus. Such an
- upcall is dispatched to Venus by creating a message structure. The
- structure contains the identification of P, the message sequence
- number, the size of the request and a pointer to the data in kernel
- memory for the request. Since the data buffer is re-used to hold the
- reply from Venus, there is a field for the size of the reply. A flags
- field is used in the message to precisely record the status of the
- message. Additional platform dependent structures involve pointers to
- determine the position of the message on queues and pointers to
- synchronization objects. In the upcall routine the message structure
- is filled in, flags are set to 0, and it is placed on the _p_e_n_d_i_n_g
- queue. The routine calling upcall is responsible for allocating the
- data buffer; its structure will be described in the next section.
-
- A facility must exist to notify Venus that the message has been
- created, and implemented using available synchronization objects in
- the OS. This notification is done in the upcall context of the process
- P. When the message is on the pending queue, process P cannot proceed
- in upcall. The (kernel mode) processing of P in the filesystem
- request routine must be suspended until Venus has replied. Therefore
- the calling thread in P is blocked in upcall. A pointer in the
- message structure will locate the synchronization object on which P is
- sleeping.
-
- Venus detects the notification that a message has arrived, and the FS
- driver allow Venus to retrieve the message with a getmsg_from_kernel
- call. This action finishes in the kernel by putting the message on the
- queue of processing messages and setting flags to READ. Venus is
- passed the contents of the data buffer. The getmsg_from_kernel call
- now returns and Venus processes the request.
-
- At some later point the FS driver receives a message from Venus,
- namely when Venus calls sendmsg_to_kernel. At this moment the Coda FS
- driver looks at the contents of the message and decides if:
-
-
- +o the message is a reply for a suspended thread P. If so it removes
- the message from the processing queue and marks the message as
- WRITTEN. Finally, the FS driver unblocks P (still in the kernel
- mode context of Venus) and the sendmsg_to_kernel call returns to
- Venus. The process P will be scheduled at some point and continues
- processing its upcall with the data buffer replaced with the reply
- from Venus.
-
- +o The message is a _d_o_w_n_c_a_l_l. A downcall is a request from Venus to
- the FS Driver. The FS driver processes the request immediately
- (usually a cache eviction or replacement) and when it finishes
- sendmsg_to_kernel returns.
-
- Now P awakes and continues processing upcall. There are some
- subtleties to take account of. First P will determine if it was woken
- up in upcall by a signal from some other source (for example an
- attempt to terminate P) or as is normally the case by Venus in its
- sendmsg_to_kernel call. In the normal case, the upcall routine will
- deallocate the message structure and return. The FS routine can proceed
- with its processing.
-
-
-
-
-
-
-
- Sleeping and IPC arrangements
-
- In case P is woken up by a signal and not by Venus, it will first look
- at the flags field. If the message is not yet READ, the process P can
- handle its signal without notifying Venus. If Venus has READ, and
- the request should not be processed, P can send Venus a signal message
- to indicate that it should disregard the previous message. Such
- signals are put in the queue at the head, and read first by Venus. If
- the message is already marked as WRITTEN it is too late to stop the
- processing. The VFS routine will now continue. (-- If a VFS request
- involves more than one upcall, this can lead to complicated state, an
- extra field "handle_signals" could be added in the message structure
- to indicate points of no return have been passed.--)
-
-
-
- 33..11.. IImmpplleemmeennttaattiioonn ddeettaaiillss
-
- The Unix implementation of this mechanism has been through the
- implementation of a character device associated with Coda. Venus
- retrieves messages by doing a read on the device, replies are sent
- with a write and notification is through the select system call on the
- file descriptor for the device. The process P is kept waiting on an
- interruptible wait queue object.
-
- In Windows NT and the DPMI Windows 95 implementation a DeviceIoControl
- call is used. The DeviceIoControl call is designed to copy buffers
- from user memory to kernel memory with OPCODES. The sendmsg_to_kernel
- is issued as a synchronous call, while the getmsg_from_kernel call is
- asynchronous. Windows EventObjects are used for notification of
- message arrival. The process P is kept waiting on a KernelEvent
- object in NT and a semaphore in Windows 95.
-
- 0wpage
-
- 44.. TThhee iinntteerrffaaccee aatt tthhee ccaallll lleevveell
-
-
- This section describes the upcalls a Coda FS driver can make to Venus.
- Each of these upcalls make use of two structures: inputArgs and
- outputArgs. In pseudo BNF form the structures take the following
- form:
-
-
- struct inputArgs {
- u_long opcode;
- u_long unique; /* Keep multiple outstanding msgs distinct */
- u_short pid; /* Common to all */
- u_short pgid; /* Common to all */
- struct CodaCred cred; /* Common to all */
-
- <union "in" of call dependent parts of inputArgs>
- };
-
- struct outputArgs {
- u_long opcode;
- u_long unique; /* Keep multiple outstanding msgs distinct */
- u_long result;
-
- <union "out" of call dependent parts of inputArgs>
- };
-
-
-
- Before going on let us elucidate the role of the various fields. The
- inputArgs start with the opcode which defines the type of service
- requested from Venus. There are approximately 30 upcalls at present
- which we will discuss. The unique field labels the inputArg with a
- unique number which will identify the message uniquely. A process and
- process group id are passed. Finally the credentials of the caller
- are included.
-
- Before delving into the specific calls we need to discuss a variety of
- data structures shared by the kernel and Venus.
-
-
-
-
- 44..11.. DDaattaa ssttrruuccttuurreess sshhaarreedd bbyy tthhee kkeerrnneell aanndd VVeennuuss
-
-
- The CodaCred structure defines a variety of user and group ids as
- they are set for the calling process. The vuid_t and vgid_t are 32 bit
- unsigned integers. It also defines group membership in an array. On
- Unix the CodaCred has proven sufficient to implement good security
- semantics for Coda but the structure may have to undergo modification
- for the Windows environment when these mature.
-
- struct CodaCred {
- vuid_t cr_uid, cr_euid, cr_suid, cr_fsuid; /* Real, effective, set, fs uid */
- vgid_t cr_gid, cr_egid, cr_sgid, cr_fsgid; /* same for groups */
- vgid_t cr_groups[NGROUPS]; /* Group membership for caller */
- };
-
-
-
- NNOOTTEE It is questionable if we need CodaCreds in Venus. Finally Venus
- doesn't know about groups, although it does create files with the
- default uid/gid. Perhaps the list of group membership is superfluous.
-
-
- The next item is the fundamental identifier used to identify Coda
- files, the ViceFid. A fid of a file uniquely defines a file or
- directory in the Coda filesystem within a _c_e_l_l. (-- A _c_e_l_l is a
- group of Coda servers acting under the aegis of a single system
- control machine or SCM. See the Coda Administration manual for a
- detailed description of the role of the SCM.--)
-
-
- typedef struct ViceFid {
- VolumeId Volume;
- VnodeId Vnode;
- Unique_t Unique;
- } ViceFid;
-
-
-
- Each of the constituent fields: VolumeId, VnodeId and Unique_t are
- unsigned 32 bit integers. We envisage that a further field will need
- to be prefixed to identify the Coda cell; this will probably take the
- form of a Ipv6 size IP address naming the Coda cell through DNS.
-
- The next important structure shared between Venus and the kernel is
- the attributes of the file. The following structure is used to
- exchange information. It has room for future extensions such as
- support for device files (currently not present in Coda).
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- struct coda_timespec {
- int64_t tv_sec; /* seconds */
- long tv_nsec; /* nanoseconds */
- };
-
- struct coda_vattr {
- enum coda_vtype va_type; /* vnode type (for create) */
- u_short va_mode; /* files access mode and type */
- short va_nlink; /* number of references to file */
- vuid_t va_uid; /* owner user id */
- vgid_t va_gid; /* owner group id */
- long va_fsid; /* file system id (dev for now) */
- long va_fileid; /* file id */
- u_quad_t va_size; /* file size in bytes */
- long va_blocksize; /* blocksize preferred for i/o */
- struct coda_timespec va_atime; /* time of last access */
- struct coda_timespec va_mtime; /* time of last modification */
- struct coda_timespec va_ctime; /* time file changed */
- u_long va_gen; /* generation number of file */
- u_long va_flags; /* flags defined for file */
- dev_t va_rdev; /* device special file represents */
- u_quad_t va_bytes; /* bytes of disk space held by file */
- u_quad_t va_filerev; /* file modification number */
- u_int va_vaflags; /* operations flags, see below */
- long va_spare; /* remain quad aligned */
- };
-
-
-
-
- 44..22.. TThhee ppiiooccttll iinntteerrffaaccee
-
-
- Coda specific requests can be made by application through the pioctl
- interface. The pioctl is implemented as an ordinary ioctl on a
- fictitious file /coda/.CONTROL. The pioctl call opens this file, gets
- a file handle and makes the ioctl call. Finally it closes the file.
-
- The kernel involvement in this is limited to providing the facility to
- open and close and pass the ioctl message _a_n_d to verify that a path in
- the pioctl data buffers is a file in a Coda filesystem.
-
- The kernel is handed a data packet of the form:
-
- struct {
- const char *path;
- struct ViceIoctl vidata;
- int follow;
- } data;
-
-
-
- where
-
-
- struct ViceIoctl {
- caddr_t in, out; /* Data to be transferred in, or out */
- short in_size; /* Size of input buffer <= 2K */
- short out_size; /* Maximum size of output buffer, <= 2K */
- };
-
-
-
- The path must be a Coda file, otherwise the ioctl upcall will not be
- made.
-
- NNOOTTEE The data structures and code are a mess. We need to clean this
- up.
-
- We now proceed to document the individual calls:
-
- 0wpage
-
- 44..33.. rroooott
-
-
- AArrgguummeennttss
-
- iinn empty
-
- oouutt
-
- struct cfs_root_out {
- ViceFid VFid;
- } cfs_root;
-
-
-
- DDeessccrriippttiioonn This call is made to Venus during the initialization of
- the Coda filesystem. If the result is zero, the cfs_root structure
- contains the ViceFid of the root of the Coda filesystem. If a non-zero
- result is generated, its value is a platform dependent error code
- indicating the difficulty Venus encountered in locating the root of
- the Coda filesystem.
-
- 0wpage
-
- 44..44.. llooookkuupp
-
-
- SSuummmmaarryy Find the ViceFid and type of an object in a directory if it
- exists.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_lookup_in {
- ViceFid VFid;
- char *name; /* Place holder for data. */
- } cfs_lookup;
-
-
-
- oouutt
-
- struct cfs_lookup_out {
- ViceFid VFid;
- int vtype;
- } cfs_lookup;
-
-
-
- DDeessccrriippttiioonn This call is made to determine the ViceFid and filetype of
- a directory entry. The directory entry requested carries name name
- and Venus will search the directory identified by cfs_lookup_in.VFid.
- The result may indicate that the name does not exist, or that
- difficulty was encountered in finding it (e.g. due to disconnection).
- If the result is zero, the field cfs_lookup_out.VFid contains the
- targets ViceFid and cfs_lookup_out.vtype the coda_vtype giving the
- type of object the name designates.
-
- The name of the object is an 8 bit character string of maximum length
- CFS_MAXNAMLEN, currently set to 256 (including a 0 terminator.)
-
- It is extremely important to realize that Venus bitwise ors the field
- cfs_lookup.vtype with CFS_NOCACHE to indicate that the object should
- not be put in the kernel name cache.
-
- NNOOTTEE The type of the vtype is currently wrong. It should be
- coda_vtype. Linux does not take note of CFS_NOCACHE. It should.
-
- 0wpage
-
- 44..55.. ggeettaattttrr
-
-
- SSuummmmaarryy Get the attributes of a file.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_getattr_in {
- ViceFid VFid;
- struct coda_vattr attr; /* XXXXX */
- } cfs_getattr;
-
-
-
- oouutt
-
- struct cfs_getattr_out {
- struct coda_vattr attr;
- } cfs_getattr;
-
-
-
- DDeessccrriippttiioonn This call returns the attributes of the file identified by
- fid.
-
- EErrrroorrss Errors can occur if the object with fid does not exist, is
- unaccessible or if the caller does not have permission to fetch
- attributes.
-
- NNoottee Many kernel FS drivers (Linux, NT and Windows 95) need to acquire
- the attributes as well as the Fid for the instantiation of an internal
- "inode" or "FileHandle". A significant improvement in performance on
- such systems could be made by combining the _l_o_o_k_u_p and _g_e_t_a_t_t_r calls
- both at the Venus/kernel interaction level and at the RPC level.
-
- The vattr structure included in the input arguments is superfluous and
- should be removed.
-
- 0wpage
-
- 44..66.. sseettaattttrr
-
-
- SSuummmmaarryy Set the attributes of a file.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_setattr_in {
- ViceFid VFid;
- struct coda_vattr attr;
- } cfs_setattr;
-
-
-
-
- oouutt
- empty
-
- DDeessccrriippttiioonn The structure attr is filled with attributes to be changed
- in BSD style. Attributes not to be changed are set to -1, apart from
- vtype which is set to VNON. Other are set to the value to be assigned.
- The only attributes which the FS driver may request to change are the
- mode, owner, groupid, atime, mtime and ctime. The return value
- indicates success or failure.
-
- EErrrroorrss A variety of errors can occur. The object may not exist, may
- be inaccessible, or permission may not be granted by Venus.
-
- 0wpage
-
- 44..77.. aacccceessss
-
-
- SSuummmmaarryy
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_access_in {
- ViceFid VFid;
- int flags;
- } cfs_access;
-
-
-
- oouutt
- empty
-
- DDeessccrriippttiioonn Verify if access to the object identified by VFid for
- operations described by flags is permitted. The result indicates if
- access will be granted. It is important to remember that Coda uses
- ACLs to enforce protection and that ultimately the servers, not the
- clients enforce the security of the system. The result of this call
- will depend on whether a _t_o_k_e_n is held by the user.
-
- EErrrroorrss The object may not exist, or the ACL describing the protection
- may not be accessible.
-
- 0wpage
-
- 44..88.. ccrreeaattee
-
-
- SSuummmmaarryy Invoked to create a file
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_create_in {
- ViceFid VFid;
- struct coda_vattr attr;
- int excl;
- int mode;
- char *name; /* Place holder for data. */
- } cfs_create;
-
-
-
-
- oouutt
-
- struct cfs_create_out {
- ViceFid VFid;
- struct coda_vattr attr;
- } cfs_create;
-
-
-
- DDeessccrriippttiioonn This upcall is invoked to request creation of a file.
- The file will be created in the directory identified by VFid, its name
- will be name, and the mode will be mode. If excl is set an error will
- be returned if the file already exists. If the size field in attr is
- set to zero the file will be truncated. The uid and gid of the file
- are set by converting the CodaCred to a uid using a macro CRTOUID
- (this macro is platform dependent). Upon success the VFid and
- attributes of the file are returned. The Coda FS Driver will normally
- instantiate a vnode, inode or file handle at kernel level for the new
- object.
-
-
- EErrrroorrss A variety of errors can occur. Permissions may be insufficient.
- If the object exists and is not a file the error EISDIR is returned
- under Unix.
-
- NNOOTTEE The packing of parameters is very inefficient and appears to
- indicate confusion between the system call creat and the VFS operation
- create. The VFS operation create is only called to create new objects.
- This create call differs from the Unix one in that it is not invoked
- to return a file descriptor. The truncate and exclusive options,
- together with the mode, could simply be part of the mode as it is
- under Unix. There should be no flags argument; this is used in open
- (2) to return a file descriptor for READ or WRITE mode.
-
- The attributes of the directory should be returned too, since the size
- and mtime changed.
-
- 0wpage
-
- 44..99.. mmkkddiirr
-
-
- SSuummmmaarryy Create a new directory.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_mkdir_in {
- ViceFid VFid;
- struct coda_vattr attr;
- char *name; /* Place holder for data. */
- } cfs_mkdir;
-
-
-
- oouutt
-
- struct cfs_mkdir_out {
- ViceFid VFid;
- struct coda_vattr attr;
- } cfs_mkdir;
-
-
-
-
- DDeessccrriippttiioonn This call is similar to create but creates a directory.
- Only the mode field in the input parameters is used for creation.
- Upon successful creation, the attr returned contains the attributes of
- the new directory.
-
- EErrrroorrss As for create.
-
- NNOOTTEE The input parameter should be changed to mode instead of
- attributes.
-
- The attributes of the parent should be returned since the size and
- mtime changes.
-
- 0wpage
-
- 44..1100.. lliinnkk
-
-
- SSuummmmaarryy Create a link to an existing file.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_link_in {
- ViceFid sourceFid; /* cnode to link *to* */
- ViceFid destFid; /* Directory in which to place link */
- char *tname; /* Place holder for data. */
- } cfs_link;
-
-
-
- oouutt
- empty
-
- DDeessccrriippttiioonn This call creates a link to the sourceFid in the directory
- identified by destFid with name tname. The source must reside in the
- target's parent, i.e. the source must be have parent destFid, i.e. Coda
- does not support cross directory hard links. Only the return value is
- relevant. It indicates success or the type of failure.
-
- EErrrroorrss The usual errors can occur.0wpage
-
- 44..1111.. ssyymmlliinnkk
-
-
- SSuummmmaarryy create a symbolic link
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_symlink_in {
- ViceFid VFid; /* Directory to put symlink in */
- char *srcname;
- struct coda_vattr attr;
- char *tname;
- } cfs_symlink;
-
-
-
- oouutt
- none
-
- DDeessccrriippttiioonn Create a symbolic link. The link is to be placed in the
- directory identified by VFid and named tname. It should point to the
- pathname srcname. The attributes of the newly created object are to
- be set to attr.
-
- EErrrroorrss
-
- NNOOTTEE The attributes of the target directory should be returned since
- its size changed.
-
- 0wpage
-
- 44..1122.. rreemmoovvee
-
-
- SSuummmmaarryy Remove a file
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_remove_in {
- ViceFid VFid;
- char *name; /* Place holder for data. */
- } cfs_remove;
-
-
-
- oouutt
- none
-
- DDeessccrriippttiioonn Remove file named cfs_remove_in.name in directory
- identified by VFid.
-
- EErrrroorrss
-
- NNOOTTEE The attributes of the directory should be returned since its
- mtime and size may change.
-
- 0wpage
-
- 44..1133.. rrmmddiirr
-
-
- SSuummmmaarryy Remove a directory
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_rmdir_in {
- ViceFid VFid;
- char *name; /* Place holder for data. */
- } cfs_rmdir;
-
-
-
- oouutt
- none
-
- DDeessccrriippttiioonn Remove the directory with name name from the directory
- identified by VFid.
-
- EErrrroorrss
-
- NNOOTTEE The attributes of the parent directory should be returned since
- its mtime and size may change.
-
- 0wpage
-
- 44..1144.. rreeaaddlliinnkk
-
-
- SSuummmmaarryy Read the value of a symbolic link.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_readlink_in {
- ViceFid VFid;
- } cfs_readlink;
-
-
-
- oouutt
-
- struct cfs_readlink_out {
- int count;
- caddr_t data; /* Place holder for data. */
- } cfs_readlink;
-
-
-
- DDeessccrriippttiioonn This routine reads the contents of symbolic link
- identified by VFid into the buffer data. The buffer data must be able
- to hold any name up to CFS_MAXNAMLEN (PATH or NAM??).
-
- EErrrroorrss No unusual errors.
-
- 0wpage
-
- 44..1155.. ooppeenn
-
-
- SSuummmmaarryy Open a file.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_open_in {
- ViceFid VFid;
- int flags;
- } cfs_open;
-
-
-
- oouutt
-
- struct cfs_open_out {
- dev_t dev;
- ino_t inode;
- } cfs_open;
-
-
-
- DDeessccrriippttiioonn This request asks Venus to place the file identified by
- VFid in its cache and to note that the calling process wishes to open
- it with flags as in open(2). The return value to the kernel differs
- for Unix and Windows systems. For Unix systems the Coda FS Driver is
- informed of the device and inode number of the container file in the
- fields dev and inode. For Windows the path of the container file is
- returned to the kernel.
- EErrrroorrss
-
- NNOOTTEE Currently the cfs_open_out structure is not properly adapted to
- deal with the Windows case. It might be best to implement two
- upcalls, one to open aiming at a container file name, the other at a
- container file inode.
-
- 0wpage
-
- 44..1166.. cclloossee
-
-
- SSuummmmaarryy Close a file, update it on the servers.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_close_in {
- ViceFid VFid;
- int flags;
- } cfs_close;
-
-
-
- oouutt
- none
-
- DDeessccrriippttiioonn Close the file identified by VFid.
-
- EErrrroorrss
-
- NNOOTTEE The flags argument is bogus and not used. However, Venus' code
- has room to deal with an execp input field, probably this field should
- be used to inform Venus that the file was closed but is still memory
- mapped for execution. There are comments about fetching versus not
- fetching the data in Venus vproc_vfscalls. This seems silly. If a
- file is being closed, the data in the container file is to be the new
- data. Here again the execp flag might be in play to create confusion:
- currently Venus might think a file can be flushed from the cache when
- it is still memory mapped. This needs to be understood.
-
- 0wpage
-
- 44..1177.. iiooccttll
-
-
- SSuummmmaarryy Do an ioctl on a file. This includes the pioctl interface.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_ioctl_in {
- ViceFid VFid;
- int cmd;
- int len;
- int rwflag;
- char *data; /* Place holder for data. */
- } cfs_ioctl;
-
-
-
- oouutt
-
-
- struct cfs_ioctl_out {
- int len;
- caddr_t data; /* Place holder for data. */
- } cfs_ioctl;
-
-
-
- DDeessccrriippttiioonn Do an ioctl operation on a file. The command, len and
- data arguments are filled as usual. flags is not used by Venus.
-
- EErrrroorrss
-
- NNOOTTEE Another bogus parameter. flags is not used. What is the
- business about PREFETCHING in the Venus code?
-
-
- 0wpage
-
- 44..1188.. rreennaammee
-
-
- SSuummmmaarryy Rename a fid.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_rename_in {
- ViceFid sourceFid;
- char *srcname;
- ViceFid destFid;
- char *destname;
- } cfs_rename;
-
-
-
- oouutt
- none
-
- DDeessccrriippttiioonn Rename the object with name srcname in directory
- sourceFid to destname in destFid. It is important that the names
- srcname and destname are 0 terminated strings. Strings in Unix
- kernels are not always null terminated.
-
- EErrrroorrss
-
- 0wpage
-
- 44..1199.. rreeaaddddiirr
-
-
- SSuummmmaarryy Read directory entries.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_readdir_in {
- ViceFid VFid;
- int count;
- int offset;
- } cfs_readdir;
-
-
-
-
- oouutt
-
- struct cfs_readdir_out {
- int size;
- caddr_t data; /* Place holder for data. */
- } cfs_readdir;
-
-
-
- DDeessccrriippttiioonn Read directory entries from VFid starting at offset and
- read at most count bytes. Returns the data in data and returns
- the size in size.
-
- EErrrroorrss
-
- NNOOTTEE This call is not used. Readdir operations exploit container
- files. We will re-evaluate this during the directory revamp which is
- about to take place.
-
- 0wpage
-
- 44..2200.. vvggeett
-
-
- SSuummmmaarryy instructs Venus to do an FSDB->Get.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_vget_in {
- ViceFid VFid;
- } cfs_vget;
-
-
-
- oouutt
-
- struct cfs_vget_out {
- ViceFid VFid;
- int vtype;
- } cfs_vget;
-
-
-
- DDeessccrriippttiioonn This upcall asks Venus to do a get operation on an fsobj
- labelled by VFid.
-
- EErrrroorrss
-
- NNOOTTEE This operation is not used. However, it is extremely useful
- since it can be used to deal with read/write memory mapped files.
- These can be "pinned" in the Venus cache using vget and released with
- inactive.
-
- 0wpage
-
- 44..2211.. ffssyynncc
-
-
- SSuummmmaarryy Tell Venus to update the RVM attributes of a file.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_fsync_in {
- ViceFid VFid;
- } cfs_fsync;
-
-
-
- oouutt
- none
-
- DDeessccrriippttiioonn Ask Venus to update RVM attributes of object VFid. This
- should be called as part of kernel level fsync type calls. The
- result indicates if the syncing was successful.
-
- EErrrroorrss
-
- NNOOTTEE Linux does not implement this call. It should.
-
- 0wpage
-
- 44..2222.. iinnaaccttiivvee
-
-
- SSuummmmaarryy Tell Venus a vnode is no longer in use.
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_inactive_in {
- ViceFid VFid;
- } cfs_inactive;
-
-
-
- oouutt
- none
-
- DDeessccrriippttiioonn This operation returns EOPNOTSUPP.
-
- EErrrroorrss
-
- NNOOTTEE This should perhaps be removed.
-
- 0wpage
-
- 44..2233.. rrddwwrr
-
-
- SSuummmmaarryy Read or write from a file
-
- AArrgguummeennttss
-
- iinn
-
- struct cfs_rdwr_in {
- ViceFid VFid;
- int rwflag;
- int count;
- int offset;
- int ioflag;
- caddr_t data; /* Place holder for data. */
- } cfs_rdwr;
-
-
-
-
- oouutt
-
- struct cfs_rdwr_out {
- int rwflag;
- int count;
- caddr_t data; /* Place holder for data. */
- } cfs_rdwr;
-
-
-
- DDeessccrriippttiioonn This upcall asks Venus to read or write from a file.
-
- EErrrroorrss
-
- NNOOTTEE It should be removed since it is against the Coda philosophy that
- read/write operations never reach Venus. I have been told the
- operation does not work. It is not currently used.
-
-
- 0wpage
-
- 44..2244.. ooddyymmoouunntt
-
-
- SSuummmmaarryy Allows mounting multiple Coda "filesystems" on one Unix mount
- point.
-
- AArrgguummeennttss
-
- iinn
-
- struct ody_mount_in {
- char *name; /* Place holder for data. */
- } ody_mount;
-
-
-
- oouutt
-
- struct ody_mount_out {
- ViceFid VFid;
- } ody_mount;
-
-
-
- DDeessccrriippttiioonn Asks Venus to return the rootfid of a Coda system named
- name. The fid is returned in VFid.
-
- EErrrroorrss
-
- NNOOTTEE This call was used by David for dynamic sets. It should be
- removed since it causes a jungle of pointers in the VFS mounting area.
- It is not used by Coda proper. Call is not implemented by Venus.
-
- 0wpage
-
- 44..2255.. ooddyy__llooookkuupp
-
-
- SSuummmmaarryy Looks up something.
-
- AArrgguummeennttss
-
- iinn irrelevant
-
-
- oouutt
- irrelevant
-
- DDeessccrriippttiioonn
-
- EErrrroorrss
-
- NNOOTTEE Gut it. Call is not implemented by Venus.
-
- 0wpage
-
- 44..2266.. ooddyy__eexxppaanndd
-
-
- SSuummmmaarryy expands something in a dynamic set.
-
- AArrgguummeennttss
-
- iinn irrelevant
-
- oouutt
- irrelevant
-
- DDeessccrriippttiioonn
-
- EErrrroorrss
-
- NNOOTTEE Gut it. Call is not implemented by Venus.
-
- 0wpage
-
- 44..2277.. pprreeffeettcchh
-
-
- SSuummmmaarryy Prefetch a dynamic set.
-
- AArrgguummeennttss
-
- iinn Not documented.
-
- oouutt
- Not documented.
-
- DDeessccrriippttiioonn Venus worker.cc has support for this call, although it is
- noted that it doesn't work. Not surprising, since the kernel does not
- have support for it. (ODY_PREFETCH is not a defined operation).
-
- EErrrroorrss
-
- NNOOTTEE Gut it. It isn't working and isn't used by Coda.
-
-
- 0wpage
-
- 44..2288.. ssiiggnnaall
-
-
- SSuummmmaarryy Send Venus a signal about an upcall.
-
- AArrgguummeennttss
-
- iinn none
-
- oouutt
- not applicable.
-
- DDeessccrriippttiioonn This is an out-of-band upcall to Venus to inform Venus
- that the calling process received a signal after Venus read the
- message from the input queue. Venus is supposed to clean up the
- operation.
-
- EErrrroorrss No reply is given.
-
- NNOOTTEE We need to better understand what Venus needs to clean up and if
- it is doing this correctly. Also we need to handle multiple upcall
- per system call situations correctly. It would be important to know
- what state changes in Venus take place after an upcall for which the
- kernel is responsible for notifying Venus to clean up (e.g. open
- definitely is such a state change, but many others are maybe not).
-
- 0wpage
-
- 55.. TThhee mmiinniiccaacchhee aanndd ddoowwnnccaallllss
-
-
- The Coda FS Driver can cache results of lookup and access upcalls, to
- limit the frequency of upcalls. Upcalls carry a price since a process
- context switch needs to take place. The counterpart of caching the
- information is that Venus will notify the FS Driver that cached
- entries must be flushed or renamed.
-
- The kernel code generally has to maintain a structure which links the
- internal file handles (called vnodes in BSD, inodes in Linux and
- FileHandles in Windows) with the ViceFid's which Venus maintains. The
- reason is that frequent translations back and forth are needed in
- order to make upcalls and use the results of upcalls. Such linking
- objects are called ccnnooddeess.
-
- The current minicache implementations have cache entries which record
- the following:
-
- 1. the name of the file
-
- 2. the cnode of the directory containing the object
-
- 3. a list of CodaCred's for which the lookup is permitted.
-
- 4. the cnode of the object
-
- The lookup call in the Coda FS Driver may request the cnode of the
- desired object from the cache, by passing its name, directory and the
- CodaCred's of the caller. The cache will return the cnode or indicate
- that it cannot be found. The Coda FS Driver must be careful to
- invalidate cache entries when it modifies or removes objects.
-
- When Venus obtains information that indicates that cache entries are
- no longer valid, it will make a downcall to the kernel. Downcalls are
- intercepted by the Coda FS Driver and lead to cache invalidations of
- the kind described below. The Coda FS Driver does not return an error
- unless the downcall data could not be read into kernel memory.
-
-
- 55..11.. IINNVVAALLIIDDAATTEE
-
-
- No information is available on this call.
-
-
- 55..22.. FFLLUUSSHH
-
-
-
- AArrgguummeennttss None
-
- SSuummmmaarryy Flush the name cache entirely.
-
- DDeessccrriippttiioonn Venus issues this call upon startup and when it dies. This
- is to prevent stale cache information being held. Some operating
- systems allow the kernel name cache to be switched off dynamically.
- When this is done, this downcall is made.
-
-
- 55..33.. PPUURRGGEEUUSSEERR
-
-
- AArrgguummeennttss
-
- struct cfs_purgeuser_out {/* CFS_PURGEUSER is a venus->kernel call */
- struct CodaCred cred;
- } cfs_purgeuser;
-
-
-
- DDeessccrriippttiioonn Remove all entries in the cache carrying the Cred. This
- call is issued when tokens for a user expire or are flushed.
-
-
- 55..44.. ZZAAPPFFIILLEE
-
-
- AArrgguummeennttss
-
- struct cfs_zapfile_out { /* CFS_ZAPFILE is a venus->kernel call */
- ViceFid CodaFid;
- } cfs_zapfile;
-
-
-
- DDeessccrriippttiioonn Remove all entries which have the (dir vnode, name) pair.
- This is issued as a result of an invalidation of cached attributes of
- a vnode.
-
- NNOOTTEE Call is not named correctly in NetBSD and Mach. The minicache
- zapfile routine takes different arguments. Linux does not implement
- the invalidation of attributes correctly.
-
-
-
- 55..55.. ZZAAPPDDIIRR
-
-
- AArrgguummeennttss
-
- struct cfs_zapdir_out { /* CFS_ZAPDIR is a venus->kernel call */
- ViceFid CodaFid;
- } cfs_zapdir;
-
-
-
- DDeessccrriippttiioonn Remove all entries in the cache lying in a directory
- CodaFid, and all children of this directory. This call is issued when
- Venus receives a callback on the directory.
-
-
- 55..66.. ZZAAPPVVNNOODDEE
-
-
-
- AArrgguummeennttss
-
- struct cfs_zapvnode_out { /* CFS_ZAPVNODE is a venus->kernel call */
- struct CodaCred cred;
- ViceFid VFid;
- } cfs_zapvnode;
-
-
-
- DDeessccrriippttiioonn Remove all entries in the cache carrying the cred and VFid
- as in the arguments. This downcall is probably never issued.
-
-
- 55..77.. PPUURRGGEEFFIIDD
-
-
- SSuummmmaarryy
-
- AArrgguummeennttss
-
- struct cfs_purgefid_out { /* CFS_PURGEFID is a venus->kernel call */
- ViceFid CodaFid;
- } cfs_purgefid;
-
-
-
- DDeessccrriippttiioonn Flush the attribute for the file. If it is a dir (odd
- vnode), purge its children from the namecache and remove the file from the
- namecache.
-
-
-
- 55..88.. RREEPPLLAACCEE
-
-
- SSuummmmaarryy Replace the Fid's for a collection of names.
-
- AArrgguummeennttss
-
- struct cfs_replace_out { /* cfs_replace is a venus->kernel call */
- ViceFid NewFid;
- ViceFid OldFid;
- } cfs_replace;
-
-
-
- DDeessccrriippttiioonn This routine replaces a ViceFid in the name cache with
- another. It is added to allow Venus during reintegration to replace
- locally allocated temp fids while disconnected with global fids even
- when the reference counts on those fids are not zero.
-
- 0wpage
-
- 66.. IInniittiiaalliizzaattiioonn aanndd cclleeaannuupp
-
-
- This section gives brief hints as to desirable features for the Coda
- FS Driver at startup and upon shutdown or Venus failures. Before
- entering the discussion it is useful to repeat that the Coda FS Driver
- maintains the following data:
-
-
- 1. message queues
-
- 2. cnodes
-
- 3. name cache entries
-
- The name cache entries are entirely private to the driver, so they
- can easily be manipulated. The message queues will generally have
- clear points of initialization and destruction. The cnodes are
- much more delicate. User processes hold reference counts in Coda
- filesystems and it can be difficult to clean up the cnodes.
-
- It can expect requests through:
-
- 1. the message subsystem
-
- 2. the VFS layer
-
- 3. pioctl interface
-
- Currently the _p_i_o_c_t_l passes through the VFS for Coda so we can
- treat these similarly.
-
-
- 66..11.. RReeqquuiirreemmeennttss
-
-
- The following requirements should be accommodated:
-
- 1. The message queues should have open and close routines. On Unix
- the opening of the character devices are such routines.
-
- +o Before opening, no messages can be placed.
-
- +o Opening will remove any old messages still pending.
-
- +o Close will notify any sleeping processes that their upcall cannot
- be completed.
-
- +o Close will free all memory allocated by the message queues.
-
-
- 2. At open the namecache shall be initialized to empty state.
-
- 3. Before the message queues are open, all VFS operations will fail.
- Fortunately this can be achieved by making sure than mounting the
- Coda filesystem cannot succeed before opening.
-
- 4. After closing of the queues, no VFS operations can succeed. Here
- one needs to be careful, since a few operations (lookup,
- read/write, readdir) can proceed without upcalls. These must be
- explicitly blocked.
-
- 5. Upon closing the namecache shall be flushed and disabled.
-
- 6. All memory held by cnodes can be freed without relying on upcalls.
-
- 7. Unmounting the file system can be done without relying on upcalls.
-
- 8. Mounting the Coda filesystem should fail gracefully if Venus cannot
- get the rootfid or the attributes of the rootfid. The latter is
- best implemented by Venus fetching these objects before attempting
- to mount.
-
- NNOOTTEE NetBSD in particular but also Linux have not implemented the
- above requirements fully. For smooth operation this needs to be
- corrected.
-
-
-
diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs.rst
index 16e606c11f40..f8941954c667 100644
--- a/Documentation/filesystems/configfs/configfs.txt
+++ b/Documentation/filesystems/configfs.rst
@@ -1,5 +1,6 @@
-
-configfs - Userspace-driven kernel object configuration.
+=======================================================
+Configfs - Userspace-driven Kernel Object Configuration
+=======================================================
Joel Becker <joel.becker@oracle.com>
@@ -9,7 +10,8 @@ Copyright (c) 2005 Oracle Corporation,
Joel Becker <joel.becker@oracle.com>
-[What is configfs?]
+What is configfs?
+=================
configfs is a ram-based filesystem that provides the converse of
sysfs's functionality. Where sysfs is a filesystem-based view of
@@ -35,10 +37,11 @@ kernel modules backing the items must respond to this.
Both sysfs and configfs can and should exist together on the same
system. One is not a replacement for the other.
-[Using configfs]
+Using configfs
+==============
configfs can be compiled as a module or into the kernel. You can access
-it by doing
+it by doing::
mount -t configfs none /config
@@ -56,28 +59,29 @@ values. Don't mix more than one attribute in one attribute file.
There are two types of configfs attributes:
* Normal attributes, which similar to sysfs attributes, are small ASCII text
-files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
-only one value per file should be used, and the same caveats from sysfs apply.
-Configfs expects write(2) to store the entire buffer at once. When writing to
-normal configfs attributes, userspace processes should first read the entire
-file, modify the portions they wish to change, and then write the entire
-buffer back.
+ files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
+ only one value per file should be used, and the same caveats from sysfs apply.
+ Configfs expects write(2) to store the entire buffer at once. When writing to
+ normal configfs attributes, userspace processes should first read the entire
+ file, modify the portions they wish to change, and then write the entire
+ buffer back.
* Binary attributes, which are somewhat similar to sysfs binary attributes,
-but with a few slight changes to semantics. The PAGE_SIZE limitation does not
-apply, but the whole binary item must fit in single kernel vmalloc'ed buffer.
-The write(2) calls from user space are buffered, and the attributes'
-write_bin_attribute method will be invoked on the final close, therefore it is
-imperative for user-space to check the return code of close(2) in order to
-verify that the operation finished successfully.
-To avoid a malicious user OOMing the kernel, there's a per-binary attribute
-maximum buffer value.
+ but with a few slight changes to semantics. The PAGE_SIZE limitation does not
+ apply, but the whole binary item must fit in single kernel vmalloc'ed buffer.
+ The write(2) calls from user space are buffered, and the attributes'
+ write_bin_attribute method will be invoked on the final close, therefore it is
+ imperative for user-space to check the return code of close(2) in order to
+ verify that the operation finished successfully.
+ To avoid a malicious user OOMing the kernel, there's a per-binary attribute
+ maximum buffer value.
When an item needs to be destroyed, remove it with rmdir(2). An
item cannot be destroyed if any other item has a link to it (via
symlink(2)). Links can be removed via unlink(2).
-[Configuring FakeNBD: an Example]
+Configuring FakeNBD: an Example
+===============================
Imagine there's a Network Block Device (NBD) driver that allows you to
access remote block devices. Call it FakeNBD. FakeNBD uses configfs
@@ -86,14 +90,14 @@ sysadmins use to configure FakeNBD, but somehow that program has to tell
the driver about it. Here's where configfs comes in.
When the FakeNBD driver is loaded, it registers itself with configfs.
-readdir(3) sees this just fine:
+readdir(3) sees this just fine::
# ls /config
fakenbd
A fakenbd connection can be created with mkdir(2). The name is
arbitrary, but likely the tool will make some use of the name. Perhaps
-it is a uuid or a disk name:
+it is a uuid or a disk name::
# mkdir /config/fakenbd/disk1
# ls /config/fakenbd/disk1
@@ -102,7 +106,7 @@ it is a uuid or a disk name:
The target attribute contains the IP address of the server FakeNBD will
connect to. The device attribute is the device on the server.
Predictably, the rw attribute determines whether the connection is
-read-only or read-write.
+read-only or read-write::
# echo 10.0.0.1 > /config/fakenbd/disk1/target
# echo /dev/sda1 > /config/fakenbd/disk1/device
@@ -111,7 +115,8 @@ read-only or read-write.
That's it. That's all there is. Now the device is configured, via the
shell no less.
-[Coding With configfs]
+Coding With configfs
+====================
Every object in configfs is a config_item. A config_item reflects an
object in the subsystem. It has attributes that match values on that
@@ -130,7 +135,10 @@ appears as a directory at the top of the configfs filesystem. A
subsystem is also a config_group, and can do everything a config_group
can.
-[struct config_item]
+struct config_item
+==================
+
+::
struct config_item {
char *ci_name;
@@ -168,7 +176,10 @@ By itself, a config_item cannot do much more than appear in configfs.
Usually a subsystem wants the item to display and/or store attributes,
among other things. For that, it needs a type.
-[struct config_item_type]
+struct config_item_type
+=======================
+
+::
struct configfs_item_operations {
void (*release)(struct config_item *);
@@ -192,7 +203,10 @@ allocated dynamically will need to provide the ct_item_ops->release()
method. This method is called when the config_item's reference count
reaches zero.
-[struct configfs_attribute]
+struct configfs_attribute
+=========================
+
+::
struct configfs_attribute {
char *ca_name;
@@ -214,7 +228,10 @@ be called whenever userspace asks for a read(2) on the attribute. If an
attribute is writable and provides a ->store method, that method will be
be called whenever userspace asks for a write(2) on the attribute.
-[struct configfs_bin_attribute]
+struct configfs_bin_attribute
+=============================
+
+::
struct configfs_bin_attribute {
struct configfs_attribute cb_attr;
@@ -240,11 +257,12 @@ will happen for write(2). The reads/writes are bufferred so only a
single read/write will occur; the attributes' need not concern itself
with it.
-[struct config_group]
+struct config_group
+===================
A config_item cannot live in a vacuum. The only way one can be created
is via mkdir(2) on a config_group. This will trigger creation of a
-child item.
+child item::
struct config_group {
struct config_item cg_item;
@@ -264,7 +282,7 @@ The config_group structure contains a config_item. Properly configuring
that item means that a group can behave as an item in its own right.
However, it can do more: it can create child items or groups. This is
accomplished via the group operations specified on the group's
-config_item_type.
+config_item_type::
struct configfs_group_operations {
struct config_item *(*make_item)(struct config_group *group,
@@ -279,7 +297,8 @@ config_item_type.
};
A group creates child items by providing the
-ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new
+ct_group_ops->make_item() method. If provided, this method is called from
+mkdir(2) in the group's directory. The subsystem allocates a new
config_item (or more likely, its container structure), initializes it,
and returns it to configfs. Configfs will then populate the filesystem
tree to reflect the new item.
@@ -296,13 +315,14 @@ upon item allocation. If a subsystem has no work to do, it may omit
the ct_group_ops->drop_item() method, and configfs will call
config_item_put() on the item on behalf of the subsystem.
-IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2)
-is called, configfs WILL remove the item from the filesystem tree
-(assuming that it has no children to keep it busy). The subsystem is
-responsible for responding to this. If the subsystem has references to
-the item in other threads, the memory is safe. It may take some time
-for the item to actually disappear from the subsystem's usage. But it
-is gone from configfs.
+Important:
+ drop_item() is void, and as such cannot fail. When rmdir(2)
+ is called, configfs WILL remove the item from the filesystem tree
+ (assuming that it has no children to keep it busy). The subsystem is
+ responsible for responding to this. If the subsystem has references to
+ the item in other threads, the memory is safe. It may take some time
+ for the item to actually disappear from the subsystem's usage. But it
+ is gone from configfs.
When drop_item() is called, the item's linkage has already been torn
down. It no longer has a reference on its parent and has no place in
@@ -319,10 +339,11 @@ is implemented in the configfs rmdir(2) code. ->drop_item() will not be
called, as the item has not been dropped. rmdir(2) will fail, as the
directory is not empty.
-[struct configfs_subsystem]
+struct configfs_subsystem
+=========================
A subsystem must register itself, usually at module_init time. This
-tells configfs to make the subsystem appear in the file tree.
+tells configfs to make the subsystem appear in the file tree::
struct configfs_subsystem {
struct config_group su_group;
@@ -332,17 +353,19 @@ tells configfs to make the subsystem appear in the file tree.
int configfs_register_subsystem(struct configfs_subsystem *subsys);
void configfs_unregister_subsystem(struct configfs_subsystem *subsys);
- A subsystem consists of a toplevel config_group and a mutex.
+A subsystem consists of a toplevel config_group and a mutex.
The group is where child config_items are created. For a subsystem,
this group is usually defined statically. Before calling
configfs_register_subsystem(), the subsystem must have initialized the
group via the usual group _init() functions, and it must also have
initialized the mutex.
- When the register call returns, the subsystem is live, and it
+
+When the register call returns, the subsystem is live, and it
will be visible via configfs. At that point, mkdir(2) can be called and
the subsystem must be ready for it.
-[An Example]
+An Example
+==========
The best example of these basic concepts is the simple_children
subsystem/group and the simple_child item in
@@ -350,7 +373,8 @@ samples/configfs/configfs_sample.c. It shows a trivial object displaying
and storing an attribute, and a simple group creating and destroying
these children.
-[Hierarchy Navigation and the Subsystem Mutex]
+Hierarchy Navigation and the Subsystem Mutex
+============================================
There is an extra bonus that configfs provides. The config_groups and
config_items are arranged in a hierarchy due to the fact that they
@@ -375,7 +399,8 @@ be in its parent's cg_children list for the same duration. This allows
a subsystem to trust ci_parent and cg_children while they hold the
mutex.
-[Item Aggregation Via symlink(2)]
+Item Aggregation Via symlink(2)
+===============================
configfs provides a simple group via the group->item parent/child
relationship. Often, however, a larger environment requires aggregation
@@ -403,7 +428,8 @@ A config_item cannot be removed while it links to any other item, nor
can it be removed while an item links to it. Dangling symlinks are not
allowed in configfs.
-[Automatically Created Subgroups]
+Automatically Created Subgroups
+===============================
A new config_group may want to have two types of child config_items.
While this could be codified by magic names in ->make_item(), it is much
@@ -433,7 +459,8 @@ As a consequence of this, default groups cannot be removed directly via
rmdir(2). They also are not considered when rmdir(2) on the parent
group is checking for children.
-[Dependent Subsystems]
+Dependent Subsystems
+====================
Sometimes other drivers depend on particular configfs items. For
example, ocfs2 mounts depend on a heartbeat region item. If that
@@ -460,9 +487,11 @@ succeeds, then heartbeat knows the region is safe to give to ocfs2.
If it fails, it was being torn down anyway, and heartbeat can gracefully
pass up an error.
-[Committable Items]
+Committable Items
+=================
-NOTE: Committable items are currently unimplemented.
+Note:
+ Committable items are currently unimplemented.
Some config_items cannot have a valid initial state. That is, no
default values can be specified for the item's attributes such that the
@@ -504,5 +533,3 @@ As rmdir(2) does not work in the "live" directory, an item must be
shutdown, or "uncommitted". Again, this is done via rename(2), this
time from the "live" directory back to the "pending" one. The subsystem
is notified by the ct_group_ops->uncommit_object() method.
-
-
diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..8e2670781c9b 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -20,8 +20,144 @@ Usage
If you have a block device which supports DAX, you can make a filesystem
on it as usual. The DAX code currently only supports files with a block
size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem. When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the filesystem.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
+within the filesystem. It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+ the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
+ about this access mode.
+
+ 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
+ files and directories. This advisory flag can be set or cleared at any
+ time, but doing so does not immediately affect the S_DAX state.
+
+ 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
+ be inherited by all regular files and subdirectories that are subsequently
+ created in this directory. Files and subdirectories that exist at the time
+ this flag is set or cleared on the parent directory are not modified by
+ this modification of the parent directory.
+
+ 4. There exist dax mount options which can override FS_XFLAG_DAX in the
+ setting of the S_DAX flag. Given underlying storage which supports DAX the
+ following hold:
+
+ "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
+
+ "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+ "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
+
+ "-o dax" is a legacy option which is an alias for "dax=always".
+ This may be removed in the future so "-o dax=always" is
+ the preferred method for specifying this behavior.
+
+ NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
+ the same even when the filesystem is mounted with a dax option. However,
+ in-core inode state (S_DAX) will be overridden until the filesystem is
+ remounted with dax=inode and the inode is evicted from kernel memory.
+
+ 5. The S_DAX policy can be changed via:
+
+ a) Setting the parent directory FS_XFLAG_DAX as needed before files are
+ created
+
+ b) Setting the appropriate dax="foo" mount option
+
+ c) Changing the FS_XFLAG_DAX flag on existing regular files and
+ directories. This has runtime constraints and limitations that are
+ described in 6) below.
+
+ 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
+ the change in behaviour for existing regular files may not occur
+ immediately. If the change must take effect immediately, the administrator
+ needs to:
+
+ a) stop the application so there are no active references to the data set
+ the policy change will affect
+
+ b) evict the data set from kernel caches so it will be re-instantiated when
+ the application is restarted. This can be achieved by:
+
+ i. drop-caches
+ ii. a filesystem unmount and mount cycle
+ iii. a system reboot
+
+
+Details
+-------
+
+There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
+and the other is a volatile flag indicating the active state of the feature
+(S_DAX).
+
+FS_XFLAG_DAX is preserved within the filesystem. This persistent config
+setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
+(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
+
+New files and directories automatically inherit FS_XFLAG_DAX from
+their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.
+
+To clarify inheritance, here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+ dax: a,e
+ no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+ dax: a,b,c,d
+ no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' c
+mkdir a/b/c/d
+
+ dax: c,d
+ no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is instantiated in
+memory by the kernel. It is set based on the underlying media support, the
+value of FS_XFLAG_DAX and the filesystem's dax mount option.
+
+statx can be used to query S_DAX. NOTE that only regular files will ever have
+S_DAX set and therefore statx will never indicate that S_DAX is set on
+directories.
+
+Setting the FS_XFLAG_DAX flag (specifically or through inheritance) occurs even
+if the underlying media does not support dax and/or the filesystem is
+overridden with a mount option.
+
Implementation Tips for Block Driver Writers
@@ -74,7 +210,7 @@ are zeroed out and converted to written extents before being returned to avoid
exposure of uninitialized data through mmap.
These filesystems may be used for inspiration:
-- ext2: see Documentation/filesystems/ext2.txt
+- ext2: see Documentation/filesystems/ext2.rst
- ext4: see Documentation/filesystems/ext4/
- xfs: see Documentation/admin-guide/xfs.rst
@@ -94,7 +230,7 @@ sysadmins have an option to restore the lost data from a prior backup/inbuilt
redundancy in the following ways:
1. Delete the affected file, and restore from a backup (sysadmin route):
- This will free the file system blocks that were being used by the file,
+ This will free the filesystem blocks that were being used by the file,
and the next time they're allocated, they will be zeroed first, which
happens through the driver, and will clear bad sectors.
diff --git a/Documentation/filesystems/debugfs.rst b/Documentation/filesystems/debugfs.rst
index db9ea0854040..1da7a4b7383d 100644
--- a/Documentation/filesystems/debugfs.rst
+++ b/Documentation/filesystems/debugfs.rst
@@ -79,8 +79,8 @@ created with any of::
struct dentry *parent, u8 *value);
void debugfs_create_u16(const char *name, umode_t mode,
struct dentry *parent, u16 *value);
- struct dentry *debugfs_create_u32(const char *name, umode_t mode,
- struct dentry *parent, u32 *value);
+ void debugfs_create_u32(const char *name, umode_t mode,
+ struct dentry *parent, u32 *value);
void debugfs_create_u64(const char *name, umode_t mode,
struct dentry *parent, u64 *value);
@@ -166,16 +166,17 @@ file::
};
struct debugfs_regset32 {
- struct debugfs_reg32 *regs;
+ const struct debugfs_reg32 *regs;
int nregs;
void __iomem *base;
+ struct device *dev; /* Optional device for Runtime PM */
};
debugfs_create_regset32(const char *name, umode_t mode,
struct dentry *parent,
struct debugfs_regset32 *regset);
- void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs,
+ void debugfs_print_regs32(struct seq_file *s, const struct debugfs_reg32 *regs,
int nregs, void __iomem *base, char *prefix);
The "base" argument may be 0, but you may want to build the reg32 array
diff --git a/Documentation/filesystems/devpts.rst b/Documentation/filesystems/devpts.rst
new file mode 100644
index 000000000000..a03248ddfb4c
--- /dev/null
+++ b/Documentation/filesystems/devpts.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+The Devpts Filesystem
+=====================
+
+Each mount of the devpts filesystem is now distinct such that ptys
+and their indicies allocated in one mount are independent from ptys
+and their indicies in all other mounts.
+
+All mounts of the devpts filesystem now create a ``/dev/pts/ptmx`` node
+with permissions ``0000``.
+
+To retain backwards compatibility the a ptmx device node (aka any node
+created with ``mknod name c 5 2``) when opened will look for an instance
+of devpts under the name ``pts`` in the same directory as the ptmx device
+node.
+
+As an option instead of placing a ``/dev/ptmx`` device node at ``/dev/ptmx``
+it is possible to place a symlink to ``/dev/pts/ptmx`` at ``/dev/ptmx`` or
+to bind mount ``/dev/ptx/ptmx`` to ``/dev/ptmx``. If you opt for using
+the devpts filesystem in this manner devpts should be mounted with
+the ``ptmxmode=0666``, or ``chmod 0666 /dev/pts/ptmx`` should be called.
+
+Total count of pty pairs in all instances is limited by sysctls::
+
+ kernel.pty.max = 4096 - global limit
+ kernel.pty.reserve = 1024 - reserved for filesystems mounted from the initial mount namespace
+ kernel.pty.nr - current count of ptys
+
+Per-instance limit could be set by adding mount option ``max=<count>``.
+
+This feature was added in kernel 3.4 together with
+``sysctl kernel.pty.reserve``.
+
+In kernels older than 3.4 sysctl ``kernel.pty.max`` works as per-instance limit.
diff --git a/Documentation/filesystems/devpts.txt b/Documentation/filesystems/devpts.txt
deleted file mode 100644
index 9f94fe276dea..000000000000
--- a/Documentation/filesystems/devpts.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-Each mount of the devpts filesystem is now distinct such that ptys
-and their indicies allocated in one mount are independent from ptys
-and their indicies in all other mounts.
-
-All mounts of the devpts filesystem now create a /dev/pts/ptmx node
-with permissions 0000.
-
-To retain backwards compatibility the a ptmx device node (aka any node
-created with "mknod name c 5 2") when opened will look for an instance
-of devpts under the name "pts" in the same directory as the ptmx device
-node.
-
-As an option instead of placing a /dev/ptmx device node at /dev/ptmx
-it is possible to place a symlink to /dev/pts/ptmx at /dev/ptmx or
-to bind mount /dev/ptx/ptmx to /dev/ptmx. If you opt for using
-the devpts filesystem in this manner devpts should be mounted with
-the ptmxmode=0666, or chmod 0666 /dev/pts/ptmx should be called.
-
-Total count of pty pairs in all instances is limited by sysctls:
-kernel.pty.max = 4096 - global limit
-kernel.pty.reserve = 1024 - reserved for filesystems mounted from the initial mount namespace
-kernel.pty.nr - current count of ptys
-
-Per-instance limit could be set by adding mount option "max=<count>".
-This feature was added in kernel 3.4 together with sysctl kernel.pty.reserve.
-In kernels older than 3.4 sysctl kernel.pty.max works as per-instance limit.
diff --git a/Documentation/filesystems/dnotify.txt b/Documentation/filesystems/dnotify.rst
index 15156883d321..a28a1f9ef79c 100644
--- a/Documentation/filesystems/dnotify.txt
+++ b/Documentation/filesystems/dnotify.rst
@@ -1,5 +1,8 @@
- Linux Directory Notification
- ============================
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Linux Directory Notification
+============================
Stephen Rothwell <sfr@canb.auug.org.au>
@@ -12,6 +15,7 @@ being delivered using signals.
The application decides which "events" it wants to be notified about.
The currently defined events are:
+ ========= =====================================================
DN_ACCESS A file in the directory was accessed (read)
DN_MODIFY A file in the directory was modified (write,truncate)
DN_CREATE A file was created in the directory
@@ -19,6 +23,7 @@ The currently defined events are:
DN_RENAME A file in the directory was renamed
DN_ATTRIB A file in the directory had its attributes
changed (chmod,chown)
+ ========= =====================================================
Usually, the application must reregister after each notification, but
if DN_MULTISHOT is or'ed with the event mask, then the registration will
@@ -36,7 +41,7 @@ especially important if DN_MULTISHOT is specified. Note that SIGRTMIN
is often blocked, so it is better to use (at least) SIGRTMIN + 1.
Implementation expectations (features and bugs :-))
----------------------------
+---------------------------------------------------
The notification should work for any local access to files even if the
actual file system is on a remote server. This implies that remote
@@ -67,4 +72,4 @@ See tools/testing/selftests/filesystems/dnotify_test.c for an example.
NOTE
----
Beginning with Linux 2.6.13, dnotify has been replaced by inotify.
-See Documentation/filesystems/inotify.txt for more information on it.
+See Documentation/filesystems/inotify.rst for more information on it.
diff --git a/Documentation/filesystems/efivarfs.rst b/Documentation/filesystems/efivarfs.rst
index 90ac65683e7e..0551985821b8 100644
--- a/Documentation/filesystems/efivarfs.rst
+++ b/Documentation/filesystems/efivarfs.rst
@@ -24,3 +24,20 @@ files that are not well-known standardized variables are created
as immutable files. This doesn't prevent removal - "chattr -i" will work -
but it does prevent this kind of failure from being accomplished
accidentally.
+
+.. warning ::
+ When a content of an UEFI variable in /sys/firmware/efi/efivars is
+ displayed, for example using "hexdump", pay attention that the first
+ 4 bytes of the output represent the UEFI variable attributes,
+ in little-endian format.
+
+ Practically the output of each efivar is composed of:
+
+ +-----------------------------------+
+ |4_bytes_of_attributes + efivar_data|
+ +-----------------------------------+
+
+*See also:*
+
+- Documentation/admin-guide/acpi/ssdt-overlays.rst
+- Documentation/ABI/stable/sysfs-firmware-efi-vars
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 87d794bc75a4..4218ac658629 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -225,8 +225,12 @@ fsync_mode=%s Control the policy of fsync. Currently supports "posix",
pass, but the performance will regress. "nobarrier" is
based on "posix", but doesn't issue flush command for
non-atomic files likewise "nobarrier" mount option.
-test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt
+test_dummy_encryption
+test_dummy_encryption=%s
+ Enable dummy encryption, which provides a fake fscrypt
context. The fake fscrypt context is used by xfstests.
+ The argument may be either "v1" or "v2", in order to
+ select the corresponding fscrypt policy version.
checkpoint=%s[:%u[%]] Set to "disable" to turn off checkpointing. Set to "enable"
to reenable checkpointing. Is enabled by default. While
disabled, any unmounting or unexpected shutdowns will cause
diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.rst
index ac87e6fda842..2a572e7edc08 100644
--- a/Documentation/filesystems/fiemap.txt
+++ b/Documentation/filesystems/fiemap.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
============
Fiemap Ioctl
============
@@ -10,9 +12,9 @@ returns a list of extents.
Request Basics
--------------
-A fiemap request is encoded within struct fiemap:
+A fiemap request is encoded within struct fiemap::
-struct fiemap {
+ struct fiemap {
__u64 fm_start; /* logical offset (inclusive) at
* which to start mapping (in) */
__u64 fm_length; /* logical length of mapping which
@@ -23,7 +25,7 @@ struct fiemap {
__u32 fm_extent_count; /* size of fm_extents array (in) */
__u32 fm_reserved;
struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
-};
+ };
fm_start, and fm_length specify the logical range within the file
@@ -51,12 +53,12 @@ nothing to prevent the file from changing between calls to FIEMAP.
The following flags can be set in fm_flags:
-* FIEMAP_FLAG_SYNC
-If this flag is set, the kernel will sync the file before mapping extents.
+FIEMAP_FLAG_SYNC
+ If this flag is set, the kernel will sync the file before mapping extents.
-* FIEMAP_FLAG_XATTR
-If this flag is set, the extents returned will describe the inodes
-extended attribute lookup tree, instead of its data tree.
+FIEMAP_FLAG_XATTR
+ If this flag is set, the extents returned will describe the inodes
+ extended attribute lookup tree, instead of its data tree.
Extent Mapping
@@ -75,18 +77,18 @@ complete the requested range and will not have the FIEMAP_EXTENT_LAST
flag set (see the next section on extent flags).
Each extent is described by a single fiemap_extent structure as
-returned in fm_extents.
-
-struct fiemap_extent {
- __u64 fe_logical; /* logical offset in bytes for the start of
- * the extent */
- __u64 fe_physical; /* physical offset in bytes for the start
- * of the extent */
- __u64 fe_length; /* length in bytes for the extent */
- __u64 fe_reserved64[2];
- __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
- __u32 fe_reserved[3];
-};
+returned in fm_extents::
+
+ struct fiemap_extent {
+ __u64 fe_logical; /* logical offset in bytes for the start of
+ * the extent */
+ __u64 fe_physical; /* physical offset in bytes for the start
+ * of the extent */
+ __u64 fe_length; /* length in bytes for the extent */
+ __u64 fe_reserved64[2];
+ __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
+ __u32 fe_reserved[3];
+ };
All offsets and lengths are in bytes and mirror those on disk. It is valid
for an extents logical offset to start before the request or its logical
@@ -114,26 +116,27 @@ worry about all present and future flags which might imply unaligned
data. Note that the opposite is not true - it would be valid for
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
-* FIEMAP_EXTENT_LAST
-This is generally the last extent in the file. A mapping attempt past
-this extent may return nothing. Some implementations set this flag to
-indicate this extent is the last one in the range queried by the user
-(via fiemap->fm_length).
+FIEMAP_EXTENT_LAST
+ This is generally the last extent in the file. A mapping attempt past
+ this extent may return nothing. Some implementations set this flag to
+ indicate this extent is the last one in the range queried by the user
+ (via fiemap->fm_length).
+
+FIEMAP_EXTENT_UNKNOWN
+ The location of this extent is currently unknown. This may indicate
+ the data is stored on an inaccessible volume or that no storage has
+ been allocated for the file yet.
-* FIEMAP_EXTENT_UNKNOWN
-The location of this extent is currently unknown. This may indicate
-the data is stored on an inaccessible volume or that no storage has
-been allocated for the file yet.
+FIEMAP_EXTENT_DELALLOC
+ This will also set FIEMAP_EXTENT_UNKNOWN.
-* FIEMAP_EXTENT_DELALLOC
- - This will also set FIEMAP_EXTENT_UNKNOWN.
-Delayed allocation - while there is data for this extent, its
-physical location has not been allocated yet.
+ Delayed allocation - while there is data for this extent, its
+ physical location has not been allocated yet.
-* FIEMAP_EXTENT_ENCODED
-This extent does not consist of plain filesystem blocks but is
-encoded (e.g. encrypted or compressed). Reading the data in this
-extent via I/O to the block device will have undefined results.
+FIEMAP_EXTENT_ENCODED
+ This extent does not consist of plain filesystem blocks but is
+ encoded (e.g. encrypted or compressed). Reading the data in this
+ extent via I/O to the block device will have undefined results.
Note that it is *always* undefined to try to update the data
in-place by writing to the indicated location without the
@@ -145,32 +148,32 @@ unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
clear; user applications must not try reading or writing to the
filesystem via the block device under any other circumstances.
-* FIEMAP_EXTENT_DATA_ENCRYPTED
- - This will also set FIEMAP_EXTENT_ENCODED
-The data in this extent has been encrypted by the file system.
+FIEMAP_EXTENT_DATA_ENCRYPTED
+ This will also set FIEMAP_EXTENT_ENCODED
+ The data in this extent has been encrypted by the file system.
-* FIEMAP_EXTENT_NOT_ALIGNED
-Extent offsets and length are not guaranteed to be block aligned.
+FIEMAP_EXTENT_NOT_ALIGNED
+ Extent offsets and length are not guaranteed to be block aligned.
-* FIEMAP_EXTENT_DATA_INLINE
+FIEMAP_EXTENT_DATA_INLINE
This will also set FIEMAP_EXTENT_NOT_ALIGNED
-Data is located within a meta data block.
+ Data is located within a meta data block.
-* FIEMAP_EXTENT_DATA_TAIL
+FIEMAP_EXTENT_DATA_TAIL
This will also set FIEMAP_EXTENT_NOT_ALIGNED
-Data is packed into a block with data from other files.
+ Data is packed into a block with data from other files.
-* FIEMAP_EXTENT_UNWRITTEN
-Unwritten extent - the extent is allocated but its data has not been
-initialized. This indicates the extent's data will be all zero if read
-through the filesystem but the contents are undefined if read directly from
-the device.
+FIEMAP_EXTENT_UNWRITTEN
+ Unwritten extent - the extent is allocated but its data has not been
+ initialized. This indicates the extent's data will be all zero if read
+ through the filesystem but the contents are undefined if read directly from
+ the device.
-* FIEMAP_EXTENT_MERGED
-This will be set when a file does not support extents, i.e., it uses a block
-based addressing scheme. Since returning an extent for each block back to
-userspace would be highly inefficient, the kernel will try to merge most
-adjacent blocks into 'extents'.
+FIEMAP_EXTENT_MERGED
+ This will be set when a file does not support extents, i.e., it uses a block
+ based addressing scheme. Since returning an extent for each block back to
+ userspace would be highly inefficient, the kernel will try to merge most
+ adjacent blocks into 'extents'.
VFS -> File System Implementation
@@ -179,23 +182,23 @@ VFS -> File System Implementation
File systems wishing to support fiemap must implement a ->fiemap callback on
their inode_operations structure. The fs ->fiemap call is responsible for
defining its set of supported fiemap flags, and calling a helper function on
-each discovered extent:
+each discovered extent::
-struct inode_operations {
+ struct inode_operations {
...
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
u64 len);
->fiemap is passed struct fiemap_extent_info which describes the
-fiemap request:
+fiemap request::
-struct fiemap_extent_info {
+ struct fiemap_extent_info {
unsigned int fi_flags; /* Flags as passed from user */
unsigned int fi_extents_mapped; /* Number of mapped extents */
unsigned int fi_extents_max; /* Size of fiemap_extent array */
struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
-};
+ };
It is intended that the file system should not need to access any of this
structure directly. Filesystem handlers should be tolerant to signals and return
@@ -203,9 +206,9 @@ EINTR once fatal signal received.
Flag checking should be done at the beginning of the ->fiemap callback via the
-fiemap_check_flags() helper:
+fiemap_check_flags() helper::
-int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
+ int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
The struct fieinfo should be passed in as received from ioctl_fiemap(). The
set of fiemap flags which the fs understands should be passed via fs_flags. If
@@ -216,10 +219,10 @@ ioctl_fiemap().
For each extent in the request range, the file system should call
-the helper function, fiemap_fill_next_extent():
+the helper function, fiemap_fill_next_extent()::
-int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
- u64 phys, u64 len, u32 flags, u32 dev);
+ int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
+ u64 phys, u64 len, u32 flags, u32 dev);
fiemap_fill_next_extent() will use the passed values to populate the
next free extent in the fm_extents array. 'General' extent flags will
diff --git a/Documentation/filesystems/files.txt b/Documentation/filesystems/files.rst
index 46dfc6b038c3..cbf8e57376bf 100644
--- a/Documentation/filesystems/files.txt
+++ b/Documentation/filesystems/files.rst
@@ -1,5 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
File management in the Linux kernel
------------------------------------
+===================================
This document describes how locking for files (struct file)
and file descriptor table (struct files) works.
@@ -34,7 +37,7 @@ appear atomic. Here are the locking rules for
the fdtable structure -
1. All references to the fdtable must be done through
- the files_fdtable() macro :
+ the files_fdtable() macro::
struct fdtable *fdt;
@@ -61,7 +64,8 @@ the fdtable structure -
4. To look up the file structure given an fd, a reader
must use either fcheck() or fcheck_files() APIs. These
take care of barrier requirements due to lock-free lookup.
- An example :
+
+ An example::
struct file *file;
@@ -77,7 +81,7 @@ the fdtable structure -
of the fd (fget()/fget_light()) are lock-free, it is possible
that look-up may race with the last put() operation on the
file structure. This is avoided using atomic_long_inc_not_zero()
- on ->f_count :
+ on ->f_count::
rcu_read_lock();
file = fcheck_files(files, fd);
@@ -106,7 +110,8 @@ the fdtable structure -
holding files->file_lock. If ->file_lock is dropped, then
another thread expand the files thereby creating a new
fdtable and making the earlier fdtable pointer stale.
- For example :
+
+ For example::
spin_lock(&files->file_lock);
fd = locate_fd(files, file, start);
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index aa072112cfff..f517af8ec11c 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -292,8 +292,22 @@ files' data differently, inode numbers are included in the IVs.
Consequently, shrinking the filesystem may not be allowed.
This format is optimized for use with inline encryption hardware
-compliant with the UFS or eMMC standards, which support only 64 IV
-bits per I/O request and may have only a small number of keyslots.
+compliant with the UFS standard, which supports only 64 IV bits per
+I/O request and may have only a small number of keyslots.
+
+IV_INO_LBLK_32 policies
+-----------------------
+
+IV_INO_LBLK_32 policies work like IV_INO_LBLK_64, except that for
+IV_INO_LBLK_32, the inode number is hashed with SipHash-2-4 (where the
+SipHash key is derived from the master key) and added to the file
+logical block number mod 2^32 to produce a 32-bit IV.
+
+This format is optimized for use with inline encryption hardware
+compliant with the eMMC v5.2 standard, which supports only 32 IV bits
+per I/O request and may have only a small number of keyslots. This
+format results in some level of IV reuse, so it should only be used
+when necessary due to hardware limitations.
Key identifiers
---------------
@@ -369,6 +383,10 @@ a little endian number, except that:
to 32 bits and is placed in bits 0-31 of the IV. The inode number
(which is also limited to 32 bits) is placed in bits 32-63.
+- With `IV_INO_LBLK_32 policies`_, the logical block number is limited
+ to 32 bits and is placed in bits 0-31 of the IV. The inode number
+ is then hashed and added mod 2^32.
+
Note that because file logical block numbers are included in the IVs,
filesystems must enforce that blocks are never shifted around within
encrypted files, e.g. via "collapse range" or "insert range".
@@ -465,8 +483,15 @@ This structure must be initialized as follows:
(0x3).
- FSCRYPT_POLICY_FLAG_DIRECT_KEY: See `DIRECT_KEY policies`_.
- FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64: See `IV_INO_LBLK_64
- policies`_. This is mutually exclusive with DIRECT_KEY and is not
- supported on v1 policies.
+ policies`_.
+ - FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32: See `IV_INO_LBLK_32
+ policies`_.
+
+ v1 encryption policies only support the PAD_* and DIRECT_KEY flags.
+ The other flags are only supported by v2 encryption policies.
+
+ The DIRECT_KEY, IV_INO_LBLK_64, and IV_INO_LBLK_32 flags are
+ mutually exclusive.
- For v2 encryption policies, ``__reserved`` must be zeroed.
diff --git a/Documentation/filesystems/fuse-io.txt b/Documentation/filesystems/fuse-io.rst
index 07b8f73f100f..255a368fe534 100644
--- a/Documentation/filesystems/fuse-io.txt
+++ b/Documentation/filesystems/fuse-io.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+Fuse I/O Modes
+==============
+
Fuse supports the following I/O modes:
- direct-io
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index e7b46dac7079..17795341e0a3 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -24,6 +24,22 @@ algorithms work.
splice
locking
directory-locking
+ devpts
+ dnotify
+ fiemap
+ files
+ locks
+ mandatory-locking
+ mount_api
+ quota
+ seq_file
+ sharedsubtree
+ sysfs-pci
+ sysfs-tagging
+
+ automount-support
+
+ caching/index
porting
@@ -57,7 +73,10 @@ Documentation for filesystem implementations.
befs
bfs
btrfs
+ cifs/cifsroot
ceph
+ coda
+ configfs
cramfs
debugfs
dlmfs
@@ -73,6 +92,7 @@ Documentation for filesystem implementations.
hfsplus
hpfs
fuse
+ fuse-io
inotify
isofs
nilfs2
@@ -88,6 +108,7 @@ Documentation for filesystem implementations.
ramfs-rootfs-initramfs
relay
romfs
+ spufs/index
squashfs
sysfs
sysv-fs
@@ -97,4 +118,6 @@ Documentation for filesystem implementations.
udf
virtiofs
vfat
+ xfs-delayed-logging-design
+ xfs-self-describing-metadata
zonefs
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 5057e4d9dcd1..0af2e0e11461 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -239,6 +239,7 @@ prototypes::
int (*readpage)(struct file *, struct page *);
int (*writepages)(struct address_space *, struct writeback_control *);
int (*set_page_dirty)(struct page *page);
+ void (*readahead)(struct readahead_control *);
int (*readpages)(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages);
int (*write_begin)(struct file *, struct address_space *mapping,
@@ -271,7 +272,8 @@ writepage: yes, unlocks (see below)
readpage: yes, unlocks
writepages:
set_page_dirty no
-readpages:
+readahead: yes, unlocks
+readpages: no
write_begin: locks the page exclusive
write_end: yes, unlocks exclusive
bmap:
@@ -295,6 +297,8 @@ the request handler (/dev/loop).
->readpage() unlocks the page, either synchronously or via I/O
completion.
+->readahead() unlocks the pages that I/O is attempted on like ->readpage().
+
->readpages() populates the pagecache with the passed pages and starts
I/O against them. They come unlocked upon I/O completion.
diff --git a/Documentation/filesystems/locks.txt b/Documentation/filesystems/locks.rst
index 5368690f412e..c5ae858b1aac 100644
--- a/Documentation/filesystems/locks.txt
+++ b/Documentation/filesystems/locks.rst
@@ -1,4 +1,8 @@
- File Locking Release Notes
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+File Locking Release Notes
+==========================
Andy Walker <andy@lysaker.kvaerner.no>
@@ -6,7 +10,7 @@
1. What's New?
---------------
+==============
1.1 Broken Flock Emulation
--------------------------
@@ -25,7 +29,7 @@ anyway (see the file "Documentation/process/changes.rst".)
---------------------------
1.2.1 Typical Problems - Sendmail
----------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because sendmail was unable to use the old flock() emulation, many sendmail
installations use fcntl() instead of flock(). This is true of Slackware 3.0
for example. This gave rise to some other subtle problems if sendmail was
@@ -37,7 +41,7 @@ to lock solid with deadlocked processes.
1.2.2 The Solution
-------------------
+^^^^^^^^^^^^^^^^^^
The solution I have chosen, after much experimentation and discussion,
is to make flock() and fcntl() locks oblivious to each other. Both can
exists, and neither will have any effect on the other.
@@ -54,7 +58,7 @@ fcntl(), with all the problems that implies.
---------------------------------------
Mandatory locking, as described in
-'Documentation/filesystems/mandatory-locking.txt' was prior to this release a
+'Documentation/filesystems/mandatory-locking.rst' was prior to this release a
general configuration option that was valid for all mounted filesystems. This
had a number of inherent dangers, not the least of which was the ability to
freeze an NFS server by asking it to read a file for which a mandatory lock
diff --git a/Documentation/filesystems/mandatory-locking.txt b/Documentation/filesystems/mandatory-locking.rst
index a251ca33164a..9ce73544a8f0 100644
--- a/Documentation/filesystems/mandatory-locking.txt
+++ b/Documentation/filesystems/mandatory-locking.rst
@@ -1,8 +1,13 @@
- Mandatory File Locking For The Linux Operating System
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================================
+Mandatory File Locking For The Linux Operating System
+=====================================================
Andy Walker <andy@lysaker.kvaerner.no>
15 April 1996
+
(Updated September 2007)
0. Why you should avoid mandatory locking
@@ -53,15 +58,17 @@ possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.
-Note 1: In saying "file" in the paragraphs above I am actually not telling
-the whole truth. System V locking is based on fcntl(). The granularity of
-fcntl() is such that it allows the locking of byte ranges in files, in addition
-to entire files, so the mandatory locking rules also have byte level
-granularity.
+.. Note::
+
+ 1. In saying "file" in the paragraphs above I am actually not telling
+ the whole truth. System V locking is based on fcntl(). The granularity of
+ fcntl() is such that it allows the locking of byte ranges in files, in
+ addition to entire files, so the mandatory locking rules also have byte
+ level granularity.
-Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
-borrowing the fcntl() locking scheme from System V. The mandatory locking
-scheme is defined by the System V Interface Definition (SVID) Version 3.
+ 2. POSIX.1 does not specify any scheme for mandatory locking, despite
+ borrowing the fcntl() locking scheme from System V. The mandatory locking
+ scheme is defined by the System V Interface Definition (SVID) Version 3.
2. Marking a file for mandatory locking
---------------------------------------
diff --git a/Documentation/filesystems/mount_api.txt b/Documentation/filesystems/mount_api.rst
index 87c14bbb2b35..dea22d64f060 100644
--- a/Documentation/filesystems/mount_api.txt
+++ b/Documentation/filesystems/mount_api.rst
@@ -1,8 +1,10 @@
- ====================
- FILESYSTEM MOUNT API
- ====================
+.. SPDX-License-Identifier: GPL-2.0
-CONTENTS
+====================
+fILESYSTEM Mount API
+====================
+
+.. CONTENTS
(1) Overview.
@@ -21,8 +23,7 @@ CONTENTS
(8) Parameter helper functions.
-========
-OVERVIEW
+Overview
========
The creation of new mounts is now to be done in a multistep process:
@@ -43,7 +44,7 @@ The creation of new mounts is now to be done in a multistep process:
(7) Destroy the context.
-To support this, the file_system_type struct gains two new fields:
+To support this, the file_system_type struct gains two new fields::
int (*init_fs_context)(struct fs_context *fc);
const struct fs_parameter_description *parameters;
@@ -57,12 +58,11 @@ Note that security initialisation is done *after* the filesystem is called so
that the namespaces may be adjusted first.
-======================
-THE FILESYSTEM CONTEXT
+The Filesystem context
======================
The creation and reconfiguration of a superblock is governed by a filesystem
-context. This is represented by the fs_context structure:
+context. This is represented by the fs_context structure::
struct fs_context {
const struct fs_context_operations *ops;
@@ -86,78 +86,106 @@ context. This is represented by the fs_context structure:
The fs_context fields are as follows:
- (*) const struct fs_context_operations *ops
+ * ::
+
+ const struct fs_context_operations *ops
These are operations that can be done on a filesystem context (see
below). This must be set by the ->init_fs_context() file_system_type
operation.
- (*) struct file_system_type *fs_type
+ * ::
+
+ struct file_system_type *fs_type
A pointer to the file_system_type of the filesystem that is being
constructed or reconfigured. This retains a reference on the type owner.
- (*) void *fs_private
+ * ::
+
+ void *fs_private
A pointer to the file system's private data. This is where the filesystem
will need to store any options it parses.
- (*) struct dentry *root
+ * ::
+
+ struct dentry *root
A pointer to the root of the mountable tree (and indirectly, the
superblock thereof). This is filled in by the ->get_tree() op. If this
is set, an active reference on root->d_sb must also be held.
- (*) struct user_namespace *user_ns
- (*) struct net *net_ns
+ * ::
+
+ struct user_namespace *user_ns
+ struct net *net_ns
There are a subset of the namespaces in use by the invoking process. They
retain references on each namespace. The subscribed namespaces may be
replaced by the filesystem to reflect other sources, such as the parent
mount superblock on an automount.
- (*) const struct cred *cred
+ * ::
+
+ const struct cred *cred
The mounter's credentials. This retains a reference on the credentials.
- (*) char *source
+ * ::
+
+ char *source
This specifies the source. It may be a block device (e.g. /dev/sda1) or
something more exotic, such as the "host:/path" that NFS desires.
- (*) char *subtype
+ * ::
+
+ char *subtype
This is a string to be added to the type displayed in /proc/mounts to
qualify it (used by FUSE). This is available for the filesystem to set if
desired.
- (*) void *security
+ * ::
+
+ void *security
A place for the LSMs to hang their security data for the superblock. The
relevant security operations are described below.
- (*) void *s_fs_info
+ * ::
+
+ void *s_fs_info
The proposed s_fs_info for a new superblock, set in the superblock by
sget_fc(). This can be used to distinguish superblocks.
- (*) unsigned int sb_flags
- (*) unsigned int sb_flags_mask
+ * ::
+
+ unsigned int sb_flags
+ unsigned int sb_flags_mask
Which bits SB_* flags are to be set/cleared in super_block::s_flags.
- (*) unsigned int s_iflags
+ * ::
+
+ unsigned int s_iflags
These will be bitwise-OR'd with s->s_iflags when a superblock is created.
- (*) enum fs_context_purpose
+ * ::
+
+ enum fs_context_purpose
This indicates the purpose for which the context is intended. The
available values are:
- FS_CONTEXT_FOR_MOUNT, -- New superblock for explicit mount
- FS_CONTEXT_FOR_SUBMOUNT -- New automatic submount of extant mount
- FS_CONTEXT_FOR_RECONFIGURE -- Change an existing mount
+ ========================== ======================================
+ FS_CONTEXT_FOR_MOUNT, New superblock for explicit mount
+ FS_CONTEXT_FOR_SUBMOUNT New automatic submount of extant mount
+ FS_CONTEXT_FOR_RECONFIGURE Change an existing mount
+ ========================== ======================================
The mount context is created by calling vfs_new_fs_context() or
vfs_dup_fs_context() and is destroyed with put_fs_context(). Note that the
@@ -176,11 +204,10 @@ mount context. For instance, NFS might pin the appropriate protocol version
module.
-=================================
-THE FILESYSTEM CONTEXT OPERATIONS
+The Filesystem Context Operations
=================================
-The filesystem context points to a table of operations:
+The filesystem context points to a table of operations::
struct fs_context_operations {
void (*free)(struct fs_context *fc);
@@ -195,24 +222,32 @@ The filesystem context points to a table of operations:
These operations are invoked by the various stages of the mount procedure to
manage the filesystem context. They are as follows:
- (*) void (*free)(struct fs_context *fc);
+ * ::
+
+ void (*free)(struct fs_context *fc);
Called to clean up the filesystem-specific part of the filesystem context
when the context is destroyed. It should be aware that parts of the
context may have been removed and NULL'd out by ->get_tree().
- (*) int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
+ * ::
+
+ int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
Called when a filesystem context has been duplicated to duplicate the
filesystem-private data. An error may be returned to indicate failure to
do this.
- [!] Note that even if this fails, put_fs_context() will be called
+ .. Warning::
+
+ Note that even if this fails, put_fs_context() will be called
immediately thereafter, so ->dup() *must* make the
filesystem-private data safe for ->free().
- (*) int (*parse_param)(struct fs_context *fc,
- struct struct fs_parameter *param);
+ * ::
+
+ int (*parse_param)(struct fs_context *fc,
+ struct struct fs_parameter *param);
Called when a parameter is being added to the filesystem context. param
points to the key name and maybe a value object. VFS-specific options
@@ -224,7 +259,9 @@ manage the filesystem context. They are as follows:
If successful, 0 should be returned or a negative error code otherwise.
- (*) int (*parse_monolithic)(struct fs_context *fc, void *data);
+ * ::
+
+ int (*parse_monolithic)(struct fs_context *fc, void *data);
Called when the mount(2) system call is invoked to pass the entire data
page in one go. If this is expected to be just a list of "key[=val]"
@@ -236,7 +273,9 @@ manage the filesystem context. They are as follows:
finds it's the standard key-val list then it may pass it off to
generic_parse_monolithic().
- (*) int (*get_tree)(struct fs_context *fc);
+ * ::
+
+ int (*get_tree)(struct fs_context *fc);
Called to get or create the mountable root and superblock, using the
information stored in the filesystem context (reconfiguration goes via a
@@ -249,7 +288,9 @@ manage the filesystem context. They are as follows:
The phase on a userspace-driven context will be set to only allow this to
be called once on any particular context.
- (*) int (*reconfigure)(struct fs_context *fc);
+ * ::
+
+ int (*reconfigure)(struct fs_context *fc);
Called to effect reconfiguration of a superblock using information stored
in the filesystem context. It may detach any resources it desires from
@@ -259,19 +300,20 @@ manage the filesystem context. They are as follows:
On success it should return 0. In the case of an error, it should return
a negative error code.
- [NOTE] reconfigure is intended as a replacement for remount_fs.
+ .. Note:: reconfigure is intended as a replacement for remount_fs.
-===========================
-FILESYSTEM CONTEXT SECURITY
+Filesystem context Security
===========================
The filesystem context contains a security pointer that the LSMs can use for
building up a security context for the superblock to be mounted. There are a
number of operations used by the new mount code for this purpose:
- (*) int security_fs_context_alloc(struct fs_context *fc,
- struct dentry *reference);
+ * ::
+
+ int security_fs_context_alloc(struct fs_context *fc,
+ struct dentry *reference);
Called to initialise fc->security (which is preset to NULL) and allocate
any resources needed. It should return 0 on success or a negative error
@@ -283,22 +325,28 @@ number of operations used by the new mount code for this purpose:
non-NULL in the case of a submount (FS_CONTEXT_FOR_SUBMOUNT) in which case
it indicates the automount point.
- (*) int security_fs_context_dup(struct fs_context *fc,
- struct fs_context *src_fc);
+ * ::
+
+ int security_fs_context_dup(struct fs_context *fc,
+ struct fs_context *src_fc);
Called to initialise fc->security (which is preset to NULL) and allocate
any resources needed. The original filesystem context is pointed to by
src_fc and may be used for reference. It should return 0 on success or a
negative error code on failure.
- (*) void security_fs_context_free(struct fs_context *fc);
+ * ::
+
+ void security_fs_context_free(struct fs_context *fc);
Called to clean up anything attached to fc->security. Note that the
contents may have been transferred to a superblock and the pointer cleared
during get_tree.
- (*) int security_fs_context_parse_param(struct fs_context *fc,
- struct fs_parameter *param);
+ * ::
+
+ int security_fs_context_parse_param(struct fs_context *fc,
+ struct fs_parameter *param);
Called for each mount parameter, including the source. The arguments are
as for the ->parse_param() method. It should return 0 to indicate that
@@ -310,7 +358,9 @@ number of operations used by the new mount code for this purpose:
(provided the value pointer is NULL'd out). If it is stolen, 1 must be
returned to prevent it being passed to the filesystem.
- (*) int security_fs_context_validate(struct fs_context *fc);
+ * ::
+
+ int security_fs_context_validate(struct fs_context *fc);
Called after all the options have been parsed to validate the collection
as a whole and to do any necessary allocation so that
@@ -320,36 +370,43 @@ number of operations used by the new mount code for this purpose:
In the case of reconfiguration, the target superblock will be accessible
via fc->root.
- (*) int security_sb_get_tree(struct fs_context *fc);
+ * ::
+
+ int security_sb_get_tree(struct fs_context *fc);
Called during the mount procedure to verify that the specified superblock
is allowed to be mounted and to transfer the security data there. It
should return 0 or a negative error code.
- (*) void security_sb_reconfigure(struct fs_context *fc);
+ * ::
+
+ void security_sb_reconfigure(struct fs_context *fc);
Called to apply any reconfiguration to an LSM's context. It must not
fail. Error checking and resource allocation must be done in advance by
the parameter parsing and validation hooks.
- (*) int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
- unsigned int mnt_flags);
+ * ::
+
+ int security_sb_mountpoint(struct fs_context *fc,
+ struct path *mountpoint,
+ unsigned int mnt_flags);
Called during the mount procedure to verify that the root dentry attached
to the context is permitted to be attached to the specified mountpoint.
It should return 0 on success or a negative error code on failure.
-==========================
-VFS FILESYSTEM CONTEXT API
+VFS Filesystem context API
==========================
There are four operations for creating a filesystem context and one for
destroying a context:
- (*) struct fs_context *fs_context_for_mount(
- struct file_system_type *fs_type,
- unsigned int sb_flags);
+ * ::
+
+ struct fs_context *fs_context_for_mount(struct file_system_type *fs_type,
+ unsigned int sb_flags);
Allocate a filesystem context for the purpose of setting up a new mount,
whether that be with a new superblock or sharing an existing one. This
@@ -359,7 +416,9 @@ destroying a context:
fs_type specifies the filesystem type that will manage the context and
sb_flags presets the superblock flags stored therein.
- (*) struct fs_context *fs_context_for_reconfigure(
+ * ::
+
+ struct fs_context *fs_context_for_reconfigure(
struct dentry *dentry,
unsigned int sb_flags,
unsigned int sb_flags_mask);
@@ -369,7 +428,9 @@ destroying a context:
configured. sb_flags and sb_flags_mask indicate which superblock flags
need changing and to what.
- (*) struct fs_context *fs_context_for_submount(
+ * ::
+
+ struct fs_context *fs_context_for_submount(
struct file_system_type *fs_type,
struct dentry *reference);
@@ -382,7 +443,9 @@ destroying a context:
Note that it's not a requirement that the reference dentry be of the same
filesystem type as fs_type.
- (*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
+ * ::
+
+ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
Duplicate a filesystem context, copying any options noted and duplicating
or additionally referencing any resources held therein. This is available
@@ -392,14 +455,18 @@ destroying a context:
The purpose in the new context is inherited from the old one.
- (*) void put_fs_context(struct fs_context *fc);
+ * ::
+
+ void put_fs_context(struct fs_context *fc);
Destroy a filesystem context, releasing any resources it holds. This
calls the ->free() operation. This is intended to be called by anyone who
created a filesystem context.
- [!] filesystem contexts are not refcounted, so this causes unconditional
- destruction.
+ .. Warning::
+
+ filesystem contexts are not refcounted, so this causes unconditional
+ destruction.
In all the above operations, apart from the put op, the return is a mount
context pointer or a negative error code.
@@ -407,8 +474,10 @@ context pointer or a negative error code.
For the remaining operations, if an error occurs, a negative error code will be
returned.
- (*) int vfs_parse_fs_param(struct fs_context *fc,
- struct fs_parameter *param);
+ * ::
+
+ int vfs_parse_fs_param(struct fs_context *fc,
+ struct fs_parameter *param);
Supply a single mount parameter to the filesystem context. This include
the specification of the source/device which is specified as the "source"
@@ -423,53 +492,64 @@ returned.
The parameter value is typed and can be one of:
- fs_value_is_flag, Parameter not given a value.
- fs_value_is_string, Value is a string
- fs_value_is_blob, Value is a binary blob
- fs_value_is_filename, Value is a filename* + dirfd
- fs_value_is_file, Value is an open file (file*)
+ ==================== =============================
+ fs_value_is_flag Parameter not given a value
+ fs_value_is_string Value is a string
+ fs_value_is_blob Value is a binary blob
+ fs_value_is_filename Value is a filename* + dirfd
+ fs_value_is_file Value is an open file (file*)
+ ==================== =============================
If there is a value, that value is stored in a union in the struct in one
of param->{string,blob,name,file}. Note that the function may steal and
clear the pointer, but then becomes responsible for disposing of the
object.
- (*) int vfs_parse_fs_string(struct fs_context *fc, const char *key,
- const char *value, size_t v_size);
+ * ::
+
+ int vfs_parse_fs_string(struct fs_context *fc, const char *key,
+ const char *value, size_t v_size);
A wrapper around vfs_parse_fs_param() that copies the value string it is
passed.
- (*) int generic_parse_monolithic(struct fs_context *fc, void *data);
+ * ::
+
+ int generic_parse_monolithic(struct fs_context *fc, void *data);
Parse a sys_mount() data page, assuming the form to be a text list
consisting of key[=val] options separated by commas. Each item in the
list is passed to vfs_mount_option(). This is the default when the
->parse_monolithic() method is NULL.
- (*) int vfs_get_tree(struct fs_context *fc);
+ * ::
+
+ int vfs_get_tree(struct fs_context *fc);
Get or create the mountable root and superblock, using the parameters in
the filesystem context to select/configure the superblock. This invokes
the ->get_tree() method.
- (*) struct vfsmount *vfs_create_mount(struct fs_context *fc);
+ * ::
+
+ struct vfsmount *vfs_create_mount(struct fs_context *fc);
Create a mount given the parameters in the specified filesystem context.
Note that this does not attach the mount to anything.
-===========================
-SUPERBLOCK CREATION HELPERS
+Superblock Creation Helpers
===========================
A number of VFS helpers are available for use by filesystems for the creation
or looking up of superblocks.
- (*) struct super_block *
- sget_fc(struct fs_context *fc,
- int (*test)(struct super_block *sb, struct fs_context *fc),
- int (*set)(struct super_block *sb, struct fs_context *fc));
+ * ::
+
+ struct super_block *
+ sget_fc(struct fs_context *fc,
+ int (*test)(struct super_block *sb, struct fs_context *fc),
+ int (*set)(struct super_block *sb, struct fs_context *fc));
This is the core routine. If test is non-NULL, it searches for an
existing superblock matching the criteria held in the fs_context, using
@@ -482,10 +562,12 @@ or looking up of superblocks.
The following helpers all wrap sget_fc():
- (*) int vfs_get_super(struct fs_context *fc,
- enum vfs_get_super_keying keying,
- int (*fill_super)(struct super_block *sb,
- struct fs_context *fc))
+ * ::
+
+ int vfs_get_super(struct fs_context *fc,
+ enum vfs_get_super_keying keying,
+ int (*fill_super)(struct super_block *sb,
+ struct fs_context *fc))
This creates/looks up a deviceless superblock. The keying indicates how
many superblocks of this type may exist and in what manner they may be
@@ -515,14 +597,14 @@ PARAMETER DESCRIPTION
=====================
Parameters are described using structures defined in linux/fs_parser.h.
-There's a core description struct that links everything together:
+There's a core description struct that links everything together::
struct fs_parameter_description {
const struct fs_parameter_spec *specs;
const struct fs_parameter_enum *enums;
};
-For example:
+For example::
enum {
Opt_autocell,
@@ -539,10 +621,12 @@ For example:
The members are as follows:
- (1) const struct fs_parameter_specification *specs;
+ (1) ::
+
+ const struct fs_parameter_specification *specs;
Table of parameter specifications, terminated with a null entry, where the
- entries are of type:
+ entries are of type::
struct fs_parameter_spec {
const char *name;
@@ -558,6 +642,7 @@ The members are as follows:
The 'type' field indicates the desired value type and must be one of:
+ ======================= ======================= =====================
TYPE NAME EXPECTED VALUE RESULT IN
======================= ======================= =====================
fs_param_is_flag No value n/a
@@ -573,19 +658,23 @@ The members are as follows:
fs_param_is_blockdev Blockdev path * Needs lookup
fs_param_is_path Path * Needs lookup
fs_param_is_fd File descriptor result->int_32
+ ======================= ======================= =====================
Note that if the value is of fs_param_is_bool type, fs_parse() will try
to match any string value against "0", "1", "no", "yes", "false", "true".
Each parameter can also be qualified with 'flags':
+ ======================= ================================================
fs_param_v_optional The value is optional
fs_param_neg_with_no result->negated set if key is prefixed with "no"
fs_param_neg_with_empty result->negated set if value is ""
fs_param_deprecated The parameter is deprecated.
+ ======================= ================================================
These are wrapped with a number of convenience wrappers:
+ ======================= ===============================================
MACRO SPECIFIES
======================= ===============================================
fsparam_flag() fs_param_is_flag
@@ -602,9 +691,10 @@ The members are as follows:
fsparam_bdev() fs_param_is_blockdev
fsparam_path() fs_param_is_path
fsparam_fd() fs_param_is_fd
+ ======================= ===============================================
all of which take two arguments, name string and option number - for
- example:
+ example::
static const struct fs_parameter_spec afs_param_specs[] = {
fsparam_flag ("autocell", Opt_autocell),
@@ -618,10 +708,12 @@ The members are as follows:
of arguments to specify the type and the flags for anything that doesn't
match one of the above macros.
- (2) const struct fs_parameter_enum *enums;
+ (2) ::
+
+ const struct fs_parameter_enum *enums;
Table of enum value names to integer mappings, terminated with a null
- entry. This is of type:
+ entry. This is of type::
struct fs_parameter_enum {
u8 opt;
@@ -630,7 +722,7 @@ The members are as follows:
};
Where the array is an unsorted list of { parameter ID, name }-keyed
- elements that indicate the value to map to, e.g.:
+ elements that indicate the value to map to, e.g.::
static const struct fs_parameter_enum afs_param_enums[] = {
{ Opt_bar, "x", 1},
@@ -648,18 +740,19 @@ CONFIG_VALIDATE_FS_PARSER=y) and will allow the description to be queried from
userspace using the fsinfo() syscall.
-==========================
-PARAMETER HELPER FUNCTIONS
+Parameter Helper Functions
==========================
A number of helper functions are provided to help a filesystem or an LSM
process the parameters it is given.
- (*) int lookup_constant(const struct constant_table tbl[],
- const char *name, int not_found);
+ * ::
+
+ int lookup_constant(const struct constant_table tbl[],
+ const char *name, int not_found);
Look up a constant by name in a table of name -> integer mappings. The
- table is an array of elements of the following type:
+ table is an array of elements of the following type::
struct constant_table {
const char *name;
@@ -669,9 +762,11 @@ process the parameters it is given.
If a match is found, the corresponding value is returned. If a match
isn't found, the not_found value is returned instead.
- (*) bool validate_constant_table(const struct constant_table *tbl,
- size_t tbl_size,
- int low, int high, int special);
+ * ::
+
+ bool validate_constant_table(const struct constant_table *tbl,
+ size_t tbl_size,
+ int low, int high, int special);
Validate a constant table. Checks that all the elements are appropriately
ordered, that there are no duplicates and that the values are between low
@@ -682,16 +777,20 @@ process the parameters it is given.
If all is good, true is returned. If the table is invalid, errors are
logged to dmesg and false is returned.
- (*) bool fs_validate_description(const struct fs_parameter_description *desc);
+ * ::
+
+ bool fs_validate_description(const struct fs_parameter_description *desc);
This performs some validation checks on a parameter description. It
returns true if the description is good and false if it is not. It will
log errors to dmesg if validation fails.
- (*) int fs_parse(struct fs_context *fc,
- const struct fs_parameter_description *desc,
- struct fs_parameter *param,
- struct fs_parse_result *result);
+ * ::
+
+ int fs_parse(struct fs_context *fc,
+ const struct fs_parameter_description *desc,
+ struct fs_parameter *param,
+ struct fs_parse_result *result);
This is the main interpreter of parameters. It uses the parameter
description to look up a parameter by key name and to convert that to an
@@ -711,14 +810,16 @@ process the parameters it is given.
parameter is matched, but the value is erroneous, -EINVAL will be
returned; otherwise the parameter's option number will be returned.
- (*) int fs_lookup_param(struct fs_context *fc,
- struct fs_parameter *value,
- bool want_bdev,
- struct path *_path);
+ * ::
+
+ int fs_lookup_param(struct fs_context *fc,
+ struct fs_parameter *value,
+ bool want_bdev,
+ struct path *_path);
This takes a parameter that carries a string or filename type and attempts
to do a path lookup on it. If the parameter expects a blockdev, a check
is made that the inode actually represents one.
- Returns 0 if successful and *_path will be set; returns a negative error
- code if not.
+ Returns 0 if successful and ``*_path`` will be set; returns a negative
+ error code if not.
diff --git a/Documentation/filesystems/orangefs.rst b/Documentation/filesystems/orangefs.rst
index e41369709c5b..463e37694250 100644
--- a/Documentation/filesystems/orangefs.rst
+++ b/Documentation/filesystems/orangefs.rst
@@ -119,9 +119,7 @@ it comes to that question::
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
-Create an /etc/pvfs2tab file::
-
-Localhost is fine for your pvfs2tab file:
+Create an /etc/pvfs2tab file (localhost is fine)::
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
/etc/pvfs2tab
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 38b606991065..430963e0e8c3 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -543,6 +543,7 @@ encoded manner. The codes are the following:
hg huge page advise flag
nh no huge page advise flag
mg mergable advise flag
+ bt - arm64 BTI guarded page
== =======================================
Note that there is no guarantee that every flag and associated mnemonic will
@@ -1042,8 +1043,8 @@ PageTables
amount of memory dedicated to the lowest level of page
tables.
NFS_Unstable
- NFS pages sent to the server, but not yet committed to stable
- storage
+ Always zero. Previous counted pages which had been written to
+ the server, but has not been committed to stable storage.
Bounce
Memory used for block device "bounce buffers"
WritebackTmp
@@ -1870,7 +1871,7 @@ unbindable mount is unbindable
For more information on mount propagation see:
- Documentation/filesystems/sharedsubtree.txt
+ Documentation/filesystems/sharedsubtree.rst
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
diff --git a/Documentation/filesystems/quota.txt b/Documentation/filesystems/quota.rst
index 32874b06ebe9..a30cdd47c652 100644
--- a/Documentation/filesystems/quota.txt
+++ b/Documentation/filesystems/quota.rst
@@ -1,4 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+===============
Quota subsystem
===============
@@ -39,6 +41,7 @@ Currently, the interface supports only one message type QUOTA_NL_C_WARNING.
This command is used to send a notification about any of the above mentioned
events. Each message has six attributes. These are (type of the argument is
in parentheses):
+
QUOTA_NL_A_QTYPE (u32)
- type of quota being exceeded (one of USRQUOTA, GRPQUOTA)
QUOTA_NL_A_EXCESS_ID (u64)
@@ -48,20 +51,34 @@ in parentheses):
- UID of a user who caused the event
QUOTA_NL_A_WARNING (u32)
- what kind of limit is exceeded:
- QUOTA_NL_IHARDWARN - inode hardlimit
- QUOTA_NL_ISOFTLONGWARN - inode softlimit is exceeded longer
- than given grace period
- QUOTA_NL_ISOFTWARN - inode softlimit
- QUOTA_NL_BHARDWARN - space (block) hardlimit
- QUOTA_NL_BSOFTLONGWARN - space (block) softlimit is exceeded
- longer than given grace period.
- QUOTA_NL_BSOFTWARN - space (block) softlimit
+
+ QUOTA_NL_IHARDWARN
+ inode hardlimit
+ QUOTA_NL_ISOFTLONGWARN
+ inode softlimit is exceeded longer
+ than given grace period
+ QUOTA_NL_ISOFTWARN
+ inode softlimit
+ QUOTA_NL_BHARDWARN
+ space (block) hardlimit
+ QUOTA_NL_BSOFTLONGWARN
+ space (block) softlimit is exceeded
+ longer than given grace period.
+ QUOTA_NL_BSOFTWARN
+ space (block) softlimit
+
- four warnings are also defined for the event when user stops
exceeding some limit:
- QUOTA_NL_IHARDBELOW - inode hardlimit
- QUOTA_NL_ISOFTBELOW - inode softlimit
- QUOTA_NL_BHARDBELOW - space (block) hardlimit
- QUOTA_NL_BSOFTBELOW - space (block) softlimit
+
+ QUOTA_NL_IHARDBELOW
+ inode hardlimit
+ QUOTA_NL_ISOFTBELOW
+ inode softlimit
+ QUOTA_NL_BHARDBELOW
+ space (block) hardlimit
+ QUOTA_NL_BSOFTBELOW
+ space (block) softlimit
+
QUOTA_NL_A_DEV_MAJOR (u32)
- major number of a device with the affected filesystem
QUOTA_NL_A_DEV_MINOR (u32)
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.rst b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
index 6c576e241d86..3fddacc6bf14 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.rst
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
@@ -71,7 +71,7 @@ be allowed write access to a ramfs mount.
A ramfs derivative called tmpfs was created to add size limits, and the ability
to write the data to swap space. Normal users can be allowed write access to
-tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
+tmpfs mounts. See Documentation/filesystems/tmpfs.rst for more information.
What is rootfs?
---------------
diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.rst
index d412b236a9d6..fab302046b13 100644
--- a/Documentation/filesystems/seq_file.txt
+++ b/Documentation/filesystems/seq_file.rst
@@ -1,6 +1,11 @@
-The seq_file interface
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+The seq_file Interface
+======================
Copyright 2003 Jonathan Corbet <corbet@lwn.net>
+
This file is originally from the LWN.net Driver Porting series at
http://lwn.net/Articles/driver-porting/
@@ -43,7 +48,7 @@ loadable module which creates a file called /proc/sequence. The file, when
read, simply produces a set of increasing integer values, one per line. The
sequence will continue until the user loses patience and finds something
better to do. The file is seekable, in that one can do something like the
-following:
+following::
dd if=/proc/sequence of=out1 count=1
dd if=/proc/sequence skip=1 of=out2 count=1
@@ -55,16 +60,18 @@ wanting to see the full source for this module can find it at
http://lwn.net/Articles/22359/).
Deprecated create_proc_entry
+============================
Note that the above article uses create_proc_entry which was removed in
-kernel 3.10. Current versions require the following update
+kernel 3.10. Current versions require the following update::
-- entry = create_proc_entry("sequence", 0, NULL);
-- if (entry)
-- entry->proc_fops = &ct_file_ops;
-+ entry = proc_create("sequence", 0, NULL, &ct_file_ops);
+ - entry = create_proc_entry("sequence", 0, NULL);
+ - if (entry)
+ - entry->proc_fops = &ct_file_ops;
+ + entry = proc_create("sequence", 0, NULL, &ct_file_ops);
The iterator interface
+======================
Modules implementing a virtual file with seq_file must implement an
iterator object that allows stepping through the data of interest
@@ -99,7 +106,7 @@ position. The pos passed to start() will always be either zero, or
the most recent pos used in the previous session.
For our simple sequence example,
-the start() function looks like:
+the start() function looks like::
static void *ct_seq_start(struct seq_file *s, loff_t *pos)
{
@@ -129,7 +136,7 @@ move the iterator forward to the next position in the sequence. The
example module can simply increment the position by one; more useful
modules will do what is needed to step through some data structure. The
next() function returns a new iterator, or NULL if the sequence is
-complete. Here's the example version:
+complete. Here's the example version::
static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
@@ -141,10 +148,10 @@ complete. Here's the example version:
The stop() function closes a session; its job, of course, is to clean
up. If dynamic memory is allocated for the iterator, stop() is the
place to free it; if a lock was taken by start(), stop() must release
-that lock. The value that *pos was set to by the last next() call
+that lock. The value that ``*pos`` was set to by the last next() call
before stop() is remembered, and used for the first start() call of
the next session unless lseek() has been called on the file; in that
-case next start() will be asked to start at position zero.
+case next start() will be asked to start at position zero::
static void ct_seq_stop(struct seq_file *s, void *v)
{
@@ -152,7 +159,7 @@ case next start() will be asked to start at position zero.
}
Finally, the show() function should format the object currently pointed to
-by the iterator for output. The example module's show() function is:
+by the iterator for output. The example module's show() function is::
static int ct_seq_show(struct seq_file *s, void *v)
{
@@ -169,7 +176,7 @@ generated output before returning SEQ_SKIP, that output will be dropped.
We will look at seq_printf() in a moment. But first, the definition of the
seq_file iterator is finished by creating a seq_operations structure with
-the four functions we have just defined:
+the four functions we have just defined::
static const struct seq_operations ct_seq_ops = {
.start = ct_seq_start,
@@ -194,6 +201,7 @@ other locks while the iterator is active.
Formatted output
+================
The seq_file code manages positioning within the output created by the
iterator and getting it into the user's buffer. But, for that to work, that
@@ -203,7 +211,7 @@ been defined which make this task easy.
Most code will simply use seq_printf(), which works pretty much like
printk(), but which requires the seq_file pointer as an argument.
-For straight character output, the following functions may be used:
+For straight character output, the following functions may be used::
seq_putc(struct seq_file *m, char c);
seq_puts(struct seq_file *m, const char *s);
@@ -213,7 +221,7 @@ The first two output a single character and a string, just like one would
expect. seq_escape() is like seq_puts(), except that any character in s
which is in the string esc will be represented in octal form in the output.
-There are also a pair of functions for printing filenames:
+There are also a pair of functions for printing filenames::
int seq_path(struct seq_file *m, const struct path *path,
const char *esc);
@@ -226,8 +234,10 @@ the path relative to the current process's filesystem root. If a different
root is desired, it can be used with seq_path_root(). If it turns out that
path cannot be reached from root, seq_path_root() returns SEQ_SKIP.
-A function producing complicated output may want to check
+A function producing complicated output may want to check::
+
bool seq_has_overflowed(struct seq_file *m);
+
and avoid further seq_<output> calls if true is returned.
A true return from seq_has_overflowed means that the seq_file buffer will
@@ -236,6 +246,7 @@ buffer and retry printing.
Making it all work
+==================
So far, we have a nice set of functions which can produce output within the
seq_file system, but we have not yet turned them into a file that a user
@@ -244,7 +255,7 @@ creation of a set of file_operations which implement the operations on that
file. The seq_file interface provides a set of canned operations which do
most of the work. The virtual file author still must implement the open()
method, however, to hook everything up. The open function is often a single
-line, as in the example module:
+line, as in the example module::
static int ct_open(struct inode *inode, struct file *file)
{
@@ -263,7 +274,7 @@ by the iterator functions.
There is also a wrapper function to seq_open() called seq_open_private(). It
kmallocs a zero filled block of memory and stores a pointer to it in the
private field of the seq_file structure, returning 0 on success. The
-block size is specified in a third parameter to the function, e.g.:
+block size is specified in a third parameter to the function, e.g.::
static int ct_open(struct inode *inode, struct file *file)
{
@@ -273,7 +284,7 @@ block size is specified in a third parameter to the function, e.g.:
There is also a variant function, __seq_open_private(), which is functionally
identical except that, if successful, it returns the pointer to the allocated
-memory block, allowing further initialisation e.g.:
+memory block, allowing further initialisation e.g.::
static int ct_open(struct inode *inode, struct file *file)
{
@@ -295,7 +306,7 @@ frees the memory allocated in the corresponding open.
The other operations of interest - read(), llseek(), and release() - are
all implemented by the seq_file code itself. So a virtual file's
-file_operations structure will look like:
+file_operations structure will look like::
static const struct file_operations ct_file_ops = {
.owner = THIS_MODULE,
@@ -309,7 +320,7 @@ There is also a seq_release_private() which passes the contents of the
seq_file private field to kfree() before releasing the structure.
The final step is the creation of the /proc file itself. In the example
-code, that is done in the initialization code in the usual way:
+code, that is done in the initialization code in the usual way::
static int ct_init(void)
{
@@ -325,9 +336,10 @@ And that is pretty much it.
seq_list
+========
If your file will be iterating through a linked list, you may find these
-routines useful:
+routines useful::
struct list_head *seq_list_start(struct list_head *head,
loff_t pos);
@@ -338,15 +350,16 @@ routines useful:
These helpers will interpret pos as a position within the list and iterate
accordingly. Your start() and next() functions need only invoke the
-seq_list_* helpers with a pointer to the appropriate list_head structure.
+``seq_list_*`` helpers with a pointer to the appropriate list_head structure.
The extra-simple version
+========================
For extremely simple virtual files, there is an even easier interface. A
module can define only the show() function, which should create all the
output that the virtual file will contain. The file's open() method then
-calls:
+calls::
int single_open(struct file *file,
int (*show)(struct seq_file *m, void *p),
diff --git a/Documentation/filesystems/sharedsubtree.txt b/Documentation/filesystems/sharedsubtree.rst
index 8ccfbd55244b..d83395354250 100644
--- a/Documentation/filesystems/sharedsubtree.txt
+++ b/Documentation/filesystems/sharedsubtree.rst
@@ -1,7 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
Shared Subtrees
----------------
+===============
-Contents:
+.. Contents:
1) Overview
2) Features
3) Setting mount states
@@ -41,31 +44,38 @@ replicas continue to be exactly same.
Here is an example:
- Let's say /mnt has a mount that is shared.
- mount --make-shared /mnt
+ Let's say /mnt has a mount that is shared::
+
+ mount --make-shared /mnt
Note: mount(8) command now supports the --make-shared flag,
so the sample 'smount' program is no longer needed and has been
removed.
- # mount --bind /mnt /tmp
+ ::
+
+ # mount --bind /mnt /tmp
+
The above command replicates the mount at /mnt to the mountpoint /tmp
and the contents of both the mounts remain identical.
- #ls /mnt
- a b c
+ ::
- #ls /tmp
- a b c
+ #ls /mnt
+ a b c
- Now let's say we mount a device at /tmp/a
- # mount /dev/sd0 /tmp/a
+ #ls /tmp
+ a b c
- #ls /tmp/a
- t1 t2 t3
+ Now let's say we mount a device at /tmp/a::
- #ls /mnt/a
- t1 t2 t3
+ # mount /dev/sd0 /tmp/a
+
+ #ls /tmp/a
+ t1 t2 t3
+
+ #ls /mnt/a
+ t1 t2 t3
Note that the mount has propagated to the mount at /mnt as well.
@@ -123,14 +133,15 @@ replicas continue to be exactly same.
2d) A unbindable mount is a unbindable private mount
- let's say we have a mount at /mnt and we make it unbindable
+ let's say we have a mount at /mnt and we make it unbindable::
+
+ # mount --make-unbindable /mnt
- # mount --make-unbindable /mnt
+ Let's try to bind mount this mount somewhere else::
- Let's try to bind mount this mount somewhere else.
- # mount --bind /mnt /tmp
- mount: wrong fs type, bad option, bad superblock on /mnt,
- or too many mounted file systems
+ # mount --bind /mnt /tmp
+ mount: wrong fs type, bad option, bad superblock on /mnt,
+ or too many mounted file systems
Binding a unbindable mount is a invalid operation.
@@ -138,12 +149,12 @@ replicas continue to be exactly same.
3) Setting mount states
The mount command (util-linux package) can be used to set mount
- states:
+ states::
- mount --make-shared mountpoint
- mount --make-slave mountpoint
- mount --make-private mountpoint
- mount --make-unbindable mountpoint
+ mount --make-shared mountpoint
+ mount --make-slave mountpoint
+ mount --make-private mountpoint
+ mount --make-unbindable mountpoint
4) Use cases
@@ -154,9 +165,10 @@ replicas continue to be exactly same.
Solution:
- The system administrator can make the mount at /cdrom shared
- mount --bind /cdrom /cdrom
- mount --make-shared /cdrom
+ The system administrator can make the mount at /cdrom shared::
+
+ mount --bind /cdrom /cdrom
+ mount --make-shared /cdrom
Now any process that clones off a new namespace will have a
mount at /cdrom which is a replica of the same mount in the
@@ -172,14 +184,14 @@ replicas continue to be exactly same.
Solution:
To begin with, the administrator can mark the entire mount tree
- as shareable.
+ as shareable::
- mount --make-rshared /
+ mount --make-rshared /
A new process can clone off a new namespace. And mark some part
- of its namespace as slave
+ of its namespace as slave::
- mount --make-rslave /myprivatetree
+ mount --make-rslave /myprivatetree
Hence forth any mounts within the /myprivatetree done by the
process will not show up in any other namespace. However mounts
@@ -206,13 +218,13 @@ replicas continue to be exactly same.
versions of the file depending on the path used to access that
file.
- An example is:
+ An example is::
- mount --make-shared /
- mount --rbind / /view/v1
- mount --rbind / /view/v2
- mount --rbind / /view/v3
- mount --rbind / /view/v4
+ mount --make-shared /
+ mount --rbind / /view/v1
+ mount --rbind / /view/v2
+ mount --rbind / /view/v3
+ mount --rbind / /view/v4
and if /usr has a versioning filesystem mounted, then that
mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
@@ -224,8 +236,8 @@ replicas continue to be exactly same.
filesystem is being requested and return the corresponding
inode.
-5) Detailed semantics:
--------------------
+5) Detailed semantics
+---------------------
The section below explains the detailed semantics of
bind, rbind, move, mount, umount and clone-namespace operations.
@@ -235,6 +247,7 @@ replicas continue to be exactly same.
5a) Mount states
A given mount can be in one of the following states
+
1) shared
2) slave
3) shared and slave
@@ -252,7 +265,8 @@ replicas continue to be exactly same.
A 'shared mount' is defined as a vfsmount that belongs to a
'peer group'.
- For example:
+ For example::
+
mount --make-shared /mnt
mount --bind /mnt /tmp
@@ -270,7 +284,7 @@ replicas continue to be exactly same.
A slave mount as the name implies has a master mount from which
mount/unmount events are received. Events do not propagate from
the slave mount to the master. Only a shared mount can be made
- a slave by executing the following command
+ a slave by executing the following command::
mount --make-slave mount
@@ -290,8 +304,10 @@ replicas continue to be exactly same.
peer group.
Only a slave vfsmount can be made as 'shared and slave' by
- either executing the following command
+ either executing the following command::
+
mount --make-shared mount
+
or by moving the slave vfsmount under a shared vfsmount.
(4) Private mount
@@ -307,30 +323,32 @@ replicas continue to be exactly same.
State diagram:
+
The state diagram below explains the state transition of a mount,
- in response to various commands.
- ------------------------------------------------------------------------
- | |make-shared | make-slave | make-private |make-unbindab|
- --------------|------------|--------------|--------------|-------------|
- |shared |shared |*slave/private| private | unbindable |
- | | | | | |
- |-------------|------------|--------------|--------------|-------------|
- |slave |shared | **slave | private | unbindable |
- | |and slave | | | |
- |-------------|------------|--------------|--------------|-------------|
- |shared |shared | slave | private | unbindable |
- |and slave |and slave | | | |
- |-------------|------------|--------------|--------------|-------------|
- |private |shared | **private | private | unbindable |
- |-------------|------------|--------------|--------------|-------------|
- |unbindable |shared |**unbindable | private | unbindable |
- ------------------------------------------------------------------------
-
- * if the shared mount is the only mount in its peer group, making it
- slave, makes it private automatically. Note that there is no master to
- which it can be slaved to.
-
- ** slaving a non-shared mount has no effect on the mount.
+ in response to various commands::
+
+ -----------------------------------------------------------------------
+ | |make-shared | make-slave | make-private |make-unbindab|
+ --------------|------------|--------------|--------------|-------------|
+ |shared |shared |*slave/private| private | unbindable |
+ | | | | | |
+ |-------------|------------|--------------|--------------|-------------|
+ |slave |shared | **slave | private | unbindable |
+ | |and slave | | | |
+ |-------------|------------|--------------|--------------|-------------|
+ |shared |shared | slave | private | unbindable |
+ |and slave |and slave | | | |
+ |-------------|------------|--------------|--------------|-------------|
+ |private |shared | **private | private | unbindable |
+ |-------------|------------|--------------|--------------|-------------|
+ |unbindable |shared |**unbindable | private | unbindable |
+ ------------------------------------------------------------------------
+
+ * if the shared mount is the only mount in its peer group, making it
+ slave, makes it private automatically. Note that there is no master to
+ which it can be slaved to.
+
+ ** slaving a non-shared mount has no effect on the mount.
Apart from the commands listed below, the 'move' operation also changes
the state of a mount depending on type of the destination mount. Its
@@ -338,31 +356,32 @@ replicas continue to be exactly same.
5b) Bind semantics
- Consider the following command
+ Consider the following command::
- mount --bind A/a B/b
+ mount --bind A/a B/b
where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
is the destination mount and 'b' is the dentry in the destination mount.
The outcome depends on the type of mount of 'A' and 'B'. The table
- below contains quick reference.
- ---------------------------------------------------------------------------
- | BIND MOUNT OPERATION |
- |**************************************************************************
- |source(A)->| shared | private | slave | unbindable |
- | dest(B) | | | | |
- | | | | | | |
- | v | | | | |
- |**************************************************************************
- | shared | shared | shared | shared & slave | invalid |
- | | | | | |
- |non-shared| shared | private | slave | invalid |
- ***************************************************************************
+ below contains quick reference::
+
+ --------------------------------------------------------------------------
+ | BIND MOUNT OPERATION |
+ |************************************************************************|
+ |source(A)->| shared | private | slave | unbindable |
+ | dest(B) | | | | |
+ | | | | | | |
+ | v | | | | |
+ |************************************************************************|
+ | shared | shared | shared | shared & slave | invalid |
+ | | | | | |
+ |non-shared| shared | private | slave | invalid |
+ **************************************************************************
Details:
- 1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
+ 1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
which is clone of 'A', is created. Its root dentry is 'a' . 'C' is
mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
are created and mounted at the dentry 'b' on all mounts where 'B'
@@ -371,7 +390,7 @@ replicas continue to be exactly same.
'B'. And finally the peer-group of 'C' is merged with the peer group
of 'A'.
- 2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
+ 2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
which is clone of 'A', is created. Its root dentry is 'a'. 'C' is
mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
are created and mounted at the dentry 'b' on all mounts where 'B'
@@ -379,7 +398,7 @@ replicas continue to be exactly same.
'C', 'C1', .., 'Cn' with exactly the same configuration as the
propagation tree for 'B'.
- 3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
+ 3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
mount 'C' which is clone of 'A', is created. Its root dentry is 'a' .
'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
'C3' ... are created and mounted at the dentry 'b' on all mounts where
@@ -389,19 +408,19 @@ replicas continue to be exactly same.
is made the slave of mount 'Z'. In other words, mount 'C' is in the
state 'slave and shared'.
- 4. 'A' is a unbindable mount and 'B' is a shared mount. This is a
+ 4. 'A' is a unbindable mount and 'B' is a shared mount. This is a
invalid operation.
- 5. 'A' is a private mount and 'B' is a non-shared(private or slave or
+ 5. 'A' is a private mount and 'B' is a non-shared(private or slave or
unbindable) mount. A new mount 'C' which is clone of 'A', is created.
Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.
- 6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
+ 6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
mounted on mount 'B' at dentry 'b'. 'C' is made a member of the
peer-group of 'A'.
- 7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
+ 7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
new mount 'C' which is a clone of 'A' is created. Its root dentry is
'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
@@ -409,7 +428,7 @@ replicas continue to be exactly same.
mount/unmount on 'A' do not propagate anywhere else. Similarly
mount/unmount on 'C' do not propagate anywhere else.
- 8. 'A' is a unbindable mount and 'B' is a non-shared mount. This is a
+ 8. 'A' is a unbindable mount and 'B' is a non-shared mount. This is a
invalid operation. A unbindable mount cannot be bind mounted.
5c) Rbind semantics
@@ -422,7 +441,9 @@ replicas continue to be exactly same.
then the subtree under the unbindable mount is pruned in the new
location.
- eg: let's say we have the following mount tree.
+ eg:
+
+ let's say we have the following mount tree::
A
/ \
@@ -430,12 +451,12 @@ replicas continue to be exactly same.
/ \ / \
D E F G
- Let's say all the mount except the mount C in the tree are
- of a type other than unbindable.
+ Let's say all the mount except the mount C in the tree are
+ of a type other than unbindable.
- If this tree is rbound to say Z
+ If this tree is rbound to say Z
- We will have the following tree at the new location.
+ We will have the following tree at the new location::
Z
|
@@ -457,24 +478,26 @@ replicas continue to be exactly same.
the dentry in the destination mount.
The outcome depends on the type of the mount of 'A' and 'B'. The table
- below is a quick reference.
- ---------------------------------------------------------------------------
- | MOVE MOUNT OPERATION |
- |**************************************************************************
- | source(A)->| shared | private | slave | unbindable |
- | dest(B) | | | | |
- | | | | | | |
- | v | | | | |
- |**************************************************************************
- | shared | shared | shared |shared and slave| invalid |
- | | | | | |
- |non-shared| shared | private | slave | unbindable |
- ***************************************************************************
- NOTE: moving a mount residing under a shared mount is invalid.
+ below is a quick reference::
+
+ ---------------------------------------------------------------------------
+ | MOVE MOUNT OPERATION |
+ |**************************************************************************
+ | source(A)->| shared | private | slave | unbindable |
+ | dest(B) | | | | |
+ | | | | | | |
+ | v | | | | |
+ |**************************************************************************
+ | shared | shared | shared |shared and slave| invalid |
+ | | | | | |
+ |non-shared| shared | private | slave | unbindable |
+ ***************************************************************************
+
+ .. Note:: moving a mount residing under a shared mount is invalid.
Details follow:
- 1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
+ 1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'...'An'
are created and mounted at dentry 'b' on all mounts that receive
propagation from mount 'B'. A new propagation tree is created in the
@@ -483,7 +506,7 @@ replicas continue to be exactly same.
propagation tree is appended to the already existing propagation tree
of 'A'.
- 2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
+ 2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
mounted on mount 'B' at dentry 'b'. Also new mount 'A1', 'A2'... 'An'
are created and mounted at dentry 'b' on all mounts that receive
propagation from mount 'B'. The mount 'A' becomes a shared mount and a
@@ -491,7 +514,7 @@ replicas continue to be exactly same.
'B'. This new propagation tree contains all the new mounts 'A1',
'A2'... 'An'.
- 3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
+ 3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
mount 'A' is mounted on mount 'B' at dentry 'b'. Also new mounts 'A1',
'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
receive propagation from mount 'B'. A new propagation tree is created
@@ -501,32 +524,32 @@ replicas continue to be exactly same.
'A'. Mount 'A' continues to be the slave mount of 'Z' but it also
becomes 'shared'.
- 4. 'A' is a unbindable mount and 'B' is a shared mount. The operation
+ 4. 'A' is a unbindable mount and 'B' is a shared mount. The operation
is invalid. Because mounting anything on the shared mount 'B' can
create new mounts that get mounted on the mounts that receive
propagation from 'B'. And since the mount 'A' is unbindable, cloning
it to mount at other mountpoints is not possible.
- 5. 'A' is a private mount and 'B' is a non-shared(private or slave or
+ 5. 'A' is a private mount and 'B' is a non-shared(private or slave or
unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.
- 6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
+ 6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
shared mount.
- 7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
+ 7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
The mount 'A' is mounted on mount 'B' at dentry 'b'. Mount 'A'
continues to be a slave mount of mount 'Z'.
- 8. 'A' is a unbindable mount and 'B' is a non-shared mount. The mount
+ 8. 'A' is a unbindable mount and 'B' is a non-shared mount. The mount
'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
unbindable mount.
5e) Mount semantics
- Consider the following command
+ Consider the following command::
- mount device B/b
+ mount device B/b
'B' is the destination mount and 'b' is the dentry in the destination
mount.
@@ -537,9 +560,9 @@ replicas continue to be exactly same.
5f) Unmount semantics
- Consider the following command
+ Consider the following command::
- umount A
+ umount A
where 'A' is a mount mounted on mount 'B' at dentry 'b'.
@@ -592,10 +615,12 @@ replicas continue to be exactly same.
A. What is the result of the following command sequence?
- mount --bind /mnt /mnt
- mount --make-shared /mnt
- mount --bind /mnt /tmp
- mount --move /tmp /mnt/1
+ ::
+
+ mount --bind /mnt /mnt
+ mount --make-shared /mnt
+ mount --bind /mnt /tmp
+ mount --move /tmp /mnt/1
what should be the contents of /mnt /mnt/1 /mnt/1/1 should be?
Should they all be identical? or should /mnt and /mnt/1 be
@@ -604,23 +629,27 @@ replicas continue to be exactly same.
B. What is the result of the following command sequence?
- mount --make-rshared /
- mkdir -p /v/1
- mount --rbind / /v/1
+ ::
+
+ mount --make-rshared /
+ mkdir -p /v/1
+ mount --rbind / /v/1
what should be the content of /v/1/v/1 be?
C. What is the result of the following command sequence?
- mount --bind /mnt /mnt
- mount --make-shared /mnt
- mkdir -p /mnt/1/2/3 /mnt/1/test
- mount --bind /mnt/1 /tmp
- mount --make-slave /mnt
- mount --make-shared /mnt
- mount --bind /mnt/1/2 /tmp1
- mount --make-slave /mnt
+ ::
+
+ mount --bind /mnt /mnt
+ mount --make-shared /mnt
+ mkdir -p /mnt/1/2/3 /mnt/1/test
+ mount --bind /mnt/1 /tmp
+ mount --make-slave /mnt
+ mount --make-shared /mnt
+ mount --bind /mnt/1/2 /tmp1
+ mount --make-slave /mnt
At this point we have the first mount at /tmp and
its root dentry is 1. Let's call this mount 'A'
@@ -668,7 +697,8 @@ replicas continue to be exactly same.
step 1:
let's say the root tree has just two directories with
- one vfsmount.
+ one vfsmount::
+
root
/ \
tmp usr
@@ -676,14 +706,17 @@ replicas continue to be exactly same.
And we want to replicate the tree at multiple
mountpoints under /root/tmp
- step2:
- mount --make-shared /root
+ step 2:
+ ::
- mkdir -p /tmp/m1
- mount --rbind /root /tmp/m1
+ mount --make-shared /root
- the new tree now looks like this:
+ mkdir -p /tmp/m1
+
+ mount --rbind /root /tmp/m1
+
+ the new tree now looks like this::
root
/ \
@@ -697,11 +730,13 @@ replicas continue to be exactly same.
it has two vfsmounts
- step3:
+ step 3:
+ ::
+
mkdir -p /tmp/m2
mount --rbind /root /tmp/m2
- the new tree now looks like this:
+ the new tree now looks like this::
root
/ \
@@ -724,6 +759,7 @@ replicas continue to be exactly same.
it has 6 vfsmounts
step 4:
+ ::
mkdir -p /tmp/m3
mount --rbind /root /tmp/m3
@@ -740,7 +776,8 @@ replicas continue to be exactly same.
step 1:
let's say the root tree has just two directories with
- one vfsmount.
+ one vfsmount::
+
root
/ \
tmp usr
@@ -748,17 +785,20 @@ replicas continue to be exactly same.
How do we set up the same tree at multiple locations under
/root/tmp
- step2:
- mount --bind /root/tmp /root/tmp
+ step 2:
+ ::
- mount --make-rshared /root
- mount --make-unbindable /root/tmp
- mkdir -p /tmp/m1
+ mount --bind /root/tmp /root/tmp
- mount --rbind /root /tmp/m1
+ mount --make-rshared /root
+ mount --make-unbindable /root/tmp
- the new tree now looks like this:
+ mkdir -p /tmp/m1
+
+ mount --rbind /root /tmp/m1
+
+ the new tree now looks like this::
root
/ \
@@ -768,11 +808,13 @@ replicas continue to be exactly same.
/ \
tmp usr
- step3:
+ step 3:
+ ::
+
mkdir -p /tmp/m2
mount --rbind /root /tmp/m2
- the new tree now looks like this:
+ the new tree now looks like this::
root
/ \
@@ -782,12 +824,13 @@ replicas continue to be exactly same.
/ \ / \
tmp usr tmp usr
- step4:
+ step 4:
+ ::
mkdir -p /tmp/m3
mount --rbind /root /tmp/m3
- the new tree now looks like this:
+ the new tree now looks like this::
root
/ \
@@ -801,25 +844,31 @@ replicas continue to be exactly same.
8A) Datastructure
- 4 new fields are introduced to struct vfsmount
- ->mnt_share
- ->mnt_slave_list
- ->mnt_slave
- ->mnt_master
+ 4 new fields are introduced to struct vfsmount:
+
+ * ->mnt_share
+ * ->mnt_slave_list
+ * ->mnt_slave
+ * ->mnt_master
- ->mnt_share links together all the mount to/from which this vfsmount
+ ->mnt_share
+ links together all the mount to/from which this vfsmount
send/receives propagation events.
- ->mnt_slave_list links all the mounts to which this vfsmount propagates
+ ->mnt_slave_list
+ links all the mounts to which this vfsmount propagates
to.
- ->mnt_slave links together all the slaves that its master vfsmount
+ ->mnt_slave
+ links together all the slaves that its master vfsmount
propagates to.
- ->mnt_master points to the master vfsmount from which this vfsmount
+ ->mnt_master
+ points to the master vfsmount from which this vfsmount
receives propagation.
- ->mnt_flags takes two more flags to indicate the propagation status of
+ ->mnt_flags
+ takes two more flags to indicate the propagation status of
the vfsmount. MNT_SHARE indicates that the vfsmount is a shared
vfsmount. MNT_UNCLONABLE indicates that the vfsmount cannot be
replicated.
@@ -842,7 +891,7 @@ replicas continue to be exactly same.
A example propagation tree looks as shown in the figure below.
[ NOTE: Though it looks like a forest, if we consider all the shared
- mounts as a conceptual entity called 'pnode', it becomes a tree]
+ mounts as a conceptual entity called 'pnode', it becomes a tree]::
A <--> B <--> C <---> D
@@ -864,14 +913,19 @@ replicas continue to be exactly same.
A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'
E's ->mnt_share links with ->mnt_share of K
- 'E', 'K', 'F', 'G' have their ->mnt_master point to struct
- vfsmount of 'A'
+
+ 'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'
+
'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
+
K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'
C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
+
J and K's ->mnt_master points to struct vfsmount of C
+
and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
+
'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.
@@ -903,6 +957,7 @@ replicas continue to be exactly same.
Prepare phase:
for each mount in the source tree:
+
a) Create the necessary number of mount trees to
be attached to each of the mounts that receive
propagation from the destination mount.
@@ -929,11 +984,12 @@ replicas continue to be exactly same.
Abort phase
delete all the newly created trees.
- NOTE: all the propagation related functionality resides in the file
- pnode.c
+ .. Note::
+ all the propagation related functionality resides in the file pnode.c
------------------------------------------------------------------------
version 0.1 (created the initial document, Ram Pai linuxram@us.ibm.com)
+
version 0.2 (Incorporated comments from Al Viro)
diff --git a/Documentation/filesystems/spufs/index.rst b/Documentation/filesystems/spufs/index.rst
new file mode 100644
index 000000000000..5ed4a8494967
--- /dev/null
+++ b/Documentation/filesystems/spufs/index.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+SPU Filesystem
+==============
+
+
+.. toctree::
+ :maxdepth: 1
+
+ spufs
+ spu_create
+ spu_run
diff --git a/Documentation/filesystems/spufs/spu_create.rst b/Documentation/filesystems/spufs/spu_create.rst
new file mode 100644
index 000000000000..83108c099696
--- /dev/null
+++ b/Documentation/filesystems/spufs/spu_create.rst
@@ -0,0 +1,131 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+spu_create
+==========
+
+Name
+====
+ spu_create - create a new spu context
+
+
+Synopsis
+========
+
+ ::
+
+ #include <sys/types.h>
+ #include <sys/spu.h>
+
+ int spu_create(const char *pathname, int flags, mode_t mode);
+
+Description
+===========
+ The spu_create system call is used on PowerPC machines that implement
+ the Cell Broadband Engine Architecture in order to access Synergistic
+ Processor Units (SPUs). It creates a new logical context for an SPU in
+ pathname and returns a handle to associated with it. pathname must
+ point to a non-existing directory in the mount point of the SPU file
+ system (spufs). When spu_create is successful, a directory gets cre-
+ ated on pathname and it is populated with files.
+
+ The returned file handle can only be passed to spu_run(2) or closed,
+ other operations are not defined on it. When it is closed, all associ-
+ ated directory entries in spufs are removed. When the last file handle
+ pointing either inside of the context directory or to this file
+ descriptor is closed, the logical SPU context is destroyed.
+
+ The parameter flags can be zero or any bitwise or'd combination of the
+ following constants:
+
+ SPU_RAWIO
+ Allow mapping of some of the hardware registers of the SPU into
+ user space. This flag requires the CAP_SYS_RAWIO capability, see
+ capabilities(7).
+
+ The mode parameter specifies the permissions used for creating the new
+ directory in spufs. mode is modified with the user's umask(2) value
+ and then used for both the directory and the files contained in it. The
+ file permissions mask out some more bits of mode because they typically
+ support only read or write access. See stat(2) for a full list of the
+ possible mode values.
+
+
+Return Value
+============
+ spu_create returns a new file descriptor. It may return -1 to indicate
+ an error condition and set errno to one of the error codes listed
+ below.
+
+
+Errors
+======
+ EACCES
+ The current user does not have write access on the spufs mount
+ point.
+
+ EEXIST An SPU context already exists at the given path name.
+
+ EFAULT pathname is not a valid string pointer in the current address
+ space.
+
+ EINVAL pathname is not a directory in the spufs mount point.
+
+ ELOOP Too many symlinks were found while resolving pathname.
+
+ EMFILE The process has reached its maximum open file limit.
+
+ ENAMETOOLONG
+ pathname was too long.
+
+ ENFILE The system has reached the global open file limit.
+
+ ENOENT Part of pathname could not be resolved.
+
+ ENOMEM The kernel could not allocate all resources required.
+
+ ENOSPC There are not enough SPU resources available to create a new
+ context or the user specific limit for the number of SPU con-
+ texts has been reached.
+
+ ENOSYS the functionality is not provided by the current system, because
+ either the hardware does not provide SPUs or the spufs module is
+ not loaded.
+
+ ENOTDIR
+ A part of pathname is not a directory.
+
+
+
+Notes
+=====
+ spu_create is meant to be used from libraries that implement a more
+ abstract interface to SPUs, not to be used from regular applications.
+ See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
+ ommended libraries.
+
+
+Files
+=====
+ pathname must point to a location beneath the mount point of spufs. By
+ convention, it gets mounted in /spu.
+
+
+Conforming to
+=============
+ This call is Linux specific and only implemented by the ppc64 architec-
+ ture. Programs using this system call are not portable.
+
+
+Bugs
+====
+ The code does not yet fully implement all features lined out here.
+
+
+Author
+======
+ Arnd Bergmann <arndb@de.ibm.com>
+
+See Also
+========
+ capabilities(7), close(2), spu_run(2), spufs(7)
diff --git a/Documentation/filesystems/spufs/spu_run.rst b/Documentation/filesystems/spufs/spu_run.rst
new file mode 100644
index 000000000000..7fdb1c31cb91
--- /dev/null
+++ b/Documentation/filesystems/spufs/spu_run.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+spu_run
+=======
+
+
+Name
+====
+ spu_run - execute an spu context
+
+
+Synopsis
+========
+
+ ::
+
+ #include <sys/spu.h>
+
+ int spu_run(int fd, unsigned int *npc, unsigned int *event);
+
+Description
+===========
+ The spu_run system call is used on PowerPC machines that implement the
+ Cell Broadband Engine Architecture in order to access Synergistic Pro-
+ cessor Units (SPUs). It uses the fd that was returned from spu_cre-
+ ate(2) to address a specific SPU context. When the context gets sched-
+ uled to a physical SPU, it starts execution at the instruction pointer
+ passed in npc.
+
+ Execution of SPU code happens synchronously, meaning that spu_run does
+ not return while the SPU is still running. If there is a need to exe-
+ cute SPU code in parallel with other code on either the main CPU or
+ other SPUs, you need to create a new thread of execution first, e.g.
+ using the pthread_create(3) call.
+
+ When spu_run returns, the current value of the SPU instruction pointer
+ is written back to npc, so you can call spu_run again without updating
+ the pointers.
+
+ event can be a NULL pointer or point to an extended status code that
+ gets filled when spu_run returns. It can be one of the following con-
+ stants:
+
+ SPE_EVENT_DMA_ALIGNMENT
+ A DMA alignment error
+
+ SPE_EVENT_SPE_DATA_SEGMENT
+ A DMA segmentation error
+
+ SPE_EVENT_SPE_DATA_STORAGE
+ A DMA storage error
+
+ If NULL is passed as the event argument, these errors will result in a
+ signal delivered to the calling process.
+
+Return Value
+============
+ spu_run returns the value of the spu_status register or -1 to indicate
+ an error and set errno to one of the error codes listed below. The
+ spu_status register value contains a bit mask of status codes and
+ optionally a 14 bit code returned from the stop-and-signal instruction
+ on the SPU. The bit masks for the status codes are:
+
+ 0x02
+ SPU was stopped by stop-and-signal.
+
+ 0x04
+ SPU was stopped by halt.
+
+ 0x08
+ SPU is waiting for a channel.
+
+ 0x10
+ SPU is in single-step mode.
+
+ 0x20
+ SPU has tried to execute an invalid instruction.
+
+ 0x40
+ SPU has tried to access an invalid channel.
+
+ 0x3fff0000
+ The bits masked with this value contain the code returned from
+ stop-and-signal.
+
+ There are always one or more of the lower eight bits set or an error
+ code is returned from spu_run.
+
+Errors
+======
+ EAGAIN or EWOULDBLOCK
+ fd is in non-blocking mode and spu_run would block.
+
+ EBADF fd is not a valid file descriptor.
+
+ EFAULT npc is not a valid pointer or status is neither NULL nor a valid
+ pointer.
+
+ EINTR A signal occurred while spu_run was in progress. The npc value
+ has been updated to the new program counter value if necessary.
+
+ EINVAL fd is not a file descriptor returned from spu_create(2).
+
+ ENOMEM Insufficient memory was available to handle a page fault result-
+ ing from an MFC direct memory access.
+
+ ENOSYS the functionality is not provided by the current system, because
+ either the hardware does not provide SPUs or the spufs module is
+ not loaded.
+
+
+Notes
+=====
+ spu_run is meant to be used from libraries that implement a more
+ abstract interface to SPUs, not to be used from regular applications.
+ See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
+ ommended libraries.
+
+
+Conforming to
+=============
+ This call is Linux specific and only implemented by the ppc64 architec-
+ ture. Programs using this system call are not portable.
+
+
+Bugs
+====
+ The code does not yet fully implement all features lined out here.
+
+
+Author
+======
+ Arnd Bergmann <arndb@de.ibm.com>
+
+See Also
+========
+ capabilities(7), close(2), spu_create(2), spufs(7)
diff --git a/Documentation/filesystems/spufs.txt b/Documentation/filesystems/spufs/spufs.rst
index eb9e3aa63026..8a42859bb100 100644
--- a/Documentation/filesystems/spufs.txt
+++ b/Documentation/filesystems/spufs/spufs.rst
@@ -1,12 +1,18 @@
-SPUFS(2) Linux Programmer's Manual SPUFS(2)
+.. SPDX-License-Identifier: GPL-2.0
+=====
+spufs
+=====
+Name
+====
-NAME
spufs - the SPU file system
-DESCRIPTION
+Description
+===========
+
The SPU file system is used on PowerPC machines that implement the Cell
Broadband Engine Architecture in order to access Synergistic Processor
Units (SPUs).
@@ -21,7 +27,9 @@ DESCRIPTION
ally add or remove files.
-MOUNT OPTIONS
+Mount Options
+=============
+
uid=<uid>
set the user owning the mount point, the default is 0 (root).
@@ -29,7 +37,9 @@ MOUNT OPTIONS
set the group owning the mount point, the default is 0 (root).
-FILES
+Files
+=====
+
The files in spufs mostly follow the standard behavior for regular sys-
tem calls like read(2) or write(2), but often support only a subset of
the operations supported on regular file systems. This list details the
@@ -125,14 +135,12 @@ FILES
space is available for writing.
- /mbox_stat
- /ibox_stat
- /wbox_stat
+ /mbox_stat, /ibox_stat, /wbox_stat
Read-only files that contain the length of the current queue, i.e. how
many words can be read from mbox or ibox or how many words can be
written to wbox without blocking. The files can be read only in 4-byte
units and return a big-endian binary integer number. The possible
- operations on an open *box_stat file are:
+ operations on an open ``*box_stat`` file are:
read(2)
If a count smaller than four is requested, read returns -1 and
@@ -143,12 +151,7 @@ FILES
in EAGAIN.
- /npc
- /decr
- /decr_status
- /spu_tag_mask
- /event_mask
- /srr0
+ /npc, /decr, /decr_status, /spu_tag_mask, /event_mask, /srr0
Internal registers of the SPU. The representation is an ASCII string
with the numeric value of the next instruction to be executed. These
can be used in read/write mode for debugging, but normal operation of
@@ -157,17 +160,14 @@ FILES
The contents of these files are:
+ =================== ===================================
npc Next Program Counter
-
decr SPU Decrementer
-
decr_status Decrementer Status
-
spu_tag_mask MFC tag mask for SPU DMA
-
event_mask Event mask for SPU interrupts
-
srr0 Interrupt Return address register
+ =================== ===================================
The possible operations on an open npc, decr, decr_status,
@@ -206,8 +206,7 @@ FILES
from the data buffer, updating the value of the fpcr register.
- /signal1
- /signal2
+ /signal1, /signal2
The two signal notification channels of an SPU. These are read-write
files that operate on a 32 bit word. Writing to one of these files
triggers an interrupt on the SPU. The value written to the signal
@@ -233,8 +232,7 @@ FILES
file.
- /signal1_type
- /signal2_type
+ /signal1_type, /signal2_type
These two files change the behavior of the signal1 and signal2 notifi-
cation files. The contain a numerical ASCII string which is read as
either "1" or "0". In mode 0 (overwrite), the hardware replaces the
@@ -259,263 +257,17 @@ FILES
the previous setting.
-EXAMPLES
+Examples
+========
/etc/fstab entry
none /spu spufs gid=spu 0 0
-AUTHORS
+Authors
+=======
Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>,
Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
-SEE ALSO
+See Also
+========
capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7)
-
-
-
-Linux 2005-09-28 SPUFS(2)
-
-------------------------------------------------------------------------------
-
-SPU_RUN(2) Linux Programmer's Manual SPU_RUN(2)
-
-
-
-NAME
- spu_run - execute an spu context
-
-
-SYNOPSIS
- #include <sys/spu.h>
-
- int spu_run(int fd, unsigned int *npc, unsigned int *event);
-
-DESCRIPTION
- The spu_run system call is used on PowerPC machines that implement the
- Cell Broadband Engine Architecture in order to access Synergistic Pro-
- cessor Units (SPUs). It uses the fd that was returned from spu_cre-
- ate(2) to address a specific SPU context. When the context gets sched-
- uled to a physical SPU, it starts execution at the instruction pointer
- passed in npc.
-
- Execution of SPU code happens synchronously, meaning that spu_run does
- not return while the SPU is still running. If there is a need to exe-
- cute SPU code in parallel with other code on either the main CPU or
- other SPUs, you need to create a new thread of execution first, e.g.
- using the pthread_create(3) call.
-
- When spu_run returns, the current value of the SPU instruction pointer
- is written back to npc, so you can call spu_run again without updating
- the pointers.
-
- event can be a NULL pointer or point to an extended status code that
- gets filled when spu_run returns. It can be one of the following con-
- stants:
-
- SPE_EVENT_DMA_ALIGNMENT
- A DMA alignment error
-
- SPE_EVENT_SPE_DATA_SEGMENT
- A DMA segmentation error
-
- SPE_EVENT_SPE_DATA_STORAGE
- A DMA storage error
-
- If NULL is passed as the event argument, these errors will result in a
- signal delivered to the calling process.
-
-RETURN VALUE
- spu_run returns the value of the spu_status register or -1 to indicate
- an error and set errno to one of the error codes listed below. The
- spu_status register value contains a bit mask of status codes and
- optionally a 14 bit code returned from the stop-and-signal instruction
- on the SPU. The bit masks for the status codes are:
-
- 0x02 SPU was stopped by stop-and-signal.
-
- 0x04 SPU was stopped by halt.
-
- 0x08 SPU is waiting for a channel.
-
- 0x10 SPU is in single-step mode.
-
- 0x20 SPU has tried to execute an invalid instruction.
-
- 0x40 SPU has tried to access an invalid channel.
-
- 0x3fff0000
- The bits masked with this value contain the code returned from
- stop-and-signal.
-
- There are always one or more of the lower eight bits set or an error
- code is returned from spu_run.
-
-ERRORS
- EAGAIN or EWOULDBLOCK
- fd is in non-blocking mode and spu_run would block.
-
- EBADF fd is not a valid file descriptor.
-
- EFAULT npc is not a valid pointer or status is neither NULL nor a valid
- pointer.
-
- EINTR A signal occurred while spu_run was in progress. The npc value
- has been updated to the new program counter value if necessary.
-
- EINVAL fd is not a file descriptor returned from spu_create(2).
-
- ENOMEM Insufficient memory was available to handle a page fault result-
- ing from an MFC direct memory access.
-
- ENOSYS the functionality is not provided by the current system, because
- either the hardware does not provide SPUs or the spufs module is
- not loaded.
-
-
-NOTES
- spu_run is meant to be used from libraries that implement a more
- abstract interface to SPUs, not to be used from regular applications.
- See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
- ommended libraries.
-
-
-CONFORMING TO
- This call is Linux specific and only implemented by the ppc64 architec-
- ture. Programs using this system call are not portable.
-
-
-BUGS
- The code does not yet fully implement all features lined out here.
-
-
-AUTHOR
- Arnd Bergmann <arndb@de.ibm.com>
-
-SEE ALSO
- capabilities(7), close(2), spu_create(2), spufs(7)
-
-
-
-Linux 2005-09-28 SPU_RUN(2)
-
-------------------------------------------------------------------------------
-
-SPU_CREATE(2) Linux Programmer's Manual SPU_CREATE(2)
-
-
-
-NAME
- spu_create - create a new spu context
-
-
-SYNOPSIS
- #include <sys/types.h>
- #include <sys/spu.h>
-
- int spu_create(const char *pathname, int flags, mode_t mode);
-
-DESCRIPTION
- The spu_create system call is used on PowerPC machines that implement
- the Cell Broadband Engine Architecture in order to access Synergistic
- Processor Units (SPUs). It creates a new logical context for an SPU in
- pathname and returns a handle to associated with it. pathname must
- point to a non-existing directory in the mount point of the SPU file
- system (spufs). When spu_create is successful, a directory gets cre-
- ated on pathname and it is populated with files.
-
- The returned file handle can only be passed to spu_run(2) or closed,
- other operations are not defined on it. When it is closed, all associ-
- ated directory entries in spufs are removed. When the last file handle
- pointing either inside of the context directory or to this file
- descriptor is closed, the logical SPU context is destroyed.
-
- The parameter flags can be zero or any bitwise or'd combination of the
- following constants:
-
- SPU_RAWIO
- Allow mapping of some of the hardware registers of the SPU into
- user space. This flag requires the CAP_SYS_RAWIO capability, see
- capabilities(7).
-
- The mode parameter specifies the permissions used for creating the new
- directory in spufs. mode is modified with the user's umask(2) value
- and then used for both the directory and the files contained in it. The
- file permissions mask out some more bits of mode because they typically
- support only read or write access. See stat(2) for a full list of the
- possible mode values.
-
-
-RETURN VALUE
- spu_create returns a new file descriptor. It may return -1 to indicate
- an error condition and set errno to one of the error codes listed
- below.
-
-
-ERRORS
- EACCES
- The current user does not have write access on the spufs mount
- point.
-
- EEXIST An SPU context already exists at the given path name.
-
- EFAULT pathname is not a valid string pointer in the current address
- space.
-
- EINVAL pathname is not a directory in the spufs mount point.
-
- ELOOP Too many symlinks were found while resolving pathname.
-
- EMFILE The process has reached its maximum open file limit.
-
- ENAMETOOLONG
- pathname was too long.
-
- ENFILE The system has reached the global open file limit.
-
- ENOENT Part of pathname could not be resolved.
-
- ENOMEM The kernel could not allocate all resources required.
-
- ENOSPC There are not enough SPU resources available to create a new
- context or the user specific limit for the number of SPU con-
- texts has been reached.
-
- ENOSYS the functionality is not provided by the current system, because
- either the hardware does not provide SPUs or the spufs module is
- not loaded.
-
- ENOTDIR
- A part of pathname is not a directory.
-
-
-
-NOTES
- spu_create is meant to be used from libraries that implement a more
- abstract interface to SPUs, not to be used from regular applications.
- See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
- ommended libraries.
-
-
-FILES
- pathname must point to a location beneath the mount point of spufs. By
- convention, it gets mounted in /spu.
-
-
-CONFORMING TO
- This call is Linux specific and only implemented by the ppc64 architec-
- ture. Programs using this system call are not portable.
-
-
-BUGS
- The code does not yet fully implement all features lined out here.
-
-
-AUTHOR
- Arnd Bergmann <arndb@de.ibm.com>
-
-SEE ALSO
- capabilities(7), close(2), spu_run(2), spufs(7)
-
-
-
-Linux 2005-09-28 SPU_CREATE(2)
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.rst
index 06f1d64c6f70..a265f3e2cc80 100644
--- a/Documentation/filesystems/sysfs-pci.txt
+++ b/Documentation/filesystems/sysfs-pci.rst
@@ -1,8 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
Accessing PCI device resources through sysfs
---------------------------------------------
+============================================
sysfs, usually mounted at /sys, provides access to PCI resources on platforms
-that support it. For example, a given bus might look like this:
+that support it. For example, a given bus might look like this::
/sys/devices/pci0000:17
|-- 0000:17:00.0
@@ -30,8 +33,9 @@ This bus contains a single function device in slot 0. The domain and bus
numbers are reproduced for convenience. Under the device directory are several
files, each with their own function.
+ =================== =====================================================
file function
- ---- --------
+ =================== =====================================================
class PCI class (ascii, ro)
config PCI config space (binary, rw)
device PCI device (ascii, ro)
@@ -40,13 +44,16 @@ files, each with their own function.
local_cpus nearby CPU mask (cpumask, ro)
remove remove device from kernel's list (ascii, wo)
resource PCI resource host addresses (ascii, ro)
- resource0..N PCI resource N, if present (binary, mmap, rw[1])
+ resource0..N PCI resource N, if present (binary, mmap, rw\ [1]_)
resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
revision PCI revision (ascii, ro)
rom PCI ROM resource, if present (binary, ro)
subsystem_device PCI subsystem device (ascii, ro)
subsystem_vendor PCI subsystem vendor (ascii, ro)
vendor PCI vendor (ascii, ro)
+ =================== =====================================================
+
+::
ro - read only file
rw - file is readable and writable
@@ -56,7 +63,7 @@ files, each with their own function.
binary - file contains binary data
cpumask - file contains a cpumask type
-[1] rw for RESOURCE_IO (I/O port) regions only
+.. [1] rw for RESOURCE_IO (I/O port) regions only
The read only files are informational, writes to them will be ignored, with
the exception of the 'rom' file. Writable files can be used to perform
@@ -67,11 +74,11 @@ don't support mmapping of certain resources, so be sure to check the return
value from any attempted mmap. The most notable of these are I/O port
resources, which also provide read/write access.
-The 'enable' file provides a counter that indicates how many times the device
+The 'enable' file provides a counter that indicates how many times the device
has been enabled. If the 'enable' file currently returns '4', and a '1' is
echoed into it, it will then return '5'. Echoing a '0' into it will decrease
the count. Even when it returns to 0, though, some of the initialisation
-may not be reversed.
+may not be reversed.
The 'rom' file is special in that it provides read-only access to the device's
ROM file, if available. It's disabled by default, however, so applications
@@ -93,7 +100,7 @@ Accessing legacy resources through sysfs
Legacy I/O port and ISA memory resources are also provided in sysfs if the
underlying platform supports them. They're located in the PCI class hierarchy,
-e.g.
+e.g.::
/sys/class/pci_bus/0000:17/
|-- bridge -> ../../../devices/pci0000:17
diff --git a/Documentation/filesystems/sysfs-tagging.txt b/Documentation/filesystems/sysfs-tagging.rst
index c7c8e6438958..8888a05c398e 100644
--- a/Documentation/filesystems/sysfs-tagging.txt
+++ b/Documentation/filesystems/sysfs-tagging.rst
@@ -1,5 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
Sysfs tagging
--------------
+=============
(Taken almost verbatim from Eric Biederman's netns tagging patch
commit msg)
@@ -18,25 +21,28 @@ in the directories and applications only see a limited set of
the network devices.
Each sysfs directory entry may be tagged with a namespace via the
-void *ns member of its kernfs_node. If a directory entry is tagged,
-then kernfs_node->flags will have a flag between KOBJ_NS_TYPE_NONE
+``void *ns member`` of its ``kernfs_node``. If a directory entry is tagged,
+then ``kernfs_node->flags`` will have a flag between KOBJ_NS_TYPE_NONE
and KOBJ_NS_TYPES, and ns will point to the namespace to which it
belongs.
-Each sysfs superblock's kernfs_super_info contains an array void
-*ns[KOBJ_NS_TYPES]. When a task in a tagging namespace
+Each sysfs superblock's kernfs_super_info contains an array
+``void *ns[KOBJ_NS_TYPES]``. When a task in a tagging namespace
kobj_nstype first mounts sysfs, a new superblock is created. It
will be differentiated from other sysfs mounts by having its
-s_fs_info->ns[kobj_nstype] set to the new namespace. Note that
+``s_fs_info->ns[kobj_nstype]`` set to the new namespace. Note that
through bind mounting and mounts propagation, a task can easily view
the contents of other namespaces' sysfs mounts. Therefore, when a
namespace exits, it will call kobj_ns_exit() to invalidate any
kernfs_node->ns pointers pointing to it.
Users of this interface:
-- define a type in the kobj_ns_type enumeration.
-- call kobj_ns_type_register() with its kobj_ns_type_operations which has
+
+- define a type in the ``kobj_ns_type`` enumeration.
+- call kobj_ns_type_register() with its ``kobj_ns_type_operations`` which has
+
- current_ns() which returns current's namespace
- netlink_ns() which returns a socket's namespace
- initial_ns() which returns the initial namesapce
+
- call kobj_ns_exit() when an individual tag is no longer valid
diff --git a/Documentation/filesystems/sysfs.rst b/Documentation/filesystems/sysfs.rst
index 290891c3fecb..ab0f7795792b 100644
--- a/Documentation/filesystems/sysfs.rst
+++ b/Documentation/filesystems/sysfs.rst
@@ -20,7 +20,7 @@ a means to export kernel data structures, their attributes, and the
linkages between them to userspace.
sysfs is tied inherently to the kobject infrastructure. Please read
-Documentation/kobject.txt for more information concerning the kobject
+Documentation/core-api/kobject.rst for more information concerning the kobject
interface.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7d4d09dd5e6d..ed17771c212b 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -706,6 +706,7 @@ cache in your filesystem. The following members are defined:
int (*readpage)(struct file *, struct page *);
int (*writepages)(struct address_space *, struct writeback_control *);
int (*set_page_dirty)(struct page *page);
+ void (*readahead)(struct readahead_control *);
int (*readpages)(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages);
int (*write_begin)(struct file *, struct address_space *mapping,
@@ -781,12 +782,26 @@ cache in your filesystem. The following members are defined:
If defined, it should set the PageDirty flag, and the
PAGECACHE_TAG_DIRTY tag in the radix tree.
+``readahead``
+ Called by the VM to read pages associated with the address_space
+ object. The pages are consecutive in the page cache and are
+ locked. The implementation should decrement the page refcount
+ after starting I/O on each page. Usually the page will be
+ unlocked by the I/O completion handler. If the filesystem decides
+ to stop attempting I/O before reaching the end of the readahead
+ window, it can simply return. The caller will decrement the page
+ refcount and unlock the remaining pages for you. Set PageUptodate
+ if the I/O completes successfully. Setting PageError on any page
+ will be ignored; simply unlock the page if an I/O error occurs.
+
``readpages``
called by the VM to read pages associated with the address_space
object. This is essentially just a vector version of readpage.
Instead of just one page, several pages are requested.
readpages is only used for read-ahead, so read errors are
ignored. If anything goes wrong, feel free to give up.
+ This interface is deprecated and will be removed by the end of
+ 2020; implement readahead instead.
``write_begin``
Called by the generic buffered write code to ask the filesystem
diff --git a/Documentation/filesystems/xfs-delayed-logging-design.txt b/Documentation/filesystems/xfs-delayed-logging-design.rst
index 9a6dd289b17b..464405d2801e 100644
--- a/Documentation/filesystems/xfs-delayed-logging-design.txt
+++ b/Documentation/filesystems/xfs-delayed-logging-design.rst
@@ -1,8 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
XFS Delayed Logging Design
---------------------------
+==========================
Introduction to Re-logging in XFS
----------------------------------
+=================================
XFS logging is a combination of logical and physical logging. Some objects,
such as inodes and dquots, are logged in logical format where the details
@@ -25,7 +28,7 @@ changes in the new transaction that is written to the log.
That is, if we have a sequence of changes A through to F, and the object was
written to disk after change D, we would see in the log the following series
of transactions, their contents and the log sequence number (LSN) of the
-transaction:
+transaction::
Transaction Contents LSN
A A X
@@ -85,7 +88,7 @@ IO permanently. Hence the XFS journalling subsystem can be considered to be IO
bound.
Delayed Logging: Concepts
--------------------------
+=========================
The key thing to note about the asynchronous logging combined with the
relogging technique XFS uses is that we can be relogging changed objects
@@ -154,9 +157,10 @@ The fundamental requirements for delayed logging in XFS are simple:
6. No performance regressions for synchronous transaction workloads.
Delayed Logging: Design
------------------------
+=======================
Storing Changes
+---------------
The problem with accumulating changes at a logical level (i.e. just using the
existing log item dirty region tracking) is that when it comes to writing the
@@ -194,30 +198,30 @@ asynchronous transactions to the log. The differences between the existing
formatting method and the delayed logging formatting can be seen in the
diagram below.
-Current format log vector:
+Current format log vector::
-Object +---------------------------------------------+
-Vector 1 +----+
-Vector 2 +----+
-Vector 3 +----------+
+ Object +---------------------------------------------+
+ Vector 1 +----+
+ Vector 2 +----+
+ Vector 3 +----------+
-After formatting:
+After formatting::
-Log Buffer +-V1-+-V2-+----V3----+
+ Log Buffer +-V1-+-V2-+----V3----+
-Delayed logging vector:
+Delayed logging vector::
-Object +---------------------------------------------+
-Vector 1 +----+
-Vector 2 +----+
-Vector 3 +----------+
+ Object +---------------------------------------------+
+ Vector 1 +----+
+ Vector 2 +----+
+ Vector 3 +----------+
-After formatting:
+After formatting::
-Memory Buffer +-V1-+-V2-+----V3----+
-Vector 1 +----+
-Vector 2 +----+
-Vector 3 +----------+
+ Memory Buffer +-V1-+-V2-+----V3----+
+ Vector 1 +----+
+ Vector 2 +----+
+ Vector 3 +----------+
The memory buffer and associated vector need to be passed as a single object,
but still need to be associated with the parent object so if the object is
@@ -242,6 +246,7 @@ relogged in memory.
Tracking Changes
+----------------
Now that we can record transactional changes in memory in a form that allows
them to be used without limitations, we need to be able to track and accumulate
@@ -278,6 +283,7 @@ done for convenience/sanity of the developers.
Delayed Logging: Checkpoints
+----------------------------
When we have a log synchronisation event, commonly known as a "log force",
all the items in the CIL must be written into the log via the log buffers.
@@ -341,7 +347,7 @@ Hence log vectors need to be able to be chained together to allow them to be
detached from the log items. That is, when the CIL is flushed the memory
buffer and log vector attached to each log item needs to be attached to the
checkpoint context so that the log item can be released. In diagrammatic form,
-the CIL would look like this before the flush:
+the CIL would look like this before the flush::
CIL Head
|
@@ -362,7 +368,7 @@ the CIL would look like this before the flush:
-> vector array
And after the flush the CIL head is empty, and the checkpoint context log
-vector list would look like:
+vector list would look like::
Checkpoint Context
|
@@ -411,6 +417,7 @@ compare" situation that can be done after a working and reviewed implementation
is in the dev tree....
Delayed Logging: Checkpoint Sequencing
+--------------------------------------
One of the key aspects of the XFS transaction subsystem is that it tags
committed transactions with the log sequence number of the transaction commit.
@@ -474,6 +481,7 @@ force the log at the LSN of that transaction) and so the higher level code
behaves the same regardless of whether delayed logging is being used or not.
Delayed Logging: Checkpoint Log Space Accounting
+------------------------------------------------
The big issue for a checkpoint transaction is the log space reservation for the
transaction. We don't know how big a checkpoint transaction is going to be
@@ -491,7 +499,7 @@ the size of the transaction and the number of regions being logged (the number
of log vectors in the transaction).
An example of the differences would be logging directory changes versus logging
-inode changes. If you modify lots of inode cores (e.g. chmod -R g+w *), then
+inode changes. If you modify lots of inode cores (e.g. ``chmod -R g+w *``), then
there are lots of transactions that only contain an inode core and an inode log
format structure. That is, two vectors totaling roughly 150 bytes. If we modify
10,000 inodes, we have about 1.5MB of metadata to write in 20,000 vectors. Each
@@ -565,6 +573,7 @@ which is once every 30s.
Delayed Logging: Log Item Pinning
+---------------------------------
Currently log items are pinned during transaction commit while the items are
still locked. This happens just after the items are formatted, though it could
@@ -605,6 +614,7 @@ object, we have a race with CIL being flushed between the check and the pin
lock to guarantee that we pin the items correctly.
Delayed Logging: Concurrent Scalability
+---------------------------------------
A fundamental requirement for the CIL is that accesses through transaction
commits must scale to many concurrent commits. The current transaction commit
@@ -683,8 +693,9 @@ woken by the wrong event.
Lifecycle Changes
+-----------------
-The existing log item life cycle is as follows:
+The existing log item life cycle is as follows::
1. Transaction allocate
2. Transaction reserve
@@ -729,7 +740,7 @@ at the same time. If the log item is in the AIL or between steps 6 and 7
and steps 1-6 are re-entered, then the item is relogged. Only when steps 8-9
are entered and completed is the object considered clean.
-With delayed logging, there are new steps inserted into the life cycle:
+With delayed logging, there are new steps inserted into the life cycle::
1. Transaction allocate
2. Transaction reserve
diff --git a/Documentation/filesystems/xfs-self-describing-metadata.txt b/Documentation/filesystems/xfs-self-describing-metadata.rst
index 8db0121d0980..b79dbf36dc94 100644
--- a/Documentation/filesystems/xfs-self-describing-metadata.txt
+++ b/Documentation/filesystems/xfs-self-describing-metadata.rst
@@ -1,8 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
XFS Self Describing Metadata
-----------------------------
+============================
Introduction
-------------
+============
The largest scalability problem facing XFS is not one of algorithmic
scalability, but of verification of the filesystem structure. Scalabilty of the
@@ -34,7 +37,7 @@ required for basic forensic analysis of the filesystem structure.
Self Describing Metadata
-------------------------
+========================
One of the problems with the current metadata format is that apart from the
magic number in the metadata block, we have no other way of identifying what it
@@ -142,7 +145,7 @@ modification occurred between the corruption being written and when it was
detected.
Runtime Validation
-------------------
+==================
Validation of self-describing metadata takes place at runtime in two places:
@@ -183,18 +186,18 @@ error occurs during this process, the buffer is again marked with a EFSCORRUPTED
error for the higher layers to catch.
Structures
-----------
+==========
-A typical on-disk structure needs to contain the following information:
+A typical on-disk structure needs to contain the following information::
-struct xfs_ondisk_hdr {
- __be32 magic; /* magic number */
- __be32 crc; /* CRC, not logged */
- uuid_t uuid; /* filesystem identifier */
- __be64 owner; /* parent object */
- __be64 blkno; /* location on disk */
- __be64 lsn; /* last modification in log, not logged */
-};
+ struct xfs_ondisk_hdr {
+ __be32 magic; /* magic number */
+ __be32 crc; /* CRC, not logged */
+ uuid_t uuid; /* filesystem identifier */
+ __be64 owner; /* parent object */
+ __be64 blkno; /* location on disk */
+ __be64 lsn; /* last modification in log, not logged */
+ };
Depending on the metadata, this information may be part of a header structure
separate to the metadata contents, or may be distributed through an existing
@@ -214,24 +217,24 @@ level of information is generally provided. For example:
well. hence the additional metadata headers change the overall format
of the metadata.
-A typical buffer read verifier is structured as follows:
+A typical buffer read verifier is structured as follows::
-#define XFS_FOO_CRC_OFF offsetof(struct xfs_ondisk_hdr, crc)
+ #define XFS_FOO_CRC_OFF offsetof(struct xfs_ondisk_hdr, crc)
-static void
-xfs_foo_read_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_mount;
+ static void
+ xfs_foo_read_verify(
+ struct xfs_buf *bp)
+ {
+ struct xfs_mount *mp = bp->b_mount;
- if ((xfs_sb_version_hascrc(&mp->m_sb) &&
- !xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
- XFS_FOO_CRC_OFF)) ||
- !xfs_foo_verify(bp)) {
- XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
- xfs_buf_ioerror(bp, EFSCORRUPTED);
- }
-}
+ if ((xfs_sb_version_hascrc(&mp->m_sb) &&
+ !xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
+ XFS_FOO_CRC_OFF)) ||
+ !xfs_foo_verify(bp)) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+ xfs_buf_ioerror(bp, EFSCORRUPTED);
+ }
+ }
The code ensures that the CRC is only checked if the filesystem has CRCs enabled
by checking the superblock of the feature bit, and then if the CRC verifies OK
@@ -239,83 +242,83 @@ by checking the superblock of the feature bit, and then if the CRC verifies OK
The verifier function will take a couple of different forms, depending on
whether the magic number can be used to determine the format of the block. In
-the case it can't, the code is structured as follows:
+the case it can't, the code is structured as follows::
-static bool
-xfs_foo_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_mount;
- struct xfs_ondisk_hdr *hdr = bp->b_addr;
+ static bool
+ xfs_foo_verify(
+ struct xfs_buf *bp)
+ {
+ struct xfs_mount *mp = bp->b_mount;
+ struct xfs_ondisk_hdr *hdr = bp->b_addr;
- if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
- return false;
+ if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
+ return false;
- if (!xfs_sb_version_hascrc(&mp->m_sb)) {
- if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
- return false;
- if (bp->b_bn != be64_to_cpu(hdr->blkno))
- return false;
- if (hdr->owner == 0)
- return false;
- }
+ if (!xfs_sb_version_hascrc(&mp->m_sb)) {
+ if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
+ return false;
+ if (bp->b_bn != be64_to_cpu(hdr->blkno))
+ return false;
+ if (hdr->owner == 0)
+ return false;
+ }
- /* object specific verification checks here */
+ /* object specific verification checks here */
- return true;
-}
+ return true;
+ }
If there are different magic numbers for the different formats, the verifier
-will look like:
-
-static bool
-xfs_foo_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_mount;
- struct xfs_ondisk_hdr *hdr = bp->b_addr;
-
- if (hdr->magic == cpu_to_be32(XFS_FOO_CRC_MAGIC)) {
- if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
- return false;
- if (bp->b_bn != be64_to_cpu(hdr->blkno))
- return false;
- if (hdr->owner == 0)
- return false;
- } else if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
- return false;
-
- /* object specific verification checks here */
-
- return true;
-}
+will look like::
+
+ static bool
+ xfs_foo_verify(
+ struct xfs_buf *bp)
+ {
+ struct xfs_mount *mp = bp->b_mount;
+ struct xfs_ondisk_hdr *hdr = bp->b_addr;
+
+ if (hdr->magic == cpu_to_be32(XFS_FOO_CRC_MAGIC)) {
+ if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
+ return false;
+ if (bp->b_bn != be64_to_cpu(hdr->blkno))
+ return false;
+ if (hdr->owner == 0)
+ return false;
+ } else if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
+ return false;
+
+ /* object specific verification checks here */
+
+ return true;
+ }
Write verifiers are very similar to the read verifiers, they just do things in
-the opposite order to the read verifiers. A typical write verifier:
+the opposite order to the read verifiers. A typical write verifier::
-static void
-xfs_foo_write_verify(
- struct xfs_buf *bp)
-{
- struct xfs_mount *mp = bp->b_mount;
- struct xfs_buf_log_item *bip = bp->b_fspriv;
+ static void
+ xfs_foo_write_verify(
+ struct xfs_buf *bp)
+ {
+ struct xfs_mount *mp = bp->b_mount;
+ struct xfs_buf_log_item *bip = bp->b_fspriv;
- if (!xfs_foo_verify(bp)) {
- XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
- xfs_buf_ioerror(bp, EFSCORRUPTED);
- return;
- }
+ if (!xfs_foo_verify(bp)) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
+ xfs_buf_ioerror(bp, EFSCORRUPTED);
+ return;
+ }
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return;
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return;
- if (bip) {
- struct xfs_ondisk_hdr *hdr = bp->b_addr;
- hdr->lsn = cpu_to_be64(bip->bli_item.li_lsn);
- }
- xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length), XFS_FOO_CRC_OFF);
-}
+ if (bip) {
+ struct xfs_ondisk_hdr *hdr = bp->b_addr;
+ hdr->lsn = cpu_to_be64(bip->bli_item.li_lsn);
+ }
+ xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length), XFS_FOO_CRC_OFF);
+ }
This will verify the internal structure of the metadata before we go any
further, detecting corruptions that have occurred as the metadata has been
@@ -324,7 +327,7 @@ update the LSN field (when it was last modified) and calculate the CRC on the
metadata. Once this is done, we can issue the IO.
Inodes and Dquots
------------------
+=================
Inodes and dquots are special snowflakes. They have per-object CRC and
self-identifiers, but they are packed so that there are multiple objects per
@@ -337,14 +340,13 @@ buffer.
The structure of the verifiers and the identifiers checks is very similar to the
buffer code described above. The only difference is where they are called. For
-example, inode read verification is done in xfs_iread() when the inode is first
-read out of the buffer and the struct xfs_inode is instantiated. The inode is
-already extensively verified during writeback in xfs_iflush_int, so the only
-addition here is to add the LSN and CRC to the inode as it is copied back into
-the buffer.
+example, inode read verification is done in xfs_inode_from_disk() when the inode
+is first read out of the buffer and the struct xfs_inode is instantiated. The
+inode is already extensively verified during writeback in xfs_iflush_int, so the
+only addition here is to add the LSN and CRC to the inode as it is copied back
+into the buffer.
XXX: inode unlinked list modification doesn't recalculate the inode CRC! None of
the unlinked list modifications check or update CRCs, neither during unlink nor
log recovery. So, it's gone unnoticed until now. This won't matter immediately -
repair will probably complain about it - but it needs to be fixed.
-
diff --git a/Documentation/gpu/amdgpu.rst b/Documentation/gpu/amdgpu.rst
index 0efede580039..4cc74325bf91 100644
--- a/Documentation/gpu/amdgpu.rst
+++ b/Documentation/gpu/amdgpu.rst
@@ -202,3 +202,91 @@ busy_percent
.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
:doc: busy_percent
+
+GPU Product Information
+=======================
+
+Information about the GPU can be obtained on certain cards
+via sysfs
+
+product_name
+------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+ :doc: product_name
+
+product_number
+--------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+ :doc: product_name
+
+serial_number
+-------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+ :doc: serial_number
+
+unique_id
+---------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+ :doc: unique_id
+
+GPU Memory Usage Information
+============================
+
+Various memory accounting can be accessed via sysfs
+
+mem_info_vram_total
+-------------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+ :doc: mem_info_vram_total
+
+mem_info_vram_used
+------------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+ :doc: mem_info_vram_used
+
+mem_info_vis_vram_total
+-----------------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+ :doc: mem_info_vis_vram_total
+
+mem_info_vis_vram_used
+----------------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+ :doc: mem_info_vis_vram_used
+
+mem_info_gtt_total
+------------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+ :doc: mem_info_gtt_total
+
+mem_info_gtt_used
+-----------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+ :doc: mem_info_gtt_used
+
+PCIe Accounting Information
+===========================
+
+pcie_bw
+-------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+ :doc: pcie_bw
+
+pcie_replay_count
+-----------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+ :doc: pcie_replay_count
+
+
diff --git a/Documentation/gpu/drm-internals.rst b/Documentation/gpu/drm-internals.rst
index a73320576ca9..12272b168580 100644
--- a/Documentation/gpu/drm-internals.rst
+++ b/Documentation/gpu/drm-internals.rst
@@ -132,6 +132,18 @@ be unmapped; on many devices, the ROM address decoder is shared with
other BARs, so leaving it mapped could cause undesired behaviour like
hangs or memory corruption.
+Managed Resources
+-----------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_managed.c
+ :doc: managed resources
+
+.. kernel-doc:: drivers/gpu/drm/drm_managed.c
+ :export:
+
+.. kernel-doc:: include/drm/drm_managed.h
+ :internal:
+
Bus-specific Device Registration and PCI Support
------------------------------------------------
diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
index 906771e03103..397314d08f77 100644
--- a/Documentation/gpu/drm-kms.rst
+++ b/Documentation/gpu/drm-kms.rst
@@ -3,7 +3,7 @@ Kernel Mode Setting (KMS)
=========================
Drivers must initialize the mode setting core by calling
-drm_mode_config_init() on the DRM device. The function
+drmm_mode_config_init() on the DRM device. The function
initializes the :c:type:`struct drm_device <drm_device>`
mode_config field and never fails. Once done, mode configuration must
be setup by initializing the following fields.
@@ -397,6 +397,9 @@ Connector Functions Reference
Writeback Connectors
--------------------
+.. kernel-doc:: include/drm/drm_writeback.h
+ :internal:
+
.. kernel-doc:: drivers/gpu/drm/drm_writeback.c
:doc: overview
diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index c77b32601260..1839762044be 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -373,15 +373,6 @@ GEM CMA Helper Functions Reference
.. kernel-doc:: drivers/gpu/drm/drm_gem_cma_helper.c
:export:
-VRAM Helper Function Reference
-==============================
-
-.. kernel-doc:: drivers/gpu/drm/drm_vram_helper_common.c
- :doc: overview
-
-.. kernel-doc:: include/drm/drm_gem_vram_helper.h
- :internal:
-
GEM VRAM Helper Functions Reference
-----------------------------------
diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index f6d363b6756e..33cc6ddf8f64 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -329,6 +329,52 @@ for execution also include a list of all locations within buffers that
refer to GPU-addresses so that the kernel can edit the buffer correctly.
This process is dubbed relocation.
+Locking Guidelines
+------------------
+
+.. note::
+ This is a description of how the locking should be after
+ refactoring is done. Does not necessarily reflect what the locking
+ looks like while WIP.
+
+#. All locking rules and interface contracts with cross-driver interfaces
+ (dma-buf, dma_fence) need to be followed.
+
+#. No struct_mutex anywhere in the code
+
+#. dma_resv will be the outermost lock (when needed) and ww_acquire_ctx
+ is to be hoisted at highest level and passed down within i915_gem_ctx
+ in the call chain
+
+#. While holding lru/memory manager (buddy, drm_mm, whatever) locks
+ system memory allocations are not allowed
+
+ * Enforce this by priming lockdep (with fs_reclaim). If we
+ allocate memory while holding these looks we get a rehash
+ of the shrinker vs. struct_mutex saga, and that would be
+ real bad.
+
+#. Do not nest different lru/memory manager locks within each other.
+ Take them in turn to update memory allocations, relying on the object’s
+ dma_resv ww_mutex to serialize against other operations.
+
+#. The suggestion for lru/memory managers locks is that they are small
+ enough to be spinlocks.
+
+#. All features need to come with exhaustive kernel selftests and/or
+ IGT tests when appropriate
+
+#. All LMEM uAPI paths need to be fully restartable (_interruptible()
+ for all locks/waits/sleeps)
+
+ * Error handling validation through signal injection.
+ Still the best strategy we have for validating GEM uAPI
+ corner cases.
+ Must be excessively used in the IGT, and we need to check
+ that we really have full path coverage of all error cases.
+
+ * -EDEADLK handling with ww_mutex
+
GEM BO Management Implementation Details
----------------------------------------
@@ -391,19 +437,19 @@ Global GTT views
GTT Fences and Swizzling
------------------------
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_fence_reg.c
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
:internal:
Global GTT Fence Handling
~~~~~~~~~~~~~~~~~~~~~~~~~
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_fence_reg.c
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
:doc: fence register handling
Hardware Tiling and Swizzling Details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_fence_reg.c
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
:doc: tiling swizzling details
Object Tiling IOCTLs
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index 439656f55c5d..658b52f7ffc6 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -347,18 +347,6 @@ Contact: Sean Paul
Level: Starter
-Remove drm_display_mode.hsync
------------------------------
-
-We have drm_mode_hsync() to calculate this from hsync_start/end, since drivers
-shouldn't/don't use this, remove this member to avoid any temptations to use it
-in the future. If there is any debug code using drm_display_mode.hsync, convert
-it to use drm_mode_hsync() instead.
-
-Contact: Sean Paul
-
-Level: Starter
-
connector register/unregister fixes
-----------------------------------
diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst
new file mode 100644
index 000000000000..f8288edff664
--- /dev/null
+++ b/Documentation/hwmon/amd_energy.rst
@@ -0,0 +1,109 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver amd_energy
+==========================
+
+Supported chips:
+
+* AMD Family 17h Processors
+
+ Prefix: 'amd_energy'
+
+ Addresses used: RAPL MSRs
+
+ Datasheets:
+
+ - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors
+
+ https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip
+
+ - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors
+
+ https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
+
+Author: Naveen Krishna Chatradhi <nchatrad@amd.com>
+
+Description
+-----------
+
+The Energy driver exposes the energy counters that are
+reported via the Running Average Power Limit (RAPL)
+Model-specific Registers (MSRs) via the hardware monitor
+(HWMON) sysfs interface.
+
+1. Power, Energy and Time Units
+ MSR_RAPL_POWER_UNIT/ C001_0299:
+ shared with all cores in the socket
+
+2. Energy consumed by each Core
+ MSR_CORE_ENERGY_STATUS/ C001_029A:
+ 32-bitRO, Accumulator, core-level power reporting
+
+3. Energy consumed by Socket
+ MSR_PACKAGE_ENERGY_STATUS/ C001_029B:
+ 32-bitRO, Accumulator, socket-level power reporting,
+ shared with all cores in socket
+
+These registers are updated every 1ms and cleared on
+reset of the system.
+
+Note: If SMT is enabled, Linux enumerates all threads as cpus.
+Since, the energy status registers are accessed at core level,
+reading those registers from the sibling threads would result
+in duplicate values. Hence, energy counter entries are not
+populated for the siblings.
+
+Energy Caluclation
+------------------
+
+Energy information (in Joules) is based on the multiplier,
+1/2^ESU; where ESU is an unsigned integer read from
+MSR_RAPL_POWER_UNIT register. Default value is 10000b,
+indicating energy status unit is 15.3 micro-Joules increment.
+
+Reported values are scaled as per the formula
+
+scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in uJoules
+
+Users calculate power for a given domain by calculating
+ dEnergy/dTime for that domain.
+
+Energy accumulation
+--------------------------
+
+Current, Socket energy status register is 32bit, assuming a 240W
+2P system, the register would wrap around in
+
+ 2^32*15.3 e-6/240 * 2 = 547.60833024 secs to wrap(~9 mins)
+
+The Core energy register may wrap around after several days.
+
+To improve the wrap around time, a kernel thread is implemented
+to accumulate the socket energy counters and one core energy counter
+per run to a respective 64-bit counter. The kernel thread starts
+running during probe, wakes up every 100secs and stops running
+when driver is removed.
+
+A socket and core energy read would return the current register
+value added to the respective energy accumulator.
+
+Sysfs attributes
+----------------
+
+=============== ======== =====================================
+Attribute Label Description
+=============== ======== =====================================
+
+* For index N between [1] and [nr_cpus]
+
+=============== ======== ======================================
+energy[N]_input EcoreX Core Energy X = [0] to [nr_cpus - 1]
+ Measured input core energy
+=============== ======== ======================================
+
+* For N between [nr_cpus] and [nr_cpus + nr_socks]
+
+=============== ======== ======================================
+energy[N]_input EsocketX Socket Energy X = [0] to [nr_socks -1]
+ Measured input socket energy
+=============== ======== ======================================
diff --git a/Documentation/hwmon/bcm54140.rst b/Documentation/hwmon/bcm54140.rst
new file mode 100644
index 000000000000..bc6ea4b45966
--- /dev/null
+++ b/Documentation/hwmon/bcm54140.rst
@@ -0,0 +1,45 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+Broadcom BCM54140 Quad SGMII/QSGMII PHY
+=======================================
+
+Supported chips:
+
+ * Broadcom BCM54140
+
+ Datasheet: not public
+
+Author: Michael Walle <michael@walle.cc>
+
+Description
+-----------
+
+The Broadcom BCM54140 is a Quad SGMII/QSGMII PHY which supports monitoring
+its die temperature as well as two analog voltages.
+
+The AVDDL is a 1.0V analogue voltage, the AVDDH is a 3.3V analogue voltage.
+Both voltages and the temperature are measured in a round-robin fashion.
+
+Sysfs entries
+-------------
+
+The following attributes are supported.
+
+======================= ========================================================
+in0_label "AVDDL"
+in0_input Measured AVDDL voltage.
+in0_min Minimum AVDDL voltage.
+in0_max Maximum AVDDL voltage.
+in0_alarm AVDDL voltage alarm.
+
+in1_label "AVDDH"
+in1_input Measured AVDDH voltage.
+in1_min Minimum AVDDH voltage.
+in1_max Maximum AVDDH voltage.
+in1_alarm AVDDH voltage alarm.
+
+temp1_input Die temperature.
+temp1_min Minimum die temperature.
+temp1_max Maximum die temperature.
+temp1_alarm Die temperature alarm.
+======================= ========================================================
diff --git a/Documentation/hwmon/bt1-pvt.rst b/Documentation/hwmon/bt1-pvt.rst
new file mode 100644
index 000000000000..cbb0c0613132
--- /dev/null
+++ b/Documentation/hwmon/bt1-pvt.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+Kernel driver bt1-pvt
+=====================
+
+Supported chips:
+
+ * Baikal-T1 PVT sensor (in SoC)
+
+ Prefix: 'bt1-pvt'
+
+ Addresses scanned: -
+
+ Datasheet: Provided by BAIKAL ELECTRONICS upon request and under NDA
+
+Authors:
+ Maxim Kaurkin <maxim.kaurkin@baikalelectronics.ru>
+ Serge Semin <Sergey.Semin@baikalelectronics.ru>
+
+Description
+-----------
+
+This driver implements support for the hardware monitoring capabilities of the
+embedded into Baikal-T1 process, voltage and temperature sensors. PVT IP-core
+consists of one temperature and four voltage sensors, which can be used to
+monitor the chip internal environment like heating, supply voltage and
+transistors performance. The driver can optionally provide the hwmon alarms
+for each sensor the PVT controller supports. The alarms functionality is made
+compile-time configurable due to the hardware interface implementation
+peculiarity, which is connected with an ability to convert data from only one
+sensor at a time. Additional limitation is that the controller performs the
+thresholds checking synchronously with the data conversion procedure. Due to
+these in order to have the hwmon alarms automatically detected the driver code
+must switch from one sensor to another, read converted data and manually check
+the threshold status bits. Depending on the measurements timeout settings
+(update_interval sysfs node value) this design may cause additional burden on
+the system performance. So in case if alarms are unnecessary in your system
+design it's recommended to have them disabled to prevent the PVT IRQs being
+periodically raised to get the data cache/alarms status up to date. By default
+in alarm-less configuration the data conversion is performed by the driver
+on demand when read operation is requested via corresponding _input-file.
+
+Temperature Monitoring
+----------------------
+
+Temperature is measured with 10-bit resolution and reported in millidegree
+Celsius. The driver performs all the scaling by itself therefore reports true
+temperatures that don't need any user-space adjustments. While the data
+translation formulae isn't linear, which gives us non-linear discreteness,
+it's close to one, but giving a bit better accuracy for higher temperatures.
+The temperature input is mapped as follows (the last column indicates the input
+ranges)::
+
+ temp1: CPU embedded diode -48.38C - +147.438C
+
+In case if the alarms kernel config is enabled in the driver the temperature input
+has associated min and max limits which trigger an alarm when crossed.
+
+Voltage Monitoring
+------------------
+
+The voltage inputs are also sampled with 10-bit resolution and reported in
+millivolts. But in this case the data translation formulae is linear, which
+provides a constant measurements discreteness. The data scaling is also
+performed by the driver, so returning true millivolts. The voltage inputs are
+mapped as follows (the last column indicates the input ranges)::
+
+ in0: VDD (processor core) 0.62V - 1.168V
+ in1: Low-Vt (low voltage threshold) 0.62V - 1.168V
+ in2: High-Vt (high voltage threshold) 0.62V - 1.168V
+ in3: Standard-Vt (standard voltage threshold) 0.62V - 1.168V
+
+In case if the alarms config is enabled in the driver the voltage inputs
+have associated min and max limits which trigger an alarm when crossed.
+
+Sysfs Attributes
+----------------
+
+Following is a list of all sysfs attributes that the driver provides, their
+permissions and a short description:
+
+=============================== ======= =======================================
+Name Perm Description
+=============================== ======= =======================================
+update_interval RW Measurements update interval per
+ sensor.
+temp1_type RO Sensor type (always 1 as CPU embedded
+ diode).
+temp1_label RO CPU Core Temperature sensor.
+temp1_input RO Measured temperature in millidegree
+ Celsius.
+temp1_min RW Low limit for temp input.
+temp1_max RW High limit for temp input.
+temp1_min_alarm RO Temperature input alarm. Returns 1 if
+ temperature input went below min limit,
+ 0 otherwise.
+temp1_max_alarm RO Temperature input alarm. Returns 1 if
+ temperature input went above max limit,
+ 0 otherwise.
+temp1_offset RW Temperature offset in millidegree
+ Celsius which is added to the
+ temperature reading by the chip. It can
+ be used to manually adjust the
+ temperature measurements within 7.130
+ degrees Celsius.
+in[0-3]_label RO CPU Voltage sensor (either core or
+ low/high/standard thresholds).
+in[0-3]_input RO Measured voltage in millivolts.
+in[0-3]_min RW Low limit for voltage input.
+in[0-3]_max RW High limit for voltage input.
+in[0-3]_min_alarm RO Voltage input alarm. Returns 1 if
+ voltage input went below min limit,
+ 0 otherwise.
+in[0-3]_max_alarm RO Voltage input alarm. Returns 1 if
+ voltage input went above max limit,
+ 0 otherwise.
+=============================== ======= =======================================
diff --git a/Documentation/hwmon/gsc-hwmon.rst b/Documentation/hwmon/gsc-hwmon.rst
new file mode 100644
index 000000000000..ffac392a7129
--- /dev/null
+++ b/Documentation/hwmon/gsc-hwmon.rst
@@ -0,0 +1,53 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver gsc-hwmon
+=======================
+
+Supported chips: Gateworks GSC
+Datasheet: http://trac.gateworks.com/wiki/gsc
+Author: Tim Harvey <tharvey@gateworks.com>
+
+Description:
+------------
+
+This driver supports hardware monitoring for the temperature sensor,
+various ADC's connected to the GSC, and optional FAN controller available
+on some boards.
+
+
+Voltage Monitoring
+------------------
+
+The voltage inputs are scaled either internally or by the driver depending
+on the GSC version and firmware. The values returned by the driver do not need
+further scaling. The voltage input labels provide the voltage rail name:
+
+inX_input Measured voltage (mV).
+inX_label Name of voltage rail.
+
+
+Temperature Monitoring
+----------------------
+
+Temperatures are measured with 12-bit or 10-bit resolution and are scaled
+either internally or by the driver depending on the GSC version and firmware.
+The values returned by the driver reflect millidegree Celcius:
+
+tempX_input Measured temperature.
+tempX_label Name of temperature input.
+
+
+PWM Output Control
+------------------
+
+The GSC features 1 PWM output that operates in automatic mode where the
+PWM value will be scalled depending on 6 temperature boundaries.
+The tempeature boundaries are read-write and in millidegree Celcius and the
+read-only PWM values range from 0 (off) to 255 (full speed).
+Fan speed will be set to minimum (off) when the temperature sensor reads
+less than pwm1_auto_point1_temp and maximum when the temperature sensor
+equals or exceeds pwm1_auto_point6_temp.
+
+pwm1_auto_point[1-6]_pwm PWM value.
+pwm1_auto_point[1-6]_temp Temperature boundary.
+
diff --git a/Documentation/hwmon/ina2xx.rst b/Documentation/hwmon/ina2xx.rst
index 94b9a260c518..ed81f5416331 100644
--- a/Documentation/hwmon/ina2xx.rst
+++ b/Documentation/hwmon/ina2xx.rst
@@ -99,6 +99,25 @@ Sysfs entries for ina226, ina230 and ina231 only
------------------------------------------------
======================= ====================================================
+in0_lcrit Critical low shunt voltage
+in0_crit Critical high shunt voltage
+in0_lcrit_alarm Shunt voltage critical low alarm
+in0_crit_alarm Shunt voltage critical high alarm
+in1_lcrit Critical low bus voltage
+in1_crit Critical high bus voltage
+in1_lcrit_alarm Bus voltage critical low alarm
+in1_crit_alarm Bus voltage critical high alarm
+power1_crit Critical high power
+power1_crit_alarm Power critical high alarm
update_interval data conversion time; affects number of samples used
to average results for shunt and bus voltages.
======================= ====================================================
+
+.. note::
+
+ - Configure `shunt_resistor` before configure `power1_crit`, because power
+ value is calculated based on `shunt_resistor` set.
+ - Because of the underlying register implementation, only one `*crit` setting
+ and its `alarm` can be active. Writing to one `*crit` setting clears other
+ `*crit` settings and alarms. Writing 0 to any `*crit` setting clears all
+ `*crit` settings and alarms.
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 8ef62fd39787..55ff4b7c5349 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -39,10 +39,13 @@ Hardware Monitoring Kernel Drivers
adt7470
adt7475
amc6821
+ amd_energy
asb100
asc7621
aspeed-pwm-tacho
+ bcm54140
bel-pfe
+ bt1-pvt
coretemp
da9052
da9055
@@ -60,6 +63,7 @@ Hardware Monitoring Kernel Drivers
ftsteutates
g760a
g762
+ gsc-hwmon
gl518sm
hih6130
ibmaem
@@ -106,6 +110,7 @@ Hardware Monitoring Kernel Drivers
max16064
max16065
max1619
+ max16601
max1668
max197
max20730
diff --git a/Documentation/hwmon/lm90.rst b/Documentation/hwmon/lm90.rst
index 953315987c06..78dfc01b47a2 100644
--- a/Documentation/hwmon/lm90.rst
+++ b/Documentation/hwmon/lm90.rst
@@ -123,6 +123,18 @@ Supported chips:
http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3497
+ * Maxim MAX6654
+
+ Prefix: 'max6654'
+
+ Addresses scanned: I2C 0x18, 0x19, 0x1a, 0x29, 0x2a, 0x2b,
+
+ 0x4c, 0x4d and 0x4e
+
+ Datasheet: Publicly available at the Maxim website
+
+ https://www.maximintegrated.com/en/products/sensors/MAX6654.html
+
* Maxim MAX6657
Prefix: 'max6657'
@@ -301,6 +313,13 @@ ADT7461, ADT7461A, NCT1008:
* Extended temperature range (breaks compatibility)
* Lower resolution for remote temperature
+MAX6654:
+ * Better local resolution
+ * Selectable address
+ * Remote sensor type selection
+ * Extended temperature range
+ * Extended resolution only available when conversion rate <= 1 Hz
+
MAX6657 and MAX6658:
* Better local resolution
* Remote sensor type selection
@@ -336,8 +355,8 @@ SA56004X:
All temperature values are given in degrees Celsius. Resolution
is 1.0 degree for the local temperature, 0.125 degree for the remote
-temperature, except for the MAX6657, MAX6658 and MAX6659 which have a
-resolution of 0.125 degree for both temperatures.
+temperature, except for the MAX6654, MAX6657, MAX6658 and MAX6659 which have
+a resolution of 0.125 degree for both temperatures.
Each sensor has its own high and low limits, plus a critical limit.
Additionally, there is a relative hysteresis value common to both critical
diff --git a/Documentation/hwmon/max16601.rst b/Documentation/hwmon/max16601.rst
new file mode 100644
index 000000000000..346e74674c51
--- /dev/null
+++ b/Documentation/hwmon/max16601.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver max16601
+======================
+
+Supported chips:
+
+ * Maxim MAX16601
+
+ Prefix: 'max16601'
+
+ Addresses scanned: -
+
+ Datasheet: Not published
+
+Author: Guenter Roeck <linux@roeck-us.net>
+
+
+Description
+-----------
+
+This driver supports the MAX16601 VR13.HC Dual-Output Voltage Regulator
+Chipset.
+
+The driver is a client driver to the core PMBus driver.
+Please see Documentation/hwmon/pmbus.rst for details on PMBus client drivers.
+
+
+Usage Notes
+-----------
+
+This driver does not auto-detect devices. You will have to instantiate the
+devices explicitly. Please see Documentation/i2c/instantiating-devices.rst for
+details.
+
+
+Platform data support
+---------------------
+
+The driver supports standard PMBus driver platform data.
+
+
+Sysfs entries
+-------------
+
+The following attributes are supported.
+
+======================= =======================================================
+in1_label "vin1"
+in1_input VCORE input voltage.
+in1_alarm Input voltage alarm.
+
+in2_label "vout1"
+in2_input VCORE output voltage.
+in2_alarm Output voltage alarm.
+
+curr1_label "iin1"
+curr1_input VCORE input current, derived from duty cycle and output
+ current.
+curr1_max Maximum input current.
+curr1_max_alarm Current high alarm.
+
+curr2_label "iin1.0"
+curr2_input VCORE phase 0 input current.
+
+curr3_label "iin1.1"
+curr3_input VCORE phase 1 input current.
+
+curr4_label "iin1.2"
+curr4_input VCORE phase 2 input current.
+
+curr5_label "iin1.3"
+curr5_input VCORE phase 3 input current.
+
+curr6_label "iin1.4"
+curr6_input VCORE phase 4 input current.
+
+curr7_label "iin1.5"
+curr7_input VCORE phase 5 input current.
+
+curr8_label "iin1.6"
+curr8_input VCORE phase 6 input current.
+
+curr9_label "iin1.7"
+curr9_input VCORE phase 7 input current.
+
+curr10_label "iin2"
+curr10_input VCORE input current, derived from sensor element.
+
+curr11_label "iin3"
+curr11_input VSA input current.
+
+curr12_label "iout1"
+curr12_input VCORE output current.
+curr12_crit Critical output current.
+curr12_crit_alarm Output current critical alarm.
+curr12_max Maximum output current.
+curr12_max_alarm Output current high alarm.
+
+curr13_label "iout1.0"
+curr13_input VCORE phase 0 output current.
+
+curr14_label "iout1.1"
+curr14_input VCORE phase 1 output current.
+
+curr15_label "iout1.2"
+curr15_input VCORE phase 2 output current.
+
+curr16_label "iout1.3"
+curr16_input VCORE phase 3 output current.
+
+curr17_label "iout1.4"
+curr17_input VCORE phase 4 output current.
+
+curr18_label "iout1.5"
+curr18_input VCORE phase 5 output current.
+
+curr19_label "iout1.6"
+curr19_input VCORE phase 6 output current.
+
+curr20_label "iout1.7"
+curr20_input VCORE phase 7 output current.
+
+curr21_label "iout3"
+curr21_input VSA output current.
+curr21_highest Historical maximum VSA output current.
+curr21_reset_history Write any value to reset curr21_highest.
+curr21_crit Critical output current.
+curr21_crit_alarm Output current critical alarm.
+curr21_max Maximum output current.
+curr21_max_alarm Output current high alarm.
+
+power1_label "pin1"
+power1_input Input power, derived from duty cycle and output current.
+power1_alarm Input power alarm.
+
+power2_label "pin2"
+power2_input Input power, derived from input current sensor.
+
+power3_label "pout"
+power3_input Output power.
+
+temp1_input VCORE temperature.
+temp1_crit Critical high temperature.
+temp1_crit_alarm Chip temperature critical high alarm.
+temp1_max Maximum temperature.
+temp1_max_alarm Chip temperature high alarm.
+
+temp2_input TSENSE_0 temperature
+temp3_input TSENSE_1 temperature
+temp4_input TSENSE_2 temperature
+temp5_input TSENSE_3 temperature
+
+temp6_input VSA temperature.
+temp6_crit Critical high temperature.
+temp6_crit_alarm Chip temperature critical high alarm.
+temp6_max Maximum temperature.
+temp6_max_alarm Chip temperature high alarm.
+======================= =======================================================
diff --git a/Documentation/i2c/i2c.svg b/Documentation/i2c/i2c_bus.svg
index 5979405ad1c3..3170de976373 100644
--- a/Documentation/i2c/i2c.svg
+++ b/Documentation/i2c/i2c_bus.svg
@@ -9,7 +9,7 @@
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
- sodipodi:docname="i2c.svg"
+ sodipodi:docname="i2c_bus.svg"
inkscape:version="0.92.3 (2405546, 2018-03-11)"
version="1.1"
id="svg2"
diff --git a/Documentation/i2c/summary.rst b/Documentation/i2c/summary.rst
index ce7230025b33..136c4e333be7 100644
--- a/Documentation/i2c/summary.rst
+++ b/Documentation/i2c/summary.rst
@@ -34,7 +34,7 @@ Terminology
Using the terminology from the official documentation, the I2C bus connects
one or more *master* chips and one or more *slave* chips.
-.. kernel-figure:: i2c.svg
+.. kernel-figure:: i2c_bus.svg
:alt: Simple I2C bus with one master and 3 slaves
Simple I2C bus
diff --git a/Documentation/ia64/irq-redir.rst b/Documentation/ia64/irq-redir.rst
index 39bf94484a15..6bbbbe4f73ef 100644
--- a/Documentation/ia64/irq-redir.rst
+++ b/Documentation/ia64/irq-redir.rst
@@ -7,7 +7,7 @@ IRQ affinity on IA64 platforms
By writing to /proc/irq/IRQ#/smp_affinity the interrupt routing can be
controlled. The behavior on IA64 platforms is slightly different from
-that described in Documentation/IRQ-affinity.txt for i386 systems.
+that described in Documentation/core-api/irq/irq-affinity.rst for i386 systems.
Because of the usage of SAPIC mode and physical destination mode the
IRQ target is one particular CPU and cannot be a mask of several
diff --git a/Documentation/iio/iio_configfs.rst b/Documentation/iio/iio_configfs.rst
index ecbfdb3afef7..6e38cbbd2981 100644
--- a/Documentation/iio/iio_configfs.rst
+++ b/Documentation/iio/iio_configfs.rst
@@ -9,7 +9,7 @@ Configfs is a filesystem-based manager of kernel objects. IIO uses some
objects that could be easily configured using configfs (e.g.: devices,
triggers).
-See Documentation/filesystems/configfs/configfs.txt for more information
+See Documentation/filesystems/configfs.rst for more information
about how configfs works.
2. Usage
diff --git a/Documentation/kbuild/makefiles.rst b/Documentation/kbuild/makefiles.rst
index 04d5c01a2e99..b80257a03830 100644
--- a/Documentation/kbuild/makefiles.rst
+++ b/Documentation/kbuild/makefiles.rst
@@ -1241,7 +1241,8 @@ When kbuild executes, the following steps are followed (roughly):
will be displayed with "make KBUILD_VERBOSE=0".
---- 6.9 Preprocessing linker scripts
+6.9 Preprocessing linker scripts
+--------------------------------
When the vmlinux image is built, the linker script
arch/$(ARCH)/kernel/vmlinux.lds is used.
diff --git a/Documentation/futex-requeue-pi.txt b/Documentation/locking/futex-requeue-pi.rst
index 14ab5787b9a7..14ab5787b9a7 100644
--- a/Documentation/futex-requeue-pi.txt
+++ b/Documentation/locking/futex-requeue-pi.rst
diff --git a/Documentation/hwspinlock.txt b/Documentation/locking/hwspinlock.rst
index 6f03713b7003..6f03713b7003 100644
--- a/Documentation/hwspinlock.txt
+++ b/Documentation/locking/hwspinlock.rst
diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index 5d6800a723dc..d785878cad65 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -16,6 +16,13 @@ locking
rt-mutex
spinlocks
ww-mutex-design
+ preempt-locking
+ pi-futex
+ futex-requeue-pi
+ hwspinlock
+ percpu-rw-semaphore
+ robust-futexes
+ robust-futex-ABI
.. only:: subproject and html
diff --git a/Documentation/locking/locktorture.rst b/Documentation/locking/locktorture.rst
index 5bcb99ba7bd9..8012a74555e7 100644
--- a/Documentation/locking/locktorture.rst
+++ b/Documentation/locking/locktorture.rst
@@ -110,7 +110,7 @@ stutter
same period of time. Defaults to "stutter=5", so as
to run and pause for (roughly) five-second intervals.
Specifying "stutter=0" causes the test to run continuously
- without pausing, which is the old default behavior.
+ without pausing.
shuffle_interval
The number of seconds to keep the test threads affinitied
diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst
index 09f45ce38d26..1b577a8bf982 100644
--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -13,6 +13,7 @@ The kernel provides a variety of locking primitives which can be divided
into two categories:
- Sleeping locks
+ - CPU local locks
- Spinning locks
This document conceptually describes these lock types and provides rules
@@ -44,9 +45,23 @@ Sleeping lock types:
On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
+ - local_lock
- spinlock_t
- rwlock_t
+
+CPU local locks
+---------------
+
+ - local_lock
+
+On non-PREEMPT_RT kernels, local_lock functions are wrappers around
+preemption and interrupt disabling primitives. Contrary to other locking
+mechanisms, disabling preemption or interrupts are pure CPU local
+concurrency control mechanisms and not suited for inter-CPU concurrency
+control.
+
+
Spinning locks
--------------
@@ -67,6 +82,7 @@ can have suffixes which apply further protections:
_irqsave/restore() Save and disable / restore interrupt disabled state
=================== ====================================================
+
Owner semantics
===============
@@ -139,6 +155,56 @@ implementation, thus changing the fairness:
writer from starving readers.
+local_lock
+==========
+
+local_lock provides a named scope to critical sections which are protected
+by disabling preemption or interrupts.
+
+On non-PREEMPT_RT kernels local_lock operations map to the preemption and
+interrupt disabling and enabling primitives:
+
+ =========================== ======================
+ local_lock(&llock) preempt_disable()
+ local_unlock(&llock) preempt_enable()
+ local_lock_irq(&llock) local_irq_disable()
+ local_unlock_irq(&llock) local_irq_enable()
+ local_lock_save(&llock) local_irq_save()
+ local_lock_restore(&llock) local_irq_save()
+ =========================== ======================
+
+The named scope of local_lock has two advantages over the regular
+primitives:
+
+ - The lock name allows static analysis and is also a clear documentation
+ of the protection scope while the regular primitives are scopeless and
+ opaque.
+
+ - If lockdep is enabled the local_lock gains a lockmap which allows to
+ validate the correctness of the protection. This can detect cases where
+ e.g. a function using preempt_disable() as protection mechanism is
+ invoked from interrupt or soft-interrupt context. Aside of that
+ lockdep_assert_held(&llock) works as with any other locking primitive.
+
+local_lock and PREEMPT_RT
+-------------------------
+
+PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing
+semantics:
+
+ - All spinlock_t changes also apply to local_lock.
+
+local_lock usage
+----------------
+
+local_lock should be used in situations where disabling preemption or
+interrupts is the appropriate form of concurrency control to protect
+per-CPU data structures on a non PREEMPT_RT kernel.
+
+local_lock is not suitable to protect against preemption or interrupts on a
+PREEMPT_RT kernel due to the PREEMPT_RT specific spinlock_t semantics.
+
+
raw_spinlock_t and spinlock_t
=============================
@@ -258,10 +324,82 @@ implementation, thus changing semantics:
PREEMPT_RT caveats
==================
+local_lock on RT
+----------------
+
+The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
+implications. For example, on a non-PREEMPT_RT kernel the following code
+sequence works as expected::
+
+ local_lock_irq(&local_lock);
+ raw_spin_lock(&lock);
+
+and is fully equivalent to::
+
+ raw_spin_lock_irq(&lock);
+
+On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq()
+is mapped to a per-CPU spinlock_t which neither disables interrupts nor
+preemption. The following code sequence works perfectly correct on both
+PREEMPT_RT and non-PREEMPT_RT kernels::
+
+ local_lock_irq(&local_lock);
+ spin_lock(&lock);
+
+Another caveat with local locks is that each local_lock has a specific
+protection scope. So the following substitution is wrong::
+
+ func1()
+ {
+ local_irq_save(flags); -> local_lock_irqsave(&local_lock_1, flags);
+ func3();
+ local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_1, flags);
+ }
+
+ func2()
+ {
+ local_irq_save(flags); -> local_lock_irqsave(&local_lock_2, flags);
+ func3();
+ local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_2, flags);
+ }
+
+ func3()
+ {
+ lockdep_assert_irqs_disabled();
+ access_protected_data();
+ }
+
+On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel
+local_lock_1 and local_lock_2 are distinct and cannot serialize the callers
+of func3(). Also the lockdep assert will trigger on a PREEMPT_RT kernel
+because local_lock_irqsave() does not disable interrupts due to the
+PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::
+
+ func1()
+ {
+ local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags);
+ func3();
+ local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags);
+ }
+
+ func2()
+ {
+ local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags);
+ func3();
+ local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags);
+ }
+
+ func3()
+ {
+ lockdep_assert_held(&local_lock);
+ access_protected_data();
+ }
+
+
spinlock_t and rwlock_t
-----------------------
-These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
+The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications. For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::
@@ -282,9 +420,61 @@ local_lock mechanism. Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.
+A typical scenario is protection of per-CPU variables in thread context::
-raw_spinlock_t
---------------
+ struct foo *p = get_cpu_ptr(&var1);
+
+ spin_lock(&p->lock);
+ p->count += this_cpu_read(var2);
+
+This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel
+this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does
+not allow to acquire p->lock because get_cpu_ptr() implicitly disables
+preemption. The following substitution works on both kernels::
+
+ struct foo *p;
+
+ migrate_disable();
+ p = this_cpu_ptr(&var1);
+ spin_lock(&p->lock);
+ p->count += this_cpu_read(var2);
+
+On a non-PREEMPT_RT kernel migrate_disable() maps to preempt_disable()
+which makes the above code fully equivalent. On a PREEMPT_RT kernel
+migrate_disable() ensures that the task is pinned on the current CPU which
+in turn guarantees that the per-CPU access to var1 and var2 are staying on
+the same CPU.
+
+The migrate_disable() substitution is not valid for the following
+scenario::
+
+ func()
+ {
+ struct foo *p;
+
+ migrate_disable();
+ p = this_cpu_ptr(&var1);
+ p->val = func2();
+
+While correct on a non-PREEMPT_RT kernel, this breaks on PREEMPT_RT because
+here migrate_disable() does not protect against reentrancy from a
+preempting task. A correct substitution for this case is::
+
+ func()
+ {
+ struct foo *p;
+
+ local_lock(&foo_lock);
+ p = this_cpu_ptr(&var1);
+ p->val = func2();
+
+On a non-PREEMPT_RT kernel this protects against reentrancy by disabling
+preemption. On a PREEMPT_RT kernel this is achieved by acquiring the
+underlying per-CPU spinlock.
+
+
+raw_spinlock_t on RT
+--------------------
Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
@@ -325,22 +515,25 @@ Lock type nesting rules
The most basic rules are:
- - Lock types of the same lock category (sleeping, spinning) can nest
- arbitrarily as long as they respect the general lock ordering rules to
- prevent deadlocks.
+ - Lock types of the same lock category (sleeping, CPU local, spinning)
+ can nest arbitrarily as long as they respect the general lock ordering
+ rules to prevent deadlocks.
+
+ - Sleeping lock types cannot nest inside CPU local and spinning lock types.
- - Sleeping lock types cannot nest inside spinning lock types.
+ - CPU local and spinning lock types can nest inside sleeping lock types.
- - Spinning lock types can nest inside sleeping lock types.
+ - Spinning lock types can nest inside all lock types
These constraints apply both in PREEMPT_RT and otherwise.
The fact that PREEMPT_RT changes the lock category of spinlock_t and
-rwlock_t from spinning to sleeping means that they cannot be acquired while
-holding a raw spinlock. This results in the following nesting ordering:
+rwlock_t from spinning to sleeping and substitutes local_lock with a
+per-CPU spinlock_t means that they cannot be acquired while holding a raw
+spinlock. This results in the following nesting ordering:
1) Sleeping locks
- 2) spinlock_t and rwlock_t
+ 2) spinlock_t, rwlock_t, local_lock
3) raw_spinlock_t and bit spinlocks
Lockdep will complain if these constraints are violated, both in
diff --git a/Documentation/percpu-rw-semaphore.txt b/Documentation/locking/percpu-rw-semaphore.rst
index 247de6410855..247de6410855 100644
--- a/Documentation/percpu-rw-semaphore.txt
+++ b/Documentation/locking/percpu-rw-semaphore.rst
diff --git a/Documentation/pi-futex.txt b/Documentation/locking/pi-futex.rst
index c33ba2befbf8..c33ba2befbf8 100644
--- a/Documentation/pi-futex.txt
+++ b/Documentation/locking/pi-futex.rst
diff --git a/Documentation/preempt-locking.txt b/Documentation/locking/preempt-locking.rst
index dce336134e54..dce336134e54 100644
--- a/Documentation/preempt-locking.txt
+++ b/Documentation/locking/preempt-locking.rst
diff --git a/Documentation/robust-futex-ABI.txt b/Documentation/locking/robust-futex-ABI.rst
index f24904f1c16f..f24904f1c16f 100644
--- a/Documentation/robust-futex-ABI.txt
+++ b/Documentation/locking/robust-futex-ABI.rst
diff --git a/Documentation/robust-futexes.txt b/Documentation/locking/robust-futexes.rst
index 6361fb01c9c1..6361fb01c9c1 100644
--- a/Documentation/robust-futexes.txt
+++ b/Documentation/locking/robust-futexes.rst
diff --git a/Documentation/locking/rt-mutex.rst b/Documentation/locking/rt-mutex.rst
index c365dc302081..3b5097a380e6 100644
--- a/Documentation/locking/rt-mutex.rst
+++ b/Documentation/locking/rt-mutex.rst
@@ -4,7 +4,7 @@ RT-mutex subsystem with PI support
RT-mutexes with priority inheritance are used to support PI-futexes,
which enable pthread_mutex_t priority inheritance attributes
-(PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details
+(PTHREAD_PRIO_INHERIT). [See Documentation/locking/pi-futex.rst for more details
about PI-futexes.]
This technology was developed in the -rt tree and streamlined for
diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst
index 11ebe3682771..77e43c8b24b4 100644
--- a/Documentation/maintainer/maintainer-entry-profile.rst
+++ b/Documentation/maintainer/maintainer-entry-profile.rst
@@ -7,7 +7,7 @@ The Maintainer Entry Profile supplements the top-level process documents
(submitting-patches, submitting drivers...) with
subsystem/device-driver-local customs as well as details about the patch
submission life-cycle. A contributor uses this document to level set
-their expectations and avoid common mistakes, maintainers may use these
+their expectations and avoid common mistakes; maintainers may use these
profiles to look across subsystems for opportunities to converge on
common practices.
@@ -26,7 +26,7 @@ Example questions to consider:
- Does the subsystem have a patchwork instance? Are patchwork state
changes notified?
- Any bots or CI infrastructure that watches the list, or automated
- testing feedback that the subsystem gates acceptance?
+ testing feedback that the subsystem uses to gate acceptance?
- Git branches that are pulled into -next?
- What branch should contributors submit against?
- Links to any other Maintainer Entry Profiles? For example a
@@ -54,8 +54,8 @@ One of the common misunderstandings of submitters is that patches can be
sent at any time before the merge window closes and can still be
considered for the next -rc1. The reality is that most patches need to
be settled in soaking in linux-next in advance of the merge window
-opening. Clarify for the submitter the key dates (in terms rc release
-week) that patches might considered for merging and when patches need to
+opening. Clarify for the submitter the key dates (in terms of -rc release
+week) that patches might be considered for merging and when patches need to
wait for the next -rc. At a minimum:
- Last -rc for new feature submissions:
@@ -70,8 +70,8 @@ wait for the next -rc. At a minimum:
- Last -rc to merge features: Deadline for merge decisions
Indicate to contributors the point at which an as yet un-applied patch
set will need to wait for the NEXT+1 merge window. Of course there is no
- obligation to ever except any given patchset, but if the review has not
- concluded by this point the expectation the contributor should wait and
+ obligation to ever accept any given patchset, but if the review has not
+ concluded by this point the expectation is the contributor should wait and
resubmit for the following merge window.
Optional:
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index e1c355e84edd..eaabc3134294 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do writes
until they are certain (1) that the write will actually happen, (2)
of the location of the write, and (3) of the value to be written.
But please carefully read the "CONTROL DEPENDENCIES" section and the
-Documentation/RCU/rcu_dereference.txt file: The compiler can and does
+Documentation/RCU/rcu_dereference.rst file: The compiler can and does
break dependencies in a great many highly creative ways.
CPU 1 CPU 2
diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
index c1dcd2628911..1ecc05fbe6f4 100644
--- a/Documentation/misc-devices/index.rst
+++ b/Documentation/misc-devices/index.rst
@@ -21,4 +21,5 @@ fit into other categories.
lis3lv02d
max6875
mic/index
+ uacce
xilinx_sdfec
diff --git a/Documentation/networking/6pack.txt b/Documentation/networking/6pack.rst
index 8f339428fdf4..bc5bf1f1a98f 100644
--- a/Documentation/networking/6pack.txt
+++ b/Documentation/networking/6pack.rst
@@ -1,27 +1,36 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+6pack Protocol
+==============
+
This is the 6pack-mini-HOWTO, written by
Andreas Könsgen DG3KQ
-Internet: ajk@comnets.uni-bremen.de
-AMPR-net: dg3kq@db0pra.ampr.org
-AX.25: dg3kq@db0ach.#nrw.deu.eu
+
+:Internet: ajk@comnets.uni-bremen.de
+:AMPR-net: dg3kq@db0pra.ampr.org
+:AX.25: dg3kq@db0ach.#nrw.deu.eu
Last update: April 7, 1998
1. What is 6pack, and what are the advantages to KISS?
+======================================================
6pack is a transmission protocol for data exchange between the PC and
the TNC over a serial line. It can be used as an alternative to KISS.
6pack has two major advantages:
+
- The PC is given full control over the radio
channel. Special control data is exchanged between the PC and the TNC so
that the PC knows at any time if the TNC is receiving data, if a TNC
buffer underrun or overrun has occurred, if the PTT is
set and so on. This control data is processed at a higher priority than
normal data, so a data stream can be interrupted at any time to issue an
- important event. This helps to improve the channel access and timing
- algorithms as everything is computed in the PC. It would even be possible
- to experiment with something completely different from the known CSMA and
+ important event. This helps to improve the channel access and timing
+ algorithms as everything is computed in the PC. It would even be possible
+ to experiment with something completely different from the known CSMA and
DAMA channel access methods.
This kind of real-time control is especially important to supply several
TNCs that are connected between each other and the PC by a daisy chain
@@ -36,6 +45,7 @@ More details about 6pack are described in the file 6pack.ps that is located
in the doc directory of the AX.25 utilities package.
2. Who has developed the 6pack protocol?
+========================================
The 6pack protocol has been developed by Ekki Plicht DF4OR, Henning Rech
DF9IC and Gunter Jost DK7WJ. A driver for 6pack, written by Gunter Jost and
@@ -44,12 +54,14 @@ They have also written a firmware for TNCs to perform the 6pack
protocol (see section 4 below).
3. Where can I get the latest version of 6pack for LinuX?
+=========================================================
At the moment, the 6pack stuff can obtained via anonymous ftp from
db0bm.automation.fh-aachen.de. In the directory /incoming/dg3kq,
there is a file named 6pack.tgz.
4. Preparing the TNC for 6pack operation
+========================================
To be able to use 6pack, a special firmware for the TNC is needed. The EPROM
of a newly bought TNC does not contain 6pack, so you will have to
@@ -75,12 +87,14 @@ and the status LED are lit for about a second if the firmware initialises
the TNC correctly.
5. Building and installing the 6pack driver
+===========================================
The driver has been tested with kernel version 2.1.90. Use with older
kernels may lead to a compilation error because the interface to a kernel
function has been changed in the 2.1.8x kernels.
How to turn on 6pack support:
+=============================
- In the linux kernel configuration program, select the code maturity level
options menu and turn on the prompting for development drivers.
@@ -94,27 +108,28 @@ To use the driver, the kissattach program delivered with the AX.25 utilities
has to be modified.
- Do a cd to the directory that holds the kissattach sources. Edit the
- kissattach.c file. At the top, insert the following lines:
+ kissattach.c file. At the top, insert the following lines::
+
+ #ifndef N_6PACK
+ #define N_6PACK (N_AX25+1)
+ #endif
- #ifndef N_6PACK
- #define N_6PACK (N_AX25+1)
- #endif
+ Then find the line:
- Then find the line
-
- int disc = N_AX25;
+ int disc = N_AX25;
and replace N_AX25 by N_6PACK.
- Recompile kissattach. Rename it to spattach to avoid confusions.
Installing the driver:
+----------------------
-- Do an insmod 6pack. Look at your /var/log/messages file to check if the
+- Do an insmod 6pack. Look at your /var/log/messages file to check if the
module has printed its initialization message.
- Do a spattach as you would launch kissattach when starting a KISS port.
- Check if the kernel prints the message '6pack: TNC found'.
+ Check if the kernel prints the message '6pack: TNC found'.
- From here, everything should work as if you were setting up a KISS port.
The only difference is that the network device that represents
@@ -138,6 +153,7 @@ from the PC to the TNC over the serial line, the status LED if data is
sent to the PC.
6. Known problems
+=================
When testing the driver with 2.0.3x kernels and
operating with data rates on the radio channel of 9600 Baud or higher,
diff --git a/Documentation/networking/altera_tse.txt b/Documentation/networking/altera_tse.rst
index 50b8589d12fd..7a7040072e58 100644
--- a/Documentation/networking/altera_tse.txt
+++ b/Documentation/networking/altera_tse.rst
@@ -1,6 +1,12 @@
- Altera Triple-Speed Ethernet MAC driver
+.. SPDX-License-Identifier: GPL-2.0
-Copyright (C) 2008-2014 Altera Corporation
+.. include:: <isonum.txt>
+
+=======================================
+Altera Triple-Speed Ethernet MAC driver
+=======================================
+
+Copyright |copy| 2008-2014 Altera Corporation
This is the driver for the Altera Triple-Speed Ethernet (TSE) controllers
using the SGDMA and MSGDMA soft DMA IP components. The driver uses the
@@ -46,23 +52,33 @@ Jumbo frames are not supported at this time.
The driver limits PHY operations to 10/100Mbps, and has not yet been fully
tested for 1Gbps. This support will be added in a future maintenance update.
-1) Kernel Configuration
+1. Kernel Configuration
+=======================
+
The kernel configuration option is ALTERA_TSE:
+
Device Drivers ---> Network device support ---> Ethernet driver support --->
Altera Triple-Speed Ethernet MAC support (ALTERA_TSE)
-2) Driver parameters list:
- debug: message level (0: no output, 16: all);
- dma_rx_num: Number of descriptors in the RX list (default is 64);
- dma_tx_num: Number of descriptors in the TX list (default is 64).
+2. Driver parameters list
+=========================
+
+ - debug: message level (0: no output, 16: all);
+ - dma_rx_num: Number of descriptors in the RX list (default is 64);
+ - dma_tx_num: Number of descriptors in the TX list (default is 64).
+
+3. Command line options
+=======================
+
+Driver parameters can be also passed in command line by using::
-3) Command line options
-Driver parameters can be also passed in command line by using:
altera_tse=dma_rx_num:128,dma_tx_num:512
-4) Driver information and notes
+4. Driver information and notes
+===============================
-4.1) Transmit process
+4.1. Transmit process
+---------------------
When the driver's transmit routine is called by the kernel, it sets up a
transmit descriptor by calling the underlying DMA transmit routine (SGDMA or
MSGDMA), and initiates a transmit operation. Once the transmit is complete, an
@@ -70,7 +86,8 @@ interrupt is driven by the transmit DMA logic. The driver handles the transmit
completion in the context of the interrupt handling chain by recycling
resource required to send and track the requested transmit operation.
-4.2) Receive process
+4.2. Receive process
+--------------------
The driver will post receive buffers to the receive DMA logic during driver
initialization. Receive buffers may or may not be queued depending upon the
underlying DMA logic (MSGDMA is able queue receive buffers, SGDMA is not able
@@ -79,34 +96,39 @@ received, the DMA logic generates an interrupt. The driver handles a receive
interrupt by obtaining the DMA receive logic status, reaping receive
completions until no more receive completions are available.
-4.3) Interrupt Mitigation
+4.3. Interrupt Mitigation
+-------------------------
The driver is able to mitigate the number of its DMA interrupts
using NAPI for receive operations. Interrupt mitigation is not yet supported
for transmit operations, but will be added in a future maintenance release.
4.4) Ethtool support
+--------------------
Ethtool is supported. Driver statistics and internal errors can be taken using:
ethtool -S ethX command. It is possible to dump registers etc.
4.5) PHY Support
+----------------
The driver is compatible with PAL to work with PHY and GPHY devices.
4.7) List of source files:
- o Kconfig
- o Makefile
- o altera_tse_main.c: main network device driver
- o altera_tse_ethtool.c: ethtool support
- o altera_tse.h: private driver structure and common definitions
- o altera_msgdma.h: MSGDMA implementation function definitions
- o altera_sgdma.h: SGDMA implementation function definitions
- o altera_msgdma.c: MSGDMA implementation
- o altera_sgdma.c: SGDMA implementation
- o altera_sgdmahw.h: SGDMA register and descriptor definitions
- o altera_msgdmahw.h: MSGDMA register and descriptor definitions
- o altera_utils.c: Driver utility functions
- o altera_utils.h: Driver utility function definitions
-
-5) Debug Information
+--------------------------
+ - Kconfig
+ - Makefile
+ - altera_tse_main.c: main network device driver
+ - altera_tse_ethtool.c: ethtool support
+ - altera_tse.h: private driver structure and common definitions
+ - altera_msgdma.h: MSGDMA implementation function definitions
+ - altera_sgdma.h: SGDMA implementation function definitions
+ - altera_msgdma.c: MSGDMA implementation
+ - altera_sgdma.c: SGDMA implementation
+ - altera_sgdmahw.h: SGDMA register and descriptor definitions
+ - altera_msgdmahw.h: MSGDMA register and descriptor definitions
+ - altera_utils.c: Driver utility functions
+ - altera_utils.h: Driver utility function definitions
+
+5. Debug Information
+====================
The driver exports debug information such as internal statistics,
debug information, MAC and DMA registers etc.
@@ -118,17 +140,18 @@ or sees the MAC registers: e.g. using: ethtool -d ethX
The developer can also use the "debug" module parameter to get
further debug information.
-6) Statistics Support
+6. Statistics Support
+=====================
The controller and driver support a mix of IEEE standard defined statistics,
RFC defined statistics, and driver or Altera defined statistics. The four
specifications containing the standard definitions for these statistics are
as follows:
- o IEEE 802.3-2012 - IEEE Standard for Ethernet.
- o RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt.
- o RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt.
- o Altera Triple Speed Ethernet User Guide, found at http://www.altera.com
+ - IEEE 802.3-2012 - IEEE Standard for Ethernet.
+ - RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt.
+ - RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt.
+ - Altera Triple Speed Ethernet User Guide, found at http://www.altera.com
The statistics supported by the TSE and the device driver are as follows:
diff --git a/Documentation/networking/arcnet-hardware.txt b/Documentation/networking/arcnet-hardware.rst
index 731de411513c..ac249ac8fcf2 100644
--- a/Documentation/networking/arcnet-hardware.txt
+++ b/Documentation/networking/arcnet-hardware.rst
@@ -1,11 +1,15 @@
-
------------------------------------------------------------------------------
-1) This file is a supplement to arcnet.txt. Please read that for general
- driver configuration help.
------------------------------------------------------------------------------
-2) This file is no longer Linux-specific. It should probably be moved out of
- the kernel sources. Ideas?
------------------------------------------------------------------------------
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+ARCnet Hardware
+===============
+
+.. note::
+
+ 1) This file is a supplement to arcnet.txt. Please read that for general
+ driver configuration help.
+ 2) This file is no longer Linux-specific. It should probably be moved out
+ of the kernel sources. Ideas?
Because so many people (myself included) seem to have obtained ARCnet cards
without manuals, this file contains a quick introduction to ARCnet hardware,
@@ -14,8 +18,8 @@ e-mail apenwarr@worldvisions.ca with any settings for your particular card,
or any other information you have!
-INTRODUCTION TO ARCNET
-----------------------
+Introduction to ARCnet
+======================
ARCnet is a network type which works in a way similar to popular Ethernet
networks but which is also different in some very important ways.
@@ -30,7 +34,7 @@ since I only have the 2.5 Mbps variety. It is probably not going to saturate
your 100 Mbps card. Stop complaining. :)
You also cannot connect an ARCnet card to any kind of Ethernet card and
-expect it to work.
+expect it to work.
There are two "types" of ARCnet - STAR topology and BUS topology. This
refers to how the cards are meant to be wired together. According to most
@@ -71,19 +75,24 @@ although they are generally kept down to the Ethernet-style 1500 bytes.
For more information on the advantages and disadvantages (mostly the
advantages) of ARCnet networks, you might try the "ARCnet Trade Association"
WWW page:
+
http://www.arcnet.com
-CABLING ARCNET NETWORKS
------------------------
+Cabling ARCnet Networks
+=======================
+
+This section was rewritten by
+
+ Vojtech Pavlik <vojtech@suse.cz>
-This section was rewritten by
- Vojtech Pavlik <vojtech@suse.cz>
using information from several people, including:
- Avery Pennraun <apenwarr@worldvisions.ca>
- Stephen A. Wood <saw@hallc1.cebaf.gov>
- John Paul Morrison <jmorriso@bogomips.ee.ubc.ca>
- Joachim Koenig <jojo@repas.de>
+
+ - Avery Pennraun <apenwarr@worldvisions.ca>
+ - Stephen A. Wood <saw@hallc1.cebaf.gov>
+ - John Paul Morrison <jmorriso@bogomips.ee.ubc.ca>
+ - Joachim Koenig <jojo@repas.de>
+
and Avery touched it up a bit, at Vojtech's request.
ARCnet (the classic 2.5 Mbps version) can be connected by two different
@@ -103,13 +112,13 @@ equal to a high impedance one with a terminator installed.
Usually, the ARCnet networks are built up from STAR cards and hubs. There
are two types of hubs - active and passive. Passive hubs are small boxes
-with four BNC connectors containing four 47 Ohm resistors:
+with four BNC connectors containing four 47 Ohm resistors::
- | | wires
- R + junction
--R-+-R- R 47 Ohm resistors
- R
- |
+ | | wires
+ R + junction
+ -R-+-R- R 47 Ohm resistors
+ R
+ |
The shielding is connected together. Active hubs are much more complicated;
they are powered and contain electronics to amplify the signal and send it
@@ -127,14 +136,15 @@ And now to the cabling. What you can connect together:
2. A card to a passive hub. Remember that all unused connectors on the hub
must be properly terminated with 93 Ohm (or something else if you don't
have the right ones) terminators.
- (Avery's note: oops, I didn't know that. Mine (TV cable) works
+
+ (Avery's note: oops, I didn't know that. Mine (TV cable) works
anyway, though.)
3. A card to an active hub. Here is no need to terminate the unused
connectors except some kind of aesthetic feeling. But, there may not be
more than eleven active hubs between any two computers. That of course
doesn't limit the number of active hubs on the network.
-
+
4. An active hub to another.
5. An active hub to passive hub.
@@ -142,22 +152,22 @@ And now to the cabling. What you can connect together:
Remember that you cannot connect two passive hubs together. The power loss
implied by such a connection is too high for the net to operate reliably.
-An example of a typical ARCnet network:
+An example of a typical ARCnet network::
- R S - STAR type card
+ R S - STAR type card
S------H--------A-------S R - Terminator
- | | H - Hub
- | | A - Active hub
- | S----H----S
- S |
- |
- S
-
+ | | H - Hub
+ | | A - Active hub
+ | S----H----S
+ S |
+ |
+ S
+
The BUS topology is very similar to the one used by Ethernet. The only
difference is in cable and terminators: they should be 93 Ohm. Ethernet
uses 50 Ohm impedance. You use T connectors to put the computers on a single
line of cable, the bus. You have to put terminators at both ends of the
-cable. A typical BUS ARCnet network looks like:
+cable. A typical BUS ARCnet network looks like::
RT----T------T------T------T------TR
B B B B B B
@@ -168,63 +178,63 @@ cable. A typical BUS ARCnet network looks like:
But that is not all! The two types can be connected together. According to
the official documentation the only way of connecting them is using an active
-hub:
+hub::
- A------T------T------TR
- | B B B
+ A------T------T------TR
+ | B B B
S---H---S
- |
- S
+ |
+ S
The official docs also state that you can use STAR cards at the ends of
-BUS network in place of a BUS card and a terminator:
+BUS network in place of a BUS card and a terminator::
S------T------T------S
- B B
+ B B
But, according to my own experiments, you can simply hang a BUS type card
anywhere in middle of a cable in a STAR topology network. And more - you
can use the bus card in place of any star card if you use a terminator. Then
you can build very complicated networks fulfilling all your needs! An
-example:
-
- S
- |
- RT------T-------T------H------S
- B B B |
- | R
- S------A------T-------T-------A-------H------TR
- | B B | | B
- | S BT |
- | | | S----A-----S
- S------H---A----S | |
- | | S------T----H---S |
- S S B R S
-
+example::
+
+ S
+ |
+ RT------T-------T------H------S
+ B B B |
+ | R
+ S------A------T-------T-------A-------H------TR
+ | B B | | B
+ | S BT |
+ | | | S----A-----S
+ S------H---A----S | |
+ | | S------T----H---S |
+ S S B R S
+
A basically different cabling scheme is used with Twisted Pair cabling. Each
of the TP cards has two RJ (phone-cord style) connectors. The cards are
then daisy-chained together using a cable connecting every two neighboring
cards. The ends are terminated with RJ 93 Ohm terminators which plug into
-the empty connectors of cards on the ends of the chain. An example:
+the empty connectors of cards on the ends of the chain. An example::
- ___________ ___________
- _R_|_ _|_|_ _|_R_
- | | | | | |
- |Card | |Card | |Card |
- |_____| |_____| |_____|
+ ___________ ___________
+ _R_|_ _|_|_ _|_R_
+ | | | | | |
+ |Card | |Card | |Card |
+ |_____| |_____| |_____|
There are also hubs for the TP topology. There is nothing difficult
involved in using them; you just connect a TP chain to a hub on any end or
-even at both. This way you can create almost any network configuration.
+even at both. This way you can create almost any network configuration.
The maximum of 11 hubs between any two computers on the net applies here as
-well. An example:
+well. An example::
RP-------P--------P--------H-----P------P-----PR
- |
+ |
RP-----H--------P--------H-----P------PR
- | |
- PR PR
+ | |
+ PR PR
R - RJ Terminator
P - TP Card
@@ -234,11 +244,13 @@ Like any network, ARCnet has a limited cable length. These are the maximum
cable lengths between two active ends (an active end being an active hub or
a STAR card).
+ ========== ======= ===========
RG-62 93 Ohm up to 650 m
RG-59/U 75 Ohm up to 457 m
RG-11/U 75 Ohm up to 533 m
IBM Type 1 150 Ohm up to 200 m
IBM Type 3 100 Ohm up to 100 m
+ ========== ======= ===========
The maximum length of all cables connected to a passive hub is limited to 65
meters for RG-62 cabling; less for others. You can see that using passive
@@ -248,8 +260,8 @@ most distant points of the net is limited to 3000 meters. The maximum length
of a TP cable between two cards/hubs is 650 meters.
-SETTING THE JUMPERS
--------------------
+Setting the Jumpers
+===================
All ARCnet cards should have a total of four or five different settings:
@@ -261,43 +273,51 @@ All ARCnet cards should have a total of four or five different settings:
eating net connections on my system (at least) otherwise. My guess is
this may be because, if your card is at 0x2E0, probing for a serial port
at 0x2E8 will reset the card and probably mess things up royally.
+
- Avery's favourite: 0x300.
- the IRQ: on 8-bit cards, it might be 2 (9), 3, 4, 5, or 7.
- on 16-bit cards, it might be 2 (9), 3, 4, 5, 7, or 10-15.
-
+ on 16-bit cards, it might be 2 (9), 3, 4, 5, 7, or 10-15.
+
Make sure this is different from any other card on your system. Note
that IRQ2 is the same as IRQ9, as far as Linux is concerned. You can
"cat /proc/interrupts" for a somewhat complete list of which ones are in
use at any given time. Here is a list of common usages from Vojtech
Pavlik <vojtech@suse.cz>:
- ("Not on bus" means there is no way for a card to generate this
+
+ ("Not on bus" means there is no way for a card to generate this
interrupt)
- IRQ 0 - Timer 0 (Not on bus)
- IRQ 1 - Keyboard (Not on bus)
- IRQ 2 - IRQ Controller 2 (Not on bus, nor does interrupt the CPU)
- IRQ 3 - COM2
- IRQ 4 - COM1
- IRQ 5 - FREE (LPT2 if you have it; sometimes COM3; maybe PLIP)
- IRQ 6 - Floppy disk controller
- IRQ 7 - FREE (LPT1 if you don't use the polling driver; PLIP)
- IRQ 8 - Realtime Clock Interrupt (Not on bus)
- IRQ 9 - FREE (VGA vertical sync interrupt if enabled)
- IRQ 10 - FREE
- IRQ 11 - FREE
- IRQ 12 - FREE
- IRQ 13 - Numeric Coprocessor (Not on bus)
- IRQ 14 - Fixed Disk Controller
- IRQ 15 - FREE (Fixed Disk Controller 2 if you have it)
-
- Note: IRQ 9 is used on some video cards for the "vertical retrace"
- interrupt. This interrupt would have been handy for things like
- video games, as it occurs exactly once per screen refresh, but
- unfortunately IBM cancelled this feature starting with the original
- VGA and thus many VGA/SVGA cards do not support it. For this
- reason, no modern software uses this interrupt and it can almost
- always be safely disabled, if your video card supports it at all.
-
+
+ ====== =========================================================
+ IRQ 0 Timer 0 (Not on bus)
+ IRQ 1 Keyboard (Not on bus)
+ IRQ 2 IRQ Controller 2 (Not on bus, nor does interrupt the CPU)
+ IRQ 3 COM2
+ IRQ 4 COM1
+ IRQ 5 FREE (LPT2 if you have it; sometimes COM3; maybe PLIP)
+ IRQ 6 Floppy disk controller
+ IRQ 7 FREE (LPT1 if you don't use the polling driver; PLIP)
+ IRQ 8 Realtime Clock Interrupt (Not on bus)
+ IRQ 9 FREE (VGA vertical sync interrupt if enabled)
+ IRQ 10 FREE
+ IRQ 11 FREE
+ IRQ 12 FREE
+ IRQ 13 Numeric Coprocessor (Not on bus)
+ IRQ 14 Fixed Disk Controller
+ IRQ 15 FREE (Fixed Disk Controller 2 if you have it)
+ ====== =========================================================
+
+
+ .. note::
+
+ IRQ 9 is used on some video cards for the "vertical retrace"
+ interrupt. This interrupt would have been handy for things like
+ video games, as it occurs exactly once per screen refresh, but
+ unfortunately IBM cancelled this feature starting with the original
+ VGA and thus many VGA/SVGA cards do not support it. For this
+ reason, no modern software uses this interrupt and it can almost
+ always be safely disabled, if your video card supports it at all.
+
If your card for some reason CANNOT disable this IRQ (usually there
is a jumper), one solution would be to clip the printed circuit
contact on the board: it's the fourth contact from the left on the
@@ -308,14 +328,18 @@ All ARCnet cards should have a total of four or five different settings:
- the memory address: Unlike most cards, ARCnets use "shared memory" for
copying buffers around. Make SURE it doesn't conflict with any other
used memory in your system!
+
+ ::
+
A0000 - VGA graphics memory (ok if you don't have VGA)
- B0000 - Monochrome text mode
- C0000 \ One of these is your VGA BIOS - usually C0000.
- E0000 /
- F0000 - System BIOS
+ B0000 - Monochrome text mode
+ C0000 \ One of these is your VGA BIOS - usually C0000.
+ E0000 /
+ F0000 - System BIOS
Anything less than 0xA0000 is, well, a BAD idea since it isn't above
640k.
+
- Avery's favourite: 0xD0000
- the station address: Every ARCnet card has its own "unique" network
@@ -326,6 +350,7 @@ All ARCnet cards should have a total of four or five different settings:
neat stuff will probably happen if you DO use them). By the way, if you
haven't already guessed, don't set this the same as any other ARCnet on
your network!
+
- Avery's favourite: 3 and 4. Not that it matters.
- There may be ETS1 and ETS2 settings. These may or may not make a
@@ -336,28 +361,34 @@ All ARCnet cards should have a total of four or five different settings:
requirement here is that all cards on the network with ETS1 and ETS2
jumpers have them in the same position. Chris Hindy <chrish@io.org>
sent in a chart with actual values for this:
+
+ ======= ======= =============== ====================
ET1 ET2 Response Time Reconfiguration Time
- --- --- ------------- --------------------
+ ======= ======= =============== ====================
open open 74.7us 840us
open closed 283.4us 1680us
closed open 561.8us 1680us
closed closed 1118.6us 1680us
-
+ ======= ======= =============== ====================
+
Make sure you set ETS1 and ETS2 to the SAME VALUE for all cards on your
network.
-
-Also, on many cards (not mine, though) there are red and green LED's.
+
+Also, on many cards (not mine, though) there are red and green LED's.
Vojtech Pavlik <vojtech@suse.cz> tells me this is what they mean:
+
+ =============== =============== =====================================
GREEN RED Status
- ----- --- ------
+ =============== =============== =====================================
OFF OFF Power off
OFF Short flashes Cabling problems (broken cable or not
- terminated)
+ terminated)
OFF (short) ON Card init
ON ON Normal state - everything OK, nothing
- happens
+ happens
ON Long flashes Data transfer
ON OFF Never happens (maybe when wrong ID)
+ =============== =============== =====================================
The following is all the specific information people have sent me about
@@ -366,7 +397,7 @@ huge amounts of duplicated information. I have no time to fix it. If you
want to, PLEASE DO! Just send me a 'diff -u' of all your changes.
The model # is listed right above specifics for that card, so you should be
-able to use your text viewer's "search" function to find the entry you want.
+able to use your text viewer's "search" function to find the entry you want.
If you don't KNOW what kind of card you have, try looking through the
various diagrams to see if you can tell.
@@ -378,8 +409,9 @@ model that is, please e-mail me to say so.
Cards Listed in this file (in this order, mostly):
+ =============== ======================= ====
Manufacturer Model # Bits
- ------------ ------- ----
+ =============== ======================= ====
SMC PC100 8
SMC PC110 8
SMC PC120 8
@@ -404,17 +436,19 @@ Cards Listed in this file (in this order, mostly):
No Name Taiwan R.O.C? 8
No Name Model 9058 8
Tiara Tiara Lancard? 8
-
+ =============== ======================= ====
-** SMC = Standard Microsystems Corp.
-** CNet Tech = CNet Technology, Inc.
+* SMC = Standard Microsystems Corp.
+* CNet Tech = CNet Technology, Inc.
Unclassified Stuff
-------------------
+==================
+
- Please send any other information you can find.
-
- - And some other stuff (more info is welcome!):
+
+ - And some other stuff (more info is welcome!)::
+
From: root@ultraworld.xs4all.nl (Timo Hilbrink)
To: apenwarr@foxnet.net (Avery Pennarun)
Date: Wed, 26 Oct 1994 02:10:32 +0000 (GMT)
@@ -423,7 +457,7 @@ Unclassified Stuff
[...parts deleted...]
About the jumpers: On my PC130 there is one more jumper, located near the
- cable-connector and it's for changing to star or bus topology;
+ cable-connector and it's for changing to star or bus topology;
closed: star - open: bus
On the PC500 are some more jumper-pins, one block labeled with RX,PDN,TXI
and another with ALE,LA17,LA18,LA19 these are undocumented..
@@ -432,136 +466,130 @@ Unclassified Stuff
--- CUT ---
+Standard Microsystems Corp (SMC)
+================================
+
+PC100, PC110, PC120, PC130 (8-bit cards) and PC500, PC600 (16-bit cards)
+------------------------------------------------------------------------
-** Standard Microsystems Corp (SMC) **
-PC100, PC110, PC120, PC130 (8-bit cards)
-PC500, PC600 (16-bit cards)
----------------------------------
- mainly from Avery Pennarun <apenwarr@worldvisions.ca>. Values depicted
are from Avery's setup.
- special thanks to Timo Hilbrink <timoh@xs4all.nl> for noting that PC120,
- 130, 500, and 600 all have the same switches as Avery's PC100.
+ 130, 500, and 600 all have the same switches as Avery's PC100.
PC500/600 have several extra, undocumented pins though. (?)
- PC110 settings were verified by Stephen A. Wood <saw@cebaf.gov>
- Also, the JP- and S-numbers probably don't match your card exactly. Try
to find jumpers/switches with the same number of settings - it's
probably more reliable.
-
-
- JP5 [|] : : : :
-(IRQ Setting) IRQ2 IRQ3 IRQ4 IRQ5 IRQ7
- Put exactly one jumper on exactly one set of pins.
-
-
- 1 2 3 4 5 6 7 8 9 10
- S1 /----------------------------------\
-(I/O and Memory | 1 1 * 0 0 0 0 * 1 1 0 1 |
- addresses) \----------------------------------/
- |--| |--------| |--------|
- (a) (b) (m)
-
- WARNING. It's very important when setting these which way
- you're holding the card, and which way you think is '1'!
-
- If you suspect that your settings are not being made
- correctly, try reversing the direction or inverting the
- switch positions.
-
- a: The first digit of the I/O address.
- Setting Value
- ------- -----
- 00 0
- 01 1
- 10 2
- 11 3
-
- b: The second digit of the I/O address.
- Setting Value
- ------- -----
- 0000 0
- 0001 1
- 0010 2
- ... ...
- 1110 E
- 1111 F
-
- The I/O address is in the form ab0. For example, if
- a is 0x2 and b is 0xE, the address will be 0x2E0.
-
- DO NOT SET THIS LESS THAN 0x200!!!!!
-
-
- m: The first digit of the memory address.
- Setting Value
- ------- -----
- 0000 0
- 0001 1
- 0010 2
- ... ...
- 1110 E
- 1111 F
-
- The memory address is in the form m0000. For example, if
- m is D, the address will be 0xD0000.
-
- DO NOT SET THIS TO C0000, F0000, OR LESS THAN A0000!
-
- 1 2 3 4 5 6 7 8
- S2 /--------------------------\
-(Station Address) | 1 1 0 0 0 0 0 0 |
- \--------------------------/
-
- Setting Value
- ------- -----
- 00000000 00
- 10000000 01
- 01000000 02
- ...
- 01111111 FE
- 11111111 FF
-
- Note that this is binary with the digits reversed!
-
- DO NOT SET THIS TO 0 OR 255 (0xFF)!
+::
+
+ JP5 [|] : : : :
+ (IRQ Setting) IRQ2 IRQ3 IRQ4 IRQ5 IRQ7
+ Put exactly one jumper on exactly one set of pins.
+
+
+ 1 2 3 4 5 6 7 8 9 10
+ S1 /----------------------------------\
+ (I/O and Memory | 1 1 * 0 0 0 0 * 1 1 0 1 |
+ addresses) \----------------------------------/
+ |--| |--------| |--------|
+ (a) (b) (m)
+
+ WARNING. It's very important when setting these which way
+ you're holding the card, and which way you think is '1'!
+
+ If you suspect that your settings are not being made
+ correctly, try reversing the direction or inverting the
+ switch positions.
+
+ a: The first digit of the I/O address.
+ Setting Value
+ ------- -----
+ 00 0
+ 01 1
+ 10 2
+ 11 3
+
+ b: The second digit of the I/O address.
+ Setting Value
+ ------- -----
+ 0000 0
+ 0001 1
+ 0010 2
+ ... ...
+ 1110 E
+ 1111 F
+
+ The I/O address is in the form ab0. For example, if
+ a is 0x2 and b is 0xE, the address will be 0x2E0.
+
+ DO NOT SET THIS LESS THAN 0x200!!!!!
+
+
+ m: The first digit of the memory address.
+ Setting Value
+ ------- -----
+ 0000 0
+ 0001 1
+ 0010 2
+ ... ...
+ 1110 E
+ 1111 F
+
+ The memory address is in the form m0000. For example, if
+ m is D, the address will be 0xD0000.
+
+ DO NOT SET THIS TO C0000, F0000, OR LESS THAN A0000!
+
+ 1 2 3 4 5 6 7 8
+ S2 /--------------------------\
+ (Station Address) | 1 1 0 0 0 0 0 0 |
+ \--------------------------/
+
+ Setting Value
+ ------- -----
+ 00000000 00
+ 10000000 01
+ 01000000 02
+ ...
+ 01111111 FE
+ 11111111 FF
+
+ Note that this is binary with the digits reversed!
+
+ DO NOT SET THIS TO 0 OR 255 (0xFF)!
-*****************************************************************************
-** Standard Microsystems Corp (SMC) **
PC130E/PC270E (8-bit cards)
---------------------------
- - from Juergen Seifert <seifert@htwm.de>
-
-STANDARD MICROSYSTEMS CORPORATION (SMC) ARCNET(R)-PC130E/PC270E
-===============================================================
+ - from Juergen Seifert <seifert@htwm.de>
This description has been written by Juergen Seifert <seifert@htwm.de>
-using information from the following Original SMC Manual
+using information from the following Original SMC Manual
- "Configuration Guide for
- ARCNET(R)-PC130E/PC270
- Network Controller Boards
- Pub. # 900.044A
- June, 1989"
+ "Configuration Guide for ARCNET(R)-PC130E/PC270 Network
+ Controller Boards Pub. # 900.044A June, 1989"
ARCNET is a registered trademark of the Datapoint Corporation
-SMC is a registered trademark of the Standard Microsystems Corporation
+SMC is a registered trademark of the Standard Microsystems Corporation
-The PC130E is an enhanced version of the PC130 board, is equipped with a
+The PC130E is an enhanced version of the PC130 board, is equipped with a
standard BNC female connector for connection to RG-62/U coax cable.
Since this board is designed both for point-to-point connection in star
-networks and for connection to bus networks, it is downwardly compatible
+networks and for connection to bus networks, it is downwardly compatible
with all the other standard boards designed for coax networks (that is,
-the PC120, PC110 and PC100 star topology boards and the PC220, PC210 and
+the PC120, PC110 and PC100 star topology boards and the PC220, PC210 and
PC200 bus topology boards).
-The PC270E is an enhanced version of the PC260 board, is equipped with two
+The PC270E is an enhanced version of the PC260 board, is equipped with two
modular RJ11-type jacks for connection to twisted pair wiring.
It can be used in a star or a daisy-chained network.
+::
- 8 7 6 5 4 3 2 1
+ 8 7 6 5 4 3 2 1
________________________________________________________________
| | S1 | |
| |_________________| |
@@ -587,27 +615,27 @@ It can be used in a star or a daisy-chained network.
| |
|_____________________________________________|
-Legend:
+Legend::
-SMC 90C63 ARCNET Controller / Transceiver /Logic
-S1 1-3: I/O Base Address Select
+ SMC 90C63 ARCNET Controller / Transceiver /Logic
+ S1 1-3: I/O Base Address Select
4-6: Memory Base Address Select
7-8: RAM Offset Select
-S2 1-8: Node ID Select
-EXT Extended Timeout Select
-ROM ROM Enable Select
-STAR Selected - Star Topology (PC130E only)
+ S2 1-8: Node ID Select
+ EXT Extended Timeout Select
+ ROM ROM Enable Select
+ STAR Selected - Star Topology (PC130E only)
Deselected - Bus Topology (PC130E only)
-CR3/CR4 Diagnostic LEDs
-J1 BNC RG62/U Connector (PC130E only)
-J1 6-position Telephone Jack (PC270E only)
-J2 6-position Telephone Jack (PC270E only)
+ CR3/CR4 Diagnostic LEDs
+ J1 BNC RG62/U Connector (PC130E only)
+ J1 6-position Telephone Jack (PC270E only)
+ J2 6-position Telephone Jack (PC270E only)
Setting one of the switches to Off/Open means "1", On/Closed means "0".
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in group S2 are used to set the node ID.
These switches work in a way similar to the PC100-series cards; see that
@@ -615,10 +643,10 @@ entry for more information.
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The first three switches in switch group S1 are used to select one
-of eight possible I/O Base addresses using the following table
+of eight possible I/O Base addresses using the following table::
Switch | Hex I/O
@@ -635,14 +663,16 @@ of eight possible I/O Base addresses using the following table
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The memory buffer requires 2K of a 16K block of RAM. The base of this
16K block can be located in any of eight positions.
Switches 4-6 of switch group S1 select the Base of the 16K block.
-Within that 16K address space, the buffer may be assigned any one of four
+Within that 16K address space, the buffer may be assigned any one of four
positions, determined by the offset, switches 7 and 8 of group S1.
+::
+
Switch | Hex RAM | Hex ROM
4 5 6 7 8 | Address | Address *)
-----------|---------|-----------
@@ -650,115 +680,111 @@ positions, determined by the offset, switches 7 and 8 of group S1.
0 0 0 0 1 | C0800 | C2000
0 0 0 1 0 | C1000 | C2000
0 0 0 1 1 | C1800 | C2000
- | |
+ | |
0 0 1 0 0 | C4000 | C6000
0 0 1 0 1 | C4800 | C6000
0 0 1 1 0 | C5000 | C6000
0 0 1 1 1 | C5800 | C6000
- | |
+ | |
0 1 0 0 0 | CC000 | CE000
0 1 0 0 1 | CC800 | CE000
0 1 0 1 0 | CD000 | CE000
0 1 0 1 1 | CD800 | CE000
- | |
+ | |
0 1 1 0 0 | D0000 | D2000 (Manufacturer's default)
0 1 1 0 1 | D0800 | D2000
0 1 1 1 0 | D1000 | D2000
0 1 1 1 1 | D1800 | D2000
- | |
+ | |
1 0 0 0 0 | D4000 | D6000
1 0 0 0 1 | D4800 | D6000
1 0 0 1 0 | D5000 | D6000
1 0 0 1 1 | D5800 | D6000
- | |
+ | |
1 0 1 0 0 | D8000 | DA000
1 0 1 0 1 | D8800 | DA000
1 0 1 1 0 | D9000 | DA000
1 0 1 1 1 | D9800 | DA000
- | |
+ | |
1 1 0 0 0 | DC000 | DE000
1 1 0 0 1 | DC800 | DE000
1 1 0 1 0 | DD000 | DE000
1 1 0 1 1 | DD800 | DE000
- | |
+ | |
1 1 1 0 0 | E0000 | E2000
1 1 1 0 1 | E0800 | E2000
1 1 1 1 0 | E1000 | E2000
1 1 1 1 1 | E1800 | E2000
-
-*) To enable the 8K Boot PROM install the jumper ROM.
- The default is jumper ROM not installed.
+
+ *) To enable the 8K Boot PROM install the jumper ROM.
+ The default is jumper ROM not installed.
Setting the Timeouts and Interrupt
-----------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The jumpers labeled EXT1 and EXT2 are used to determine the timeout
+The jumpers labeled EXT1 and EXT2 are used to determine the timeout
parameters. These two jumpers are normally left open.
To select a hardware interrupt level set one (only one!) of the jumpers
IRQ2, IRQ3, IRQ4, IRQ5, IRQ7. The Manufacturer's default is IRQ2.
-
+
Configuring the PC130E for Star or Bus Topology
------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The single jumper labeled STAR is used to configure the PC130E board for
+The single jumper labeled STAR is used to configure the PC130E board for
star or bus topology.
-When the jumper is installed, the board may be used in a star network, when
+When the jumper is installed, the board may be used in a star network, when
it is removed, the board can be used in a bus topology.
Diagnostic LEDs
----------------
+^^^^^^^^^^^^^^^
Two diagnostic LEDs are visible on the rear bracket of the board.
The green LED monitors the network activity: the red one shows the
-board activity:
+board activity::
Green | Status Red | Status
-------|------------------- ---------|-------------------
on | normal activity flash/on | data transfer
blink | reconfiguration off | no data transfer;
off | defective board or | incorrect memory or
- | node ID is zero | I/O address
+ | node ID is zero | I/O address
-*****************************************************************************
-
-** Standard Microsystems Corp (SMC) **
PC500/PC550 Longboard (16-bit cards)
--------------------------------------
+------------------------------------
+
- from Juergen Seifert <seifert@htwm.de>
-STANDARD MICROSYSTEMS CORPORATION (SMC) ARCNET-PC500/PC550 Long Board
-=====================================================================
+ .. note::
-Note: There is another Version of the PC500 called Short Version, which
+ There is another Version of the PC500 called Short Version, which
is different in hard- and software! The most important differences
are:
+
- The long board has no Shared memory.
- On the long board the selection of the interrupt is done by binary
- coded switch, on the short board directly by jumper.
-
+ coded switch, on the short board directly by jumper.
+
[Avery's note: pay special attention to that: the long board HAS NO SHARED
-MEMORY. This means the current Linux-ARCnet driver can't use these cards.
+MEMORY. This means the current Linux-ARCnet driver can't use these cards.
I have obtained a PC500Longboard and will be doing some experiments on it in
the future, but don't hold your breath. Thanks again to Juergen Seifert for
his advice about this!]
This description has been written by Juergen Seifert <seifert@htwm.de>
-using information from the following Original SMC Manual
+using information from the following Original SMC Manual
- "Configuration Guide for
- SMC ARCNET-PC500/PC550
- Series Network Controller Boards
- Pub. # 900.033 Rev. A
- November, 1989"
+ "Configuration Guide for SMC ARCNET-PC500/PC550
+ Series Network Controller Boards Pub. # 900.033 Rev. A
+ November, 1989"
ARCNET is a registered trademark of the Datapoint Corporation
-SMC is a registered trademark of the Standard Microsystems Corporation
+SMC is a registered trademark of the Standard Microsystems Corporation
The PC500 is equipped with a standard BNC female connector for connection
to RG-62/U coax cable.
@@ -769,7 +795,9 @@ The PC550 is equipped with two modular RJ11-type jacks for connection
to twisted pair wiring.
It can be used in a star or a daisy-chained (BUS) network.
- 1
+::
+
+ 1
0 9 8 7 6 5 4 3 2 1 6 5 4 3 2 1
____________________________________________________________________
< | SW1 | | SW2 | |
@@ -796,34 +824,34 @@ It can be used in a star or a daisy-chained (BUS) network.
> | | |
<____| |_____________________________________________|
-Legend:
+Legend::
-SW1 1-6: I/O Base Address Select
+ SW1 1-6: I/O Base Address Select
7-10: Interrupt Select
-SW2 1-6: Reserved for Future Use
-SW3 1-8: Node ID Select
-JP2 1-4: Extended Timeout Select
-JP6 Selected - Star Topology (PC500 only)
+ SW2 1-6: Reserved for Future Use
+ SW3 1-8: Node ID Select
+ JP2 1-4: Extended Timeout Select
+ JP6 Selected - Star Topology (PC500 only)
Deselected - Bus Topology (PC500 only)
-CR3 Green Monitors Network Activity
-CR4 Red Monitors Board Activity
-J1 BNC RG62/U Connector (PC500 only)
-J1 6-position Telephone Jack (PC550 only)
-J2 6-position Telephone Jack (PC550 only)
+ CR3 Green Monitors Network Activity
+ CR4 Red Monitors Board Activity
+ J1 BNC RG62/U Connector (PC500 only)
+ J1 6-position Telephone Jack (PC550 only)
+ J2 6-position Telephone Jack (PC550 only)
Setting one of the switches to Off/Open means "1", On/Closed means "0".
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in group SW3 are used to set the node ID. Each node
-attached to the network must have an unique node ID which must be
+attached to the network must have an unique node ID which must be
different from 0.
Switch 1 serves as the least significant bit (LSB).
-The node ID is the sum of the values of all switches set to "1"
-These values are:
+The node ID is the sum of the values of all switches set to "1"
+These values are::
Switch | Value
-------|-------
@@ -836,30 +864,30 @@ These values are:
7 | 64
8 | 128
-Some Examples:
+Some Examples::
- Switch | Hex | Decimal
+ Switch | Hex | Decimal
8 7 6 5 4 3 2 1 | Node ID | Node ID
----------------|---------|---------
0 0 0 0 0 0 0 0 | not allowed
- 0 0 0 0 0 0 0 1 | 1 | 1
+ 0 0 0 0 0 0 0 1 | 1 | 1
0 0 0 0 0 0 1 0 | 2 | 2
0 0 0 0 0 0 1 1 | 3 | 3
. . . | |
0 1 0 1 0 1 0 1 | 55 | 85
. . . | |
1 0 1 0 1 0 1 0 | AA | 170
- . . . | |
+ . . . | |
1 1 1 1 1 1 0 1 | FD | 253
1 1 1 1 1 1 1 0 | FE | 254
- 1 1 1 1 1 1 1 1 | FF | 255
+ 1 1 1 1 1 1 1 1 | FF | 255
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The first six switches in switch group SW1 are used to select one
-of 32 possible I/O Base addresses using the following table
+of 32 possible I/O Base addresses using the following table::
Switch | Hex I/O
6 5 4 3 2 1 | Address
@@ -899,16 +927,18 @@ of 32 possible I/O Base addresses using the following table
Setting the Interrupt
----------------------
+^^^^^^^^^^^^^^^^^^^^^
-Switches seven through ten of switch group SW1 are used to select the
-interrupt level. The interrupt level is binary coded, so selections
+Switches seven through ten of switch group SW1 are used to select the
+interrupt level. The interrupt level is binary coded, so selections
from 0 to 15 would be possible, but only the following eight values will
be supported: 3, 4, 5, 7, 9, 10, 11, 12.
+::
+
Switch | IRQ
- 10 9 8 7 |
- ---------|--------
+ 10 9 8 7 |
+ ---------|--------
0 0 1 1 | 3
0 1 0 0 | 4
0 1 0 1 | 5
@@ -919,52 +949,50 @@ be supported: 3, 4, 5, 7, 9, 10, 11, 12.
1 1 0 0 | 12
-Setting the Timeouts
---------------------
+Setting the Timeouts
+^^^^^^^^^^^^^^^^^^^^
-The two jumpers JP2 (1-4) are used to determine the timeout parameters.
+The two jumpers JP2 (1-4) are used to determine the timeout parameters.
These two jumpers are normally left open.
Refer to the COM9026 Data Sheet for alternate configurations.
Configuring the PC500 for Star or Bus Topology
-----------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The single jumper labeled JP6 is used to configure the PC500 board for
+The single jumper labeled JP6 is used to configure the PC500 board for
star or bus topology.
-When the jumper is installed, the board may be used in a star network, when
+When the jumper is installed, the board may be used in a star network, when
it is removed, the board can be used in a bus topology.
Diagnostic LEDs
----------------
+^^^^^^^^^^^^^^^
Two diagnostic LEDs are visible on the rear bracket of the board.
The green LED monitors the network activity: the red one shows the
-board activity:
+board activity::
Green | Status Red | Status
-------|------------------- ---------|-------------------
on | normal activity flash/on | data transfer
blink | reconfiguration off | no data transfer;
off | defective board or | incorrect memory or
- | node ID is zero | I/O address
+ | node ID is zero | I/O address
-*****************************************************************************
-
-** SMC **
PC710 (8-bit card)
------------------
+
- from J.S. van Oosten <jvoosten@compiler.tdcnet.nl>
-
+
Note: this data is gathered by experimenting and looking at info of other
cards. However, I'm sure I got 99% of the settings right.
The SMC710 card resembles the PC270 card, but is much more basic (i.e. no
-LEDs, RJ11 jacks, etc.) and 8 bit. Here's a little drawing:
+LEDs, RJ11 jacks, etc.) and 8 bit. Here's a little drawing::
- _______________________________________
+ _______________________________________
| +---------+ +---------+ |____
| | S2 | | S1 | |
| +---------+ +---------+ |
@@ -976,12 +1004,12 @@ LEDs, RJ11 jacks, etc.) and 8 bit. Here's a little drawing:
| +===+ |
| |
| .. JP1 +----------+ |
- | .. | big chip | |
+ | .. | big chip | |
| .. | 90C63 | |
| .. | | |
| .. +----------+ |
------- -----------
- |||||||||||||||||||||
+ |||||||||||||||||||||
The row of jumpers at JP1 actually consists of 8 jumpers, (sometimes
labelled) the same as on the PC270, from top to bottom: EXT2, EXT1, ROM,
@@ -992,71 +1020,76 @@ are swapped (S1 is the nodeaddress, S2 sets IO- and RAM-address).
I know it works when connected to a PC110 type ARCnet board.
-
+
*****************************************************************************
-** Possibly SMC **
+Possibly SMC
+============
+
LCS-8830(-T) (8 and 16-bit cards)
---------------------------------
+
- from Mathias Katzer <mkatzer@HRZ.Uni-Bielefeld.DE>
- Marek Michalkiewicz <marekm@i17linuxb.ists.pwr.wroc.pl> says the
LCS-8830 is slightly different from LCS-8830-T. These are 8 bit, BUS
only (the JP0 jumper is hardwired), and BNC only.
-
+
This is a LCS-8830-T made by SMC, I think ('SMC' only appears on one PLCC,
nowhere else, not even on the few Xeroxed sheets from the manual).
-SMC ARCnet Board Type LCS-8830-T
+SMC ARCnet Board Type LCS-8830-T::
- ------------------------------------
- | |
- | JP3 88 8 JP2 |
- | ##### | \ |
- | ##### ET1 ET2 ###|
- | 8 ###|
- | U3 SW 1 JP0 ###| Phone Jacks
- | -- ###|
- | | | |
- | | | SW2 |
- | | | |
- | | | ##### |
- | -- ##### #### BNC Connector
- | ####
- | 888888 JP1 |
- | 234567 |
- -- -------
- |||||||||||||||||||||||||||
- --------------------------
-
-
-SW1: DIP-Switches for Station Address
-SW2: DIP-Switches for Memory Base and I/O Base addresses
-
-JP0: If closed, internal termination on (default open)
-JP1: IRQ Jumpers
-JP2: Boot-ROM enabled if closed
-JP3: Jumpers for response timeout
-
-U3: Boot-ROM Socket
-
-
-ET1 ET2 Response Time Idle Time Reconfiguration Time
-
- 78 86 840
- X 285 316 1680
- X 563 624 1680
- X X 1130 1237 1680
-
-(X means closed jumper)
-
-(DIP-Switch downwards means "0")
+ ------------------------------------
+ | |
+ | JP3 88 8 JP2 |
+ | ##### | \ |
+ | ##### ET1 ET2 ###|
+ | 8 ###|
+ | U3 SW 1 JP0 ###| Phone Jacks
+ | -- ###|
+ | | | |
+ | | | SW2 |
+ | | | |
+ | | | ##### |
+ | -- ##### #### BNC Connector
+ | ####
+ | 888888 JP1 |
+ | 234567 |
+ -- -------
+ |||||||||||||||||||||||||||
+ --------------------------
+
+
+ SW1: DIP-Switches for Station Address
+ SW2: DIP-Switches for Memory Base and I/O Base addresses
+
+ JP0: If closed, internal termination on (default open)
+ JP1: IRQ Jumpers
+ JP2: Boot-ROM enabled if closed
+ JP3: Jumpers for response timeout
+
+ U3: Boot-ROM Socket
+
+
+ ET1 ET2 Response Time Idle Time Reconfiguration Time
+
+ 78 86 840
+ X 285 316 1680
+ X 563 624 1680
+ X X 1130 1237 1680
+
+ (X means closed jumper)
+
+ (DIP-Switch downwards means "0")
The station address is binary-coded with SW1.
The I/O base address is coded with DIP-Switches 6,7 and 8 of SW2:
+======== ========
Switches Base
678 Address
+======== ========
000 260-26f
100 290-29f
010 2e0-2ef
@@ -1065,19 +1098,22 @@ Switches Base
101 350-35f
011 380-38f
111 3e0-3ef
+======== ========
DIP Switches 1-5 of SW2 encode the RAM and ROM Address Range:
+======== ============= ================
Switches RAM ROM
12345 Address Range Address Range
+======== ============= ================
00000 C:0000-C:07ff C:2000-C:3fff
10000 C:0800-C:0fff
01000 C:1000-C:17ff
11000 C:1800-C:1fff
00100 C:4000-C:47ff C:6000-C:7fff
10100 C:4800-C:4fff
-01100 C:5000-C:57ff
+01100 C:5000-C:57ff
11100 C:5800-C:5fff
00010 C:C000-C:C7ff C:E000-C:ffff
10010 C:C800-C:Cfff
@@ -1094,7 +1130,7 @@ Switches RAM ROM
00101 D:8000-D:87ff D:A000-D:bfff
10101 D:8800-D:8fff
01101 D:9000-D:97ff
-11101 D:9800-D:9fff
+11101 D:9800-D:9fff
00011 D:C000-D:c7ff D:E000-D:ffff
10011 D:C800-D:cfff
01011 D:D000-D:d7ff
@@ -1103,34 +1139,37 @@ Switches RAM ROM
10111 E:0800-E:0fff
01111 E:1000-E:17ff
11111 E:1800-E:1fff
+======== ============= ================
-*****************************************************************************
+PureData Corp
+=============
-** PureData Corp **
PDI507 (8-bit card)
--------------------
+
- from Mark Rejhon <mdrejhon@magi.com> (slight modifications by Avery)
- Avery's note: I think PDI508 cards (but definitely NOT PDI508Plus cards)
are mostly the same as this. PDI508Plus cards appear to be mainly
software-configured.
Jumpers:
+
There is a jumper array at the bottom of the card, near the edge
- connector. This array is labelled J1. They control the IRQs and
- something else. Put only one jumper on the IRQ pins.
+ connector. This array is labelled J1. They control the IRQs and
+ something else. Put only one jumper on the IRQ pins.
ETS1, ETS2 are for timing on very long distance networks. See the
more general information near the top of this file.
There is a J2 jumper on two pins. A jumper should be put on them,
- since it was already there when I got the card. I don't know what
- this jumper is for though.
+ since it was already there when I got the card. I don't know what
+ this jumper is for though.
There is a two-jumper array for J3. I don't know what it is for,
- but there were already two jumpers on it when I got the card. It's
- a six pin grid in a two-by-three fashion. The jumpers were
- configured as follows:
+ but there were already two jumpers on it when I got the card. It's
+ a six pin grid in a two-by-three fashion. The jumpers were
+ configured as follows::
.-------.
o | o o |
@@ -1140,28 +1179,28 @@ Jumpers:
Carl de Billy <CARL@carainfo.com> explains J3 and J4:
- J3 Diagram:
+ J3 Diagram::
- .-------.
- o | o o |
- :-------: TWIST Technology
- o | o o |
- `-------'
- .-------.
- | o o | o
- :-------: COAX Technology
- | o o | o
- `-------'
+ .-------.
+ o | o o |
+ :-------: TWIST Technology
+ o | o o |
+ `-------'
+ .-------.
+ | o o | o
+ :-------: COAX Technology
+ | o o | o
+ `-------'
- If using coax cable in a bus topology the J4 jumper must be removed;
place it on one pin.
- - If using bus topology with twisted pair wiring move the J3
+ - If using bus topology with twisted pair wiring move the J3
jumpers so they connect the middle pin and the pins closest to the RJ11
Connectors. Also the J4 jumper must be removed; place it on one pin of
J4 jumper for storage.
- - If using star topology with twisted pair wiring move the J3
+ - If using star topology with twisted pair wiring move the J3
jumpers so they connect the middle pin and the pins closest to the RJ11
connectors.
@@ -1169,40 +1208,43 @@ Carl de Billy <CARL@carainfo.com> explains J3 and J4:
DIP Switches:
The DIP switches accessible on the accessible end of the card while
- it is installed, is used to set the ARCnet address. There are 8
- switches. Use an address from 1 to 254.
+ it is installed, is used to set the ARCnet address. There are 8
+ switches. Use an address from 1 to 254
- Switch No.
- 12345678 ARCnet address
- -----------------------------------------
+ ========== =========================
+ Switch No. ARCnet address
+ 12345678
+ ========== =========================
00000000 FF (Don't use this!)
00000001 FE
00000010 FD
- ....
- 11111101 2
+ ...
+ 11111101 2
11111110 1
11111111 0 (Don't use this!)
+ ========== =========================
There is another array of eight DIP switches at the top of the
- card. There are five labelled MS0-MS4 which seem to control the
- memory address, and another three labelled IO0-IO2 which seem to
- control the base I/O address of the card.
+ card. There are five labelled MS0-MS4 which seem to control the
+ memory address, and another three labelled IO0-IO2 which seem to
+ control the base I/O address of the card.
This was difficult to test by trial and error, and the I/O addresses
- are in a weird order. This was tested by setting the DIP switches,
- rebooting the computer, and attempting to load ARCETHER at various
- addresses (mostly between 0x200 and 0x400). The address that caused
- the red transmit LED to blink, is the one that I thought works.
+ are in a weird order. This was tested by setting the DIP switches,
+ rebooting the computer, and attempting to load ARCETHER at various
+ addresses (mostly between 0x200 and 0x400). The address that caused
+ the red transmit LED to blink, is the one that I thought works.
Also, the address 0x3D0 seem to have a special meaning, since the
- ARCETHER packet driver loaded fine, but without the red LED
- blinking. I don't know what 0x3D0 is for though. I recommend using
- an address of 0x300 since Windows may not like addresses below
- 0x300.
-
- IO Switch No.
- 210 I/O address
- -------------------------------
+ ARCETHER packet driver loaded fine, but without the red LED
+ blinking. I don't know what 0x3D0 is for though. I recommend using
+ an address of 0x300 since Windows may not like addresses below
+ 0x300.
+
+ ============= ===========
+ IO Switch No. I/O address
+ 210
+ ============= ===========
111 0x260
110 0x290
101 0x2E0
@@ -1211,29 +1253,31 @@ DIP Switches:
010 0x350
001 0x380
000 0x3E0
+ ============= ===========
The memory switches set a reserved address space of 0x1000 bytes
- (0x100 segment units, or 4k). For example if I set an address of
- 0xD000, it will use up addresses 0xD000 to 0xD100.
+ (0x100 segment units, or 4k). For example if I set an address of
+ 0xD000, it will use up addresses 0xD000 to 0xD100.
The memory switches were tested by booting using QEMM386 stealth,
- and using LOADHI to see what address automatically became excluded
- from the upper memory regions, and then attempting to load ARCETHER
- using these addresses.
+ and using LOADHI to see what address automatically became excluded
+ from the upper memory regions, and then attempting to load ARCETHER
+ using these addresses.
I recommend using an ARCnet memory address of 0xD000, and putting
- the EMS page frame at 0xC000 while using QEMM stealth mode. That
- way, you get contiguous high memory from 0xD100 almost all the way
- the end of the megabyte.
+ the EMS page frame at 0xC000 while using QEMM stealth mode. That
+ way, you get contiguous high memory from 0xD100 almost all the way
+ the end of the megabyte.
Memory Switch 0 (MS0) didn't seem to work properly when set to OFF
- on my card. It could be malfunctioning on my card. Experiment with
- it ON first, and if it doesn't work, set it to OFF. (It may be a
- modifier for the 0x200 bit?)
+ on my card. It could be malfunctioning on my card. Experiment with
+ it ON first, and if it doesn't work, set it to OFF. (It may be a
+ modifier for the 0x200 bit?)
+ ============= ============================================
MS Switch No.
43210 Memory address
- --------------------------------
+ ============= ============================================
00001 0xE100 (guessed - was not detected by QEMM)
00011 0xE000 (guessed - was not detected by QEMM)
00101 0xDD00
@@ -1250,40 +1294,36 @@ DIP Switches:
11011 0xC800 (guessed - crashes tested system)
11101 0xC500 (guessed - crashes tested system)
11111 0xC400 (guessed - crashes tested system)
-
-
-*****************************************************************************
+ ============= ============================================
+
+CNet Technology Inc. (8-bit cards)
+==================================
-** CNet Technology Inc. **
120 Series (8-bit cards)
------------------------
- from Juergen Seifert <seifert@htwm.de>
-
-CNET TECHNOLOGY INC. (CNet) ARCNET 120A SERIES
-==============================================
-
This description has been written by Juergen Seifert <seifert@htwm.de>
-using information from the following Original CNet Manual
-
- "ARCNET
- USER'S MANUAL
- for
- CN120A
- CN120AB
- CN120TP
- CN120ST
- CN120SBT
- P/N:12-01-0007
- Revision 3.00"
+using information from the following Original CNet Manual
+
+ "ARCNET USER'S MANUAL for
+ CN120A
+ CN120AB
+ CN120TP
+ CN120ST
+ CN120SBT
+ P/N:12-01-0007
+ Revision 3.00"
ARCNET is a registered trademark of the Datapoint Corporation
-P/N 120A ARCNET 8 bit XT/AT Star
-P/N 120AB ARCNET 8 bit XT/AT Bus
-P/N 120TP ARCNET 8 bit XT/AT Twisted Pair
-P/N 120ST ARCNET 8 bit XT/AT Star, Twisted Pair
-P/N 120SBT ARCNET 8 bit XT/AT Star, Bus, Twisted Pair
+- P/N 120A ARCNET 8 bit XT/AT Star
+- P/N 120AB ARCNET 8 bit XT/AT Bus
+- P/N 120TP ARCNET 8 bit XT/AT Twisted Pair
+- P/N 120ST ARCNET 8 bit XT/AT Star, Twisted Pair
+- P/N 120SBT ARCNET 8 bit XT/AT Star, Bus, Twisted Pair
+
+::
__________________________________________________________________
| |
@@ -1307,75 +1347,77 @@ P/N 120SBT ARCNET 8 bit XT/AT Star, Bus, Twisted Pair
| > SOCKET | JP 6 5 4 3 2 |o|o|o| | J1 |
| |______________| |o|o|o|o|o| |o|o|o| |_____|
|_____ |o|o|o|o|o| ______________|
- | |
- |_____________________________________________|
-
-Legend:
-
-90C65 ARCNET Probe
-S1 1-5: Base Memory Address Select
- 6-8: Base I/O Address Select
-S2 1-8: Node ID Select (ID0-ID7)
-JP1 ROM Enable Select
-JP2 IRQ2
-JP3 IRQ3
-JP4 IRQ4
-JP5 IRQ5
-JP6 IRQ7
-JP7/JP8 ET1, ET2 Timeout Parameters
-JP10/JP11 Coax / Twisted Pair Select (CN120ST/SBT only)
-JP12 Terminator Select (CN120AB/ST/SBT only)
-J1 BNC RG62/U Connector (all except CN120TP)
-J2 Two 6-position Telephone Jack (CN120TP/ST/SBT only)
+ | |
+ |_____________________________________________|
+
+Legend::
+
+ 90C65 ARCNET Probe
+ S1 1-5: Base Memory Address Select
+ 6-8: Base I/O Address Select
+ S2 1-8: Node ID Select (ID0-ID7)
+ JP1 ROM Enable Select
+ JP2 IRQ2
+ JP3 IRQ3
+ JP4 IRQ4
+ JP5 IRQ5
+ JP6 IRQ7
+ JP7/JP8 ET1, ET2 Timeout Parameters
+ JP10/JP11 Coax / Twisted Pair Select (CN120ST/SBT only)
+ JP12 Terminator Select (CN120AB/ST/SBT only)
+ J1 BNC RG62/U Connector (all except CN120TP)
+ J2 Two 6-position Telephone Jack (CN120TP/ST/SBT only)
Setting one of the switches to Off means "1", On means "0".
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in SW2 are used to set the node ID. Each node attached
to the network must have an unique node ID which must be different from 0.
Switch 1 (ID0) serves as the least significant bit (LSB).
-The node ID is the sum of the values of all switches set to "1"
+The node ID is the sum of the values of all switches set to "1"
These values are:
- Switch | Label | Value
- -------|-------|-------
- 1 | ID0 | 1
- 2 | ID1 | 2
- 3 | ID2 | 4
- 4 | ID3 | 8
- 5 | ID4 | 16
- 6 | ID5 | 32
- 7 | ID6 | 64
- 8 | ID7 | 128
-
-Some Examples:
-
- Switch | Hex | Decimal
+ ======= ====== =====
+ Switch Label Value
+ ======= ====== =====
+ 1 ID0 1
+ 2 ID1 2
+ 3 ID2 4
+ 4 ID3 8
+ 5 ID4 16
+ 6 ID5 32
+ 7 ID6 64
+ 8 ID7 128
+ ======= ====== =====
+
+Some Examples::
+
+ Switch | Hex | Decimal
8 7 6 5 4 3 2 1 | Node ID | Node ID
----------------|---------|---------
0 0 0 0 0 0 0 0 | not allowed
- 0 0 0 0 0 0 0 1 | 1 | 1
+ 0 0 0 0 0 0 0 1 | 1 | 1
0 0 0 0 0 0 1 0 | 2 | 2
0 0 0 0 0 0 1 1 | 3 | 3
. . . | |
0 1 0 1 0 1 0 1 | 55 | 85
. . . | |
1 0 1 0 1 0 1 0 | AA | 170
- . . . | |
+ . . . | |
1 1 1 1 1 1 0 1 | FD | 253
1 1 1 1 1 1 1 0 | FE | 254
1 1 1 1 1 1 1 1 | FF | 255
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The last three switches in switch block SW1 are used to select one
-of eight possible I/O Base addresses using the following table
+of eight possible I/O Base addresses using the following table::
Switch | Hex I/O
@@ -1392,13 +1434,15 @@ of eight possible I/O Base addresses using the following table
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The memory buffer (RAM) requires 2K. The base of this buffer can be
+The memory buffer (RAM) requires 2K. The base of this buffer can be
located in any of eight positions. The address of the Boot Prom is
memory base + 8K or memory base + 0x2000.
Switches 1-5 of switch block SW1 select the Memory Base address.
+::
+
Switch | Hex RAM | Hex ROM
1 2 3 4 5 | Address | Address *)
--------------------|---------|-----------
@@ -1410,22 +1454,24 @@ Switches 1-5 of switch block SW1 select the Memory Base address.
ON ON OFF ON OFF | D8000 | DA000
ON ON ON OFF OFF | DC000 | DE000
ON ON OFF OFF OFF | E0000 | E2000
-
-*) To enable the Boot ROM install the jumper JP1
-Note: Since the switches 1 and 2 are always set to ON it may be possible
+ *) To enable the Boot ROM install the jumper JP1
+
+.. note::
+
+ Since the switches 1 and 2 are always set to ON it may be possible
that they can be used to add an offset of 2K, 4K or 6K to the base
address, but this feature is not documented in the manual and I
haven't tested it yet.
Setting the Interrupt Line
---------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^
To select a hardware interrupt level install one (only one!) of the jumpers
-JP2, JP3, JP4, JP5, JP6. JP2 is the default.
+JP2, JP3, JP4, JP5, JP6. JP2 is the default::
- Jumper | IRQ
+ Jumper | IRQ
-------|-----
2 | 2
3 | 3
@@ -1435,71 +1481,66 @@ JP2, JP3, JP4, JP5, JP6. JP2 is the default.
Setting the Internal Terminator on CN120AB/TP/SBT
---------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The jumper JP12 is used to enable the internal terminator.
+The jumper JP12 is used to enable the internal terminator::
- -----
- 0 | 0 |
+ -----
+ 0 | 0 |
----- ON | | ON
| 0 | | 0 |
| | OFF ----- OFF
| 0 | 0
-----
- Terminator Terminator
+ Terminator Terminator
disabled enabled
-
+
Selecting the Connector Type on CN120ST/SBT
--------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
JP10 JP11 JP10 JP11
- ----- -----
- 0 0 | 0 | | 0 |
+ ----- -----
+ 0 0 | 0 | | 0 |
----- ----- | | | |
| 0 | | 0 | | 0 | | 0 |
| | | | ----- -----
- | 0 | | 0 | 0 0
+ | 0 | | 0 | 0 0
----- -----
- Coaxial Cable Twisted Pair Cable
+ Coaxial Cable Twisted Pair Cable
(Default)
Setting the Timeout Parameters
-------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The jumpers labeled EXT1 and EXT2 are used to determine the timeout
+The jumpers labeled EXT1 and EXT2 are used to determine the timeout
parameters. These two jumpers are normally left open.
+CNet Technology Inc. (16-bit cards)
+===================================
-*****************************************************************************
-
-** CNet Technology Inc. **
160 Series (16-bit cards)
-------------------------
- from Juergen Seifert <seifert@htwm.de>
-CNET TECHNOLOGY INC. (CNet) ARCNET 160A SERIES
-==============================================
-
This description has been written by Juergen Seifert <seifert@htwm.de>
-using information from the following Original CNet Manual
+using information from the following Original CNet Manual
- "ARCNET
- USER'S MANUAL
- for
- CN160A
- CN160AB
- CN160TP
- P/N:12-01-0006
- Revision 3.00"
+ "ARCNET USER'S MANUAL for
+ CN160A CN160AB CN160TP
+ P/N:12-01-0006 Revision 3.00"
ARCNET is a registered trademark of the Datapoint Corporation
-P/N 160A ARCNET 16 bit XT/AT Star
-P/N 160AB ARCNET 16 bit XT/AT Bus
-P/N 160TP ARCNET 16 bit XT/AT Twisted Pair
+- P/N 160A ARCNET 16 bit XT/AT Star
+- P/N 160AB ARCNET 16 bit XT/AT Bus
+- P/N 160TP ARCNET 16 bit XT/AT Twisted Pair
+
+::
___________________________________________________________________
< _________________________ ___|
@@ -1526,30 +1567,30 @@ P/N 160TP ARCNET 16 bit XT/AT Twisted Pair
> | | |
<____________| |_______________________________________|
-Legend:
+Legend::
-9026 ARCNET Probe
-SW1 1-6: Base I/O Address Select
- 7-10: Base Memory Address Select
-SW2 1-8: Node ID Select (ID0-ID7)
-JP1/JP2 ET1, ET2 Timeout Parameters
-JP3-JP13 Interrupt Select
-J1 BNC RG62/U Connector (CN160A/AB only)
-J1 Two 6-position Telephone Jack (CN160TP only)
-LED
+ 9026 ARCNET Probe
+ SW1 1-6: Base I/O Address Select
+ 7-10: Base Memory Address Select
+ SW2 1-8: Node ID Select (ID0-ID7)
+ JP1/JP2 ET1, ET2 Timeout Parameters
+ JP3-JP13 Interrupt Select
+ J1 BNC RG62/U Connector (CN160A/AB only)
+ J1 Two 6-position Telephone Jack (CN160TP only)
+ LED
Setting one of the switches to Off means "1", On means "0".
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in SW2 are used to set the node ID. Each node attached
to the network must have an unique node ID which must be different from 0.
Switch 1 (ID0) serves as the least significant bit (LSB).
-The node ID is the sum of the values of all switches set to "1"
-These values are:
+The node ID is the sum of the values of all switches set to "1"
+These values are::
Switch | Label | Value
-------|-------|-------
@@ -1562,32 +1603,32 @@ These values are:
7 | ID6 | 64
8 | ID7 | 128
-Some Examples:
+Some Examples::
- Switch | Hex | Decimal
+ Switch | Hex | Decimal
8 7 6 5 4 3 2 1 | Node ID | Node ID
----------------|---------|---------
0 0 0 0 0 0 0 0 | not allowed
- 0 0 0 0 0 0 0 1 | 1 | 1
+ 0 0 0 0 0 0 0 1 | 1 | 1
0 0 0 0 0 0 1 0 | 2 | 2
0 0 0 0 0 0 1 1 | 3 | 3
. . . | |
0 1 0 1 0 1 0 1 | 55 | 85
. . . | |
1 0 1 0 1 0 1 0 | AA | 170
- . . . | |
+ . . . | |
1 1 1 1 1 1 0 1 | FD | 253
1 1 1 1 1 1 1 0 | FE | 254
1 1 1 1 1 1 1 1 | FF | 255
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The first six switches in switch block SW1 are used to select the I/O Base
-address using the following table:
+address using the following table::
- Switch | Hex I/O
+ Switch | Hex I/O
1 2 3 4 5 6 | Address
------------------------|--------
OFF ON ON OFF OFF ON | 260
@@ -1604,10 +1645,10 @@ Note: Other IO-Base addresses seem to be selectable, but only the above
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The switches 7-10 of switch block SW1 are used to select the Memory
-Base address of the RAM (2K) and the PROM.
+Base address of the RAM (2K) and the PROM::
Switch | Hex RAM | Hex ROM
7 8 9 10 | Address | Address
@@ -1616,17 +1657,19 @@ Base address of the RAM (2K) and the PROM.
OFF OFF ON OFF | D0000 | D8000 (Default)
OFF OFF OFF ON | E0000 | E8000
-Note: Other MEM-Base addresses seem to be selectable, but only the above
+.. note::
+
+ Other MEM-Base addresses seem to be selectable, but only the above
combinations are documented.
Setting the Interrupt Line
---------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^
To select a hardware interrupt level install one (only one!) of the jumpers
-JP3 through JP13 using the following table:
+JP3 through JP13 using the following table::
- Jumper | IRQ
+ Jumper | IRQ
-------|-----------------
3 | 14
4 | 15
@@ -1640,10 +1683,12 @@ JP3 through JP13 using the following table:
12 | 7
13 | 2 (=9) Default!
-Note: - Do not use JP11=IRQ6, it may conflict with your Floppy Disk
- Controller
+.. note::
+
+ - Do not use JP11=IRQ6, it may conflict with your Floppy Disk
+ Controller
- Use JP3=IRQ14 only, if you don't have an IDE-, MFM-, or RLL-
- Hard Disk, it may conflict with their controllers
+ Hard Disk, it may conflict with their controllers
Setting the Timeout Parameters
@@ -1653,14 +1698,16 @@ The jumpers labeled JP1 and JP2 are used to determine the timeout
parameters. These two jumpers are normally left open.
-*****************************************************************************
+Lantech
+=======
-** Lantech **
8-bit card, unknown model
-------------------------
- from Vlad Lungu <vlungu@ugal.ro> - his e-mail address seemed broken at
the time I tried to reach him. Sorry Vlad, if you didn't get my reply.
+::
+
________________________________________________________________
| 1 8 |
| ___________ __|
@@ -1683,25 +1730,27 @@ parameters. These two jumpers are normally left open.
| | PROM | |ooooo| JP6 |
| |____________| |ooooo| |
|_____________ _ _|
- |____________________________________________| |__|
+ |____________________________________________| |__|
UM9065L : ARCnet Controller
SW 1 : Shared Memory Address and I/O Base
- ON=0
+::
- 12345|Memory Address
- -----|--------------
- 00001| D4000
- 00010| CC000
- 00110| D0000
- 01110| D1000
- 01101| D9000
- 10010| CC800
- 10011| DC800
- 11110| D1800
+ ON=0
+
+ 12345|Memory Address
+ -----|--------------
+ 00001| D4000
+ 00010| CC000
+ 00110| D0000
+ 01110| D1000
+ 01101| D9000
+ 10010| CC800
+ 10011| DC800
+ 11110| D1800
It seems that the bits are considered in reverse order. Also, you must
observe that some of those addresses are unusual and I didn't probe them; I
@@ -1710,43 +1759,48 @@ some others that I didn't write here the card seems to conflict with the
video card (an S3 GENDAC). I leave the full decoding of those addresses to
you.
- 678| I/O Address
- ---|------------
- 000| 260
- 001| failed probe
- 010| 2E0
- 011| 380
- 100| 290
- 101| 350
- 110| failed probe
- 111| 3E0
+::
-SW 2 : Node ID (binary coded)
+ 678| I/O Address
+ ---|------------
+ 000| 260
+ 001| failed probe
+ 010| 2E0
+ 011| 380
+ 100| 290
+ 101| 350
+ 110| failed probe
+ 111| 3E0
-JP 4 : Boot PROM enable CLOSE - enabled
- OPEN - disabled
+ SW 2 : Node ID (binary coded)
-JP 6 : IRQ set (ONLY ONE jumper on 1-5 for IRQ 2-6)
+ JP 4 : Boot PROM enable CLOSE - enabled
+ OPEN - disabled
+ JP 6 : IRQ set (ONLY ONE jumper on 1-5 for IRQ 2-6)
-*****************************************************************************
-** Acer **
+Acer
+====
+
8-bit card, Model 5210-003
--------------------------
+
- from Vojtech Pavlik <vojtech@suse.cz> using portions of the existing
arcnet-hardware file.
This is a 90C26 based card. Its configuration seems similar to the SMC
PC100, but has some additional jumpers I don't know the meaning of.
- __
- | |
+::
+
+ __
+ | |
___________|__|_________________________
| | | |
| | BNC | |
| |______| ___|
- | _____________________ |___
+ | _____________________ |___
| | | |
| | Hybrid IC | |
| | | o|o J1 |
@@ -1762,51 +1816,51 @@ PC100, but has some additional jumpers I don't know the meaning of.
| _____ |
| | | _____ |
| | | | | ___|
- | | | | | |
- | _____ | ROM | | UFS | |
- | | | | | | | |
- | | | ___ | | | | |
- | | | | | |__.__| |__.__| |
- | | NCR | |XTL| _____ _____ |
- | | | |___| | | | | |
- | |90C26| | | | | |
- | | | | RAM | | UFS | |
- | | | J17 o|o | | | | |
- | | | J16 o|o | | | | |
- | |__.__| |__.__| |__.__| |
- | ___ |
- | | |8 |
- | |SW2| |
- | | | |
- | |___|1 |
- | ___ |
- | | |10 J18 o|o |
- | | | o|o |
- | |SW1| o|o |
- | | | J21 o|o |
- | |___|1 |
- | |
- |____________________________________|
-
-
-Legend:
-
-90C26 ARCNET Chip
-XTL 20 MHz Crystal
-SW1 1-6 Base I/O Address Select
- 7-10 Memory Address Select
-SW2 1-8 Node ID Select (ID0-ID7)
-J1-J5 IRQ Select
-J6-J21 Unknown (Probably extra timeouts & ROM enable ...)
-LED1 Activity LED
-BNC Coax connector (STAR ARCnet)
-RAM 2k of SRAM
-ROM Boot ROM socket
-UFS Unidentified Flying Sockets
+ | | | | | |
+ | _____ | ROM | | UFS | |
+ | | | | | | | |
+ | | | ___ | | | | |
+ | | | | | |__.__| |__.__| |
+ | | NCR | |XTL| _____ _____ |
+ | | | |___| | | | | |
+ | |90C26| | | | | |
+ | | | | RAM | | UFS | |
+ | | | J17 o|o | | | | |
+ | | | J16 o|o | | | | |
+ | |__.__| |__.__| |__.__| |
+ | ___ |
+ | | |8 |
+ | |SW2| |
+ | | | |
+ | |___|1 |
+ | ___ |
+ | | |10 J18 o|o |
+ | | | o|o |
+ | |SW1| o|o |
+ | | | J21 o|o |
+ | |___|1 |
+ | |
+ |____________________________________|
+
+
+Legend::
+
+ 90C26 ARCNET Chip
+ XTL 20 MHz Crystal
+ SW1 1-6 Base I/O Address Select
+ 7-10 Memory Address Select
+ SW2 1-8 Node ID Select (ID0-ID7)
+ J1-J5 IRQ Select
+ J6-J21 Unknown (Probably extra timeouts & ROM enable ...)
+ LED1 Activity LED
+ BNC Coax connector (STAR ARCnet)
+ RAM 2k of SRAM
+ ROM Boot ROM socket
+ UFS Unidentified Flying Sockets
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in SW2 are used to set the node ID. Each node attached
to the network must have an unique node ID which must not be 0.
@@ -1815,7 +1869,7 @@ Switch 1 (ID0) serves as the least significant bit (LSB).
Setting one of the switches to OFF means "1", ON means "0".
The node ID is the sum of the values of all switches set to "1"
-These values are:
+These values are::
Switch | Value
-------|-------
@@ -1832,40 +1886,40 @@ Don't set this to 0 or 255; these values are reserved.
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The switches 1 to 6 of switch block SW1 are used to select one
-of 32 possible I/O Base addresses using the following tables
-
- | Hex
+of 32 possible I/O Base addresses using the following tables::
+
+ | Hex
Switch | Value
-------|-------
- 1 | 200
- 2 | 100
- 3 | 80
- 4 | 40
- 5 | 20
- 6 | 10
+ 1 | 200
+ 2 | 100
+ 3 | 80
+ 4 | 40
+ 5 | 20
+ 6 | 10
The I/O address is sum of all switches set to "1". Remember that
the I/O address space bellow 0x200 is RESERVED for mainboard, so
-switch 1 should be ALWAYS SET TO OFF.
+switch 1 should be ALWAYS SET TO OFF.
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The memory buffer (RAM) requires 2K. The base of this buffer can be
located in any of sixteen positions. However, the addresses below
A0000 are likely to cause system hang because there's main RAM.
-Jumpers 7-10 of switch block SW1 select the Memory Base address.
+Jumpers 7-10 of switch block SW1 select the Memory Base address::
Switch | Hex RAM
7 8 9 10 | Address
----------------|---------
OFF OFF OFF OFF | F0000 (conflicts with main BIOS)
- OFF OFF OFF ON | E0000
+ OFF OFF OFF ON | E0000
OFF OFF ON OFF | D0000
OFF OFF ON ON | C0000 (conflicts with video BIOS)
OFF ON OFF OFF | B0000 (conflicts with mono video)
@@ -1873,10 +1927,10 @@ Jumpers 7-10 of switch block SW1 select the Memory Base address.
Setting the Interrupt Line
---------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^
-Jumpers 1-5 of the jumper block J1 control the IRQ level. ON means
-shorted, OFF means open.
+Jumpers 1-5 of the jumper block J1 control the IRQ level. ON means
+shorted, OFF means open::
Jumper | IRQ
1 2 3 4 5 |
@@ -1889,65 +1943,67 @@ shorted, OFF means open.
Unknown jumpers & sockets
--------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^
I know nothing about these. I just guess that J16&J17 are timeout
jumpers and maybe one of J18-J21 selects ROM. Also J6-J10 and
J11-J15 are connecting IRQ2-7 to some pins on the UFSs. I can't
guess the purpose.
+Datapoint?
+==========
-*****************************************************************************
-
-** Datapoint? **
LAN-ARC-8, an 8-bit card
------------------------
+
- from Vojtech Pavlik <vojtech@suse.cz>
This is another SMC 90C65-based ARCnet card. I couldn't identify the
manufacturer, but it might be DataPoint, because the card has the
original arcNet logo in its upper right corner.
- _______________________________________________________
- | _________ |
- | | SW2 | ON arcNet |
- | |_________| OFF ___|
- | _____________ 1 ______ 8 | | 8
- | | | SW1 | XTAL | ____________ | S |
- | > RAM (2k) | |______|| | | W |
- | |_____________| | H | | 3 |
- | _________|_____ y | |___| 1
- | _________ | | |b | |
- | |_________| | | |r | |
- | | SMC | |i | |
- | | 90C65| |d | |
- | _________ | | | | |
- | | SW1 | ON | | |I | |
- | |_________| OFF |_________|_____/C | _____|
- | 1 8 | | | |___
- | ______________ | | | BNC |___|
- | | | |____________| |_____|
- | > EPROM SOCKET | _____________ |
- | |______________| |_____________| |
- | ______________|
- | |
- |________________________________________|
-
-Legend:
-
-90C65 ARCNET Chip
-SW1 1-5: Base Memory Address Select
- 6-8: Base I/O Address Select
-SW2 1-8: Node ID Select
-SW3 1-5: IRQ Select
- 6-7: Extra Timeout
- 8 : ROM Enable
-BNC Coax connector
-XTAL 20 MHz Crystal
+::
+
+ _______________________________________________________
+ | _________ |
+ | | SW2 | ON arcNet |
+ | |_________| OFF ___|
+ | _____________ 1 ______ 8 | | 8
+ | | | SW1 | XTAL | ____________ | S |
+ | > RAM (2k) | |______|| | | W |
+ | |_____________| | H | | 3 |
+ | _________|_____ y | |___| 1
+ | _________ | | |b | |
+ | |_________| | | |r | |
+ | | SMC | |i | |
+ | | 90C65| |d | |
+ | _________ | | | | |
+ | | SW1 | ON | | |I | |
+ | |_________| OFF |_________|_____/C | _____|
+ | 1 8 | | | |___
+ | ______________ | | | BNC |___|
+ | | | |____________| |_____|
+ | > EPROM SOCKET | _____________ |
+ | |______________| |_____________| |
+ | ______________|
+ | |
+ |________________________________________|
+
+Legend::
+
+ 90C65 ARCNET Chip
+ SW1 1-5: Base Memory Address Select
+ 6-8: Base I/O Address Select
+ SW2 1-8: Node ID Select
+ SW3 1-5: IRQ Select
+ 6-7: Extra Timeout
+ 8 : ROM Enable
+ BNC Coax connector
+ XTAL 20 MHz Crystal
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in SW3 are used to set the node ID. Each node attached
to the network must have an unique node ID which must not be 0.
@@ -1955,8 +2011,8 @@ Switch 1 serves as the least significant bit (LSB).
Setting one of the switches to Off means "1", On means "0".
-The node ID is the sum of the values of all switches set to "1"
-These values are:
+The node ID is the sum of the values of all switches set to "1"
+These values are::
Switch | Value
-------|-------
@@ -1971,10 +2027,10 @@ These values are:
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The last three switches in switch block SW1 are used to select one
-of eight possible I/O Base addresses using the following table
+of eight possible I/O Base addresses using the following table::
Switch | Hex I/O
@@ -1991,13 +2047,16 @@ of eight possible I/O Base addresses using the following table
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The memory buffer (RAM) requires 2K. The base of this buffer can be
+The memory buffer (RAM) requires 2K. The base of this buffer can be
located in any of eight positions. The address of the Boot Prom is
memory base + 0x2000.
+
Jumpers 3-5 of switch block SW1 select the Memory Base address.
+::
+
Switch | Hex RAM | Hex ROM
1 2 3 4 5 | Address | Address *)
--------------------|---------|-----------
@@ -2009,16 +2068,16 @@ Jumpers 3-5 of switch block SW1 select the Memory Base address.
ON ON OFF ON OFF | D8000 | DA000
ON ON ON OFF OFF | DC000 | DE000
ON ON OFF OFF OFF | E0000 | E2000
-
-*) To enable the Boot ROM set the switch 8 of switch block SW3 to position ON.
+
+ *) To enable the Boot ROM set the switch 8 of switch block SW3 to position ON.
The switches 1 and 2 probably add 0x0800 and 0x1000 to RAM base address.
Setting the Interrupt Line
---------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^
-Switches 1-5 of the switch block SW3 control the IRQ level.
+Switches 1-5 of the switch block SW3 control the IRQ level::
Jumper | IRQ
1 2 3 4 5 |
@@ -2031,64 +2090,67 @@ Switches 1-5 of the switch block SW3 control the IRQ level.
Setting the Timeout Parameters
-------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The switches 6-7 of the switch block SW3 are used to determine the timeout
parameters. These two switches are normally left in the OFF position.
-*****************************************************************************
+Topware
+=======
-** Topware **
8-bit card, TA-ARC/10
--------------------------
+---------------------
+
- from Vojtech Pavlik <vojtech@suse.cz>
This is another very similar 90C65 card. Most of the switches and jumpers
are the same as on other clones.
- _____________________________________________________________________
-| ___________ | | ______ |
-| |SW2 NODE ID| | | | XTAL | |
-| |___________| | Hybrid IC | |______| |
-| ___________ | | __|
-| |SW1 MEM+I/O| |_________________________| LED1|__|)
-| |___________| 1 2 |
-| J3 |o|o| TIMEOUT ______|
-| ______________ |o|o| | |
-| | | ___________________ | RJ |
-| > EPROM SOCKET | | \ |------|
-|J2 |______________| | | | |
-||o| | | |______|
-||o| ROM ENABLE | SMC | _________ |
-| _____________ | 90C65 | |_________| _____|
-| | | | | | |___
-| > RAM (2k) | | | | BNC |___|
-| |_____________| | | |_____|
-| |____________________| |
-| ________ IRQ 2 3 4 5 7 ___________ |
-||________| |o|o|o|o|o| |___________| |
-|________ J1|o|o|o|o|o| ______________|
- | |
- |_____________________________________________|
-
-Legend:
-
-90C65 ARCNET Chip
-XTAL 20 MHz Crystal
-SW1 1-5 Base Memory Address Select
- 6-8 Base I/O Address Select
-SW2 1-8 Node ID Select (ID0-ID7)
-J1 IRQ Select
-J2 ROM Enable
-J3 Extra Timeout
-LED1 Activity LED
-BNC Coax connector (BUS ARCnet)
-RJ Twisted Pair Connector (daisy chain)
+::
+
+ _____________________________________________________________________
+ | ___________ | | ______ |
+ | |SW2 NODE ID| | | | XTAL | |
+ | |___________| | Hybrid IC | |______| |
+ | ___________ | | __|
+ | |SW1 MEM+I/O| |_________________________| LED1|__|)
+ | |___________| 1 2 |
+ | J3 |o|o| TIMEOUT ______|
+ | ______________ |o|o| | |
+ | | | ___________________ | RJ |
+ | > EPROM SOCKET | | \ |------|
+ |J2 |______________| | | | |
+ ||o| | | |______|
+ ||o| ROM ENABLE | SMC | _________ |
+ | _____________ | 90C65 | |_________| _____|
+ | | | | | | |___
+ | > RAM (2k) | | | | BNC |___|
+ | |_____________| | | |_____|
+ | |____________________| |
+ | ________ IRQ 2 3 4 5 7 ___________ |
+ ||________| |o|o|o|o|o| |___________| |
+ |________ J1|o|o|o|o|o| ______________|
+ | |
+ |_____________________________________________|
+
+Legend::
+
+ 90C65 ARCNET Chip
+ XTAL 20 MHz Crystal
+ SW1 1-5 Base Memory Address Select
+ 6-8 Base I/O Address Select
+ SW2 1-8 Node ID Select (ID0-ID7)
+ J1 IRQ Select
+ J2 ROM Enable
+ J3 Extra Timeout
+ LED1 Activity LED
+ BNC Coax connector (BUS ARCnet)
+ RJ Twisted Pair Connector (daisy chain)
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in SW2 are used to set the node ID. Each node attached to
the network must have an unique node ID which must not be 0. Switch 1 (ID0)
@@ -2097,7 +2159,7 @@ serves as the least significant bit (LSB).
Setting one of the switches to Off means "1", On means "0".
The node ID is the sum of the values of all switches set to "1"
-These values are:
+These values are::
Switch | Label | Value
-------|-------|-------
@@ -2111,10 +2173,10 @@ These values are:
8 | ID7 | 128
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The last three switches in switch block SW1 are used to select one
-of eight possible I/O Base addresses using the following table:
+of eight possible I/O Base addresses using the following table::
Switch | Hex I/O
@@ -2122,7 +2184,7 @@ of eight possible I/O Base addresses using the following table:
------------|--------
ON ON ON | 260 (Manufacturer's default)
OFF ON ON | 290
- ON OFF ON | 2E0
+ ON OFF ON | 2E0
OFF OFF ON | 2F0
ON ON OFF | 300
OFF ON OFF | 350
@@ -2131,35 +2193,38 @@ of eight possible I/O Base addresses using the following table:
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The memory buffer (RAM) requires 2K. The base of this buffer can be
located in any of eight positions. The address of the Boot Prom is
memory base + 0x2000.
+
Jumpers 3-5 of switch block SW1 select the Memory Base address.
+::
+
Switch | Hex RAM | Hex ROM
1 2 3 4 5 | Address | Address *)
--------------------|---------|-----------
ON ON ON ON ON | C0000 | C2000
- ON ON OFF ON ON | C4000 | C6000 (Manufacturer's default)
+ ON ON OFF ON ON | C4000 | C6000 (Manufacturer's default)
ON ON ON OFF ON | CC000 | CE000
- ON ON OFF OFF ON | D0000 | D2000
+ ON ON OFF OFF ON | D0000 | D2000
ON ON ON ON OFF | D4000 | D6000
ON ON OFF ON OFF | D8000 | DA000
ON ON ON OFF OFF | DC000 | DE000
ON ON OFF OFF OFF | E0000 | E2000
-*) To enable the Boot ROM short the jumper J2.
+ *) To enable the Boot ROM short the jumper J2.
The jumpers 1 and 2 probably add 0x0800 and 0x1000 to RAM address.
Setting the Interrupt Line
---------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^
Jumpers 1-5 of the jumper block J1 control the IRQ level. ON means
-shorted, OFF means open.
+shorted, OFF means open::
Jumper | IRQ
1 2 3 4 5 |
@@ -2172,19 +2237,21 @@ shorted, OFF means open.
Setting the Timeout Parameters
-------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The jumpers J3 are used to set the timeout parameters. These two
+The jumpers J3 are used to set the timeout parameters. These two
jumpers are normally left open.
-
-*****************************************************************************
+Thomas-Conrad
+=============
-** Thomas-Conrad **
Model #500-6242-0097 REV A (8-bit card)
---------------------------------------
+
- from Lars Karlsson <100617.3473@compuserve.com>
+::
+
________________________________________________________
| ________ ________ |_____
| |........| |........| |
@@ -2194,11 +2261,11 @@ Model #500-6242-0097 REV A (8-bit card)
| address | |
| ______ switch | |
| | | | |
- | | | |___|
+ | | | |___|
| | | ______ |___._
| |______| |______| ____| BNC
| Jumper- _____| Connector
- | Main chip block _ __| '
+ | Main chip block _ __| '
| | | | RJ Connector
| |_| | with 110 Ohm
| |__ Terminator
@@ -2208,46 +2275,49 @@ Model #500-6242-0097 REV A (8-bit card)
| |___________| |_____| |__
| Boot PROM socket IRQ-jumpers |_ Diagnostic
|________ __ _| LED (red)
- | | | | | | | | | | | | | | | | | | | | | |
- | | | | | | | | | | | | | | | | | | | | |________|
- |
- |
+ | | | | | | | | | | | | | | | | | | | | | |
+ | | | | | | | | | | | | | | | | | | | | |________|
+ |
+ |
And here are the settings for some of the switches and jumpers on the cards.
+::
- I/O
+ I/O
- 1 2 3 4 5 6 7 8
+ 1 2 3 4 5 6 7 8
-2E0----- 0 0 0 1 0 0 0 1
-2F0----- 0 0 0 1 0 0 0 0
-300----- 0 0 0 0 1 1 1 1
-350----- 0 0 0 0 1 1 1 0
+ 2E0----- 0 0 0 1 0 0 0 1
+ 2F0----- 0 0 0 1 0 0 0 0
+ 300----- 0 0 0 0 1 1 1 1
+ 350----- 0 0 0 0 1 1 1 0
"0" in the above example means switch is off "1" means that it is on.
+::
- ShMem address.
+ ShMem address.
- 1 2 3 4 5 6 7 8
+ 1 2 3 4 5 6 7 8
-CX00--0 0 1 1 | | |
-DX00--0 0 1 0 |
-X000--------- 1 1 |
-X400--------- 1 0 |
-X800--------- 0 1 |
-XC00--------- 0 0
-ENHANCED----------- 1
-COMPATIBLE--------- 0
+ CX00--0 0 1 1 | | |
+ DX00--0 0 1 0 |
+ X000--------- 1 1 |
+ X400--------- 1 0 |
+ X800--------- 0 1 |
+ XC00--------- 0 0
+ ENHANCED----------- 1
+ COMPATIBLE--------- 0
+::
- IRQ
+ IRQ
- 3 4 5 7 2
- . . . . .
- . . . . .
+ 3 4 5 7 2
+ . . . . .
+ . . . . .
There is a DIP-switch with 8 switches, used to set the shared memory address
@@ -2266,10 +2336,9 @@ varies by the type of card involved. I fail to see how either of these
enhance anything. Send me more detailed information about this mode, or
just use "compatible" mode instead.]
+Waterloo Microsystems Inc. ??
+=============================
-*****************************************************************************
-
-** Waterloo Microsystems Inc. ?? **
8-bit card (C) 1985
-------------------
- from Robert Michael Best <rmb117@cs.usask.ca>
@@ -2283,103 +2352,104 @@ e-mail me.]
The probe has not been able to detect the card on any of the J2 settings,
and I tried them again with the "Waterloo" chip removed.
-
- _____________________________________________________________________
-| \/ \/ ___ __ __ |
-| C4 C4 |^| | M || ^ ||^| |
-| -- -- |_| | 5 || || | C3 |
-| \/ \/ C10 |___|| ||_| |
-| C4 C4 _ _ | | ?? |
-| -- -- | \/ || | |
-| | || | |
-| | || C1 | |
-| | || | \/ _____|
-| | C6 || | C9 | |___
-| | || | -- | BNC |___|
-| | || | >C7| |_____|
-| | || | |
-| __ __ |____||_____| 1 2 3 6 |
-|| ^ | >C4| |o|o|o|o|o|o| J2 >C4| |
-|| | |o|o|o|o|o|o| |
-|| C2 | >C4| >C4| |
-|| | >C8| |
-|| | 2 3 4 5 6 7 IRQ >C4| |
-||_____| |o|o|o|o|o|o| J3 |
-|_______ |o|o|o|o|o|o| _______________|
- | |
- |_____________________________________________|
-
-C1 -- "COM9026
- SMC 8638"
- In a chip socket.
-
-C2 -- "@Copyright
- Waterloo Microsystems Inc.
- 1985"
- In a chip Socket with info printed on a label covering a round window
- showing the circuit inside. (The window indicates it is an EPROM chip.)
-
-C3 -- "COM9032
- SMC 8643"
- In a chip socket.
-
-C4 -- "74LS"
- 9 total no sockets.
-
-M5 -- "50006-136
- 20.000000 MHZ
- MTQ-T1-S3
- 0 M-TRON 86-40"
- Metallic case with 4 pins, no socket.
-
-C6 -- "MOSTEK@TC8643
- MK6116N-20
- MALAYSIA"
- No socket.
-
-C7 -- No stamp or label but in a 20 pin chip socket.
-
-C8 -- "PAL10L8CN
- 8623"
- In a 20 pin socket.
-
-C9 -- "PAl16R4A-2CN
- 8641"
- In a 20 pin socket.
-
-C10 -- "M8640
- NMC
- 9306N"
- In an 8 pin socket.
-
-?? -- Some components on a smaller board and attached with 20 pins all
- along the side closest to the BNC connector. The are coated in a dark
- resin.
-
-On the board there are two jumper banks labeled J2 and J3. The
-manufacturer didn't put a J1 on the board. The two boards I have both
+
+::
+
+ _____________________________________________________________________
+ | \/ \/ ___ __ __ |
+ | C4 C4 |^| | M || ^ ||^| |
+ | -- -- |_| | 5 || || | C3 |
+ | \/ \/ C10 |___|| ||_| |
+ | C4 C4 _ _ | | ?? |
+ | -- -- | \/ || | |
+ | | || | |
+ | | || C1 | |
+ | | || | \/ _____|
+ | | C6 || | C9 | |___
+ | | || | -- | BNC |___|
+ | | || | >C7| |_____|
+ | | || | |
+ | __ __ |____||_____| 1 2 3 6 |
+ || ^ | >C4| |o|o|o|o|o|o| J2 >C4| |
+ || | |o|o|o|o|o|o| |
+ || C2 | >C4| >C4| |
+ || | >C8| |
+ || | 2 3 4 5 6 7 IRQ >C4| |
+ ||_____| |o|o|o|o|o|o| J3 |
+ |_______ |o|o|o|o|o|o| _______________|
+ | |
+ |_____________________________________________|
+
+ C1 -- "COM9026
+ SMC 8638"
+ In a chip socket.
+
+ C2 -- "@Copyright
+ Waterloo Microsystems Inc.
+ 1985"
+ In a chip Socket with info printed on a label covering a round window
+ showing the circuit inside. (The window indicates it is an EPROM chip.)
+
+ C3 -- "COM9032
+ SMC 8643"
+ In a chip socket.
+
+ C4 -- "74LS"
+ 9 total no sockets.
+
+ M5 -- "50006-136
+ 20.000000 MHZ
+ MTQ-T1-S3
+ 0 M-TRON 86-40"
+ Metallic case with 4 pins, no socket.
+
+ C6 -- "MOSTEK@TC8643
+ MK6116N-20
+ MALAYSIA"
+ No socket.
+
+ C7 -- No stamp or label but in a 20 pin chip socket.
+
+ C8 -- "PAL10L8CN
+ 8623"
+ In a 20 pin socket.
+
+ C9 -- "PAl16R4A-2CN
+ 8641"
+ In a 20 pin socket.
+
+ C10 -- "M8640
+ NMC
+ 9306N"
+ In an 8 pin socket.
+
+ ?? -- Some components on a smaller board and attached with 20 pins all
+ along the side closest to the BNC connector. The are coated in a dark
+ resin.
+
+On the board there are two jumper banks labeled J2 and J3. The
+manufacturer didn't put a J1 on the board. The two boards I have both
came with a jumper box for each bank.
-J2 -- Numbered 1 2 3 4 5 6.
- 4 and 5 are not stamped due to solder points.
-
-J3 -- IRQ 2 3 4 5 6 7
+::
+
+ J2 -- Numbered 1 2 3 4 5 6.
+ 4 and 5 are not stamped due to solder points.
+
+ J3 -- IRQ 2 3 4 5 6 7
-The board itself has a maple leaf stamped just above the irq jumpers
-and "-2 46-86" beside C2. Between C1 and C6 "ASS 'Y 300163" and "@1986
+The board itself has a maple leaf stamped just above the irq jumpers
+and "-2 46-86" beside C2. Between C1 and C6 "ASS 'Y 300163" and "@1986
CORMAN CUSTOM ELECTRONICS CORP." stamped just below the BNC connector.
Below that "MADE IN CANADA"
-
-*****************************************************************************
+No Name
+=======
-** No Name **
8-bit cards, 16-bit cards
-------------------------
+
- from Juergen Seifert <seifert@htwm.de>
-
-NONAME 8-BIT ARCNET
-===================
I have named this ARCnet card "NONAME", since there is no name of any
manufacturer on the Installation manual nor on the shipping box. The only
@@ -2388,8 +2458,10 @@ it is "Made in Taiwan"
This description has been written by Juergen Seifert <seifert@htwm.de>
using information from the Original
- "ARCnet Installation Manual"
+ "ARCnet Installation Manual"
+
+::
________________________________________________________________
| |STAR| BUS| T/P| |
@@ -2416,32 +2488,32 @@ using information from the Original
| \ IRQ / T T O |
|__________________1_2_M______________________|
-Legend:
+Legend::
-COM90C65: ARCnet Probe
-S1 1-8: Node ID Select
-S2 1-3: I/O Base Address Select
- 4-6: Memory Base Address Select
- 7-8: RAM Offset Select
-ET1, ET2 Extended Timeout Select
-ROM ROM Enable Select
-CN RG62 Coax Connector
-STAR| BUS | T/P Three fields for placing a sign (colored circle)
- indicating the topology of the card
+ COM90C65: ARCnet Probe
+ S1 1-8: Node ID Select
+ S2 1-3: I/O Base Address Select
+ 4-6: Memory Base Address Select
+ 7-8: RAM Offset Select
+ ET1, ET2 Extended Timeout Select
+ ROM ROM Enable Select
+ CN RG62 Coax Connector
+ STAR| BUS | T/P Three fields for placing a sign (colored circle)
+ indicating the topology of the card
Setting one of the switches to Off means "1", On means "0".
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in group SW1 are used to set the node ID.
Each node attached to the network must have an unique node ID which
must be different from 0.
Switch 8 serves as the least significant bit (LSB).
-The node ID is the sum of the values of all switches set to "1"
-These values are:
+The node ID is the sum of the values of all switches set to "1"
+These values are::
Switch | Value
-------|-------
@@ -2454,30 +2526,30 @@ These values are:
2 | 64
1 | 128
-Some Examples:
+Some Examples::
- Switch | Hex | Decimal
+ Switch | Hex | Decimal
1 2 3 4 5 6 7 8 | Node ID | Node ID
----------------|---------|---------
0 0 0 0 0 0 0 0 | not allowed
- 0 0 0 0 0 0 0 1 | 1 | 1
+ 0 0 0 0 0 0 0 1 | 1 | 1
0 0 0 0 0 0 1 0 | 2 | 2
0 0 0 0 0 0 1 1 | 3 | 3
. . . | |
0 1 0 1 0 1 0 1 | 55 | 85
. . . | |
1 0 1 0 1 0 1 0 | AA | 170
- . . . | |
+ . . . | |
1 1 1 1 1 1 0 1 | FD | 253
1 1 1 1 1 1 1 0 | FE | 254
1 1 1 1 1 1 1 1 | FF | 255
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The first three switches in switch group SW2 are used to select one
-of eight possible I/O Base addresses using the following table
+of eight possible I/O Base addresses using the following table::
Switch | Hex I/O
1 2 3 | Address
@@ -2493,7 +2565,7 @@ of eight possible I/O Base addresses using the following table
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The memory buffer requires 2K of a 16K block of RAM. The base of this
16K block can be located in any of eight positions.
@@ -2501,6 +2573,8 @@ Switches 4-6 of switch group SW2 select the Base of the 16K block.
Within that 16K address space, the buffer may be assigned any one of four
positions, determined by the offset, switches 7 and 8 of group SW2.
+::
+
Switch | Hex RAM | Hex ROM
4 5 6 7 8 | Address | Address *)
-----------|---------|-----------
@@ -2508,60 +2582,62 @@ positions, determined by the offset, switches 7 and 8 of group SW2.
0 0 0 0 1 | C0800 | C2000
0 0 0 1 0 | C1000 | C2000
0 0 0 1 1 | C1800 | C2000
- | |
+ | |
0 0 1 0 0 | C4000 | C6000
0 0 1 0 1 | C4800 | C6000
0 0 1 1 0 | C5000 | C6000
0 0 1 1 1 | C5800 | C6000
- | |
+ | |
0 1 0 0 0 | CC000 | CE000
0 1 0 0 1 | CC800 | CE000
0 1 0 1 0 | CD000 | CE000
0 1 0 1 1 | CD800 | CE000
- | |
+ | |
0 1 1 0 0 | D0000 | D2000 (Manufacturer's default)
0 1 1 0 1 | D0800 | D2000
0 1 1 1 0 | D1000 | D2000
0 1 1 1 1 | D1800 | D2000
- | |
+ | |
1 0 0 0 0 | D4000 | D6000
1 0 0 0 1 | D4800 | D6000
1 0 0 1 0 | D5000 | D6000
1 0 0 1 1 | D5800 | D6000
- | |
+ | |
1 0 1 0 0 | D8000 | DA000
1 0 1 0 1 | D8800 | DA000
1 0 1 1 0 | D9000 | DA000
1 0 1 1 1 | D9800 | DA000
- | |
+ | |
1 1 0 0 0 | DC000 | DE000
1 1 0 0 1 | DC800 | DE000
1 1 0 1 0 | DD000 | DE000
1 1 0 1 1 | DD800 | DE000
- | |
+ | |
1 1 1 0 0 | E0000 | E2000
1 1 1 0 1 | E0800 | E2000
1 1 1 1 0 | E1000 | E2000
1 1 1 1 1 | E1800 | E2000
-
-*) To enable the 8K Boot PROM install the jumper ROM.
- The default is jumper ROM not installed.
+
+ *) To enable the 8K Boot PROM install the jumper ROM.
+ The default is jumper ROM not installed.
Setting Interrupt Request Lines (IRQ)
--------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To select a hardware interrupt level set one (only one!) of the jumpers
IRQ2, IRQ3, IRQ4, IRQ5 or IRQ7. The manufacturer's default is IRQ2.
-
+
Setting the Timeouts
---------------------
+^^^^^^^^^^^^^^^^^^^^
The two jumpers labeled ET1 and ET2 are used to determine the timeout
parameters (response and reconfiguration time). Every node in a network
must be set to the same timeout values.
+::
+
ET1 ET2 | Response Time (us) | Reconfiguration Time (ms)
--------|--------------------|--------------------------
Off Off | 78 | 840 (Default)
@@ -2572,8 +2648,8 @@ must be set to the same timeout values.
On means jumper installed, Off means jumper not installed
-NONAME 16-BIT ARCNET
-====================
+16-BIT ARCNET
+-------------
The manual of my 8-Bit NONAME ARCnet Card contains another description
of a 16-Bit Coax / Twisted Pair Card. This description is incomplete,
@@ -2584,13 +2660,16 @@ the booklet there is a different way of counting ... 2-9, 2-10, A-1,
Also the picture of the board layout is not as good as the picture of
8-Bit card, because there isn't any letter like "SW1" written to the
picture.
+
Should somebody have such a board, please feel free to complete this
description or to send a mail to me!
This description has been written by Juergen Seifert <seifert@htwm.de>
using information from the Original
- "ARCnet Installation Manual"
+ "ARCnet Installation Manual"
+
+::
___________________________________________________________________
< _________________ _________________ |
@@ -2622,15 +2701,15 @@ Setting one of the switches to Off means "1", On means "0".
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in group SW2 are used to set the node ID.
Each node attached to the network must have an unique node ID which
must be different from 0.
Switch 8 serves as the least significant bit (LSB).
-The node ID is the sum of the values of all switches set to "1"
-These values are:
+The node ID is the sum of the values of all switches set to "1"
+These values are::
Switch | Value
-------|-------
@@ -2643,30 +2722,30 @@ These values are:
2 | 64
1 | 128
-Some Examples:
+Some Examples::
- Switch | Hex | Decimal
+ Switch | Hex | Decimal
1 2 3 4 5 6 7 8 | Node ID | Node ID
----------------|---------|---------
0 0 0 0 0 0 0 0 | not allowed
- 0 0 0 0 0 0 0 1 | 1 | 1
+ 0 0 0 0 0 0 0 1 | 1 | 1
0 0 0 0 0 0 1 0 | 2 | 2
0 0 0 0 0 0 1 1 | 3 | 3
. . . | |
0 1 0 1 0 1 0 1 | 55 | 85
. . . | |
1 0 1 0 1 0 1 0 | AA | 170
- . . . | |
+ . . . | |
1 1 1 1 1 1 0 1 | FD | 253
1 1 1 1 1 1 1 0 | FE | 254
1 1 1 1 1 1 1 1 | FF | 255
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The first three switches in switch group SW1 are used to select one
-of eight possible I/O Base addresses using the following table
+of eight possible I/O Base addresses using the following table::
Switch | Hex I/O
3 2 1 | Address
@@ -2682,13 +2761,13 @@ of eight possible I/O Base addresses using the following table
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The memory buffer requires 2K of a 16K block of RAM. The base of this
16K block can be located in any of eight positions.
Switches 6-8 of switch group SW1 select the Base of the 16K block.
Within that 16K address space, the buffer may be assigned any one of four
-positions, determined by the offset, switches 4 and 5 of group SW1.
+positions, determined by the offset, switches 4 and 5 of group SW1::
Switch | Hex RAM | Hex ROM
8 7 6 5 4 | Address | Address
@@ -2697,111 +2776,111 @@ positions, determined by the offset, switches 4 and 5 of group SW1.
0 0 0 0 1 | C0800 | C2000
0 0 0 1 0 | C1000 | C2000
0 0 0 1 1 | C1800 | C2000
- | |
+ | |
0 0 1 0 0 | C4000 | C6000
0 0 1 0 1 | C4800 | C6000
0 0 1 1 0 | C5000 | C6000
0 0 1 1 1 | C5800 | C6000
- | |
+ | |
0 1 0 0 0 | CC000 | CE000
0 1 0 0 1 | CC800 | CE000
0 1 0 1 0 | CD000 | CE000
0 1 0 1 1 | CD800 | CE000
- | |
+ | |
0 1 1 0 0 | D0000 | D2000 (Manufacturer's default)
0 1 1 0 1 | D0800 | D2000
0 1 1 1 0 | D1000 | D2000
0 1 1 1 1 | D1800 | D2000
- | |
+ | |
1 0 0 0 0 | D4000 | D6000
1 0 0 0 1 | D4800 | D6000
1 0 0 1 0 | D5000 | D6000
1 0 0 1 1 | D5800 | D6000
- | |
+ | |
1 0 1 0 0 | D8000 | DA000
1 0 1 0 1 | D8800 | DA000
1 0 1 1 0 | D9000 | DA000
1 0 1 1 1 | D9800 | DA000
- | |
+ | |
1 1 0 0 0 | DC000 | DE000
1 1 0 0 1 | DC800 | DE000
1 1 0 1 0 | DD000 | DE000
1 1 0 1 1 | DD800 | DE000
- | |
+ | |
1 1 1 0 0 | E0000 | E2000
1 1 1 0 1 | E0800 | E2000
1 1 1 1 0 | E1000 | E2000
1 1 1 1 1 | E1800 | E2000
-
+
Setting Interrupt Request Lines (IRQ)
--------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
??????????????????????????????????????
Setting the Timeouts
---------------------
+^^^^^^^^^^^^^^^^^^^^
??????????????????????????????????????
-*****************************************************************************
-
-** No Name **
8-bit cards ("Made in Taiwan R.O.C.")
------------
+-------------------------------------
+
- from Vojtech Pavlik <vojtech@suse.cz>
I have named this ARCnet card "NONAME", since I got only the card with
-no manual at all and the only text identifying the manufacturer is
+no manual at all and the only text identifying the manufacturer is
"MADE IN TAIWAN R.O.C" printed on the card.
- ____________________________________________________________
- | 1 2 3 4 5 6 7 8 |
- | |o|o| JP1 o|o|o|o|o|o|o|o| ON |
- | + o|o|o|o|o|o|o|o| ___|
- | _____________ o|o|o|o|o|o|o|o| OFF _____ | | ID7
- | | | SW1 | | | | ID6
- | > RAM (2k) | ____________________ | H | | S | ID5
- | |_____________| | || y | | W | ID4
- | | || b | | 2 | ID3
- | | || r | | | ID2
- | | || i | | | ID1
- | | 90C65 || d | |___| ID0
- | SW3 | || | |
- | |o|o|o|o|o|o|o|o| ON | || I | |
- | |o|o|o|o|o|o|o|o| | || C | |
- | |o|o|o|o|o|o|o|o| OFF |____________________|| | _____|
- | 1 2 3 4 5 6 7 8 | | | |___
- | ______________ | | | BNC |___|
- | | | |_____| |_____|
- | > EPROM SOCKET | |
- | |______________| |
- | ______________|
- | |
- |_____________________________________________|
-
-Legend:
-
-90C65 ARCNET Chip
-SW1 1-5: Base Memory Address Select
- 6-8: Base I/O Address Select
-SW2 1-8: Node ID Select (ID0-ID7)
-SW3 1-5: IRQ Select
- 6-7: Extra Timeout
- 8 : ROM Enable
-JP1 Led connector
-BNC Coax connector
-
-Although the jumpers SW1 and SW3 are marked SW, not JP, they are jumpers, not
+::
+
+ ____________________________________________________________
+ | 1 2 3 4 5 6 7 8 |
+ | |o|o| JP1 o|o|o|o|o|o|o|o| ON |
+ | + o|o|o|o|o|o|o|o| ___|
+ | _____________ o|o|o|o|o|o|o|o| OFF _____ | | ID7
+ | | | SW1 | | | | ID6
+ | > RAM (2k) | ____________________ | H | | S | ID5
+ | |_____________| | || y | | W | ID4
+ | | || b | | 2 | ID3
+ | | || r | | | ID2
+ | | || i | | | ID1
+ | | 90C65 || d | |___| ID0
+ | SW3 | || | |
+ | |o|o|o|o|o|o|o|o| ON | || I | |
+ | |o|o|o|o|o|o|o|o| | || C | |
+ | |o|o|o|o|o|o|o|o| OFF |____________________|| | _____|
+ | 1 2 3 4 5 6 7 8 | | | |___
+ | ______________ | | | BNC |___|
+ | | | |_____| |_____|
+ | > EPROM SOCKET | |
+ | |______________| |
+ | ______________|
+ | |
+ |_____________________________________________|
+
+Legend::
+
+ 90C65 ARCNET Chip
+ SW1 1-5: Base Memory Address Select
+ 6-8: Base I/O Address Select
+ SW2 1-8: Node ID Select (ID0-ID7)
+ SW3 1-5: IRQ Select
+ 6-7: Extra Timeout
+ 8 : ROM Enable
+ JP1 Led connector
+ BNC Coax connector
+
+Although the jumpers SW1 and SW3 are marked SW, not JP, they are jumpers, not
switches.
-Setting the jumpers to ON means connecting the upper two pins, off the bottom
+Setting the jumpers to ON means connecting the upper two pins, off the bottom
two - or - in case of IRQ setting, connecting none of them at all.
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in SW2 are used to set the node ID. Each node attached
to the network must have an unique node ID which must not be 0.
@@ -2809,8 +2888,8 @@ Switch 1 (ID0) serves as the least significant bit (LSB).
Setting one of the switches to Off means "1", On means "0".
-The node ID is the sum of the values of all switches set to "1"
-These values are:
+The node ID is the sum of the values of all switches set to "1"
+These values are::
Switch | Label | Value
-------|-------|-------
@@ -2823,30 +2902,30 @@ These values are:
7 | ID6 | 64
8 | ID7 | 128
-Some Examples:
+Some Examples::
- Switch | Hex | Decimal
+ Switch | Hex | Decimal
8 7 6 5 4 3 2 1 | Node ID | Node ID
----------------|---------|---------
0 0 0 0 0 0 0 0 | not allowed
- 0 0 0 0 0 0 0 1 | 1 | 1
+ 0 0 0 0 0 0 0 1 | 1 | 1
0 0 0 0 0 0 1 0 | 2 | 2
0 0 0 0 0 0 1 1 | 3 | 3
. . . | |
0 1 0 1 0 1 0 1 | 55 | 85
. . . | |
1 0 1 0 1 0 1 0 | AA | 170
- . . . | |
+ . . . | |
1 1 1 1 1 1 0 1 | FD | 253
1 1 1 1 1 1 1 0 | FE | 254
1 1 1 1 1 1 1 1 | FF | 255
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The last three switches in switch block SW1 are used to select one
-of eight possible I/O Base addresses using the following table
+of eight possible I/O Base addresses using the following table::
Switch | Hex I/O
@@ -2863,13 +2942,16 @@ of eight possible I/O Base addresses using the following table
Setting the Base Memory (RAM) buffer Address
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The memory buffer (RAM) requires 2K. The base of this buffer can be
+The memory buffer (RAM) requires 2K. The base of this buffer can be
located in any of eight positions. The address of the Boot Prom is
memory base + 0x2000.
+
Jumpers 3-5 of jumper block SW1 select the Memory Base address.
+::
+
Switch | Hex RAM | Hex ROM
1 2 3 4 5 | Address | Address *)
--------------------|---------|-----------
@@ -2881,15 +2963,15 @@ Jumpers 3-5 of jumper block SW1 select the Memory Base address.
ON ON OFF ON OFF | D8000 | DA000
ON ON ON OFF OFF | DC000 | DE000
ON ON OFF OFF OFF | E0000 | E2000
-
-*) To enable the Boot ROM set the jumper 8 of jumper block SW3 to position ON.
+
+ *) To enable the Boot ROM set the jumper 8 of jumper block SW3 to position ON.
The jumpers 1 and 2 probably add 0x0800, 0x1000 and 0x1800 to RAM adders.
Setting the Interrupt Line
---------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^
-Jumpers 1-5 of the jumper block SW3 control the IRQ level.
+Jumpers 1-5 of the jumper block SW3 control the IRQ level::
Jumper | IRQ
1 2 3 4 5 |
@@ -2902,23 +2984,24 @@ Jumpers 1-5 of the jumper block SW3 control the IRQ level.
Setting the Timeout Parameters
-------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The jumpers 6-7 of the jumper block SW3 are used to determine the timeout
+The jumpers 6-7 of the jumper block SW3 are used to determine the timeout
parameters. These two jumpers are normally left in the OFF position.
-*****************************************************************************
-** No Name **
(Generic Model 9058)
--------------------
- from Andrew J. Kroll <ag784@freenet.buffalo.edu>
- Sorry this sat in my to-do box for so long, Andrew! (yikes - over a
year!)
- _____
- | <
- | .---'
+
+::
+
+ _____
+ | <
+ | .---'
________________________________________________________________ | |
| | SW2 | | |
| ___________ |_____________| | |
@@ -2936,7 +3019,7 @@ parameters. These two jumpers are normally left in the OFF position.
| |________________| | | : B |- | |
| 1 2 3 4 5 6 7 8 | | : O |- | |
| |_________o____|..../ A |- _______| |
- | ____________________ | R |- | |------,
+ | ____________________ | R |- | |------,
| | | | D |- | BNC | # |
| > 2764 PROM SOCKET | |__________|- |_______|------'
| |____________________| _________ | |
@@ -2945,23 +3028,24 @@ parameters. These two jumpers are normally left in the OFF position.
|___ ______________| |
|H H H H H H H H H H H H H H H H H H H H H H H| | |
|U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U_U| | |
- \|
-Legend:
+ \|
+
+Legend::
-SL90C65 ARCNET Controller / Transceiver /Logic
-SW1 1-5: IRQ Select
+ SL90C65 ARCNET Controller / Transceiver /Logic
+ SW1 1-5: IRQ Select
6: ET1
7: ET2
- 8: ROM ENABLE
-SW2 1-3: Memory Buffer/PROM Address
+ 8: ROM ENABLE
+ SW2 1-3: Memory Buffer/PROM Address
3-6: I/O Address Map
-SW3 1-8: Node ID Select
-BNC BNC RG62/U Connection
+ SW3 1-8: Node ID Select
+ BNC BNC RG62/U Connection
*I* have had success using RG59B/U with *NO* terminators!
What gives?!
SW1: Timeouts, Interrupt and ROM
----------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To select a hardware interrupt level set one (only one!) of the dip switches
up (on) SW1...(switches 1-5)
@@ -2976,10 +3060,10 @@ are normally left off (down).
Setting the I/O Base Address
-----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The last three switches in switch group SW2 are used to select one
-of eight possible I/O Base addresses using the following table
+of eight possible I/O Base addresses using the following table::
Switch | Hex I/O
@@ -2996,7 +3080,7 @@ of eight possible I/O Base addresses using the following table
Setting the Base Memory Address (RAM & ROM)
--------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The memory buffer requires 2K of a 16K block of RAM. The base of this
16K block can be located in any of eight positions.
@@ -3004,13 +3088,16 @@ Switches 1-3 of switch group SW2 select the Base of the 16K block.
(0 = DOWN, 1 = UP)
I could, however, only verify two settings...
+
+::
+
Switch| Hex RAM | Hex ROM
1 2 3 | Address | Address
------|---------|-----------
0 0 0 | E0000 | E2000
0 0 1 | D0000 | D2000 (Manufacturer's default)
0 1 0 | ????? | ?????
- 0 1 1 | ????? | ?????
+ 0 1 1 | ????? | ?????
1 0 0 | ????? | ?????
1 0 1 | ????? | ?????
1 1 0 | ????? | ?????
@@ -3018,7 +3105,7 @@ I could, however, only verify two settings...
Setting the Node ID
--------------------
+^^^^^^^^^^^^^^^^^^^
The eight switches in group SW3 are used to set the node ID.
Each node attached to the network must have an unique node ID which
@@ -3026,8 +3113,9 @@ must be different from 0.
Switch 1 serves as the least significant bit (LSB).
switches in the DOWN position are OFF (0) and in the UP position are ON (1)
-The node ID is the sum of the values of all switches set to "1"
-These values are:
+The node ID is the sum of the values of all switches set to "1"
+These values are::
+
Switch | Value
-------|-------
1 | 1
@@ -3039,70 +3127,80 @@ These values are:
7 | 64
8 | 128
-Some Examples:
-
- Switch# | Hex | Decimal
-8 7 6 5 4 3 2 1 | Node ID | Node ID
-----------------|---------|---------
-0 0 0 0 0 0 0 0 | not allowed <-.
-0 0 0 0 0 0 0 1 | 1 | 1 |
-0 0 0 0 0 0 1 0 | 2 | 2 |
-0 0 0 0 0 0 1 1 | 3 | 3 |
- . . . | | |
-0 1 0 1 0 1 0 1 | 55 | 85 |
- . . . | | + Don't use 0 or 255!
-1 0 1 0 1 0 1 0 | AA | 170 |
- . . . | | |
-1 1 1 1 1 1 0 1 | FD | 253 |
-1 1 1 1 1 1 1 0 | FE | 254 |
-1 1 1 1 1 1 1 1 | FF | 255 <-'
-
+Some Examples::
-*****************************************************************************
+ Switch# | Hex | Decimal
+ 8 7 6 5 4 3 2 1 | Node ID | Node ID
+ ----------------|---------|---------
+ 0 0 0 0 0 0 0 0 | not allowed <-.
+ 0 0 0 0 0 0 0 1 | 1 | 1 |
+ 0 0 0 0 0 0 1 0 | 2 | 2 |
+ 0 0 0 0 0 0 1 1 | 3 | 3 |
+ . . . | | |
+ 0 1 0 1 0 1 0 1 | 55 | 85 |
+ . . . | | + Don't use 0 or 255!
+ 1 0 1 0 1 0 1 0 | AA | 170 |
+ . . . | | |
+ 1 1 1 1 1 1 0 1 | FD | 253 |
+ 1 1 1 1 1 1 1 0 | FE | 254 |
+ 1 1 1 1 1 1 1 1 | FF | 255 <-'
+
+
+Tiara
+=====
-** Tiara **
(model unknown)
--------------------------
+---------------
+
- from Christoph Lameter <christoph@lameter.com>
-
-
-Here is information about my card as far as I could figure it out:
------------------------------------------------ tiara
-Tiara LanCard of Tiara Computer Systems.
-
-+----------------------------------------------+
-! ! Transmitter Unit ! !
-! +------------------+ -------
-! MEM Coax Connector
-! ROM 7654321 <- I/O -------
-! : : +--------+ !
-! : : ! 90C66LJ! +++
-! : : ! ! !D Switch to set
-! : : ! ! !I the Nodenumber
-! : : +--------+ !P
-! !++
-! 234567 <- IRQ !
-+------------!!!!!!!!!!!!!!!!!!!!!!!!--------+
- !!!!!!!!!!!!!!!!!!!!!!!!
-
-0 = Jumper Installed
-1 = Open
+
+
+Here is information about my card as far as I could figure it out::
+
+
+ ----------------------------------------------- tiara
+ Tiara LanCard of Tiara Computer Systems.
+
+ +----------------------------------------------+
+ ! ! Transmitter Unit ! !
+ ! +------------------+ -------
+ ! MEM Coax Connector
+ ! ROM 7654321 <- I/O -------
+ ! : : +--------+ !
+ ! : : ! 90C66LJ! +++
+ ! : : ! ! !D Switch to set
+ ! : : ! ! !I the Nodenumber
+ ! : : +--------+ !P
+ ! !++
+ ! 234567 <- IRQ !
+ +------------!!!!!!!!!!!!!!!!!!!!!!!!--------+
+ !!!!!!!!!!!!!!!!!!!!!!!!
+
+- 0 = Jumper Installed
+- 1 = Open
Top Jumper line Bit 7 = ROM Enable 654=Memory location 321=I/O
Settings for Memory Location (Top Jumper Line)
+
+=== ================
456 Address selected
+=== ================
000 C0000
001 C4000
010 CC000
011 D0000
100 D4000
101 D8000
-110 DC000
+110 DC000
111 E0000
+=== ================
Settings for I/O Address (Top Jumper Line)
+
+=== ====
123 Port
+=== ====
000 260
001 290
010 2E0
@@ -3111,23 +3209,26 @@ Settings for I/O Address (Top Jumper Line)
101 350
110 380
111 3E0
+=== ====
Settings for IRQ Selection (Lower Jumper Line)
+
+====== =====
234567
+====== =====
011111 IRQ 2
101111 IRQ 3
110111 IRQ 4
111011 IRQ 5
111110 IRQ 7
-
-*****************************************************************************
-
+====== =====
Other Cards
------------
+===========
I have no information on other models of ARCnet cards at the moment. Please
send any and all info to:
+
apenwarr@worldvisions.ca
Thanks.
diff --git a/Documentation/networking/arcnet.txt b/Documentation/networking/arcnet.rst
index aff97f47c05c..e93d9820f0f1 100644
--- a/Documentation/networking/arcnet.txt
+++ b/Documentation/networking/arcnet.rst
@@ -1,11 +1,18 @@
-----------------------------------------------------------------------------
-NOTE: See also arcnet-hardware.txt in this directory for jumper-setting
-and cabling information if you're like many of us and didn't happen to get a
-manual with your ARCnet card.
-----------------------------------------------------------------------------
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+ARCnet
+======
+
+.. note::
+
+ See also arcnet-hardware.txt in this directory for jumper-setting
+ and cabling information if you're like many of us and didn't happen to get a
+ manual with your ARCnet card.
Since no one seems to listen to me otherwise, perhaps a poem will get your
-attention:
+attention::
+
This driver's getting fat and beefy,
But my cat is still named Fifi.
@@ -24,28 +31,21 @@ Come on, be a sport! Send me a success report!
(hey, that was even better than my original poem... this is getting bad!)
---------
-WARNING:
---------
-
-If you don't e-mail me about your success/failure soon, I may be forced to
-start SINGING. And we don't want that, do we?
+.. warning::
-(You know, it might be argued that I'm pushing this point a little too much.
-If you think so, why not flame me in a quick little e-mail? Please also
-include the type of card(s) you're using, software, size of network, and
-whether it's working or not.)
+ If you don't e-mail me about your success/failure soon, I may be forced to
+ start SINGING. And we don't want that, do we?
-My e-mail address is: apenwarr@worldvisions.ca
+ (You know, it might be argued that I'm pushing this point a little too much.
+ If you think so, why not flame me in a quick little e-mail? Please also
+ include the type of card(s) you're using, software, size of network, and
+ whether it's working or not.)
+ My e-mail address is: apenwarr@worldvisions.ca
----------------------------------------------------------------------------
-
-
These are the ARCnet drivers for Linux.
-
-This new release (2.91) has been put together by David Woodhouse
+This new release (2.91) has been put together by David Woodhouse
<dwmw2@infradead.org>, in an attempt to tidy up the driver after adding support
for yet another chipset. Now the generic support has been separated from the
individual chipset drivers, and the source files aren't quite so packed with
@@ -62,12 +62,13 @@ included and seems to be working fine!
Where do I discuss these drivers?
---------------------------------
-Tomasz has been so kind as to set up a new and improved mailing list.
+Tomasz has been so kind as to set up a new and improved mailing list.
Subscribe by sending a message with the BODY "subscribe linux-arcnet YOUR
REAL NAME" to listserv@tichy.ch.uj.edu.pl. Then, to submit messages to the
list, mail to linux-arcnet@tichy.ch.uj.edu.pl.
There are archives of the mailing list at:
+
http://epistolary.org/mailman/listinfo.cgi/arcnet
The people on linux-net@vger.kernel.org (now defunct, replaced by
@@ -80,17 +81,20 @@ Other Drivers and Info
----------------------
You can try my ARCNET page on the World Wide Web at:
- http://www.qis.net/~jschmitz/arcnet/
+
+ http://www.qis.net/~jschmitz/arcnet/
Also, SMC (one of the companies that makes ARCnet cards) has a WWW site you
might be interested in, which includes several drivers for various cards
including ARCnet. Try:
+
http://www.smc.com/
-
+
Performance Technologies makes various network software that supports
ARCnet:
+
http://www.perftech.com/ or ftp to ftp.perftech.com.
-
+
Novell makes a networking stack for DOS which includes ARCnet drivers. Try
FTPing to ftp.novell.com.
@@ -99,19 +103,20 @@ one you'll want to use with ARCnet cards) from
oak.oakland.edu:/simtel/msdos/pktdrvr. It won't work perfectly on a 386+
without patches, though, and also doesn't like several cards. Fixed
versions are available on my WWW page, or via e-mail if you don't have WWW
-access.
+access.
Installing the Driver
---------------------
-All you will need to do in order to install the driver is:
+All you will need to do in order to install the driver is::
+
make config
- (be sure to choose ARCnet in the network devices
+ (be sure to choose ARCnet in the network devices
and at least one chipset driver.)
make clean
make zImage
-
+
If you obtained this ARCnet package as an upgrade to the ARCnet driver in
your current kernel, you will need to first copy arcnet.c over the one in
the linux/drivers/net directory.
@@ -125,10 +130,12 @@ There are four chipset options:
This is the normal ARCnet card, which you've probably got. This is the only
chipset driver which will autoprobe if not told where the card is.
-It following options on the command line:
+It following options on the command line::
+
com90xx=[<io>[,<irq>[,<shmem>]]][,<name>] | <name>
-If you load the chipset support as a module, the options are:
+If you load the chipset support as a module, the options are::
+
io=<io> irq=<irq> shmem=<shmem> device=<name>
To disable the autoprobe, just specify "com90xx=" on the kernel command line.
@@ -136,14 +143,17 @@ To specify the name alone, but allow autoprobe, just put "com90xx=<name>"
2. ARCnet COM20020 chipset.
-This is the new chipset from SMC with support for promiscuous mode (packet
+This is the new chipset from SMC with support for promiscuous mode (packet
sniffing), extra diagnostic information, etc. Unfortunately, there is no
sensible method of autoprobing for these cards. You must specify the I/O
address on the kernel command line.
-The command line options are:
+
+The command line options are::
+
com20020=<io>[,<irq>[,<node_ID>[,backplane[,CKP[,timeout]]]]][,name]
-If you load the chipset support as a module, the options are:
+If you load the chipset support as a module, the options are::
+
io=<io> irq=<irq> node=<node_ID> backplane=<backplane> clock=<CKP>
timeout=<timeout> device=<name>
@@ -160,8 +170,10 @@ you have a card which doesn't support shared memory, or (strangely) in case
you have so many ARCnet cards in your machine that you run out of shmem slots.
If you don't give the IO address on the kernel command line, then the driver
will not find the card.
-The command line options are:
- com90io=<io>[,<irq>][,<name>]
+
+The command line options are::
+
+ com90io=<io>[,<irq>][,<name>]
If you load the chipset support as a module, the options are:
io=<io> irq=<irq> device=<name>
@@ -169,44 +181,49 @@ If you load the chipset support as a module, the options are:
4. ARCnet RIM I cards.
These are COM90xx chips which are _completely_ memory mapped. The support for
-these is not tested. If you have one, please mail the author with a success
+these is not tested. If you have one, please mail the author with a success
report. All options must be specified, except the device name.
-Command line options:
+Command line options::
+
arcrimi=<shmem>,<irq>,<node_ID>[,<name>]
-If you load the chipset support as a module, the options are:
+If you load the chipset support as a module, the options are::
+
shmem=<shmem> irq=<irq> node=<node_ID> device=<name>
Loadable Module Support
-----------------------
-Configure and rebuild Linux. When asked, answer 'm' to "Generic ARCnet
+Configure and rebuild Linux. When asked, answer 'm' to "Generic ARCnet
support" and to support for your ARCnet chipset if you want to use the
-loadable module. You can also say 'y' to "Generic ARCnet support" and 'm'
+loadable module. You can also say 'y' to "Generic ARCnet support" and 'm'
to the chipset support if you wish.
+::
+
make config
- make clean
+ make clean
make zImage
make modules
-
+
If you're using a loadable module, you need to use insmod to load it, and
you can specify various characteristics of your card on the command
line. (In recent versions of the driver, autoprobing is much more reliable
and works as a module, so most of this is now unnecessary.)
-For example:
+For example::
+
cd /usr/src/linux/modules
insmod arcnet.o
insmod com90xx.o
insmod com20020.o io=0x2e0 device=eth1
-
+
Using the Driver
----------------
-If you build your kernel with ARCnet COM90xx support included, it should
+If you build your kernel with ARCnet COM90xx support included, it should
probe for your card automatically when you boot. If you use a different
chipset driver complied into the kernel, you must give the necessary options
on the kernel command line, as detailed above.
@@ -224,69 +241,78 @@ Multiple Cards in One Computer
------------------------------
Linux has pretty good support for this now, but since I've been busy, the
-ARCnet driver has somewhat suffered in this respect. COM90xx support, if
-compiled into the kernel, will (try to) autodetect all the installed cards.
+ARCnet driver has somewhat suffered in this respect. COM90xx support, if
+compiled into the kernel, will (try to) autodetect all the installed cards.
+
+If you have other cards, with support compiled into the kernel, then you can
+just repeat the options on the kernel command line, e.g.::
+
+ LILO: linux com20020=0x2e0 com20020=0x380 com90io=0x260
-If you have other cards, with support compiled into the kernel, then you can
-just repeat the options on the kernel command line, e.g.:
-LILO: linux com20020=0x2e0 com20020=0x380 com90io=0x260
+If you have the chipset support built as a loadable module, then you need to
+do something like this::
-If you have the chipset support built as a loadable module, then you need to
-do something like this:
insmod -o arc0 com90xx
insmod -o arc1 com20020 io=0x2e0
insmod -o arc2 com90xx
+
The ARCnet drivers will now sort out their names automatically.
How do I get it to work with...?
--------------------------------
-NFS: Should be fine linux->linux, just pretend you're using Ethernet cards.
- oak.oakland.edu:/simtel/msdos/nfs has some nice DOS clients. There
- is also a DOS-based NFS server called SOSS. It doesn't multitask
- quite the way Linux does (actually, it doesn't multitask AT ALL) but
- you never know what you might need.
-
- With AmiTCP (and possibly others), you may need to set the following
- options in your Amiga nfstab: MD 1024 MR 1024 MW 1024
- (Thanks to Christian Gottschling <ferksy@indigo.tng.oche.de>
+NFS:
+ Should be fine linux->linux, just pretend you're using Ethernet cards.
+ oak.oakland.edu:/simtel/msdos/nfs has some nice DOS clients. There
+ is also a DOS-based NFS server called SOSS. It doesn't multitask
+ quite the way Linux does (actually, it doesn't multitask AT ALL) but
+ you never know what you might need.
+
+ With AmiTCP (and possibly others), you may need to set the following
+ options in your Amiga nfstab: MD 1024 MR 1024 MW 1024
+ (Thanks to Christian Gottschling <ferksy@indigo.tng.oche.de>
for this.)
-
+
Probably these refer to maximum NFS data/read/write block sizes. I
don't know why the defaults on the Amiga didn't work; write to me if
you know more.
-DOS: If you're using the freeware arcether.com, you might want to install
- the driver patch from my web page. It helps with PC/TCP, and also
- can get arcether to load if it timed out too quickly during
- initialization. In fact, if you use it on a 386+ you REALLY need
- the patch, really.
-
-Windows: See DOS :) Trumpet Winsock works fine with either the Novell or
+DOS:
+ If you're using the freeware arcether.com, you might want to install
+ the driver patch from my web page. It helps with PC/TCP, and also
+ can get arcether to load if it timed out too quickly during
+ initialization. In fact, if you use it on a 386+ you REALLY need
+ the patch, really.
+
+Windows:
+ See DOS :) Trumpet Winsock works fine with either the Novell or
Arcether client, assuming you remember to load winpkt of course.
-LAN Manager and Windows for Workgroups: These programs use protocols that
- are incompatible with the Internet standard. They try to pretend
- the cards are Ethernet, and confuse everyone else on the network.
-
- However, v2.00 and higher of the Linux ARCnet driver supports this
- protocol via the 'arc0e' device. See the section on "Multiprotocol
- Support" for more information.
+LAN Manager and Windows for Workgroups:
+ These programs use protocols that
+ are incompatible with the Internet standard. They try to pretend
+ the cards are Ethernet, and confuse everyone else on the network.
+
+ However, v2.00 and higher of the Linux ARCnet driver supports this
+ protocol via the 'arc0e' device. See the section on "Multiprotocol
+ Support" for more information.
Using the freeware Samba server and clients for Linux, you can now
interface quite nicely with TCP/IP-based WfWg or Lan Manager
networks.
-
-Windows 95: Tools are included with Win95 that let you use either the LANMAN
+
+Windows 95:
+ Tools are included with Win95 that let you use either the LANMAN
style network drivers (NDIS) or Novell drivers (ODI) to handle your
ARCnet packets. If you use ODI, you'll need to use the 'arc0'
- device with Linux. If you use NDIS, then try the 'arc0e' device.
+ device with Linux. If you use NDIS, then try the 'arc0e' device.
See the "Multiprotocol Support" section below if you need arc0e,
you're completely insane, and/or you need to build some kind of
hybrid network that uses both encapsulation types.
-OS/2: I've been told it works under Warp Connect with an ARCnet driver from
+OS/2:
+ I've been told it works under Warp Connect with an ARCnet driver from
SMC. You need to use the 'arc0e' interface for this. If you get
the SMC driver to work with the TCP/IP stuff included in the
"normal" Warp Bonus Pack, let me know.
@@ -295,7 +321,8 @@ OS/2: I've been told it works under Warp Connect with an ARCnet driver from
which should use the same protocol as WfWg does. I had no luck
installing it under Warp, however. Please mail me with any results.
-NetBSD/AmiTCP: These use an old version of the Internet standard ARCnet
+NetBSD/AmiTCP:
+ These use an old version of the Internet standard ARCnet
protocol (RFC1051) which is compatible with the Linux driver v2.10
ALPHA and above using the arc0s device. (See "Multiprotocol ARCnet"
below.) ** Newer versions of NetBSD apparently support RFC1201.
@@ -307,16 +334,17 @@ Using Multiprotocol ARCnet
The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
"virtual network device":
- arc0 - RFC1201 protocol, the official Internet standard which just
- happens to be 100% compatible with Novell's TRXNET driver.
+ ====== ===============================================================
+ arc0 RFC1201 protocol, the official Internet standard which just
+ happens to be 100% compatible with Novell's TRXNET driver.
Version 1.00 of the ARCnet driver supported _only_ this
protocol. arc0 is the fastest of the three protocols (for
whatever reason), and allows larger packets to be used
- because it supports RFC1201 "packet splitting" operations.
+ because it supports RFC1201 "packet splitting" operations.
Unless you have a specific need to use a different protocol,
I strongly suggest that you stick with this one.
-
- arc0e - "Ethernet-Encapsulation" which sends packets over ARCnet
+
+ arc0e "Ethernet-Encapsulation" which sends packets over ARCnet
that are actually a lot like Ethernet packets, including the
6-byte hardware addresses. This protocol is compatible with
Microsoft's NDIS ARCnet driver, like the one in WfWg and
@@ -328,8 +356,8 @@ The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
fit. arc0e also works slightly more slowly than arc0, for
reasons yet to be determined. (Probably it's the smaller
MTU that does it.)
-
- arc0s - The "[s]imple" RFC1051 protocol is the "previous" Internet
+
+ arc0s The "[s]imple" RFC1051 protocol is the "previous" Internet
standard that is completely incompatible with the new
standard. Some software today, however, continues to
support the old standard (and only the old standard)
@@ -338,9 +366,10 @@ The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
smaller than the Internet "requirement," so it's quite
possible that you may run into problems. It's also slower
than RFC1201 by about 25%, for the same reason as arc0e.
-
+
The arc0s support was contributed by Tomasz Motylewski
and modified somewhat by me. Bugs are probably my fault.
+ ====== ===============================================================
You can choose not to compile arc0e and arc0s into the driver if you want -
this will save you a bit of memory and avoid confusion when eg. trying to
@@ -358,19 +387,21 @@ can set up your network then:
two available protocols. As mentioned above, it's a good idea to use
only arc0 unless you have a good reason (like some other software, ie.
WfWg, that only works with arc0e).
-
- If you need only arc0, then the following commands should get you going:
- ifconfig arc0 MY.IP.ADD.RESS
- route add MY.IP.ADD.RESS arc0
- route add -net SUB.NET.ADD.RESS arc0
- [add other local routes here]
-
- If you need arc0e (and only arc0e), it's a little different:
- ifconfig arc0 MY.IP.ADD.RESS
- ifconfig arc0e MY.IP.ADD.RESS
- route add MY.IP.ADD.RESS arc0e
- route add -net SUB.NET.ADD.RESS arc0e
-
+
+ If you need only arc0, then the following commands should get you going::
+
+ ifconfig arc0 MY.IP.ADD.RESS
+ route add MY.IP.ADD.RESS arc0
+ route add -net SUB.NET.ADD.RESS arc0
+ [add other local routes here]
+
+ If you need arc0e (and only arc0e), it's a little different::
+
+ ifconfig arc0 MY.IP.ADD.RESS
+ ifconfig arc0e MY.IP.ADD.RESS
+ route add MY.IP.ADD.RESS arc0e
+ route add -net SUB.NET.ADD.RESS arc0e
+
arc0s works much the same way as arc0e.
@@ -391,29 +422,32 @@ can set up your network then:
XT (patience), however, does not have its own Internet IP address and so
I assigned it one on a "private subnet" (as defined by RFC1597).
- To start with, take a simple network with just insight and freedom.
+ To start with, take a simple network with just insight and freedom.
Insight needs to:
- - talk to freedom via RFC1201 (arc0) protocol, because I like it
+
+ - talk to freedom via RFC1201 (arc0) protocol, because I like it
more and it's faster.
- use freedom as its Internet gateway.
-
- That's pretty easy to do. Set up insight like this:
- ifconfig arc0 insight
- route add insight arc0
- route add freedom arc0 /* I would use the subnet here (like I said
+
+ That's pretty easy to do. Set up insight like this::
+
+ ifconfig arc0 insight
+ route add insight arc0
+ route add freedom arc0 /* I would use the subnet here (like I said
to to in "single protocol" above),
- but the rest of the subnet
- unfortunately lies across the PPP
- link on freedom, which confuses
- things. */
- route add default gw freedom
-
- And freedom gets configured like so:
- ifconfig arc0 freedom
- route add freedom arc0
- route add insight arc0
- /* and default gateway is configured by pppd */
-
+ but the rest of the subnet
+ unfortunately lies across the PPP
+ link on freedom, which confuses
+ things. */
+ route add default gw freedom
+
+ And freedom gets configured like so::
+
+ ifconfig arc0 freedom
+ route add freedom arc0
+ route add insight arc0
+ /* and default gateway is configured by pppd */
+
Great, now insight talks to freedom directly on arc0, and sends packets
to the Internet through freedom. If you didn't know how to do the above,
you should probably stop reading this section now because it only gets
@@ -425,7 +459,7 @@ can set up your network then:
Internet. (Recall that patience has a "private IP address" which won't
work on the Internet; that's okay, I configured Linux IP masquerading on
freedom for this subnet).
-
+
So patience (necessarily; I don't have another IP number from my
provider) has an IP address on a different subnet than freedom and
insight, but needs to use freedom as an Internet gateway. Worse, most
@@ -435,53 +469,54 @@ can set up your network then:
insight, patience WILL send through its default gateway, regardless of
the fact that both freedom and insight (courtesy of the arc0e device)
could understand a direct transmission.
-
- I compensate by giving freedom an extra IP address - aliased 'gatekeeper'
- - that is on my private subnet, the same subnet that patience is on. I
+
+ I compensate by giving freedom an extra IP address - aliased 'gatekeeper' -
+ that is on my private subnet, the same subnet that patience is on. I
then define gatekeeper to be the default gateway for patience.
-
- To configure freedom (in addition to the commands above):
- ifconfig arc0e gatekeeper
- route add gatekeeper arc0e
- route add patience arc0e
-
+
+ To configure freedom (in addition to the commands above)::
+
+ ifconfig arc0e gatekeeper
+ route add gatekeeper arc0e
+ route add patience arc0e
+
This way, freedom will send all packets for patience through arc0e,
giving its IP address as gatekeeper (on the private subnet). When it
talks to insight or the Internet, it will use its "freedom" Internet IP
address.
-
- You will notice that we haven't configured the arc0e device on insight.
+
+ You will notice that we haven't configured the arc0e device on insight.
This would work, but is not really necessary, and would require me to
assign insight another special IP number from my private subnet. Since
both insight and patience are using freedom as their default gateway, the
two can already talk to each other.
-
+
It's quite fortunate that I set things up like this the first time (cough
cough) because it's really handy when I boot insight into DOS. There, it
- runs the Novell ODI protocol stack, which only works with RFC1201 ARCnet.
+ runs the Novell ODI protocol stack, which only works with RFC1201 ARCnet.
In this mode it would be impossible for insight to communicate directly
with patience, since the Novell stack is incompatible with Microsoft's
Ethernet-Encap. Without changing any settings on freedom or patience, I
simply set freedom as the default gateway for insight (now in DOS,
remember) and all the forwarding happens "automagically" between the two
hosts that would normally not be able to communicate at all.
-
+
For those who like diagrams, I have created two "virtual subnets" on the
- same physical ARCnet wire. You can picture it like this:
-
-
- [RFC1201 NETWORK] [ETHER-ENCAP NETWORK]
+ same physical ARCnet wire. You can picture it like this::
+
+
+ [RFC1201 NETWORK] [ETHER-ENCAP NETWORK]
(registered Internet subnet) (RFC1597 private subnet)
-
- (IP Masquerade)
- /---------------\ * /---------------\
- | | * | |
- | +-Freedom-*-Gatekeeper-+ |
- | | | * | |
- \-------+-------/ | * \-------+-------/
- | | |
- Insight | Patience
- (Internet)
+
+ (IP Masquerade)
+ /---------------\ * /---------------\
+ | | * | |
+ | +-Freedom-*-Gatekeeper-+ |
+ | | | * | |
+ \-------+-------/ | * \-------+-------/
+ | | |
+ Insight | Patience
+ (Internet)
@@ -491,6 +526,7 @@ It works: what now?
Send mail describing your setup, preferably including driver version, kernel
version, ARCnet card model, CPU type, number of systems on your network, and
list of software in use to me at the following address:
+
apenwarr@worldvisions.ca
I do send (sometimes automated) replies to all messages I receive. My email
@@ -525,7 +561,7 @@ this, you should grab the pertinent RFCs. (some are listed near the top of
arcnet.c). arcdump assumes your card is at 0xD0000. If it isn't, edit the
script.
-Buffers 0 and 1 are used for receiving, and Buffers 2 and 3 are for sending.
+Buffers 0 and 1 are used for receiving, and Buffers 2 and 3 are for sending.
Ping-pong buffers are implemented both ways.
If your debug level includes D_DURING and you did NOT define SLOW_XMIT_COPY,
@@ -535,9 +571,11 @@ decides that the driver is broken). During a transmit, unused parts of the
buffer will be cleared to 0x42 as well. This is to make it easier to figure
out which bytes are being used by a packet.
-You can change the debug level without recompiling the kernel by typing:
+You can change the debug level without recompiling the kernel by typing::
+
ifconfig arc0 down metric 1xxx
/etc/rc.d/rc.inet1
+
where "xxx" is the debug level you want. For example, "metric 1015" would put
you at debug level 15. Debug level 7 is currently the default.
@@ -546,7 +584,7 @@ combination of different debug flags; so debug level 7 is really 1+2+4 or
D_NORMAL+D_EXTRA+D_INIT. To include D_DURING, you would add 16 to this,
resulting in debug level 23.
-If you don't understand that, you probably don't want to know anyway.
+If you don't understand that, you probably don't want to know anyway.
E-mail me about your problem.
diff --git a/Documentation/networking/atm.txt b/Documentation/networking/atm.rst
index 82921cee77fe..c1df8c038525 100644
--- a/Documentation/networking/atm.txt
+++ b/Documentation/networking/atm.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+ATM
+===
+
In order to use anything but the most primitive functions of ATM,
several user-mode programs are required to assist the kernel. These
programs and related material can be found via the ATM on Linux Web
diff --git a/Documentation/networking/ax25.txt b/Documentation/networking/ax25.rst
index 8257dbf9be57..824afd7002db 100644
--- a/Documentation/networking/ax25.txt
+++ b/Documentation/networking/ax25.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====
+AX.25
+=====
+
To use the amateur radio protocols within Linux you will need to get a
suitable copy of the AX.25 Utilities. More detailed information about
AX.25, NET/ROM and ROSE, associated programs and and utilities can be
diff --git a/Documentation/networking/baycom.txt b/Documentation/networking/baycom.rst
index 688f18fd4467..fe2d010f0e86 100644
--- a/Documentation/networking/baycom.txt
+++ b/Documentation/networking/baycom.rst
@@ -1,26 +1,31 @@
- LINUX DRIVERS FOR BAYCOM MODEMS
+.. SPDX-License-Identifier: GPL-2.0
- Thomas M. Sailer, HB9JNX/AE4WA, <sailer@ife.ee.ethz.ch>
+===============================
+Linux Drivers for Baycom Modems
+===============================
-!!NEW!! (04/98) The drivers for the baycom modems have been split into
+Thomas M. Sailer, HB9JNX/AE4WA, <sailer@ife.ee.ethz.ch>
+
+The drivers for the baycom modems have been split into
separate drivers as they did not share any code, and the driver
and device names have changed.
This document describes the Linux Kernel Drivers for simple Baycom style
-amateur radio modems.
+amateur radio modems.
The following drivers are available:
+====================================
baycom_ser_fdx:
This driver supports the SER12 modems either full or half duplex.
- Its baud rate may be changed via the `baud' module parameter,
+ Its baud rate may be changed via the ``baud`` module parameter,
therefore it supports just about every bit bang modem on a
serial port. Its devices are called bcsf0 through bcsf3.
This is the recommended driver for SER12 type modems,
however if you have a broken UART clone that does not have working
- delta status bits, you may try baycom_ser_hdx.
+ delta status bits, you may try baycom_ser_hdx.
-baycom_ser_hdx:
+baycom_ser_hdx:
This is an alternative driver for SER12 type modems.
It only supports half duplex, and only 1200 baud. Its devices
are called bcsh0 through bcsh3. Use this driver only if baycom_ser_fdx
@@ -37,45 +42,48 @@ baycom_epp:
The following modems are supported:
-ser12: This is a very simple 1200 baud AFSK modem. The modem consists only
- of a modulator/demodulator chip, usually a TI TCM3105. The computer
- is responsible for regenerating the receiver bit clock, as well as
- for handling the HDLC protocol. The modem connects to a serial port,
- hence the name. Since the serial port is not used as an async serial
- port, the kernel driver for serial ports cannot be used, and this
- driver only supports standard serial hardware (8250, 16450, 16550)
-
-par96: This is a modem for 9600 baud FSK compatible to the G3RUH standard.
- The modem does all the filtering and regenerates the receiver clock.
- Data is transferred from and to the PC via a shift register.
- The shift register is filled with 16 bits and an interrupt is signalled.
- The PC then empties the shift register in a burst. This modem connects
- to the parallel port, hence the name. The modem leaves the
- implementation of the HDLC protocol and the scrambler polynomial to
- the PC.
-
-picpar: This is a redesign of the par96 modem by Henning Rech, DF9IC. The modem
- is protocol compatible to par96, but uses only three low power ICs
- and can therefore be fed from the parallel port and does not require
- an additional power supply. Furthermore, it incorporates a carrier
- detect circuitry.
-
-EPP: This is a high-speed modem adaptor that connects to an enhanced parallel port.
- Its target audience is users working over a high speed hub (76.8kbit/s).
-
-eppfpga: This is a redesign of the EPP adaptor.
-
-
+======= ========================================================================
+ser12 This is a very simple 1200 baud AFSK modem. The modem consists only
+ of a modulator/demodulator chip, usually a TI TCM3105. The computer
+ is responsible for regenerating the receiver bit clock, as well as
+ for handling the HDLC protocol. The modem connects to a serial port,
+ hence the name. Since the serial port is not used as an async serial
+ port, the kernel driver for serial ports cannot be used, and this
+ driver only supports standard serial hardware (8250, 16450, 16550)
+
+par96 This is a modem for 9600 baud FSK compatible to the G3RUH standard.
+ The modem does all the filtering and regenerates the receiver clock.
+ Data is transferred from and to the PC via a shift register.
+ The shift register is filled with 16 bits and an interrupt is signalled.
+ The PC then empties the shift register in a burst. This modem connects
+ to the parallel port, hence the name. The modem leaves the
+ implementation of the HDLC protocol and the scrambler polynomial to
+ the PC.
+
+picpar This is a redesign of the par96 modem by Henning Rech, DF9IC. The modem
+ is protocol compatible to par96, but uses only three low power ICs
+ and can therefore be fed from the parallel port and does not require
+ an additional power supply. Furthermore, it incorporates a carrier
+ detect circuitry.
+
+EPP This is a high-speed modem adaptor that connects to an enhanced parallel
+ port.
+
+ Its target audience is users working over a high speed hub (76.8kbit/s).
+
+eppfpga This is a redesign of the EPP adaptor.
+======= ========================================================================
All of the above modems only support half duplex communications. However,
the driver supports the KISS (see below) fullduplex command. It then simply
starts to send as soon as there's a packet to transmit and does not care
about DCD, i.e. it starts to send even if there's someone else on the channel.
-This command is required by some implementations of the DAMA channel
+This command is required by some implementations of the DAMA channel
access protocol.
The Interface of the drivers
+============================
Unlike previous drivers, these drivers are no longer character devices,
but they are now true kernel network interfaces. Installation is therefore
@@ -88,20 +96,22 @@ me for WAMPES which allows attaching a kernel network interface directly.
Configuring the driver
+======================
Every time a driver is inserted into the kernel, it has to know which
modems it should access at which ports. This can be done with the setbaycom
utility. If you are only using one modem, you can also configure the
driver from the insmod command line (or by means of an option line in
-/etc/modprobe.d/*.conf).
+``/etc/modprobe.d/*.conf``).
+
+Examples::
-Examples:
modprobe baycom_ser_fdx mode="ser12*" iobase=0x3f8 irq=4
sethdlc -i bcsf0 -p mode "ser12*" io 0x3f8 irq 4
Both lines configure the first port to drive a ser12 modem at the first
-serial port (COM1 under DOS). The * in the mode parameter instructs the driver to use
-the software DCD algorithm (see below).
+serial port (COM1 under DOS). The * in the mode parameter instructs the driver
+to use the software DCD algorithm (see below)::
insmod baycom_par mode="picpar" iobase=0x378
sethdlc -i bcp0 -p mode "picpar" io 0x378
@@ -115,29 +125,33 @@ Note that both utilities interpret the values slightly differently.
Hardware DCD versus Software DCD
+================================
To avoid collisions on the air, the driver must know when the channel is
busy. This is the task of the DCD circuitry/software. The driver may either
utilise a software DCD algorithm (options=1) or use a DCD signal from
the hardware (options=0).
-ser12: if software DCD is utilised, the radio's squelch should always be
- open. It is highly recommended to use the software DCD algorithm,
- as it is much faster than most hardware squelch circuitry. The
- disadvantage is a slightly higher load on the system.
+======= =================================================================
+ser12 if software DCD is utilised, the radio's squelch should always be
+ open. It is highly recommended to use the software DCD algorithm,
+ as it is much faster than most hardware squelch circuitry. The
+ disadvantage is a slightly higher load on the system.
-par96: the software DCD algorithm for this type of modem is rather poor.
- The modem simply does not provide enough information to implement
- a reasonable DCD algorithm in software. Therefore, if your radio
- feeds the DCD input of the PAR96 modem, the use of the hardware
- DCD circuitry is recommended.
+par96 the software DCD algorithm for this type of modem is rather poor.
+ The modem simply does not provide enough information to implement
+ a reasonable DCD algorithm in software. Therefore, if your radio
+ feeds the DCD input of the PAR96 modem, the use of the hardware
+ DCD circuitry is recommended.
-picpar: the picpar modem features a builtin DCD hardware, which is highly
- recommended.
+picpar the picpar modem features a builtin DCD hardware, which is highly
+ recommended.
+======= =================================================================
Compatibility with the rest of the Linux kernel
+===============================================
The serial driver and the baycom serial drivers compete
for the same hardware resources. Of course only one driver can access a given
@@ -154,5 +168,7 @@ The parallel port drivers (baycom_par, baycom_epp) now use the parport subsystem
to arbitrate the ports between different client drivers.
vy 73s de
+
Tom Sailer, sailer@ife.ee.ethz.ch
+
hb9jnx @ hb9w.ampr.org
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.rst
index e3abfbd32f71..24168b0d16bd 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.rst
@@ -1,10 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
- Linux Ethernet Bonding Driver HOWTO
+===================================
+Linux Ethernet Bonding Driver HOWTO
+===================================
- Latest update: 27 April 2011
+Latest update: 27 April 2011
+
+Initial release: Thomas Davis <tadavis at lbl.gov>
+
+Corrections, HA extensions: 2000/10/03-15:
-Initial release : Thomas Davis <tadavis at lbl.gov>
-Corrections, HA extensions : 2000/10/03-15 :
- Willy Tarreau <willy at meta-x.org>
- Constantine Gavrilov <const-g at xpert.com>
- Chad N. Tindel <ctindel at ieee dot org>
@@ -13,98 +18,98 @@ Corrections, HA extensions : 2000/10/03-15 :
Reorganized and updated Feb 2005 by Jay Vosburgh
Added Sysfs information: 2006/04/24
+
- Mitch Williams <mitch.a.williams at intel.com>
Introduction
============
- The Linux bonding driver provides a method for aggregating
+The Linux bonding driver provides a method for aggregating
multiple network interfaces into a single logical "bonded" interface.
The behavior of the bonded interfaces depends upon the mode; generally
speaking, modes provide either hot standby or load balancing services.
Additionally, link integrity monitoring may be performed.
-
- The bonding driver originally came from Donald Becker's
+
+The bonding driver originally came from Donald Becker's
beowulf patches for kernel 2.0. It has changed quite a bit since, and
the original tools from extreme-linux and beowulf sites will not work
with this version of the driver.
- For new versions of the driver, updated userspace tools, and
+For new versions of the driver, updated userspace tools, and
who to ask for help, please follow the links at the end of this file.
-Table of Contents
-=================
+.. Table of Contents
-1. Bonding Driver Installation
+ 1. Bonding Driver Installation
-2. Bonding Driver Options
+ 2. Bonding Driver Options
-3. Configuring Bonding Devices
-3.1 Configuration with Sysconfig Support
-3.1.1 Using DHCP with Sysconfig
-3.1.2 Configuring Multiple Bonds with Sysconfig
-3.2 Configuration with Initscripts Support
-3.2.1 Using DHCP with Initscripts
-3.2.2 Configuring Multiple Bonds with Initscripts
-3.3 Configuring Bonding Manually with Ifenslave
-3.3.1 Configuring Multiple Bonds Manually
-3.4 Configuring Bonding Manually via Sysfs
-3.5 Configuration with Interfaces Support
-3.6 Overriding Configuration for Special Cases
-3.7 Configuring LACP for 802.3ad mode in a more secure way
+ 3. Configuring Bonding Devices
+ 3.1 Configuration with Sysconfig Support
+ 3.1.1 Using DHCP with Sysconfig
+ 3.1.2 Configuring Multiple Bonds with Sysconfig
+ 3.2 Configuration with Initscripts Support
+ 3.2.1 Using DHCP with Initscripts
+ 3.2.2 Configuring Multiple Bonds with Initscripts
+ 3.3 Configuring Bonding Manually with Ifenslave
+ 3.3.1 Configuring Multiple Bonds Manually
+ 3.4 Configuring Bonding Manually via Sysfs
+ 3.5 Configuration with Interfaces Support
+ 3.6 Overriding Configuration for Special Cases
+ 3.7 Configuring LACP for 802.3ad mode in a more secure way
-4. Querying Bonding Configuration
-4.1 Bonding Configuration
-4.2 Network Configuration
+ 4. Querying Bonding Configuration
+ 4.1 Bonding Configuration
+ 4.2 Network Configuration
-5. Switch Configuration
+ 5. Switch Configuration
-6. 802.1q VLAN Support
+ 6. 802.1q VLAN Support
-7. Link Monitoring
-7.1 ARP Monitor Operation
-7.2 Configuring Multiple ARP Targets
-7.3 MII Monitor Operation
+ 7. Link Monitoring
+ 7.1 ARP Monitor Operation
+ 7.2 Configuring Multiple ARP Targets
+ 7.3 MII Monitor Operation
-8. Potential Trouble Sources
-8.1 Adventures in Routing
-8.2 Ethernet Device Renaming
-8.3 Painfully Slow Or No Failed Link Detection By Miimon
+ 8. Potential Trouble Sources
+ 8.1 Adventures in Routing
+ 8.2 Ethernet Device Renaming
+ 8.3 Painfully Slow Or No Failed Link Detection By Miimon
-9. SNMP agents
+ 9. SNMP agents
-10. Promiscuous mode
+ 10. Promiscuous mode
-11. Configuring Bonding for High Availability
-11.1 High Availability in a Single Switch Topology
-11.2 High Availability in a Multiple Switch Topology
-11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
-11.2.2 HA Link Monitoring for Multiple Switch Topology
+ 11. Configuring Bonding for High Availability
+ 11.1 High Availability in a Single Switch Topology
+ 11.2 High Availability in a Multiple Switch Topology
+ 11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
+ 11.2.2 HA Link Monitoring for Multiple Switch Topology
-12. Configuring Bonding for Maximum Throughput
-12.1 Maximum Throughput in a Single Switch Topology
-12.1.1 MT Bonding Mode Selection for Single Switch Topology
-12.1.2 MT Link Monitoring for Single Switch Topology
-12.2 Maximum Throughput in a Multiple Switch Topology
-12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
-12.2.2 MT Link Monitoring for Multiple Switch Topology
+ 12. Configuring Bonding for Maximum Throughput
+ 12.1 Maximum Throughput in a Single Switch Topology
+ 12.1.1 MT Bonding Mode Selection for Single Switch Topology
+ 12.1.2 MT Link Monitoring for Single Switch Topology
+ 12.2 Maximum Throughput in a Multiple Switch Topology
+ 12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
+ 12.2.2 MT Link Monitoring for Multiple Switch Topology
-13. Switch Behavior Issues
-13.1 Link Establishment and Failover Delays
-13.2 Duplicated Incoming Packets
+ 13. Switch Behavior Issues
+ 13.1 Link Establishment and Failover Delays
+ 13.2 Duplicated Incoming Packets
-14. Hardware Specific Considerations
-14.1 IBM BladeCenter
+ 14. Hardware Specific Considerations
+ 14.1 IBM BladeCenter
-15. Frequently Asked Questions
+ 15. Frequently Asked Questions
-16. Resources and Links
+ 16. Resources and Links
1. Bonding Driver Installation
==============================
- Most popular distro kernels ship with the bonding driver
+Most popular distro kernels ship with the bonding driver
already available as a module. If your distro does not, or you
have need to compile bonding from source (e.g., configuring and
installing a mainline kernel from kernel.org), you'll need to perform
@@ -113,54 +118,54 @@ the following steps:
1.1 Configure and build the kernel with bonding
-----------------------------------------------
- The current version of the bonding driver is available in the
+The current version of the bonding driver is available in the
drivers/net/bonding subdirectory of the most recent kernel source
(which is available on http://kernel.org). Most users "rolling their
own" will want to use the most recent kernel from kernel.org.
- Configure kernel with "make menuconfig" (or "make xconfig" or
+Configure kernel with "make menuconfig" (or "make xconfig" or
"make config"), then select "Bonding driver support" in the "Network
device support" section. It is recommended that you configure the
driver as module since it is currently the only way to pass parameters
to the driver or configure more than one bonding device.
- Build and install the new kernel and modules.
+Build and install the new kernel and modules.
1.2 Bonding Control Utility
--------------------------------------
+---------------------------
- It is recommended to configure bonding via iproute2 (netlink)
+It is recommended to configure bonding via iproute2 (netlink)
or sysfs, the old ifenslave control utility is obsolete.
2. Bonding Driver Options
=========================
- Options for the bonding driver are supplied as parameters to the
+Options for the bonding driver are supplied as parameters to the
bonding module at load time, or are specified via sysfs.
- Module options may be given as command line arguments to the
+Module options may be given as command line arguments to the
insmod or modprobe command, but are usually specified in either the
-/etc/modprobe.d/*.conf configuration files, or in a distro-specific
+``/etc/modprobe.d/*.conf`` configuration files, or in a distro-specific
configuration file (some of which are detailed in the next section).
- Details on bonding support for sysfs is provided in the
+Details on bonding support for sysfs is provided in the
"Configuring Bonding Manually via Sysfs" section, below.
- The available bonding driver parameters are listed below. If a
+The available bonding driver parameters are listed below. If a
parameter is not specified the default value is used. When initially
configuring a bond, it is recommended "tail -f /var/log/messages" be
run in a separate window to watch for bonding driver error messages.
- It is critical that either the miimon or arp_interval and
+It is critical that either the miimon or arp_interval and
arp_ip_target parameters be specified, otherwise serious network
degradation will occur during link failures. Very few devices do not
support at least miimon, so there is really no reason not to use it.
- Options with textual values will accept either the text name
+Options with textual values will accept either the text name
or, for backwards compatibility, the option value. E.g.,
"mode=802.3ad" and "mode=4" set the same mode.
- The parameters are as follows:
+The parameters are as follows:
active_slave
@@ -246,10 +251,13 @@ ad_user_port_key
In an AD system, the port-key has three parts as shown below -
+ ===== ============
Bits Use
+ ===== ============
00 Duplex
01-05 Speed
06-15 User-defined
+ ===== ============
This defines the upper 10 bits of the port key. The values can be
from 0 - 1023. If not given, the system defaults to 0.
@@ -699,7 +707,7 @@ mode
swapped with the new curr_active_slave that was
chosen.
-num_grat_arp
+num_grat_arp,
num_unsol_na
Specify the number of peer notifications (gratuitous ARPs and
@@ -729,13 +737,13 @@ packets_per_slave
peer_notif_delay
- Specify the delay, in milliseconds, between each peer
- notification (gratuitous ARP and unsolicited IPv6 Neighbor
- Advertisement) when they are issued after a failover event.
- This delay should be a multiple of the link monitor interval
- (arp_interval or miimon, whichever is active). The default
- value is 0 which means to match the value of the link monitor
- interval.
+ Specify the delay, in milliseconds, between each peer
+ notification (gratuitous ARP and unsolicited IPv6 Neighbor
+ Advertisement) when they are issued after a failover event.
+ This delay should be a multiple of the link monitor interval
+ (arp_interval or miimon, whichever is active). The default
+ value is 0 which means to match the value of the link monitor
+ interval.
primary
@@ -977,88 +985,88 @@ lp_interval
3. Configuring Bonding Devices
==============================
- You can configure bonding using either your distro's network
+You can configure bonding using either your distro's network
initialization scripts, or manually using either iproute2 or the
sysfs interface. Distros generally use one of three packages for the
network initialization scripts: initscripts, sysconfig or interfaces.
Recent versions of these packages have support for bonding, while older
versions do not.
- We will first describe the options for configuring bonding for
+We will first describe the options for configuring bonding for
distros using versions of initscripts, sysconfig and interfaces with full
or partial support for bonding, then provide information on enabling
bonding without support from the network initialization scripts (i.e.,
older versions of initscripts or sysconfig).
- If you're unsure whether your distro uses sysconfig,
+If you're unsure whether your distro uses sysconfig,
initscripts or interfaces, or don't know if it's new enough, have no fear.
Determining this is fairly straightforward.
- First, look for a file called interfaces in /etc/network directory.
+First, look for a file called interfaces in /etc/network directory.
If this file is present in your system, then your system use interfaces. See
Configuration with Interfaces Support.
- Else, issue the command:
+Else, issue the command::
-$ rpm -qf /sbin/ifup
+ $ rpm -qf /sbin/ifup
- It will respond with a line of text starting with either
+It will respond with a line of text starting with either
"initscripts" or "sysconfig," followed by some numbers. This is the
package that provides your network initialization scripts.
- Next, to determine if your installation supports bonding,
-issue the command:
+Next, to determine if your installation supports bonding,
+issue the command::
-$ grep ifenslave /sbin/ifup
+ $ grep ifenslave /sbin/ifup
- If this returns any matches, then your initscripts or
+If this returns any matches, then your initscripts or
sysconfig has support for bonding.
3.1 Configuration with Sysconfig Support
----------------------------------------
- This section applies to distros using a version of sysconfig
+This section applies to distros using a version of sysconfig
with bonding support, for example, SuSE Linux Enterprise Server 9.
- SuSE SLES 9's networking configuration system does support
+SuSE SLES 9's networking configuration system does support
bonding, however, at this writing, the YaST system configuration
front end does not provide any means to work with bonding devices.
Bonding devices can be managed by hand, however, as follows.
- First, if they have not already been configured, configure the
+First, if they have not already been configured, configure the
slave devices. On SLES 9, this is most easily done by running the
yast2 sysconfig configuration utility. The goal is for to create an
ifcfg-id file for each slave device. The simplest way to accomplish
this is to configure the devices for DHCP (this is only to get the
file ifcfg-id file created; see below for some issues with DHCP). The
-name of the configuration file for each device will be of the form:
+name of the configuration file for each device will be of the form::
-ifcfg-id-xx:xx:xx:xx:xx:xx
+ ifcfg-id-xx:xx:xx:xx:xx:xx
- Where the "xx" portion will be replaced with the digits from
+Where the "xx" portion will be replaced with the digits from
the device's permanent MAC address.
- Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been
+Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been
created, it is necessary to edit the configuration files for the slave
devices (the MAC addresses correspond to those of the slave devices).
Before editing, the file will contain multiple lines, and will look
-something like this:
+something like this::
-BOOTPROTO='dhcp'
-STARTMODE='on'
-USERCTL='no'
-UNIQUE='XNzu.WeZGOGF+4wE'
-_nm_name='bus-pci-0001:61:01.0'
+ BOOTPROTO='dhcp'
+ STARTMODE='on'
+ USERCTL='no'
+ UNIQUE='XNzu.WeZGOGF+4wE'
+ _nm_name='bus-pci-0001:61:01.0'
- Change the BOOTPROTO and STARTMODE lines to the following:
+Change the BOOTPROTO and STARTMODE lines to the following::
-BOOTPROTO='none'
-STARTMODE='off'
+ BOOTPROTO='none'
+ STARTMODE='off'
- Do not alter the UNIQUE or _nm_name lines. Remove any other
+Do not alter the UNIQUE or _nm_name lines. Remove any other
lines (USERCTL, etc).
- Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified,
+Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified,
it's time to create the configuration file for the bonding device
itself. This file is named ifcfg-bondX, where X is the number of the
bonding device to create, starting at 0. The first such file is
@@ -1066,49 +1074,52 @@ ifcfg-bond0, the second is ifcfg-bond1, and so on. The sysconfig
network configuration system will correctly start multiple instances
of bonding.
- The contents of the ifcfg-bondX file is as follows:
-
-BOOTPROTO="static"
-BROADCAST="10.0.2.255"
-IPADDR="10.0.2.10"
-NETMASK="255.255.0.0"
-NETWORK="10.0.2.0"
-REMOTE_IPADDR=""
-STARTMODE="onboot"
-BONDING_MASTER="yes"
-BONDING_MODULE_OPTS="mode=active-backup miimon=100"
-BONDING_SLAVE0="eth0"
-BONDING_SLAVE1="bus-pci-0000:06:08.1"
-
- Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK
+The contents of the ifcfg-bondX file is as follows::
+
+ BOOTPROTO="static"
+ BROADCAST="10.0.2.255"
+ IPADDR="10.0.2.10"
+ NETMASK="255.255.0.0"
+ NETWORK="10.0.2.0"
+ REMOTE_IPADDR=""
+ STARTMODE="onboot"
+ BONDING_MASTER="yes"
+ BONDING_MODULE_OPTS="mode=active-backup miimon=100"
+ BONDING_SLAVE0="eth0"
+ BONDING_SLAVE1="bus-pci-0000:06:08.1"
+
+Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK
values with the appropriate values for your network.
- The STARTMODE specifies when the device is brought online.
+The STARTMODE specifies when the device is brought online.
The possible values are:
- onboot: The device is started at boot time. If you're not
+ ======== ======================================================
+ onboot The device is started at boot time. If you're not
sure, this is probably what you want.
- manual: The device is started only when ifup is called
+ manual The device is started only when ifup is called
manually. Bonding devices may be configured this
way if you do not wish them to start automatically
at boot for some reason.
- hotplug: The device is started by a hotplug event. This is not
+ hotplug The device is started by a hotplug event. This is not
a valid choice for a bonding device.
- off or ignore: The device configuration is ignored.
+ off or The device configuration is ignored.
+ ignore
+ ======== ======================================================
- The line BONDING_MASTER='yes' indicates that the device is a
+The line BONDING_MASTER='yes' indicates that the device is a
bonding master device. The only useful value is "yes."
- The contents of BONDING_MODULE_OPTS are supplied to the
+The contents of BONDING_MODULE_OPTS are supplied to the
instance of the bonding module for this device. Specify the options
for the bonding mode, link monitoring, and so on here. Do not include
the max_bonds bonding parameter; this will confuse the configuration
system if you have multiple bonding devices.
- Finally, supply one BONDING_SLAVEn="slave device" for each
+Finally, supply one BONDING_SLAVEn="slave device" for each
slave. where "n" is an increasing value, one for each slave. The
"slave device" is either an interface name, e.g., "eth0", or a device
specifier for the network device. The interface name is easier to
@@ -1120,34 +1131,34 @@ changes (for example, it is moved from one PCI slot to another). The
example above uses one of each type for demonstration purposes; most
configurations will choose one or the other for all slave devices.
- When all configuration files have been modified or created,
+When all configuration files have been modified or created,
networking must be restarted for the configuration changes to take
-effect. This can be accomplished via the following:
+effect. This can be accomplished via the following::
-# /etc/init.d/network restart
+ # /etc/init.d/network restart
- Note that the network control script (/sbin/ifdown) will
+Note that the network control script (/sbin/ifdown) will
remove the bonding module as part of the network shutdown processing,
so it is not necessary to remove the module by hand if, e.g., the
module parameters have changed.
- Also, at this writing, YaST/YaST2 will not manage bonding
+Also, at this writing, YaST/YaST2 will not manage bonding
devices (they do not show bonding interfaces on its list of network
devices). It is necessary to edit the configuration file by hand to
change the bonding configuration.
- Additional general options and details of the ifcfg file
-format can be found in an example ifcfg template file:
+Additional general options and details of the ifcfg file
+format can be found in an example ifcfg template file::
-/etc/sysconfig/network/ifcfg.template
+ /etc/sysconfig/network/ifcfg.template
- Note that the template does not document the various BONDING_
+Note that the template does not document the various ``BONDING_*``
settings described above, but does describe many of the other options.
3.1.1 Using DHCP with Sysconfig
-------------------------------
- Under sysconfig, configuring a device with BOOTPROTO='dhcp'
+Under sysconfig, configuring a device with BOOTPROTO='dhcp'
will cause it to query DHCP for its IP address information. At this
writing, this does not function for bonding devices; the scripts
attempt to obtain the device address from DHCP prior to adding any of
@@ -1157,7 +1168,7 @@ sent to the network.
3.1.2 Configuring Multiple Bonds with Sysconfig
-----------------------------------------------
- The sysconfig network initialization system is capable of
+The sysconfig network initialization system is capable of
handling multiple bonding devices. All that is necessary is for each
bonding instance to have an appropriately configured ifcfg-bondX file
(as described above). Do not specify the "max_bonds" parameter to any
@@ -1165,14 +1176,14 @@ instance of bonding, as this will confuse sysconfig. If you require
multiple bonding devices with identical parameters, create multiple
ifcfg-bondX files.
- Because the sysconfig scripts supply the bonding module
+Because the sysconfig scripts supply the bonding module
options in the ifcfg-bondX file, it is not necessary to add them to
-the system /etc/modules.d/*.conf configuration files.
+the system ``/etc/modules.d/*.conf`` configuration files.
3.2 Configuration with Initscripts Support
------------------------------------------
- This section applies to distros using a recent version of
+This section applies to distros using a recent version of
initscripts with bonding support, for example, Red Hat Enterprise Linux
version 3 or later, Fedora, etc. On these systems, the network
initialization scripts have knowledge of bonding, and can be configured to
@@ -1180,7 +1191,7 @@ control bonding devices. Note that older versions of the initscripts
package have lower levels of support for bonding; this will be noted where
applicable.
- These distros will not automatically load the network adapter
+These distros will not automatically load the network adapter
driver unless the ethX device is configured with an IP address.
Because of this constraint, users must manually configure a
network-script file for all physical adapters that will be members of
@@ -1188,19 +1199,19 @@ a bondX link. Network script files are located in the directory:
/etc/sysconfig/network-scripts
- The file name must be prefixed with "ifcfg-eth" and suffixed
+The file name must be prefixed with "ifcfg-eth" and suffixed
with the adapter's physical adapter number. For example, the script
for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0.
-Place the following text in the file:
+Place the following text in the file::
-DEVICE=eth0
-USERCTL=no
-ONBOOT=yes
-MASTER=bond0
-SLAVE=yes
-BOOTPROTO=none
+ DEVICE=eth0
+ USERCTL=no
+ ONBOOT=yes
+ MASTER=bond0
+ SLAVE=yes
+ BOOTPROTO=none
- The DEVICE= line will be different for every ethX device and
+The DEVICE= line will be different for every ethX device and
must correspond with the name of the file, i.e., ifcfg-eth1 must have
a device line of DEVICE=eth1. The setting of the MASTER= line will
also depend on the final bonding interface name chosen for your bond.
@@ -1208,69 +1219,70 @@ As with other network devices, these typically start at 0, and go up
one for each device, i.e., the first bonding instance is bond0, the
second is bond1, and so on.
- Next, create a bond network script. The file name for this
+Next, create a bond network script. The file name for this
script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is
the number of the bond. For bond0 the file is named "ifcfg-bond0",
for bond1 it is named "ifcfg-bond1", and so on. Within that file,
-place the following text:
-
-DEVICE=bond0
-IPADDR=192.168.1.1
-NETMASK=255.255.255.0
-NETWORK=192.168.1.0
-BROADCAST=192.168.1.255
-ONBOOT=yes
-BOOTPROTO=none
-USERCTL=no
-
- Be sure to change the networking specific lines (IPADDR,
+place the following text::
+
+ DEVICE=bond0
+ IPADDR=192.168.1.1
+ NETMASK=255.255.255.0
+ NETWORK=192.168.1.0
+ BROADCAST=192.168.1.255
+ ONBOOT=yes
+ BOOTPROTO=none
+ USERCTL=no
+
+Be sure to change the networking specific lines (IPADDR,
NETMASK, NETWORK and BROADCAST) to match your network configuration.
- For later versions of initscripts, such as that found with Fedora
+For later versions of initscripts, such as that found with Fedora
7 (or later) and Red Hat Enterprise Linux version 5 (or later), it is possible,
and, indeed, preferable, to specify the bonding options in the ifcfg-bond0
-file, e.g. a line of the format:
+file, e.g. a line of the format::
-BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=192.168.1.254"
+ BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=192.168.1.254"
- will configure the bond with the specified options. The options
+will configure the bond with the specified options. The options
specified in BONDING_OPTS are identical to the bonding module parameters
except for the arp_ip_target field when using versions of initscripts older
than and 8.57 (Fedora 8) and 8.45.19 (Red Hat Enterprise Linux 5.2). When
using older versions each target should be included as a separate option and
should be preceded by a '+' to indicate it should be added to the list of
-queried targets, e.g.,
+queried targets, e.g.,::
- arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2
+ arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2
- is the proper syntax to specify multiple targets. When specifying
-options via BONDING_OPTS, it is not necessary to edit /etc/modprobe.d/*.conf.
+is the proper syntax to specify multiple targets. When specifying
+options via BONDING_OPTS, it is not necessary to edit
+``/etc/modprobe.d/*.conf``.
- For even older versions of initscripts that do not support
+For even older versions of initscripts that do not support
BONDING_OPTS, it is necessary to edit /etc/modprobe.d/*.conf, depending upon
your distro) to load the bonding module with your desired options when the
bond0 interface is brought up. The following lines in /etc/modprobe.d/*.conf
will load the bonding module, and select its options:
-alias bond0 bonding
-options bond0 mode=balance-alb miimon=100
+ alias bond0 bonding
+ options bond0 mode=balance-alb miimon=100
- Replace the sample parameters with the appropriate set of
+Replace the sample parameters with the appropriate set of
options for your configuration.
- Finally run "/etc/rc.d/init.d/network restart" as root. This
+Finally run "/etc/rc.d/init.d/network restart" as root. This
will restart the networking subsystem and your bond link should be now
up and running.
3.2.1 Using DHCP with Initscripts
---------------------------------
- Recent versions of initscripts (the versions supplied with Fedora
+Recent versions of initscripts (the versions supplied with Fedora
Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to
work) have support for assigning IP information to bonding devices via
DHCP.
- To configure bonding for DHCP, configure it as described
+To configure bonding for DHCP, configure it as described
above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp"
and add a line consisting of "TYPE=Bonding". Note that the TYPE value
is case sensitive.
@@ -1278,7 +1290,7 @@ is case sensitive.
3.2.2 Configuring Multiple Bonds with Initscripts
-------------------------------------------------
- Initscripts packages that are included with Fedora 7 and Red Hat
+Initscripts packages that are included with Fedora 7 and Red Hat
Enterprise Linux 5 support multiple bonding interfaces by simply
specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the
number of the bond. This support requires sysfs support in the kernel,
@@ -1290,77 +1302,77 @@ below.
3.3 Configuring Bonding Manually with iproute2
-----------------------------------------------
- This section applies to distros whose network initialization
+This section applies to distros whose network initialization
scripts (the sysconfig or initscripts package) do not have specific
knowledge of bonding. One such distro is SuSE Linux Enterprise Server
version 8.
- The general method for these systems is to place the bonding
+The general method for these systems is to place the bonding
module parameters into a config file in /etc/modprobe.d/ (as
appropriate for the installed distro), then add modprobe and/or
`ip link` commands to the system's global init script. The name of
the global init script differs; for sysconfig, it is
/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local.
- For example, if you wanted to make a simple bond of two e100
+For example, if you wanted to make a simple bond of two e100
devices (presumed to be eth0 and eth1), and have it persist across
reboots, edit the appropriate file (/etc/init.d/boot.local or
-/etc/rc.d/rc.local), and add the following:
+/etc/rc.d/rc.local), and add the following::
-modprobe bonding mode=balance-alb miimon=100
-modprobe e100
-ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
-ip link set eth0 master bond0
-ip link set eth1 master bond0
+ modprobe bonding mode=balance-alb miimon=100
+ modprobe e100
+ ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
+ ip link set eth0 master bond0
+ ip link set eth1 master bond0
- Replace the example bonding module parameters and bond0
+Replace the example bonding module parameters and bond0
network configuration (IP address, netmask, etc) with the appropriate
values for your configuration.
- Unfortunately, this method will not provide support for the
+Unfortunately, this method will not provide support for the
ifup and ifdown scripts on the bond devices. To reload the bonding
-configuration, it is necessary to run the initialization script, e.g.,
+configuration, it is necessary to run the initialization script, e.g.,::
-# /etc/init.d/boot.local
+ # /etc/init.d/boot.local
- or
+or::
-# /etc/rc.d/rc.local
+ # /etc/rc.d/rc.local
- It may be desirable in such a case to create a separate script
+It may be desirable in such a case to create a separate script
which only initializes the bonding configuration, then call that
separate script from within boot.local. This allows for bonding to be
enabled without re-running the entire global init script.
- To shut down the bonding devices, it is necessary to first
+To shut down the bonding devices, it is necessary to first
mark the bonding device itself as being down, then remove the
appropriate device driver modules. For our example above, you can do
-the following:
+the following::
-# ifconfig bond0 down
-# rmmod bonding
-# rmmod e100
+ # ifconfig bond0 down
+ # rmmod bonding
+ # rmmod e100
- Again, for convenience, it may be desirable to create a script
+Again, for convenience, it may be desirable to create a script
with these commands.
3.3.1 Configuring Multiple Bonds Manually
-----------------------------------------
- This section contains information on configuring multiple
+This section contains information on configuring multiple
bonding devices with differing options for those systems whose network
initialization scripts lack support for configuring multiple bonds.
- If you require multiple bonding devices, but all with the same
+If you require multiple bonding devices, but all with the same
options, you may wish to use the "max_bonds" module parameter,
documented above.
- To create multiple bonding devices with differing options, it is
+To create multiple bonding devices with differing options, it is
preferable to use bonding parameters exported by sysfs, documented in the
section below.
- For versions of bonding without sysfs support, the only means to
+For versions of bonding without sysfs support, the only means to
provide multiple instances of bonding with differing options is to load
the bonding driver multiple times. Note that current versions of the
sysconfig network initialization scripts handle this automatically; if
@@ -1368,35 +1380,35 @@ your distro uses these scripts, no special action is needed. See the
section Configuring Bonding Devices, above, if you're not sure about your
network initialization scripts.
- To load multiple instances of the module, it is necessary to
+To load multiple instances of the module, it is necessary to
specify a different name for each instance (the module loading system
requires that every loaded module, even multiple instances of the same
module, have a unique name). This is accomplished by supplying multiple
-sets of bonding options in /etc/modprobe.d/*.conf, for example:
+sets of bonding options in ``/etc/modprobe.d/*.conf``, for example::
-alias bond0 bonding
-options bond0 -o bond0 mode=balance-rr miimon=100
+ alias bond0 bonding
+ options bond0 -o bond0 mode=balance-rr miimon=100
-alias bond1 bonding
-options bond1 -o bond1 mode=balance-alb miimon=50
+ alias bond1 bonding
+ options bond1 -o bond1 mode=balance-alb miimon=50
- will load the bonding module two times. The first instance is
+will load the bonding module two times. The first instance is
named "bond0" and creates the bond0 device in balance-rr mode with an
miimon of 100. The second instance is named "bond1" and creates the
bond1 device in balance-alb mode with an miimon of 50.
- In some circumstances (typically with older distributions),
+In some circumstances (typically with older distributions),
the above does not work, and the second bonding instance never sees
its options. In that case, the second options line can be substituted
-as follows:
+as follows::
-install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
- mode=balance-alb miimon=50
+ install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
+ mode=balance-alb miimon=50
- This may be repeated any number of times, specifying a new and
+This may be repeated any number of times, specifying a new and
unique name in place of bond1 for each subsequent instance.
- It has been observed that some Red Hat supplied kernels are unable
+It has been observed that some Red Hat supplied kernels are unable
to rename modules at load time (the "-o bond1" part). Attempts to pass
that option to modprobe will produce an "Operation not permitted" error.
This has been reported on some Fedora Core kernels, and has been seen on
@@ -1407,18 +1419,18 @@ kernels, and also lack sysfs support).
3.4 Configuring Bonding Manually via Sysfs
------------------------------------------
- Starting with version 3.0.0, Channel Bonding may be configured
+Starting with version 3.0.0, Channel Bonding may be configured
via the sysfs interface. This interface allows dynamic configuration
of all bonds in the system without unloading the module. It also
allows for adding and removing bonds at runtime. Ifenslave is no
longer required, though it is still supported.
- Use of the sysfs interface allows you to use multiple bonds
+Use of the sysfs interface allows you to use multiple bonds
with different configurations without having to reload the module.
It also allows you to use multiple, differently configured bonds when
bonding is compiled into the kernel.
- You must have the sysfs filesystem mounted to configure
+You must have the sysfs filesystem mounted to configure
bonding this way. The examples in this document assume that you
are using the standard mount point for sysfs, e.g. /sys. If your
sysfs filesystem is mounted elsewhere, you will need to adjust the
@@ -1426,38 +1438,45 @@ example paths accordingly.
Creating and Destroying Bonds
-----------------------------
-To add a new bond foo:
-# echo +foo > /sys/class/net/bonding_masters
+To add a new bond foo::
+
+ # echo +foo > /sys/class/net/bonding_masters
+
+To remove an existing bond bar::
-To remove an existing bond bar:
-# echo -bar > /sys/class/net/bonding_masters
+ # echo -bar > /sys/class/net/bonding_masters
-To show all existing bonds:
-# cat /sys/class/net/bonding_masters
+To show all existing bonds::
-NOTE: due to 4K size limitation of sysfs files, this list may be
-truncated if you have more than a few hundred bonds. This is unlikely
-to occur under normal operating conditions.
+ # cat /sys/class/net/bonding_masters
+
+.. note::
+
+ due to 4K size limitation of sysfs files, this list may be
+ truncated if you have more than a few hundred bonds. This is unlikely
+ to occur under normal operating conditions.
Adding and Removing Slaves
--------------------------
- Interfaces may be enslaved to a bond using the file
+Interfaces may be enslaved to a bond using the file
/sys/class/net/<bond>/bonding/slaves. The semantics for this file
are the same as for the bonding_masters file.
-To enslave interface eth0 to bond bond0:
-# ifconfig bond0 up
-# echo +eth0 > /sys/class/net/bond0/bonding/slaves
+To enslave interface eth0 to bond bond0::
+
+ # ifconfig bond0 up
+ # echo +eth0 > /sys/class/net/bond0/bonding/slaves
-To free slave eth0 from bond bond0:
-# echo -eth0 > /sys/class/net/bond0/bonding/slaves
+To free slave eth0 from bond bond0::
- When an interface is enslaved to a bond, symlinks between the
+ # echo -eth0 > /sys/class/net/bond0/bonding/slaves
+
+When an interface is enslaved to a bond, symlinks between the
two are created in the sysfs filesystem. In this case, you would get
/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and
/sys/class/net/eth0/master pointing to /sys/class/net/bond0.
- This means that you can tell quickly whether or not an
+This means that you can tell quickly whether or not an
interface is enslaved by looking for the master symlink. Thus:
# echo -eth0 > /sys/class/net/eth0/master/bonding/slaves
will free eth0 from whatever bond it is enslaved to, regardless of
@@ -1465,127 +1484,143 @@ the name of the bond interface.
Changing a Bond's Configuration
-------------------------------
- Each bond may be configured individually by manipulating the
+Each bond may be configured individually by manipulating the
files located in /sys/class/net/<bond name>/bonding
- The names of these files correspond directly with the command-
+The names of these files correspond directly with the command-
line parameters described elsewhere in this file, and, with the
exception of arp_ip_target, they accept the same values. To see the
current setting, simply cat the appropriate file.
- A few examples will be given here; for specific usage
+A few examples will be given here; for specific usage
guidelines for each parameter, see the appropriate section in this
document.
-To configure bond0 for balance-alb mode:
-# ifconfig bond0 down
-# echo 6 > /sys/class/net/bond0/bonding/mode
- - or -
-# echo balance-alb > /sys/class/net/bond0/bonding/mode
- NOTE: The bond interface must be down before the mode can be
-changed.
-
-To enable MII monitoring on bond0 with a 1 second interval:
-# echo 1000 > /sys/class/net/bond0/bonding/miimon
- NOTE: If ARP monitoring is enabled, it will disabled when MII
-monitoring is enabled, and vice-versa.
-
-To add ARP targets:
-# echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
-# echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target
- NOTE: up to 16 target addresses may be specified.
-
-To remove an ARP target:
-# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
-
-To configure the interval between learning packet transmits:
-# echo 12 > /sys/class/net/bond0/bonding/lp_interval
- NOTE: the lp_interval is the number of seconds between instances where
-the bonding driver sends learning packets to each slaves peer switch. The
-default interval is 1 second.
+To configure bond0 for balance-alb mode::
+
+ # ifconfig bond0 down
+ # echo 6 > /sys/class/net/bond0/bonding/mode
+ - or -
+ # echo balance-alb > /sys/class/net/bond0/bonding/mode
+
+.. note::
+
+ The bond interface must be down before the mode can be changed.
+
+To enable MII monitoring on bond0 with a 1 second interval::
+
+ # echo 1000 > /sys/class/net/bond0/bonding/miimon
+
+.. note::
+
+ If ARP monitoring is enabled, it will disabled when MII
+ monitoring is enabled, and vice-versa.
+
+To add ARP targets::
+
+ # echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
+ # echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target
+
+.. note::
+
+ up to 16 target addresses may be specified.
+
+To remove an ARP target::
+
+ # echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
+
+To configure the interval between learning packet transmits::
+
+ # echo 12 > /sys/class/net/bond0/bonding/lp_interval
+
+.. note::
+
+ the lp_interval is the number of seconds between instances where
+ the bonding driver sends learning packets to each slaves peer switch. The
+ default interval is 1 second.
Example Configuration
---------------------
- We begin with the same example that is shown in section 3.3,
+We begin with the same example that is shown in section 3.3,
executed with sysfs, and without using ifenslave.
- To make a simple bond of two e100 devices (presumed to be eth0
+To make a simple bond of two e100 devices (presumed to be eth0
and eth1), and have it persist across reboots, edit the appropriate
file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the
-following:
+following::
-modprobe bonding
-modprobe e100
-echo balance-alb > /sys/class/net/bond0/bonding/mode
-ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
-echo 100 > /sys/class/net/bond0/bonding/miimon
-echo +eth0 > /sys/class/net/bond0/bonding/slaves
-echo +eth1 > /sys/class/net/bond0/bonding/slaves
+ modprobe bonding
+ modprobe e100
+ echo balance-alb > /sys/class/net/bond0/bonding/mode
+ ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
+ echo 100 > /sys/class/net/bond0/bonding/miimon
+ echo +eth0 > /sys/class/net/bond0/bonding/slaves
+ echo +eth1 > /sys/class/net/bond0/bonding/slaves
- To add a second bond, with two e1000 interfaces in
+To add a second bond, with two e1000 interfaces in
active-backup mode, using ARP monitoring, add the following lines to
-your init script:
+your init script::
-modprobe e1000
-echo +bond1 > /sys/class/net/bonding_masters
-echo active-backup > /sys/class/net/bond1/bonding/mode
-ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up
-echo +192.168.2.100 /sys/class/net/bond1/bonding/arp_ip_target
-echo 2000 > /sys/class/net/bond1/bonding/arp_interval
-echo +eth2 > /sys/class/net/bond1/bonding/slaves
-echo +eth3 > /sys/class/net/bond1/bonding/slaves
+ modprobe e1000
+ echo +bond1 > /sys/class/net/bonding_masters
+ echo active-backup > /sys/class/net/bond1/bonding/mode
+ ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up
+ echo +192.168.2.100 /sys/class/net/bond1/bonding/arp_ip_target
+ echo 2000 > /sys/class/net/bond1/bonding/arp_interval
+ echo +eth2 > /sys/class/net/bond1/bonding/slaves
+ echo +eth3 > /sys/class/net/bond1/bonding/slaves
3.5 Configuration with Interfaces Support
-----------------------------------------
- This section applies to distros which use /etc/network/interfaces file
+This section applies to distros which use /etc/network/interfaces file
to describe network interface configuration, most notably Debian and it's
derivatives.
- The ifup and ifdown commands on Debian don't support bonding out of
+The ifup and ifdown commands on Debian don't support bonding out of
the box. The ifenslave-2.6 package should be installed to provide bonding
-support. Once installed, this package will provide bond-* options to be used
-into /etc/network/interfaces.
+support. Once installed, this package will provide ``bond-*`` options
+to be used into /etc/network/interfaces.
- Note that ifenslave-2.6 package will load the bonding module and use
+Note that ifenslave-2.6 package will load the bonding module and use
the ifenslave command when appropriate.
Example Configurations
----------------------
In /etc/network/interfaces, the following stanza will configure bond0, in
-active-backup mode, with eth0 and eth1 as slaves.
+active-backup mode, with eth0 and eth1 as slaves::
-auto bond0
-iface bond0 inet dhcp
- bond-slaves eth0 eth1
- bond-mode active-backup
- bond-miimon 100
- bond-primary eth0 eth1
+ auto bond0
+ iface bond0 inet dhcp
+ bond-slaves eth0 eth1
+ bond-mode active-backup
+ bond-miimon 100
+ bond-primary eth0 eth1
If the above configuration doesn't work, you might have a system using
upstart for system startup. This is most notably true for recent
Ubuntu versions. The following stanza in /etc/network/interfaces will
-produce the same result on those systems.
-
-auto bond0
-iface bond0 inet dhcp
- bond-slaves none
- bond-mode active-backup
- bond-miimon 100
-
-auto eth0
-iface eth0 inet manual
- bond-master bond0
- bond-primary eth0 eth1
-
-auto eth1
-iface eth1 inet manual
- bond-master bond0
- bond-primary eth0 eth1
-
-For a full list of bond-* supported options in /etc/network/interfaces and some
-more advanced examples tailored to you particular distros, see the files in
+produce the same result on those systems::
+
+ auto bond0
+ iface bond0 inet dhcp
+ bond-slaves none
+ bond-mode active-backup
+ bond-miimon 100
+
+ auto eth0
+ iface eth0 inet manual
+ bond-master bond0
+ bond-primary eth0 eth1
+
+ auto eth1
+ iface eth1 inet manual
+ bond-master bond0
+ bond-primary eth0 eth1
+
+For a full list of ``bond-*`` supported options in /etc/network/interfaces and
+some more advanced examples tailored to you particular distros, see the files in
/usr/share/doc/ifenslave-2.6.
3.6 Overriding Configuration for Special Cases
@@ -1604,37 +1639,37 @@ can safely be sent over either interface. Such configurations may be achieved
using the traffic control utilities inherent in linux.
By default the bonding driver is multiqueue aware and 16 queues are created
-when the driver initializes (see Documentation/networking/multiqueue.txt
+when the driver initializes (see Documentation/networking/multiqueue.rst
for details). If more or less queues are desired the module parameter
tx_queues can be used to change this value. There is no sysfs parameter
available as the allocation is done at module init time.
The output of the file /proc/net/bonding/bondX has changed so the output Queue
-ID is now printed for each slave:
+ID is now printed for each slave::
-Bonding Mode: fault-tolerance (active-backup)
-Primary Slave: None
-Currently Active Slave: eth0
-MII Status: up
-MII Polling Interval (ms): 0
-Up Delay (ms): 0
-Down Delay (ms): 0
+ Bonding Mode: fault-tolerance (active-backup)
+ Primary Slave: None
+ Currently Active Slave: eth0
+ MII Status: up
+ MII Polling Interval (ms): 0
+ Up Delay (ms): 0
+ Down Delay (ms): 0
-Slave Interface: eth0
-MII Status: up
-Link Failure Count: 0
-Permanent HW addr: 00:1a:a0:12:8f:cb
-Slave queue ID: 0
+ Slave Interface: eth0
+ MII Status: up
+ Link Failure Count: 0
+ Permanent HW addr: 00:1a:a0:12:8f:cb
+ Slave queue ID: 0
-Slave Interface: eth1
-MII Status: up
-Link Failure Count: 0
-Permanent HW addr: 00:1a:a0:12:8f:cc
-Slave queue ID: 2
+ Slave Interface: eth1
+ MII Status: up
+ Link Failure Count: 0
+ Permanent HW addr: 00:1a:a0:12:8f:cc
+ Slave queue ID: 2
-The queue_id for a slave can be set using the command:
+The queue_id for a slave can be set using the command::
-# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
+ # echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
Any interface that needs a queue_id set should set it with multiple calls
like the one above until proper priorities are set for all interfaces. On
@@ -1645,12 +1680,12 @@ These queue id's can be used in conjunction with the tc utility to configure
a multiqueue qdisc and filters to bias certain traffic to transmit on certain
slave devices. For instance, say we wanted, in the above configuration to
force all traffic bound to 192.168.1.100 to use eth1 in the bond as its output
-device. The following commands would accomplish this:
+device. The following commands would accomplish this::
-# tc qdisc add dev bond0 handle 1 root multiq
+ # tc qdisc add dev bond0 handle 1 root multiq
-# tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip dst \
- 192.168.1.100 action skbedit queue_mapping 2
+ # tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip \
+ dst 192.168.1.100 action skbedit queue_mapping 2
These commands tell the kernel to attach a multiqueue queue discipline to the
bond0 interface and filter traffic enqueued to it, such that packets with a dst
@@ -1663,7 +1698,7 @@ that normal output policy selection should take place. One benefit to simply
leaving the qid for a slave to 0 is the multiqueue awareness in the bonding
driver that is now present. This awareness allows tc filters to be placed on
slave devices as well as bond devices and the bonding driver will simply act as
-a pass-through for selecting output queues on the slave device rather than
+a pass-through for selecting output queues on the slave device rather than
output port selection.
This feature first appeared in bonding driver version 3.7.0 and support for
@@ -1689,31 +1724,31 @@ few bonding parameters:
(a) ad_actor_system : You can set a random mac-address that can be used for
these LACPDU exchanges. The value can not be either NULL or Multicast.
Also it's preferable to set the local-admin bit. Following shell code
- generates a random mac-address as described above.
+ generates a random mac-address as described above::
- # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \
- $(( (RANDOM & 0xFE) | 0x02 )) \
- $(( RANDOM & 0xFF )) \
- $(( RANDOM & 0xFF )) \
- $(( RANDOM & 0xFF )) \
- $(( RANDOM & 0xFF )) \
- $(( RANDOM & 0xFF )))
- # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system
+ # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \
+ $(( (RANDOM & 0xFE) | 0x02 )) \
+ $(( RANDOM & 0xFF )) \
+ $(( RANDOM & 0xFF )) \
+ $(( RANDOM & 0xFF )) \
+ $(( RANDOM & 0xFF )) \
+ $(( RANDOM & 0xFF )))
+ # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system
(b) ad_actor_sys_prio : Randomize the system priority. The default value
is 65535, but system can take the value from 1 - 65535. Following shell
- code generates random priority and sets it.
+ code generates random priority and sets it::
- # sys_prio=$(( 1 + RANDOM + RANDOM ))
- # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio
+ # sys_prio=$(( 1 + RANDOM + RANDOM ))
+ # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio
(c) ad_user_port_key : Use the user portion of the port-key. The default
keeps this empty. These are the upper 10 bits of the port-key and value
ranges from 0 - 1023. Following shell code generates these 10 bits and
- sets it.
+ sets it::
- # usr_port_key=$(( RANDOM & 0x3FF ))
- # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key
+ # usr_port_key=$(( RANDOM & 0x3FF ))
+ # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key
4 Querying Bonding Configuration
@@ -1722,81 +1757,81 @@ few bonding parameters:
4.1 Bonding Configuration
-------------------------
- Each bonding device has a read-only file residing in the
+Each bonding device has a read-only file residing in the
/proc/net/bonding directory. The file contents include information
about the bonding configuration, options and state of each slave.
- For example, the contents of /proc/net/bonding/bond0 after the
+For example, the contents of /proc/net/bonding/bond0 after the
driver is loaded with parameters of mode=0 and miimon=1000 is
-generally as follows:
+generally as follows::
Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004)
- Bonding Mode: load balancing (round-robin)
- Currently Active Slave: eth0
- MII Status: up
- MII Polling Interval (ms): 1000
- Up Delay (ms): 0
- Down Delay (ms): 0
-
- Slave Interface: eth1
- MII Status: up
- Link Failure Count: 1
-
- Slave Interface: eth0
- MII Status: up
- Link Failure Count: 1
-
- The precise format and contents will change depending upon the
+ Bonding Mode: load balancing (round-robin)
+ Currently Active Slave: eth0
+ MII Status: up
+ MII Polling Interval (ms): 1000
+ Up Delay (ms): 0
+ Down Delay (ms): 0
+
+ Slave Interface: eth1
+ MII Status: up
+ Link Failure Count: 1
+
+ Slave Interface: eth0
+ MII Status: up
+ Link Failure Count: 1
+
+The precise format and contents will change depending upon the
bonding configuration, state, and version of the bonding driver.
4.2 Network configuration
-------------------------
- The network configuration can be inspected using the ifconfig
+The network configuration can be inspected using the ifconfig
command. Bonding devices will have the MASTER flag set; Bonding slave
devices will have the SLAVE flag set. The ifconfig output does not
contain information on which slaves are associated with which masters.
- In the example below, the bond0 interface is the master
+In the example below, the bond0 interface is the master
(MASTER) while eth0 and eth1 are slaves (SLAVE). Notice all slaves of
bond0 have the same MAC address (HWaddr) as bond0 for all modes except
-TLB and ALB that require a unique MAC address for each slave.
-
-# /sbin/ifconfig
-bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
- inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0
- UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
- RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
- TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
- collisions:0 txqueuelen:0
-
-eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
- UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
- RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
- TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
- collisions:0 txqueuelen:100
- Interrupt:10 Base address:0x1080
-
-eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
- UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
- RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
- TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:100
- Interrupt:9 Base address:0x1400
+TLB and ALB that require a unique MAC address for each slave::
+
+ # /sbin/ifconfig
+ bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
+ inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0
+ UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
+ RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
+ TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
+ collisions:0 txqueuelen:0
+
+ eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
+ UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
+ RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
+ TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
+ collisions:0 txqueuelen:100
+ Interrupt:10 Base address:0x1080
+
+ eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
+ UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
+ RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
+ TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
+ collisions:0 txqueuelen:100
+ Interrupt:9 Base address:0x1400
5. Switch Configuration
=======================
- For this section, "switch" refers to whatever system the
+For this section, "switch" refers to whatever system the
bonded devices are directly connected to (i.e., where the other end of
the cable plugs into). This may be an actual dedicated switch device,
or it may be another regular system (e.g., another computer running
Linux),
- The active-backup, balance-tlb and balance-alb modes do not
+The active-backup, balance-tlb and balance-alb modes do not
require any specific configuration of the switch.
- The 802.3ad mode requires that the switch have the appropriate
+The 802.3ad mode requires that the switch have the appropriate
ports configured as an 802.3ad aggregation. The precise method used
to configure this varies from switch to switch, but, for example, a
Cisco 3550 series switch requires that the appropriate ports first be
@@ -1804,7 +1839,7 @@ grouped together in a single etherchannel instance, then that
etherchannel is set to mode "lacp" to enable 802.3ad (instead of
standard EtherChannel).
- The balance-rr, balance-xor and broadcast modes generally
+The balance-rr, balance-xor and broadcast modes generally
require that the switch have the appropriate ports grouped together.
The nomenclature for such a group differs between switches, it may be
called an "etherchannel" (as in the Cisco example, above), a "trunk
@@ -1820,7 +1855,7 @@ with another EtherChannel group.
6. 802.1q VLAN Support
======================
- It is possible to configure VLAN devices over a bond interface
+It is possible to configure VLAN devices over a bond interface
using the 8021q driver. However, only packets coming from the 8021q
driver and passing through bonding will be tagged by default. Self
generated packets, for example, bonding's learning packets or ARP
@@ -1829,7 +1864,7 @@ tagged internally by bonding itself. As a result, bonding must
"learn" the VLAN IDs configured above it, and use those IDs to tag
self generated packets.
- For reasons of simplicity, and to support the use of adapters
+For reasons of simplicity, and to support the use of adapters
that can do VLAN hardware acceleration offloading, the bonding
interface declares itself as fully hardware offloading capable, it gets
the add_vid/kill_vid notifications to gather the necessary
@@ -1839,7 +1874,7 @@ should go through an adapter that is not offloading capable are
"un-accelerated" by the bonding driver so the VLAN tag sits in the
regular location.
- VLAN interfaces *must* be added on top of a bonding interface
+VLAN interfaces *must* be added on top of a bonding interface
only after enslaving at least one slave. The bonding interface has a
hardware address of 00:00:00:00:00:00 until the first slave is added.
If the VLAN interface is created prior to the first enslavement, it
@@ -1847,23 +1882,23 @@ would pick up the all-zeroes hardware address. Once the first slave
is attached to the bond, the bond device itself will pick up the
slave's hardware address, which is then available for the VLAN device.
- Also, be aware that a similar problem can occur if all slaves
+Also, be aware that a similar problem can occur if all slaves
are released from a bond that still has one or more VLAN interfaces on
top of it. When a new slave is added, the bonding interface will
obtain its hardware address from the first slave, which might not
match the hardware address of the VLAN interfaces (which was
ultimately copied from an earlier slave).
- There are two methods to insure that the VLAN device operates
+There are two methods to insure that the VLAN device operates
with the correct hardware address if all slaves are removed from a
bond interface:
- 1. Remove all VLAN interfaces then recreate them
+1. Remove all VLAN interfaces then recreate them
- 2. Set the bonding interface's hardware address so that it
+2. Set the bonding interface's hardware address so that it
matches the hardware address of the VLAN interfaces.
- Note that changing a VLAN interface's HW address would set the
+Note that changing a VLAN interface's HW address would set the
underlying device -- i.e. the bonding interface -- to promiscuous
mode, which might not be what you want.
@@ -1871,24 +1906,24 @@ mode, which might not be what you want.
7. Link Monitoring
==================
- The bonding driver at present supports two schemes for
+The bonding driver at present supports two schemes for
monitoring a slave device's link state: the ARP monitor and the MII
monitor.
- At the present time, due to implementation restrictions in the
+At the present time, due to implementation restrictions in the
bonding driver itself, it is not possible to enable both ARP and MII
monitoring simultaneously.
7.1 ARP Monitor Operation
-------------------------
- The ARP monitor operates as its name suggests: it sends ARP
+The ARP monitor operates as its name suggests: it sends ARP
queries to one or more designated peer systems on the network, and
uses the response as an indication that the link is operating. This
gives some assurance that traffic is actually flowing to and from one
or more peers on the local network.
- The ARP monitor relies on the device driver itself to verify
+The ARP monitor relies on the device driver itself to verify
that traffic is flowing. In particular, the driver must keep up to
date the last receive time, dev->last_rx. Drivers that use NETIF_F_LLTX
flag must also update netdev_queue->trans_start. If they do not, then the
@@ -1900,36 +1935,36 @@ your device driver is not updating last_rx and trans_start.
7.2 Configuring Multiple ARP Targets
------------------------------------
- While ARP monitoring can be done with just one target, it can
+While ARP monitoring can be done with just one target, it can
be useful in a High Availability setup to have several targets to
monitor. In the case of just one target, the target itself may go
down or have a problem making it unresponsive to ARP requests. Having
an additional target (or several) increases the reliability of the ARP
monitoring.
- Multiple ARP targets must be separated by commas as follows:
+Multiple ARP targets must be separated by commas as follows::
-# example options for ARP monitoring with three targets
-alias bond0 bonding
-options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9
+ # example options for ARP monitoring with three targets
+ alias bond0 bonding
+ options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9
- For just a single target the options would resemble:
+For just a single target the options would resemble::
-# example options for ARP monitoring with one target
-alias bond0 bonding
-options bond0 arp_interval=60 arp_ip_target=192.168.0.100
+ # example options for ARP monitoring with one target
+ alias bond0 bonding
+ options bond0 arp_interval=60 arp_ip_target=192.168.0.100
7.3 MII Monitor Operation
-------------------------
- The MII monitor monitors only the carrier state of the local
+The MII monitor monitors only the carrier state of the local
network interface. It accomplishes this in one of three ways: by
depending upon the device driver to maintain its carrier state, by
querying the device's MII registers, or by making an ethtool query to
the device.
- If the use_carrier module parameter is 1 (the default value),
+If the use_carrier module parameter is 1 (the default value),
then the MII monitor will rely on the driver for carrier state
information (via the netif_carrier subsystem). As explained in the
use_carrier parameter information, above, if the MII monitor fails to
@@ -1937,7 +1972,7 @@ detect carrier loss on the device (e.g., when the cable is physically
disconnected), it may be that the driver does not support
netif_carrier.
- If use_carrier is 0, then the MII monitor will first query the
+If use_carrier is 0, then the MII monitor will first query the
device's (via ioctl) MII registers and check the link state. If that
request fails (not just that it returns carrier down), then the MII
monitor will make an ethtool ETHOOL_GLINK request to attempt to obtain
@@ -1952,25 +1987,25 @@ up.
8.1 Adventures in Routing
-------------------------
- When bonding is configured, it is important that the slave
+When bonding is configured, it is important that the slave
devices not have routes that supersede routes of the master (or,
generally, not have routes at all). For example, suppose the bonding
device bond0 has two slaves, eth0 and eth1, and the routing table is
-as follows:
+as follows::
-Kernel IP routing table
-Destination Gateway Genmask Flags MSS Window irtt Iface
-10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0
-10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth1
-10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 bond0
-127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo
+ Kernel IP routing table
+ Destination Gateway Genmask Flags MSS Window irtt Iface
+ 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0
+ 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth1
+ 10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 bond0
+ 127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo
- This routing configuration will likely still update the
+This routing configuration will likely still update the
receive/transmit times in the driver (needed by the ARP monitor), but
may bypass the bonding driver (because outgoing traffic to, in this
case, another host on network 10 would use eth0 or eth1 before bond0).
- The ARP monitor (and ARP itself) may become confused by this
+The ARP monitor (and ARP itself) may become confused by this
configuration, because ARP requests (generated by the ARP monitor)
will be sent on one interface (bond0), but the corresponding reply
will arrive on a different interface (eth0). This reply looks to ARP
@@ -1978,7 +2013,7 @@ as an unsolicited ARP reply (because ARP matches replies on an
interface basis), and is discarded. The MII monitor is not affected
by the state of the routing table.
- The solution here is simply to insure that slaves do not have
+The solution here is simply to insure that slaves do not have
routes of their own, and if for some reason they must, those routes do
not supersede routes of their master. This should generally be the
case, but unusual configurations or errant manual or automatic static
@@ -1987,22 +2022,22 @@ route additions may cause trouble.
8.2 Ethernet Device Renaming
----------------------------
- On systems with network configuration scripts that do not
+On systems with network configuration scripts that do not
associate physical devices directly with network interface names (so
that the same physical device always has the same "ethX" name), it may
be necessary to add some special logic to config files in
/etc/modprobe.d/.
- For example, given a modules.conf containing the following:
+For example, given a modules.conf containing the following::
-alias bond0 bonding
-options bond0 mode=some-mode miimon=50
-alias eth0 tg3
-alias eth1 tg3
-alias eth2 e1000
-alias eth3 e1000
+ alias bond0 bonding
+ options bond0 mode=some-mode miimon=50
+ alias eth0 tg3
+ alias eth1 tg3
+ alias eth2 e1000
+ alias eth3 e1000
- If neither eth0 and eth1 are slaves to bond0, then when the
+If neither eth0 and eth1 are slaves to bond0, then when the
bond0 interface comes up, the devices may end up reordered. This
happens because bonding is loaded first, then its slave device's
drivers are loaded next. Since no other drivers have been loaded,
@@ -2010,36 +2045,36 @@ when the e1000 driver loads, it will receive eth0 and eth1 for its
devices, but the bonding configuration tries to enslave eth2 and eth3
(which may later be assigned to the tg3 devices).
- Adding the following:
+Adding the following::
-add above bonding e1000 tg3
+ add above bonding e1000 tg3
- causes modprobe to load e1000 then tg3, in that order, when
+causes modprobe to load e1000 then tg3, in that order, when
bonding is loaded. This command is fully documented in the
modules.conf manual page.
- On systems utilizing modprobe an equivalent problem can occur.
+On systems utilizing modprobe an equivalent problem can occur.
In this case, the following can be added to config files in
-/etc/modprobe.d/ as:
+/etc/modprobe.d/ as::
-softdep bonding pre: tg3 e1000
+ softdep bonding pre: tg3 e1000
- This will load tg3 and e1000 modules before loading the bonding one.
+This will load tg3 and e1000 modules before loading the bonding one.
Full documentation on this can be found in the modprobe.d and modprobe
manual pages.
8.3. Painfully Slow Or No Failed Link Detection By Miimon
---------------------------------------------------------
- By default, bonding enables the use_carrier option, which
+By default, bonding enables the use_carrier option, which
instructs bonding to trust the driver to maintain carrier state.
- As discussed in the options section, above, some drivers do
+As discussed in the options section, above, some drivers do
not support the netif_carrier_on/_off link state tracking system.
With use_carrier enabled, bonding will always see these links as up,
regardless of their actual state.
- Additionally, other drivers do support netif_carrier, but do
+Additionally, other drivers do support netif_carrier, but do
not maintain it in real time, e.g., only polling the link state at
some fixed interval. In this case, miimon will detect failures, but
only after some long period of time has expired. If it appears that
@@ -2051,7 +2086,7 @@ use_carrier=0 method of querying the registers directly works). If
use_carrier=0 does not improve the failover, then the driver may cache
the registers, or the problem may be elsewhere.
- Also, remember that miimon only checks for the device's
+Also, remember that miimon only checks for the device's
carrier state. It has no way to determine the state of devices on or
beyond other ports of a switch, or if a switch is refusing to pass
traffic while still maintaining carrier on.
@@ -2059,7 +2094,7 @@ traffic while still maintaining carrier on.
9. SNMP agents
===============
- If running SNMP agents, the bonding driver should be loaded
+If running SNMP agents, the bonding driver should be loaded
before any network drivers participating in a bond. This requirement
is due to the interface index (ipAdEntIfIndex) being associated to
the first interface found with a given IP address. That is, there is
@@ -2070,6 +2105,8 @@ with the eth0 interface. This configuration is shown below, the IP
address 192.168.1.1 has an interface index of 2 which indexes to eth0
in the ifDescr table (ifDescr.2).
+::
+
interfaces.ifTable.ifEntry.ifDescr.1 = lo
interfaces.ifTable.ifEntry.ifDescr.2 = eth0
interfaces.ifTable.ifEntry.ifDescr.3 = eth1
@@ -2081,7 +2118,7 @@ in the ifDescr table (ifDescr.2).
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
- This problem is avoided by loading the bonding driver before
+This problem is avoided by loading the bonding driver before
any network drivers participating in a bond. Below is an example of
loading the bonding driver first, the IP address 192.168.1.1 is
correctly associated with ifDescr.2.
@@ -2097,7 +2134,7 @@ correctly associated with ifDescr.2.
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
- While some distributions may not report the interface name in
+While some distributions may not report the interface name in
ifDescr, the association between the IP address and IfIndex remains
and SNMP functions such as Interface_Scan_Next will report that
association.
@@ -2105,34 +2142,34 @@ association.
10. Promiscuous mode
====================
- When running network monitoring tools, e.g., tcpdump, it is
+When running network monitoring tools, e.g., tcpdump, it is
common to enable promiscuous mode on the device, so that all traffic
is seen (instead of seeing only traffic destined for the local host).
The bonding driver handles promiscuous mode changes to the bonding
master device (e.g., bond0), and propagates the setting to the slave
devices.
- For the balance-rr, balance-xor, broadcast, and 802.3ad modes,
+For the balance-rr, balance-xor, broadcast, and 802.3ad modes,
the promiscuous mode setting is propagated to all slaves.
- For the active-backup, balance-tlb and balance-alb modes, the
+For the active-backup, balance-tlb and balance-alb modes, the
promiscuous mode setting is propagated only to the active slave.
- For balance-tlb mode, the active slave is the slave currently
+For balance-tlb mode, the active slave is the slave currently
receiving inbound traffic.
- For balance-alb mode, the active slave is the slave used as a
+For balance-alb mode, the active slave is the slave used as a
"primary." This slave is used for mode-specific control traffic, for
sending to peers that are unassigned or if the load is unbalanced.
- For the active-backup, balance-tlb and balance-alb modes, when
+For the active-backup, balance-tlb and balance-alb modes, when
the active slave changes (e.g., due to a link failure), the
promiscuous setting will be propagated to the new active slave.
11. Configuring Bonding for High Availability
=============================================
- High Availability refers to configurations that provide
+High Availability refers to configurations that provide
maximum network availability by having redundant or backup devices,
links or switches between the host and the rest of the world. The
goal is to provide the maximum availability of network connectivity
@@ -2142,7 +2179,7 @@ could provide higher throughput.
11.1 High Availability in a Single Switch Topology
--------------------------------------------------
- If two hosts (or a host and a single switch) are directly
+If two hosts (or a host and a single switch) are directly
connected via multiple physical links, then there is no availability
penalty to optimizing for maximum bandwidth. In this case, there is
only one switch (or peer), so if it fails, there is no alternative
@@ -2150,32 +2187,32 @@ access to fail over to. Additionally, the bonding load balance modes
support link monitoring of their members, so if individual links fail,
the load will be rebalanced across the remaining devices.
- See Section 12, "Configuring Bonding for Maximum Throughput"
+See Section 12, "Configuring Bonding for Maximum Throughput"
for information on configuring bonding with one peer device.
11.2 High Availability in a Multiple Switch Topology
----------------------------------------------------
- With multiple switches, the configuration of bonding and the
+With multiple switches, the configuration of bonding and the
network changes dramatically. In multiple switch topologies, there is
a trade off between network availability and usable bandwidth.
- Below is a sample network, configured to maximize the
-availability of the network:
-
- | |
- |port3 port3|
- +-----+----+ +-----+----+
- | |port2 ISL port2| |
- | switch A +--------------------------+ switch B |
- | | | |
- +-----+----+ +-----++---+
- |port1 port1|
- | +-------+ |
- +-------------+ host1 +---------------+
- eth0 +-------+ eth1
-
- In this configuration, there is a link between the two
+Below is a sample network, configured to maximize the
+availability of the network::
+
+ | |
+ |port3 port3|
+ +-----+----+ +-----+----+
+ | |port2 ISL port2| |
+ | switch A +--------------------------+ switch B |
+ | | | |
+ +-----+----+ +-----++---+
+ |port1 port1|
+ | +-------+ |
+ +-------------+ host1 +---------------+
+ eth0 +-------+ eth1
+
+In this configuration, there is a link between the two
switches (ISL, or inter switch link), and multiple ports connecting to
the outside world ("port3" on each switch). There is no technical
reason that this could not be extended to a third switch.
@@ -2183,19 +2220,21 @@ reason that this could not be extended to a third switch.
11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
-------------------------------------------------------------
- In a topology such as the example above, the active-backup and
+In a topology such as the example above, the active-backup and
broadcast modes are the only useful bonding modes when optimizing for
availability; the other modes require all links to terminate on the
same peer for them to behave rationally.
-active-backup: This is generally the preferred mode, particularly if
+active-backup:
+ This is generally the preferred mode, particularly if
the switches have an ISL and play together well. If the
network configuration is such that one switch is specifically
a backup switch (e.g., has lower capacity, higher cost, etc),
then the primary option can be used to insure that the
preferred link is always used when it is available.
-broadcast: This mode is really a special purpose mode, and is suitable
+broadcast:
+ This mode is really a special purpose mode, and is suitable
only for very specific needs. For example, if the two
switches are not connected (no ISL), and the networks beyond
them are totally independent. In this case, if it is
@@ -2205,7 +2244,7 @@ broadcast: This mode is really a special purpose mode, and is suitable
11.2.2 HA Link Monitoring Selection for Multiple Switch Topology
----------------------------------------------------------------
- The choice of link monitoring ultimately depends upon your
+The choice of link monitoring ultimately depends upon your
switch. If the switch can reliably fail ports in response to other
failures, then either the MII or ARP monitors should work. For
example, in the above example, if the "port3" link fails at the remote
@@ -2213,7 +2252,7 @@ end, the MII monitor has no direct means to detect this. The ARP
monitor could be configured with a target at the remote end of port3,
thus detecting that failure without switch support.
- In general, however, in a multiple switch topology, the ARP
+In general, however, in a multiple switch topology, the ARP
monitor can provide a higher level of reliability in detecting end to
end connectivity failures (which may be caused by the failure of any
individual component to pass traffic for any reason). Additionally,
@@ -2222,7 +2261,7 @@ one for each switch in the network). This will insure that,
regardless of which switch is active, the ARP monitor has a suitable
target to query.
- Note, also, that of late many switches now support a functionality
+Note, also, that of late many switches now support a functionality
generally referred to as "trunk failover." This is a feature of the
switch that causes the link state of a particular switch port to be set
down (or up) when the state of another switch port goes down (or up).
@@ -2238,18 +2277,18 @@ suitable switches.
12.1 Maximizing Throughput in a Single Switch Topology
------------------------------------------------------
- In a single switch configuration, the best method to maximize
+In a single switch configuration, the best method to maximize
throughput depends upon the application and network environment. The
various load balancing modes each have strengths and weaknesses in
different environments, as detailed below.
- For this discussion, we will break down the topologies into
+For this discussion, we will break down the topologies into
two categories. Depending upon the destination of most traffic, we
categorize them into either "gatewayed" or "local" configurations.
- In a gatewayed configuration, the "switch" is acting primarily
+In a gatewayed configuration, the "switch" is acting primarily
as a router, and the majority of traffic passes through this router to
-other networks. An example would be the following:
+other networks. An example would be the following::
+----------+ +----------+
@@ -2259,25 +2298,25 @@ other networks. An example would be the following:
| |eth1 port2| | here somewhere
+----------+ +----------+
- The router may be a dedicated router device, or another host
+The router may be a dedicated router device, or another host
acting as a gateway. For our discussion, the important point is that
the majority of traffic from Host A will pass through the router to
some other network before reaching its final destination.
- In a gatewayed network configuration, although Host A may
+In a gatewayed network configuration, although Host A may
communicate with many other systems, all of its traffic will be sent
and received via one other peer on the local network, the router.
- Note that the case of two systems connected directly via
+Note that the case of two systems connected directly via
multiple physical links is, for purposes of configuring bonding, the
same as a gatewayed configuration. In that case, it happens that all
traffic is destined for the "gateway" itself, not some other network
beyond the gateway.
- In a local configuration, the "switch" is acting primarily as
+In a local configuration, the "switch" is acting primarily as
a switch, and the majority of traffic passes through this switch to
reach other stations on the same network. An example would be the
-following:
+following::
+----------+ +----------+ +--------+
| |eth0 port1| +-------+ Host B |
@@ -2287,19 +2326,19 @@ following:
+----------+ +----------+port4 +--------+
- Again, the switch may be a dedicated switch device, or another
+Again, the switch may be a dedicated switch device, or another
host acting as a gateway. For our discussion, the important point is
that the majority of traffic from Host A is destined for other hosts
on the same local network (Hosts B and C in the above example).
- In summary, in a gatewayed configuration, traffic to and from
+In summary, in a gatewayed configuration, traffic to and from
the bonded device will be to the same MAC level peer on the network
(the gateway itself, i.e., the router), regardless of its final
destination. In a local configuration, traffic flows directly to and
from the final destinations, thus, each destination (Host B, Host C)
will be addressed directly by their individual MAC addresses.
- This distinction between a gatewayed and a local network
+This distinction between a gatewayed and a local network
configuration is important because many of the load balancing modes
available use the MAC addresses of the local network source and
destination to make load balancing decisions. The behavior of each
@@ -2309,11 +2348,12 @@ mode is described below.
12.1.1 MT Bonding Mode Selection for Single Switch Topology
-----------------------------------------------------------
- This configuration is the easiest to set up and to understand,
+This configuration is the easiest to set up and to understand,
although you will have to decide which bonding mode best suits your
needs. The trade offs for each mode are detailed below:
-balance-rr: This mode is the only mode that will permit a single
+balance-rr:
+ This mode is the only mode that will permit a single
TCP/IP connection to stripe traffic across multiple
interfaces. It is therefore the only mode that will allow a
single TCP/IP stream to utilize more than one interface's
@@ -2351,7 +2391,8 @@ balance-rr: This mode is the only mode that will permit a single
This mode requires the switch to have the appropriate ports
configured for "etherchannel" or "trunking."
-active-backup: There is not much advantage in this network topology to
+active-backup:
+ There is not much advantage in this network topology to
the active-backup mode, as the inactive backup devices are all
connected to the same peer as the primary. In this case, a
load balancing mode (with link monitoring) will provide the
@@ -2361,7 +2402,8 @@ active-backup: There is not much advantage in this network topology to
have value if the hardware available does not support any of
the load balance modes.
-balance-xor: This mode will limit traffic such that packets destined
+balance-xor:
+ This mode will limit traffic such that packets destined
for specific peers will always be sent over the same
interface. Since the destination is determined by the MAC
addresses involved, this mode works best in a "local" network
@@ -2373,10 +2415,12 @@ balance-xor: This mode will limit traffic such that packets destined
As with balance-rr, the switch ports need to be configured for
"etherchannel" or "trunking."
-broadcast: Like active-backup, there is not much advantage to this
+broadcast:
+ Like active-backup, there is not much advantage to this
mode in this type of network topology.
-802.3ad: This mode can be a good choice for this type of network
+802.3ad:
+ This mode can be a good choice for this type of network
topology. The 802.3ad mode is an IEEE standard, so all peers
that implement 802.3ad should interoperate well. The 802.3ad
protocol includes automatic configuration of the aggregates,
@@ -2390,7 +2434,7 @@ broadcast: Like active-backup, there is not much advantage to this
the same speed and duplex. Also, as with all bonding load
balance modes other than balance-rr, no single connection will
be able to utilize more than a single interface's worth of
- bandwidth.
+ bandwidth.
Additionally, the linux bonding 802.3ad implementation
distributes traffic by peer (using an XOR of MAC addresses
@@ -2404,7 +2448,8 @@ broadcast: Like active-backup, there is not much advantage to this
Finally, the 802.3ad mode mandates the use of the MII monitor,
therefore, the ARP monitor is not available in this mode.
-balance-tlb: The balance-tlb mode balances outgoing traffic by peer.
+balance-tlb:
+ The balance-tlb mode balances outgoing traffic by peer.
Since the balancing is done according to MAC address, in a
"gatewayed" configuration (as described above), this mode will
send all traffic across a single device. However, in a
@@ -2422,7 +2467,8 @@ balance-tlb: The balance-tlb mode balances outgoing traffic by peer.
network device driver of the slave interfaces, and the ARP
monitor is not available.
-balance-alb: This mode is everything that balance-tlb is, and more.
+balance-alb:
+ This mode is everything that balance-tlb is, and more.
It has all of the features (and restrictions) of balance-tlb,
and will also balance incoming traffic from local network
peers (as described in the Bonding Module Options section,
@@ -2435,7 +2481,7 @@ balance-alb: This mode is everything that balance-tlb is, and more.
12.1.2 MT Link Monitoring for Single Switch Topology
----------------------------------------------------
- The choice of link monitoring may largely depend upon which
+The choice of link monitoring may largely depend upon which
mode you choose to use. The more advanced load balancing modes do not
support the use of the ARP monitor, and are thus restricted to using
the MII monitor (which does not provide as high a level of end to end
@@ -2444,27 +2490,27 @@ assurance as the ARP monitor).
12.2 Maximum Throughput in a Multiple Switch Topology
-----------------------------------------------------
- Multiple switches may be utilized to optimize for throughput
+Multiple switches may be utilized to optimize for throughput
when they are configured in parallel as part of an isolated network
-between two or more systems, for example:
-
- +-----------+
- | Host A |
- +-+---+---+-+
- | | |
- +--------+ | +---------+
- | | |
- +------+---+ +-----+----+ +-----+----+
- | Switch A | | Switch B | | Switch C |
- +------+---+ +-----+----+ +-----+----+
- | | |
- +--------+ | +---------+
- | | |
- +-+---+---+-+
- | Host B |
- +-----------+
-
- In this configuration, the switches are isolated from one
+between two or more systems, for example::
+
+ +-----------+
+ | Host A |
+ +-+---+---+-+
+ | | |
+ +--------+ | +---------+
+ | | |
+ +------+---+ +-----+----+ +-----+----+
+ | Switch A | | Switch B | | Switch C |
+ +------+---+ +-----+----+ +-----+----+
+ | | |
+ +--------+ | +---------+
+ | | |
+ +-+---+---+-+
+ | Host B |
+ +-----------+
+
+In this configuration, the switches are isolated from one
another. One reason to employ a topology such as this is for an
isolated network with many hosts (a cluster configured for high
performance, for example), using multiple smaller switches can be more
@@ -2472,14 +2518,14 @@ cost effective than a single larger switch, e.g., on a network with 24
hosts, three 24 port switches can be significantly less expensive than
a single 72 port switch.
- If access beyond the network is required, an individual host
+If access beyond the network is required, an individual host
can be equipped with an additional network device connected to an
external network; this host then additionally acts as a gateway.
12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
-------------------------------------------------------------
- In actual practice, the bonding mode typically employed in
+In actual practice, the bonding mode typically employed in
configurations of this type is balance-rr. Historically, in this
network configuration, the usual caveats about out of order packet
delivery are mitigated by the use of network adapters that do not do
@@ -2492,7 +2538,7 @@ utilize greater than one interface's bandwidth.
12.2.2 MT Link Monitoring for Multiple Switch Topology
------------------------------------------------------
- Again, in actual practice, the MII monitor is most often used
+Again, in actual practice, the MII monitor is most often used
in this configuration, as performance is given preference over
availability. The ARP monitor will function in this topology, but its
advantages over the MII monitor are mitigated by the volume of probes
@@ -2505,10 +2551,10 @@ host in the network is configured with bonding).
13.1 Link Establishment and Failover Delays
-------------------------------------------
- Some switches exhibit undesirable behavior with regard to the
+Some switches exhibit undesirable behavior with regard to the
timing of link up and down reporting by the switch.
- First, when a link comes up, some switches may indicate that
+First, when a link comes up, some switches may indicate that
the link is up (carrier available), but not pass traffic over the
interface for some period of time. This delay is typically due to
some type of autonegotiation or routing protocol, but may also occur
@@ -2517,12 +2563,12 @@ failure). If you find this to be a problem, specify an appropriate
value to the updelay bonding module option to delay the use of the
relevant interface(s).
- Second, some switches may "bounce" the link state one or more
+Second, some switches may "bounce" the link state one or more
times while a link is changing state. This occurs most commonly while
the switch is initializing. Again, an appropriate updelay value may
help.
- Note that when a bonding interface has no active links, the
+Note that when a bonding interface has no active links, the
driver will immediately reuse the first link that goes up, even if the
updelay parameter has been specified (the updelay is ignored in this
case). If there are slave interfaces waiting for the updelay timeout
@@ -2532,7 +2578,7 @@ value of updelay has been overestimated, and since this occurs only in
cases with no connectivity, there is no additional penalty for
ignoring the updelay.
- In addition to the concerns about switch timings, if your
+In addition to the concerns about switch timings, if your
switches take a long time to go into backup mode, it may be desirable
to not activate a backup interface immediately after a link goes down.
Failover may be delayed via the downdelay bonding module option.
@@ -2540,31 +2586,31 @@ Failover may be delayed via the downdelay bonding module option.
13.2 Duplicated Incoming Packets
--------------------------------
- NOTE: Starting with version 3.0.2, the bonding driver has logic to
+NOTE: Starting with version 3.0.2, the bonding driver has logic to
suppress duplicate packets, which should largely eliminate this problem.
The following description is kept for reference.
- It is not uncommon to observe a short burst of duplicated
+It is not uncommon to observe a short burst of duplicated
traffic when the bonding device is first used, or after it has been
idle for some period of time. This is most easily observed by issuing
a "ping" to some other host on the network, and noticing that the
output from ping flags duplicates (typically one per slave).
- For example, on a bond in active-backup mode with five slaves
-all connected to one switch, the output may appear as follows:
-
-# ping -n 10.0.4.2
-PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data.
-64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
-64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
-64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
-64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
-64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
-64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms
-64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms
-64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms
-
- This is not due to an error in the bonding driver, rather, it
+For example, on a bond in active-backup mode with five slaves
+all connected to one switch, the output may appear as follows::
+
+ # ping -n 10.0.4.2
+ PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data.
+ 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
+ 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
+ 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
+ 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
+ 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
+ 64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms
+ 64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms
+ 64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms
+
+This is not due to an error in the bonding driver, rather, it
is a side effect of how many switches update their MAC forwarding
tables. Initially, the switch does not associate the MAC address in
the packet with a particular switch port, and so it may send the
@@ -2574,7 +2620,7 @@ single switch, when the switch (temporarily) floods the traffic to all
ports, the bond device receives multiple copies of the same packet
(one per slave device).
- The duplicated packet behavior is switch dependent, some
+The duplicated packet behavior is switch dependent, some
switches exhibit this, and some do not. On switches that display this
behavior, it can be induced by clearing the MAC forwarding table (on
most Cisco switches, the privileged command "clear mac address-table
@@ -2583,16 +2629,16 @@ dynamic" will accomplish this).
14. Hardware Specific Considerations
====================================
- This section contains additional information for configuring
+This section contains additional information for configuring
bonding on specific hardware platforms, or for interfacing bonding
with particular switches or other devices.
14.1 IBM BladeCenter
--------------------
- This applies to the JS20 and similar systems.
+This applies to the JS20 and similar systems.
- On the JS20 blades, the bonding driver supports only
+On the JS20 blades, the bonding driver supports only
balance-rr, active-backup, balance-tlb and balance-alb modes. This is
largely due to the network topology inside the BladeCenter, detailed
below.
@@ -2600,7 +2646,7 @@ below.
JS20 network adapter information
--------------------------------
- All JS20s come with two Broadcom Gigabit Ethernet ports
+All JS20s come with two Broadcom Gigabit Ethernet ports
integrated on the planar (that's "motherboard" in IBM-speak). In the
BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to
I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2.
@@ -2608,36 +2654,36 @@ An add-on Broadcom daughter card can be installed on a JS20 to provide
two more Gigabit Ethernet ports. These ports, eth2 and eth3, are
wired to I/O Modules 3 and 4, respectively.
- Each I/O Module may contain either a switch or a passthrough
+Each I/O Module may contain either a switch or a passthrough
module (which allows ports to be directly connected to an external
switch). Some bonding modes require a specific BladeCenter internal
network topology in order to function; these are detailed below.
- Additional BladeCenter-specific networking information can be
+Additional BladeCenter-specific networking information can be
found in two IBM Redbooks (www.ibm.com/redbooks):
-"IBM eServer BladeCenter Networking Options"
-"IBM eServer BladeCenter Layer 2-7 Network Switching"
+- "IBM eServer BladeCenter Networking Options"
+- "IBM eServer BladeCenter Layer 2-7 Network Switching"
BladeCenter networking configuration
------------------------------------
- Because a BladeCenter can be configured in a very large number
+Because a BladeCenter can be configured in a very large number
of ways, this discussion will be confined to describing basic
configurations.
- Normally, Ethernet Switch Modules (ESMs) are used in I/O
+Normally, Ethernet Switch Modules (ESMs) are used in I/O
modules 1 and 2. In this configuration, the eth0 and eth1 ports of a
JS20 will be connected to different internal switches (in the
respective I/O modules).
- A passthrough module (OPM or CPM, optical or copper,
+A passthrough module (OPM or CPM, optical or copper,
passthrough module) connects the I/O module directly to an external
switch. By using PMs in I/O module #1 and #2, the eth0 and eth1
interfaces of a JS20 can be redirected to the outside world and
connected to a common external switch.
- Depending upon the mix of ESMs and PMs, the network will
+Depending upon the mix of ESMs and PMs, the network will
appear to bonding as either a single switch topology (all PMs) or as a
multiple switch topology (one or more ESMs, zero or more PMs). It is
also possible to connect ESMs together, resulting in a configuration
@@ -2647,24 +2693,24 @@ Topology," above.
Requirements for specific modes
-------------------------------
- The balance-rr mode requires the use of passthrough modules
+The balance-rr mode requires the use of passthrough modules
for devices in the bond, all connected to an common external switch.
That switch must be configured for "etherchannel" or "trunking" on the
appropriate ports, as is usual for balance-rr.
- The balance-alb and balance-tlb modes will function with
+The balance-alb and balance-tlb modes will function with
either switch modules or passthrough modules (or a mix). The only
specific requirement for these modes is that all network interfaces
must be able to reach all destinations for traffic sent over the
bonding device (i.e., the network must converge at some point outside
the BladeCenter).
- The active-backup mode has no additional requirements.
+The active-backup mode has no additional requirements.
Link monitoring issues
----------------------
- When an Ethernet Switch Module is in place, only the ARP
+When an Ethernet Switch Module is in place, only the ARP
monitor will reliably detect link loss to an external switch. This is
nothing unusual, but examination of the BladeCenter cabinet would
suggest that the "external" network ports are the ethernet ports for
@@ -2672,166 +2718,173 @@ the system, when it fact there is a switch between these "external"
ports and the devices on the JS20 system itself. The MII monitor is
only able to detect link failures between the ESM and the JS20 system.
- When a passthrough module is in place, the MII monitor does
+When a passthrough module is in place, the MII monitor does
detect failures to the "external" port, which is then directly
connected to the JS20 system.
Other concerns
--------------
- The Serial Over LAN (SoL) link is established over the primary
+The Serial Over LAN (SoL) link is established over the primary
ethernet (eth0) only, therefore, any loss of link to eth0 will result
in losing your SoL connection. It will not fail over with other
network traffic, as the SoL system is beyond the control of the
bonding driver.
- It may be desirable to disable spanning tree on the switch
+It may be desirable to disable spanning tree on the switch
(either the internal Ethernet Switch Module, or an external switch) to
avoid fail-over delay issues when using bonding.
-
+
15. Frequently Asked Questions
==============================
1. Is it SMP safe?
+-------------------
- Yes. The old 2.0.xx channel bonding patch was not SMP safe.
+Yes. The old 2.0.xx channel bonding patch was not SMP safe.
The new driver was designed to be SMP safe from the start.
2. What type of cards will work with it?
+-----------------------------------------
- Any Ethernet type cards (you can even mix cards - a Intel
+Any Ethernet type cards (you can even mix cards - a Intel
EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes,
devices need not be of the same speed.
- Starting with version 3.2.1, bonding also supports Infiniband
+Starting with version 3.2.1, bonding also supports Infiniband
slaves in active-backup mode.
3. How many bonding devices can I have?
+----------------------------------------
- There is no limit.
+There is no limit.
4. How many slaves can a bonding device have?
+----------------------------------------------
- This is limited only by the number of network interfaces Linux
+This is limited only by the number of network interfaces Linux
supports and/or the number of network cards you can place in your
system.
5. What happens when a slave link dies?
+----------------------------------------
- If link monitoring is enabled, then the failing device will be
+If link monitoring is enabled, then the failing device will be
disabled. The active-backup mode will fail over to a backup link, and
other modes will ignore the failed link. The link will continue to be
monitored, and should it recover, it will rejoin the bond (in whatever
manner is appropriate for the mode). See the sections on High
Availability and the documentation for each mode for additional
information.
-
- Link monitoring can be enabled via either the miimon or
+
+Link monitoring can be enabled via either the miimon or
arp_interval parameters (described in the module parameters section,
above). In general, miimon monitors the carrier state as sensed by
the underlying network device, and the arp monitor (arp_interval)
monitors connectivity to another host on the local network.
- If no link monitoring is configured, the bonding driver will
+If no link monitoring is configured, the bonding driver will
be unable to detect link failures, and will assume that all links are
always available. This will likely result in lost packets, and a
resulting degradation of performance. The precise performance loss
depends upon the bonding mode and network configuration.
6. Can bonding be used for High Availability?
+----------------------------------------------
- Yes. See the section on High Availability for details.
+Yes. See the section on High Availability for details.
7. Which switches/systems does it work with?
+---------------------------------------------
- The full answer to this depends upon the desired mode.
+The full answer to this depends upon the desired mode.
- In the basic balance modes (balance-rr and balance-xor), it
+In the basic balance modes (balance-rr and balance-xor), it
works with any system that supports etherchannel (also called
trunking). Most managed switches currently available have such
support, and many unmanaged switches as well.
- The advanced balance modes (balance-tlb and balance-alb) do
+The advanced balance modes (balance-tlb and balance-alb) do
not have special switch requirements, but do need device drivers that
support specific features (described in the appropriate section under
module parameters, above).
- In 802.3ad mode, it works with systems that support IEEE
+In 802.3ad mode, it works with systems that support IEEE
802.3ad Dynamic Link Aggregation. Most managed and many unmanaged
switches currently available support 802.3ad.
- The active-backup mode should work with any Layer-II switch.
+The active-backup mode should work with any Layer-II switch.
8. Where does a bonding device get its MAC address from?
+---------------------------------------------------------
- When using slave devices that have fixed MAC addresses, or when
+When using slave devices that have fixed MAC addresses, or when
the fail_over_mac option is enabled, the bonding device's MAC address is
the MAC address of the active slave.
- For other configurations, if not explicitly configured (with
+For other configurations, if not explicitly configured (with
ifconfig or ip link), the MAC address of the bonding device is taken from
its first slave device. This MAC address is then passed to all following
slaves and remains persistent (even if the first slave is removed) until
the bonding device is brought down or reconfigured.
- If you wish to change the MAC address, you can set it with
-ifconfig or ip link:
+If you wish to change the MAC address, you can set it with
+ifconfig or ip link::
-# ifconfig bond0 hw ether 00:11:22:33:44:55
+ # ifconfig bond0 hw ether 00:11:22:33:44:55
-# ip link set bond0 address 66:77:88:99:aa:bb
+ # ip link set bond0 address 66:77:88:99:aa:bb
- The MAC address can be also changed by bringing down/up the
-device and then changing its slaves (or their order):
+The MAC address can be also changed by bringing down/up the
+device and then changing its slaves (or their order)::
-# ifconfig bond0 down ; modprobe -r bonding
-# ifconfig bond0 .... up
-# ifenslave bond0 eth...
+ # ifconfig bond0 down ; modprobe -r bonding
+ # ifconfig bond0 .... up
+ # ifenslave bond0 eth...
- This method will automatically take the address from the next
+This method will automatically take the address from the next
slave that is added.
- To restore your slaves' MAC addresses, you need to detach them
-from the bond (`ifenslave -d bond0 eth0'). The bonding driver will
+To restore your slaves' MAC addresses, you need to detach them
+from the bond (``ifenslave -d bond0 eth0``). The bonding driver will
then restore the MAC addresses that the slaves had before they were
enslaved.
16. Resources and Links
=======================
- The latest version of the bonding driver can be found in the latest
+The latest version of the bonding driver can be found in the latest
version of the linux kernel, found on http://kernel.org
- The latest version of this document can be found in the latest kernel
-source (named Documentation/networking/bonding.txt).
+The latest version of this document can be found in the latest kernel
+source (named Documentation/networking/bonding.rst).
- Discussions regarding the usage of the bonding driver take place on the
+Discussions regarding the usage of the bonding driver take place on the
bonding-devel mailing list, hosted at sourceforge.net. If you have questions or
problems, post them to the list. The list address is:
bonding-devel@lists.sourceforge.net
- The administrative interface (to subscribe or unsubscribe) can
+The administrative interface (to subscribe or unsubscribe) can
be found at:
https://lists.sourceforge.net/lists/listinfo/bonding-devel
- Discussions regarding the development of the bonding driver take place
+Discussions regarding the development of the bonding driver take place
on the main Linux network mailing list, hosted at vger.kernel.org. The list
address is:
netdev@vger.kernel.org
- The administrative interface (to subscribe or unsubscribe) can
+The administrative interface (to subscribe or unsubscribe) can
be found at:
http://vger.kernel.org/vger-lists.html#netdev
Donald Becker's Ethernet Drivers and diag programs may be found at :
- - http://web.archive.org/web/*/http://www.scyld.com/network/
+
+ - http://web.archive.org/web/%2E/http://www.scyld.com/network/
You will also find a lot of information regarding Ethernet, NWay, MII,
etc. at www.scyld.com.
-
--- END --
diff --git a/Documentation/networking/caif/caif.rst b/Documentation/networking/caif/caif.rst
index 07afc8063d4d..a07213030ccf 100644
--- a/Documentation/networking/caif/caif.rst
+++ b/Documentation/networking/caif/caif.rst
@@ -1,5 +1,3 @@
-:orphan:
-
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
diff --git a/Documentation/networking/caif/index.rst b/Documentation/networking/caif/index.rst
new file mode 100644
index 000000000000..86e5b7832ec3
--- /dev/null
+++ b/Documentation/networking/caif/index.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+CAIF
+====
+
+Contents:
+
+.. toctree::
+ :maxdepth: 2
+
+ linux_caif
+ caif
+ spi_porting
diff --git a/Documentation/networking/caif/Linux-CAIF.txt b/Documentation/networking/caif/linux_caif.rst
index 0aa4bd381bec..a0480862ab8c 100644
--- a/Documentation/networking/caif/Linux-CAIF.txt
+++ b/Documentation/networking/caif/linux_caif.rst
@@ -1,12 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+==========
Linux CAIF
-===========
-copyright (C) ST-Ericsson AB 2010
-Author: Sjur Brendeland/ sjur.brandeland@stericsson.com
-License terms: GNU General Public License (GPL) version 2
+==========
+
+Copyright |copy| ST-Ericsson AB 2010
+
+:Author: Sjur Brendeland/ sjur.brandeland@stericsson.com
+:License terms: GNU General Public License (GPL) version 2
Introduction
-------------
+============
+
CAIF is a MUX protocol used by ST-Ericsson cellular modems for
communication between Modem and host. The host processes can open virtual AT
channels, initiate GPRS Data connections, Video channels and Utility Channels.
@@ -16,13 +23,16 @@ ST-Ericsson modems support a number of transports between modem
and host. Currently, UART and Loopback are available for Linux.
-Architecture:
-------------
+Architecture
+============
+
The implementation of CAIF is divided into:
+
* CAIF Socket Layer and GPRS IP Interface.
* CAIF Core Protocol Implementation
* CAIF Link Layer, implemented as NET devices.
+::
RTNL
!
@@ -46,12 +56,12 @@ The implementation of CAIF is divided into:
-I M P L E M E N T A T I O N
-===========================
+Implementation
+==============
CAIF Core Protocol Layer
-=========================================
+------------------------
CAIF Core layer implements the CAIF protocol as defined by ST-Ericsson.
It implements the CAIF protocol stack in a layered approach, where
@@ -59,8 +69,11 @@ each layer described in the specification is implemented as a separate layer.
The architecture is inspired by the design patterns "Protocol Layer" and
"Protocol Packet".
-== CAIF structure ==
+CAIF structure
+^^^^^^^^^^^^^^
+
The Core CAIF implementation contains:
+
- Simple implementation of CAIF.
- Layered architecture (a la Streams), each layer in the CAIF
specification is implemented in a separate c-file.
@@ -73,7 +86,8 @@ The Core CAIF implementation contains:
to the called function (except for framing layers' receive function)
Layered Architecture
---------------------
+====================
+
The CAIF protocol can be divided into two parts: Support functions and Protocol
Implementation. The support functions include:
@@ -112,7 +126,7 @@ The CAIF Protocol implementation contains:
- CFSERL CAIF Serial layer. Handles concatenation/split of frames
into CAIF Frames with correct length.
-
+::
+---------+
| Config |
@@ -143,18 +157,24 @@ The CAIF Protocol implementation contains:
In this layered approach the following "rules" apply.
+
- All layers embed the same structure "struct cflayer"
- A layer does not depend on any other layer's private data.
- - Layers are stacked by setting the pointers
+ - Layers are stacked by setting the pointers::
+
layer->up , layer->dn
- - In order to send data upwards, each layer should do
+
+ - In order to send data upwards, each layer should do::
+
layer->up->receive(layer->up, packet);
- - In order to send data downwards, each layer should do
+
+ - In order to send data downwards, each layer should do::
+
layer->dn->transmit(layer->dn, packet);
CAIF Socket and IP interface
-===========================
+============================
The IP interface and CAIF socket API are implemented on top of the
CAIF Core protocol. The IP Interface and CAIF socket have an instance of
diff --git a/Documentation/networking/caif/spi_porting.rst b/Documentation/networking/caif/spi_porting.rst
new file mode 100644
index 000000000000..d49f874b20ac
--- /dev/null
+++ b/Documentation/networking/caif/spi_porting.rst
@@ -0,0 +1,229 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+CAIF SPI porting
+================
+
+CAIF SPI basics
+===============
+
+Running CAIF over SPI needs some extra setup, owing to the nature of SPI.
+Two extra GPIOs have been added in order to negotiate the transfers
+between the master and the slave. The minimum requirement for running
+CAIF over SPI is a SPI slave chip and two GPIOs (more details below).
+Please note that running as a slave implies that you need to keep up
+with the master clock. An overrun or underrun event is fatal.
+
+CAIF SPI framework
+==================
+
+To make porting as easy as possible, the CAIF SPI has been divided in
+two parts. The first part (called the interface part) deals with all
+generic functionality such as length framing, SPI frame negotiation
+and SPI frame delivery and transmission. The other part is the CAIF
+SPI slave device part, which is the module that you have to write if
+you want to run SPI CAIF on a new hardware. This part takes care of
+the physical hardware, both with regard to SPI and to GPIOs.
+
+- Implementing a CAIF SPI device:
+
+ - Functionality provided by the CAIF SPI slave device:
+
+ In order to implement a SPI device you will, as a minimum,
+ need to implement the following
+ functions:
+
+ ::
+
+ int (*init_xfer) (struct cfspi_xfer * xfer, struct cfspi_dev *dev):
+
+ This function is called by the CAIF SPI interface to give
+ you a chance to set up your hardware to be ready to receive
+ a stream of data from the master. The xfer structure contains
+ both physical and logical addresses, as well as the total length
+ of the transfer in both directions.The dev parameter can be used
+ to map to different CAIF SPI slave devices.
+
+ ::
+
+ void (*sig_xfer) (bool xfer, struct cfspi_dev *dev):
+
+ This function is called by the CAIF SPI interface when the output
+ (SPI_INT) GPIO needs to change state. The boolean value of the xfer
+ variable indicates whether the GPIO should be asserted (HIGH) or
+ deasserted (LOW). The dev parameter can be used to map to different CAIF
+ SPI slave devices.
+
+ - Functionality provided by the CAIF SPI interface:
+
+ ::
+
+ void (*ss_cb) (bool assert, struct cfspi_ifc *ifc);
+
+ This function is called by the CAIF SPI slave device in order to
+ signal a change of state of the input GPIO (SS) to the interface.
+ Only active edges are mandatory to be reported.
+ This function can be called from IRQ context (recommended in order
+ not to introduce latency). The ifc parameter should be the pointer
+ returned from the platform probe function in the SPI device structure.
+
+ ::
+
+ void (*xfer_done_cb) (struct cfspi_ifc *ifc);
+
+ This function is called by the CAIF SPI slave device in order to
+ report that a transfer is completed. This function should only be
+ called once both the transmission and the reception are completed.
+ This function can be called from IRQ context (recommended in order
+ not to introduce latency). The ifc parameter should be the pointer
+ returned from the platform probe function in the SPI device structure.
+
+ - Connecting the bits and pieces:
+
+ - Filling in the SPI slave device structure:
+
+ Connect the necessary callback functions.
+
+ Indicate clock speed (used to calculate toggle delays).
+
+ Chose a suitable name (helps debugging if you use several CAIF
+ SPI slave devices).
+
+ Assign your private data (can be used to map to your
+ structure).
+
+ - Filling in the SPI slave platform device structure:
+
+ Add name of driver to connect to ("cfspi_sspi").
+
+ Assign the SPI slave device structure as platform data.
+
+Padding
+=======
+
+In order to optimize throughput, a number of SPI padding options are provided.
+Padding can be enabled independently for uplink and downlink transfers.
+Padding can be enabled for the head, the tail and for the total frame size.
+The padding needs to be correctly configured on both sides of the link.
+The padding can be changed via module parameters in cfspi_sspi.c or via
+the sysfs directory of the cfspi_sspi driver (before device registration).
+
+- CAIF SPI device template::
+
+ /*
+ * Copyright (C) ST-Ericsson AB 2010
+ * Author: Daniel Martensson / Daniel.Martensson@stericsson.com
+ * License terms: GNU General Public License (GPL), version 2.
+ *
+ */
+
+ #include <linux/init.h>
+ #include <linux/module.h>
+ #include <linux/device.h>
+ #include <linux/wait.h>
+ #include <linux/interrupt.h>
+ #include <linux/dma-mapping.h>
+ #include <net/caif/caif_spi.h>
+
+ MODULE_LICENSE("GPL");
+
+ struct sspi_struct {
+ struct cfspi_dev sdev;
+ struct cfspi_xfer *xfer;
+ };
+
+ static struct sspi_struct slave;
+ static struct platform_device slave_device;
+
+ static irqreturn_t sspi_irq(int irq, void *arg)
+ {
+ /* You only need to trigger on an edge to the active state of the
+ * SS signal. Once a edge is detected, the ss_cb() function should be
+ * called with the parameter assert set to true. It is OK
+ * (and even advised) to call the ss_cb() function in IRQ context in
+ * order not to add any delay. */
+
+ return IRQ_HANDLED;
+ }
+
+ static void sspi_complete(void *context)
+ {
+ /* Normally the DMA or the SPI framework will call you back
+ * in something similar to this. The only thing you need to
+ * do is to call the xfer_done_cb() function, providing the pointer
+ * to the CAIF SPI interface. It is OK to call this function
+ * from IRQ context. */
+ }
+
+ static int sspi_init_xfer(struct cfspi_xfer *xfer, struct cfspi_dev *dev)
+ {
+ /* Store transfer info. For a normal implementation you should
+ * set up your DMA here and make sure that you are ready to
+ * receive the data from the master SPI. */
+
+ struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
+
+ sspi->xfer = xfer;
+
+ return 0;
+ }
+
+ void sspi_sig_xfer(bool xfer, struct cfspi_dev *dev)
+ {
+ /* If xfer is true then you should assert the SPI_INT to indicate to
+ * the master that you are ready to receive the data from the master
+ * SPI. If xfer is false then you should de-assert SPI_INT to indicate
+ * that the transfer is done.
+ */
+
+ struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
+ }
+
+ static void sspi_release(struct device *dev)
+ {
+ /*
+ * Here you should release your SPI device resources.
+ */
+ }
+
+ static int __init sspi_init(void)
+ {
+ /* Here you should initialize your SPI device by providing the
+ * necessary functions, clock speed, name and private data. Once
+ * done, you can register your device with the
+ * platform_device_register() function. This function will return
+ * with the CAIF SPI interface initialized. This is probably also
+ * the place where you should set up your GPIOs, interrupts and SPI
+ * resources. */
+
+ int res = 0;
+
+ /* Initialize slave device. */
+ slave.sdev.init_xfer = sspi_init_xfer;
+ slave.sdev.sig_xfer = sspi_sig_xfer;
+ slave.sdev.clk_mhz = 13;
+ slave.sdev.priv = &slave;
+ slave.sdev.name = "spi_sspi";
+ slave_device.dev.release = sspi_release;
+
+ /* Initialize platform device. */
+ slave_device.name = "cfspi_sspi";
+ slave_device.dev.platform_data = &slave.sdev;
+
+ /* Register platform device. */
+ res = platform_device_register(&slave_device);
+ if (res) {
+ printk(KERN_WARNING "sspi_init: failed to register dev.\n");
+ return -ENODEV;
+ }
+
+ return res;
+ }
+
+ static void __exit sspi_exit(void)
+ {
+ platform_device_del(&slave_device);
+ }
+
+ module_init(sspi_init);
+ module_exit(sspi_exit);
diff --git a/Documentation/networking/caif/spi_porting.txt b/Documentation/networking/caif/spi_porting.txt
deleted file mode 100644
index 9efd0687dc4c..000000000000
--- a/Documentation/networking/caif/spi_porting.txt
+++ /dev/null
@@ -1,208 +0,0 @@
-- CAIF SPI porting -
-
-- CAIF SPI basics:
-
-Running CAIF over SPI needs some extra setup, owing to the nature of SPI.
-Two extra GPIOs have been added in order to negotiate the transfers
- between the master and the slave. The minimum requirement for running
-CAIF over SPI is a SPI slave chip and two GPIOs (more details below).
-Please note that running as a slave implies that you need to keep up
-with the master clock. An overrun or underrun event is fatal.
-
-- CAIF SPI framework:
-
-To make porting as easy as possible, the CAIF SPI has been divided in
-two parts. The first part (called the interface part) deals with all
-generic functionality such as length framing, SPI frame negotiation
-and SPI frame delivery and transmission. The other part is the CAIF
-SPI slave device part, which is the module that you have to write if
-you want to run SPI CAIF on a new hardware. This part takes care of
-the physical hardware, both with regard to SPI and to GPIOs.
-
-- Implementing a CAIF SPI device:
-
- - Functionality provided by the CAIF SPI slave device:
-
- In order to implement a SPI device you will, as a minimum,
- need to implement the following
- functions:
-
- int (*init_xfer) (struct cfspi_xfer * xfer, struct cfspi_dev *dev):
-
- This function is called by the CAIF SPI interface to give
- you a chance to set up your hardware to be ready to receive
- a stream of data from the master. The xfer structure contains
- both physical and logical addresses, as well as the total length
- of the transfer in both directions.The dev parameter can be used
- to map to different CAIF SPI slave devices.
-
- void (*sig_xfer) (bool xfer, struct cfspi_dev *dev):
-
- This function is called by the CAIF SPI interface when the output
- (SPI_INT) GPIO needs to change state. The boolean value of the xfer
- variable indicates whether the GPIO should be asserted (HIGH) or
- deasserted (LOW). The dev parameter can be used to map to different CAIF
- SPI slave devices.
-
- - Functionality provided by the CAIF SPI interface:
-
- void (*ss_cb) (bool assert, struct cfspi_ifc *ifc);
-
- This function is called by the CAIF SPI slave device in order to
- signal a change of state of the input GPIO (SS) to the interface.
- Only active edges are mandatory to be reported.
- This function can be called from IRQ context (recommended in order
- not to introduce latency). The ifc parameter should be the pointer
- returned from the platform probe function in the SPI device structure.
-
- void (*xfer_done_cb) (struct cfspi_ifc *ifc);
-
- This function is called by the CAIF SPI slave device in order to
- report that a transfer is completed. This function should only be
- called once both the transmission and the reception are completed.
- This function can be called from IRQ context (recommended in order
- not to introduce latency). The ifc parameter should be the pointer
- returned from the platform probe function in the SPI device structure.
-
- - Connecting the bits and pieces:
-
- - Filling in the SPI slave device structure:
-
- Connect the necessary callback functions.
- Indicate clock speed (used to calculate toggle delays).
- Chose a suitable name (helps debugging if you use several CAIF
- SPI slave devices).
- Assign your private data (can be used to map to your structure).
-
- - Filling in the SPI slave platform device structure:
- Add name of driver to connect to ("cfspi_sspi").
- Assign the SPI slave device structure as platform data.
-
-- Padding:
-
-In order to optimize throughput, a number of SPI padding options are provided.
-Padding can be enabled independently for uplink and downlink transfers.
-Padding can be enabled for the head, the tail and for the total frame size.
-The padding needs to be correctly configured on both sides of the link.
-The padding can be changed via module parameters in cfspi_sspi.c or via
-the sysfs directory of the cfspi_sspi driver (before device registration).
-
-- CAIF SPI device template:
-
-/*
- * Copyright (C) ST-Ericsson AB 2010
- * Author: Daniel Martensson / Daniel.Martensson@stericsson.com
- * License terms: GNU General Public License (GPL), version 2.
- *
- */
-
-#include <linux/init.h>
-#include <linux/module.h>
-#include <linux/device.h>
-#include <linux/wait.h>
-#include <linux/interrupt.h>
-#include <linux/dma-mapping.h>
-#include <net/caif/caif_spi.h>
-
-MODULE_LICENSE("GPL");
-
-struct sspi_struct {
- struct cfspi_dev sdev;
- struct cfspi_xfer *xfer;
-};
-
-static struct sspi_struct slave;
-static struct platform_device slave_device;
-
-static irqreturn_t sspi_irq(int irq, void *arg)
-{
- /* You only need to trigger on an edge to the active state of the
- * SS signal. Once a edge is detected, the ss_cb() function should be
- * called with the parameter assert set to true. It is OK
- * (and even advised) to call the ss_cb() function in IRQ context in
- * order not to add any delay. */
-
- return IRQ_HANDLED;
-}
-
-static void sspi_complete(void *context)
-{
- /* Normally the DMA or the SPI framework will call you back
- * in something similar to this. The only thing you need to
- * do is to call the xfer_done_cb() function, providing the pointer
- * to the CAIF SPI interface. It is OK to call this function
- * from IRQ context. */
-}
-
-static int sspi_init_xfer(struct cfspi_xfer *xfer, struct cfspi_dev *dev)
-{
- /* Store transfer info. For a normal implementation you should
- * set up your DMA here and make sure that you are ready to
- * receive the data from the master SPI. */
-
- struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
-
- sspi->xfer = xfer;
-
- return 0;
-}
-
-void sspi_sig_xfer(bool xfer, struct cfspi_dev *dev)
-{
- /* If xfer is true then you should assert the SPI_INT to indicate to
- * the master that you are ready to receive the data from the master
- * SPI. If xfer is false then you should de-assert SPI_INT to indicate
- * that the transfer is done.
- */
-
- struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
-}
-
-static void sspi_release(struct device *dev)
-{
- /*
- * Here you should release your SPI device resources.
- */
-}
-
-static int __init sspi_init(void)
-{
- /* Here you should initialize your SPI device by providing the
- * necessary functions, clock speed, name and private data. Once
- * done, you can register your device with the
- * platform_device_register() function. This function will return
- * with the CAIF SPI interface initialized. This is probably also
- * the place where you should set up your GPIOs, interrupts and SPI
- * resources. */
-
- int res = 0;
-
- /* Initialize slave device. */
- slave.sdev.init_xfer = sspi_init_xfer;
- slave.sdev.sig_xfer = sspi_sig_xfer;
- slave.sdev.clk_mhz = 13;
- slave.sdev.priv = &slave;
- slave.sdev.name = "spi_sspi";
- slave_device.dev.release = sspi_release;
-
- /* Initialize platform device. */
- slave_device.name = "cfspi_sspi";
- slave_device.dev.platform_data = &slave.sdev;
-
- /* Register platform device. */
- res = platform_device_register(&slave_device);
- if (res) {
- printk(KERN_WARNING "sspi_init: failed to register dev.\n");
- return -ENODEV;
- }
-
- return res;
-}
-
-static void __exit sspi_exit(void)
-{
- platform_device_del(&slave_device);
-}
-
-module_init(sspi_init);
-module_exit(sspi_exit);
diff --git a/Documentation/networking/can.rst b/Documentation/networking/can.rst
index 2fd0b51a8c52..ff05cbd05e0d 100644
--- a/Documentation/networking/can.rst
+++ b/Documentation/networking/can.rst
@@ -1058,7 +1058,7 @@ drivers you mainly have to deal with:
- TX: Put the CAN frame from the socket buffer to the CAN controller.
- RX: Put the CAN frame from the CAN controller to the socket buffer.
-See e.g. at Documentation/networking/netdevices.txt . The differences
+See e.g. at Documentation/networking/netdevices.rst . The differences
for writing CAN network device driver are described below:
diff --git a/Documentation/networking/cdc_mbim.txt b/Documentation/networking/cdc_mbim.rst
index 4e68f0bc5dba..0048409c06b4 100644
--- a/Documentation/networking/cdc_mbim.txt
+++ b/Documentation/networking/cdc_mbim.rst
@@ -1,5 +1,8 @@
- cdc_mbim - Driver for CDC MBIM Mobile Broadband modems
- ========================================================
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
+cdc_mbim - Driver for CDC MBIM Mobile Broadband modems
+======================================================
The cdc_mbim driver supports USB devices conforming to the "Universal
Serial Bus Communications Class Subclass Specification for Mobile
@@ -19,9 +22,9 @@ by a cdc_ncm driver parameter:
prefer_mbim
-----------
-Type: Boolean
-Valid Range: N/Y (0-1)
-Default Value: Y (MBIM is preferred)
+:Type: Boolean
+:Valid Range: N/Y (0-1)
+:Default Value: Y (MBIM is preferred)
This parameter sets the system policy for NCM/MBIM functions. Such
functions will be handled by either the cdc_ncm driver or the cdc_mbim
@@ -44,11 +47,13 @@ userspace MBIM management application always is required to enable a
MBIM function.
Such userspace applications includes, but are not limited to:
+
- mbimcli (included with the libmbim [3] library), and
- ModemManager [4]
Establishing a MBIM IP session reequires at least these actions by the
management application:
+
- open the control channel
- configure network connection settings
- connect to network
@@ -76,7 +81,7 @@ complies with all the control channel requirements in [1].
The cdc-wdmX device is created as a child of the MBIM control
interface USB device. The character device associated with a specific
-MBIM function can be looked up using sysfs. For example:
+MBIM function can be looked up using sysfs. For example::
bjorn@nemi:~$ ls /sys/bus/usb/drivers/cdc_mbim/2-4:2.12/usbmisc
cdc-wdm0
@@ -119,13 +124,15 @@ negotiated control message size.
/dev/cdc-wdmX ioctl()
---------------------
+---------------------
IOCTL_WDM_MAX_COMMAND: Get Maximum Command Size
This ioctl returns the wMaxControlMessage field of the CDC MBIM
functional descriptor for MBIM devices. This is intended as a
convenience, eliminating the need to parse the USB descriptors from
userspace.
+::
+
#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
@@ -178,7 +185,7 @@ VLAN links prior to establishing MBIM IP sessions where the SessionId
is greater than 0. These links can be added by using the normal VLAN
kernel interfaces, either ioctl or netlink.
-For example, adding a link for a MBIM IP session with SessionId 3:
+For example, adding a link for a MBIM IP session with SessionId 3::
ip link add link wwan0 name wwan0.3 type vlan id 3
@@ -207,6 +214,7 @@ the stream to the end user in an appropriate way for the stream type.
The network device ABI requires a dummy ethernet header for every DSS
data frame being transported. The contents of this header is
arbitrary, with the following exceptions:
+
- TX frames using an IP protocol (0x0800 or 0x86dd) will be dropped
- RX frames will have the protocol field set to ETH_P_802_3 (but will
not be properly formatted 802.3 frames)
@@ -218,7 +226,7 @@ adding the dummy ethernet header on TX and stripping it on RX.
This is a simple example using tools commonly available, exporting
DssSessionId 5 as a pty character device pointed to by a /dev/nmea
-symlink:
+symlink::
ip link add link wwan0 name wwan0.dss5 type vlan id 261
ip link set dev wwan0.dss5 up
@@ -236,7 +244,7 @@ map frames to the correct DSS session and adding 18 byte VLAN ethernet
headers with the appropriate tag on TX. In this case using a socket
filter is recommended, matching only the DSS VLAN subset. This avoid
unnecessary copying of unrelated IP session data to userspace. For
-example:
+example::
static struct sock_filter dssfilter[] = {
/* use special negative offsets to get VLAN tag */
@@ -249,11 +257,11 @@ example:
BPF_JUMP(BPF_JMP|BPF_JGE|BPF_K, 512, 3, 0), /* 511 is last DSS VLAN */
/* verify ethertype */
- BPF_STMT(BPF_LD|BPF_H|BPF_ABS, 2 * ETH_ALEN),
- BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, ETH_P_802_3, 0, 1),
+ BPF_STMT(BPF_LD|BPF_H|BPF_ABS, 2 * ETH_ALEN),
+ BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, ETH_P_802_3, 0, 1),
- BPF_STMT(BPF_RET|BPF_K, (u_int)-1), /* accept */
- BPF_STMT(BPF_RET|BPF_K, 0), /* ignore */
+ BPF_STMT(BPF_RET|BPF_K, (u_int)-1), /* accept */
+ BPF_STMT(BPF_RET|BPF_K, 0), /* ignore */
};
@@ -266,6 +274,7 @@ network device.
This mapping implies a few restrictions on multiplexed IPS and DSS
sessions, which may not always be practical:
+
- no IPS or DSS session can use a frame size greater than the MTU on
IP session 0
- no IPS or DSS session can be in the up state unless the network
@@ -280,7 +289,7 @@ device.
Tip: It might be less confusing to the end user to name this VLAN
subdevice after the MBIM SessionID instead of the VLAN ID. For
-example:
+example::
ip link add link wwan0 name wwan0.0 type vlan id 4094
@@ -290,7 +299,7 @@ VLAN mapping
Summarizing the cdc_mbim driver mapping described above, we have this
relationship between VLAN tags on the wwanY network device and MBIM
-sessions on the shared USB data channel:
+sessions on the shared USB data channel::
VLAN ID MBIM type MBIM SessionID Notes
---------------------------------------------------------
@@ -310,30 +319,37 @@ sessions on the shared USB data channel:
References
==========
-[1] USB Implementers Forum, Inc. - "Universal Serial Bus
- Communications Class Subclass Specification for Mobile Broadband
- Interface Model", Revision 1.0 (Errata 1), May 1, 2013
+ 1) USB Implementers Forum, Inc. - "Universal Serial Bus
+ Communications Class Subclass Specification for Mobile Broadband
+ Interface Model", Revision 1.0 (Errata 1), May 1, 2013
+
- http://www.usb.org/developers/docs/devclass_docs/
-[2] USB Implementers Forum, Inc. - "Universal Serial Bus
- Communications Class Subclass Specifications for Network Control
- Model Devices", Revision 1.0 (Errata 1), November 24, 2010
+ 2) USB Implementers Forum, Inc. - "Universal Serial Bus
+ Communications Class Subclass Specifications for Network Control
+ Model Devices", Revision 1.0 (Errata 1), November 24, 2010
+
- http://www.usb.org/developers/docs/devclass_docs/
-[3] libmbim - "a glib-based library for talking to WWAN modems and
- devices which speak the Mobile Interface Broadband Model (MBIM)
- protocol"
+ 3) libmbim - "a glib-based library for talking to WWAN modems and
+ devices which speak the Mobile Interface Broadband Model (MBIM)
+ protocol"
+
- http://www.freedesktop.org/wiki/Software/libmbim/
-[4] ModemManager - "a DBus-activated daemon which controls mobile
- broadband (2G/3G/4G) devices and connections"
+ 4) ModemManager - "a DBus-activated daemon which controls mobile
+ broadband (2G/3G/4G) devices and connections"
+
- http://www.freedesktop.org/wiki/Software/ModemManager/
-[5] "MBIM (Mobile Broadband Interface Model) Registry"
+ 5) "MBIM (Mobile Broadband Interface Model) Registry"
+
- http://compliance.usb.org/mbim/
-[6] "/sys/kernel/debug/usb/devices output format"
+ 6) "/sys/kernel/debug/usb/devices output format"
+
- Documentation/driver-api/usb/usb.rst
-[7] "/sys/bus/usb/devices/.../descriptors"
+ 7) "/sys/bus/usb/devices/.../descriptors"
+
- Documentation/ABI/stable/sysfs-bus-usb
diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst
index 905c8a84b103..69b23cf6879e 100644
--- a/Documentation/networking/checksum-offloads.rst
+++ b/Documentation/networking/checksum-offloads.rst
@@ -59,7 +59,7 @@ recomputed for each resulting segment. See the skbuff.h comment (section 'E')
for more details.
A driver declares its offload capabilities in netdev->hw_features; see
-Documentation/networking/netdev-features.txt for more. Note that a device
+Documentation/networking/netdev-features.rst for more. Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB; if it tries to deduce these itself in hardware
(as some NICs do) the driver should check that the values in the SKB match
diff --git a/Documentation/networking/cops.rst b/Documentation/networking/cops.rst
new file mode 100644
index 000000000000..964ba80599a9
--- /dev/null
+++ b/Documentation/networking/cops.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+The COPS LocalTalk Linux driver (cops.c)
+========================================
+
+By Jay Schulist <jschlst@samba.org>
+
+This driver has two modes and they are: Dayna mode and Tangent mode.
+Each mode corresponds with the type of card. It has been found
+that there are 2 main types of cards and all other cards are
+the same and just have different names or only have minor differences
+such as more IO ports. As this driver is tested it will
+become more clear exactly what cards are supported.
+
+Right now these cards are known to work with the COPS driver. The
+LT-200 cards work in a somewhat more limited capacity than the
+DL200 cards, which work very well and are in use by many people.
+
+TANGENT driver mode:
+ - Tangent ATB-II, Novell NL-1000, Daystar Digital LT-200
+
+DAYNA driver mode:
+ - Dayna DL2000/DaynaTalk PC (Half Length), COPS LT-95,
+ - Farallon PhoneNET PC III, Farallon PhoneNET PC II
+
+Other cards possibly supported mode unknown though:
+ - Dayna DL2000 (Full length)
+
+The COPS driver defaults to using Dayna mode. To change the driver's
+mode if you built a driver with dual support use board_type=1 or
+board_type=2 for Dayna or Tangent with insmod.
+
+Operation/loading of the driver
+===============================
+
+Use modprobe like this: /sbin/modprobe cops.o (IO #) (IRQ #)
+If you do not specify any options the driver will try and use the IO = 0x240,
+IRQ = 5. As of right now I would only use IRQ 5 for the card, if autoprobing.
+
+To load multiple COPS driver Localtalk cards you can do one of the following::
+
+ insmod cops io=0x240 irq=5
+ insmod -o cops2 cops io=0x260 irq=3
+
+Or in lilo.conf put something like this::
+
+ append="ether=5,0x240,lt0 ether=3,0x260,lt1"
+
+Then bring up the interface with ifconfig. It will look something like this::
+
+ lt0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-F7-00-00-00-00-00-00-00-00
+ inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
+ UP BROADCAST RUNNING NOARP MULTICAST MTU:600 Metric:1
+ RX packets:0 errors:0 dropped:0 overruns:0 frame:0
+ TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 coll:0
+
+Netatalk Configuration
+======================
+
+You will need to configure atalkd with something like the following to make
+it work with the cops.c driver.
+
+* For single LTalk card use::
+
+ dummy -seed -phase 2 -net 2000 -addr 2000.10 -zone "1033"
+ lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033"
+
+* For multiple cards, Ethernet and LocalTalk::
+
+ eth0 -seed -phase 2 -net 3000 -addr 3000.20 -zone "1033"
+ lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033"
+
+* For multiple LocalTalk cards, and an Ethernet card.
+
+* Order seems to matter here, Ethernet last::
+
+ lt0 -seed -phase 1 -net 1000 -addr 1000.10 -zone "LocalTalk1"
+ lt1 -seed -phase 1 -net 2000 -addr 2000.20 -zone "LocalTalk2"
+ eth0 -seed -phase 2 -net 3000 -addr 3000.30 -zone "EtherTalk"
diff --git a/Documentation/networking/cops.txt b/Documentation/networking/cops.txt
deleted file mode 100644
index 3e344b448e07..000000000000
--- a/Documentation/networking/cops.txt
+++ /dev/null
@@ -1,63 +0,0 @@
-Text File for the COPS LocalTalk Linux driver (cops.c).
- By Jay Schulist <jschlst@samba.org>
-
-This driver has two modes and they are: Dayna mode and Tangent mode.
-Each mode corresponds with the type of card. It has been found
-that there are 2 main types of cards and all other cards are
-the same and just have different names or only have minor differences
-such as more IO ports. As this driver is tested it will
-become more clear exactly what cards are supported.
-
-Right now these cards are known to work with the COPS driver. The
-LT-200 cards work in a somewhat more limited capacity than the
-DL200 cards, which work very well and are in use by many people.
-
-TANGENT driver mode:
- Tangent ATB-II, Novell NL-1000, Daystar Digital LT-200
-DAYNA driver mode:
- Dayna DL2000/DaynaTalk PC (Half Length), COPS LT-95,
- Farallon PhoneNET PC III, Farallon PhoneNET PC II
-Other cards possibly supported mode unknown though:
- Dayna DL2000 (Full length)
-
-The COPS driver defaults to using Dayna mode. To change the driver's
-mode if you built a driver with dual support use board_type=1 or
-board_type=2 for Dayna or Tangent with insmod.
-
-** Operation/loading of the driver.
-Use modprobe like this: /sbin/modprobe cops.o (IO #) (IRQ #)
-If you do not specify any options the driver will try and use the IO = 0x240,
-IRQ = 5. As of right now I would only use IRQ 5 for the card, if autoprobing.
-
-To load multiple COPS driver Localtalk cards you can do one of the following.
-
-insmod cops io=0x240 irq=5
-insmod -o cops2 cops io=0x260 irq=3
-
-Or in lilo.conf put something like this:
- append="ether=5,0x240,lt0 ether=3,0x260,lt1"
-
-Then bring up the interface with ifconfig. It will look something like this:
-lt0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-F7-00-00-00-00-00-00-00-00
- inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
- UP BROADCAST RUNNING NOARP MULTICAST MTU:600 Metric:1
- RX packets:0 errors:0 dropped:0 overruns:0 frame:0
- TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 coll:0
-
-** Netatalk Configuration
-You will need to configure atalkd with something like the following to make
-it work with the cops.c driver.
-
-* For single LTalk card use.
-dummy -seed -phase 2 -net 2000 -addr 2000.10 -zone "1033"
-lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033"
-
-* For multiple cards, Ethernet and LocalTalk.
-eth0 -seed -phase 2 -net 3000 -addr 3000.20 -zone "1033"
-lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033"
-
-* For multiple LocalTalk cards, and an Ethernet card.
-* Order seems to matter here, Ethernet last.
-lt0 -seed -phase 1 -net 1000 -addr 1000.10 -zone "LocalTalk1"
-lt1 -seed -phase 1 -net 2000 -addr 2000.20 -zone "LocalTalk2"
-eth0 -seed -phase 2 -net 3000 -addr 3000.30 -zone "EtherTalk"
diff --git a/Documentation/networking/cxacru.txt b/Documentation/networking/cxacru.rst
index 2cce04457b4d..6088af2ffeda 100644
--- a/Documentation/networking/cxacru.txt
+++ b/Documentation/networking/cxacru.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+ATM cxacru device driver
+========================
+
Firmware is required for this device: http://accessrunner.sourceforge.net/
While it is capable of managing/maintaining the ADSL connection without the
@@ -19,29 +25,35 @@ several sysfs attribute files for retrieving device statistics:
* adsl_headend
* adsl_headend_environment
- Information about the remote headend.
+
+ - Information about the remote headend.
* adsl_config
- Configuration writing interface.
- Write parameters in hexadecimal format <index>=<value>,
- separated by whitespace, e.g.:
+
+ - Configuration writing interface.
+ - Write parameters in hexadecimal format <index>=<value>,
+ separated by whitespace, e.g.:
+
"1=0 a=5"
- Up to 7 parameters at a time will be sent and the modem will restart
- the ADSL connection when any value is set. These are logged for future
- reference.
+
+ - Up to 7 parameters at a time will be sent and the modem will restart
+ the ADSL connection when any value is set. These are logged for future
+ reference.
* downstream_attenuation (dB)
* downstream_bits_per_frame
* downstream_rate (kbps)
* downstream_snr_margin (dB)
- Downstream stats.
+
+ - Downstream stats.
* upstream_attenuation (dB)
* upstream_bits_per_frame
* upstream_rate (kbps)
* upstream_snr_margin (dB)
* transmitter_power (dBm/Hz)
- Upstream stats.
+
+ - Upstream stats.
* downstream_crc_errors
* downstream_fec_errors
@@ -49,48 +61,56 @@ several sysfs attribute files for retrieving device statistics:
* upstream_crc_errors
* upstream_fec_errors
* upstream_hec_errors
- Error counts.
+
+ - Error counts.
* line_startable
- Indicates that ADSL support on the device
- is/can be enabled, see adsl_start.
+
+ - Indicates that ADSL support on the device
+ is/can be enabled, see adsl_start.
* line_status
- "initialising"
- "down"
- "attempting to activate"
- "training"
- "channel analysis"
- "exchange"
- "waiting"
- "up"
+
+ - "initialising"
+ - "down"
+ - "attempting to activate"
+ - "training"
+ - "channel analysis"
+ - "exchange"
+ - "waiting"
+ - "up"
Changes between "down" and "attempting to activate"
if there is no signal.
* link_status
- "not connected"
- "connected"
- "lost"
+
+ - "not connected"
+ - "connected"
+ - "lost"
* mac_address
* modulation
- "" (when not connected)
- "ANSI T1.413"
- "ITU-T G.992.1 (G.DMT)"
- "ITU-T G.992.2 (G.LITE)"
+
+ - "" (when not connected)
+ - "ANSI T1.413"
+ - "ITU-T G.992.1 (G.DMT)"
+ - "ITU-T G.992.2 (G.LITE)"
* startup_attempts
- Count of total attempts to initialise ADSL.
+
+ - Count of total attempts to initialise ADSL.
To enable/disable ADSL, the following can be written to the adsl_state file:
- "start"
- "stop
- "restart" (stops, waits 1.5s, then starts)
- "poll" (used to resume status polling if it was disabled due to failure)
-Changes in adsl/line state are reported via kernel log messages:
+ - "start"
+ - "stop
+ - "restart" (stops, waits 1.5s, then starts)
+ - "poll" (used to resume status polling if it was disabled due to failure)
+
+Changes in adsl/line state are reported via kernel log messages::
+
[4942145.150704] ATM dev 0: ADSL state: running
[4942243.663766] ATM dev 0: ADSL line: down
[4942249.665075] ATM dev 0: ADSL line: attempting to activate
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.rst
index 55c575fcaf17..dde16be04456 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.rst
@@ -1,16 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
DCCP protocol
=============
-Contents
-========
-- Introduction
-- Missing features
-- Socket options
-- Sysctl variables
-- IOCTLs
-- Other tunables
-- Notes
+.. Contents
+ - Introduction
+ - Missing features
+ - Socket options
+ - Sysctl variables
+ - IOCTLs
+ - Other tunables
+ - Notes
Introduction
@@ -38,6 +40,7 @@ The Linux DCCP implementation does not currently support all the features that a
specified in RFCs 4340...42.
The known bugs are at:
+
http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP
For more up-to-date versions of the DCCP implementation, please consider using
@@ -54,7 +57,8 @@ defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special,
and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an
u32 priority value as ancillary data to sendmsg(), where higher numbers indicate
a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to
-be formatted using a cmsg(3) message header filled in as follows:
+be formatted using a cmsg(3) message header filled in as follows::
+
cmsg->cmsg_level = SOL_DCCP;
cmsg->cmsg_type = DCCP_SCM_PRIORITY;
cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */
@@ -94,7 +98,7 @@ must be registered on the socket before calling connect() or listen().
DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
-Please note that the getsockopt argument type here is `int', not uint8_t.
+Please note that the getsockopt argument type here is ``int``, not uint8_t.
DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.
@@ -113,6 +117,7 @@ be enabled at the receiver, too with suitable choice of CsCov.
DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
range 0..15 are acceptable. The default setting is 0 (full coverage),
values between 1..15 indicate partial coverage.
+
DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
sets a threshold, where again values 0..15 are acceptable. The default
of 0 means that all packets with a partial coverage will be discarded.
@@ -123,11 +128,13 @@ DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.
+
DCCP_SOCKOPT_CCID_RX_INFO
- Returns a `struct tfrc_rx_info' in optval; the buffer for optval and
+ Returns a ``struct tfrc_rx_info`` in optval; the buffer for optval and
optlen must be set to at least sizeof(struct tfrc_rx_info).
+
DCCP_SOCKOPT_CCID_TX_INFO
- Returns a `struct tfrc_tx_info' in optval; the buffer for optval and
+ Returns a ``struct tfrc_tx_info`` in optval; the buffer for optval and
optlen must be set to at least sizeof(struct tfrc_tx_info).
On unidirectional connections it is useful to close the unused half-connection
@@ -182,7 +189,7 @@ sync_ratelimit = 125 ms
IOCTLS
======
FIONREAD
- Works as in udp(7): returns in the `int' argument pointer the size of
+ Works as in udp(7): returns in the ``int`` argument pointer the size of
the next pending datagram in bytes, or 0 when no datagram is pending.
@@ -191,10 +198,12 @@ Other tunables
Per-route rto_min support
CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value
of the RTO timer. This setting can be modified via the 'rto_min' option
- of iproute2; for example:
+ of iproute2; for example::
+
> ip route change 10.0.0.0/24 rto_min 250j dev wlan0
> ip route add 10.0.0.254/32 rto_min 800j dev wlan0
> ip route show dev wlan0
+
CCID-3 also supports the rto_min setting: it is used to define the lower
bound for the expiry of the nofeedback timer. This can be useful on LANs
with very low RTTs (e.g., loopback, Gbit ethernet).
diff --git a/Documentation/networking/dctcp.txt b/Documentation/networking/dctcp.rst
index 13a857753208..4cc8bb2dad50 100644
--- a/Documentation/networking/dctcp.txt
+++ b/Documentation/networking/dctcp.rst
@@ -1,11 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
DCTCP (DataCenter TCP)
-----------------------
+======================
DCTCP is an enhancement to the TCP congestion control algorithm for data
center networks and leverages Explicit Congestion Notification (ECN) in
the data center network to provide multi-bit feedback to the end hosts.
-To enable it on end hosts:
+To enable it on end hosts::
sysctl -w net.ipv4.tcp_congestion_control=dctcp
sysctl -w net.ipv4.tcp_ecn_fallback=0 (optional)
@@ -25,14 +28,19 @@ SIGCOMM/SIGMETRICS papers:
i) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye,
Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan:
- "Data Center TCP (DCTCP)", Data Center Networks session
+
+ "Data Center TCP (DCTCP)", Data Center Networks session"
+
Proc. ACM SIGCOMM, New Delhi, 2010.
+
http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
http://www.sigcomm.org/ccr/papers/2010/October/1851275.1851192
ii) Mohammad Alizadeh, Adel Javanmard, and Balaji Prabhakar:
+
"Analysis of DCTCP: Stability, Convergence, and Fairness"
Proc. ACM SIGMETRICS, San Jose, 2011.
+
http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
IETF informational draft:
diff --git a/Documentation/networking/decnet.txt b/Documentation/networking/decnet.rst
index d192f8b9948b..b8bc11ff8370 100644
--- a/Documentation/networking/decnet.txt
+++ b/Documentation/networking/decnet.rst
@@ -1,26 +1,31 @@
- Linux DECnet Networking Layer Information
- ===========================================
+.. SPDX-License-Identifier: GPL-2.0
-1) Other documentation....
+=========================================
+Linux DECnet Networking Layer Information
+=========================================
- o Project Home Pages
- http://www.chygwyn.com/ - Kernel info
- http://linux-decnet.sourceforge.net/ - Userland tools
- http://www.sourceforge.net/projects/linux-decnet/ - Status page
+1. Other documentation....
+==========================
-2) Configuring the kernel
+ - Project Home Pages
+ - http://www.chygwyn.com/ - Kernel info
+ - http://linux-decnet.sourceforge.net/ - Userland tools
+ - http://www.sourceforge.net/projects/linux-decnet/ - Status page
+
+2. Configuring the kernel
+=========================
Be sure to turn on the following options:
- CONFIG_DECNET (obviously)
- CONFIG_PROC_FS (to see what's going on)
- CONFIG_SYSCTL (for easy configuration)
+ - CONFIG_DECNET (obviously)
+ - CONFIG_PROC_FS (to see what's going on)
+ - CONFIG_SYSCTL (for easy configuration)
if you want to try out router support (not properly debugged yet)
you'll need the following options as well...
- CONFIG_DECNET_ROUTER (to be able to add/delete routes)
- CONFIG_NETFILTER (will be required for the DECnet routing daemon)
+ - CONFIG_DECNET_ROUTER (to be able to add/delete routes)
+ - CONFIG_NETFILTER (will be required for the DECnet routing daemon)
Don't turn on SIOCGIFCONF support for DECnet unless you are really sure
that you need it, in general you won't and it can cause ifconfig to
@@ -29,7 +34,7 @@ malfunction.
Run time configuration has changed slightly from the 2.4 system. If you
want to configure an endnode, then the simplified procedure is as follows:
- o Set the MAC address on your ethernet card before starting _any_ other
+ - Set the MAC address on your ethernet card before starting _any_ other
network protocols.
As soon as your network card is brought into the UP state, DECnet should
@@ -37,7 +42,8 @@ start working. If you need something more complicated or are unsure how
to set the MAC address, see the next section. Also all configurations which
worked with 2.4 will work under 2.5 with no change.
-3) Command line options
+3. Command line options
+=======================
You can set a DECnet address on the kernel command line for compatibility
with the 2.4 configuration procedure, but in general it's not needed any more.
@@ -56,7 +62,7 @@ interface then you won't see any entries in /proc/net/neigh for the local
host until such time as you start a connection. This doesn't affect the
operation of the local communications in any other way though.
-The kernel command line takes options looking like the following:
+The kernel command line takes options looking like the following::
decnet.addr=1,2
@@ -82,7 +88,7 @@ address of the node in order for it to be autoconfigured (and then appear in
FTP sites called dn2ethaddr which can compute the correct ethernet
address to use. The address can be set by ifconfig either before or
at the time the device is brought up. If you are using RedHat you can
-add the line:
+add the line::
MACADDR=AA:00:04:00:03:04
@@ -95,7 +101,7 @@ verify with iproute2).
The default device for routing can be set through the /proc filesystem
by setting /proc/sys/net/decnet/default_device to the
device you want DECnet to route packets out of when no specific route
-is available. Usually this will be eth0, for example:
+is available. Usually this will be eth0, for example::
echo -n "eth0" >/proc/sys/net/decnet/default_device
@@ -106,7 +112,9 @@ confirm that by looking in the default_device file of course.
There is a list of what the other files under /proc/sys/net/decnet/ do
on the kernel patch web site (shown above).
-4) Run time kernel configuration
+4. Run time kernel configuration
+================================
+
This is either done through the sysctl/proc interface (see the kernel web
pages for details on what the various options do) or through the iproute2
@@ -122,20 +130,21 @@ since its the _only_ way to add and delete routes currently. Eventually
there will be a routing daemon to send and receive routing messages for
each interface and update the kernel routing tables accordingly. The
routing daemon will use netfilter to listen to routing packets, and
-rtnetlink to update the kernels routing tables.
+rtnetlink to update the kernels routing tables.
The DECnet raw socket layer has been removed since it was there purely
for use by the routing daemon which will now use netfilter (a much cleaner
and more generic solution) instead.
-5) How can I tell if its working ?
+5. How can I tell if its working?
+=================================
Here is a quick guide of what to look for in order to know if your DECnet
kernel subsystem is working.
- Is the node address set (see /proc/sys/net/decnet/node_address)
- - Is the node of the correct type
- (see /proc/sys/net/decnet/conf/<dev>/forwarding)
+ - Is the node of the correct type
+ (see /proc/sys/net/decnet/conf/<dev>/forwarding)
- Is the Ethernet MAC address of each Ethernet card set to match
the DECnet address. If in doubt use the dn2ethaddr utility available
at the ftp archive.
@@ -160,7 +169,8 @@ kernel subsystem is working.
network, and see if you can obtain the same results.
- At this point you are on your own... :-)
-6) How to send a bug report
+6. How to send a bug report
+===========================
If you've found a bug and want to report it, then there are several things
you can do to help me work out exactly what it is that is wrong. Useful
@@ -175,18 +185,19 @@ information (_most_ of which _is_ _essential_) includes:
- How much data was being transferred ?
- Was the network congested ?
- How can the problem be reproduced ?
- - Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of
+ - Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of
tcpdump don't understand how to dump DECnet properly, so including
the hex listing of the packet contents is _essential_, usually the -x flag.
You may also need to increase the length grabbed with the -s flag. The
-e flag also provides very useful information (ethernet MAC addresses))
-7) MAC FAQ
+7. MAC FAQ
+==========
A quick FAQ on ethernet MAC addresses to explain how Linux and DECnet
-interact and how to get the best performance from your hardware.
+interact and how to get the best performance from your hardware.
-Ethernet cards are designed to normally only pass received network frames
+Ethernet cards are designed to normally only pass received network frames
to a host computer when they are addressed to it, or to the broadcast address.
Linux has an interface which allows the setting of extra addresses for
@@ -197,8 +208,8 @@ significant processor time and bus bandwidth can be used up on a busy
network (see the NAPI documentation for a longer explanation of these
effects).
-DECnet makes use of this interface to allow running DECnet on an ethernet
-card which has already been configured using TCP/IP (presumably using the
+DECnet makes use of this interface to allow running DECnet on an ethernet
+card which has already been configured using TCP/IP (presumably using the
built in MAC address of the card, as usual) and/or to allow multiple DECnet
addresses on each physical interface. If you do this, be aware that if your
ethernet card doesn't support perfect hashing in its MAC address filter
@@ -210,7 +221,8 @@ to gain the best efficiency. Better still is to use a card which supports
NAPI as well.
-8) Mailing list
+8. Mailing list
+===============
If you are keen to get involved in development, or want to ask questions
about configuration, or even just report bugs, then there is a mailing
@@ -218,7 +230,8 @@ list that you can join, details are at:
http://sourceforge.net/mail/?group_id=4993
-9) Legal Info
+9. Legal Info
+=============
The Linux DECnet project team have placed their code under the GPL. The
software is provided "as is" and without warranty express or implied.
diff --git a/Documentation/networking/defza.txt b/Documentation/networking/defza.rst
index 663e4a906751..73c2f793ea26 100644
--- a/Documentation/networking/defza.txt
+++ b/Documentation/networking/defza.rst
@@ -1,4 +1,10 @@
-Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver v.1.1.4.
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================================
+Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver
+=====================================================
+
+:Version: v.1.1.4
DEC FDDIcontroller 700 is DEC's first-generation TURBOchannel FDDI
diff --git a/Documentation/networking/device_drivers/3com/3c509.txt b/Documentation/networking/device_drivers/3com/3c509.rst
index fbf722e15ac3..47f706bacdd9 100644
--- a/Documentation/networking/device_drivers/3com/3c509.txt
+++ b/Documentation/networking/device_drivers/3com/3c509.rst
@@ -1,17 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================================================================
Linux and the 3Com EtherLink III Series Ethercards (driver v1.18c and higher)
-----------------------------------------------------------------------------
+=============================================================================
This file contains the instructions and caveats for v1.18c and higher versions
of the 3c509 driver. You should not use the driver without reading this file.
release 1.0
+
28 February 2002
+
Current maintainer (corrections to):
David Ruggiero <jdr@farfalle.com>
-----------------------------------------------------------------------------
-
-(0) Introduction
+Introduction
+============
The following are notes and information on using the 3Com EtherLink III series
ethercards in Linux. These cards are commonly known by the most widely-used
@@ -21,11 +25,11 @@ be (but sometimes are) confused with the similarly-numbered PCI-bus "3c905"
provided by the module 3c509.c, which has code to support all of the following
models:
- 3c509 (original ISA card)
- 3c509B (later revision of the ISA card; supports full-duplex)
- 3c589 (PCMCIA)
- 3c589B (later revision of the 3c589; supports full-duplex)
- 3c579 (EISA)
+ - 3c509 (original ISA card)
+ - 3c509B (later revision of the ISA card; supports full-duplex)
+ - 3c589 (PCMCIA)
+ - 3c589B (later revision of the 3c589; supports full-duplex)
+ - 3c579 (EISA)
Large portions of this documentation were heavily borrowed from the guide
written the original author of the 3c509 driver, Donald Becker. The master
@@ -33,32 +37,34 @@ copy of that document, which contains notes on older versions of the driver,
currently resides on Scyld web server: http://www.scyld.com/.
-(1) Special Driver Features
+Special Driver Features
+=======================
Overriding card settings
The driver allows boot- or load-time overriding of the card's detected IOADDR,
IRQ, and transceiver settings, although this capability shouldn't generally be
needed except to enable full-duplex mode (see below). An example of the syntax
-for LILO parameters for doing this:
+for LILO parameters for doing this::
- ether=10,0x310,3,0x3c509,eth0
+ ether=10,0x310,3,0x3c509,eth0
This configures the first found 3c509 card for IRQ 10, base I/O 0x310, and
transceiver type 3 (10base2). The flag "0x3c509" must be set to avoid conflicts
with other card types when overriding the I/O address. When the driver is
loaded as a module, only the IRQ may be overridden. For example,
setting two cards to IRQ10 and IRQ11 is done by using the irq module
-option:
+option::
options 3c509 irq=10,11
-(2) Full-duplex mode
+Full-duplex mode
+================
The v1.18c driver added support for the 3c509B's full-duplex capabilities.
In order to enable and successfully use full-duplex mode, three conditions
-must be met:
+must be met:
(a) You must have a Etherlink III card model whose hardware supports full-
duplex operations. Currently, the only members of the 3c509 family that are
@@ -78,27 +84,32 @@ duplex-capable Ethernet switch (*not* a hub), or a full-duplex-capable NIC on
another system that's connected directly to the 3c509B via a crossover cable.
Full-duplex mode can be enabled using 'ethtool'.
-
-/////Extremely important caution concerning full-duplex mode/////
-Understand that the 3c509B's hardware's full-duplex support is much more
-limited than that provide by more modern network interface cards. Although
-at the physical layer of the network it fully supports full-duplex operation,
-the card was designed before the current Ethernet auto-negotiation (N-way)
-spec was written. This means that the 3c509B family ***cannot and will not
-auto-negotiate a full-duplex connection with its link partner under any
-circumstances, no matter how it is initialized***. If the full-duplex mode
-of the 3c509B is enabled, its link partner will very likely need to be
-independently _forced_ into full-duplex mode as well; otherwise various nasty
-failures will occur - at the very least, you'll see massive numbers of packet
-collisions. This is one of very rare circumstances where disabling auto-
-negotiation and forcing the duplex mode of a network interface card or switch
-would ever be necessary or desirable.
-
-
-(3) Available Transceiver Types
+
+.. warning::
+
+ Extremely important caution concerning full-duplex mode
+
+ Understand that the 3c509B's hardware's full-duplex support is much more
+ limited than that provide by more modern network interface cards. Although
+ at the physical layer of the network it fully supports full-duplex operation,
+ the card was designed before the current Ethernet auto-negotiation (N-way)
+ spec was written. This means that the 3c509B family ***cannot and will not
+ auto-negotiate a full-duplex connection with its link partner under any
+ circumstances, no matter how it is initialized***. If the full-duplex mode
+ of the 3c509B is enabled, its link partner will very likely need to be
+ independently _forced_ into full-duplex mode as well; otherwise various nasty
+ failures will occur - at the very least, you'll see massive numbers of packet
+ collisions. This is one of very rare circumstances where disabling auto-
+ negotiation and forcing the duplex mode of a network interface card or switch
+ would ever be necessary or desirable.
+
+
+Available Transceiver Types
+===========================
For versions of the driver v1.18c and above, the available transceiver types are:
-
+
+== =========================================================================
0 transceiver type from EEPROM config (normally 10baseT); force half-duplex
1 AUI (thick-net / DB15 connector)
2 (undefined)
@@ -106,6 +117,7 @@ For versions of the driver v1.18c and above, the available transceiver types are
4 10baseT (RJ-45 connector); force half-duplex mode
8 transceiver type and duplex mode taken from card's EEPROM config settings
12 10baseT (RJ-45 connector); force full-duplex mode
+== =========================================================================
Prior to driver version 1.18c, only transceiver codes 0-4 were supported. Note
that the new transceiver codes 8 and 12 are the *only* ones that will enable
@@ -116,26 +128,30 @@ it must always be explicitly enabled via one of these code in order to be
activated.
The transceiver type can be changed using 'ethtool'.
-
-(4a) Interpretation of error messages and common problems
+
+Interpretation of error messages and common problems
+----------------------------------------------------
Error Messages
+^^^^^^^^^^^^^^
-eth0: Infinite loop in interrupt, status 2011.
+eth0: Infinite loop in interrupt, status 2011.
These are "mostly harmless" message indicating that the driver had too much
work during that interrupt cycle. With a status of 0x2011 you are receiving
packets faster than they can be removed from the card. This should be rare
or impossible in normal operation. Possible causes of this error report are:
-
+
- a "green" mode enabled that slows the processor down when there is no
- keyboard activity.
+ keyboard activity.
- some other device or device driver hogging the bus or disabling interrupts.
Check /proc/interrupts for excessive interrupt counts. The timer tick
- interrupt should always be incrementing faster than the others.
+ interrupt should always be incrementing faster than the others.
+
+No received packets
+^^^^^^^^^^^^^^^^^^^
-No received packets
If a 3c509, 3c562 or 3c589 can successfully transmit packets, but never
receives packets (as reported by /proc/net/dev or 'ifconfig') you likely
have an interrupt line problem. Check /proc/interrupts to verify that the
@@ -146,26 +162,37 @@ or IRQ5, and the easiest solution is to move the 3c509 to a different
interrupt line. If the device is receiving packets but 'ping' doesn't work,
you have a routing problem.
-Tx Carrier Errors Reported in /proc/net/dev
+Tx Carrier Errors Reported in /proc/net/dev
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
If an EtherLink III appears to transmit packets, but the "Tx carrier errors"
field in /proc/net/dev increments as quickly as the Tx packet count, you
-likely have an unterminated network or the incorrect media transceiver selected.
+likely have an unterminated network or the incorrect media transceiver selected.
+
+3c509B card is not detected on machines with an ISA PnP BIOS.
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-3c509B card is not detected on machines with an ISA PnP BIOS.
While the updated driver works with most PnP BIOS programs, it does not work
with all. This can be fixed by disabling PnP support using the 3Com-supplied
-setup program.
+setup program.
+
+3c509 card is not detected on overclocked machines
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-3c509 card is not detected on overclocked machines
Increase the delay time in id_read_eeprom() from the current value, 500,
-to an absurdly high value, such as 5000.
+to an absurdly high value, such as 5000.
+
+Decoding Status and Error Messages
+----------------------------------
-(4b) Decoding Status and Error Messages
-The bits in the main status register are:
+The bits in the main status register are:
+===== ======================================
value description
+===== ======================================
0x01 Interrupt latch
0x02 Tx overrun, or Rx underrun
0x04 Tx complete
@@ -174,30 +201,38 @@ value description
0x20 A Rx packet has started to arrive
0x40 The driver has requested an interrupt
0x80 Statistics counter nearly full
+===== ======================================
-The bits in the transmit (Tx) status word are:
+The bits in the transmit (Tx) status word are:
-value description
-0x02 Out-of-window collision.
-0x04 Status stack overflow (normally impossible).
-0x08 16 collisions.
-0x10 Tx underrun (not enough PCI bus bandwidth).
-0x20 Tx jabber.
-0x40 Tx interrupt requested.
-0x80 Status is valid (this should always be set).
+===== ============================================
+value description
+===== ============================================
+0x02 Out-of-window collision.
+0x04 Status stack overflow (normally impossible).
+0x08 16 collisions.
+0x10 Tx underrun (not enough PCI bus bandwidth).
+0x20 Tx jabber.
+0x40 Tx interrupt requested.
+0x80 Status is valid (this should always be set).
+===== ============================================
-When a transmit error occurs the driver produces a status message such as
+When a transmit error occurs the driver produces a status message such as::
eth0: Transmit error, Tx status register 82
The two values typically seen here are:
-0x82
+0x82
+^^^^
+
Out of window collision. This typically occurs when some other Ethernet
-host is incorrectly set to full duplex on a half duplex network.
+host is incorrectly set to full duplex on a half duplex network.
+
+0x88
+^^^^
-0x88
16 collisions. This typically occurs when the network is exceptionally busy
or when another host doesn't correctly back off after a collision. If this
error is mixed with 0x82 errors it is the result of a host incorrectly set
@@ -207,7 +242,8 @@ Both of these errors are the result of network problems that should be
corrected. They do not represent driver malfunction.
-(5) Revision history (this file)
+Revision history (this file)
+============================
28Feb02 v1.0 DR New; major portions based on Becker original 3c509 docs
diff --git a/Documentation/networking/device_drivers/3com/vortex.txt b/Documentation/networking/device_drivers/3com/vortex.rst
index 587f3fcfbcae..800add5be338 100644
--- a/Documentation/networking/device_drivers/3com/vortex.txt
+++ b/Documentation/networking/device_drivers/3com/vortex.rst
@@ -1,5 +1,13 @@
-Documentation/networking/device_drivers/3com/vortex.txt
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+3Com Vortex device driver
+=========================
+
+Documentation/networking/device_drivers/3com/vortex.rst
+
Andrew Morton
+
30 April 2000
@@ -8,12 +16,12 @@ driver for Linux, 3c59x.c.
The driver was written by Donald Becker <becker@scyld.com>
-Don is no longer the prime maintainer of this version of the driver.
+Don is no longer the prime maintainer of this version of the driver.
Please report problems to one or more of:
- Andrew Morton
- Netdev mailing list <netdev@vger.kernel.org>
- Linux kernel mailing list <linux-kernel@vger.kernel.org>
+- Andrew Morton
+- Netdev mailing list <netdev@vger.kernel.org>
+- Linux kernel mailing list <linux-kernel@vger.kernel.org>
Please note the 'Reporting and Diagnosing Problems' section at the end
of this file.
@@ -24,58 +32,58 @@ Since kernel 2.3.99-pre6, this driver incorporates the support for the
This driver supports the following hardware:
- 3c590 Vortex 10Mbps
- 3c592 EISA 10Mbps Demon/Vortex
- 3c597 EISA Fast Demon/Vortex
- 3c595 Vortex 100baseTx
- 3c595 Vortex 100baseT4
- 3c595 Vortex 100base-MII
- 3c900 Boomerang 10baseT
- 3c900 Boomerang 10Mbps Combo
- 3c900 Cyclone 10Mbps TPO
- 3c900 Cyclone 10Mbps Combo
- 3c900 Cyclone 10Mbps TPC
- 3c900B-FL Cyclone 10base-FL
- 3c905 Boomerang 100baseTx
- 3c905 Boomerang 100baseT4
- 3c905B Cyclone 100baseTx
- 3c905B Cyclone 10/100/BNC
- 3c905B-FX Cyclone 100baseFx
- 3c905C Tornado
- 3c920B-EMB-WNM (ATI Radeon 9100 IGP)
- 3c980 Cyclone
- 3c980C Python-T
- 3cSOHO100-TX Hurricane
- 3c555 Laptop Hurricane
- 3c556 Laptop Tornado
- 3c556B Laptop Hurricane
- 3c575 [Megahertz] 10/100 LAN CardBus
- 3c575 Boomerang CardBus
- 3CCFE575BT Cyclone CardBus
- 3CCFE575CT Tornado CardBus
- 3CCFE656 Cyclone CardBus
- 3CCFEM656B Cyclone+Winmodem CardBus
- 3CXFEM656C Tornado+Winmodem CardBus
- 3c450 HomePNA Tornado
- 3c920 Tornado
- 3c982 Hydra Dual Port A
- 3c982 Hydra Dual Port B
- 3c905B-T4
- 3c920B-EMB-WNM Tornado
+ - 3c590 Vortex 10Mbps
+ - 3c592 EISA 10Mbps Demon/Vortex
+ - 3c597 EISA Fast Demon/Vortex
+ - 3c595 Vortex 100baseTx
+ - 3c595 Vortex 100baseT4
+ - 3c595 Vortex 100base-MII
+ - 3c900 Boomerang 10baseT
+ - 3c900 Boomerang 10Mbps Combo
+ - 3c900 Cyclone 10Mbps TPO
+ - 3c900 Cyclone 10Mbps Combo
+ - 3c900 Cyclone 10Mbps TPC
+ - 3c900B-FL Cyclone 10base-FL
+ - 3c905 Boomerang 100baseTx
+ - 3c905 Boomerang 100baseT4
+ - 3c905B Cyclone 100baseTx
+ - 3c905B Cyclone 10/100/BNC
+ - 3c905B-FX Cyclone 100baseFx
+ - 3c905C Tornado
+ - 3c920B-EMB-WNM (ATI Radeon 9100 IGP)
+ - 3c980 Cyclone
+ - 3c980C Python-T
+ - 3cSOHO100-TX Hurricane
+ - 3c555 Laptop Hurricane
+ - 3c556 Laptop Tornado
+ - 3c556B Laptop Hurricane
+ - 3c575 [Megahertz] 10/100 LAN CardBus
+ - 3c575 Boomerang CardBus
+ - 3CCFE575BT Cyclone CardBus
+ - 3CCFE575CT Tornado CardBus
+ - 3CCFE656 Cyclone CardBus
+ - 3CCFEM656B Cyclone+Winmodem CardBus
+ - 3CXFEM656C Tornado+Winmodem CardBus
+ - 3c450 HomePNA Tornado
+ - 3c920 Tornado
+ - 3c982 Hydra Dual Port A
+ - 3c982 Hydra Dual Port B
+ - 3c905B-T4
+ - 3c920B-EMB-WNM Tornado
Module parameters
=================
There are several parameters which may be provided to the driver when
-its module is loaded. These are usually placed in /etc/modprobe.d/*.conf
-configuration files. Example:
+its module is loaded. These are usually placed in ``/etc/modprobe.d/*.conf``
+configuration files. Example::
-options 3c59x debug=3 rx_copybreak=300
+ options 3c59x debug=3 rx_copybreak=300
If you are using the PCMCIA tools (cardmgr) then the options may be
-placed in /etc/pcmcia/config.opts:
+placed in /etc/pcmcia/config.opts::
-module "3c59x" opts "debug=3 rx_copybreak=300"
+ module "3c59x" opts "debug=3 rx_copybreak=300"
The supported parameters are:
@@ -89,7 +97,7 @@ options=N1,N2,N3,...
Each number in the list provides an option to the corresponding
network card. So if you have two 3c905's and you wish to provide
- them with option 0x204 you would use:
+ them with option 0x204 you would use::
options=0x204,0x204
@@ -97,6 +105,8 @@ options=N1,N2,N3,...
have the following meanings:
Possible media type settings
+
+ == =================================
0 10baseT
1 10Mbs AUI
2 undefined
@@ -108,17 +118,20 @@ options=N1,N2,N3,...
8 Autonegotiate
9 External MII
10 Use default setting from EEPROM
+ == =================================
When generating a value for the 'options' setting, the above media
selection values may be OR'ed (or added to) the following:
+ ====== =============================================
0x8000 Set driver debugging level to 7
0x4000 Set driver debugging level to 2
0x0400 Enable Wake-on-LAN
0x0200 Force full duplex mode.
0x0010 Bus-master enable bit (Old Vortex cards only)
+ ====== =============================================
- For example:
+ For example::
insmod 3c59x options=0x204
@@ -127,14 +140,14 @@ options=N1,N2,N3,...
global_options=N
- Sets the `options' parameter for all 3c59x NICs in the machine.
- Entries in the `options' array above will override any setting of
+ Sets the ``options`` parameter for all 3c59x NICs in the machine.
+ Entries in the ``options`` array above will override any setting of
this.
full_duplex=N1,N2,N3...
Similar to bit 9 of 'options'. Forces the corresponding card into
- full-duplex mode. Please use this in preference to the `options'
+ full-duplex mode. Please use this in preference to the ``options``
parameter.
In fact, please don't use this at all! You're better off getting
@@ -143,13 +156,13 @@ full_duplex=N1,N2,N3...
global_full_duplex=N1
Sets full duplex mode for all 3c59x NICs in the machine. Entries
- in the `full_duplex' array above will override any setting of this.
+ in the ``full_duplex`` array above will override any setting of this.
flow_ctrl=N1,N2,N3...
Use 802.3x MAC-layer flow control. The 3com cards only support the
PAUSE command, which means that they will stop sending packets for a
- short period if they receive a PAUSE frame from the link partner.
+ short period if they receive a PAUSE frame from the link partner.
The driver only allows flow control on a link which is operating in
full duplex mode.
@@ -170,14 +183,14 @@ rx_copybreak=M
This is a speed/space tradeoff.
- The value of rx_copybreak is used to decide when to make the copy.
- If the packet size is less than rx_copybreak, the packet is copied.
+ The value of rx_copybreak is used to decide when to make the copy.
+ If the packet size is less than rx_copybreak, the packet is copied.
The default value for rx_copybreak is 200 bytes.
max_interrupt_work=N
The driver's interrupt service routine can handle many receive and
- transmit packets in a single invocation. It does this in a loop.
+ transmit packets in a single invocation. It does this in a loop.
The value of max_interrupt_work governs how many times the interrupt
service routine will loop. The default value is 32 loops. If this
is exceeded the interrupt service routine gives up and generates a
@@ -186,7 +199,7 @@ max_interrupt_work=N
hw_checksums=N1,N2,N3,...
Recent 3com NICs are able to generate IPv4, TCP and UDP checksums
- in hardware. Linux has used the Rx checksumming for a long time.
+ in hardware. Linux has used the Rx checksumming for a long time.
The "zero copy" patch which is planned for the 2.4 kernel series
allows you to make use of the NIC's DMA scatter/gather and transmit
checksumming as well.
@@ -196,11 +209,11 @@ hw_checksums=N1,N2,N3,...
This module parameter has been provided so you can override this
decision. If you think that Tx checksums are causing a problem, you
- may disable the feature with `hw_checksums=0'.
+ may disable the feature with ``hw_checksums=0``.
If you think your NIC should be performing Tx checksumming and the
driver isn't enabling it, you can force the use of hardware Tx
- checksumming with `hw_checksums=1'.
+ checksumming with ``hw_checksums=1``.
The driver drops a message in the logfiles to indicate whether or
not it is using hardware scatter/gather and hardware Tx checksums.
@@ -210,8 +223,8 @@ hw_checksums=N1,N2,N3,...
decrease in throughput for send(). There is no effect upon receive
efficiency.
-compaq_ioaddr=N
-compaq_irq=N
+compaq_ioaddr=N,
+compaq_irq=N,
compaq_device_id=N
"Variables to work-around the Compaq PCI BIOS32 problem"....
@@ -219,7 +232,7 @@ compaq_device_id=N
watchdog=N
Sets the time duration (in milliseconds) after which the kernel
- decides that the transmitter has become stuck and needs to be reset.
+ decides that the transmitter has become stuck and needs to be reset.
This is mainly for debugging purposes, although it may be advantageous
to increase this value on LANs which have very high collision rates.
The default value is 5000 (5.0 seconds).
@@ -227,7 +240,7 @@ watchdog=N
enable_wol=N1,N2,N3,...
Enable Wake-on-LAN support for the relevant interface. Donald
- Becker's `ether-wake' application may be used to wake suspended
+ Becker's ``ether-wake`` application may be used to wake suspended
machines.
Also enables the NIC's power management support.
@@ -235,7 +248,7 @@ enable_wol=N1,N2,N3,...
global_enable_wol=N
Sets enable_wol mode for all 3c59x NICs in the machine. Entries in
- the `enable_wol' array above will override any setting of this.
+ the ``enable_wol`` array above will override any setting of this.
Media selection
---------------
@@ -325,12 +338,12 @@ Autonegotiation notes
Cisco switches (Jeff Busch <jbusch@deja.com>)
- My "standard config" for ports to which PC's/servers connect directly:
+ My "standard config" for ports to which PC's/servers connect directly::
- interface FastEthernet0/N
- description machinename
- load-interval 30
- spanning-tree portfast
+ interface FastEthernet0/N
+ description machinename
+ load-interval 30
+ spanning-tree portfast
If autonegotiation is a problem, you may need to specify "speed
100" and "duplex full" as well (or "speed 10" and "duplex half").
@@ -368,9 +381,9 @@ steps you should take:
But for most problems it is useful to provide the following:
- o Kernel version, driver version
+ - Kernel version, driver version
- o A copy of the banner message which the driver generates when
+ - A copy of the banner message which the driver generates when
it is initialised. For example:
eth0: 3Com PCI 3c905C Tornado at 0xa400, 00:50:da:6a:88:f0, IRQ 19
@@ -378,68 +391,68 @@ steps you should take:
MII transceiver found at address 24, status 782d.
Enabling bus-master transmits and whole-frame receives.
- NOTE: You must provide the `debug=2' modprobe option to generate
- a full detection message. Please do this:
+ NOTE: You must provide the ``debug=2`` modprobe option to generate
+ a full detection message. Please do this::
modprobe 3c59x debug=2
- o If it is a PCI device, the relevant output from 'lspci -vx', eg:
-
- 00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
- Subsystem: 3Com Corporation: Unknown device 9200
- Flags: bus master, medium devsel, latency 32, IRQ 19
- I/O ports at a400 [size=128]
- Memory at db000000 (32-bit, non-prefetchable) [size=128]
- Expansion ROM at <unassigned> [disabled] [size=128K]
- Capabilities: [dc] Power Management version 2
- 00: b7 10 00 92 07 00 10 02 74 00 00 02 08 20 00 00
- 10: 01 a4 00 00 00 00 00 db 00 00 00 00 00 00 00 00
- 20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 00 10
- 30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 0a 0a
-
- o A description of the environment: 10baseT? 100baseT?
+ - If it is a PCI device, the relevant output from 'lspci -vx', eg::
+
+ 00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
+ Subsystem: 3Com Corporation: Unknown device 9200
+ Flags: bus master, medium devsel, latency 32, IRQ 19
+ I/O ports at a400 [size=128]
+ Memory at db000000 (32-bit, non-prefetchable) [size=128]
+ Expansion ROM at <unassigned> [disabled] [size=128K]
+ Capabilities: [dc] Power Management version 2
+ 00: b7 10 00 92 07 00 10 02 74 00 00 02 08 20 00 00
+ 10: 01 a4 00 00 00 00 00 db 00 00 00 00 00 00 00 00
+ 20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 00 10
+ 30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 0a 0a
+
+ - A description of the environment: 10baseT? 100baseT?
full/half duplex? switched or hubbed?
- o Any additional module parameters which you may be providing to the driver.
+ - Any additional module parameters which you may be providing to the driver.
- o Any kernel logs which are produced. The more the merrier.
+ - Any kernel logs which are produced. The more the merrier.
If this is a large file and you are sending your report to a
mailing list, mention that you have the logfile, but don't send
it. If you're reporting direct to the maintainer then just send
it.
To ensure that all kernel logs are available, add the
- following line to /etc/syslog.conf:
+ following line to /etc/syslog.conf::
- kern.* /var/log/messages
+ kern.* /var/log/messages
- Then restart syslogd with:
+ Then restart syslogd with::
- /etc/rc.d/init.d/syslog restart
+ /etc/rc.d/init.d/syslog restart
(The above may vary, depending upon which Linux distribution you use).
- o If your problem is reproducible then that's great. Try the
+ - If your problem is reproducible then that's great. Try the
following:
1) Increase the debug level. Usually this is done via:
- a) modprobe driver debug=7
- b) In /etc/modprobe.d/driver.conf:
- options driver debug=7
+ a) modprobe driver debug=7
+ b) In /etc/modprobe.d/driver.conf:
+ options driver debug=7
2) Recreate the problem with the higher debug level,
- send all logs to the maintainer.
+ send all logs to the maintainer.
3) Download you card's diagnostic tool from Donald
- Becker's website <http://www.scyld.com/ethercard_diag.html>.
- Download mii-diag.c as well. Build these.
+ Becker's website <http://www.scyld.com/ethercard_diag.html>.
+ Download mii-diag.c as well. Build these.
- a) Run 'vortex-diag -aaee' and 'mii-diag -v' when the card is
- working correctly. Save the output.
+ a) Run 'vortex-diag -aaee' and 'mii-diag -v' when the card is
+ working correctly. Save the output.
- b) Run the above commands when the card is malfunctioning. Send
- both sets of output.
+ b) Run the above commands when the card is malfunctioning. Send
+ both sets of output.
Finally, please be patient and be prepared to do some work. You may
end up working on this problem for a week or more as the maintainer
diff --git a/Documentation/networking/device_drivers/amazon/ena.txt b/Documentation/networking/device_drivers/amazon/ena.rst
index 1bb55c7b604c..11af6388ea87 100644
--- a/Documentation/networking/device_drivers/amazon/ena.txt
+++ b/Documentation/networking/device_drivers/amazon/ena.rst
@@ -1,8 +1,12 @@
-Linux kernel driver for Elastic Network Adapter (ENA) family:
-=============================================================
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================================
+Linux kernel driver for Elastic Network Adapter (ENA) family
+============================================================
+
+Overview
+========
-Overview:
-=========
ENA is a networking interface designed to make good use of modern CPU
features and system architectures.
@@ -35,32 +39,40 @@ debug logs.
Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds.
-Supported PCI vendor ID/device IDs:
+Supported PCI vendor ID/device IDs
+==================================
+
+========= =======================
+1d0f:0ec2 ENA PF
+1d0f:1ec2 ENA PF with LLQ support
+1d0f:ec20 ENA VF
+1d0f:ec21 ENA VF with LLQ support
+========= =======================
+
+ENA Source Code Directory Structure
===================================
-1d0f:0ec2 - ENA PF
-1d0f:1ec2 - ENA PF with LLQ support
-1d0f:ec20 - ENA VF
-1d0f:ec21 - ENA VF with LLQ support
-
-ENA Source Code Directory Structure:
-====================================
-ena_com.[ch] - Management communication layer. This layer is
- responsible for the handling all the management
- (admin) communication between the device and the
- driver.
-ena_eth_com.[ch] - Tx/Rx data path.
-ena_admin_defs.h - Definition of ENA management interface.
-ena_eth_io_defs.h - Definition of ENA data path interface.
-ena_common_defs.h - Common definitions for ena_com layer.
-ena_regs_defs.h - Definition of ENA PCI memory-mapped (MMIO) registers.
-ena_netdev.[ch] - Main Linux kernel driver.
-ena_syfsfs.[ch] - Sysfs files.
-ena_ethtool.c - ethtool callbacks.
-ena_pci_id_tbl.h - Supported device IDs.
+
+================= ======================================================
+ena_com.[ch] Management communication layer. This layer is
+ responsible for the handling all the management
+ (admin) communication between the device and the
+ driver.
+ena_eth_com.[ch] Tx/Rx data path.
+ena_admin_defs.h Definition of ENA management interface.
+ena_eth_io_defs.h Definition of ENA data path interface.
+ena_common_defs.h Common definitions for ena_com layer.
+ena_regs_defs.h Definition of ENA PCI memory-mapped (MMIO) registers.
+ena_netdev.[ch] Main Linux kernel driver.
+ena_syfsfs.[ch] Sysfs files.
+ena_ethtool.c ethtool callbacks.
+ena_pci_id_tbl.h Supported device IDs.
+================= ======================================================
Management Interface:
=====================
+
ENA management interface is exposed by means of:
+
- PCIe Configuration Space
- Device Registers
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
@@ -78,6 +90,7 @@ vendor-specific extensions. Most of the management operations are
framed in a generic Get/Set feature command.
The following admin queue commands are supported:
+
- Create I/O submission queue
- Create I/O completion queue
- Destroy I/O submission queue
@@ -96,12 +109,16 @@ be reported using ACQ. AENQ events are subdivided into groups. Each
group may have multiple syndromes, as shown below
The events are:
+
+ ==================== ===============
Group Syndrome
- Link state change - X -
- Fatal error - X -
+ ==================== ===============
+ Link state change **X**
+ Fatal error **X**
Notification Suspend traffic
Notification Resume traffic
- Keep-Alive - X -
+ Keep-Alive **X**
+ ==================== ===============
ACQ and AENQ share the same MSI-X vector.
@@ -113,8 +130,8 @@ the device every second. The driver re-arms the WD upon reception of a
Keep-Alive event. A missed Keep-Alive event causes the WD handler to
fire.
-Data Path Interface:
-====================
+Data Path Interface
+===================
I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
SQ correspondingly). Each SQ has a completion queue (CQ) associated
with it.
@@ -123,11 +140,15 @@ The SQs and CQs are implemented as descriptor rings in contiguous
physical memory.
The ENA driver supports two Queue Operation modes for Tx SQs:
+
- Regular mode
+
* In this mode the Tx SQs reside in the host's memory. The ENA
device fetches the ENA Tx descriptors and packet data from host
memory.
+
- Low Latency Queue (LLQ) mode or "push-mode".
+
* In this mode the driver pushes the transmit descriptors and the
first 128 bytes of the packet directly to the ENA device memory
space. The rest of the packet payload is fetched by the
@@ -142,6 +163,7 @@ Note: Not all ENA devices support LLQ, and this feature is negotiated
The driver supports multi-queue for both Tx and Rx. This has various
benefits:
+
- Reduced CPU/thread/process contention on a given Ethernet interface.
- Cache miss rate on completion is reduced, particularly for data
cache lines that hold the sk_buff structures.
@@ -151,8 +173,8 @@ benefits:
packet is running.
- In hardware interrupt re-direction.
-Interrupt Modes:
-================
+Interrupt Modes
+===============
The driver assigns a single MSI-X vector per queue pair (for both Tx
and Rx directions). The driver assigns an additional dedicated MSI-X vector
for management (for ACQ and AENQ).
@@ -163,9 +185,12 @@ removed. I/O queue interrupt registration is performed when the Linux
interface of the adapter is opened, and it is de-registered when the
interface is closed.
-The management interrupt is named:
+The management interrupt is named::
+
ena-mgmnt@pci:<PCI domain:bus:slot.function>
-and for each queue pair, an interrupt is named:
+
+and for each queue pair, an interrupt is named::
+
<interface name>-Tx-Rx-<queue index>
The ENA device operates in auto-mask and auto-clear interrupt
@@ -173,8 +198,8 @@ modes. That is, once MSI-X is delivered to the host, its Cause bit is
automatically cleared and the interrupt is masked. The interrupt is
unmasked by the driver after NAPI processing is complete.
-Interrupt Moderation:
-=====================
+Interrupt Moderation
+====================
ENA driver and device can operate in conventional or adaptive interrupt
moderation mode.
@@ -202,45 +227,46 @@ delay value to each level.
The user can enable/disable adaptive moderation, modify the interrupt
delay table and restore its default values through sysfs.
-RX copybreak:
-=============
+RX copybreak
+============
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
and can be configured by the ETHTOOL_STUNABLE command of the
SIOCETHTOOL ioctl.
-SKB:
-====
+SKB
+===
The driver-allocated SKB for frames received from Rx handling using
NAPI context. The allocation method depends on the size of the packet.
If the frame length is larger than rx_copybreak, napi_get_frags()
is used, otherwise netdev_alloc_skb_ip_align() is used, the buffer
content is copied (by CPU) to the SKB, and the buffer is recycled.
-Statistics:
-===========
+Statistics
+==========
The user can obtain ENA device and driver statistics using ethtool.
The driver can collect regular or extended statistics (including
per-queue stats) from the device.
In addition the driver logs the stats to syslog upon device reset.
-MTU:
-====
+MTU
+===
The driver supports an arbitrarily large MTU with a maximum that is
negotiated with the device. The driver configures MTU using the
SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
via ip(8) and similar legacy tools.
-Stateless Offloads:
-===================
+Stateless Offloads
+==================
The ENA driver supports:
+
- TSO over IPv4/IPv6
- TSO with ECN
- IPv4 header checksum offload
- TCP/UDP over IPv4/IPv6 checksum offloads
-RSS:
-====
+RSS
+===
- The ENA device supports RSS that allows flexible Rx traffic
steering.
- Toeplitz and CRC32 hash functions are supported.
@@ -255,11 +281,13 @@ RSS:
- The user can provide a hash key, hash function, and configure the
indirection table through ethtool(8).
-DATA PATH:
-==========
-Tx:
----
+DATA PATH
+=========
+Tx
+--
+
end_start_xmit() is called by the stack. This function does the following:
+
- Maps data buffers (skb->data and frags).
- Populates ena_buf for the push buffer (if the driver and device are
in push mode.)
@@ -271,8 +299,10 @@ end_start_xmit() is called by the stack. This function does the following:
- Calls ena_com_prepare_tx(), an ENA communication layer that converts
the ena_bufs to ENA descriptors (and adds meta ENA descriptors as
needed.)
+
* This function also copies the ENA descriptors and the push buffer
to the Device memory space (if in push mode.)
+
- Writes doorbell to the ENA device.
- When the ENA device finishes sending the packet, a completion
interrupt is raised.
@@ -280,14 +310,16 @@ end_start_xmit() is called by the stack. This function does the following:
- The ena_clean_tx_irq() function is called. This function handles the
completion descriptors generated by the ENA, with a single
completion descriptor per completed packet.
+
* req_id is retrieved from the completion descriptor. The tx_info of
the packet is retrieved via the req_id. The data buffers are
unmapped and req_id is returned to the empty req_id ring.
* The function stops when the completion descriptors are completed or
the budget is reached.
-Rx:
----
+Rx
+--
+
- When a packet is received from the ENA device.
- The interrupt handler schedules NAPI.
- The ena_clean_rx_irq() function is called. This function calls
@@ -296,13 +328,17 @@ Rx:
no new packet is found.
- Then it calls the ena_clean_rx_irq() function.
- ena_eth_rx_skb() checks packet length:
+
* If the packet is small (len < rx_copybreak), the driver allocates
a SKB for the new packet, and copies the packet payload into the
SKB data buffer.
+
- In this way the original data buffer is not passed to the stack
and is reused for future Rx packets.
+
* Otherwise the function unmaps the Rx buffer, then allocates the
new SKB structure and hooks the Rx buffer to the SKB frags.
+
- The new SKB is updated with the necessary information (protocol,
checksum hw verify result, etc.), and then passed to the network
stack, using the NAPI interface function napi_gro_receive().
diff --git a/Documentation/networking/device_drivers/aquantia/atlantic.txt b/Documentation/networking/device_drivers/aquantia/atlantic.rst
index 2013fcedc2da..595ddef1c8b3 100644
--- a/Documentation/networking/device_drivers/aquantia/atlantic.txt
+++ b/Documentation/networking/device_drivers/aquantia/atlantic.rst
@@ -1,83 +1,96 @@
-Marvell(Aquantia) AQtion Driver for the aQuantia Multi-Gigabit PCI Express
-Family of Ethernet Adapters
-=============================================================================
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
-Contents
-========
+===============================
+Marvell(Aquantia) AQtion Driver
+===============================
-- Identifying Your Adapter
-- Configuration
-- Supported ethtool options
-- Command Line Parameters
-- Config file parameters
-- Support
-- License
+For the aQuantia Multi-Gigabit PCI Express Family of Ethernet Adapters
+
+.. Contents
+
+ - Identifying Your Adapter
+ - Configuration
+ - Supported ethtool options
+ - Command Line Parameters
+ - Config file parameters
+ - Support
+ - License
Identifying Your Adapter
========================
-The driver in this release is compatible with AQC-100, AQC-107, AQC-108 based ethernet adapters.
+The driver in this release is compatible with AQC-100, AQC-107, AQC-108
+based ethernet adapters.
SFP+ Devices (for AQC-100 based adapters)
-----------------------------------
+-----------------------------------------
-This release tested with passive Direct Attach Cables (DAC) and SFP+/LC Optical Transceiver.
+This release tested with passive Direct Attach Cables (DAC) and SFP+/LC
+Optical Transceiver.
Configuration
-=========================
- Viewing Link Messages
- ---------------------
+=============
+
+Viewing Link Messages
+---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
- your console, set dmesg to eight by entering the following:
+ your console, set dmesg to eight by entering the following::
dmesg -n 8
- NOTE: This setting is not saved across reboots.
+ .. note::
- Jumbo Frames
- ------------
+ This setting is not saved across reboots.
+
+Jumbo Frames
+------------
The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
enabled by changing the MTU to a value larger than the default of 1500.
The maximum value for the MTU is 16000. Use the `ip` command to
- increase the MTU size. For example:
+ increase the MTU size. For example::
- ip link set mtu 16000 dev enp1s0
+ ip link set mtu 16000 dev enp1s0
- ethtool
- -------
+ethtool
+-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest
ethtool version is required for this functionality.
- NAPI
- ----
+NAPI
+----
NAPI (Rx polling mode) is supported in the atlantic driver.
Supported ethtool options
-============================
- Viewing adapter settings
- ---------------------
- ethtool <ethX>
+=========================
+
+Viewing adapter settings
+------------------------
+
+ ::
- Output example:
+ ethtool <ethX>
+
+ Output example::
Settings for enp1s0:
Supported ports: [ TP ]
Supported link modes: 100baseT/Full
- 1000baseT/Full
- 10000baseT/Full
- 2500baseT/Full
- 5000baseT/Full
+ 1000baseT/Full
+ 10000baseT/Full
+ 2500baseT/Full
+ 5000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 100baseT/Full
- 1000baseT/Full
- 10000baseT/Full
- 2500baseT/Full
- 5000baseT/Full
+ 1000baseT/Full
+ 10000baseT/Full
+ 2500baseT/Full
+ 5000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
@@ -92,16 +105,22 @@ Supported ethtool options
Wake-on: d
Link detected: yes
- ---
- Note: AQrate speeds (2.5/5 Gb/s) will be displayed only with linux kernels > 4.10.
- But you can still use these speeds:
+
+ .. note::
+
+ AQrate speeds (2.5/5 Gb/s) will be displayed only with linux kernels > 4.10.
+ But you can still use these speeds::
+
ethtool -s eth0 autoneg off speed 2500
- Viewing adapter information
- ---------------------
- ethtool -i <ethX>
+Viewing adapter information
+---------------------------
- Output example:
+ ::
+
+ ethtool -i <ethX>
+
+ Output example::
driver: atlantic
version: 5.2.0-050200rc5-generic-kern
@@ -115,12 +134,16 @@ Supported ethtool options
supports-priv-flags: no
- Viewing Ethernet adapter statistics:
- ---------------------
- ethtool -S <ethX>
+Viewing Ethernet adapter statistics
+-----------------------------------
+
+ ::
- Output example:
- NIC statistics:
+ ethtool -S <ethX>
+
+ Output example::
+
+ NIC statistics:
InPackets: 13238607
InUCast: 13293852
InMCast: 52
@@ -164,85 +187,95 @@ Supported ethtool options
Queue[3] InLroPackets: 0
Queue[3] InErrors: 0
- Interrupt coalescing support
- ---------------------------------
- ITR mode, TX/RX coalescing timings could be viewed with:
+Interrupt coalescing support
+----------------------------
- ethtool -c <ethX>
+ ITR mode, TX/RX coalescing timings could be viewed with::
- and changed with:
+ ethtool -c <ethX>
- ethtool -C <ethX> tx-usecs <usecs> rx-usecs <usecs>
+ and changed with::
- To disable coalescing:
+ ethtool -C <ethX> tx-usecs <usecs> rx-usecs <usecs>
- ethtool -C <ethX> tx-usecs 0 rx-usecs 0 tx-max-frames 1 tx-max-frames 1
+ To disable coalescing::
- Wake on LAN support
- ---------------------------------
+ ethtool -C <ethX> tx-usecs 0 rx-usecs 0 tx-max-frames 1 tx-max-frames 1
- WOL support by magic packet:
+Wake on LAN support
+-------------------
- ethtool -s <ethX> wol g
+ WOL support by magic packet::
- To disable WOL:
+ ethtool -s <ethX> wol g
- ethtool -s <ethX> wol d
+ To disable WOL::
- Set and check the driver message level
- ---------------------------------
+ ethtool -s <ethX> wol d
+
+Set and check the driver message level
+--------------------------------------
Set message level
- ethtool -s <ethX> msglvl <level>
+ ::
+
+ ethtool -s <ethX> msglvl <level>
Level values:
- 0x0001 - general driver status.
- 0x0002 - hardware probing.
- 0x0004 - link state.
- 0x0008 - periodic status check.
- 0x0010 - interface being brought down.
- 0x0020 - interface being brought up.
- 0x0040 - receive error.
- 0x0080 - transmit error.
- 0x0200 - interrupt handling.
- 0x0400 - transmit completion.
- 0x0800 - receive completion.
- 0x1000 - packet contents.
- 0x2000 - hardware status.
- 0x4000 - Wake-on-LAN status.
+ ====== =============================
+ 0x0001 general driver status.
+ 0x0002 hardware probing.
+ 0x0004 link state.
+ 0x0008 periodic status check.
+ 0x0010 interface being brought down.
+ 0x0020 interface being brought up.
+ 0x0040 receive error.
+ 0x0080 transmit error.
+ 0x0200 interrupt handling.
+ 0x0400 transmit completion.
+ 0x0800 receive completion.
+ 0x1000 packet contents.
+ 0x2000 hardware status.
+ 0x4000 Wake-on-LAN status.
+ ====== =============================
By default, the level of debugging messages is set 0x0001(general driver status).
Check message level
- ethtool <ethX> | grep "Current message level"
+ ::
- If you want to disable the output of messages
+ ethtool <ethX> | grep "Current message level"
- ethtool -s <ethX> msglvl 0
+ If you want to disable the output of messages::
+
+ ethtool -s <ethX> msglvl 0
+
+RX flow rules (ntuple filters)
+------------------------------
- RX flow rules (ntuple filters)
- ---------------------------------
There are separate rules supported, that applies in that order:
+
1. 16 VLAN ID rules
2. 16 L2 EtherType rules
3. 8 L3/L4 5-Tuple rules
The driver utilizes the ethtool interface for configuring ntuple filters,
- via "ethtool -N <device> <filter>".
+ via ``ethtool -N <device> <filter>``.
- To enable or disable the RX flow rules:
+ To enable or disable the RX flow rules::
- ethtool -K ethX ntuple <on|off>
+ ethtool -K ethX ntuple <on|off>
When disabling ntuple filters, all the user programed filters are
flushed from the driver cache and hardware. All needed filters must
be re-added when ntuple is re-enabled.
Because of the fixed order of the rules, the location of filters is also fixed:
+
- Locations 0 - 15 for VLAN ID filters
- Locations 16 - 31 for L2 EtherType filters
- Locations 32 - 39 for L3/L4 5-tuple filters (locations 32, 36 for IPv6)
@@ -253,32 +286,34 @@ Supported ethtool options
addresses can be supported. Source and destination ports are only compared for
TCP/UDP/SCTP packets.
- To add a filter that directs packet to queue 5, use <-N|-U|--config-nfc|--config-ntuple> switch:
+ To add a filter that directs packet to queue 5, use
+ ``<-N|-U|--config-nfc|--config-ntuple>`` switch::
- ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 2000 dst-port 2001 action 5 <loc 32>
+ ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 2000 dst-port 2001 action 5 <loc 32>
- action is the queue number.
- loc is the rule number.
- For "flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6" you must set the loc
+ For ``flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6`` you must set the loc
number within 32 - 39.
- For "flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6" you can set 8 rules
+ For ``flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6`` you can set 8 rules
for traffic IPv4 or you can set 2 rules for traffic IPv6. Loc number traffic
IPv6 is 32 and 36.
At the moment you can not use IPv4 and IPv6 filters at the same time.
- Example filter for IPv6 filter traffic:
+ Example filter for IPv6 filter traffic::
- sudo ethtool -N <ethX> flow-type tcp6 src-ip 2001:db8:0:f101::1 dst-ip 2001:db8:0:f101::2 action 1 loc 32
- sudo ethtool -N <ethX> flow-type ip6 src-ip 2001:db8:0:f101::2 dst-ip 2001:db8:0:f101::5 action -1 loc 36
+ sudo ethtool -N <ethX> flow-type tcp6 src-ip 2001:db8:0:f101::1 dst-ip 2001:db8:0:f101::2 action 1 loc 32
+ sudo ethtool -N <ethX> flow-type ip6 src-ip 2001:db8:0:f101::2 dst-ip 2001:db8:0:f101::5 action -1 loc 36
- Example filter for IPv4 filter traffic:
+ Example filter for IPv4 filter traffic::
- sudo ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.4 dst-ip 10.0.0.7 src-port 2000 dst-port 2001 loc 32
- sudo ethtool -N <ethX> flow-type tcp4 src-ip 10.0.0.3 dst-ip 10.0.0.9 src-port 2000 dst-port 2001 loc 33
- sudo ethtool -N <ethX> flow-type ip4 src-ip 10.0.0.6 dst-ip 10.0.0.4 loc 34
+ sudo ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.4 dst-ip 10.0.0.7 src-port 2000 dst-port 2001 loc 32
+ sudo ethtool -N <ethX> flow-type tcp4 src-ip 10.0.0.3 dst-ip 10.0.0.9 src-port 2000 dst-port 2001 loc 33
+ sudo ethtool -N <ethX> flow-type ip4 src-ip 10.0.0.6 dst-ip 10.0.0.4 loc 34
If you set action -1, then all traffic corresponding to the filter will be discarded.
+
The maximum value action is 31.
@@ -287,8 +322,9 @@ Supported ethtool options
from L2 Ethertype filter with UserPriority since both User Priority and VLAN ID
are passed in the same 'vlan' parameter.
- To add a filter that directs packets from VLAN 2001 to queue 5:
- ethtool -N <ethX> flow-type ip4 vlan 2001 m 0xF000 action 1 loc 0
+ To add a filter that directs packets from VLAN 2001 to queue 5::
+
+ ethtool -N <ethX> flow-type ip4 vlan 2001 m 0xF000 action 1 loc 0
L2 EtherType filters allows filter packet by EtherType field or both EtherType
@@ -297,17 +333,17 @@ Supported ethtool options
distinguish VLAN filter from L2 Ethertype filter with UserPriority since both
User Priority and VLAN ID are passed in the same 'vlan' parameter.
- To add a filter that directs IP4 packess of priority 3 to queue 3:
- ethtool -N <ethX> flow-type ether proto 0x800 vlan 0x600 m 0x1FFF action 3 loc 16
+ To add a filter that directs IP4 packess of priority 3 to queue 3::
+ ethtool -N <ethX> flow-type ether proto 0x800 vlan 0x600 m 0x1FFF action 3 loc 16
- To see the list of filters currently present:
+ To see the list of filters currently present::
- ethtool <-u|-n|--show-nfc|--show-ntuple> <ethX>
+ ethtool <-u|-n|--show-nfc|--show-ntuple> <ethX>
- Rules may be deleted from the table itself. This is done using:
+ Rules may be deleted from the table itself. This is done using::
- sudo ethtool <-N|-U|--config-nfc|--config-ntuple> <ethX> delete <loc>
+ sudo ethtool <-N|-U|--config-nfc|--config-ntuple> <ethX> delete <loc>
- loc is the rule number to be deleted.
@@ -316,34 +352,37 @@ Supported ethtool options
case, any flow that matches the filter criteria will be directed to the
appropriate queue. RX filters is supported on all kernels 2.6.30 and later.
- RSS for UDP
- ---------------------------------
+RSS for UDP
+-----------
+
Currently, NIC does not support RSS for fragmented IP packets, which leads to
incorrect working of RSS for fragmented UDP traffic. To disable RSS for UDP the
RX Flow L3/L4 rule may be used.
- Example:
- ethtool -N eth0 flow-type udp4 action 0 loc 32
+ Example::
+
+ ethtool -N eth0 flow-type udp4 action 0 loc 32
+
+UDP GSO hardware offload
+------------------------
- UDP GSO hardware offload
- ---------------------------------
UDP GSO allows to boost UDP tx rates by offloading UDP headers allocation
into hardware. A special userspace socket option is required for this,
- could be validated with /kernel/tools/testing/selftests/net/
+ could be validated with /kernel/tools/testing/selftests/net/::
udpgso_bench_tx -u -4 -D 10.0.1.1 -s 6300 -S 100
Will cause sending out of 100 byte sized UDP packets formed from single
6300 bytes user buffer.
- UDP GSO is configured by:
+ UDP GSO is configured by::
ethtool -K eth0 tx-udp-segmentation on
- Private flags (testing)
- ---------------------------------
+Private flags (testing)
+-----------------------
- Atlantic driver supports private flags for hardware custom features:
+ Atlantic driver supports private flags for hardware custom features::
$ ethtool --show-priv-flags ethX
@@ -354,7 +393,7 @@ Supported ethtool options
PHYInternalLoopback: off
PHYExternalLoopback: off
- Example:
+ Example::
$ ethtool --set-priv-flags ethX DMASystemLoopback on
@@ -370,93 +409,130 @@ Command Line Parameters
The following command line parameters are available on atlantic driver:
aq_itr -Interrupt throttling mode
-----------------------------------------
+---------------------------------
Accepted values: 0, 1, 0xFFFF
+
Default value: 0xFFFF
-0 - Disable interrupt throttling.
-1 - Enable interrupt throttling and use specified tx and rx rates.
-0xFFFF - Auto throttling mode. Driver will choose the best RX and TX
- interrupt throtting settings based on link speed.
+
+====== ==============================================================
+0 Disable interrupt throttling.
+1 Enable interrupt throttling and use specified tx and rx rates.
+0xFFFF Auto throttling mode. Driver will choose the best RX and TX
+ interrupt throtting settings based on link speed.
+====== ==============================================================
aq_itr_tx - TX interrupt throttle rate
-----------------------------------------
+--------------------------------------
+
Accepted values: 0 - 0x1FF
+
Default value: 0
+
TX side throttling in microseconds. Adapter will setup maximum interrupt delay
to this value. Minimum interrupt delay will be a half of this value
aq_itr_rx - RX interrupt throttle rate
-----------------------------------------
+--------------------------------------
+
Accepted values: 0 - 0x1FF
+
Default value: 0
+
RX side throttling in microseconds. Adapter will setup maximum interrupt delay
to this value. Minimum interrupt delay will be a half of this value
-Note: ITR settings could be changed in runtime by ethtool -c means (see below)
+.. note::
+
+ ITR settings could be changed in runtime by ethtool -c means (see below)
Config file parameters
-=======================
+======================
+
For some fine tuning and performance optimizations,
some parameters can be changed in the {source_dir}/aq_cfg.h file.
AQ_CFG_RX_PAGEORDER
-----------------------------------------
+-------------------
+
Default value: 0
+
RX page order override. Thats a power of 2 number of RX pages allocated for
-each descriptor. Received descriptor size is still limited by AQ_CFG_RX_FRAME_MAX.
+each descriptor. Received descriptor size is still limited by
+AQ_CFG_RX_FRAME_MAX.
+
Increasing pageorder makes page reuse better (actual on iommu enabled systems).
AQ_CFG_RX_REFILL_THRES
-----------------------------------------
+----------------------
+
Default value: 32
+
RX refill threshold. RX path will not refill freed descriptors until the
specified number of free descriptors is observed. Larger values may help
better page reuse but may lead to packet drops as well.
AQ_CFG_VECS_DEF
-------------------------------------------------------------
+---------------
+
Number of queues
+
Valid Range: 0 - 8 (up to AQ_CFG_VECS_MAX)
+
Default value: 8
+
Notice this value will be capped by the number of cores available on the system.
AQ_CFG_IS_RSS_DEF
-------------------------------------------------------------
+-----------------
+
Enable/disable Receive Side Scaling
This feature allows the adapter to distribute receive processing
across multiple CPU-cores and to prevent from overloading a single CPU core.
Valid values
-0 - disabled
-1 - enabled
+
+== ========
+0 disabled
+1 enabled
+== ========
Default value: 1
AQ_CFG_NUM_RSS_QUEUES_DEF
-------------------------------------------------------------
+-------------------------
+
Number of queues for Receive Side Scaling
+
Valid Range: 0 - 8 (up to AQ_CFG_VECS_DEF)
Default value: AQ_CFG_VECS_DEF
AQ_CFG_IS_LRO_DEF
-------------------------------------------------------------
+-----------------
+
Enable/disable Large Receive Offload
This offload enables the adapter to coalesce multiple TCP segments and indicate
them as a single coalesced unit to the OS networking subsystem.
-The system consumes less energy but it also introduces more latency in packets processing.
+
+The system consumes less energy but it also introduces more latency in packets
+processing.
Valid values
-0 - disabled
-1 - enabled
+
+== ========
+0 disabled
+1 enabled
+== ========
Default value: 1
AQ_CFG_TX_CLEAN_BUDGET
-----------------------------------------
+----------------------
+
Maximum descriptors to cleanup on TX at once.
+
Default value: 256
After the aq_cfg.h file changed the driver must be rebuilt to take effect.
@@ -472,7 +548,8 @@ License
=======
aQuantia Corporation Network Driver
-Copyright(c) 2014 - 2019 aQuantia Corporation.
+
+Copyright |copy| 2014 - 2019 aQuantia Corporation.
This program is free software; you can redistribute it and/or modify it
under the terms and conditions of the GNU General Public License,
diff --git a/Documentation/networking/device_drivers/chelsio/cxgb.txt b/Documentation/networking/device_drivers/chelsio/cxgb.rst
index 20a887615c4a..435dce5fa2c7 100644
--- a/Documentation/networking/device_drivers/chelsio/cxgb.txt
+++ b/Documentation/networking/device_drivers/chelsio/cxgb.rst
@@ -1,13 +1,18 @@
- Chelsio N210 10Gb Ethernet Network Controller
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
- Driver Release Notes for Linux
+=============================================
+Chelsio N210 10Gb Ethernet Network Controller
+=============================================
- Version 2.1.1
+Driver Release Notes for Linux
- June 20, 2005
+Version 2.1.1
+
+June 20, 2005
+
+.. Contents
-CONTENTS
-========
INTRODUCTION
FEATURES
PERFORMANCE
@@ -16,7 +21,7 @@ CONTENTS
SUPPORT
-INTRODUCTION
+Introduction
============
This document describes the Linux driver for Chelsio 10Gb Ethernet Network
@@ -24,11 +29,11 @@ INTRODUCTION
compatible with the Chelsio N110 model 10Gb NICs.
-FEATURES
+Features
========
- Adaptive Interrupts (adaptive-rx)
- ---------------------------------
+Adaptive Interrupts (adaptive-rx)
+---------------------------------
This feature provides an adaptive algorithm that adjusts the interrupt
coalescing parameters, allowing the driver to dynamically adapt the latency
@@ -39,24 +44,24 @@ FEATURES
ethtool manpage for additional usage information.
By default, adaptive-rx is disabled.
- To enable adaptive-rx:
+ To enable adaptive-rx::
ethtool -C <interface> adaptive-rx on
- To disable adaptive-rx, use ethtool:
+ To disable adaptive-rx, use ethtool::
ethtool -C <interface> adaptive-rx off
After disabling adaptive-rx, the timer latency value will be set to 50us.
- You may set the timer latency after disabling adaptive-rx:
+ You may set the timer latency after disabling adaptive-rx::
ethtool -C <interface> rx-usecs <microseconds>
- An example to set the timer latency value to 100us on eth0:
+ An example to set the timer latency value to 100us on eth0::
ethtool -C eth0 rx-usecs 100
- You may also provide a timer latency value while disabling adaptive-rx:
+ You may also provide a timer latency value while disabling adaptive-rx::
ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>
@@ -64,13 +69,13 @@ FEATURES
will be set to the specified value until changed by the user or until
adaptive-rx is enabled.
- To view the status of the adaptive-rx and timer latency values:
+ To view the status of the adaptive-rx and timer latency values::
ethtool -c <interface>
- TCP Segmentation Offloading (TSO) Support
- -----------------------------------------
+TCP Segmentation Offloading (TSO) Support
+-----------------------------------------
This feature, also known as "large send", enables a system's protocol stack
to offload portions of outbound TCP processing to a network interface card
@@ -80,20 +85,20 @@ FEATURES
Please see the ethtool manpage for additional usage information.
By default, TSO is enabled.
- To disable TSO:
+ To disable TSO::
ethtool -K <interface> tso off
- To enable TSO:
+ To enable TSO::
ethtool -K <interface> tso on
- To view the status of TSO:
+ To view the status of TSO::
ethtool -k <interface>
-PERFORMANCE
+Performance
===========
The following information is provided as an example of how to change system
@@ -111,59 +116,81 @@ PERFORMANCE
your system. You may want to write a script that runs at boot-up which
includes the optimal settings for your system.
- Setting PCI Latency Timer:
- setpci -d 1425:* 0x0c.l=0x0000F800
+ Setting PCI Latency Timer::
+
+ setpci -d 1425::
+
+* 0x0c.l=0x0000F800
+
+ Disabling TCP timestamp::
- Disabling TCP timestamp:
sysctl -w net.ipv4.tcp_timestamps=0
- Disabling SACK:
+ Disabling SACK::
+
sysctl -w net.ipv4.tcp_sack=0
- Setting large number of incoming connection requests:
+ Setting large number of incoming connection requests::
+
sysctl -w net.ipv4.tcp_max_syn_backlog=3000
- Setting maximum receive socket buffer size:
+ Setting maximum receive socket buffer size::
+
sysctl -w net.core.rmem_max=1024000
- Setting maximum send socket buffer size:
+ Setting maximum send socket buffer size::
+
sysctl -w net.core.wmem_max=1024000
- Set smp_affinity (on a multiprocessor system) to a single CPU:
+ Set smp_affinity (on a multiprocessor system) to a single CPU::
+
echo 1 > /proc/irq/<interrupt_number>/smp_affinity
- Setting default receive socket buffer size:
+ Setting default receive socket buffer size::
+
sysctl -w net.core.rmem_default=524287
- Setting default send socket buffer size:
+ Setting default send socket buffer size::
+
sysctl -w net.core.wmem_default=524287
- Setting maximum option memory buffers:
+ Setting maximum option memory buffers::
+
sysctl -w net.core.optmem_max=524287
- Setting maximum backlog (# of unprocessed packets before kernel drops):
+ Setting maximum backlog (# of unprocessed packets before kernel drops)::
+
sysctl -w net.core.netdev_max_backlog=300000
- Setting TCP read buffers (min/default/max):
+ Setting TCP read buffers (min/default/max)::
+
sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
- Setting TCP write buffers (min/pressure/max):
+ Setting TCP write buffers (min/pressure/max)::
+
sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
- Setting TCP buffer space (min/pressure/max):
+ Setting TCP buffer space (min/pressure/max)::
+
sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
TCP window size for single connections:
+
The receive buffer (RX_WINDOW) size must be at least as large as the
Bandwidth-Delay Product of the communication link between the sender and
receiver. Due to the variations of RTT, you may want to increase the buffer
size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
"TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.
- At 10Gb speeds, use the following formula:
+
+ At 10Gb speeds, use the following formula::
+
RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
Example for RTT with 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000
+
RX_WINDOW sizes of 256KB - 512KB should be sufficient.
- Setting the min, max, and default receive buffer (RX_WINDOW) size:
+
+ Setting the min, max, and default receive buffer (RX_WINDOW) size::
+
sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
TCP window size for multiple connections:
@@ -174,30 +201,35 @@ PERFORMANCE
not supported on the machine. Experimentation may be necessary to attain
the correct value. This method is provided as a starting point for the
correct receive buffer size.
+
Setting the min, max, and default receive buffer (RX_WINDOW) size is
performed in the same manner as single connection.
-DRIVER MESSAGES
+Driver Messages
===============
The following messages are the most common messages logged by syslog. These
may be found in /var/log/messages.
- Driver up:
+ Driver up::
+
Chelsio Network Driver - version 2.1.1
- NIC detected:
+ NIC detected::
+
eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit
- Link up:
+ Link up::
+
eth#: link is up at 10 Gbps, full duplex
- Link down:
+ Link down::
+
eth#: link is down
-KNOWN ISSUES
+Known Issues
============
These issues have been identified during testing. The following information
@@ -214,27 +246,33 @@ KNOWN ISSUES
To eliminate the TCP retransmits, set smp_affinity on the particular
interrupt to a single CPU. You can locate the interrupt (IRQ) used on
- the N110/N210 by using ifconfig:
- ifconfig <dev_name> | grep Interrupt
- Set the smp_affinity to a single CPU:
- echo 1 > /proc/irq/<interrupt_number>/smp_affinity
+ the N110/N210 by using ifconfig::
+
+ ifconfig <dev_name> | grep Interrupt
+
+ Set the smp_affinity to a single CPU::
+
+ echo 1 > /proc/irq/<interrupt_number>/smp_affinity
It is highly suggested that you do not run the irqbalance daemon on your
system, as this will change any smp_affinity setting you have applied.
The irqbalance daemon runs on a 10 second interval and binds interrupts
- to the least loaded CPU determined by the daemon. To disable this daemon:
- chkconfig --level 2345 irqbalance off
+ to the least loaded CPU determined by the daemon. To disable this daemon::
+
+ chkconfig --level 2345 irqbalance off
By default, some Linux distributions enable the kernel feature,
irqbalance, which performs the same function as the daemon. To disable
- this feature, add the following line to your bootloader:
- noirqbalance
+ this feature, add the following line to your bootloader::
+
+ noirqbalance
+
+ Example using the Grub bootloader::
- Example using the Grub bootloader:
- title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
- root (hd0,0)
- kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
- initrd /initrd-2.4.21-27.ELsmp.img
+ title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
+ root (hd0,0)
+ kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
+ initrd /initrd-2.4.21-27.ELsmp.img
2. After running insmod, the driver is loaded and the incorrect network
interface is brought up without running ifup.
@@ -277,12 +315,13 @@ KNOWN ISSUES
AMD's provides three workarounds for this problem, however, Chelsio
recommends the first option for best performance with this bug:
- For 133Mhz secondary bus operation, limit the transaction length and
- the number of outstanding transactions, via BIOS configuration
- programming of the PCI-X card, to the following:
+ For 133Mhz secondary bus operation, limit the transaction length and
+ the number of outstanding transactions, via BIOS configuration
+ programming of the PCI-X card, to the following:
- Data Length (bytes): 1k
- Total allowed outstanding transactions: 2
+ Data Length (bytes): 1k
+
+ Total allowed outstanding transactions: 2
Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
section 56, "133-MHz Mode Split Completion Data Corruption" for more
@@ -293,8 +332,10 @@ KNOWN ISSUES
have issues with these settings, please revert to the "safe" settings
and duplicate the problem before submitting a bug or asking for support.
- NOTE: The default setting on most systems is 8 outstanding transactions
- and 2k bytes data length.
+ .. note::
+
+ The default setting on most systems is 8 outstanding transactions
+ and 2k bytes data length.
4. On multiprocessor systems, it has been noted that an application which
is handling 10Gb networking can switch between CPUs causing degraded
@@ -320,14 +361,16 @@ KNOWN ISSUES
particular CPU: runon 0 ifup eth0
-SUPPORT
+Support
=======
If you have problems with the software or hardware, please contact our
customer support team via email at support@chelsio.com or check our website
at http://www.chelsio.com
-===============================================================================
+-------------------------------------------------------------------------------
+
+::
Chelsio Communications
370 San Aleso Ave.
@@ -343,10 +386,8 @@ You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
-THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
+THIS SOFTWARE IS PROVIDED ``AS IS`` AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
- Copyright (c) 2003-2005 Chelsio Communications. All rights reserved.
-
-===============================================================================
+Copyright |copy| 2003-2005 Chelsio Communications. All rights reserved.
diff --git a/Documentation/networking/device_drivers/cirrus/cs89x0.txt b/Documentation/networking/device_drivers/cirrus/cs89x0.rst
index 0e190180eec8..e5c283940ac5 100644
--- a/Documentation/networking/device_drivers/cirrus/cs89x0.txt
+++ b/Documentation/networking/device_drivers/cirrus/cs89x0.rst
@@ -1,79 +1,84 @@
+.. SPDX-License-Identifier: GPL-2.0
-NOTE
-----
+================================================
+Cirrus Logic LAN CS8900/CS8920 Ethernet Adapters
+================================================
-This document was contributed by Cirrus Logic for kernel 2.2.5. This version
-has been updated for 2.3.48 by Andrew Morton.
+.. note::
+
+ This document was contributed by Cirrus Logic for kernel 2.2.5. This version
+ has been updated for 2.3.48 by Andrew Morton.
+
+ Still, this is too outdated! A major cleanup is needed here.
Cirrus make a copy of this driver available at their website, as
described below. In general, you should use the driver version which
comes with your Linux distribution.
-
-CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS
Linux Network Interface Driver ver. 2.00 <kernel 2.3.48>
-===============================================================================
-
-
-TABLE OF CONTENTS
-
-1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS
- 1.1 Product Overview
- 1.2 Driver Description
- 1.2.1 Driver Name
- 1.2.2 File in the Driver Package
- 1.3 System Requirements
- 1.4 Licensing Information
-
-2.0 ADAPTER INSTALLATION and CONFIGURATION
- 2.1 CS8900-based Adapter Configuration
- 2.2 CS8920-based Adapter Configuration
-
-3.0 LOADING THE DRIVER AS A MODULE
-
-4.0 COMPILING THE DRIVER
- 4.1 Compiling the Driver as a Loadable Module
- 4.2 Compiling the driver to support memory mode
- 4.3 Compiling the driver to support Rx DMA
-
-5.0 TESTING AND TROUBLESHOOTING
- 5.1 Known Defects and Limitations
- 5.2 Testing the Adapter
- 5.2.1 Diagnostic Self-Test
- 5.2.2 Diagnostic Network Test
- 5.3 Using the Adapter's LEDs
- 5.4 Resolving I/O Conflicts
-
-6.0 TECHNICAL SUPPORT
- 6.1 Contacting Cirrus Logic's Technical Support
- 6.2 Information Required Before Contacting Technical Support
- 6.3 Obtaining the Latest Driver Version
- 6.4 Current maintainer
- 6.5 Kernel boot parameters
-
-
-1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS
-===============================================================================
-
-
-1.1 PRODUCT OVERVIEW
-
-The CS8900-based ISA Ethernet Adapters from Cirrus Logic follow
-IEEE 802.3 standards and support half or full-duplex operation in ISA bus
-computers on 10 Mbps Ethernet networks. The adapters are designed for operation
-in 16-bit ISA or EISA bus expansion slots and are available in
-10BaseT-only or 3-media configurations (10BaseT, 10Base2, and AUI for 10Base-5
-or fiber networks).
-
-CS8920-based adapters are similar to the CS8900-based adapter with additional
-features for Plug and Play (PnP) support and Wakeup Frame recognition. As
-such, the configuration procedures differ somewhat between the two types of
-adapters. Refer to the "Adapter Configuration" section for details on
+
+
+.. TABLE OF CONTENTS
+
+ 1.0 CIRRUS LOGIC LAN CS8900/CS8920 ETHERNET ADAPTERS
+ 1.1 Product Overview
+ 1.2 Driver Description
+ 1.2.1 Driver Name
+ 1.2.2 File in the Driver Package
+ 1.3 System Requirements
+ 1.4 Licensing Information
+
+ 2.0 ADAPTER INSTALLATION and CONFIGURATION
+ 2.1 CS8900-based Adapter Configuration
+ 2.2 CS8920-based Adapter Configuration
+
+ 3.0 LOADING THE DRIVER AS A MODULE
+
+ 4.0 COMPILING THE DRIVER
+ 4.1 Compiling the Driver as a Loadable Module
+ 4.2 Compiling the driver to support memory mode
+ 4.3 Compiling the driver to support Rx DMA
+
+ 5.0 TESTING AND TROUBLESHOOTING
+ 5.1 Known Defects and Limitations
+ 5.2 Testing the Adapter
+ 5.2.1 Diagnostic Self-Test
+ 5.2.2 Diagnostic Network Test
+ 5.3 Using the Adapter's LEDs
+ 5.4 Resolving I/O Conflicts
+
+ 6.0 TECHNICAL SUPPORT
+ 6.1 Contacting Cirrus Logic's Technical Support
+ 6.2 Information Required Before Contacting Technical Support
+ 6.3 Obtaining the Latest Driver Version
+ 6.4 Current maintainer
+ 6.5 Kernel boot parameters
+
+
+1. Cirrus Logic LAN CS8900/CS8920 Ethernet Adapters
+===================================================
+
+
+1.1. Product Overview
+=====================
+
+The CS8900-based ISA Ethernet Adapters from Cirrus Logic follow
+IEEE 802.3 standards and support half or full-duplex operation in ISA bus
+computers on 10 Mbps Ethernet networks. The adapters are designed for operation
+in 16-bit ISA or EISA bus expansion slots and are available in
+10BaseT-only or 3-media configurations (10BaseT, 10Base2, and AUI for 10Base-5
+or fiber networks).
+
+CS8920-based adapters are similar to the CS8900-based adapter with additional
+features for Plug and Play (PnP) support and Wakeup Frame recognition. As
+such, the configuration procedures differ somewhat between the two types of
+adapters. Refer to the "Adapter Configuration" section for details on
configuring both types of adapters.
-1.2 DRIVER DESCRIPTION
+1.2. Driver Description
+=======================
The CS8900/CS8920 Ethernet Adapter driver for Linux supports the Linux
v2.3.48 or greater kernel. It can be compiled directly into the kernel
@@ -85,22 +90,25 @@ or loaded at run-time as a device driver module.
The files in the driver at Cirrus' website include:
- readme.txt - this file
- build - batch file to compile cs89x0.c.
- cs89x0.c - driver C code
- cs89x0.h - driver header file
- cs89x0.o - pre-compiled module (for v2.2.5 kernel)
- config/Config.in - sample file to include cs89x0 driver in the kernel.
- config/Makefile - sample file to include cs89x0 driver in the kernel.
- config/Space.c - sample file to include cs89x0 driver in the kernel.
+ =================== ====================================================
+ readme.txt this file
+ build batch file to compile cs89x0.c.
+ cs89x0.c driver C code
+ cs89x0.h driver header file
+ cs89x0.o pre-compiled module (for v2.2.5 kernel)
+ config/Config.in sample file to include cs89x0 driver in the kernel.
+ config/Makefile sample file to include cs89x0 driver in the kernel.
+ config/Space.c sample file to include cs89x0 driver in the kernel.
+ =================== ====================================================
-1.3 SYSTEM REQUIREMENTS
+1.3. System Requirements
+------------------------
The following hardware is required:
- * Cirrus Logic LAN (CS8900/20-based) Ethernet ISA Adapter
+ * Cirrus Logic LAN (CS8900/20-based) Ethernet ISA Adapter
* IBM or IBM-compatible PC with:
* An 80386 or higher processor
@@ -118,20 +126,21 @@ The following software is required:
* LINUX kernel sources for your kernel (if compiling into kernel)
- * GNU Toolkit (gcc and make) v2.6 or above (if compiling into kernel
- or a module)
+ * GNU Toolkit (gcc and make) v2.6 or above (if compiling into kernel
+ or a module)
-1.4 LICENSING INFORMATION
+1.4. Licensing Information
+--------------------------
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 1.
This program is distributed in the hope that it will be useful, but WITHOUT
-ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
For a full copy of the GNU General Public License, write to the Free Software
@@ -139,28 +148,29 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-2.0 ADAPTER INSTALLATION and CONFIGURATION
-===============================================================================
+2. Adapter Installation and Configuration
+=========================================
-Both the CS8900 and CS8920-based adapters can be configured using parameters
-stored in an on-board EEPROM. You must use the DOS-based CS8900/20 Setup
-Utility if you want to change the adapter's configuration in EEPROM.
+Both the CS8900 and CS8920-based adapters can be configured using parameters
+stored in an on-board EEPROM. You must use the DOS-based CS8900/20 Setup
+Utility if you want to change the adapter's configuration in EEPROM.
-When loading the driver as a module, you can specify many of the adapter's
-configuration parameters on the command-line to override the EEPROM's settings
-or for interface configuration when an EEPROM is not used. (CS8920-based
+When loading the driver as a module, you can specify many of the adapter's
+configuration parameters on the command-line to override the EEPROM's settings
+or for interface configuration when an EEPROM is not used. (CS8920-based
adapters must use an EEPROM.) See Section 3.0 LOADING THE DRIVER AS A MODULE.
-Since the CS8900/20 Setup Utility is a DOS-based application, you must install
-and configure the adapter in a DOS-based system using the CS8900/20 Setup
-Utility before installation in the target LINUX system. (Not required if
+Since the CS8900/20 Setup Utility is a DOS-based application, you must install
+and configure the adapter in a DOS-based system using the CS8900/20 Setup
+Utility before installation in the target LINUX system. (Not required if
installing a CS8900-based adapter and the default configuration is acceptable.)
-
-2.1 CS8900-BASED ADAPTER CONFIGURATION
-CS8900-based adapters shipped from Cirrus Logic have been configured
-with the following "default" settings:
+2.1. CS8900-based Adapter Configuration
+---------------------------------------
+
+CS8900-based adapters shipped from Cirrus Logic have been configured
+with the following "default" settings::
Operation Mode: Memory Mode
IRQ: 10
@@ -169,15 +179,16 @@ with the following "default" settings:
Optimization: DOS Client
Transmission Mode: Half-duplex
BootProm: None
- Media Type: Autodetect (3-media cards) or
- 10BASE-T (10BASE-T only adapter)
+ Media Type: Autodetect (3-media cards) or
+ 10BASE-T (10BASE-T only adapter)
-You should only change the default configuration settings if conflicts with
-another adapter exists. To change the adapter's configuration, run the
-CS8900/20 Setup Utility.
+You should only change the default configuration settings if conflicts with
+another adapter exists. To change the adapter's configuration, run the
+CS8900/20 Setup Utility.
-2.2 CS8920-BASED ADAPTER CONFIGURATION
+2.2. CS8920-based Adapter Configuration
+---------------------------------------
CS8920-based adapters are shipped from Cirrus Logic configured as Plug
and Play (PnP) enabled. However, since the cs89x0 driver does NOT
@@ -185,82 +196,83 @@ support PnP, you must install the CS8920 adapter in a DOS-based PC and
run the CS8900/20 Setup Utility to disable PnP and configure the
adapter before installation in the target Linux system. Failure to do
this will leave the adapter inactive and the driver will be unable to
-communicate with the adapter.
+communicate with the adapter.
+::
- ****************************************************************
- * CS8920-BASED ADAPTERS: *
- * *
- * CS8920-BASED ADAPTERS ARE PLUG and PLAY ENABLED BY DEFAULT. *
- * THE CS89X0 DRIVER DOES NOT SUPPORT PnP. THEREFORE, YOU MUST *
- * RUN THE CS8900/20 SETUP UTILITY TO DISABLE PnP SUPPORT AND *
- * TO ACTIVATE THE ADAPTER. *
- ****************************************************************
+ ****************************************************************
+ * CS8920-BASED ADAPTERS: *
+ * *
+ * CS8920-BASED ADAPTERS ARE PLUG and PLAY ENABLED BY DEFAULT. *
+ * THE CS89X0 DRIVER DOES NOT SUPPORT PnP. THEREFORE, YOU MUST *
+ * RUN THE CS8900/20 SETUP UTILITY TO DISABLE PnP SUPPORT AND *
+ * TO ACTIVATE THE ADAPTER. *
+ ****************************************************************
-3.0 LOADING THE DRIVER AS A MODULE
-===============================================================================
+3. Loading the Driver as a Module
+=================================
If the driver is compiled as a loadable module, you can load the driver module
-with the 'modprobe' command. Many of the adapter's configuration parameters can
-be specified as command-line arguments to the load command. This facility
-provides a means to override the EEPROM's settings or for interface
+with the 'modprobe' command. Many of the adapter's configuration parameters can
+be specified as command-line arguments to the load command. This facility
+provides a means to override the EEPROM's settings or for interface
configuration when an EEPROM is not used.
-Example:
+Example::
insmod cs89x0.o io=0x200 irq=0xA media=aui
This example loads the module and configures the adapter to use an IO port base
address of 200h, interrupt 10, and use the AUI media connection. The following
-configuration options are available on the command line:
-
-* io=### - specify IO address (200h-360h)
-* irq=## - specify interrupt level
-* use_dma=1 - Enable DMA
-* dma=# - specify dma channel (Driver is compiled to support
- Rx DMA only)
-* dmasize=# (16 or 64) - DMA size 16K or 64K. Default value is set to 16.
-* media=rj45 - specify media type
+configuration options are available on the command line::
+
+ io=### - specify IO address (200h-360h)
+ irq=## - specify interrupt level
+ use_dma=1 - Enable DMA
+ dma=# - specify dma channel (Driver is compiled to support
+ Rx DMA only)
+ dmasize=# (16 or 64) - DMA size 16K or 64K. Default value is set to 16.
+ media=rj45 - specify media type
or media=bnc
or media=aui
or media=auto
-* duplex=full - specify forced half/full/autonegotiate duplex
+ duplex=full - specify forced half/full/autonegotiate duplex
or duplex=half
or duplex=auto
-* debug=# - debug level (only available if the driver was compiled
- for debugging)
+ debug=# - debug level (only available if the driver was compiled
+ for debugging)
-NOTES:
+**Notes:**
a) If an EEPROM is present, any specified command-line parameter
will override the corresponding configuration value stored in
EEPROM.
-b) The "io" parameter must be specified on the command-line.
+b) The "io" parameter must be specified on the command-line.
c) The driver's hardware probe routine is designed to avoid
writing to I/O space until it knows that there is a cs89x0
card at the written addresses. This could cause problems
with device probing. To avoid this behaviour, add one
- to the `io=' module parameter. This doesn't actually change
+ to the ``io=`` module parameter. This doesn't actually change
the I/O address, but it is a flag to tell the driver
to partially initialise the hardware before trying to
identify the card. This could be dangerous if you are
not sure that there is a cs89x0 card at the provided address.
For example, to scan for an adapter located at IO base 0x300,
- specify an IO address of 0x301.
+ specify an IO address of 0x301.
d) The "duplex=auto" parameter is only supported for the CS8920.
e) The minimum command-line configuration required if an EEPROM is
not present is:
- io
- irq
+ io
+ irq
media type (no autodetect)
f) The following additional parameters are CS89XX defaults (values
@@ -282,13 +294,13 @@ h) Many Linux distributions use the 'modprobe' command to load
module when it is loaded. All the configuration options which are
described above may be placed within /etc/conf.modules.
- For example:
+ For example::
- > cat /etc/conf.modules
- ...
- alias eth0 cs89x0
- options cs89x0 io=0x0200 dma=5 use_dma=1
- ...
+ > cat /etc/conf.modules
+ ...
+ alias eth0 cs89x0
+ options cs89x0 io=0x0200 dma=5 use_dma=1
+ ...
In this example we are telling the module system that the
ethernet driver for this machine should use the cs89x0 driver. We
@@ -305,9 +317,9 @@ j) The cs89x0 supports DMA for receiving only. DMA mode is
k) If your Linux kernel was compiled with inbuilt plug-and-play
support you will be able to find information about the cs89x0 card
- with the command
+ with the command::
- cat /proc/isapnp
+ cat /proc/isapnp
l) If during DMA operation you find erratic behavior or network data
corruption you should use your PC's BIOS to slow the EISA bus clock.
@@ -321,11 +333,11 @@ n) If the cs89x0 driver is compiled directly into the kernel, DMA
mode may be selected by providing the kernel with a boot option
'cs89x0_dma=N' where 'N' is the desired DMA channel number (5, 6 or 7).
- Kernel boot options may be provided on the LILO command line:
+ Kernel boot options may be provided on the LILO command line::
LILO boot: linux cs89x0_dma=5
- or they may be placed in /etc/lilo.conf:
+ or they may be placed in /etc/lilo.conf::
image=/boot/bzImage-2.3.48
append="cs89x0_dma=5"
@@ -337,237 +349,246 @@ n) If the cs89x0 driver is compiled directly into the kernel, DMA
(64k mode is not available).
-4.0 COMPILING THE DRIVER
-===============================================================================
+4. Compiling the Driver
+=======================
The cs89x0 driver can be compiled directly into the kernel or compiled into
a loadable device driver module.
+Just use the standard way to configure the driver and compile the Kernel.
-4.1 COMPILING THE DRIVER AS A LOADABLE MODULE
-
-To compile the driver into a loadable module, use the following command
-(single command line, without quotes):
-
-"gcc -D__KERNEL__ -I/usr/src/linux/include -I/usr/src/linux/net/inet -Wall
--Wstrict-prototypes -O2 -fomit-frame-pointer -DMODULE -DCONFIG_MODVERSIONS
--c cs89x0.c"
-
-4.2 COMPILING THE DRIVER TO SUPPORT MEMORY MODE
-
-Support for memory mode was not carried over into the 2.3 series kernels.
-4.3 COMPILING THE DRIVER TO SUPPORT Rx DMA
+4.1. Compiling the Driver to Support Rx DMA
+-------------------------------------------
The compile-time optionality for DMA was removed in the 2.3 kernel
series. DMA support is now unconditionally part of the driver. It is
enabled by the 'use_dma=1' module option.
-5.0 TESTING AND TROUBLESHOOTING
-===============================================================================
+5. Testing and Troubleshooting
+==============================
-5.1 KNOWN DEFECTS and LIMITATIONS
+5.1. Known Defects and Limitations
+----------------------------------
-Refer to the RELEASE.TXT file distributed as part of this archive for a list of
+Refer to the RELEASE.TXT file distributed as part of this archive for a list of
known defects, driver limitations, and work arounds.
-5.2 TESTING THE ADAPTER
+5.2. Testing the Adapter
+------------------------
-Once the adapter has been installed and configured, the diagnostic option of
-the CS8900/20 Setup Utility can be used to test the functionality of the
+Once the adapter has been installed and configured, the diagnostic option of
+the CS8900/20 Setup Utility can be used to test the functionality of the
adapter and its network connection. Use the diagnostics 'Self Test' option to
test the functionality of the adapter with the hardware configuration you have
assigned. You can use the diagnostics 'Network Test' to test the ability of the
-adapter to communicate across the Ethernet with another PC equipped with a
-CS8900/20-based adapter card (it must also be running the CS8900/20 Setup
+adapter to communicate across the Ethernet with another PC equipped with a
+CS8900/20-based adapter card (it must also be running the CS8900/20 Setup
Utility).
- NOTE: The Setup Utility's diagnostics are designed to run in a
- DOS-only operating system environment. DO NOT run the diagnostics
- from a DOS or command prompt session under Windows 95, Windows NT,
- OS/2, or other operating system.
+.. note::
+
+ The Setup Utility's diagnostics are designed to run in a
+ DOS-only operating system environment. DO NOT run the diagnostics
+ from a DOS or command prompt session under Windows 95, Windows NT,
+ OS/2, or other operating system.
To run the diagnostics tests on the CS8900/20 adapter:
- 1.) Boot DOS on the PC and start the CS8900/20 Setup Utility.
+ 1. Boot DOS on the PC and start the CS8900/20 Setup Utility.
- 2.) The adapter's current configuration is displayed. Hit the ENTER key to
+ 2. The adapter's current configuration is displayed. Hit the ENTER key to
get to the main menu.
- 4.) Select 'Diagnostics' (ALT-G) from the main menu.
+ 4. Select 'Diagnostics' (ALT-G) from the main menu.
* Select 'Self-Test' to test the adapter's basic functionality.
* Select 'Network Test' to test the network connection and cabling.
-5.2.1 DIAGNOSTIC SELF-TEST
+5.2.1. Diagnostic Self-test
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The diagnostic self-test checks the adapter's basic functionality as well as
-its ability to communicate across the ISA bus based on the system resources
+The diagnostic self-test checks the adapter's basic functionality as well as
+its ability to communicate across the ISA bus based on the system resources
assigned during hardware configuration. The following tests are performed:
* IO Register Read/Write Test
- The IO Register Read/Write test insures that the CS8900/20 can be
+
+ The IO Register Read/Write test insures that the CS8900/20 can be
accessed in IO mode, and that the IO base address is correct.
* Shared Memory Test
- The Shared Memory test insures the CS8900/20 can be accessed in memory
- mode and that the range of memory addresses assigned does not conflict
+
+ The Shared Memory test insures the CS8900/20 can be accessed in memory
+ mode and that the range of memory addresses assigned does not conflict
with other devices in the system.
* Interrupt Test
+
The Interrupt test insures there are no conflicts with the assigned IRQ
signal.
* EEPROM Test
+
The EEPROM test insures the EEPROM can be read.
* Chip RAM Test
+
The Chip RAM test insures the 4K of memory internal to the CS8900/20 is
working properly.
* Internal Loop-back Test
- The Internal Loop Back test insures the adapter's transmitter and
- receiver are operating properly. If this test fails, make sure the
- adapter's cable is connected to the network (check for LED activity for
+
+ The Internal Loop Back test insures the adapter's transmitter and
+ receiver are operating properly. If this test fails, make sure the
+ adapter's cable is connected to the network (check for LED activity for
example).
* Boot PROM Test
+
The Boot PROM test insures the Boot PROM is present, and can be read.
Failure indicates the Boot PROM was not successfully read due to a
hardware problem or due to a conflicts on the Boot PROM address
assignment. (Test only applies if the adapter is configured to use the
Boot PROM option.)
-Failure of a test item indicates a possible system resource conflict with
-another device on the ISA bus. In this case, you should use the Manual Setup
+Failure of a test item indicates a possible system resource conflict with
+another device on the ISA bus. In this case, you should use the Manual Setup
option to reconfigure the adapter by selecting a different value for the system
resource that failed.
-5.2.2 DIAGNOSTIC NETWORK TEST
+5.2.2. Diagnostic Network Test
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The Diagnostic Network Test verifies a working network connection by
-transferring data between two CS8900/20 adapters installed in different PCs
-on the same network. (Note: the diagnostic network test should not be run
-between two nodes across a router.)
+The Diagnostic Network Test verifies a working network connection by
+transferring data between two CS8900/20 adapters installed in different PCs
+on the same network. (Note: the diagnostic network test should not be run
+between two nodes across a router.)
This test requires that each of the two PCs have a CS8900/20-based adapter
-installed and have the CS8900/20 Setup Utility running. The first PC is
-configured as a Responder and the other PC is configured as an Initiator.
-Once the Initiator is started, it sends data frames to the Responder which
+installed and have the CS8900/20 Setup Utility running. The first PC is
+configured as a Responder and the other PC is configured as an Initiator.
+Once the Initiator is started, it sends data frames to the Responder which
returns the frames to the Initiator.
-The total number of frames received and transmitted are displayed on the
-Initiator's display, along with a count of the number of frames received and
-transmitted OK or in error. The test can be terminated anytime by the user at
+The total number of frames received and transmitted are displayed on the
+Initiator's display, along with a count of the number of frames received and
+transmitted OK or in error. The test can be terminated anytime by the user at
either PC.
To setup the Diagnostic Network Test:
- 1.) Select a PC with a CS8900/20-based adapter and a known working network
- connection to act as the Responder. Run the CS8900/20 Setup Utility
- and select 'Diagnostics -> Network Test -> Responder' from the main
- menu. Hit ENTER to start the Responder.
+ 1. Select a PC with a CS8900/20-based adapter and a known working network
+ connection to act as the Responder. Run the CS8900/20 Setup Utility
+ and select 'Diagnostics -> Network Test -> Responder' from the main
+ menu. Hit ENTER to start the Responder.
- 2.) Return to the PC with the CS8900/20-based adapter you want to test and
- start the CS8900/20 Setup Utility.
+ 2. Return to the PC with the CS8900/20-based adapter you want to test and
+ start the CS8900/20 Setup Utility.
+
+ 3. From the main menu, Select 'Diagnostic -> Network Test -> Initiator'.
+ Hit ENTER to start the test.
- 3.) From the main menu, Select 'Diagnostic -> Network Test -> Initiator'.
- Hit ENTER to start the test.
-
You may stop the test on the Initiator at any time while allowing the Responder
-to continue running. In this manner, you can move to additional PCs and test
-them by starting the Initiator on another PC without having to stop/start the
+to continue running. In this manner, you can move to additional PCs and test
+them by starting the Initiator on another PC without having to stop/start the
Responder.
-
-5.3 USING THE ADAPTER'S LEDs
-The 2 and 3-media adapters have two LEDs visible on the back end of the board
-located near the 10Base-T connector.
+5.3. Using the Adapter's LEDs
+-----------------------------
+
+The 2 and 3-media adapters have two LEDs visible on the back end of the board
+located near the 10Base-T connector.
-Link Integrity LED: A "steady" ON of the green LED indicates a valid 10Base-T
+Link Integrity LED: A "steady" ON of the green LED indicates a valid 10Base-T
connection. (Only applies to 10Base-T. The green LED has no significance for
a 10Base-2 or AUI connection.)
-TX/RX LED: The yellow LED lights briefly each time the adapter transmits or
+TX/RX LED: The yellow LED lights briefly each time the adapter transmits or
receives data. (The yellow LED will appear to "flicker" on a typical network.)
-5.4 RESOLVING I/O CONFLICTS
+5.4. Resolving I/O Conflicts
+----------------------------
-An IO conflict occurs when two or more adapter use the same ISA resource (IO
-address, memory address or IRQ). You can usually detect an IO conflict in one
+An IO conflict occurs when two or more adapter use the same ISA resource (IO
+address, memory address or IRQ). You can usually detect an IO conflict in one
of four ways after installing and or configuring the CS8900/20-based adapter:
- 1.) The system does not boot properly (or at all).
+ 1. The system does not boot properly (or at all).
- 2.) The driver cannot communicate with the adapter, reporting an "Adapter
- not found" error message.
+ 2. The driver cannot communicate with the adapter, reporting an "Adapter
+ not found" error message.
- 3.) You cannot connect to the network or the driver will not load.
+ 3. You cannot connect to the network or the driver will not load.
- 4.) If you have configured the adapter to run in memory mode but the driver
- reports it is using IO mode when loading, this is an indication of a
- memory address conflict.
+ 4. If you have configured the adapter to run in memory mode but the driver
+ reports it is using IO mode when loading, this is an indication of a
+ memory address conflict.
-If an IO conflict occurs, run the CS8900/20 Setup Utility and perform a
-diagnostic self-test. Normally, the ISA resource in conflict will fail the
-self-test. If so, reconfigure the adapter selecting another choice for the
-resource in conflict. Run the diagnostics again to check for further IO
+If an IO conflict occurs, run the CS8900/20 Setup Utility and perform a
+diagnostic self-test. Normally, the ISA resource in conflict will fail the
+self-test. If so, reconfigure the adapter selecting another choice for the
+resource in conflict. Run the diagnostics again to check for further IO
conflicts.
In some cases, such as when the PC will not boot, it may be necessary to remove
-the adapter and reconfigure it by installing it in another PC to run the
-CS8900/20 Setup Utility. Once reinstalled in the target system, run the
-diagnostics self-test to ensure the new configuration is free of conflicts
+the adapter and reconfigure it by installing it in another PC to run the
+CS8900/20 Setup Utility. Once reinstalled in the target system, run the
+diagnostics self-test to ensure the new configuration is free of conflicts
before loading the driver again.
-When manually configuring the adapter, keep in mind the typical ISA system
+When manually configuring the adapter, keep in mind the typical ISA system
resource usage as indicated in the tables below.
-I/O Address Device IRQ Device
------------ -------- --- --------
- 200-20F Game I/O adapter 3 COM2, Bus Mouse
- 230-23F Bus Mouse 4 COM1
- 270-27F LPT3: third parallel port 5 LPT2
- 2F0-2FF COM2: second serial port 6 Floppy Disk controller
- 320-32F Fixed disk controller 7 LPT1
- 8 Real-time Clock
- 9 EGA/VGA display adapter
- 12 Mouse (PS/2)
-Memory Address Device 13 Math Coprocessor
--------------- --------------------- 14 Hard Disk controller
-A000-BFFF EGA Graphics Adapter
-A000-C7FF VGA Graphics Adapter
-B000-BFFF Mono Graphics Adapter
-B800-BFFF Color Graphics Adapter
-E000-FFFF AT BIOS
+::
+ I/O Address Device IRQ Device
+ ----------- -------- --- --------
+ 200-20F Game I/O adapter 3 COM2, Bus Mouse
+ 230-23F Bus Mouse 4 COM1
+ 270-27F LPT3: third parallel port 5 LPT2
+ 2F0-2FF COM2: second serial port 6 Floppy Disk controller
+ 320-32F Fixed disk controller 7 LPT1
+ 8 Real-time Clock
+ 9 EGA/VGA display adapter
+ 12 Mouse (PS/2)
+ Memory Address Device 13 Math Coprocessor
+ -------------- --------------------- 14 Hard Disk controller
+ A000-BFFF EGA Graphics Adapter
+ A000-C7FF VGA Graphics Adapter
+ B000-BFFF Mono Graphics Adapter
+ B800-BFFF Color Graphics Adapter
+ E000-FFFF AT BIOS
-6.0 TECHNICAL SUPPORT
-===============================================================================
-6.1 CONTACTING CIRRUS LOGIC'S TECHNICAL SUPPORT
+6. Technical Support
+====================
-Cirrus Logic's CS89XX Technical Application Support can be reached at:
+6.1. Contacting Cirrus Logic's Technical Support
+------------------------------------------------
-Telephone :(800) 888-5016 (from inside U.S. and Canada)
- :(512) 442-7555 (from outside the U.S. and Canada)
-Fax :(512) 912-3871
-Email :ethernet@crystal.cirrus.com
-WWW :http://www.cirrus.com
+Cirrus Logic's CS89XX Technical Application Support can be reached at::
+ Telephone :(800) 888-5016 (from inside U.S. and Canada)
+ :(512) 442-7555 (from outside the U.S. and Canada)
+ Fax :(512) 912-3871
+ Email :ethernet@crystal.cirrus.com
+ WWW :http://www.cirrus.com
-6.2 INFORMATION REQUIRED BEFORE CONTACTING TECHNICAL SUPPORT
-Before contacting Cirrus Logic for technical support, be prepared to provide as
-Much of the following information as possible.
+6.2. Information Required before Contacting Technical Support
+-------------------------------------------------------------
+
+Before contacting Cirrus Logic for technical support, be prepared to provide as
+Much of the following information as possible.
1.) Adapter type (CRD8900, CDB8900, CDB8920, etc.)
@@ -575,7 +596,7 @@ Much of the following information as possible.
* IO Base, Memory Base, IO or memory mode enabled, IRQ, DMA channel
* Plug and Play enabled/disabled (CS8920-based adapters only)
- * Configured for media auto-detect or specific media type (which type).
+ * Configured for media auto-detect or specific media type (which type).
3.) PC System's Configuration
@@ -590,35 +611,37 @@ Much of the following information as possible.
* CS89XX driver and version
* Your network operating system and version
- * Your system's OS version
+ * Your system's OS version
* Version of all protocol support files
5.) Any Error Message displayed.
-6.3 OBTAINING THE LATEST DRIVER VERSION
+6.3 Obtaining the Latest Driver Version
+---------------------------------------
-You can obtain the latest CS89XX drivers and support software from Cirrus Logic's
+You can obtain the latest CS89XX drivers and support software from Cirrus Logic's
Web site. You can also contact Cirrus Logic's Technical Support (email:
-ethernet@crystal.cirrus.com) and request that you be registered for automatic
+ethernet@crystal.cirrus.com) and request that you be registered for automatic
software-update notification.
Cirrus Logic maintains a web page at http://www.cirrus.com with the
latest drivers and technical publications.
-6.4 Current maintainer
+6.4. Current maintainer
+-----------------------
In February 2000 the maintenance of this driver was assumed by Andrew
Morton.
6.5 Kernel module parameters
+----------------------------
For use in embedded environments with no cs89x0 EEPROM, the kernel boot
-parameter `cs89x0_media=' has been implemented. Usage is:
+parameter ``cs89x0_media=`` has been implemented. Usage is::
cs89x0_media=rj45 or
cs89x0_media=aui or
cs89x0_media=bnc
-
diff --git a/Documentation/networking/device_drivers/davicom/dm9000.txt b/Documentation/networking/device_drivers/davicom/dm9000.rst
index 5552e2e575c5..d5458da01083 100644
--- a/Documentation/networking/device_drivers/davicom/dm9000.txt
+++ b/Documentation/networking/device_drivers/davicom/dm9000.rst
@@ -1,7 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
DM9000 Network driver
=====================
Copyright 2008 Simtec Electronics,
+
Ben Dooks <ben@simtec.co.uk> <ben-linux@fluff.org>
@@ -30,9 +34,9 @@ These resources should be specified in that order, as the ordering of the
two address regions is important (the driver expects these to be address
and then data).
-An example from arch/arm/mach-s3c2410/mach-bast.c is:
+An example from arch/arm/mach-s3c2410/mach-bast.c is::
-static struct resource bast_dm9k_resource[] = {
+ static struct resource bast_dm9k_resource[] = {
[0] = {
.start = S3C2410_CS5 + BAST_PA_DM9000,
.end = S3C2410_CS5 + BAST_PA_DM9000 + 3,
@@ -48,14 +52,14 @@ static struct resource bast_dm9k_resource[] = {
.end = IRQ_DM9000,
.flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL,
}
-};
+ };
-static struct platform_device bast_device_dm9k = {
+ static struct platform_device bast_device_dm9k = {
.name = "dm9000",
.id = 0,
.num_resources = ARRAY_SIZE(bast_dm9k_resource),
.resource = bast_dm9k_resource,
-};
+ };
Note the setting of the IRQ trigger flag in bast_dm9k_resource[2].flags,
as this will generate a warning if it is not present. The trigger from
@@ -64,13 +68,13 @@ handler to ensure that the IRQ is setup correctly.
This shows a typical platform device, without the optional configuration
platform data supplied. The next example uses the same resources, but adds
-the optional platform data to pass extra configuration data:
+the optional platform data to pass extra configuration data::
-static struct dm9000_plat_data bast_dm9k_platdata = {
+ static struct dm9000_plat_data bast_dm9k_platdata = {
.flags = DM9000_PLATF_16BITONLY,
-};
+ };
-static struct platform_device bast_device_dm9k = {
+ static struct platform_device bast_device_dm9k = {
.name = "dm9000",
.id = 0,
.num_resources = ARRAY_SIZE(bast_dm9k_resource),
@@ -78,7 +82,7 @@ static struct platform_device bast_device_dm9k = {
.dev = {
.platform_data = &bast_dm9k_platdata,
}
-};
+ };
The platform data is defined in include/linux/dm9000.h and described below.
diff --git a/Documentation/networking/device_drivers/dec/de4x5.txt b/Documentation/networking/device_drivers/dec/de4x5.rst
index 452aac58341d..e03e9c631879 100644
--- a/Documentation/networking/device_drivers/dec/de4x5.txt
+++ b/Documentation/networking/device_drivers/dec/de4x5.rst
@@ -1,48 +1,54 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+DEC EtherWORKS Ethernet De4x5 cards
+===================================
+
Originally, this driver was written for the Digital Equipment
Corporation series of EtherWORKS Ethernet cards:
- DE425 TP/COAX EISA
- DE434 TP PCI
- DE435 TP/COAX/AUI PCI
- DE450 TP/COAX/AUI PCI
- DE500 10/100 PCI Fasternet
+ - DE425 TP/COAX EISA
+ - DE434 TP PCI
+ - DE435 TP/COAX/AUI PCI
+ - DE450 TP/COAX/AUI PCI
+ - DE500 10/100 PCI Fasternet
but it will now attempt to support all cards which conform to the
Digital Semiconductor SROM Specification. The driver currently
recognises the following chips:
- DC21040 (no SROM)
- DC21041[A]
- DC21140[A]
- DC21142
- DC21143
+ - DC21040 (no SROM)
+ - DC21041[A]
+ - DC21140[A]
+ - DC21142
+ - DC21143
So far the driver is known to work with the following cards:
- KINGSTON
- Linksys
- ZNYX342
- SMC8432
- SMC9332 (w/new SROM)
- ZNYX31[45]
- ZNYX346 10/100 4 port (can act as a 10/100 bridge!)
+ - KINGSTON
+ - Linksys
+ - ZNYX342
+ - SMC8432
+ - SMC9332 (w/new SROM)
+ - ZNYX31[45]
+ - ZNYX346 10/100 4 port (can act as a 10/100 bridge!)
The driver has been tested on a relatively busy network using the DE425,
DE434, DE435 and DE500 cards and benchmarked with 'ttcp': it transferred
- 16M of data to a DECstation 5000/200 as follows:
+ 16M of data to a DECstation 5000/200 as follows::
- TCP UDP
- TX RX TX RX
- DE425 1030k 997k 1170k 1128k
- DE434 1063k 995k 1170k 1125k
- DE435 1063k 995k 1170k 1125k
- DE500 1063k 998k 1170k 1125k in 10Mb/s mode
+ TCP UDP
+ TX RX TX RX
+ DE425 1030k 997k 1170k 1128k
+ DE434 1063k 995k 1170k 1125k
+ DE435 1063k 995k 1170k 1125k
+ DE500 1063k 998k 1170k 1125k in 10Mb/s mode
All values are typical (in kBytes/sec) from a sample of 4 for each
measurement. Their error is +/-20k on a quiet (private) network and also
depend on what load the CPU has.
- =========================================================================
+----------------------------------------------------------------------------
The ability to load this driver as a loadable module has been included
and used extensively during the driver development (to save those long
@@ -55,31 +61,33 @@
0) have a copy of the loadable modules code installed on your system.
1) copy de4x5.c from the /linux/drivers/net directory to your favourite
- temporary directory.
+ temporary directory.
2) for fixed autoprobes (not recommended), edit the source code near
- line 5594 to reflect the I/O address you're using, or assign these when
- loading by:
+ line 5594 to reflect the I/O address you're using, or assign these when
+ loading by::
- insmod de4x5 io=0xghh where g = bus number
- hh = device number
+ insmod de4x5 io=0xghh where g = bus number
+ hh = device number
- NB: autoprobing for modules is now supported by default. You may just
- use:
+ .. note::
- insmod de4x5
+ autoprobing for modules is now supported by default. You may just
+ use::
- to load all available boards. For a specific board, still use
+ insmod de4x5
+
+ to load all available boards. For a specific board, still use
the 'io=?' above.
3) compile de4x5.c, but include -DMODULE in the command line to ensure
- that the correct bits are compiled (see end of source code).
+ that the correct bits are compiled (see end of source code).
4) if you are wanting to add a new card, goto 5. Otherwise, recompile a
- kernel with the de4x5 configuration turned off and reboot.
+ kernel with the de4x5 configuration turned off and reboot.
5) insmod de4x5 [io=0xghh]
- 6) run the net startup bits for your new eth?? interface(s) manually
- (usually /etc/rc.inet[12] at boot time).
+ 6) run the net startup bits for your new eth?? interface(s) manually
+ (usually /etc/rc.inet[12] at boot time).
7) enjoy!
- To unload a module, turn off the associated interface(s)
+ To unload a module, turn off the associated interface(s)
'ifconfig eth?? down' then 'rmmod de4x5'.
Automedia detection is included so that in principle you can disconnect
@@ -90,7 +98,7 @@
By default, the driver will now autodetect any DECchip based card.
Should you have a need to restrict the driver to DIGITAL only cards, you
can compile with a DEC_ONLY define, or if loading as a module, use the
- 'dec_only=1' parameter.
+ 'dec_only=1' parameter.
I've changed the timing routines to use the kernel timer and scheduling
functions so that the hangs and other assorted problems that occurred
@@ -158,18 +166,21 @@
either at the end of the parameter list or with another board name. The
following parameters are allowed:
- fdx for full duplex
- autosense to set the media/speed; with the following
- sub-parameters:
+ ========= ===============================================
+ fdx for full duplex
+ autosense to set the media/speed; with the following
+ sub-parameters:
TP, TP_NW, BNC, AUI, BNC_AUI, 100Mb, 10Mb, AUTO
+ ========= ===============================================
Case sensitivity is important for the sub-parameters. They *must* be
- upper case. Examples:
+ upper case. Examples::
+
+ insmod de4x5 args='eth1:fdx autosense=BNC eth0:autosense=100Mb'.
- insmod de4x5 args='eth1:fdx autosense=BNC eth0:autosense=100Mb'.
+ For a compiled in driver, in linux/drivers/net/CONFIG, place e.g.::
- For a compiled in driver, in linux/drivers/net/CONFIG, place e.g.
- DE4X5_OPTS = -DDE4X5_PARM='"eth0:fdx autosense=AUI eth2:autosense=TP"'
+ DE4X5_OPTS = -DDE4X5_PARM='"eth0:fdx autosense=AUI eth2:autosense=TP"'
Yes, I know full duplex isn't permissible on BNC or AUI; they're just
examples. By default, full duplex is turned off and AUTO is the default
diff --git a/Documentation/networking/device_drivers/dec/dmfe.txt b/Documentation/networking/device_drivers/dec/dmfe.rst
index 25320bf19c86..c4cf809cad84 100644
--- a/Documentation/networking/device_drivers/dec/dmfe.txt
+++ b/Documentation/networking/device_drivers/dec/dmfe.rst
@@ -1,6 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================================================
+Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux
+==============================================================
+
Note: This driver doesn't have a maintainer.
-Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
@@ -16,29 +21,29 @@ GNU General Public License for more details.
This driver provides kernel support for Davicom DM9102(A)/DM9132/DM9801 ethernet cards ( CNET
10/100 ethernet cards uses Davicom chipset too, so this driver supports CNET cards too ).If you
didn't compile this driver as a module, it will automatically load itself on boot and print a
-line similar to :
+line similar to::
dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17)
-If you compiled this driver as a module, you have to load it on boot.You can load it with command :
+If you compiled this driver as a module, you have to load it on boot.You can load it with command::
insmod dmfe
This way it will autodetect the device mode.This is the suggested way to load the module.Or you can pass
-a mode= setting to module while loading, like :
+a mode= setting to module while loading, like::
insmod dmfe mode=0 # Force 10M Half Duplex
insmod dmfe mode=1 # Force 100M Half Duplex
insmod dmfe mode=4 # Force 10M Full Duplex
insmod dmfe mode=5 # Force 100M Full Duplex
-Next you should configure your network interface with a command similar to :
+Next you should configure your network interface with a command similar to::
ifconfig eth0 172.22.3.18
- ^^^^^^^^^^^
+ ^^^^^^^^^^^
Your IP Address
-Then you may have to modify the default routing table with command :
+Then you may have to modify the default routing table with command::
route add default eth0
@@ -48,10 +53,10 @@ Now your ethernet card should be up and running.
TODO:
-Implement pci_driver::suspend() and pci_driver::resume() power management methods.
-Check on 64 bit boxes.
-Check and fix on big endian boxes.
-Test and make sure PCI latency is now correct for all cases.
+- Implement pci_driver::suspend() and pci_driver::resume() power management methods.
+- Check on 64 bit boxes.
+- Check and fix on big endian boxes.
+- Test and make sure PCI latency is now correct for all cases.
Authors:
@@ -60,7 +65,7 @@ Sten Wang <sten_wang@davicom.com.tw > : Original Author
Contributors:
-Marcelo Tosatti <marcelo@conectiva.com.br>
-Alan Cox <alan@lxorguk.ukuu.org.uk>
-Jeff Garzik <jgarzik@pobox.com>
-Vojtech Pavlik <vojtech@suse.cz>
+- Marcelo Tosatti <marcelo@conectiva.com.br>
+- Alan Cox <alan@lxorguk.ukuu.org.uk>
+- Jeff Garzik <jgarzik@pobox.com>
+- Vojtech Pavlik <vojtech@suse.cz>
diff --git a/Documentation/networking/device_drivers/dlink/dl2k.txt b/Documentation/networking/device_drivers/dlink/dl2k.rst
index cba74f7a3abc..ccdb5d0d7460 100644
--- a/Documentation/networking/device_drivers/dlink/dl2k.txt
+++ b/Documentation/networking/device_drivers/dlink/dl2k.rst
@@ -1,10 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
- D-Link DL2000-based Gigabit Ethernet Adapter Installation
- for Linux
- May 23, 2002
+=========================================================
+D-Link DL2000-based Gigabit Ethernet Adapter Installation
+=========================================================
+
+May 23, 2002
+
+.. Contents
-Contents
-========
- Compatibility List
- Quick Install
- Compiling the Driver
@@ -15,12 +18,13 @@ Contents
Compatibility List
-=================
+==================
+
Adapter Support:
-D-Link DGE-550T Gigabit Ethernet Adapter.
-D-Link DGE-550SX Gigabit Ethernet Adapter.
-D-Link DL2000-based Gigabit Ethernet Adapter.
+- D-Link DGE-550T Gigabit Ethernet Adapter.
+- D-Link DGE-550SX Gigabit Ethernet Adapter.
+- D-Link DL2000-based Gigabit Ethernet Adapter.
The driver support Linux kernel 2.4.7 later. We had tested it
@@ -34,28 +38,32 @@ on the environments below.
Quick Install
=============
-Install linux driver as following command:
+Install linux driver as following command::
+
+ 1. make all
+ 2. insmod dl2k.ko
+ 3. ifconfig eth0 up 10.xxx.xxx.xxx netmask 255.0.0.0
+ ^^^^^^^^^^^^^^^\ ^^^^^^^^\
+ IP NETMASK
-1. make all
-2. insmod dl2k.ko
-3. ifconfig eth0 up 10.xxx.xxx.xxx netmask 255.0.0.0
- ^^^^^^^^^^^^^^^\ ^^^^^^^^\
- IP NETMASK
Now eth0 should active, you can test it by "ping" or get more information by
"ifconfig". If tested ok, continue the next step.
-4. cp dl2k.ko /lib/modules/`uname -r`/kernel/drivers/net
-5. Add the following line to /etc/modprobe.d/dl2k.conf:
+4. ``cp dl2k.ko /lib/modules/`uname -r`/kernel/drivers/net``
+5. Add the following line to /etc/modprobe.d/dl2k.conf::
+
alias eth0 dl2k
-6. Run depmod to updated module indexes.
-7. Run "netconfig" or "netconf" to create configuration script ifcfg-eth0
+
+6. Run ``depmod`` to updated module indexes.
+7. Run ``netconfig`` or ``netconf`` to create configuration script ifcfg-eth0
located at /etc/sysconfig/network-scripts or create it manually.
+
[see - Configuration Script Sample]
8. Driver will automatically load and configure at next boot time.
Compiling the Driver
====================
- In Linux, NIC drivers are most commonly configured as loadable modules.
+In Linux, NIC drivers are most commonly configured as loadable modules.
The approach of building a monolithic kernel has become obsolete. The driver
can be compiled as part of a monolithic kernel, but is strongly discouraged.
The remainder of this section assumes the driver is built as a loadable module.
@@ -73,93 +81,108 @@ to compile and link the driver:
CD-ROM drive
------------
-[root@XXX /] mkdir cdrom
-[root@XXX /] mount -r -t iso9660 -o conv=auto /dev/cdrom /cdrom
-[root@XXX /] cd root
-[root@XXX /root] mkdir dl2k
-[root@XXX /root] cd dl2k
-[root@XXX dl2k] cp /cdrom/linux/dl2k.tgz /root/dl2k
-[root@XXX dl2k] tar xfvz dl2k.tgz
-[root@XXX dl2k] make all
+::
+
+ [root@XXX /] mkdir cdrom
+ [root@XXX /] mount -r -t iso9660 -o conv=auto /dev/cdrom /cdrom
+ [root@XXX /] cd root
+ [root@XXX /root] mkdir dl2k
+ [root@XXX /root] cd dl2k
+ [root@XXX dl2k] cp /cdrom/linux/dl2k.tgz /root/dl2k
+ [root@XXX dl2k] tar xfvz dl2k.tgz
+ [root@XXX dl2k] make all
Floppy disc drive
-----------------
-[root@XXX /] cd root
-[root@XXX /root] mkdir dl2k
-[root@XXX /root] cd dl2k
-[root@XXX dl2k] mcopy a:/linux/dl2k.tgz /root/dl2k
-[root@XXX dl2k] tar xfvz dl2k.tgz
-[root@XXX dl2k] make all
+::
+
+ [root@XXX /] cd root
+ [root@XXX /root] mkdir dl2k
+ [root@XXX /root] cd dl2k
+ [root@XXX dl2k] mcopy a:/linux/dl2k.tgz /root/dl2k
+ [root@XXX dl2k] tar xfvz dl2k.tgz
+ [root@XXX dl2k] make all
Installing the Driver
=====================
- Manual Installation
- -------------------
+Manual Installation
+-------------------
+
Once the driver has been compiled, it must be loaded, enabled, and bound
to a protocol stack in order to establish network connectivity. To load a
- module enter the command:
+ module enter the command::
+
+ insmod dl2k.o
+
+ or::
+
+ insmod dl2k.o <optional parameter> ; add parameter
- insmod dl2k.o
+---------------------------------------------------------
- or
+ example::
- insmod dl2k.o <optional parameter> ; add parameter
+ insmod dl2k.o media=100mbps_hd
- ===============================================================
- example: insmod dl2k.o media=100mbps_hd
- or insmod dl2k.o media=3
- or insmod dl2k.o media=3,2 ; for 2 cards
- ===============================================================
+ or::
+
+ insmod dl2k.o media=3
+
+ or::
+
+ insmod dl2k.o media=3,2 ; for 2 cards
+
+---------------------------------------------------------
Please reference the list of the command line parameters supported by
the Linux device driver below.
The insmod command only loads the driver and gives it a name of the form
eth0, eth1, etc. To bring the NIC into an operational state,
- it is necessary to issue the following command:
+ it is necessary to issue the following command::
- ifconfig eth0 up
+ ifconfig eth0 up
Finally, to bind the driver to the active protocol (e.g., TCP/IP with
- Linux), enter the following command:
+ Linux), enter the following command::
- ifup eth0
+ ifup eth0
Note that this is meaningful only if the system can find a configuration
script that contains the necessary network information. A sample will be
given in the next paragraph.
- The commands to unload a driver are as follows:
+ The commands to unload a driver are as follows::
- ifdown eth0
- ifconfig eth0 down
- rmmod dl2k.o
+ ifdown eth0
+ ifconfig eth0 down
+ rmmod dl2k.o
The following are the commands to list the currently loaded modules and
- to see the current network configuration.
+ to see the current network configuration::
- lsmod
- ifconfig
+ lsmod
+ ifconfig
- Automated Installation
- ----------------------
+Automated Installation
+----------------------
This section describes how to install the driver such that it is
automatically loaded and configured at boot time. The following description
is based on a Red Hat 6.0/7.0 distribution, but it can easily be ported to
other distributions as well.
- Red Hat v6.x/v7.x
- -----------------
+Red Hat v6.x/v7.x
+-----------------
1. Copy dl2k.o to the network modules directory, typically
/lib/modules/2.x.x-xx/net or /lib/modules/2.x.x/kernel/drivers/net.
2. Locate the boot module configuration file, most commonly in the
- /etc/modprobe.d/ directory. Add the following lines:
+ /etc/modprobe.d/ directory. Add the following lines::
- alias ethx dl2k
- options dl2k <optional parameters>
+ alias ethx dl2k
+ options dl2k <optional parameters>
where ethx will be eth0 if the NIC is the only ethernet adapter, eth1 if
one other ethernet adapter is installed, etc. Refer to the table in the
@@ -180,11 +203,15 @@ parameter. Below is a list of the command line parameters supported by the
Linux device
driver.
-mtu=packet_size - Specifies the maximum packet size. default
+
+=============================== ==============================================
+mtu=packet_size Specifies the maximum packet size. default
is 1500.
-media=media_type - Specifies the media type the NIC operates at.
+media=media_type Specifies the media type the NIC operates at.
autosense Autosensing active media.
+
+ =========== =========================
10mbps_hd 10Mbps half duplex.
10mbps_fd 10Mbps full duplex.
100mbps_hd 100Mbps half duplex.
@@ -198,85 +225,90 @@ media=media_type - Specifies the media type the NIC operates at.
4 100Mbps full duplex.
5 1000Mbps half duplex.
6 1000Mbps full duplex.
+ =========== =========================
By default, the NIC operates at autosense.
1000mbps_fd and 1000mbps_hd types are only
available for fiber adapter.
-vlan=n - Specifies the VLAN ID. If vlan=0, the
+vlan=n Specifies the VLAN ID. If vlan=0, the
Virtual Local Area Network (VLAN) function is
disable.
-jumbo=[0|1] - Specifies the jumbo frame support. If jumbo=1,
+jumbo=[0|1] Specifies the jumbo frame support. If jumbo=1,
the NIC accept jumbo frames. By default, this
function is disabled.
Jumbo frame usually improve the performance
int gigabit.
- This feature need jumbo frame compatible
+ This feature need jumbo frame compatible
remote.
-
-rx_coalesce=m - Number of rx frame handled each interrupt.
-rx_timeout=n - Rx DMA wait time for an interrupt.
- If set rx_coalesce > 0, hardware only assert
- an interrupt for m frames. Hardware won't
+
+rx_coalesce=m Number of rx frame handled each interrupt.
+rx_timeout=n Rx DMA wait time for an interrupt.
+ If set rx_coalesce > 0, hardware only assert
+ an interrupt for m frames. Hardware won't
assert rx interrupt until m frames received or
- reach timeout of n * 640 nano seconds.
- Set proper rx_coalesce and rx_timeout can
+ reach timeout of n * 640 nano seconds.
+ Set proper rx_coalesce and rx_timeout can
reduce congestion collapse and overload which
has been a bottleneck for high speed network.
-
+
For example, rx_coalesce=10 rx_timeout=800.
- that is, hardware assert only 1 interrupt
- for 10 frames received or timeout of 512 us.
+ that is, hardware assert only 1 interrupt
+ for 10 frames received or timeout of 512 us.
-tx_coalesce=n - Number of tx frame handled each interrupt.
- Set n > 1 can reduce the interrupts
+tx_coalesce=n Number of tx frame handled each interrupt.
+ Set n > 1 can reduce the interrupts
congestion usually lower performance of
high speed network card. Default is 16.
-
-tx_flow=[1|0] - Specifies the Tx flow control. If tx_flow=0,
+
+tx_flow=[1|0] Specifies the Tx flow control. If tx_flow=0,
the Tx flow control disable else driver
autodetect.
-rx_flow=[1|0] - Specifies the Rx flow control. If rx_flow=0,
+rx_flow=[1|0] Specifies the Rx flow control. If rx_flow=0,
the Rx flow control enable else driver
autodetect.
+=============================== ==============================================
Configuration Script Sample
===========================
-Here is a sample of a simple configuration script:
+Here is a sample of a simple configuration script::
-DEVICE=eth0
-USERCTL=no
-ONBOOT=yes
-POOTPROTO=none
-BROADCAST=207.200.5.255
-NETWORK=207.200.5.0
-NETMASK=255.255.255.0
-IPADDR=207.200.5.2
+ DEVICE=eth0
+ USERCTL=no
+ ONBOOT=yes
+ POOTPROTO=none
+ BROADCAST=207.200.5.255
+ NETWORK=207.200.5.0
+ NETMASK=255.255.255.0
+ IPADDR=207.200.5.2
Troubleshooting
===============
Q1. Source files contain ^ M behind every line.
- Make sure all files are Unix file format (no LF). Try the following
- shell command to convert files.
+
+ Make sure all files are Unix file format (no LF). Try the following
+ shell command to convert files::
cat dl2k.c | col -b > dl2k.tmp
mv dl2k.tmp dl2k.c
- OR
+ OR::
cat dl2k.c | tr -d "\r" > dl2k.tmp
mv dl2k.tmp dl2k.c
-Q2: Could not find header files (*.h) ?
- To compile the driver, you need kernel header files. After
+Q2: Could not find header files (``*.h``)?
+
+ To compile the driver, you need kernel header files. After
installing the kernel source, the header files are usually located in
/usr/src/linux/include, which is the default include directory configured
in Makefile. For some distributions, there is a copy of header files in
/usr/src/include/linux and /usr/src/include/asm, that you can change the
INCLUDEDIR in Makefile to /usr/include without installing kernel source.
- Note that RH 7.0 didn't provide correct header files in /usr/include,
+
+ Note that RH 7.0 didn't provide correct header files in /usr/include,
including those files will make a wrong version driver.
diff --git a/Documentation/networking/device_drivers/freescale/dpaa.txt b/Documentation/networking/device_drivers/freescale/dpaa.rst
index b06601ff9200..241c6c6f6e68 100644
--- a/Documentation/networking/device_drivers/freescale/dpaa.txt
+++ b/Documentation/networking/device_drivers/freescale/dpaa.rst
@@ -1,12 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
The QorIQ DPAA Ethernet Driver
==============================
Authors:
-Madalin Bucur <madalin.bucur@nxp.com>
-Camelia Groza <camelia.groza@nxp.com>
+- Madalin Bucur <madalin.bucur@nxp.com>
+- Camelia Groza <camelia.groza@nxp.com>
-Contents
-========
+.. Contents
- DPAA Ethernet Overview
- DPAA Ethernet Supported SoCs
@@ -34,7 +36,7 @@ following drivers in the Linux kernel:
- Queue Manager (QMan), Buffer Manager (BMan)
drivers/soc/fsl/qbman
-A simplified view of the dpaa_eth interfaces mapped to FMan MACs:
+A simplified view of the dpaa_eth interfaces mapped to FMan MACs::
dpaa_eth /eth0\ ... /ethN\
driver | | | |
@@ -42,89 +44,93 @@ A simplified view of the dpaa_eth interfaces mapped to FMan MACs:
-Ports / Tx Rx \ ... / Tx Rx \
FMan | | | |
-MACs | MAC0 | | MACN |
- / dtsec0 \ ... / dtsecN \ (or tgec)
- / \ / \(or memac)
+ / dtsec0 \ ... / dtsecN \ (or tgec)
+ / \ / \(or memac)
--------- -------------- --- -------------- ---------
FMan, FMan Port, FMan SP, FMan MURAM drivers
---------------------------------------------------------
FMan HW blocks: MURAM, MACs, Ports, SP
---------------------------------------------------------
-The dpaa_eth relation to the QMan, BMan and FMan:
- ________________________________
+The dpaa_eth relation to the QMan, BMan and FMan::
+
+ ________________________________
dpaa_eth / eth0 \
driver / \
--------- -^- -^- -^- --- ---------
QMan driver / \ / \ / \ \ / | BMan |
- |Rx | |Rx | |Tx | |Tx | | driver |
+ |Rx | |Rx | |Tx | |Tx | | driver |
--------- |Dfl| |Err| |Cnf| |FQs| | |
QMan HW |FQ | |FQ | |FQs| | | | |
- / \ / \ / \ \ / | |
+ / \ / \ / \ \ / | |
--------- --- --- --- -v- ---------
- | FMan QMI | |
- | FMan HW FMan BMI | BMan HW |
- ----------------------- --------
+ | FMan QMI | |
+ | FMan HW FMan BMI | BMan HW |
+ ----------------------- --------
where the acronyms used above (and in the code) are:
-DPAA = Data Path Acceleration Architecture
-FMan = DPAA Frame Manager
-QMan = DPAA Queue Manager
-BMan = DPAA Buffers Manager
-QMI = QMan interface in FMan
-BMI = BMan interface in FMan
-FMan SP = FMan Storage Profiles
-MURAM = Multi-user RAM in FMan
-FQ = QMan Frame Queue
-Rx Dfl FQ = default reception FQ
-Rx Err FQ = Rx error frames FQ
-Tx Cnf FQ = Tx confirmation FQs
-Tx FQs = transmission frame queues
-dtsec = datapath three speed Ethernet controller (10/100/1000 Mbps)
-tgec = ten gigabit Ethernet controller (10 Gbps)
-memac = multirate Ethernet MAC (10/100/1000/10000)
+
+=============== ===========================================================
+DPAA Data Path Acceleration Architecture
+FMan DPAA Frame Manager
+QMan DPAA Queue Manager
+BMan DPAA Buffers Manager
+QMI QMan interface in FMan
+BMI BMan interface in FMan
+FMan SP FMan Storage Profiles
+MURAM Multi-user RAM in FMan
+FQ QMan Frame Queue
+Rx Dfl FQ default reception FQ
+Rx Err FQ Rx error frames FQ
+Tx Cnf FQ Tx confirmation FQs
+Tx FQs transmission frame queues
+dtsec datapath three speed Ethernet controller (10/100/1000 Mbps)
+tgec ten gigabit Ethernet controller (10 Gbps)
+memac multirate Ethernet MAC (10/100/1000/10000)
+=============== ===========================================================
DPAA Ethernet Supported SoCs
============================
The DPAA drivers enable the Ethernet controllers present on the following SoCs:
-# PPC
-P1023
-P2041
-P3041
-P4080
-P5020
-P5040
-T1023
-T1024
-T1040
-T1042
-T2080
-T4240
-B4860
-
-# ARM
-LS1043A
-LS1046A
+PPC
+- P1023
+- P2041
+- P3041
+- P4080
+- P5020
+- P5040
+- T1023
+- T1024
+- T1040
+- T1042
+- T2080
+- T4240
+- B4860
+
+ARM
+- LS1043A
+- LS1046A
Configuring DPAA Ethernet in your kernel
========================================
-To enable the DPAA Ethernet driver, the following Kconfig options are required:
+To enable the DPAA Ethernet driver, the following Kconfig options are required::
-# common for arch/arm64 and arch/powerpc platforms
-CONFIG_FSL_DPAA=y
-CONFIG_FSL_FMAN=y
-CONFIG_FSL_DPAA_ETH=y
-CONFIG_FSL_XGMAC_MDIO=y
+ # common for arch/arm64 and arch/powerpc platforms
+ CONFIG_FSL_DPAA=y
+ CONFIG_FSL_FMAN=y
+ CONFIG_FSL_DPAA_ETH=y
+ CONFIG_FSL_XGMAC_MDIO=y
-# for arch/powerpc only
-CONFIG_FSL_PAMU=y
+ # for arch/powerpc only
+ CONFIG_FSL_PAMU=y
-# common options needed for the PHYs used on the RDBs
-CONFIG_VITESSE_PHY=y
-CONFIG_REALTEK_PHY=y
-CONFIG_AQUANTIA_PHY=y
+ # common options needed for the PHYs used on the RDBs
+ CONFIG_VITESSE_PHY=y
+ CONFIG_REALTEK_PHY=y
+ CONFIG_AQUANTIA_PHY=y
DPAA Ethernet Frame Processing
==============================
@@ -167,7 +173,9 @@ classes as follows:
* priorities 8 to 11 - traffic class 2 (medium-high priority)
* priorities 12 to 15 - traffic class 3 (high priority)
-tc qdisc add dev <int> root handle 1: \
+::
+
+ tc qdisc add dev <int> root handle 1: \
mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
DPAA IRQ Affinity and Receive Side Scaling
@@ -201,11 +209,11 @@ of these frame queues will arrive at the same portal and will always
be processed by the same CPU. This ensures intra-flow order preservation
and workload distribution for multiple traffic flows.
-RSS can be turned off for a certain interface using ethtool, i.e.
+RSS can be turned off for a certain interface using ethtool, i.e.::
# ethtool -N fm1-mac9 rx-flow-hash tcp4 ""
-To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6:
+To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6::
# ethtool -N fm1-mac9 rx-flow-hash udp4 sfdn
@@ -216,7 +224,7 @@ going to control the rx-flow-hashing for all protocols on that interface.
Besides using the FMan Keygen computed hash for spreading traffic on the
128 Rx FQs, the DPAA Ethernet driver also sets the skb hash value when
the NETIF_F_RXHASH feature is on (active by default). This can be turned
-on or off through ethtool, i.e.:
+on or off through ethtool, i.e.::
# ethtool -K fm1-mac9 rx-hashing off
# ethtool -k fm1-mac9 | grep hash
@@ -246,6 +254,7 @@ The following statistics are exported for each interface through ethtool:
- Rx error count per CPU
- Rx error count per type
- congestion related statistics:
+
- congestion status
- time spent in congestion
- number of time the device entered congestion
@@ -254,7 +263,7 @@ The following statistics are exported for each interface through ethtool:
The driver also exports the following information in sysfs:
- the FQ IDs for each FQ type
- /sys/devices/platform/soc/<addr>.fman/<addr>.ethernet/dpaa-ethernet.<id>/net/fm<nr>-mac<nr>/fqids
+ /sys/devices/platform/soc/<addr>.fman/<addr>.ethernet/dpaa-ethernet.<id>/net/fm<nr>-mac<nr>/fqids
- the ID of the buffer pool in use
- /sys/devices/platform/soc/<addr>.fman/<addr>.ethernet/dpaa-ethernet.<id>/net/fm<nr>-mac<nr>/bpids
+ /sys/devices/platform/soc/<addr>.fman/<addr>.ethernet/dpaa-ethernet.<id>/net/fm<nr>-mac<nr>/bpids
diff --git a/Documentation/networking/device_drivers/freescale/gianfar.txt b/Documentation/networking/device_drivers/freescale/gianfar.rst
index ba1daea7f2e4..9c4a91d3824b 100644
--- a/Documentation/networking/device_drivers/freescale/gianfar.txt
+++ b/Documentation/networking/device_drivers/freescale/gianfar.rst
@@ -1,10 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
The Gianfar Ethernet Driver
+===========================
-Author: Andy Fleming <afleming@freescale.com>
-Updated: 2005-07-28
+:Author: Andy Fleming <afleming@freescale.com>
+:Updated: 2005-07-28
-CHECKSUM OFFLOADING
+Checksum Offloading
+===================
The eTSEC controller (first included in parts from late 2005 like
the 8548) has the ability to perform TCP, UDP, and IP checksums
@@ -15,13 +20,15 @@ packets. Use ethtool to enable or disable this feature for RX
and TX.
VLAN
+====
In order to use VLAN, please consult Linux documentation on
configuring VLANs. The gianfar driver supports hardware insertion and
extraction of VLAN headers, but not filtering. Filtering will be
done by the kernel.
-MULTICASTING
+Multicasting
+============
The gianfar driver supports using the group hash table on the
TSEC (and the extended hash table on the eTSEC) for multicast
@@ -29,13 +36,15 @@ filtering. On the eTSEC, the exact-match MAC registers are used
before the hash tables. See Linux documentation on how to join
multicast groups.
-PADDING
+Padding
+=======
The gianfar driver supports padding received frames with 2 bytes
to align the IP header to a 16-byte boundary, when supported by
hardware.
-ETHTOOL
+Ethtool
+=======
The gianfar driver supports the use of ethtool for many
configuration options. You must run ethtool only on currently
diff --git a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst
index a191faaf97de..e18dad11bc72 100644
--- a/Documentation/networking/device_drivers/index.rst
+++ b/Documentation/networking/device_drivers/index.rst
@@ -27,6 +27,30 @@ Contents:
netronome/nfp
pensando/ionic
stmicro/stmmac
+ 3com/3c509
+ 3com/vortex
+ amazon/ena
+ aquantia/atlantic
+ chelsio/cxgb
+ cirrus/cs89x0
+ davicom/dm9000
+ dec/de4x5
+ dec/dmfe
+ dlink/dl2k
+ freescale/dpaa
+ freescale/gianfar
+ intel/ipw2100
+ intel/ipw2200
+ microsoft/netvsc
+ neterion/s2io
+ neterion/vxge
+ qualcomm/rmnet
+ sb1000
+ smsc/smc9
+ ti/cpsw_switchdev
+ ti/cpsw
+ ti/tlan
+ toshiba/spider_net
.. only:: subproject and html
diff --git a/Documentation/networking/device_drivers/intel/e100.rst b/Documentation/networking/device_drivers/intel/e100.rst
index caf023cc88de..3ac21e7119a7 100644
--- a/Documentation/networking/device_drivers/intel/e100.rst
+++ b/Documentation/networking/device_drivers/intel/e100.rst
@@ -33,7 +33,7 @@ The following features are now available in supported kernels:
- SNMP
Channel Bonding documentation can be found in the Linux kernel source:
-/Documentation/networking/bonding.txt
+/Documentation/networking/bonding.rst
Identifying Your Adapter
diff --git a/Documentation/networking/device_drivers/intel/ipw2100.txt b/Documentation/networking/device_drivers/intel/ipw2100.rst
index 6f85e1d06031..d54ad522f937 100644
--- a/Documentation/networking/device_drivers/intel/ipw2100.txt
+++ b/Documentation/networking/device_drivers/intel/ipw2100.rst
@@ -1,31 +1,37 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
-Intel(R) PRO/Wireless 2100 Driver for Linux in support of:
+===========================================
+Intel(R) PRO/Wireless 2100 Driver for Linux
+===========================================
-Intel(R) PRO/Wireless 2100 Network Connection
+Support for:
-Copyright (C) 2003-2006, Intel Corporation
+- Intel(R) PRO/Wireless 2100 Network Connection
+
+Copyright |copy| 2003-2006, Intel Corporation
README.ipw2100
-Version: git-1.1.5
-Date : January 25, 2006
+:Version: git-1.1.5
+:Date: January 25, 2006
-Index
------------------------------------------------
-0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER
-1. Introduction
-2. Release git-1.1.5 Current Features
-3. Command Line Parameters
-4. Sysfs Helper Files
-5. Radio Kill Switch
-6. Dynamic Firmware
-7. Power Management
-8. Support
-9. License
+.. Index
+
+ 0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER
+ 1. Introduction
+ 2. Release git-1.1.5 Current Features
+ 3. Command Line Parameters
+ 4. Sysfs Helper Files
+ 5. Radio Kill Switch
+ 6. Dynamic Firmware
+ 7. Power Management
+ 8. Support
+ 9. License
-0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER
------------------------------------------------
+0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER
+=================================================
Important Notice FOR ALL USERS OR DISTRIBUTORS!!!!
@@ -75,10 +81,10 @@ obtain a tested driver from Intel Customer Support at:
http://www.intel.com/support/wireless/sb/CS-006408.htm
1. Introduction
------------------------------------------------
+===============
-This document provides a brief overview of the features supported by the
-IPW2100 driver project. The main project website, where the latest
+This document provides a brief overview of the features supported by the
+IPW2100 driver project. The main project website, where the latest
development version of the driver can be found, is:
http://ipw2100.sourceforge.net
@@ -89,10 +95,11 @@ for the driver project.
2. Release git-1.1.5 Current Supported Features
------------------------------------------------
+===============================================
+
- Managed (BSS) and Ad-Hoc (IBSS)
- WEP (shared key and open)
-- Wireless Tools support
+- Wireless Tools support
- 802.1x (tested with XSupplicant 1.0.1)
Enabled (but not supported) features:
@@ -105,11 +112,11 @@ performed on a given feature.
3. Command Line Parameters
------------------------------------------------
+==========================
If the driver is built as a module, the following optional parameters are used
by entering them on the command line with the modprobe command using this
-syntax:
+syntax::
modprobe ipw2100 [<option>=<VAL1><,VAL2>...]
@@ -119,61 +126,76 @@ For example, to disable the radio on driver loading, enter:
The ipw2100 driver supports the following module parameters:
-Name Value Example:
-debug 0x0-0xffffffff debug=1024
-mode 0,1,2 mode=1 /* AdHoc */
-channel int channel=3 /* Only valid in AdHoc or Monitor */
-associate boolean associate=0 /* Do NOT auto associate */
-disable boolean disable=1 /* Do not power the HW */
+========= ============== ============ ==============================
+Name Value Example Meaning
+========= ============== ============ ==============================
+debug 0x0-0xffffffff debug=1024 Debug level set to 1024
+mode 0,1,2 mode=1 AdHoc
+channel int channel=3 Only valid in AdHoc or Monitor
+associate boolean associate=0 Do NOT auto associate
+disable boolean disable=1 Do not power the HW
+========= ============== ============ ==============================
4. Sysfs Helper Files
----------------------------
------------------------------------------------
+=====================
-There are several ways to control the behavior of the driver. Many of the
+There are several ways to control the behavior of the driver. Many of the
general capabilities are exposed through the Wireless Tools (iwconfig). There
are a few capabilities that are exposed through entries in the Linux Sysfs.
------ Driver Level ------
+**Driver Level**
+
For the driver level files, look in /sys/bus/pci/drivers/ipw2100/
- debug_level
-
- This controls the same global as the 'debug' module parameter. For
- information on the various debugging levels available, run the 'dvals'
+ debug_level
+ This controls the same global as the 'debug' module parameter. For
+ information on the various debugging levels available, run the 'dvals'
script found in the driver source directory.
- NOTE: 'debug_level' is only enabled if CONFIG_IPW2100_DEBUG is turn
- on.
+ .. note::
+
+ 'debug_level' is only enabled if CONFIG_IPW2100_DEBUG is turn on.
+
+**Device Level**
+
+For the device level files look in::
------ Device Level ------
-For the device level files look in
-
/sys/bus/pci/drivers/ipw2100/{PCI-ID}/
-For example:
+For example::
+
/sys/bus/pci/drivers/ipw2100/0000:02:01.0
For the device level files, see /sys/bus/pci/drivers/ipw2100:
rf_kill
- read -
- 0 = RF kill not enabled (radio on)
- 1 = SW based RF kill active (radio off)
- 2 = HW based RF kill active (radio off)
- 3 = Both HW and SW RF kill active (radio off)
- write -
- 0 = If SW based RF kill active, turn the radio back on
- 1 = If radio is on, activate SW based RF kill
+ read
+
+ == =========================================
+ 0 RF kill not enabled (radio on)
+ 1 SW based RF kill active (radio off)
+ 2 HW based RF kill active (radio off)
+ 3 Both HW and SW RF kill active (radio off)
+ == =========================================
+
+ write
+
+ == ==================================================
+ 0 If SW based RF kill active, turn the radio back on
+ 1 If radio is on, activate SW based RF kill
+ == ==================================================
- NOTE: If you enable the SW based RF kill and then toggle the HW
- based RF kill from ON -> OFF -> ON, the radio will NOT come back on
+ .. note::
+
+ If you enable the SW based RF kill and then toggle the HW
+ based RF kill from ON -> OFF -> ON, the radio will NOT come back on
5. Radio Kill Switch
------------------------------------------------
+====================
+
Most laptops provide the ability for the user to physically disable the radio.
Some vendors have implemented this as a physical switch that requires no
software to turn the radio off and on. On other laptops, however, the switch
@@ -186,9 +208,10 @@ on your system.
6. Dynamic Firmware
------------------------------------------------
-As the firmware is licensed under a restricted use license, it can not be
-included within the kernel sources. To enable the IPW2100 you will need a
+===================
+
+As the firmware is licensed under a restricted use license, it can not be
+included within the kernel sources. To enable the IPW2100 you will need a
firmware image to load into the wireless NIC's processors.
You can obtain these images from <http://ipw2100.sf.net/firmware.php>.
@@ -197,52 +220,57 @@ See INSTALL for instructions on installing the firmware.
7. Power Management
------------------------------------------------
-The IPW2100 supports the configuration of the Power Save Protocol
-through a private wireless extension interface. The IPW2100 supports
+===================
+
+The IPW2100 supports the configuration of the Power Save Protocol
+through a private wireless extension interface. The IPW2100 supports
the following different modes:
+ === ===========================================================
off No power management. Radio is always on.
on Automatic power management
- 1-5 Different levels of power management. The higher the
- number the greater the power savings, but with an impact to
- packet latencies.
-
-Power management works by powering down the radio after a certain
-interval of time has passed where no packets are passed through the
-radio. Once powered down, the radio remains in that state for a given
-period of time. For higher power savings, the interval between last
+ 1-5 Different levels of power management. The higher the
+ number the greater the power savings, but with an impact to
+ packet latencies.
+ === ===========================================================
+
+Power management works by powering down the radio after a certain
+interval of time has passed where no packets are passed through the
+radio. Once powered down, the radio remains in that state for a given
+period of time. For higher power savings, the interval between last
packet processed to sleep is shorter and the sleep period is longer.
-When the radio is asleep, the access point sending data to the station
-must buffer packets at the AP until the station wakes up and requests
-any buffered packets. If you have an AP that does not correctly support
-the PSP protocol you may experience packet loss or very poor performance
-while power management is enabled. If this is the case, you will need
-to try and find a firmware update for your AP, or disable power
-management (via `iwconfig eth1 power off`)
+When the radio is asleep, the access point sending data to the station
+must buffer packets at the AP until the station wakes up and requests
+any buffered packets. If you have an AP that does not correctly support
+the PSP protocol you may experience packet loss or very poor performance
+while power management is enabled. If this is the case, you will need
+to try and find a firmware update for your AP, or disable power
+management (via ``iwconfig eth1 power off``)
-To configure the power level on the IPW2100 you use a combination of
-iwconfig and iwpriv. iwconfig is used to turn power management on, off,
+To configure the power level on the IPW2100 you use a combination of
+iwconfig and iwpriv. iwconfig is used to turn power management on, off,
and set it to auto.
+ ========================= ====================================
iwconfig eth1 power off Disables radio power down
- iwconfig eth1 power on Enables radio power management to
+ iwconfig eth1 power on Enables radio power management to
last set level (defaults to AUTO)
- iwpriv eth1 set_power 0 Sets power level to AUTO and enables
- power management if not previously
+ iwpriv eth1 set_power 0 Sets power level to AUTO and enables
+ power management if not previously
enabled.
- iwpriv eth1 set_power 1-5 Set the power level as specified,
- enabling power management if not
+ iwpriv eth1 set_power 1-5 Set the power level as specified,
+ enabling power management if not
previously enabled.
+ ========================= ====================================
+
+You can view the current power level setting via::
-You can view the current power level setting via:
-
iwpriv eth1 get_power
It will return the current period or timeout that is configured as a string
in the form of xxxx/yyyy (z) where xxxx is the timeout interval (amount of
-time after packet processing), yyyy is the period to sleep (amount of time to
+time after packet processing), yyyy is the period to sleep (amount of time to
wait before powering the radio and querying the access point for buffered
packets), and z is the 'power level'. If power management is turned off the
xxxx/yyyy will be replaced with 'off' -- the level reported will be the active
@@ -250,44 +278,46 @@ level if `iwconfig eth1 power on` is invoked.
8. Support
------------------------------------------------
+==========
For general development information and support,
go to:
-
+
http://ipw2100.sf.net/
-The ipw2100 1.1.0 driver and firmware can be downloaded from:
+The ipw2100 1.1.0 driver and firmware can be downloaded from:
http://support.intel.com
-For installation support on the ipw2100 1.1.0 driver on Linux kernels
-2.6.8 or greater, email support is available from:
+For installation support on the ipw2100 1.1.0 driver on Linux kernels
+2.6.8 or greater, email support is available from:
http://supportmail.intel.com
9. License
------------------------------------------------
+==========
- Copyright(c) 2003 - 2006 Intel Corporation. All rights reserved.
+ Copyright |copy| 2003 - 2006 Intel Corporation. All rights reserved.
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License (version 2) as
+ This program is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License (version 2) as
published by the Free Software Foundation.
-
- This program is distributed in the hope that it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+
+ This program is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
-
+
You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc., 59
+ this program; if not, write to the Free Software Foundation, Inc., 59
Temple Place - Suite 330, Boston, MA 02111-1307, USA.
-
+
The full GNU General Public License is included in this distribution in the
file called LICENSE.
-
+
License Contact Information:
+
James P. Ketrenos <ipw2100-admin@linux.intel.com>
+
Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
diff --git a/Documentation/networking/device_drivers/intel/ipw2200.txt b/Documentation/networking/device_drivers/intel/ipw2200.rst
index b7658bed4906..0cb42d2fd7e5 100644
--- a/Documentation/networking/device_drivers/intel/ipw2200.txt
+++ b/Documentation/networking/device_drivers/intel/ipw2200.rst
@@ -1,8 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
-Intel(R) PRO/Wireless 2915ABG Driver for Linux in support of:
+==============================================
+Intel(R) PRO/Wireless 2915ABG Driver for Linux
+==============================================
-Intel(R) PRO/Wireless 2200BG Network Connection
-Intel(R) PRO/Wireless 2915ABG Network Connection
+
+Support for:
+
+- Intel(R) PRO/Wireless 2200BG Network Connection
+- Intel(R) PRO/Wireless 2915ABG Network Connection
Note: The Intel(R) PRO/Wireless 2915ABG Driver for Linux and Intel(R)
PRO/Wireless 2200BG Driver for Linux is a unified driver that works on
@@ -10,37 +17,37 @@ both hardware adapters listed above. In this document the Intel(R)
PRO/Wireless 2915ABG Driver for Linux will be used to reference the
unified driver.
-Copyright (C) 2004-2006, Intel Corporation
+Copyright |copy| 2004-2006, Intel Corporation
README.ipw2200
-Version: 1.1.2
-Date : March 30, 2006
+:Version: 1.1.2
+:Date: March 30, 2006
-Index
------------------------------------------------
-0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER
-1. Introduction
-1.1. Overview of features
-1.2. Module parameters
-1.3. Wireless Extension Private Methods
-1.4. Sysfs Helper Files
-1.5. Supported channels
-2. Ad-Hoc Networking
-3. Interacting with Wireless Tools
-3.1. iwconfig mode
-3.2. iwconfig sens
-4. About the Version Numbers
-5. Firmware installation
-6. Support
-7. License
+.. Index
+ 0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER
+ 1. Introduction
+ 1.1. Overview of features
+ 1.2. Module parameters
+ 1.3. Wireless Extension Private Methods
+ 1.4. Sysfs Helper Files
+ 1.5. Supported channels
+ 2. Ad-Hoc Networking
+ 3. Interacting with Wireless Tools
+ 3.1. iwconfig mode
+ 3.2. iwconfig sens
+ 4. About the Version Numbers
+ 5. Firmware installation
+ 6. Support
+ 7. License
-0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER
------------------------------------------------
-Important Notice FOR ALL USERS OR DISTRIBUTORS!!!!
+0. IMPORTANT INFORMATION BEFORE USING THIS DRIVER
+=================================================
+
+Important Notice FOR ALL USERS OR DISTRIBUTORS!!!!
Intel wireless LAN adapters are engineered, manufactured, tested, and
quality checked to ensure that they meet all necessary local and
@@ -56,7 +63,7 @@ product is granted. Intel's wireless LAN's EEPROM, firmware, and
software driver are designed to carefully control parameters that affect
radio operation and to ensure electromagnetic compliance (EMC). These
parameters include, without limitation, RF power, spectrum usage,
-channel scanning, and human exposure.
+channel scanning, and human exposure.
For these reasons Intel cannot permit any manipulation by third parties
of the software provided in binary format with the wireless WLAN
@@ -70,7 +77,7 @@ no liability, under any theory of liability for any issues associated
with the modified products, including without limitation, claims under
the warranty and/or issues arising from regulatory non-compliance, and
(iii) Intel will not provide or be required to assist in providing
-support to any third parties for such modified products.
+support to any third parties for such modified products.
Note: Many regulatory agencies consider Wireless LAN adapters to be
modules, and accordingly, condition system-level regulatory approval
@@ -78,23 +85,24 @@ upon receipt and review of test data documenting that the antennas and
system configuration do not cause the EMC and radio operation to be
non-compliant.
-The drivers available for download from SourceForge are provided as a
-part of a development project. Conformance to local regulatory
-requirements is the responsibility of the individual developer. As
-such, if you are interested in deploying or shipping a driver as part of
-solution intended to be used for purposes other than development, please
+The drivers available for download from SourceForge are provided as a
+part of a development project. Conformance to local regulatory
+requirements is the responsibility of the individual developer. As
+such, if you are interested in deploying or shipping a driver as part of
+solution intended to be used for purposes other than development, please
obtain a tested driver from Intel Customer Support at:
http://support.intel.com
-1. Introduction
------------------------------------------------
-The following sections attempt to provide a brief introduction to using
+1. Introduction
+===============
+
+The following sections attempt to provide a brief introduction to using
the Intel(R) PRO/Wireless 2915ABG Driver for Linux.
-This document is not meant to be a comprehensive manual on
-understanding or using wireless technologies, but should be sufficient
+This document is not meant to be a comprehensive manual on
+understanding or using wireless technologies, but should be sufficient
to get you moving without wires on Linux.
For information on building and installing the driver, see the INSTALL
@@ -102,14 +110,14 @@ file.
1.1. Overview of Features
------------------------------------------------
+-------------------------
The current release (1.1.2) supports the following features:
+ BSS mode (Infrastructure, Managed)
+ IBSS mode (Ad-Hoc)
+ WEP (OPEN and SHARED KEY mode)
+ 802.1x EAP via wpa_supplicant and xsupplicant
-+ Wireless Extension support
++ Wireless Extension support
+ Full B and G rate support (2200 and 2915)
+ Full A rate support (2915 only)
+ Transmit power control
@@ -122,102 +130,107 @@ supported:
+ long/short preamble support
+ Monitor mode (aka RFMon)
-The distinction between officially supported and enabled is a reflection
+The distinction between officially supported and enabled is a reflection
on the amount of validation and interoperability testing that has been
-performed on a given feature.
+performed on a given feature.
1.2. Command Line Parameters
------------------------------------------------
+----------------------------
Like many modules used in the Linux kernel, the Intel(R) PRO/Wireless
-2915ABG Driver for Linux allows configuration options to be provided
-as module parameters. The most common way to specify a module parameter
-is via the command line.
+2915ABG Driver for Linux allows configuration options to be provided
+as module parameters. The most common way to specify a module parameter
+is via the command line.
-The general form is:
+The general form is::
-% modprobe ipw2200 parameter=value
+ % modprobe ipw2200 parameter=value
Where the supported parameter are:
associate
Set to 0 to disable the auto scan-and-associate functionality of the
- driver. If disabled, the driver will not attempt to scan
- for and associate to a network until it has been configured with
- one or more properties for the target network, for example configuring
+ driver. If disabled, the driver will not attempt to scan
+ for and associate to a network until it has been configured with
+ one or more properties for the target network, for example configuring
the network SSID. Default is 0 (do not auto-associate)
-
+
Example: % modprobe ipw2200 associate=0
auto_create
- Set to 0 to disable the auto creation of an Ad-Hoc network
- matching the channel and network name parameters provided.
+ Set to 0 to disable the auto creation of an Ad-Hoc network
+ matching the channel and network name parameters provided.
Default is 1.
channel
channel number for association. The normal method for setting
- the channel would be to use the standard wireless tools
- (i.e. `iwconfig eth1 channel 10`), but it is useful sometimes
+ the channel would be to use the standard wireless tools
+ (i.e. `iwconfig eth1 channel 10`), but it is useful sometimes
to set this while debugging. Channel 0 means 'ANY'
debug
If using a debug build, this is used to control the amount of debug
info is logged. See the 'dvals' and 'load' script for more info on
- how to use this (the dvals and load scripts are provided as part
- of the ipw2200 development snapshot releases available from the
+ how to use this (the dvals and load scripts are provided as part
+ of the ipw2200 development snapshot releases available from the
SourceForge project at http://ipw2200.sf.net)
-
+
led
Can be used to turn on experimental LED code.
0 = Off, 1 = On. Default is 1.
mode
- Can be used to set the default mode of the adapter.
+ Can be used to set the default mode of the adapter.
0 = Managed, 1 = Ad-Hoc, 2 = Monitor
1.3. Wireless Extension Private Methods
------------------------------------------------
+---------------------------------------
-As an interface designed to handle generic hardware, there are certain
-capabilities not exposed through the normal Wireless Tool interface. As
-such, a provision is provided for a driver to declare custom, or
-private, methods. The Intel(R) PRO/Wireless 2915ABG Driver for Linux
+As an interface designed to handle generic hardware, there are certain
+capabilities not exposed through the normal Wireless Tool interface. As
+such, a provision is provided for a driver to declare custom, or
+private, methods. The Intel(R) PRO/Wireless 2915ABG Driver for Linux
defines several of these to configure various settings.
-The general form of using the private wireless methods is:
+The general form of using the private wireless methods is::
% iwpriv $IFNAME method parameters
-Where $IFNAME is the interface name the device is registered with
+Where $IFNAME is the interface name the device is registered with
(typically eth1, customized via one of the various network interface
name managers, such as ifrename)
The supported private methods are:
get_mode
- Can be used to report out which IEEE mode the driver is
+ Can be used to report out which IEEE mode the driver is
configured to support. Example:
-
+
% iwpriv eth1 get_mode
eth1 get_mode:802.11bg (6)
set_mode
- Can be used to configure which IEEE mode the driver will
- support.
+ Can be used to configure which IEEE mode the driver will
+ support.
+
+ Usage::
+
+ % iwpriv eth1 set_mode {mode}
- Usage:
- % iwpriv eth1 set_mode {mode}
Where {mode} is a number in the range 1-7:
+
+ == =====================
1 802.11a (2915 only)
2 802.11b
3 802.11ab (2915 only)
- 4 802.11g
+ 4 802.11g
5 802.11ag (2915 only)
6 802.11bg
7 802.11abg (2915 only)
+ == =====================
get_preamble
Can be used to report configuration of preamble length.
@@ -225,99 +238,123 @@ The supported private methods are:
set_preamble
Can be used to set the configuration of preamble length:
- Usage:
- % iwpriv eth1 set_preamble {mode}
+ Usage::
+
+ % iwpriv eth1 set_preamble {mode}
+
Where {mode} is one of:
+
+ == ========================================
1 Long preamble only
0 Auto (long or short based on connection)
-
+ == ========================================
+
-1.4. Sysfs Helper Files:
------------------------------------------------
+1.4. Sysfs Helper Files
+-----------------------
-The Linux kernel provides a pseudo file system that can be used to
+The Linux kernel provides a pseudo file system that can be used to
access various components of the operating system. The Intel(R)
PRO/Wireless 2915ABG Driver for Linux exposes several configuration
parameters through this mechanism.
-An entry in the sysfs can support reading and/or writing. You can
-typically query the contents of a sysfs entry through the use of cat,
-and can set the contents via echo. For example:
+An entry in the sysfs can support reading and/or writing. You can
+typically query the contents of a sysfs entry through the use of cat,
+and can set the contents via echo. For example::
-% cat /sys/bus/pci/drivers/ipw2200/debug_level
+ % cat /sys/bus/pci/drivers/ipw2200/debug_level
-Will report the current debug level of the driver's logging subsystem
+Will report the current debug level of the driver's logging subsystem
(only available if CONFIG_IPW2200_DEBUG was configured when the driver
was built).
-You can set the debug level via:
+You can set the debug level via::
-% echo $VALUE > /sys/bus/pci/drivers/ipw2200/debug_level
+ % echo $VALUE > /sys/bus/pci/drivers/ipw2200/debug_level
-Where $VALUE would be a number in the case of this sysfs entry. The
-input to sysfs files does not have to be a number. For example, the
-firmware loader used by hotplug utilizes sysfs entries for transferring
+Where $VALUE would be a number in the case of this sysfs entry. The
+input to sysfs files does not have to be a number. For example, the
+firmware loader used by hotplug utilizes sysfs entries for transferring
the firmware image from user space into the driver.
-The Intel(R) PRO/Wireless 2915ABG Driver for Linux exposes sysfs entries
-at two levels -- driver level, which apply to all instances of the driver
-(in the event that there are more than one device installed) and device
+The Intel(R) PRO/Wireless 2915ABG Driver for Linux exposes sysfs entries
+at two levels -- driver level, which apply to all instances of the driver
+(in the event that there are more than one device installed) and device
level, which applies only to the single specific instance.
1.4.1 Driver Level Sysfs Helper Files
------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For the driver level files, look in /sys/bus/pci/drivers/ipw2200/
- debug_level
-
+ debug_level
This controls the same global as the 'debug' module parameter
1.4.2 Device Level Sysfs Helper Files
------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For the device level files, look in::
-For the device level files, look in
-
/sys/bus/pci/drivers/ipw2200/{PCI-ID}/
-For example:
+For example:::
+
/sys/bus/pci/drivers/ipw2200/0000:02:01.0
For the device level files, see /sys/bus/pci/drivers/ipw2200:
rf_kill
- read -
- 0 = RF kill not enabled (radio on)
- 1 = SW based RF kill active (radio off)
- 2 = HW based RF kill active (radio off)
- 3 = Both HW and SW RF kill active (radio off)
+ read -
+
+ == =========================================
+ 0 RF kill not enabled (radio on)
+ 1 SW based RF kill active (radio off)
+ 2 HW based RF kill active (radio off)
+ 3 Both HW and SW RF kill active (radio off)
+ == =========================================
+
write -
- 0 = If SW based RF kill active, turn the radio back on
- 1 = If radio is on, activate SW based RF kill
- NOTE: If you enable the SW based RF kill and then toggle the HW
- based RF kill from ON -> OFF -> ON, the radio will NOT come back on
-
- ucode
+ == ==================================================
+ 0 If SW based RF kill active, turn the radio back on
+ 1 If radio is on, activate SW based RF kill
+ == ==================================================
+
+ .. note::
+
+ If you enable the SW based RF kill and then toggle the HW
+ based RF kill from ON -> OFF -> ON, the radio will NOT come back on
+
+ ucode
read-only access to the ucode version number
led
read -
- 0 = LED code disabled
- 1 = LED code enabled
+
+ == =================
+ 0 LED code disabled
+ 1 LED code enabled
+ == =================
+
write -
- 0 = Disable LED code
- 1 = Enable LED code
- NOTE: The LED code has been reported to hang some systems when
- running ifconfig and is therefore disabled by default.
+ == ================
+ 0 Disable LED code
+ 1 Enable LED code
+ == ================
+
+
+ .. note::
+
+ The LED code has been reported to hang some systems when
+ running ifconfig and is therefore disabled by default.
1.5. Supported channels
------------------------------------------------
+-----------------------
Upon loading the Intel(R) PRO/Wireless 2915ABG Driver for Linux, a
message stating the detected geography code and the number of 802.11
@@ -326,44 +363,59 @@ channels supported by the card will be displayed in the log.
The geography code corresponds to a regulatory domain as shown in the
table below.
- Supported channels
-Code Geography 802.11bg 802.11a
-
---- Restricted 11 0
-ZZF Custom US/Canada 11 8
-ZZD Rest of World 13 0
-ZZA Custom USA & Europe & High 11 13
-ZZB Custom NA & Europe 11 13
-ZZC Custom Japan 11 4
-ZZM Custom 11 0
-ZZE Europe 13 19
-ZZJ Custom Japan 14 4
-ZZR Rest of World 14 0
-ZZH High Band 13 4
-ZZG Custom Europe 13 4
-ZZK Europe 13 24
-ZZL Europe 11 13
-
-
-2. Ad-Hoc Networking
------------------------------------------------
-
-When using a device in an Ad-Hoc network, it is useful to understand the
-sequence and requirements for the driver to be able to create, join, or
+ +------+----------------------------+--------------------+
+ | | | Supported channels |
+ | Code | Geography +----------+---------+
+ | | | 802.11bg | 802.11a |
+ +======+============================+==========+=========+
+ | --- | Restricted | 11 | 0 |
+ +------+----------------------------+----------+---------+
+ | ZZF | Custom US/Canada | 11 | 8 |
+ +------+----------------------------+----------+---------+
+ | ZZD | Rest of World | 13 | 0 |
+ +------+----------------------------+----------+---------+
+ | ZZA | Custom USA & Europe & High | 11 | 13 |
+ +------+----------------------------+----------+---------+
+ | ZZB | Custom NA & Europe | 11 | 13 |
+ +------+----------------------------+----------+---------+
+ | ZZC | Custom Japan | 11 | 4 |
+ +------+----------------------------+----------+---------+
+ | ZZM | Custom | 11 | 0 |
+ +------+----------------------------+----------+---------+
+ | ZZE | Europe | 13 | 19 |
+ +------+----------------------------+----------+---------+
+ | ZZJ | Custom Japan | 14 | 4 |
+ +------+----------------------------+----------+---------+
+ | ZZR | Rest of World | 14 | 0 |
+ +------+----------------------------+----------+---------+
+ | ZZH | High Band | 13 | 4 |
+ +------+----------------------------+----------+---------+
+ | ZZG | Custom Europe | 13 | 4 |
+ +------+----------------------------+----------+---------+
+ | ZZK | Europe | 13 | 24 |
+ +------+----------------------------+----------+---------+
+ | ZZL | Europe | 11 | 13 |
+ +------+----------------------------+----------+---------+
+
+2. Ad-Hoc Networking
+=====================
+
+When using a device in an Ad-Hoc network, it is useful to understand the
+sequence and requirements for the driver to be able to create, join, or
merge networks.
-The following attempts to provide enough information so that you can
-have a consistent experience while using the driver as a member of an
+The following attempts to provide enough information so that you can
+have a consistent experience while using the driver as a member of an
Ad-Hoc network.
2.1. Joining an Ad-Hoc Network
------------------------------------------------
+------------------------------
-The easiest way to get onto an Ad-Hoc network is to join one that
+The easiest way to get onto an Ad-Hoc network is to join one that
already exists.
2.2. Creating an Ad-Hoc Network
------------------------------------------------
+-------------------------------
An Ad-Hoc networks is created using the syntax of the Wireless tool.
@@ -371,21 +423,21 @@ For Example:
iwconfig eth1 mode ad-hoc essid testing channel 2
2.3. Merging Ad-Hoc Networks
------------------------------------------------
+----------------------------
-3. Interaction with Wireless Tools
------------------------------------------------
+3. Interaction with Wireless Tools
+==================================
3.1 iwconfig mode
------------------------------------------------
+-----------------
When configuring the mode of the adapter, all run-time configured parameters
are reset to the value used when the module was loaded. This includes
channels, rates, ESSID, etc.
3.2 iwconfig sens
------------------------------------------------
+-----------------
The 'iwconfig ethX sens XX' command will not set the signal sensitivity
threshold, as described in iwconfig documentation, but rather the number
@@ -394,35 +446,35 @@ to another access point. At the same time, it will set the disassociation
threshold to 3 times the given value.
-4. About the Version Numbers
------------------------------------------------
+4. About the Version Numbers
+=============================
-Due to the nature of open source development projects, there are
-frequently changes being incorporated that have not gone through
-a complete validation process. These changes are incorporated into
+Due to the nature of open source development projects, there are
+frequently changes being incorporated that have not gone through
+a complete validation process. These changes are incorporated into
development snapshot releases.
-Releases are numbered with a three level scheme:
+Releases are numbered with a three level scheme:
major.minor.development
Any version where the 'development' portion is 0 (for example
-1.0.0, 1.1.0, etc.) indicates a stable version that will be made
+1.0.0, 1.1.0, etc.) indicates a stable version that will be made
available for kernel inclusion.
Any version where the 'development' portion is not a 0 (for
example 1.0.1, 1.1.5, etc.) indicates a development version that is
-being made available for testing and cutting edge users. The stability
+being made available for testing and cutting edge users. The stability
and functionality of the development releases are not know. We make
efforts to try and keep all snapshots reasonably stable, but due to the
-frequency of their release, and the desire to get those releases
+frequency of their release, and the desire to get those releases
available as quickly as possible, unknown anomalies should be expected.
The major version number will be incremented when significant changes
are made to the driver. Currently, there are no major changes planned.
-5. Firmware installation
-----------------------------------------------
+5. Firmware installation
+========================
The driver requires a firmware image, download it and extract the
files under /lib/firmware (or wherever your hotplug's firmware.agent
@@ -433,40 +485,42 @@ The firmware can be downloaded from the following URL:
http://ipw2200.sf.net/
-6. Support
------------------------------------------------
+6. Support
+==========
-For direct support of the 1.0.0 version, you can contact
+For direct support of the 1.0.0 version, you can contact
http://supportmail.intel.com, or you can use the open source project
support.
For general information and support, go to:
-
+
http://ipw2200.sf.net/
-7. License
------------------------------------------------
+7. License
+==========
- Copyright(c) 2003 - 2006 Intel Corporation. All rights reserved.
+ Copyright |copy| 2003 - 2006 Intel Corporation. All rights reserved.
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License version 2 as
+ This program is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.
-
- This program is distributed in the hope that it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+
+ This program is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
-
+
You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc., 59
+ this program; if not, write to the Free Software Foundation, Inc., 59
Temple Place - Suite 330, Boston, MA 02111-1307, USA.
-
+
The full GNU General Public License is included in this distribution in the
file called LICENSE.
-
+
Contact Information:
+
James P. Ketrenos <ipw2100-admin@linux.intel.com>
+
Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
diff --git a/Documentation/networking/device_drivers/intel/ixgb.rst b/Documentation/networking/device_drivers/intel/ixgb.rst
index 945018207a92..ab624f1a44a8 100644
--- a/Documentation/networking/device_drivers/intel/ixgb.rst
+++ b/Documentation/networking/device_drivers/intel/ixgb.rst
@@ -37,7 +37,7 @@ The following features are available in this kernel:
- SNMP
Channel Bonding documentation can be found in the Linux kernel source:
-/Documentation/networking/bonding.txt
+/Documentation/networking/bonding.rst
The driver information previously displayed in the /proc filesystem is not
supported in this release. Alternatively, you can use ethtool (version 1.6
diff --git a/Documentation/networking/device_drivers/microsoft/netvsc.txt b/Documentation/networking/device_drivers/microsoft/netvsc.rst
index cd63556b27a0..c3f51c672a68 100644
--- a/Documentation/networking/device_drivers/microsoft/netvsc.txt
+++ b/Documentation/networking/device_drivers/microsoft/netvsc.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
Hyper-V network driver
======================
@@ -10,15 +13,15 @@ Windows 10.
Features
========
- Checksum offload
- ----------------
+Checksum offload
+----------------
The netvsc driver supports checksum offload as long as the
Hyper-V host version does. Windows Server 2016 and Azure
support checksum offload for TCP and UDP for both IPv4 and
IPv6. Windows Server 2012 only supports checksum offload for TCP.
- Receive Side Scaling
- --------------------
+Receive Side Scaling
+--------------------
Hyper-V supports receive side scaling. For TCP & UDP, packets can
be distributed among available queues based on IP address and port
number.
@@ -32,30 +35,37 @@ Features
hashing. Using L3 hashing is recommended in this case.
For example, for UDP over IPv4 on eth0:
- To include UDP port numbers in hashing:
- ethtool -N eth0 rx-flow-hash udp4 sdfn
- To exclude UDP port numbers in hashing:
- ethtool -N eth0 rx-flow-hash udp4 sd
- To show UDP hash level:
- ethtool -n eth0 rx-flow-hash udp4
-
- Generic Receive Offload, aka GRO
- --------------------------------
+
+ To include UDP port numbers in hashing::
+
+ ethtool -N eth0 rx-flow-hash udp4 sdfn
+
+ To exclude UDP port numbers in hashing::
+
+ ethtool -N eth0 rx-flow-hash udp4 sd
+
+ To show UDP hash level::
+
+ ethtool -n eth0 rx-flow-hash udp4
+
+Generic Receive Offload, aka GRO
+--------------------------------
The driver supports GRO and it is enabled by default. GRO coalesces
like packets and significantly reduces CPU usage under heavy Rx
load.
- Large Receive Offload (LRO), or Receive Side Coalescing (RSC)
- -------------------------------------------------------------
+Large Receive Offload (LRO), or Receive Side Coalescing (RSC)
+-------------------------------------------------------------
The driver supports LRO/RSC in the vSwitch feature. It reduces the per packet
processing overhead by coalescing multiple TCP segments when possible. The
feature is enabled by default on VMs running on Windows Server 2019 and
- later. It may be changed by ethtool command:
+ later. It may be changed by ethtool command::
+
ethtool -K eth0 lro on
ethtool -K eth0 lro off
- SR-IOV support
- --------------
+SR-IOV support
+--------------
Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV
is enabled in both the vSwitch and the guest configuration, then the
Virtual Function (VF) device is passed to the guest as a PCI
@@ -70,8 +80,8 @@ Features
flow direction is desired, these should be applied directly to the
VF slave device.
- Receive Buffer
- --------------
+Receive Buffer
+--------------
Packets are received into a receive area which is created when device
is probed. The receive area is broken into MTU sized chunks and each may
contain one or more packets. The number of receive sections may be changed
@@ -83,8 +93,8 @@ Features
will use slower method to handle very large packets or if the send buffer
area is exhausted.
- XDP support
- -----------
+XDP support
+-----------
XDP (eXpress Data Path) is a feature that runs eBPF bytecode at the early
stage when packets arrive at a NIC card. The goal is to increase performance
for packet processing, reducing the overhead of SKB allocation and other
@@ -99,7 +109,8 @@ Features
overwritten by setting of synthetic NIC.
XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO
- before running XDP:
+ before running XDP::
+
ethtool -K eth0 lro off
XDP_REDIRECT action is not yet supported.
diff --git a/Documentation/networking/device_drivers/neterion/s2io.rst b/Documentation/networking/device_drivers/neterion/s2io.rst
new file mode 100644
index 000000000000..c5673ec4559b
--- /dev/null
+++ b/Documentation/networking/device_drivers/neterion/s2io.rst
@@ -0,0 +1,196 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================================
+Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver
+=========================================================
+
+Release notes for Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver.
+
+.. Contents
+ - 1. Introduction
+ - 2. Identifying the adapter/interface
+ - 3. Features supported
+ - 4. Command line parameters
+ - 5. Performance suggestions
+ - 6. Available Downloads
+
+
+1. Introduction
+===============
+This Linux driver supports Neterion's Xframe I PCI-X 1.0 and
+Xframe II PCI-X 2.0 adapters. It supports several features
+such as jumbo frames, MSI/MSI-X, checksum offloads, TSO, UFO and so on.
+See below for complete list of features.
+
+All features are supported for both IPv4 and IPv6.
+
+2. Identifying the adapter/interface
+====================================
+
+a. Insert the adapter(s) in your system.
+b. Build and load driver::
+
+ # insmod s2io.ko
+
+c. View log messages::
+
+ # dmesg | tail -40
+
+You will see messages similar to::
+
+ eth3: Neterion Xframe I 10GbE adapter (rev 3), Version 2.0.9.1, Intr type INTA
+ eth4: Neterion Xframe II 10GbE adapter (rev 2), Version 2.0.9.1, Intr type INTA
+ eth4: Device is on 64 bit 133MHz PCIX(M1) bus
+
+The above messages identify the adapter type(Xframe I/II), adapter revision,
+driver version, interface name(eth3, eth4), Interrupt type(INTA, MSI, MSI-X).
+In case of Xframe II, the PCI/PCI-X bus width and frequency are displayed
+as well.
+
+To associate an interface with a physical adapter use "ethtool -p <ethX>".
+The corresponding adapter's LED will blink multiple times.
+
+3. Features supported
+=====================
+a. Jumbo frames. Xframe I/II supports MTU up to 9600 bytes,
+ modifiable using ip command.
+
+b. Offloads. Supports checksum offload(TCP/UDP/IP) on transmit
+ and receive, TSO.
+
+c. Multi-buffer receive mode. Scattering of packet across multiple
+ buffers. Currently driver supports 2-buffer mode which yields
+ significant performance improvement on certain platforms(SGI Altix,
+ IBM xSeries).
+
+d. MSI/MSI-X. Can be enabled on platforms which support this feature
+ (IA64, Xeon) resulting in noticeable performance improvement(up to 7%
+ on certain platforms).
+
+e. Statistics. Comprehensive MAC-level and software statistics displayed
+ using "ethtool -S" option.
+
+f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings,
+ with multiple steering options.
+
+4. Command line parameters
+==========================
+
+a. tx_fifo_num
+ Number of transmit queues
+
+Valid range: 1-8
+
+Default: 1
+
+b. rx_ring_num
+ Number of receive rings
+
+Valid range: 1-8
+
+Default: 1
+
+c. tx_fifo_len
+ Size of each transmit queue
+
+Valid range: Total length of all queues should not exceed 8192
+
+Default: 4096
+
+d. rx_ring_sz
+ Size of each receive ring(in 4K blocks)
+
+Valid range: Limited by memory on system
+
+Default: 30
+
+e. intr_type
+ Specifies interrupt type. Possible values 0(INTA), 2(MSI-X)
+
+Valid values: 0, 2
+
+Default: 2
+
+5. Performance suggestions
+==========================
+
+General:
+
+a. Set MTU to maximum(9000 for switch setup, 9600 in back-to-back configuration)
+b. Set TCP windows size to optimal value.
+
+For instance, for MTU=1500 a value of 210K has been observed to result in
+good performance::
+
+ # sysctl -w net.ipv4.tcp_rmem="210000 210000 210000"
+ # sysctl -w net.ipv4.tcp_wmem="210000 210000 210000"
+
+For MTU=9000, TCP window size of 10 MB is recommended::
+
+ # sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
+ # sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
+
+Transmit performance:
+
+a. By default, the driver respects BIOS settings for PCI bus parameters.
+ However, you may want to experiment with PCI bus parameters
+ max-split-transactions(MOST) and MMRBC (use setpci command).
+
+ A MOST value of 2 has been found optimal for Opterons and 3 for Itanium.
+
+ It could be different for your hardware.
+
+ Set MMRBC to 4K**.
+
+ For example you can set
+
+ For opteron::
+
+ #setpci -d 17d5:* 62=1d
+
+ For Itanium::
+
+ #setpci -d 17d5:* 62=3d
+
+ For detailed description of the PCI registers, please see Xframe User Guide.
+
+b. Ensure Transmit Checksum offload is enabled. Use ethtool to set/verify this
+ parameter.
+
+c. Turn on TSO(using "ethtool -K")::
+
+ # ethtool -K <ethX> tso on
+
+Receive performance:
+
+a. By default, the driver respects BIOS settings for PCI bus parameters.
+ However, you may want to set PCI latency timer to 248::
+
+ #setpci -d 17d5:* LATENCY_TIMER=f8
+
+ For detailed description of the PCI registers, please see Xframe User Guide.
+
+b. Use 2-buffer mode. This results in large performance boost on
+ certain platforms(eg. SGI Altix, IBM xSeries).
+
+c. Ensure Receive Checksum offload is enabled. Use "ethtool -K ethX" command to
+ set/verify this option.
+
+d. Enable NAPI feature(in kernel configuration Device Drivers ---> Network
+ device support ---> Ethernet (10000 Mbit) ---> S2IO 10Gbe Xframe NIC) to
+ bring down CPU utilization.
+
+.. note::
+
+ For AMD opteron platforms with 8131 chipset, MMRBC=1 and MOST=1 are
+ recommended as safe parameters.
+
+For more information, please review the AMD8131 errata at
+http://vip.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/
+26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
+
+6. Support
+==========
+
+For further support please contact either your 10GbE Xframe NIC vendor (IBM,
+HP, SGI etc.)
diff --git a/Documentation/networking/device_drivers/neterion/s2io.txt b/Documentation/networking/device_drivers/neterion/s2io.txt
deleted file mode 100644
index 0362a42f7cf4..000000000000
--- a/Documentation/networking/device_drivers/neterion/s2io.txt
+++ /dev/null
@@ -1,141 +0,0 @@
-Release notes for Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver.
-
-Contents
-=======
-- 1. Introduction
-- 2. Identifying the adapter/interface
-- 3. Features supported
-- 4. Command line parameters
-- 5. Performance suggestions
-- 6. Available Downloads
-
-
-1. Introduction:
-This Linux driver supports Neterion's Xframe I PCI-X 1.0 and
-Xframe II PCI-X 2.0 adapters. It supports several features
-such as jumbo frames, MSI/MSI-X, checksum offloads, TSO, UFO and so on.
-See below for complete list of features.
-All features are supported for both IPv4 and IPv6.
-
-2. Identifying the adapter/interface:
-a. Insert the adapter(s) in your system.
-b. Build and load driver
-# insmod s2io.ko
-c. View log messages
-# dmesg | tail -40
-You will see messages similar to:
-eth3: Neterion Xframe I 10GbE adapter (rev 3), Version 2.0.9.1, Intr type INTA
-eth4: Neterion Xframe II 10GbE adapter (rev 2), Version 2.0.9.1, Intr type INTA
-eth4: Device is on 64 bit 133MHz PCIX(M1) bus
-
-The above messages identify the adapter type(Xframe I/II), adapter revision,
-driver version, interface name(eth3, eth4), Interrupt type(INTA, MSI, MSI-X).
-In case of Xframe II, the PCI/PCI-X bus width and frequency are displayed
-as well.
-
-To associate an interface with a physical adapter use "ethtool -p <ethX>".
-The corresponding adapter's LED will blink multiple times.
-
-3. Features supported:
-a. Jumbo frames. Xframe I/II supports MTU up to 9600 bytes,
-modifiable using ip command.
-
-b. Offloads. Supports checksum offload(TCP/UDP/IP) on transmit
-and receive, TSO.
-
-c. Multi-buffer receive mode. Scattering of packet across multiple
-buffers. Currently driver supports 2-buffer mode which yields
-significant performance improvement on certain platforms(SGI Altix,
-IBM xSeries).
-
-d. MSI/MSI-X. Can be enabled on platforms which support this feature
-(IA64, Xeon) resulting in noticeable performance improvement(up to 7%
-on certain platforms).
-
-e. Statistics. Comprehensive MAC-level and software statistics displayed
-using "ethtool -S" option.
-
-f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings,
-with multiple steering options.
-
-4. Command line parameters
-a. tx_fifo_num
-Number of transmit queues
-Valid range: 1-8
-Default: 1
-
-b. rx_ring_num
-Number of receive rings
-Valid range: 1-8
-Default: 1
-
-c. tx_fifo_len
-Size of each transmit queue
-Valid range: Total length of all queues should not exceed 8192
-Default: 4096
-
-d. rx_ring_sz
-Size of each receive ring(in 4K blocks)
-Valid range: Limited by memory on system
-Default: 30
-
-e. intr_type
-Specifies interrupt type. Possible values 0(INTA), 2(MSI-X)
-Valid values: 0, 2
-Default: 2
-
-5. Performance suggestions
-General:
-a. Set MTU to maximum(9000 for switch setup, 9600 in back-to-back configuration)
-b. Set TCP windows size to optimal value.
-For instance, for MTU=1500 a value of 210K has been observed to result in
-good performance.
-# sysctl -w net.ipv4.tcp_rmem="210000 210000 210000"
-# sysctl -w net.ipv4.tcp_wmem="210000 210000 210000"
-For MTU=9000, TCP window size of 10 MB is recommended.
-# sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
-# sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
-
-Transmit performance:
-a. By default, the driver respects BIOS settings for PCI bus parameters.
-However, you may want to experiment with PCI bus parameters
-max-split-transactions(MOST) and MMRBC (use setpci command).
-A MOST value of 2 has been found optimal for Opterons and 3 for Itanium.
-It could be different for your hardware.
-Set MMRBC to 4K**.
-
-For example you can set
-For opteron
-#setpci -d 17d5:* 62=1d
-For Itanium
-#setpci -d 17d5:* 62=3d
-
-For detailed description of the PCI registers, please see Xframe User Guide.
-
-b. Ensure Transmit Checksum offload is enabled. Use ethtool to set/verify this
-parameter.
-c. Turn on TSO(using "ethtool -K")
-# ethtool -K <ethX> tso on
-
-Receive performance:
-a. By default, the driver respects BIOS settings for PCI bus parameters.
-However, you may want to set PCI latency timer to 248.
-#setpci -d 17d5:* LATENCY_TIMER=f8
-For detailed description of the PCI registers, please see Xframe User Guide.
-b. Use 2-buffer mode. This results in large performance boost on
-certain platforms(eg. SGI Altix, IBM xSeries).
-c. Ensure Receive Checksum offload is enabled. Use "ethtool -K ethX" command to
-set/verify this option.
-d. Enable NAPI feature(in kernel configuration Device Drivers ---> Network
-device support ---> Ethernet (10000 Mbit) ---> S2IO 10Gbe Xframe NIC) to
-bring down CPU utilization.
-
-** For AMD opteron platforms with 8131 chipset, MMRBC=1 and MOST=1 are
-recommended as safe parameters.
-For more information, please review the AMD8131 errata at
-http://vip.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/
-26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
-
-6. Support
-For further support please contact either your 10GbE Xframe NIC vendor (IBM,
-HP, SGI etc.)
diff --git a/Documentation/networking/device_drivers/neterion/vxge.txt b/Documentation/networking/device_drivers/neterion/vxge.rst
index abfec245f97c..589c6b15c63d 100644
--- a/Documentation/networking/device_drivers/neterion/vxge.txt
+++ b/Documentation/networking/device_drivers/neterion/vxge.rst
@@ -1,24 +1,30 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================================================================
Neterion's (Formerly S2io) X3100 Series 10GbE PCIe Server Adapter Linux driver
==============================================================================
-Contents
---------
+.. Contents
+
+ 1) Introduction
+ 2) Features supported
+ 3) Configurable driver parameters
+ 4) Troubleshooting
-1) Introduction
-2) Features supported
-3) Configurable driver parameters
-4) Troubleshooting
+1. Introduction
+===============
-1) Introduction:
-----------------
This Linux driver supports all Neterion's X3100 series 10 GbE PCIe I/O
Virtualized Server adapters.
+
The X3100 series supports four modes of operation, configurable via
-firmware -
- Single function mode
- Multi function mode
- SRIOV mode
- MRIOV mode
+firmware:
+
+ - Single function mode
+ - Multi function mode
+ - SRIOV mode
+ - MRIOV mode
+
The functions share a 10GbE link and the pci-e bus, but hardly anything else
inside the ASIC. Features like independent hw reset, statistics, bandwidth/
priority allocation and guarantees, GRO, TSO, interrupt moderation etc are
@@ -26,41 +32,49 @@ supported independently on each function.
(See below for a complete list of features supported for both IPv4 and IPv6)
-2) Features supported:
-----------------------
+2. Features supported
+=====================
i) Single function mode (up to 17 queues)
ii) Multi function mode (up to 17 functions)
iii) PCI-SIG's I/O Virtualization
+
- Single Root mode: v1.0 (up to 17 functions)
- Multi-Root mode: v1.0 (up to 17 functions)
iv) Jumbo frames
+
X3100 Series supports MTU up to 9600 bytes, modifiable using
ip command.
v) Offloads supported: (Enabled by default)
- Checksum offload (TCP/UDP/IP) on transmit and receive paths
- TCP Segmentation Offload (TSO) on transmit path
- Generic Receive Offload (GRO) on receive path
+
+ - Checksum offload (TCP/UDP/IP) on transmit and receive paths
+ - TCP Segmentation Offload (TSO) on transmit path
+ - Generic Receive Offload (GRO) on receive path
vi) MSI-X: (Enabled by default)
+
Resulting in noticeable performance improvement (up to 7% on certain
platforms).
vii) NAPI: (Enabled by default)
+
For better Rx interrupt moderation.
viii)RTH (Receive Traffic Hash): (Enabled by default)
+
Receive side steering for better scaling.
ix) Statistics
+
Comprehensive MAC-level and software statistics displayed using
"ethtool -S" option.
x) Multiple hardware queues: (Enabled by default)
+
Up to 17 hardware based transmit and receive data channels, with
multiple steering options (transmit multiqueue enabled by default).
@@ -69,25 +83,33 @@ x) Multiple hardware queues: (Enabled by default)
i) max_config_dev
Specifies maximum device functions to be enabled.
+
Valid range: 1-8
ii) max_config_port
Specifies number of ports to be enabled.
+
Valid range: 1,2
+
Default: 1
-iii)max_config_vpath
+iii) max_config_vpath
Specifies maximum VPATH(s) configured for each device function.
+
Valid range: 1-17
iv) vlan_tag_strip
Enables/disables vlan tag stripping from all received tagged frames that
are not replicated at the internal L2 switch.
+
Valid range: 0,1 (disabled, enabled respectively)
+
Default: 1
v) addr_learn_en
Enable learning the mac address of the guest OS interface in
virtualization environment.
+
Valid range: 0,1 (disabled, enabled respectively)
+
Default: 0
diff --git a/Documentation/networking/device_drivers/pensando/ionic.rst b/Documentation/networking/device_drivers/pensando/ionic.rst
index c17d680cf334..0eabbc347d6c 100644
--- a/Documentation/networking/device_drivers/pensando/ionic.rst
+++ b/Documentation/networking/device_drivers/pensando/ionic.rst
@@ -11,6 +11,9 @@ Contents
========
- Identifying the Adapter
+- Enabling the driver
+- Configuring the driver
+- Statistics
- Support
Identifying the Adapter
@@ -28,12 +31,238 @@ and configure them for use. There should be log entries in the kernel
messages such as these::
$ dmesg | grep ionic
- ionic Pensando Ethernet NIC Driver, ver 0.15.0-k
+ ionic 0000:b5:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
ionic 0000:b5:00.0 enp181s0: renamed from eth0
+ ionic 0000:b5:00.0 enp181s0: Link up - 100 Gbps
+ ionic 0000:b6:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
ionic 0000:b6:00.0 enp182s0: renamed from eth0
+ ionic 0000:b6:00.0 enp182s0: Link up - 100 Gbps
+
+Driver and firmware version information can be gathered with either of
+ethtool or devlink tools::
+
+ $ ethtool -i enp181s0
+ driver: ionic
+ version: 5.7.0
+ firmware-version: 1.8.0-28
+ ...
+
+ $ devlink dev info pci/0000:b5:00.0
+ pci/0000:b5:00.0:
+ driver ionic
+ serial_number FLM18420073
+ versions:
+ fixed:
+ asic.id 0x0
+ asic.rev 0x0
+ running:
+ fw 1.8.0-28
+
+See Documentation/networking/devlink/ionic.rst for more information
+on the devlink dev info data.
+
+Enabling the driver
+===================
+
+The driver is enabled via the standard kernel configuration system,
+using the make command::
+
+ make oldconfig/menuconfig/etc.
+
+The driver is located in the menu structure at:
+
+ -> Device Drivers
+ -> Network device support (NETDEVICES [=y])
+ -> Ethernet driver support
+ -> Pensando devices
+ -> Pensando Ethernet IONIC Support
+
+Configuring the Driver
+======================
+
+MTU
+---
+
+Jumbo frame support is available with a maximim size of 9194 bytes.
+
+Interrupt coalescing
+--------------------
+
+Interrupt coalescing can be configured by changing the rx-usecs value with
+the "ethtool -C" command. The rx-usecs range is 0-190. The tx-usecs value
+reflects the rx-usecs value as they are tied together on the same interrupt.
+
+SR-IOV
+------
+
+Minimal SR-IOV support is currently offered and can be enabled by setting
+the sysfs 'sriov_numvfs' value, if supported by your particular firmware
+configuration.
+
+Statistics
+==========
+
+Basic hardware stats
+--------------------
+
+The commands ``netstat -i``, ``ip -s link show``, and ``ifconfig`` show
+a limited set of statistics taken directly from firmware. For example::
+
+ $ ip -s link show enp181s0
+ 7: enp181s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
+ link/ether 00:ae:cd:00:07:68 brd ff:ff:ff:ff:ff:ff
+ RX: bytes packets errors dropped overrun mcast
+ 414 5 0 0 0 0
+ TX: bytes packets errors dropped carrier collsns
+ 1384 18 0 0 0 0
+
+ethtool -S
+----------
+
+The statistics shown from the ``ethtool -S`` command includes a combination of
+driver counters and firmware counters, including port and queue specific values.
+The driver values are counters computed by the driver, and the firmware values
+are gathered by the firmware from the port hardware and passed through the
+driver with no further interpretation.
+
+Driver port specific::
+
+ tx_packets: 12
+ tx_bytes: 964
+ rx_packets: 5
+ rx_bytes: 414
+ tx_tso: 0
+ tx_tso_bytes: 0
+ tx_csum_none: 12
+ tx_csum: 0
+ rx_csum_none: 0
+ rx_csum_complete: 3
+ rx_csum_error: 0
+
+Driver queue specific::
+
+ tx_0_pkts: 3
+ tx_0_bytes: 294
+ tx_0_clean: 3
+ tx_0_dma_map_err: 0
+ tx_0_linearize: 0
+ tx_0_frags: 0
+ tx_0_tso: 0
+ tx_0_tso_bytes: 0
+ tx_0_csum_none: 3
+ tx_0_csum: 0
+ tx_0_vlan_inserted: 0
+ rx_0_pkts: 2
+ rx_0_bytes: 120
+ rx_0_dma_map_err: 0
+ rx_0_alloc_err: 0
+ rx_0_csum_none: 0
+ rx_0_csum_complete: 0
+ rx_0_csum_error: 0
+ rx_0_dropped: 0
+ rx_0_vlan_stripped: 0
+
+Firmware port specific::
+
+ hw_tx_dropped: 0
+ hw_rx_dropped: 0
+ hw_rx_over_errors: 0
+ hw_rx_missed_errors: 0
+ hw_tx_aborted_errors: 0
+ frames_rx_ok: 15
+ frames_rx_all: 15
+ frames_rx_bad_fcs: 0
+ frames_rx_bad_all: 0
+ octets_rx_ok: 1290
+ octets_rx_all: 1290
+ frames_rx_unicast: 10
+ frames_rx_multicast: 5
+ frames_rx_broadcast: 0
+ frames_rx_pause: 0
+ frames_rx_bad_length: 0
+ frames_rx_undersized: 0
+ frames_rx_oversized: 0
+ frames_rx_fragments: 0
+ frames_rx_jabber: 0
+ frames_rx_pripause: 0
+ frames_rx_stomped_crc: 0
+ frames_rx_too_long: 0
+ frames_rx_vlan_good: 3
+ frames_rx_dropped: 0
+ frames_rx_less_than_64b: 0
+ frames_rx_64b: 4
+ frames_rx_65b_127b: 11
+ frames_rx_128b_255b: 0
+ frames_rx_256b_511b: 0
+ frames_rx_512b_1023b: 0
+ frames_rx_1024b_1518b: 0
+ frames_rx_1519b_2047b: 0
+ frames_rx_2048b_4095b: 0
+ frames_rx_4096b_8191b: 0
+ frames_rx_8192b_9215b: 0
+ frames_rx_other: 0
+ frames_tx_ok: 31
+ frames_tx_all: 31
+ frames_tx_bad: 0
+ octets_tx_ok: 2614
+ octets_tx_total: 2614
+ frames_tx_unicast: 8
+ frames_tx_multicast: 21
+ frames_tx_broadcast: 2
+ frames_tx_pause: 0
+ frames_tx_pripause: 0
+ frames_tx_vlan: 0
+ frames_tx_less_than_64b: 0
+ frames_tx_64b: 4
+ frames_tx_65b_127b: 27
+ frames_tx_128b_255b: 0
+ frames_tx_256b_511b: 0
+ frames_tx_512b_1023b: 0
+ frames_tx_1024b_1518b: 0
+ frames_tx_1519b_2047b: 0
+ frames_tx_2048b_4095b: 0
+ frames_tx_4096b_8191b: 0
+ frames_tx_8192b_9215b: 0
+ frames_tx_other: 0
+ frames_tx_pri_0: 0
+ frames_tx_pri_1: 0
+ frames_tx_pri_2: 0
+ frames_tx_pri_3: 0
+ frames_tx_pri_4: 0
+ frames_tx_pri_5: 0
+ frames_tx_pri_6: 0
+ frames_tx_pri_7: 0
+ frames_rx_pri_0: 0
+ frames_rx_pri_1: 0
+ frames_rx_pri_2: 0
+ frames_rx_pri_3: 0
+ frames_rx_pri_4: 0
+ frames_rx_pri_5: 0
+ frames_rx_pri_6: 0
+ frames_rx_pri_7: 0
+ tx_pripause_0_1us_count: 0
+ tx_pripause_1_1us_count: 0
+ tx_pripause_2_1us_count: 0
+ tx_pripause_3_1us_count: 0
+ tx_pripause_4_1us_count: 0
+ tx_pripause_5_1us_count: 0
+ tx_pripause_6_1us_count: 0
+ tx_pripause_7_1us_count: 0
+ rx_pripause_0_1us_count: 0
+ rx_pripause_1_1us_count: 0
+ rx_pripause_2_1us_count: 0
+ rx_pripause_3_1us_count: 0
+ rx_pripause_4_1us_count: 0
+ rx_pripause_5_1us_count: 0
+ rx_pripause_6_1us_count: 0
+ rx_pripause_7_1us_count: 0
+ rx_pause_1us_count: 0
+ frames_tx_truncated: 0
+
Support
=======
+
For general Linux networking support, please use the netdev mailing
list, which is monitored by Pensando personnel::
diff --git a/Documentation/networking/device_drivers/qualcomm/rmnet.txt b/Documentation/networking/device_drivers/qualcomm/rmnet.rst
index 6b341eaf2062..70643b58de05 100644
--- a/Documentation/networking/device_drivers/qualcomm/rmnet.txt
+++ b/Documentation/networking/device_drivers/qualcomm/rmnet.rst
@@ -1,4 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Rmnet Driver
+============
+
1. Introduction
+===============
rmnet driver is used for supporting the Multiplexing and aggregation
Protocol (MAP). This protocol is used by all recent chipsets using Qualcomm
@@ -18,17 +25,18 @@ sending aggregated bunch of MAP frames. rmnet driver will de-aggregate
these MAP frames and send them to appropriate PDN's.
2. Packet format
+================
a. MAP packet (data / control)
MAP header has the same endianness of the IP packet.
-Packet format -
+Packet format::
-Bit 0 1 2-7 8 - 15 16 - 31
-Function Command / Data Reserved Pad Multiplexer ID Payload length
-Bit 32 - x
-Function Raw Bytes
+ Bit 0 1 2-7 8 - 15 16 - 31
+ Function Command / Data Reserved Pad Multiplexer ID Payload length
+ Bit 32 - x
+ Function Raw Bytes
Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command
or data packet. Control packet is used for transport level flow control. Data
@@ -44,24 +52,27 @@ Multiplexer ID is to indicate the PDN on which data has to be sent.
Payload length includes the padding length but does not include MAP header
length.
-b. MAP packet (command specific)
+b. MAP packet (command specific)::
-Bit 0 1 2-7 8 - 15 16 - 31
-Function Command Reserved Pad Multiplexer ID Payload length
-Bit 32 - 39 40 - 45 46 - 47 48 - 63
-Function Command name Reserved Command Type Reserved
-Bit 64 - 95
-Function Transaction ID
-Bit 96 - 127
-Function Command data
+ Bit 0 1 2-7 8 - 15 16 - 31
+ Function Command Reserved Pad Multiplexer ID Payload length
+ Bit 32 - 39 40 - 45 46 - 47 48 - 63
+ Function Command name Reserved Command Type Reserved
+ Bit 64 - 95
+ Function Transaction ID
+ Bit 96 - 127
+ Function Command data
Command 1 indicates disabling flow while 2 is enabling flow
-Command types -
+Command types
+
+= ==========================================
0 for MAP command request
1 is to acknowledge the receipt of a command
2 is for unsupported commands
3 is for error during processing of commands
+= ==========================================
c. Aggregation
@@ -71,9 +82,11 @@ packets and either ACK the MAP command or deliver the IP packet to the
network stack as needed
MAP header|IP Packet|Optional padding|MAP header|IP Packet|Optional padding....
+
MAP header|IP Packet|Optional padding|MAP header|Command Packet|Optional pad...
3. Userspace configuration
+==========================
rmnet userspace configuration is done through netlink library librmnetctl
and command line utility rmnetcli. Utility is hosted in codeaurora forum git.
diff --git a/Documentation/networking/device_drivers/sb1000.rst b/Documentation/networking/device_drivers/sb1000.rst
new file mode 100644
index 000000000000..c8582ca4034d
--- /dev/null
+++ b/Documentation/networking/device_drivers/sb1000.rst
@@ -0,0 +1,222 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+SB100 device driver
+===================
+
+sb1000 is a module network device driver for the General Instrument (also known
+as NextLevel) SURFboard1000 internal cable modem board. This is an ISA card
+which is used by a number of cable TV companies to provide cable modem access.
+It's a one-way downstream-only cable modem, meaning that your upstream net link
+is provided by your regular phone modem.
+
+This driver was written by Franco Venturi <fventuri@mediaone.net>. He deserves
+a great deal of thanks for this wonderful piece of code!
+
+Needed tools
+============
+
+Support for this device is now a part of the standard Linux kernel. The
+driver source code file is drivers/net/sb1000.c. In addition to this
+you will need:
+
+1. The "cmconfig" program. This is a utility which supplements "ifconfig"
+ to configure the cable modem and network interface (usually called "cm0");
+
+2. Several PPP scripts which live in /etc/ppp to make connecting via your
+ cable modem easy.
+
+ These utilities can be obtained from:
+
+ http://www.jacksonville.net/~fventuri/
+
+ in Franco's original source code distribution .tar.gz file. Support for
+ the sb1000 driver can be found at:
+
+ - http://web.archive.org/web/%2E/http://home.adelphia.net/~siglercm/sb1000.html
+ - http://web.archive.org/web/%2E/http://linuxpower.cx/~cable/
+
+ along with these utilities.
+
+3. The standard isapnp tools. These are necessary to configure your SB1000
+ card at boot time (or afterwards by hand) since it's a PnP card.
+
+ If you don't have these installed as a standard part of your Linux
+ distribution, you can find them at:
+
+ http://www.roestock.demon.co.uk/isapnptools/
+
+ or check your Linux distribution binary CD or their web site. For help with
+ isapnp, pnpdump, or /etc/isapnp.conf, go to:
+
+ http://www.roestock.demon.co.uk/isapnptools/isapnpfaq.html
+
+Using the driver
+================
+
+To make the SB1000 card work, follow these steps:
+
+1. Run ``make config``, or ``make menuconfig``, or ``make xconfig``, whichever
+ you prefer, in the top kernel tree directory to set up your kernel
+ configuration. Make sure to say "Y" to "Prompt for development drivers"
+ and to say "M" to the sb1000 driver. Also say "Y" or "M" to all the standard
+ networking questions to get TCP/IP and PPP networking support.
+
+2. **BEFORE** you build the kernel, edit drivers/net/sb1000.c. Make sure
+ to redefine the value of READ_DATA_PORT to match the I/O address used
+ by isapnp to access your PnP cards. This is the value of READPORT in
+ /etc/isapnp.conf or given by the output of pnpdump.
+
+3. Build and install the kernel and modules as usual.
+
+4. Boot your new kernel following the usual procedures.
+
+5. Set up to configure the new SB1000 PnP card by capturing the output
+ of "pnpdump" to a file and editing this file to set the correct I/O ports,
+ IRQ, and DMA settings for all your PnP cards. Make sure none of the settings
+ conflict with one another. Then test this configuration by running the
+ "isapnp" command with your new config file as the input. Check for
+ errors and fix as necessary. (As an aside, I use I/O ports 0x110 and
+ 0x310 and IRQ 11 for my SB1000 card and these work well for me. YMMV.)
+ Then save the finished config file as /etc/isapnp.conf for proper
+ configuration on subsequent reboots.
+
+6. Download the original file sb1000-1.1.2.tar.gz from Franco's site or one of
+ the others referenced above. As root, unpack it into a temporary directory
+ and do a ``make cmconfig`` and then ``install -c cmconfig /usr/local/sbin``.
+ Don't do ``make install`` because it expects to find all the utilities built
+ and ready for installation, not just cmconfig.
+
+7. As root, copy all the files under the ppp/ subdirectory in Franco's
+ tar file into /etc/ppp, being careful not to overwrite any files that are
+ already in there. Then modify ppp@gi-on to set the correct login name,
+ phone number, and frequency for the cable modem. Also edit pap-secrets
+ to specify your login name and password and any site-specific information
+ you need.
+
+8. Be sure to modify /etc/ppp/firewall to use ipchains instead of
+ the older ipfwadm commands from the 2.0.x kernels. There's a neat utility to
+ convert ipfwadm commands to ipchains commands:
+
+ http://users.dhp.com/~whisper/ipfwadm2ipchains/
+
+ You may also wish to modify the firewall script to implement a different
+ firewalling scheme.
+
+9. Start the PPP connection via the script /etc/ppp/ppp@gi-on. You must be
+ root to do this. It's better to use a utility like sudo to execute
+ frequently used commands like this with root permissions if possible. If you
+ connect successfully the cable modem interface will come up and you'll see a
+ driver message like this at the console::
+
+ cm0: sb1000 at (0x110,0x310), csn 1, S/N 0x2a0d16d8, IRQ 11.
+ sb1000.c:v1.1.2 6/01/98 (fventuri@mediaone.net)
+
+ The "ifconfig" command should show two new interfaces, ppp0 and cm0.
+
+ The command "cmconfig cm0" will give you information about the cable modem
+ interface.
+
+10. Try pinging a site via ``ping -c 5 www.yahoo.com``, for example. You should
+ see packets received.
+
+11. If you can't get site names (like www.yahoo.com) to resolve into
+ IP addresses (like 204.71.200.67), be sure your /etc/resolv.conf file
+ has no syntax errors and has the right nameserver IP addresses in it.
+ If this doesn't help, try something like ``ping -c 5 204.71.200.67`` to
+ see if the networking is running but the DNS resolution is where the
+ problem lies.
+
+12. If you still have problems, go to the support web sites mentioned above
+ and read the information and documentation there.
+
+Common problems
+===============
+
+1. Packets go out on the ppp0 interface but don't come back on the cm0
+ interface. It looks like I'm connected but I can't even ping any
+ numerical IP addresses. (This happens predominantly on Debian systems due
+ to a default boot-time configuration script.)
+
+Solution
+ As root ``echo 0 > /proc/sys/net/ipv4/conf/cm0/rp_filter`` so it
+ can share the same IP address as the ppp0 interface. Note that this
+ command should probably be added to the /etc/ppp/cablemodem script
+ *right*between* the "/sbin/ifconfig" and "/sbin/cmconfig" commands.
+ You may need to do this to /proc/sys/net/ipv4/conf/ppp0/rp_filter as well.
+ If you do this to /proc/sys/net/ipv4/conf/default/rp_filter on each reboot
+ (in rc.local or some such) then any interfaces can share the same IP
+ addresses.
+
+2. I get "unresolved symbol" error messages on executing ``insmod sb1000.o``.
+
+Solution
+ You probably have a non-matching kernel source tree and
+ /usr/include/linux and /usr/include/asm header files. Make sure you
+ install the correct versions of the header files in these two directories.
+ Then rebuild and reinstall the kernel.
+
+3. When isapnp runs it reports an error, and my SB1000 card isn't working.
+
+Solution
+ There's a problem with later versions of isapnp using the "(CHECK)"
+ option in the lines that allocate the two I/O addresses for the SB1000 card.
+ This first popped up on RH 6.0. Delete "(CHECK)" for the SB1000 I/O addresses.
+ Make sure they don't conflict with any other pieces of hardware first! Then
+ rerun isapnp and go from there.
+
+4. I can't execute the /etc/ppp/ppp@gi-on file.
+
+Solution
+ As root do ``chmod ug+x /etc/ppp/ppp@gi-on``.
+
+5. The firewall script isn't working (with 2.2.x and higher kernels).
+
+Solution
+ Use the ipfwadm2ipchains script referenced above to convert the
+ /etc/ppp/firewall script from the deprecated ipfwadm commands to ipchains.
+
+6. I'm getting *tons* of firewall deny messages in the /var/kern.log,
+ /var/messages, and/or /var/syslog files, and they're filling up my /var
+ partition!!!
+
+Solution
+ First, tell your ISP that you're receiving DoS (Denial of Service)
+ and/or portscanning (UDP connection attempts) attacks! Look over the deny
+ messages to figure out what the attack is and where it's coming from. Next,
+ edit /etc/ppp/cablemodem and make sure the ",nobroadcast" option is turned on
+ to the "cmconfig" command (uncomment that line). If you're not receiving these
+ denied packets on your broadcast interface (IP address xxx.yyy.zzz.255
+ typically), then someone is attacking your machine in particular. Be careful
+ out there....
+
+7. Everything seems to work fine but my computer locks up after a while
+ (and typically during a lengthy download through the cable modem)!
+
+Solution
+ You may need to add a short delay in the driver to 'slow down' the
+ SURFboard because your PC might not be able to keep up with the transfer rate
+ of the SB1000. To do this, it's probably best to download Franco's
+ sb1000-1.1.2.tar.gz archive and build and install sb1000.o manually. You'll
+ want to edit the 'Makefile' and look for the 'SB1000_DELAY'
+ define. Uncomment those 'CFLAGS' lines (and comment out the default ones)
+ and try setting the delay to something like 60 microseconds with:
+ '-DSB1000_DELAY=60'. Then do ``make`` and as root ``make install`` and try
+ it out. If it still doesn't work or you like playing with the driver, you may
+ try other numbers. Remember though that the higher the delay, the slower the
+ driver (which slows down the rest of the PC too when it is actively
+ used). Thanks to Ed Daiga for this tip!
+
+Credits
+=======
+
+This README came from Franco Venturi's original README file which is
+still supplied with his driver .tar.gz archive. I and all other sb1000 users
+owe Franco a tremendous "Thank you!" Additional thanks goes to Carl Patten
+and Ralph Bonnell who are now managing the Linux SB1000 web site, and to
+the SB1000 users who reported and helped debug the common problems listed
+above.
+
+
+ Clemmitt Sigler
+ csigler@vt.edu
diff --git a/Documentation/networking/device_drivers/sb1000.txt b/Documentation/networking/device_drivers/sb1000.txt
deleted file mode 100644
index f92c2aac56a9..000000000000
--- a/Documentation/networking/device_drivers/sb1000.txt
+++ /dev/null
@@ -1,207 +0,0 @@
-sb1000 is a module network device driver for the General Instrument (also known
-as NextLevel) SURFboard1000 internal cable modem board. This is an ISA card
-which is used by a number of cable TV companies to provide cable modem access.
-It's a one-way downstream-only cable modem, meaning that your upstream net link
-is provided by your regular phone modem.
-
-This driver was written by Franco Venturi <fventuri@mediaone.net>. He deserves
-a great deal of thanks for this wonderful piece of code!
-
------------------------------------------------------------------------------
-
-Support for this device is now a part of the standard Linux kernel. The
-driver source code file is drivers/net/sb1000.c. In addition to this
-you will need:
-
-1.) The "cmconfig" program. This is a utility which supplements "ifconfig"
-to configure the cable modem and network interface (usually called "cm0");
-and
-
-2.) Several PPP scripts which live in /etc/ppp to make connecting via your
-cable modem easy.
-
- These utilities can be obtained from:
-
- http://www.jacksonville.net/~fventuri/
-
- in Franco's original source code distribution .tar.gz file. Support for
- the sb1000 driver can be found at:
-
- http://web.archive.org/web/*/http://home.adelphia.net/~siglercm/sb1000.html
- http://web.archive.org/web/*/http://linuxpower.cx/~cable/
-
- along with these utilities.
-
-3.) The standard isapnp tools. These are necessary to configure your SB1000
-card at boot time (or afterwards by hand) since it's a PnP card.
-
- If you don't have these installed as a standard part of your Linux
- distribution, you can find them at:
-
- http://www.roestock.demon.co.uk/isapnptools/
-
- or check your Linux distribution binary CD or their web site. For help with
- isapnp, pnpdump, or /etc/isapnp.conf, go to:
-
- http://www.roestock.demon.co.uk/isapnptools/isapnpfaq.html
-
------------------------------------------------------------------------------
-
-To make the SB1000 card work, follow these steps:
-
-1.) Run `make config', or `make menuconfig', or `make xconfig', whichever
-you prefer, in the top kernel tree directory to set up your kernel
-configuration. Make sure to say "Y" to "Prompt for development drivers"
-and to say "M" to the sb1000 driver. Also say "Y" or "M" to all the standard
-networking questions to get TCP/IP and PPP networking support.
-
-2.) *BEFORE* you build the kernel, edit drivers/net/sb1000.c. Make sure
-to redefine the value of READ_DATA_PORT to match the I/O address used
-by isapnp to access your PnP cards. This is the value of READPORT in
-/etc/isapnp.conf or given by the output of pnpdump.
-
-3.) Build and install the kernel and modules as usual.
-
-4.) Boot your new kernel following the usual procedures.
-
-5.) Set up to configure the new SB1000 PnP card by capturing the output
-of "pnpdump" to a file and editing this file to set the correct I/O ports,
-IRQ, and DMA settings for all your PnP cards. Make sure none of the settings
-conflict with one another. Then test this configuration by running the
-"isapnp" command with your new config file as the input. Check for
-errors and fix as necessary. (As an aside, I use I/O ports 0x110 and
-0x310 and IRQ 11 for my SB1000 card and these work well for me. YMMV.)
-Then save the finished config file as /etc/isapnp.conf for proper configuration
-on subsequent reboots.
-
-6.) Download the original file sb1000-1.1.2.tar.gz from Franco's site or one of
-the others referenced above. As root, unpack it into a temporary directory and
-do a `make cmconfig' and then `install -c cmconfig /usr/local/sbin'. Don't do
-`make install' because it expects to find all the utilities built and ready for
-installation, not just cmconfig.
-
-7.) As root, copy all the files under the ppp/ subdirectory in Franco's
-tar file into /etc/ppp, being careful not to overwrite any files that are
-already in there. Then modify ppp@gi-on to set the correct login name,
-phone number, and frequency for the cable modem. Also edit pap-secrets
-to specify your login name and password and any site-specific information
-you need.
-
-8.) Be sure to modify /etc/ppp/firewall to use ipchains instead of
-the older ipfwadm commands from the 2.0.x kernels. There's a neat utility to
-convert ipfwadm commands to ipchains commands:
-
- http://users.dhp.com/~whisper/ipfwadm2ipchains/
-
-You may also wish to modify the firewall script to implement a different
-firewalling scheme.
-
-9.) Start the PPP connection via the script /etc/ppp/ppp@gi-on. You must be
-root to do this. It's better to use a utility like sudo to execute
-frequently used commands like this with root permissions if possible. If you
-connect successfully the cable modem interface will come up and you'll see a
-driver message like this at the console:
-
- cm0: sb1000 at (0x110,0x310), csn 1, S/N 0x2a0d16d8, IRQ 11.
- sb1000.c:v1.1.2 6/01/98 (fventuri@mediaone.net)
-
-The "ifconfig" command should show two new interfaces, ppp0 and cm0.
-The command "cmconfig cm0" will give you information about the cable modem
-interface.
-
-10.) Try pinging a site via `ping -c 5 www.yahoo.com', for example. You should
-see packets received.
-
-11.) If you can't get site names (like www.yahoo.com) to resolve into
-IP addresses (like 204.71.200.67), be sure your /etc/resolv.conf file
-has no syntax errors and has the right nameserver IP addresses in it.
-If this doesn't help, try something like `ping -c 5 204.71.200.67' to
-see if the networking is running but the DNS resolution is where the
-problem lies.
-
-12.) If you still have problems, go to the support web sites mentioned above
-and read the information and documentation there.
-
------------------------------------------------------------------------------
-
-Common problems:
-
-1.) Packets go out on the ppp0 interface but don't come back on the cm0
-interface. It looks like I'm connected but I can't even ping any
-numerical IP addresses. (This happens predominantly on Debian systems due
-to a default boot-time configuration script.)
-
-Solution -- As root `echo 0 > /proc/sys/net/ipv4/conf/cm0/rp_filter' so it
-can share the same IP address as the ppp0 interface. Note that this
-command should probably be added to the /etc/ppp/cablemodem script
-*right*between* the "/sbin/ifconfig" and "/sbin/cmconfig" commands.
-You may need to do this to /proc/sys/net/ipv4/conf/ppp0/rp_filter as well.
-If you do this to /proc/sys/net/ipv4/conf/default/rp_filter on each reboot
-(in rc.local or some such) then any interfaces can share the same IP
-addresses.
-
-2.) I get "unresolved symbol" error messages on executing `insmod sb1000.o'.
-
-Solution -- You probably have a non-matching kernel source tree and
-/usr/include/linux and /usr/include/asm header files. Make sure you
-install the correct versions of the header files in these two directories.
-Then rebuild and reinstall the kernel.
-
-3.) When isapnp runs it reports an error, and my SB1000 card isn't working.
-
-Solution -- There's a problem with later versions of isapnp using the "(CHECK)"
-option in the lines that allocate the two I/O addresses for the SB1000 card.
-This first popped up on RH 6.0. Delete "(CHECK)" for the SB1000 I/O addresses.
-Make sure they don't conflict with any other pieces of hardware first! Then
-rerun isapnp and go from there.
-
-4.) I can't execute the /etc/ppp/ppp@gi-on file.
-
-Solution -- As root do `chmod ug+x /etc/ppp/ppp@gi-on'.
-
-5.) The firewall script isn't working (with 2.2.x and higher kernels).
-
-Solution -- Use the ipfwadm2ipchains script referenced above to convert the
-/etc/ppp/firewall script from the deprecated ipfwadm commands to ipchains.
-
-6.) I'm getting *tons* of firewall deny messages in the /var/kern.log,
-/var/messages, and/or /var/syslog files, and they're filling up my /var
-partition!!!
-
-Solution -- First, tell your ISP that you're receiving DoS (Denial of Service)
-and/or portscanning (UDP connection attempts) attacks! Look over the deny
-messages to figure out what the attack is and where it's coming from. Next,
-edit /etc/ppp/cablemodem and make sure the ",nobroadcast" option is turned on
-to the "cmconfig" command (uncomment that line). If you're not receiving these
-denied packets on your broadcast interface (IP address xxx.yyy.zzz.255
-typically), then someone is attacking your machine in particular. Be careful
-out there....
-
-7.) Everything seems to work fine but my computer locks up after a while
-(and typically during a lengthy download through the cable modem)!
-
-Solution -- You may need to add a short delay in the driver to 'slow down' the
-SURFboard because your PC might not be able to keep up with the transfer rate
-of the SB1000. To do this, it's probably best to download Franco's
-sb1000-1.1.2.tar.gz archive and build and install sb1000.o manually. You'll
-want to edit the 'Makefile' and look for the 'SB1000_DELAY'
-define. Uncomment those 'CFLAGS' lines (and comment out the default ones)
-and try setting the delay to something like 60 microseconds with:
-'-DSB1000_DELAY=60'. Then do `make' and as root `make install' and try
-it out. If it still doesn't work or you like playing with the driver, you may
-try other numbers. Remember though that the higher the delay, the slower the
-driver (which slows down the rest of the PC too when it is actively
-used). Thanks to Ed Daiga for this tip!
-
------------------------------------------------------------------------------
-
-Credits: This README came from Franco Venturi's original README file which is
-still supplied with his driver .tar.gz archive. I and all other sb1000 users
-owe Franco a tremendous "Thank you!" Additional thanks goes to Carl Patten
-and Ralph Bonnell who are now managing the Linux SB1000 web site, and to
-the SB1000 users who reported and helped debug the common problems listed
-above.
-
-
- Clemmitt Sigler
- csigler@vt.edu
diff --git a/Documentation/networking/device_drivers/smsc/smc9.rst b/Documentation/networking/device_drivers/smsc/smc9.rst
new file mode 100644
index 000000000000..e5eac896a631
--- /dev/null
+++ b/Documentation/networking/device_drivers/smsc/smc9.rst
@@ -0,0 +1,48 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+SMC 9xxxx Driver
+================
+
+Revision 0.12
+
+3/5/96
+
+Copyright 1996 Erik Stahlman
+
+Released under terms of the GNU General Public License.
+
+This file contains the instructions and caveats for my SMC9xxx driver. You
+should not be using the driver without reading this file.
+
+Things to note about installation:
+
+ 1. The driver should work on all kernels from 1.2.13 until 1.3.71.
+ (A kernel patch is supplied for 1.3.71 )
+
+ 2. If you include this into the kernel, you might need to change some
+ options, such as for forcing IRQ.
+
+
+ 3. To compile as a module, run 'make'.
+ Make will give you the appropriate options for various kernel support.
+
+ 4. Loading the driver as a module::
+
+ use: insmod smc9194.o
+ optional parameters:
+ io=xxxx : your base address
+ irq=xx : your irq
+ ifport=x : 0 for whatever is default
+ 1 for twisted pair
+ 2 for AUI ( or BNC on some cards )
+
+How to obtain the latest version?
+
+FTP:
+ ftp://fenris.campus.vt.edu/smc9/smc9-12.tar.gz
+ ftp://sfbox.vt.edu/filebox/F/fenris/smc9/smc9-12.tar.gz
+
+
+Contacting me:
+ erik@mail.vt.edu
diff --git a/Documentation/networking/device_drivers/smsc/smc9.txt b/Documentation/networking/device_drivers/smsc/smc9.txt
deleted file mode 100644
index d1e15074e43d..000000000000
--- a/Documentation/networking/device_drivers/smsc/smc9.txt
+++ /dev/null
@@ -1,42 +0,0 @@
-
-SMC 9xxxx Driver
-Revision 0.12
-3/5/96
-Copyright 1996 Erik Stahlman
-Released under terms of the GNU General Public License.
-
-This file contains the instructions and caveats for my SMC9xxx driver. You
-should not be using the driver without reading this file.
-
-Things to note about installation:
-
- 1. The driver should work on all kernels from 1.2.13 until 1.3.71.
- (A kernel patch is supplied for 1.3.71 )
-
- 2. If you include this into the kernel, you might need to change some
- options, such as for forcing IRQ.
-
-
- 3. To compile as a module, run 'make' .
- Make will give you the appropriate options for various kernel support.
-
- 4. Loading the driver as a module :
-
- use: insmod smc9194.o
- optional parameters:
- io=xxxx : your base address
- irq=xx : your irq
- ifport=x : 0 for whatever is default
- 1 for twisted pair
- 2 for AUI ( or BNC on some cards )
-
-How to obtain the latest version?
-
-FTP:
- ftp://fenris.campus.vt.edu/smc9/smc9-12.tar.gz
- ftp://sfbox.vt.edu/filebox/F/fenris/smc9/smc9-12.tar.gz
-
-
-Contacting me:
- erik@mail.vt.edu
-
diff --git a/Documentation/networking/device_drivers/ti/cpsw.rst b/Documentation/networking/device_drivers/ti/cpsw.rst
new file mode 100644
index 000000000000..a88946bd188b
--- /dev/null
+++ b/Documentation/networking/device_drivers/ti/cpsw.rst
@@ -0,0 +1,587 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Texas Instruments CPSW ethernet driver
+======================================
+
+Multiqueue & CBS & MQPRIO
+=========================
+
+
+The cpsw has 3 CBS shapers for each external ports. This document
+describes MQPRIO and CBS Qdisc offload configuration for cpsw driver
+based on examples. It potentially can be used in audio video bridging
+(AVB) and time sensitive networking (TSN).
+
+The following examples were tested on AM572x EVM and BBB boards.
+
+Test setup
+==========
+
+Under consideration two examples with AM572x EVM running cpsw driver
+in dual_emac mode.
+
+Several prerequisites:
+
+- TX queues must be rated starting from txq0 that has highest priority
+- Traffic classes are used starting from 0, that has highest priority
+- CBS shapers should be used with rated queues
+- The bandwidth for CBS shapers has to be set a little bit more then
+ potential incoming rate, thus, rate of all incoming tx queues has
+ to be a little less
+- Real rates can differ, due to discreetness
+- Map skb-priority to txq is not enough, also skb-priority to l2 prio
+ map has to be created with ip or vconfig tool
+- Any l2/socket prio (0 - 7) for classes can be used, but for
+ simplicity default values are used: 3 and 2
+- only 2 classes tested: A and B, but checked and can work with more,
+ maximum allowed 4, but only for 3 rate can be set.
+
+Test setup for examples
+=======================
+
+::
+
+ +-------------------------------+
+ |--+ |
+ | | Workstation0 |
+ |E | MAC 18:03:73:66:87:42 |
+ +-----------------------------+ +--|t | |
+ | | 1 | E | | |h |./tsn_listener -d \ |
+ | Target board: | 0 | t |--+ |0 | 18:03:73:66:87:42 -i eth0 \|
+ | AM572x EVM | 0 | h | | | -s 1500 |
+ | | 0 | 0 | |--+ |
+ | Only 2 classes: |Mb +---| +-------------------------------+
+ | class A, class B | |
+ | | +---| +-------------------------------+
+ | | 1 | E | |--+ |
+ | | 0 | t | | | Workstation1 |
+ | | 0 | h |--+ |E | MAC 20:cf:30:85:7d:fd |
+ | |Mb | 1 | +--|t | |
+ +-----------------------------+ |h |./tsn_listener -d \ |
+ |0 | 20:cf:30:85:7d:fd -i eth0 \|
+ | | -s 1500 |
+ |--+ |
+ +-------------------------------+
+
+
+Example 1: One port tx AVB configuration scheme for target board
+----------------------------------------------------------------
+
+(prints and scheme for AM572x evm, applicable for single port boards)
+
+- tc - traffic class
+- txq - transmit queue
+- p - priority
+- f - fifo (cpsw fifo)
+- S - shaper configured
+
+::
+
+ +------------------------------------------------------------------+ u
+ | +---------------+ +---------------+ +------+ +------+ | s
+ | | | | | | | | | | e
+ | | App 1 | | App 2 | | Apps | | Apps | | r
+ | | Class A | | Class B | | Rest | | Rest | |
+ | | Eth0 | | Eth0 | | Eth0 | | Eth1 | | s
+ | | VLAN100 | | VLAN100 | | | | | | | | p
+ | | 40 Mb/s | | 20 Mb/s | | | | | | | | a
+ | | SO_PRIORITY=3 | | SO_PRIORITY=2 | | | | | | | | c
+ | | | | | | | | | | | | | | e
+ | +---|-----------+ +---|-----------+ +---|--+ +---|--+ |
+ +-----|------------------|------------------|--------|-------------+
+ +-+ +------------+ | |
+ | | +-----------------+ +--+
+ | | | |
+ +---|-------|-------------|-----------------------|----------------+
+ | +----+ +----+ +----+ +----+ +----+ |
+ | | p3 | | p2 | | p1 | | p0 | | p0 | | k
+ | \ / \ / \ / \ / \ / | e
+ | \ / \ / \ / \ / \ / | r
+ | \/ \/ \/ \/ \/ | n
+ | | | | | | e
+ | | | +-----+ | | l
+ | | | | | |
+ | +----+ +----+ +----+ +----+ | s
+ | |tc0 | |tc1 | |tc2 | |tc0 | | p
+ | \ / \ / \ / \ / | a
+ | \ / \ / \ / \ / | c
+ | \/ \/ \/ \/ | e
+ | | | +-----+ | |
+ | | | | | | |
+ | | | | | | |
+ | | | | | | |
+ | +----+ +----+ +----+ +----+ +----+ |
+ | |txq0| |txq1| |txq2| |txq3| |txq4| |
+ | \ / \ / \ / \ / \ / |
+ | \ / \ / \ / \ / \ / |
+ | \/ \/ \/ \/ \/ |
+ | +-|------|------|------|--+ +--|--------------+ |
+ | | | | | | | Eth0.100 | | Eth1 | |
+ +---|------|------|------|------------------------|----------------+
+ | | | | |
+ p p p p |
+ 3 2 0-1, 4-7 <- L2 priority |
+ | | | | |
+ | | | | |
+ +---|------|------|------|------------------------|----------------+
+ | | | | | |----------+ |
+ | +----+ +----+ +----+ +----+ +----+ |
+ | |dma7| |dma6| |dma5| |dma4| |dma3| |
+ | \ / \ / \ / \ / \ / | c
+ | \S / \S / \ / \ / \ / | p
+ | \/ \/ \/ \/ \/ | s
+ | | | | +----- | | w
+ | | | | | | |
+ | | | | | | | d
+ | +----+ +----+ +----+p p+----+ | r
+ | | | | | | |o o| | | i
+ | | f3 | | f2 | | f0 |r r| f0 | | v
+ | |tc0 | |tc1 | |tc2 |t t|tc0 | | e
+ | \CBS / \CBS / \CBS /1 2\CBS / | r
+ | \S / \S / \ / \ / |
+ | \/ \/ \/ \/ |
+ +------------------------------------------------------------------+
+
+
+1) ::
+
+
+ // Add 4 tx queues, for interface Eth0, and 1 tx queue for Eth1
+ $ ethtool -L eth0 rx 1 tx 5
+ rx unmodified, ignoring
+
+2) ::
+
+ // Check if num of queues is set correctly:
+ $ ethtool -l eth0
+ Channel parameters for eth0:
+ Pre-set maximums:
+ RX: 8
+ TX: 8
+ Other: 0
+ Combined: 0
+ Current hardware settings:
+ RX: 1
+ TX: 5
+ Other: 0
+ Combined: 0
+
+3) ::
+
+ // TX queues must be rated starting from 0, so set bws for tx0 and tx1
+ // Set rates 40 and 20 Mb/s appropriately.
+ // Pay attention, real speed can differ a bit due to discreetness.
+ // Leave last 2 tx queues not rated.
+ $ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
+ $ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
+
+4) ::
+
+ // Check maximum rate of tx (cpdma) queues:
+ $ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
+ 40
+ 20
+ 0
+ 0
+ 0
+
+5) ::
+
+ // Map skb->priority to traffic class:
+ // 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
+ // Map traffic class to transmit queue:
+ // tc0 -> txq0, tc1 -> txq1, tc2 -> (txq2, txq3)
+ $ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
+ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1
+
+5a) ::
+
+ // As two interface sharing same set of tx queues, assign all traffic
+ // coming to interface Eth1 to separate queue in order to not mix it
+ // with traffic from interface Eth0, so use separate txq to send
+ // packets to Eth1, so all prio -> tc0 and tc0 -> txq4
+ // Here hw 0, so here still default configuration for eth1 in hw
+ $ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 1 \
+ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@4 hw 0
+
+6) ::
+
+ // Check classes settings
+ $ tc -g class show dev eth0
+ +---(100:ffe2) mqprio
+ | +---(100:3) mqprio
+ | +---(100:4) mqprio
+ |
+ +---(100:ffe1) mqprio
+ | +---(100:2) mqprio
+ |
+ +---(100:ffe0) mqprio
+ +---(100:1) mqprio
+
+ $ tc -g class show dev eth1
+ +---(100:ffe0) mqprio
+ +---(100:5) mqprio
+
+7) ::
+
+ // Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc
+ // Set it +1 Mb for reserve (important!)
+ // here only idle slope is important, others arg are ignored
+ // Pay attention, real speed can differ a bit due to discreetness
+ $ tc qdisc add dev eth0 parent 100:1 cbs locredit -1438 \
+ hicredit 62 sendslope -959000 idleslope 41000 offload 1
+ net eth0: set FIFO3 bw = 50
+
+8) ::
+
+ // Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc:
+ // Set it +1 Mb for reserve (important!)
+ $ tc qdisc add dev eth0 parent 100:2 cbs locredit -1468 \
+ hicredit 65 sendslope -979000 idleslope 21000 offload 1
+ net eth0: set FIFO2 bw = 30
+
+9) ::
+
+ // Create vlan 100 to map sk->priority to vlan qos
+ $ ip link add link eth0 name eth0.100 type vlan id 100
+ 8021q: 802.1Q VLAN Support v1.8
+ 8021q: adding VLAN 0 to HW filter on device eth0
+ 8021q: adding VLAN 0 to HW filter on device eth1
+ net eth0: Adding vlanid 100 to vlan filter
+
+10) ::
+
+ // Map skb->priority to L2 prio, 1 to 1
+ $ ip link set eth0.100 type vlan \
+ egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+11) ::
+
+ // Check egress map for vlan 100
+ $ cat /proc/net/vlan/eth0.100
+ [...]
+ INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
+ EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+12) ::
+
+ // Run your appropriate tools with socket option "SO_PRIORITY"
+ // to 3 for class A and/or to 2 for class B
+ // (I took at https://www.spinics.net/lists/netdev/msg460869.html)
+ ./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
+ ./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
+
+13) ::
+
+ // run your listener on workstation (should be in same vlan)
+ // (I took at https://www.spinics.net/lists/netdev/msg460869.html)
+ ./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39000 kbps
+
+14) ::
+
+ // Restore default configuration if needed
+ $ ip link del eth0.100
+ $ tc qdisc del dev eth1 root
+ $ tc qdisc del dev eth0 root
+ net eth0: Prev FIFO2 is shaped
+ net eth0: set FIFO3 bw = 0
+ net eth0: set FIFO2 bw = 0
+ $ ethtool -L eth0 rx 1 tx 1
+
+Example 2: Two port tx AVB configuration scheme for target board
+----------------------------------------------------------------
+
+(prints and scheme for AM572x evm, for dual emac boards only)
+
+::
+
+ +------------------------------------------------------------------+ u
+ | +----------+ +----------+ +------+ +----------+ +----------+ | s
+ | | | | | | | | | | | | e
+ | | App 1 | | App 2 | | Apps | | App 3 | | App 4 | | r
+ | | Class A | | Class B | | Rest | | Class B | | Class A | |
+ | | Eth0 | | Eth0 | | | | | Eth1 | | Eth1 | | s
+ | | VLAN100 | | VLAN100 | | | | | VLAN100 | | VLAN100 | | p
+ | | 40 Mb/s | | 20 Mb/s | | | | | 10 Mb/s | | 30 Mb/s | | a
+ | | SO_PRI=3 | | SO_PRI=2 | | | | | SO_PRI=3 | | SO_PRI=2 | | c
+ | | | | | | | | | | | | | | | | | e
+ | +---|------+ +---|------+ +---|--+ +---|------+ +---|------+ |
+ +-----|-------------|-------------|---------|-------------|--------+
+ +-+ +-------+ | +----------+ +----+
+ | | +-------+------+ | |
+ | | | | | |
+ +---|-------|-------------|--------------|-------------|-------|---+
+ | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
+ | | p3 | | p2 | | p1 | | p0 | | p0 | | p1 | | p2 | | p3 | | k
+ | \ / \ / \ / \ / \ / \ / \ / \ / | e
+ | \ / \ / \ / \ / \ / \ / \ / \ / | r
+ | \/ \/ \/ \/ \/ \/ \/ \/ | n
+ | | | | | | | | e
+ | | | +----+ +----+ | | | l
+ | | | | | | | |
+ | +----+ +----+ +----+ +----+ +----+ +----+ | s
+ | |tc0 | |tc1 | |tc2 | |tc2 | |tc1 | |tc0 | | p
+ | \ / \ / \ / \ / \ / \ / | a
+ | \ / \ / \ / \ / \ / \ / | c
+ | \/ \/ \/ \/ \/ \/ | e
+ | | | +-----+ +-----+ | | |
+ | | | | | | | | | |
+ | | | | | | | | | |
+ | | | | | E E | | | | |
+ | +----+ +----+ +----+ +----+ t t +----+ +----+ +----+ +----+ |
+ | |txq0| |txq1| |txq4| |txq5| h h |txq6| |txq7| |txq3| |txq2| |
+ | \ / \ / \ / \ / 0 1 \ / \ / \ / \ / |
+ | \ / \ / \ / \ / . . \ / \ / \ / \ / |
+ | \/ \/ \/ \/ 1 1 \/ \/ \/ \/ |
+ | +-|------|------|------|--+ 0 0 +-|------|------|------|--+ |
+ | | | | | | | 0 0 | | | | | | |
+ +---|------|------|------|---------------|------|------|------|----+
+ | | | | | | | |
+ p p p p p p p p
+ 3 2 0-1, 4-7 <-L2 pri-> 0-1, 4-7 2 3
+ | | | | | | | |
+ | | | | | | | |
+ +---|------|------|------|---------------|------|------|------|----+
+ | | | | | | | | | |
+ | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
+ | |dma7| |dma6| |dma3| |dma2| |dma1| |dma0| |dma4| |dma5| |
+ | \ / \ / \ / \ / \ / \ / \ / \ / | c
+ | \S / \S / \ / \ / \ / \ / \S / \S / | p
+ | \/ \/ \/ \/ \/ \/ \/ \/ | s
+ | | | | +----- | | | | | w
+ | | | | | +----+ | | | |
+ | | | | | | | | | | d
+ | +----+ +----+ +----+p p+----+ +----+ +----+ | r
+ | | | | | | |o o| | | | | | | i
+ | | f3 | | f2 | | f0 |r CPSW r| f3 | | f2 | | f0 | | v
+ | |tc0 | |tc1 | |tc2 |t t|tc0 | |tc1 | |tc2 | | e
+ | \CBS / \CBS / \CBS /1 2\CBS / \CBS / \CBS / | r
+ | \S / \S / \ / \S / \S / \ / |
+ | \/ \/ \/ \/ \/ \/ |
+ +------------------------------------------------------------------+
+ ========================================Eth==========================>
+
+1) ::
+
+ // Add 8 tx queues, for interface Eth0, but they are common, so are accessed
+ // by two interfaces Eth0 and Eth1.
+ $ ethtool -L eth1 rx 1 tx 8
+ rx unmodified, ignoring
+
+2) ::
+
+ // Check if num of queues is set correctly:
+ $ ethtool -l eth0
+ Channel parameters for eth0:
+ Pre-set maximums:
+ RX: 8
+ TX: 8
+ Other: 0
+ Combined: 0
+ Current hardware settings:
+ RX: 1
+ TX: 8
+ Other: 0
+ Combined: 0
+
+3) ::
+
+ // TX queues must be rated starting from 0, so set bws for tx0 and tx1 for Eth0
+ // and for tx2 and tx3 for Eth1. That is, rates 40 and 20 Mb/s appropriately
+ // for Eth0 and 30 and 10 Mb/s for Eth1.
+ // Real speed can differ a bit due to discreetness
+ // Leave last 4 tx queues as not rated
+ $ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
+ $ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
+ $ echo 30 > /sys/class/net/eth1/queues/tx-2/tx_maxrate
+ $ echo 10 > /sys/class/net/eth1/queues/tx-3/tx_maxrate
+
+4) ::
+
+ // Check maximum rate of tx (cpdma) queues:
+ $ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
+ 40
+ 20
+ 30
+ 10
+ 0
+ 0
+ 0
+ 0
+
+5) ::
+
+ // Map skb->priority to traffic class for Eth0:
+ // 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
+ // Map traffic class to transmit queue:
+ // tc0 -> txq0, tc1 -> txq1, tc2 -> (txq4, txq5)
+ $ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
+ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@4 hw 1
+
+6) ::
+
+ // Check classes settings
+ $ tc -g class show dev eth0
+ +---(100:ffe2) mqprio
+ | +---(100:5) mqprio
+ | +---(100:6) mqprio
+ |
+ +---(100:ffe1) mqprio
+ | +---(100:2) mqprio
+ |
+ +---(100:ffe0) mqprio
+ +---(100:1) mqprio
+
+7) ::
+
+ // Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc for Eth0
+ // here only idle slope is important, others ignored
+ // Real speed can differ a bit due to discreetness
+ $ tc qdisc add dev eth0 parent 100:1 cbs locredit -1470 \
+ hicredit 62 sendslope -959000 idleslope 41000 offload 1
+ net eth0: set FIFO3 bw = 50
+
+8) ::
+
+ // Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc for Eth0
+ $ tc qdisc add dev eth0 parent 100:2 cbs locredit -1470 \
+ hicredit 65 sendslope -979000 idleslope 21000 offload 1
+ net eth0: set FIFO2 bw = 30
+
+9) ::
+
+ // Create vlan 100 to map sk->priority to vlan qos for Eth0
+ $ ip link add link eth0 name eth0.100 type vlan id 100
+ net eth0: Adding vlanid 100 to vlan filter
+
+10) ::
+
+ // Map skb->priority to L2 prio for Eth0.100, one to one
+ $ ip link set eth0.100 type vlan \
+ egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+11) ::
+
+ // Check egress map for vlan 100
+ $ cat /proc/net/vlan/eth0.100
+ [...]
+ INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
+ EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+12) ::
+
+ // Map skb->priority to traffic class for Eth1:
+ // 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
+ // Map traffic class to transmit queue:
+ // tc0 -> txq2, tc1 -> txq3, tc2 -> (txq6, txq7)
+ $ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 3 \
+ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@2 1@3 2@6 hw 1
+
+13) ::
+
+ // Check classes settings
+ $ tc -g class show dev eth1
+ +---(100:ffe2) mqprio
+ | +---(100:7) mqprio
+ | +---(100:8) mqprio
+ |
+ +---(100:ffe1) mqprio
+ | +---(100:4) mqprio
+ |
+ +---(100:ffe0) mqprio
+ +---(100:3) mqprio
+
+14) ::
+
+ // Set rate for class A - 31 Mbit (tc0, txq2) using CBS Qdisc for Eth1
+ // here only idle slope is important, others ignored, but calculated
+ // for interface speed - 100Mb for eth1 port.
+ // Set it +1 Mb for reserve (important!)
+ $ tc qdisc add dev eth1 parent 100:3 cbs locredit -1035 \
+ hicredit 465 sendslope -69000 idleslope 31000 offload 1
+ net eth1: set FIFO3 bw = 31
+
+15) ::
+
+ // Set rate for class B - 11 Mbit (tc1, txq3) using CBS Qdisc for Eth1
+ // Set it +1 Mb for reserve (important!)
+ $ tc qdisc add dev eth1 parent 100:4 cbs locredit -1335 \
+ hicredit 405 sendslope -89000 idleslope 11000 offload 1
+ net eth1: set FIFO2 bw = 11
+
+16) ::
+
+ // Create vlan 100 to map sk->priority to vlan qos for Eth1
+ $ ip link add link eth1 name eth1.100 type vlan id 100
+ net eth1: Adding vlanid 100 to vlan filter
+
+17) ::
+
+ // Map skb->priority to L2 prio for Eth1.100, one to one
+ $ ip link set eth1.100 type vlan \
+ egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+18) ::
+
+ // Check egress map for vlan 100
+ $ cat /proc/net/vlan/eth1.100
+ [...]
+ INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
+ EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+19) ::
+
+ // Run appropriate tools with socket option "SO_PRIORITY" to 3
+ // for class A and to 2 for class B. For both interfaces
+ ./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
+ ./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
+ ./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p2 -s 1500&
+ ./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p3 -s 1500&
+
+20) ::
+
+ // run your listener on workstation (should be in same vlan)
+ // (I took at https://www.spinics.net/lists/netdev/msg460869.html)
+ ./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39012 kbps
+ Receiving data rate: 39000 kbps
+
+21) ::
+
+ // Restore default configuration if needed
+ $ ip link del eth1.100
+ $ ip link del eth0.100
+ $ tc qdisc del dev eth1 root
+ net eth1: Prev FIFO2 is shaped
+ net eth1: set FIFO3 bw = 0
+ net eth1: set FIFO2 bw = 0
+ $ tc qdisc del dev eth0 root
+ net eth0: Prev FIFO2 is shaped
+ net eth0: set FIFO3 bw = 0
+ net eth0: set FIFO2 bw = 0
+ $ ethtool -L eth0 rx 1 tx 1
diff --git a/Documentation/networking/device_drivers/ti/cpsw.txt b/Documentation/networking/device_drivers/ti/cpsw.txt
deleted file mode 100644
index d4d4c0751a09..000000000000
--- a/Documentation/networking/device_drivers/ti/cpsw.txt
+++ /dev/null
@@ -1,541 +0,0 @@
-* Texas Instruments CPSW ethernet driver
-
-Multiqueue & CBS & MQPRIO
-=====================================================================
-=====================================================================
-
-The cpsw has 3 CBS shapers for each external ports. This document
-describes MQPRIO and CBS Qdisc offload configuration for cpsw driver
-based on examples. It potentially can be used in audio video bridging
-(AVB) and time sensitive networking (TSN).
-
-The following examples were tested on AM572x EVM and BBB boards.
-
-Test setup
-==========
-
-Under consideration two examples with AM572x EVM running cpsw driver
-in dual_emac mode.
-
-Several prerequisites:
-- TX queues must be rated starting from txq0 that has highest priority
-- Traffic classes are used starting from 0, that has highest priority
-- CBS shapers should be used with rated queues
-- The bandwidth for CBS shapers has to be set a little bit more then
- potential incoming rate, thus, rate of all incoming tx queues has
- to be a little less
-- Real rates can differ, due to discreetness
-- Map skb-priority to txq is not enough, also skb-priority to l2 prio
- map has to be created with ip or vconfig tool
-- Any l2/socket prio (0 - 7) for classes can be used, but for
- simplicity default values are used: 3 and 2
-- only 2 classes tested: A and B, but checked and can work with more,
- maximum allowed 4, but only for 3 rate can be set.
-
-Test setup for examples
-=======================
- +-------------------------------+
- |--+ |
- | | Workstation0 |
- |E | MAC 18:03:73:66:87:42 |
-+-----------------------------+ +--|t | |
-| | 1 | E | | |h |./tsn_listener -d \ |
-| Target board: | 0 | t |--+ |0 | 18:03:73:66:87:42 -i eth0 \|
-| AM572x EVM | 0 | h | | | -s 1500 |
-| | 0 | 0 | |--+ |
-| Only 2 classes: |Mb +---| +-------------------------------+
-| class A, class B | |
-| | +---| +-------------------------------+
-| | 1 | E | |--+ |
-| | 0 | t | | | Workstation1 |
-| | 0 | h |--+ |E | MAC 20:cf:30:85:7d:fd |
-| |Mb | 1 | +--|t | |
-+-----------------------------+ |h |./tsn_listener -d \ |
- |0 | 20:cf:30:85:7d:fd -i eth0 \|
- | | -s 1500 |
- |--+ |
- +-------------------------------+
-
-*********************************************************************
-*********************************************************************
-*********************************************************************
-Example 1: One port tx AVB configuration scheme for target board
-----------------------------------------------------------------------
-(prints and scheme for AM572x evm, applicable for single port boards)
-
-tc - traffic class
-txq - transmit queue
-p - priority
-f - fifo (cpsw fifo)
-S - shaper configured
-
-+------------------------------------------------------------------+ u
-| +---------------+ +---------------+ +------+ +------+ | s
-| | | | | | | | | | e
-| | App 1 | | App 2 | | Apps | | Apps | | r
-| | Class A | | Class B | | Rest | | Rest | |
-| | Eth0 | | Eth0 | | Eth0 | | Eth1 | | s
-| | VLAN100 | | VLAN100 | | | | | | | | p
-| | 40 Mb/s | | 20 Mb/s | | | | | | | | a
-| | SO_PRIORITY=3 | | SO_PRIORITY=2 | | | | | | | | c
-| | | | | | | | | | | | | | e
-| +---|-----------+ +---|-----------+ +---|--+ +---|--+ |
-+-----|------------------|------------------|--------|-------------+
- +-+ +------------+ | |
- | | +-----------------+ +--+
- | | | |
-+---|-------|-------------|-----------------------|----------------+
-| +----+ +----+ +----+ +----+ +----+ |
-| | p3 | | p2 | | p1 | | p0 | | p0 | | k
-| \ / \ / \ / \ / \ / | e
-| \ / \ / \ / \ / \ / | r
-| \/ \/ \/ \/ \/ | n
-| | | | | | e
-| | | +-----+ | | l
-| | | | | |
-| +----+ +----+ +----+ +----+ | s
-| |tc0 | |tc1 | |tc2 | |tc0 | | p
-| \ / \ / \ / \ / | a
-| \ / \ / \ / \ / | c
-| \/ \/ \/ \/ | e
-| | | +-----+ | |
-| | | | | | |
-| | | | | | |
-| | | | | | |
-| +----+ +----+ +----+ +----+ +----+ |
-| |txq0| |txq1| |txq2| |txq3| |txq4| |
-| \ / \ / \ / \ / \ / |
-| \ / \ / \ / \ / \ / |
-| \/ \/ \/ \/ \/ |
-| +-|------|------|------|--+ +--|--------------+ |
-| | | | | | | Eth0.100 | | Eth1 | |
-+---|------|------|------|------------------------|----------------+
- | | | | |
- p p p p |
- 3 2 0-1, 4-7 <- L2 priority |
- | | | | |
- | | | | |
-+---|------|------|------|------------------------|----------------+
-| | | | | |----------+ |
-| +----+ +----+ +----+ +----+ +----+ |
-| |dma7| |dma6| |dma5| |dma4| |dma3| |
-| \ / \ / \ / \ / \ / | c
-| \S / \S / \ / \ / \ / | p
-| \/ \/ \/ \/ \/ | s
-| | | | +----- | | w
-| | | | | | |
-| | | | | | | d
-| +----+ +----+ +----+p p+----+ | r
-| | | | | | |o o| | | i
-| | f3 | | f2 | | f0 |r r| f0 | | v
-| |tc0 | |tc1 | |tc2 |t t|tc0 | | e
-| \CBS / \CBS / \CBS /1 2\CBS / | r
-| \S / \S / \ / \ / |
-| \/ \/ \/ \/ |
-+------------------------------------------------------------------+
-========================================Eth==========================>
-
-1)
-// Add 4 tx queues, for interface Eth0, and 1 tx queue for Eth1
-$ ethtool -L eth0 rx 1 tx 5
-rx unmodified, ignoring
-
-2)
-// Check if num of queues is set correctly:
-$ ethtool -l eth0
-Channel parameters for eth0:
-Pre-set maximums:
-RX: 8
-TX: 8
-Other: 0
-Combined: 0
-Current hardware settings:
-RX: 1
-TX: 5
-Other: 0
-Combined: 0
-
-3)
-// TX queues must be rated starting from 0, so set bws for tx0 and tx1
-// Set rates 40 and 20 Mb/s appropriately.
-// Pay attention, real speed can differ a bit due to discreetness.
-// Leave last 2 tx queues not rated.
-$ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
-$ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
-
-4)
-// Check maximum rate of tx (cpdma) queues:
-$ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
-40
-20
-0
-0
-0
-
-5)
-// Map skb->priority to traffic class:
-// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
-// Map traffic class to transmit queue:
-// tc0 -> txq0, tc1 -> txq1, tc2 -> (txq2, txq3)
-$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
-map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1
-
-5a)
-// As two interface sharing same set of tx queues, assign all traffic
-// coming to interface Eth1 to separate queue in order to not mix it
-// with traffic from interface Eth0, so use separate txq to send
-// packets to Eth1, so all prio -> tc0 and tc0 -> txq4
-// Here hw 0, so here still default configuration for eth1 in hw
-$ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 1 \
-map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@4 hw 0
-
-6)
-// Check classes settings
-$ tc -g class show dev eth0
-+---(100:ffe2) mqprio
-| +---(100:3) mqprio
-| +---(100:4) mqprio
-|
-+---(100:ffe1) mqprio
-| +---(100:2) mqprio
-|
-+---(100:ffe0) mqprio
- +---(100:1) mqprio
-
-$ tc -g class show dev eth1
-+---(100:ffe0) mqprio
- +---(100:5) mqprio
-
-7)
-// Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc
-// Set it +1 Mb for reserve (important!)
-// here only idle slope is important, others arg are ignored
-// Pay attention, real speed can differ a bit due to discreetness
-$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1438 \
-hicredit 62 sendslope -959000 idleslope 41000 offload 1
-net eth0: set FIFO3 bw = 50
-
-8)
-// Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc:
-// Set it +1 Mb for reserve (important!)
-$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1468 \
-hicredit 65 sendslope -979000 idleslope 21000 offload 1
-net eth0: set FIFO2 bw = 30
-
-9)
-// Create vlan 100 to map sk->priority to vlan qos
-$ ip link add link eth0 name eth0.100 type vlan id 100
-8021q: 802.1Q VLAN Support v1.8
-8021q: adding VLAN 0 to HW filter on device eth0
-8021q: adding VLAN 0 to HW filter on device eth1
-net eth0: Adding vlanid 100 to vlan filter
-
-10)
-// Map skb->priority to L2 prio, 1 to 1
-$ ip link set eth0.100 type vlan \
-egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
-
-11)
-// Check egress map for vlan 100
-$ cat /proc/net/vlan/eth0.100
-[...]
-INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
-EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
-
-12)
-// Run your appropriate tools with socket option "SO_PRIORITY"
-// to 3 for class A and/or to 2 for class B
-// (I took at https://www.spinics.net/lists/netdev/msg460869.html)
-./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
-./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
-
-13)
-// run your listener on workstation (should be in same vlan)
-// (I took at https://www.spinics.net/lists/netdev/msg460869.html)
-./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39000 kbps
-
-14)
-// Restore default configuration if needed
-$ ip link del eth0.100
-$ tc qdisc del dev eth1 root
-$ tc qdisc del dev eth0 root
-net eth0: Prev FIFO2 is shaped
-net eth0: set FIFO3 bw = 0
-net eth0: set FIFO2 bw = 0
-$ ethtool -L eth0 rx 1 tx 1
-
-*********************************************************************
-*********************************************************************
-*********************************************************************
-Example 2: Two port tx AVB configuration scheme for target board
-----------------------------------------------------------------------
-(prints and scheme for AM572x evm, for dual emac boards only)
-
-+------------------------------------------------------------------+ u
-| +----------+ +----------+ +------+ +----------+ +----------+ | s
-| | | | | | | | | | | | e
-| | App 1 | | App 2 | | Apps | | App 3 | | App 4 | | r
-| | Class A | | Class B | | Rest | | Class B | | Class A | |
-| | Eth0 | | Eth0 | | | | | Eth1 | | Eth1 | | s
-| | VLAN100 | | VLAN100 | | | | | VLAN100 | | VLAN100 | | p
-| | 40 Mb/s | | 20 Mb/s | | | | | 10 Mb/s | | 30 Mb/s | | a
-| | SO_PRI=3 | | SO_PRI=2 | | | | | SO_PRI=3 | | SO_PRI=2 | | c
-| | | | | | | | | | | | | | | | | e
-| +---|------+ +---|------+ +---|--+ +---|------+ +---|------+ |
-+-----|-------------|-------------|---------|-------------|--------+
- +-+ +-------+ | +----------+ +----+
- | | +-------+------+ | |
- | | | | | |
-+---|-------|-------------|--------------|-------------|-------|---+
-| +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
-| | p3 | | p2 | | p1 | | p0 | | p0 | | p1 | | p2 | | p3 | | k
-| \ / \ / \ / \ / \ / \ / \ / \ / | e
-| \ / \ / \ / \ / \ / \ / \ / \ / | r
-| \/ \/ \/ \/ \/ \/ \/ \/ | n
-| | | | | | | | e
-| | | +----+ +----+ | | | l
-| | | | | | | |
-| +----+ +----+ +----+ +----+ +----+ +----+ | s
-| |tc0 | |tc1 | |tc2 | |tc2 | |tc1 | |tc0 | | p
-| \ / \ / \ / \ / \ / \ / | a
-| \ / \ / \ / \ / \ / \ / | c
-| \/ \/ \/ \/ \/ \/ | e
-| | | +-----+ +-----+ | | |
-| | | | | | | | | |
-| | | | | | | | | |
-| | | | | E E | | | | |
-| +----+ +----+ +----+ +----+ t t +----+ +----+ +----+ +----+ |
-| |txq0| |txq1| |txq4| |txq5| h h |txq6| |txq7| |txq3| |txq2| |
-| \ / \ / \ / \ / 0 1 \ / \ / \ / \ / |
-| \ / \ / \ / \ / . . \ / \ / \ / \ / |
-| \/ \/ \/ \/ 1 1 \/ \/ \/ \/ |
-| +-|------|------|------|--+ 0 0 +-|------|------|------|--+ |
-| | | | | | | 0 0 | | | | | | |
-+---|------|------|------|---------------|------|------|------|----+
- | | | | | | | |
- p p p p p p p p
- 3 2 0-1, 4-7 <-L2 pri-> 0-1, 4-7 2 3
- | | | | | | | |
- | | | | | | | |
-+---|------|------|------|---------------|------|------|------|----+
-| | | | | | | | | |
-| +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
-| |dma7| |dma6| |dma3| |dma2| |dma1| |dma0| |dma4| |dma5| |
-| \ / \ / \ / \ / \ / \ / \ / \ / | c
-| \S / \S / \ / \ / \ / \ / \S / \S / | p
-| \/ \/ \/ \/ \/ \/ \/ \/ | s
-| | | | +----- | | | | | w
-| | | | | +----+ | | | |
-| | | | | | | | | | d
-| +----+ +----+ +----+p p+----+ +----+ +----+ | r
-| | | | | | |o o| | | | | | | i
-| | f3 | | f2 | | f0 |r CPSW r| f3 | | f2 | | f0 | | v
-| |tc0 | |tc1 | |tc2 |t t|tc0 | |tc1 | |tc2 | | e
-| \CBS / \CBS / \CBS /1 2\CBS / \CBS / \CBS / | r
-| \S / \S / \ / \S / \S / \ / |
-| \/ \/ \/ \/ \/ \/ |
-+------------------------------------------------------------------+
-========================================Eth==========================>
-
-1)
-// Add 8 tx queues, for interface Eth0, but they are common, so are accessed
-// by two interfaces Eth0 and Eth1.
-$ ethtool -L eth1 rx 1 tx 8
-rx unmodified, ignoring
-
-2)
-// Check if num of queues is set correctly:
-$ ethtool -l eth0
-Channel parameters for eth0:
-Pre-set maximums:
-RX: 8
-TX: 8
-Other: 0
-Combined: 0
-Current hardware settings:
-RX: 1
-TX: 8
-Other: 0
-Combined: 0
-
-3)
-// TX queues must be rated starting from 0, so set bws for tx0 and tx1 for Eth0
-// and for tx2 and tx3 for Eth1. That is, rates 40 and 20 Mb/s appropriately
-// for Eth0 and 30 and 10 Mb/s for Eth1.
-// Real speed can differ a bit due to discreetness
-// Leave last 4 tx queues as not rated
-$ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
-$ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
-$ echo 30 > /sys/class/net/eth1/queues/tx-2/tx_maxrate
-$ echo 10 > /sys/class/net/eth1/queues/tx-3/tx_maxrate
-
-4)
-// Check maximum rate of tx (cpdma) queues:
-$ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
-40
-20
-30
-10
-0
-0
-0
-0
-
-5)
-// Map skb->priority to traffic class for Eth0:
-// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
-// Map traffic class to transmit queue:
-// tc0 -> txq0, tc1 -> txq1, tc2 -> (txq4, txq5)
-$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
-map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@4 hw 1
-
-6)
-// Check classes settings
-$ tc -g class show dev eth0
-+---(100:ffe2) mqprio
-| +---(100:5) mqprio
-| +---(100:6) mqprio
-|
-+---(100:ffe1) mqprio
-| +---(100:2) mqprio
-|
-+---(100:ffe0) mqprio
- +---(100:1) mqprio
-
-7)
-// Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc for Eth0
-// here only idle slope is important, others ignored
-// Real speed can differ a bit due to discreetness
-$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1470 \
-hicredit 62 sendslope -959000 idleslope 41000 offload 1
-net eth0: set FIFO3 bw = 50
-
-8)
-// Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc for Eth0
-$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1470 \
-hicredit 65 sendslope -979000 idleslope 21000 offload 1
-net eth0: set FIFO2 bw = 30
-
-9)
-// Create vlan 100 to map sk->priority to vlan qos for Eth0
-$ ip link add link eth0 name eth0.100 type vlan id 100
-net eth0: Adding vlanid 100 to vlan filter
-
-10)
-// Map skb->priority to L2 prio for Eth0.100, one to one
-$ ip link set eth0.100 type vlan \
-egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
-
-11)
-// Check egress map for vlan 100
-$ cat /proc/net/vlan/eth0.100
-[...]
-INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
-EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
-
-12)
-// Map skb->priority to traffic class for Eth1:
-// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
-// Map traffic class to transmit queue:
-// tc0 -> txq2, tc1 -> txq3, tc2 -> (txq6, txq7)
-$ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 3 \
-map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@2 1@3 2@6 hw 1
-
-13)
-// Check classes settings
-$ tc -g class show dev eth1
-+---(100:ffe2) mqprio
-| +---(100:7) mqprio
-| +---(100:8) mqprio
-|
-+---(100:ffe1) mqprio
-| +---(100:4) mqprio
-|
-+---(100:ffe0) mqprio
- +---(100:3) mqprio
-
-14)
-// Set rate for class A - 31 Mbit (tc0, txq2) using CBS Qdisc for Eth1
-// here only idle slope is important, others ignored, but calculated
-// for interface speed - 100Mb for eth1 port.
-// Set it +1 Mb for reserve (important!)
-$ tc qdisc add dev eth1 parent 100:3 cbs locredit -1035 \
-hicredit 465 sendslope -69000 idleslope 31000 offload 1
-net eth1: set FIFO3 bw = 31
-
-15)
-// Set rate for class B - 11 Mbit (tc1, txq3) using CBS Qdisc for Eth1
-// Set it +1 Mb for reserve (important!)
-$ tc qdisc add dev eth1 parent 100:4 cbs locredit -1335 \
-hicredit 405 sendslope -89000 idleslope 11000 offload 1
-net eth1: set FIFO2 bw = 11
-
-16)
-// Create vlan 100 to map sk->priority to vlan qos for Eth1
-$ ip link add link eth1 name eth1.100 type vlan id 100
-net eth1: Adding vlanid 100 to vlan filter
-
-17)
-// Map skb->priority to L2 prio for Eth1.100, one to one
-$ ip link set eth1.100 type vlan \
-egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
-
-18)
-// Check egress map for vlan 100
-$ cat /proc/net/vlan/eth1.100
-[...]
-INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
-EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
-
-19)
-// Run appropriate tools with socket option "SO_PRIORITY" to 3
-// for class A and to 2 for class B. For both interfaces
-./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
-./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
-./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p2 -s 1500&
-./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p3 -s 1500&
-
-20)
-// run your listener on workstation (should be in same vlan)
-// (I took at https://www.spinics.net/lists/netdev/msg460869.html)
-./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39012 kbps
-Receiving data rate: 39000 kbps
-
-21)
-// Restore default configuration if needed
-$ ip link del eth1.100
-$ ip link del eth0.100
-$ tc qdisc del dev eth1 root
-net eth1: Prev FIFO2 is shaped
-net eth1: set FIFO3 bw = 0
-net eth1: set FIFO2 bw = 0
-$ tc qdisc del dev eth0 root
-net eth0: Prev FIFO2 is shaped
-net eth0: set FIFO3 bw = 0
-net eth0: set FIFO2 bw = 0
-$ ethtool -L eth0 rx 1 tx 1
diff --git a/Documentation/networking/device_drivers/ti/cpsw_switchdev.txt b/Documentation/networking/device_drivers/ti/cpsw_switchdev.rst
index 12855ab268b8..1241ecac73bd 100644
--- a/Documentation/networking/device_drivers/ti/cpsw_switchdev.txt
+++ b/Documentation/networking/device_drivers/ti/cpsw_switchdev.rst
@@ -1,30 +1,44 @@
-* Texas Instruments CPSW switchdev based ethernet driver 2.0
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
+Texas Instruments CPSW switchdev based ethernet driver
+======================================================
+
+:Version: 2.0
+
+Port renaming
+=============
-- Port renaming
On older udev versions renaming of ethX to swXpY will not be automatically
supported
-In order to rename via udev:
-ip -d link show dev sw0p1 | grep switchid
-SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}==<switchid>, \
- ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
+In order to rename via udev::
+
+ ip -d link show dev sw0p1 | grep switchid
+
+ SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}==<switchid>, \
+ ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
+
+Dual mac mode
+=============
-====================
-# Dual mac mode
-====================
- The new (cpsw_new.c) driver is operating in dual-emac mode by default, thus
-working as 2 individual network interfaces. Main differences from legacy CPSW
-driver are:
+ working as 2 individual network interfaces. Main differences from legacy CPSW
+ driver are:
+
- optimized promiscuous mode: The P0_UNI_FLOOD (both ports) is enabled in
-addition to ALLMULTI (current port) instead of ALE_BYPASS.
-So, Ports in promiscuous mode will keep possibility of mcast and vlan filtering,
-which is provides significant benefits when ports are joined to the same bridge,
-but without enabling "switch" mode, or to different bridges.
+ addition to ALLMULTI (current port) instead of ALE_BYPASS.
+ So, Ports in promiscuous mode will keep possibility of mcast and vlan
+ filtering, which is provides significant benefits when ports are joined
+ to the same bridge, but without enabling "switch" mode, or to different
+ bridges.
- learning disabled on ports as it make not too much sense for
segregated ports - no forwarding in HW.
- enabled basic support for devlink.
+ ::
+
devlink dev show
platform/48484000.switch
@@ -38,22 +52,25 @@ but without enabling "switch" mode, or to different bridges.
cmode runtime value false
Devlink configuration parameters
-====================
+================================
+
See Documentation/networking/devlink/ti-cpsw-switch.rst
-====================
-# Bridging in dual mac mode
-====================
+Bridging in dual mac mode
+=========================
+
The dual_mac mode requires two vids to be reserved for internal purposes,
which, by default, equal CPSW Port numbers. As result, bridge has to be
-configured in vlan unaware mode or default_pvid has to be adjusted.
+configured in vlan unaware mode or default_pvid has to be adjusted::
ip link add name br0 type bridge
ip link set dev br0 type bridge vlan_filtering 0
echo 0 > /sys/class/net/br0/bridge/default_pvid
ip link set dev sw0p1 master br0
ip link set dev sw0p2 master br0
- - or -
+
+or::
+
ip link add name br0 type bridge
ip link set dev br0 type bridge vlan_filtering 0
echo 100 > /sys/class/net/br0/bridge/default_pvid
@@ -61,11 +78,12 @@ configured in vlan unaware mode or default_pvid has to be adjusted.
ip link set dev sw0p1 master br0
ip link set dev sw0p2 master br0
-====================
-# Enabling "switch"
-====================
+Enabling "switch"
+=================
+
The Switch mode can be enabled by configuring devlink driver parameter
-"switch_mode" to 1/true:
+"switch_mode" to 1/true::
+
devlink dev param set platform/48484000.switch \
name switch_mode value 1 cmode runtime
@@ -79,9 +97,11 @@ marking packets with offload_fwd_mark flag unless "ale_bypass=0"
All configuration is implemented via switchdev API.
-====================
-# Bridge setup
-====================
+Bridge setup
+============
+
+::
+
devlink dev param set platform/48484000.switch \
name switch_mode value 1 cmode runtime
@@ -91,56 +111,65 @@ All configuration is implemented via switchdev API.
ip link set dev sw0p2 up
ip link set dev sw0p1 master br0
ip link set dev sw0p2 master br0
+
[*] bridge vlan add dev br0 vid 1 pvid untagged self
-[*] if vlan_filtering=1. where default_pvid=1
+ [*] if vlan_filtering=1. where default_pvid=1
-=================
-# On/off STP
-=================
-ip link set dev BRDEV type bridge stp_state 1/0
+ Note. Steps [*] are mandatory.
+
+
+On/off STP
+==========
-Note. Steps [*] are mandatory.
+::
-====================
-# VLAN configuration
-====================
-bridge vlan add dev br0 vid 1 pvid untagged self <---- add cpu port to VLAN 1
+ ip link set dev BRDEV type bridge stp_state 1/0
+
+VLAN configuration
+==================
+
+::
+
+ bridge vlan add dev br0 vid 1 pvid untagged self <---- add cpu port to VLAN 1
Note. This step is mandatory for bridge/default_pvid.
-=================
-# Add extra VLANs
-=================
- 1. untagged:
- bridge vlan add dev sw0p1 vid 100 pvid untagged master
- bridge vlan add dev sw0p2 vid 100 pvid untagged master
- bridge vlan add dev br0 vid 100 pvid untagged self <---- Add cpu port to VLAN100
+Add extra VLANs
+===============
- 2. tagged:
- bridge vlan add dev sw0p1 vid 100 master
- bridge vlan add dev sw0p2 vid 100 master
- bridge vlan add dev br0 vid 100 pvid tagged self <---- Add cpu port to VLAN100
+ 1. untagged::
+
+ bridge vlan add dev sw0p1 vid 100 pvid untagged master
+ bridge vlan add dev sw0p2 vid 100 pvid untagged master
+ bridge vlan add dev br0 vid 100 pvid untagged self <---- Add cpu port to VLAN100
+
+ 2. tagged::
+
+ bridge vlan add dev sw0p1 vid 100 master
+ bridge vlan add dev sw0p2 vid 100 master
+ bridge vlan add dev br0 vid 100 pvid tagged self <---- Add cpu port to VLAN100
-====
FDBs
-====
+----
+
FDBs are automatically added on the appropriate switch port upon detection
-Manually adding FDBs:
-bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p1 master vlan 100
-bridge fdb add aa:bb:cc:dd:ee:fe dev sw0p2 master <---- Add on all VLANs
+Manually adding FDBs::
+
+ bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p1 master vlan 100
+ bridge fdb add aa:bb:cc:dd:ee:fe dev sw0p2 master <---- Add on all VLANs
-====
MDBs
-====
+----
+
MDBs are automatically added on the appropriate switch port upon detection
-Manually adding MDBs:
-bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent vid 100
-bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent <---- Add on all VLANs
+Manually adding MDBs::
+
+ bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent vid 100
+ bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent <---- Add on all VLANs
-==================
Multicast flooding
==================
CPU port mcast_flooding is always on
@@ -148,9 +177,11 @@ CPU port mcast_flooding is always on
Turning flooding on/off on swithch ports:
bridge link set dev sw0p1 mcast_flood on/off
-==================
Access and Trunk port
-==================
+=====================
+
+::
+
bridge vlan add dev sw0p1 vid 100 pvid untagged master
bridge vlan add dev sw0p2 vid 100 master
@@ -158,52 +189,54 @@ Access and Trunk port
bridge vlan add dev br0 vid 100 self
ip link add link br0 name br0.100 type vlan id 100
- Note. Setting PVID on Bridge device itself working only for
- default VLAN (default_pvid).
+Note. Setting PVID on Bridge device itself working only for
+default VLAN (default_pvid).
+
+NFS
+===
-=====================
- NFS
-=====================
The only way for NFS to work is by chrooting to a minimal environment when
switch configuration that will affect connectivity is needed.
Assuming you are booting NFS with eth1 interface(the script is hacky and
it's just there to prove NFS is doable).
-setup.sh:
-#!/bin/sh
-mkdir proc
-mount -t proc none /proc
-ifconfig br0 > /dev/null
-if [ $? -ne 0 ]; then
- echo "Setting up bridge"
- ip link add name br0 type bridge
- ip link set dev br0 type bridge ageing_time 1000
- ip link set dev br0 type bridge vlan_filtering 1
-
- ip link set eth1 down
- ip link set eth1 name sw0p1
- ip link set dev sw0p1 up
- ip link set dev sw0p2 up
- ip link set dev sw0p2 master br0
- ip link set dev sw0p1 master br0
- bridge vlan add dev br0 vid 1 pvid untagged self
- ifconfig sw0p1 0.0.0.0
- udhchc -i br0
-fi
-umount /proc
-
-run_nfs.sh:
-#!/bin/sh
-mkdir /tmp/root/bin -p
-mkdir /tmp/root/lib -p
-
-cp -r /lib/ /tmp/root/
-cp -r /bin/ /tmp/root/
-cp /sbin/ip /tmp/root/bin
-cp /sbin/bridge /tmp/root/bin
-cp /sbin/ifconfig /tmp/root/bin
-cp /sbin/udhcpc /tmp/root/bin
-cp /path/to/setup.sh /tmp/root/bin
-chroot /tmp/root/ busybox sh /bin/setup.sh
-
-run ./run_nfs.sh
+setup.sh::
+
+ #!/bin/sh
+ mkdir proc
+ mount -t proc none /proc
+ ifconfig br0 > /dev/null
+ if [ $? -ne 0 ]; then
+ echo "Setting up bridge"
+ ip link add name br0 type bridge
+ ip link set dev br0 type bridge ageing_time 1000
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ ip link set eth1 down
+ ip link set eth1 name sw0p1
+ ip link set dev sw0p1 up
+ ip link set dev sw0p2 up
+ ip link set dev sw0p2 master br0
+ ip link set dev sw0p1 master br0
+ bridge vlan add dev br0 vid 1 pvid untagged self
+ ifconfig sw0p1 0.0.0.0
+ udhchc -i br0
+ fi
+ umount /proc
+
+run_nfs.sh:::
+
+ #!/bin/sh
+ mkdir /tmp/root/bin -p
+ mkdir /tmp/root/lib -p
+
+ cp -r /lib/ /tmp/root/
+ cp -r /bin/ /tmp/root/
+ cp /sbin/ip /tmp/root/bin
+ cp /sbin/bridge /tmp/root/bin
+ cp /sbin/ifconfig /tmp/root/bin
+ cp /sbin/udhcpc /tmp/root/bin
+ cp /path/to/setup.sh /tmp/root/bin
+ chroot /tmp/root/ busybox sh /bin/setup.sh
+
+ run ./run_nfs.sh
diff --git a/Documentation/networking/device_drivers/ti/tlan.txt b/Documentation/networking/device_drivers/ti/tlan.rst
index 34550dfcef74..4fdc0907f4fc 100644
--- a/Documentation/networking/device_drivers/ti/tlan.txt
+++ b/Documentation/networking/device_drivers/ti/tlan.rst
@@ -1,20 +1,33 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+TLAN driver for Linux
+=====================
+
+:Version: 1.14a
+
(C) 1997-1998 Caldera, Inc.
+
(C) 1998 James Banks
+
(C) 1999-2001 Torben Mathiasen <tmm@image.dk, torben.mathiasen@compaq.com>
For driver information/updates visit http://www.compaq.com
-TLAN driver for Linux, version 1.14a
-README
-I. Supported Devices.
+
+I. Supported Devices
+====================
Only PCI devices will work with this driver.
Supported:
+
+ ========= ========= ===========================================
Vendor ID Device ID Name
+ ========= ========= ===========================================
0e11 ae32 Compaq Netelligent 10/100 TX PCI UTP
0e11 ae34 Compaq Netelligent 10 T PCI UTP
0e11 ae35 Compaq Integrated NetFlex 3/P
@@ -25,13 +38,14 @@ I. Supported Devices.
0e11 b030 Compaq Netelligent 10/100 TX UTP
0e11 f130 Compaq NetFlex 3/P
0e11 f150 Compaq NetFlex 3/P
- 108d 0012 Olicom OC-2325
+ 108d 0012 Olicom OC-2325
108d 0013 Olicom OC-2183
- 108d 0014 Olicom OC-2326
+ 108d 0014 Olicom OC-2326
+ ========= ========= ===========================================
Caveats:
-
+
I am not sure if 100BaseTX daughterboards (for those cards which
support such things) will work. I haven't had any solid evidence
either way.
@@ -41,21 +55,25 @@ I. Supported Devices.
The "Netelligent 10 T/2 PCI UTP/Coax" (b012) device is untested,
but I do not expect any problems.
-
-II. Driver Options
+
+II. Driver Options
+==================
+
1. You can append debug=x to the end of the insmod line to get
- debug messages, where x is a bit field where the bits mean
+ debug messages, where x is a bit field where the bits mean
the following:
-
+
+ ==== =====================================
0x01 Turn on general debugging messages.
0x02 Turn on receive debugging messages.
0x04 Turn on transmit debugging messages.
0x08 Turn on list debugging messages.
+ ==== =====================================
2. You can append aui=1 to the end of the insmod line to cause
- the adapter to use the AUI interface instead of the 10 Base T
- interface. This is also what to do if you want to use the BNC
+ the adapter to use the AUI interface instead of the 10 Base T
+ interface. This is also what to do if you want to use the BNC
connector on a TLAN based device. (Setting this option on a
device that does not have an AUI/BNC connector will probably
cause it to not function correctly.)
@@ -70,41 +88,45 @@ II. Driver Options
5. You have to use speed=X duplex=Y together now. If you just
do "insmod tlan.o speed=100" the driver will do Auto-Neg.
- To force a 10Mbps Half-Duplex link do "insmod tlan.o speed=10
+ To force a 10Mbps Half-Duplex link do "insmod tlan.o speed=10
duplex=1".
6. If the driver is built into the kernel, you can use the 3rd
and 4th parameters to set aui and debug respectively. For
- example:
+ example::
- ether=0,0,0x1,0x7,eth0
+ ether=0,0,0x1,0x7,eth0
This sets aui to 0x1 and debug to 0x7, assuming eth0 is a
supported TLAN device.
The bits in the third byte are assigned as follows:
- 0x01 = aui
- 0x02 = use half duplex
- 0x04 = use full duplex
- 0x08 = use 10BaseT
- 0x10 = use 100BaseTx
+ ==== ===============
+ 0x01 aui
+ 0x02 use half duplex
+ 0x04 use full duplex
+ 0x08 use 10BaseT
+ 0x10 use 100BaseTx
+ ==== ===============
You also need to set both speed and duplex settings when forcing
- speeds with kernel-parameters.
+ speeds with kernel-parameters.
ether=0,0,0x12,0,eth0 will force link to 100Mbps Half-Duplex.
7. If you have more than one tlan adapter in your system, you can
use the above options on a per adapter basis. To force a 100Mbit/HD
- link with your eth1 adapter use:
-
- insmod tlan speed=0,100 duplex=0,1
+ link with your eth1 adapter use::
+
+ insmod tlan speed=0,100 duplex=0,1
Now eth0 will use auto-neg and eth1 will be forced to 100Mbit/HD.
Note that the tlan driver supports a maximum of 8 adapters.
-III. Things to try if you have problems.
+III. Things to try if you have problems
+=======================================
+
1. Make sure your card's PCI id is among those listed in
section I, above.
2. Make sure routing is correct.
@@ -113,5 +135,6 @@ III. Things to try if you have problems.
There is also a tlan mailing list which you can join by sending "subscribe tlan"
in the body of an email to majordomo@vuser.vu.union.edu.
+
There is also a tlan website at http://www.compaq.com
diff --git a/Documentation/networking/device_drivers/toshiba/spider_net.txt b/Documentation/networking/device_drivers/toshiba/spider_net.rst
index b0b75f8463b3..fe5b32be15cd 100644
--- a/Documentation/networking/device_drivers/toshiba/spider_net.txt
+++ b/Documentation/networking/device_drivers/toshiba/spider_net.rst
@@ -1,6 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
- The Spidernet Device Driver
- ===========================
+===========================
+The Spidernet Device Driver
+===========================
Written by Linas Vepstas <linas@austin.ibm.com>
@@ -78,15 +80,15 @@ GDACTDPA, tail and head pointers. It will also summarize the contents
of the ring, starting at the tail pointer, and listing the status
of the descrs that follow.
-A typical example of the output, for a nearly idle system, might be
+A typical example of the output, for a nearly idle system, might be::
-net eth1: Total number of descrs=256
-net eth1: Chain tail located at descr=20
-net eth1: Chain head is at 20
-net eth1: HW curr desc (GDACTDPA) is at 21
-net eth1: Have 1 descrs with stat=x40800101
-net eth1: HW next desc (GDACNEXTDA) is at 22
-net eth1: Last 255 descrs with stat=xa0800000
+ net eth1: Total number of descrs=256
+ net eth1: Chain tail located at descr=20
+ net eth1: Chain head is at 20
+ net eth1: HW curr desc (GDACTDPA) is at 21
+ net eth1: Have 1 descrs with stat=x40800101
+ net eth1: HW next desc (GDACNEXTDA) is at 22
+ net eth1: Last 255 descrs with stat=xa0800000
In the above, the hardware has filled in one descr, number 20. Both
head and tail are pointing at 20, because it has not yet been emptied.
@@ -101,11 +103,11 @@ The status x4... corresponds to "full" and status xa... corresponds
to "empty". The actual value printed is RXCOMST_A.
In the device driver source code, a different set of names are
-used for these same concepts, so that
+used for these same concepts, so that::
-"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
-"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
-"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
+ "empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
+ "full" == SPIDER_NET_DESCR_FRAME_END == 0x4
+ "not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
The RX RAM full bug/feature
@@ -137,19 +139,19 @@ while the hardware is waiting for a different set of descrs to become
empty.
A call to show_rx_chain() at this point indicates the nature of the
-problem. A typical print when the network is hung shows the following:
-
-net eth1: Spider RX RAM full, incoming packets might be discarded!
-net eth1: Total number of descrs=256
-net eth1: Chain tail located at descr=255
-net eth1: Chain head is at 255
-net eth1: HW curr desc (GDACTDPA) is at 0
-net eth1: Have 1 descrs with stat=xa0800000
-net eth1: HW next desc (GDACNEXTDA) is at 1
-net eth1: Have 127 descrs with stat=x40800101
-net eth1: Have 1 descrs with stat=x40800001
-net eth1: Have 126 descrs with stat=x40800101
-net eth1: Last 1 descrs with stat=xa0800000
+problem. A typical print when the network is hung shows the following::
+
+ net eth1: Spider RX RAM full, incoming packets might be discarded!
+ net eth1: Total number of descrs=256
+ net eth1: Chain tail located at descr=255
+ net eth1: Chain head is at 255
+ net eth1: HW curr desc (GDACTDPA) is at 0
+ net eth1: Have 1 descrs with stat=xa0800000
+ net eth1: HW next desc (GDACNEXTDA) is at 1
+ net eth1: Have 127 descrs with stat=x40800101
+ net eth1: Have 1 descrs with stat=x40800001
+ net eth1: Have 126 descrs with stat=x40800101
+ net eth1: Last 1 descrs with stat=xa0800000
Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
@@ -198,7 +200,3 @@ For large packets, this mechanism generates a relatively small number
of interrupts, about 1K/sec. For smaller packets, this will drop to zero
interrupts, as the hardware can empty the queue faster than the kernel
can fill it.
-
-
- ======= END OF DOCUMENT ========
-
diff --git a/Documentation/networking/devlink-params-sja1105.txt b/Documentation/networking/devlink-params-sja1105.txt
new file mode 100644
index 000000000000..1d71742e270a
--- /dev/null
+++ b/Documentation/networking/devlink-params-sja1105.txt
@@ -0,0 +1,27 @@
+best_effort_vlan_filtering
+ [DEVICE, DRIVER-SPECIFIC]
+ Allow plain ETH_P_8021Q headers to be used as DSA tags.
+ Benefits:
+ - Can terminate untagged traffic over switch net
+ devices even when enslaved to a bridge with
+ vlan_filtering=1.
+ - Can terminate VLAN-tagged traffic over switch net
+ devices even when enslaved to a bridge with
+ vlan_filtering=1, with some constraints (no more than
+ 7 non-pvid VLANs per user port).
+ - Can do QoS based on VLAN PCP and VLAN membership
+ admission control for autonomously forwarded frames
+ (regardless of whether they can be terminated on the
+ CPU or not).
+ Drawbacks:
+ - User cannot use VLANs in range 1024-3071. If the
+ switch receives frames with such VIDs, it will
+ misinterpret them as DSA tags.
+ - Switch uses Shared VLAN Learning (FDB lookup uses
+ only DMAC as key).
+ - When VLANs span cross-chip topologies, the total
+ number of permitted VLANs may be less than 7 per
+ port, due to a maximum number of 32 VLAN retagging
+ rules per switch.
+ Configuration mode: runtime
+ Type: bool.
diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
index 04e04d1ff627..3654c3e9658f 100644
--- a/Documentation/networking/devlink/devlink-region.rst
+++ b/Documentation/networking/devlink/devlink-region.rst
@@ -14,6 +14,10 @@ Region snapshots are collected by the driver, and can be accessed via read
or dump commands. This allows future analysis on the created snapshots.
Regions may optionally support triggering snapshots on demand.
+Snapshot identifiers are scoped to the devlink instance, not a region.
+All snapshots with the same snapshot id within a devlink instance
+correspond to the same event.
+
The major benefit to creating a region is to provide access to internal
address regions that are otherwise inaccessible to the user.
@@ -23,7 +27,9 @@ states, but see also :doc:`devlink-health`
Regions may optionally support capturing a snapshot on demand via the
``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow
requested snapshots must implement the ``.snapshot`` callback for the region
-in its ``devlink_region_ops`` structure.
+in its ``devlink_region_ops`` structure. If snapshot id is not set in
+the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send
+the snapshot information to user space.
example usage
-------------
@@ -45,7 +51,8 @@ example usage
$ devlink region del pci/0000:00:05.0/cr-space snapshot 1
# Request an immediate snapshot, if supported by the region
- $ devlink region new pci/0000:00:05.0/cr-space snapshot 5
+ $ devlink region new pci/0000:00:05.0/cr-space
+ pci/0000:00:05.0/cr-space: snapshot 5
# Dump a snapshot:
$ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst
index fe089acb7783..1e3f3ffee248 100644
--- a/Documentation/networking/devlink/devlink-trap.rst
+++ b/Documentation/networking/devlink/devlink-trap.rst
@@ -55,7 +55,7 @@ The following diagram provides a general overview of ``devlink-trap``::
| |
+-------^--------+
|
- |
+ | Non-control traps
|
+----+----+
| | Kernel's Rx path
@@ -97,6 +97,12 @@ The ``devlink-trap`` mechanism supports the following packet trap types:
processed by ``devlink`` and injected to the kernel's Rx path. Changing the
action of such traps is not allowed, as it can easily break the control
plane.
+ * ``control``: Trapped packets were trapped by the device because these are
+ control packets required for the correct functioning of the control plane.
+ For example, ARP request and IGMP query packets. Packets are injected to
+ the kernel's Rx path, but not reported to the kernel's drop monitor.
+ Changing the action of such traps is not allowed, as it can easily break
+ the control plane.
.. _Trap-Actions:
@@ -108,6 +114,8 @@ The ``devlink-trap`` mechanism supports the following packet trap actions:
* ``trap``: The sole copy of the packet is sent to the CPU.
* ``drop``: The packet is dropped by the underlying device and a copy is not
sent to the CPU.
+ * ``mirror``: The packet is forwarded by the underlying device and a copy is
+ sent to the CPU.
Generic Packet Traps
====================
@@ -244,6 +252,159 @@ be added to the following table:
* - ``egress_flow_action_drop``
- ``drop``
- Traps packets dropped during processing of egress flow action drop
+ * - ``stp``
+ - ``control``
+ - Traps STP packets
+ * - ``lacp``
+ - ``control``
+ - Traps LACP packets
+ * - ``lldp``
+ - ``control``
+ - Traps LLDP packets
+ * - ``igmp_query``
+ - ``control``
+ - Traps IGMP Membership Query packets
+ * - ``igmp_v1_report``
+ - ``control``
+ - Traps IGMP Version 1 Membership Report packets
+ * - ``igmp_v2_report``
+ - ``control``
+ - Traps IGMP Version 2 Membership Report packets
+ * - ``igmp_v3_report``
+ - ``control``
+ - Traps IGMP Version 3 Membership Report packets
+ * - ``igmp_v2_leave``
+ - ``control``
+ - Traps IGMP Version 2 Leave Group packets
+ * - ``mld_query``
+ - ``control``
+ - Traps MLD Multicast Listener Query packets
+ * - ``mld_v1_report``
+ - ``control``
+ - Traps MLD Version 1 Multicast Listener Report packets
+ * - ``mld_v2_report``
+ - ``control``
+ - Traps MLD Version 2 Multicast Listener Report packets
+ * - ``mld_v1_done``
+ - ``control``
+ - Traps MLD Version 1 Multicast Listener Done packets
+ * - ``ipv4_dhcp``
+ - ``control``
+ - Traps IPv4 DHCP packets
+ * - ``ipv6_dhcp``
+ - ``control``
+ - Traps IPv6 DHCP packets
+ * - ``arp_request``
+ - ``control``
+ - Traps ARP request packets
+ * - ``arp_response``
+ - ``control``
+ - Traps ARP response packets
+ * - ``arp_overlay``
+ - ``control``
+ - Traps NVE-decapsulated ARP packets that reached the overlay network.
+ This is required, for example, when the address that needs to be
+ resolved is a local address
+ * - ``ipv6_neigh_solicit``
+ - ``control``
+ - Traps IPv6 Neighbour Solicitation packets
+ * - ``ipv6_neigh_advert``
+ - ``control``
+ - Traps IPv6 Neighbour Advertisement packets
+ * - ``ipv4_bfd``
+ - ``control``
+ - Traps IPv4 BFD packets
+ * - ``ipv6_bfd``
+ - ``control``
+ - Traps IPv6 BFD packets
+ * - ``ipv4_ospf``
+ - ``control``
+ - Traps IPv4 OSPF packets
+ * - ``ipv6_ospf``
+ - ``control``
+ - Traps IPv6 OSPF packets
+ * - ``ipv4_bgp``
+ - ``control``
+ - Traps IPv4 BGP packets
+ * - ``ipv6_bgp``
+ - ``control``
+ - Traps IPv6 BGP packets
+ * - ``ipv4_vrrp``
+ - ``control``
+ - Traps IPv4 VRRP packets
+ * - ``ipv6_vrrp``
+ - ``control``
+ - Traps IPv6 VRRP packets
+ * - ``ipv4_pim``
+ - ``control``
+ - Traps IPv4 PIM packets
+ * - ``ipv6_pim``
+ - ``control``
+ - Traps IPv6 PIM packets
+ * - ``uc_loopback``
+ - ``control``
+ - Traps unicast packets that need to be routed through the same layer 3
+ interface from which they were received. Such packets are routed by the
+ kernel, but also cause it to potentially generate ICMP redirect packets
+ * - ``local_route``
+ - ``control``
+ - Traps unicast packets that hit a local route and need to be locally
+ delivered
+ * - ``external_route``
+ - ``control``
+ - Traps packets that should be routed through an external interface (e.g.,
+ management interface) that does not belong to the same device (e.g.,
+ switch ASIC) as the ingress interface
+ * - ``ipv6_uc_dip_link_local_scope``
+ - ``control``
+ - Traps unicast IPv6 packets that need to be routed and have a destination
+ IP address with a link-local scope (i.e., fe80::/10). The trap allows
+ device drivers to avoid programming link-local routes, but still receive
+ packets for local delivery
+ * - ``ipv6_dip_all_nodes``
+ - ``control``
+ - Traps IPv6 packets that their destination IP address is the "All Nodes
+ Address" (i.e., ff02::1)
+ * - ``ipv6_dip_all_routers``
+ - ``control``
+ - Traps IPv6 packets that their destination IP address is the "All Routers
+ Address" (i.e., ff02::2)
+ * - ``ipv6_router_solicit``
+ - ``control``
+ - Traps IPv6 Router Solicitation packets
+ * - ``ipv6_router_advert``
+ - ``control``
+ - Traps IPv6 Router Advertisement packets
+ * - ``ipv6_redirect``
+ - ``control``
+ - Traps IPv6 Redirect Message packets
+ * - ``ipv4_router_alert``
+ - ``control``
+ - Traps IPv4 packets that need to be routed and include the Router Alert
+ option. Such packets need to be locally delivered to raw sockets that
+ have the IP_ROUTER_ALERT socket option set
+ * - ``ipv6_router_alert``
+ - ``control``
+ - Traps IPv6 packets that need to be routed and include the Router Alert
+ option in their Hop-by-Hop extension header. Such packets need to be
+ locally delivered to raw sockets that have the IPV6_ROUTER_ALERT socket
+ option set
+ * - ``ptp_event``
+ - ``control``
+ - Traps PTP time-critical event messages (Sync, Delay_req, Pdelay_Req and
+ Pdelay_Resp)
+ * - ``ptp_general``
+ - ``control``
+ - Traps PTP general messages (Announce, Follow_Up, Delay_Resp,
+ Pdelay_Resp_Follow_Up, management and signaling)
+ * - ``flow_action_sample``
+ - ``control``
+ - Traps packets sampled during processing of flow action sample (e.g., via
+ tc's sample action)
+ * - ``flow_action_trap``
+ - ``control``
+ - Traps packets logged during processing of flow action trap (e.g., via
+ tc's trap action)
Driver-specific Packet Traps
============================
@@ -277,8 +438,11 @@ narrow. The description of these groups must be added to the following table:
- Contains packet traps for packets that were dropped by the device during
layer 2 forwarding (i.e., bridge)
* - ``l3_drops``
- - Contains packet traps for packets that were dropped by the device or hit
- an exception (e.g., TTL error) during layer 3 forwarding
+ - Contains packet traps for packets that were dropped by the device during
+ layer 3 forwarding
+ * - ``l3_exceptions``
+ - Contains packet traps for packets that hit an exception (e.g., TTL
+ error) during layer 3 forwarding
* - ``buffer_drops``
- Contains packet traps for packets that were dropped by the device due to
an enqueue decision
@@ -288,6 +452,55 @@ narrow. The description of these groups must be added to the following table:
* - ``acl_drops``
- Contains packet traps for packets that were dropped by the device during
ACL processing
+ * - ``stp``
+ - Contains packet traps for STP packets
+ * - ``lacp``
+ - Contains packet traps for LACP packets
+ * - ``lldp``
+ - Contains packet traps for LLDP packets
+ * - ``mc_snooping``
+ - Contains packet traps for IGMP and MLD packets required for multicast
+ snooping
+ * - ``dhcp``
+ - Contains packet traps for DHCP packets
+ * - ``neigh_discovery``
+ - Contains packet traps for neighbour discovery packets (e.g., ARP, IPv6
+ ND)
+ * - ``bfd``
+ - Contains packet traps for BFD packets
+ * - ``ospf``
+ - Contains packet traps for OSPF packets
+ * - ``bgp``
+ - Contains packet traps for BGP packets
+ * - ``vrrp``
+ - Contains packet traps for VRRP packets
+ * - ``pim``
+ - Contains packet traps for PIM packets
+ * - ``uc_loopback``
+ - Contains a packet trap for unicast loopback packets (i.e.,
+ ``uc_loopback``). This trap is singled-out because in cases such as
+ one-armed router it will be constantly triggered. To limit the impact on
+ the CPU usage, a packet trap policer with a low rate can be bound to the
+ group without affecting other traps
+ * - ``local_delivery``
+ - Contains packet traps for packets that should be locally delivered after
+ routing, but do not match more specific packet traps (e.g.,
+ ``ipv4_bgp``)
+ * - ``ipv6``
+ - Contains packet traps for various IPv6 control packets (e.g., Router
+ Advertisements)
+ * - ``ptp_event``
+ - Contains packet traps for PTP time-critical event messages (Sync,
+ Delay_req, Pdelay_Req and Pdelay_Resp)
+ * - ``ptp_general``
+ - Contains packet traps for PTP general messages (Announce, Follow_Up,
+ Delay_Resp, Pdelay_Resp_Follow_Up, management and signaling)
+ * - ``acl_sample``
+ - Contains packet traps for packets that were sampled by the device during
+ ACL processing
+ * - ``acl_trap``
+ - Contains packet traps for packets that were trapped (logged) by the
+ device during ACL processing
Packet Trap Policers
====================
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 5b58fc4e1268..72ea8d295724 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -61,14 +61,25 @@ The ``ice`` driver reports the following versions
- running
- ICE OS Default Package
- The name of the DDP package that is active in the device. The DDP
- package is loaded by the driver during initialization. Each varation
- of DDP package shall have a unique name.
+ package is loaded by the driver during initialization. Each
+ variation of the DDP package has a unique name.
* - ``fw.app``
- running
- 1.3.1.0
- The version of the DDP package that is active in the device. Note
that both the name (as reported by ``fw.app.name``) and version are
required to uniquely identify the package.
+ * - ``fw.netlist``
+ - running
+ - 1.1.2000-6.7.0
+ - The version of the netlist module. This module defines the device's
+ Ethernet capabilities and default settings, and is used by the
+ management firmware as part of managing link and device
+ connectivity.
+ * - ``fw.netlist.build``
+ - running
+ - 0xee16ced7
+ - The first 4 bytes of the hash of the netlist module contents.
Regions
=======
diff --git a/Documentation/networking/dns_resolver.txt b/Documentation/networking/dns_resolver.rst
index eaa8f9a6fd5d..add4d59a99a5 100644
--- a/Documentation/networking/dns_resolver.txt
+++ b/Documentation/networking/dns_resolver.rst
@@ -1,8 +1,10 @@
- ===================
- DNS Resolver Module
- ===================
+.. SPDX-License-Identifier: GPL-2.0
-Contents:
+===================
+DNS Resolver Module
+===================
+
+.. Contents:
- Overview.
- Compilation.
@@ -12,8 +14,7 @@ Contents:
- Debugging.
-========
-OVERVIEW
+Overview
========
The DNS resolver module provides a way for kernel services to make DNS queries
@@ -33,50 +34,50 @@ It does not yet support the following AFS features:
This code is extracted from the CIFS filesystem.
-===========
-COMPILATION
+Compilation
===========
-The module should be enabled by turning on the kernel configuration options:
+The module should be enabled by turning on the kernel configuration options::
CONFIG_DNS_RESOLVER - tristate "DNS Resolver support"
-==========
-SETTING UP
+Setting up
==========
To set up this facility, the /etc/request-key.conf file must be altered so that
/sbin/request-key can appropriately direct the upcalls. For example, to handle
basic dname to IPv4/IPv6 address resolution, the following line should be
-added:
+added::
+
#OP TYPE DESC CO-INFO PROGRAM ARG1 ARG2 ARG3 ...
#====== ============ ======= ======= ==========================
create dns_resolver * * /usr/sbin/cifs.upcall %k
To direct a query for query type 'foo', a line of the following should be added
-before the more general line given above as the first match is the one taken.
+before the more general line given above as the first match is the one taken::
create dns_resolver foo:* * /usr/sbin/dns.foo %k
-=====
-USAGE
+Usage
=====
To make use of this facility, one of the following functions that are
-implemented in the module can be called after doing:
+implemented in the module can be called after doing::
#include <linux/dns_resolver.h>
- (1) int dns_query(const char *type, const char *name, size_t namelen,
- const char *options, char **_result, time_t *_expiry);
+ ::
+
+ int dns_query(const char *type, const char *name, size_t namelen,
+ const char *options, char **_result, time_t *_expiry);
This is the basic access function. It looks for a cached DNS query and if
it doesn't find it, it upcalls to userspace to make a new DNS query, which
may then be cached. The key description is constructed as a string of the
- form:
+ form::
[<type>:]<name>
@@ -107,16 +108,14 @@ This can be cleared by any process that has the CAP_SYS_ADMIN capability by
the use of KEYCTL_KEYRING_CLEAR on the keyring ID.
-===============================
-READING DNS KEYS FROM USERSPACE
+Reading DNS Keys from Userspace
===============================
Keys of dns_resolver type can be read from userspace using keyctl_read() or
"keyctl read/print/pipe".
-=========
-MECHANISM
+Mechanism
=========
The dnsresolver module registers a key type called "dns_resolver". Keys of
@@ -147,11 +146,10 @@ See <file:Documentation/security/keys/request-key.rst> for further
information about request-key function.
-=========
-DEBUGGING
+Debugging
=========
Debugging messages can be turned on dynamically by writing a 1 into the
-following file:
+following file::
- /sys/module/dnsresolver/parameters/debug
+ /sys/module/dnsresolver/parameters/debug
diff --git a/Documentation/networking/driver.txt b/Documentation/networking/driver.rst
index da59e2884130..c8f59dbda46f 100644
--- a/Documentation/networking/driver.txt
+++ b/Documentation/networking/driver.rst
@@ -1,4 +1,8 @@
-Document about softnet driver issues
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Softnet Driver Issues
+=====================
Transmit path guidelines:
@@ -8,7 +12,7 @@ Transmit path guidelines:
transmit function will become busy.
Instead it must maintain the queue properly. For example,
- for a driver implementing scatter-gather this means:
+ for a driver implementing scatter-gather this means::
static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev)
@@ -38,25 +42,25 @@ Transmit path guidelines:
return NETDEV_TX_OK;
}
- And then at the end of your TX reclamation event handling:
+ And then at the end of your TX reclamation event handling::
if (netif_queue_stopped(dp->dev) &&
- TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
+ TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
netif_wake_queue(dp->dev);
- For a non-scatter-gather supporting card, the three tests simply become:
+ For a non-scatter-gather supporting card, the three tests simply become::
/* This is a hard error log it. */
if (TX_BUFFS_AVAIL(dp) <= 0)
- and:
+ and::
if (TX_BUFFS_AVAIL(dp) == 0)
- and:
+ and::
if (netif_queue_stopped(dp->dev) &&
- TX_BUFFS_AVAIL(dp) > 0)
+ TX_BUFFS_AVAIL(dp) > 0)
netif_wake_queue(dp->dev);
2) An ndo_start_xmit method must not modify the shared parts of a
@@ -86,7 +90,7 @@ Close/stop guidelines:
1) After the ndo_stop routine has been called, the hardware must
not receive or transmit any data. All in flight packets must
- be aborted. If necessary, poll or wait for completion of
+ be aborted. If necessary, poll or wait for completion of
any reset commands.
2) The ndo_stop routine will be called by unregister_netdevice
diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst
index 64553d8d91cb..b6bbc17814fb 100644
--- a/Documentation/networking/dsa/sja1105.rst
+++ b/Documentation/networking/dsa/sja1105.rst
@@ -66,34 +66,193 @@ reprogrammed with the updated static configuration.
Traffic support
===============
-The switches do not support switch tagging in hardware. But they do support
-customizing the TPID by which VLAN traffic is identified as such. The switch
-driver is leveraging ``CONFIG_NET_DSA_TAG_8021Q`` by requesting that special
-VLANs (with a custom TPID of ``ETH_P_EDSA`` instead of ``ETH_P_8021Q``) are
-installed on its ports when not in ``vlan_filtering`` mode. This does not
-interfere with the reception and transmission of real 802.1Q-tagged traffic,
-because the switch does no longer parse those packets as VLAN after the TPID
-change.
-The TPID is restored when ``vlan_filtering`` is requested by the user through
-the bridge layer, and general IP termination becomes no longer possible through
-the switch netdevices in this mode.
-
-The switches have two programmable filters for link-local destination MACs.
+The switches do not have hardware support for DSA tags, except for "slow
+protocols" for switch control as STP and PTP. For these, the switches have two
+programmable filters for link-local destination MACs.
These are used to trap BPDUs and PTP traffic to the master netdevice, and are
further used to support STP and 1588 ordinary clock/boundary clock
-functionality.
-
-The following traffic modes are supported over the switch netdevices:
-
-+--------------------+------------+------------------+------------------+
-| | Standalone | Bridged with | Bridged with |
-| | ports | vlan_filtering 0 | vlan_filtering 1 |
-+====================+============+==================+==================+
-| Regular traffic | Yes | Yes | No (use master) |
-+--------------------+------------+------------------+------------------+
-| Management traffic | Yes | Yes | Yes |
-| (BPDU, PTP) | | | |
-+--------------------+------------+------------------+------------------+
+functionality. For frames trapped to the CPU, source port and switch ID
+information is encoded by the hardware into the frames.
+
+But by leveraging ``CONFIG_NET_DSA_TAG_8021Q`` (a software-defined DSA tagging
+format based on VLANs), general-purpose traffic termination through the network
+stack can be supported under certain circumstances.
+
+Depending on VLAN awareness state, the following operating modes are possible
+with the switch:
+
+- Mode 1 (VLAN-unaware): a port is in this mode when it is used as a standalone
+ net device, or when it is enslaved to a bridge with ``vlan_filtering=0``.
+- Mode 2 (fully VLAN-aware): a port is in this mode when it is enslaved to a
+ bridge with ``vlan_filtering=1``. Access to the entire VLAN range is given to
+ the user through ``bridge vlan`` commands, but general-purpose (anything
+ other than STP, PTP etc) traffic termination is not possible through the
+ switch net devices. The other packets can be still by user space processed
+ through the DSA master interface (similar to ``DSA_TAG_PROTO_NONE``).
+- Mode 3 (best-effort VLAN-aware): a port is in this mode when enslaved to a
+ bridge with ``vlan_filtering=1``, and the devlink property of its parent
+ switch named ``best_effort_vlan_filtering`` is set to ``true``. When
+ configured like this, the range of usable VIDs is reduced (0 to 1023 and 3072
+ to 4094), so is the number of usable VIDs (maximum of 7 non-pvid VLANs per
+ port*), and shared VLAN learning is performed (FDB lookup is done only by
+ DMAC, not also by VID).
+
+To summarize, in each mode, the following types of traffic are supported over
+the switch net devices:
+
++-------------+-----------+--------------+------------+
+| | Mode 1 | Mode 2 | Mode 3 |
++=============+===========+==============+============+
+| Regular | Yes | No | Yes |
+| traffic | | (use master) | |
++-------------+-----------+--------------+------------+
+| Management | Yes | Yes | Yes |
+| traffic | | | |
+| (BPDU, PTP) | | | |
++-------------+-----------+--------------+------------+
+
+To configure the switch to operate in Mode 3, the following steps can be
+followed::
+
+ ip link add dev br0 type bridge
+ # swp2 operates in Mode 1 now
+ ip link set dev swp2 master br0
+ # swp2 temporarily moves to Mode 2
+ ip link set dev br0 type bridge vlan_filtering 1
+ [ 61.204770] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
+ [ 61.239944] sja1105 spi0.1: Disabled switch tagging
+ # swp3 now operates in Mode 3
+ devlink dev param set spi/spi0.1 name best_effort_vlan_filtering value true cmode runtime
+ [ 64.682927] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
+ [ 64.711925] sja1105 spi0.1: Enabled switch tagging
+ # Cannot use VLANs in range 1024-3071 while in Mode 3.
+ bridge vlan add dev swp2 vid 1025 untagged pvid
+ RTNETLINK answers: Operation not permitted
+ bridge vlan add dev swp2 vid 100
+ bridge vlan add dev swp2 vid 101 untagged
+ bridge vlan
+ port vlan ids
+ swp5 1 PVID Egress Untagged
+
+ swp2 1 PVID Egress Untagged
+ 100
+ 101 Egress Untagged
+
+ swp3 1 PVID Egress Untagged
+
+ swp4 1 PVID Egress Untagged
+
+ br0 1 PVID Egress Untagged
+ bridge vlan add dev swp2 vid 102
+ bridge vlan add dev swp2 vid 103
+ bridge vlan add dev swp2 vid 104
+ bridge vlan add dev swp2 vid 105
+ bridge vlan add dev swp2 vid 106
+ bridge vlan add dev swp2 vid 107
+ # Cannot use mode than 7 VLANs per port while in Mode 3.
+ [ 3885.216832] sja1105 spi0.1: No more free subvlans
+
+\* "maximum of 7 non-pvid VLANs per port": Decoding VLAN-tagged packets on the
+CPU in mode 3 is possible through VLAN retagging of packets that go from the
+switch to the CPU. In cross-chip topologies, the port that goes to the CPU
+might also go to other switches. In that case, those other switches will see
+only a retagged packet (which only has meaning for the CPU). So if they are
+interested in this VLAN, they need to apply retagging in the reverse direction,
+to recover the original value from it. This consumes extra hardware resources
+for this switch. There is a maximum of 32 entries in the Retagging Table of
+each switch device.
+
+As an example, consider this cross-chip topology::
+
+ +-------------------------------------------------+
+ | Host SoC |
+ | +-------------------------+ |
+ | | DSA master for embedded | |
+ | | switch (non-sja1105) | |
+ | +--------+-------------------------+--------+ |
+ | | embedded L2 switch | |
+ | | | |
+ | | +--------------+ +--------------+ | |
+ | | |DSA master for| |DSA master for| | |
+ | | | SJA1105 1 | | SJA1105 2 | | |
+ +--+---+--------------+-----+--------------+---+--+
+
+ +-----------------------+ +-----------------------+
+ | SJA1105 switch 1 | | SJA1105 switch 2 |
+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
+ |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3|
+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
+
+To reach the CPU, SJA1105 switch 1 (spi/spi2.1) uses the same port as is uses
+to reach SJA1105 switch 2 (spi/spi2.2), which would be port 4 (not drawn).
+Similarly for SJA1105 switch 2.
+
+Also consider the following commands, that add VLAN 100 to every sja1105 user
+port::
+
+ devlink dev param set spi/spi2.1 name best_effort_vlan_filtering value true cmode runtime
+ devlink dev param set spi/spi2.2 name best_effort_vlan_filtering value true cmode runtime
+ ip link add dev br0 type bridge
+ for port in sw1p0 sw1p1 sw1p2 sw1p3 \
+ sw2p0 sw2p1 sw2p2 sw2p3; do
+ ip link set dev $port master br0
+ done
+ ip link set dev br0 type bridge vlan_filtering 1
+ for port in sw1p0 sw1p1 sw1p2 sw1p3 \
+ sw2p0 sw2p1 sw2p2; do
+ bridge vlan add dev $port vid 100
+ done
+ ip link add link br0 name br0.100 type vlan id 100 && ip link set dev br0.100 up
+ ip addr add 192.168.100.3/24 dev br0.100
+ bridge vlan add dev br0 vid 100 self
+
+ bridge vlan
+ port vlan ids
+ sw1p0 1 PVID Egress Untagged
+ 100
+
+ sw1p1 1 PVID Egress Untagged
+ 100
+
+ sw1p2 1 PVID Egress Untagged
+ 100
+
+ sw1p3 1 PVID Egress Untagged
+ 100
+
+ sw2p0 1 PVID Egress Untagged
+ 100
+
+ sw2p1 1 PVID Egress Untagged
+ 100
+
+ sw2p2 1 PVID Egress Untagged
+ 100
+
+ sw2p3 1 PVID Egress Untagged
+
+ br0 1 PVID Egress Untagged
+ 100
+
+SJA1105 switch 1 consumes 1 retagging entry for each VLAN on each user port
+towards the CPU. It also consumes 1 retagging entry for each non-pvid VLAN that
+it is also interested in, which is configured on any port of any neighbor
+switch.
+
+In this case, SJA1105 switch 1 consumes a total of 11 retagging entries, as
+follows:
+- 8 retagging entries for VLANs 1 and 100 installed on its user ports
+ (``sw1p0`` - ``sw1p3``)
+- 3 retagging entries for VLAN 100 installed on the user ports of SJA1105
+ switch 2 (``sw2p0`` - ``sw2p2``), because it also has ports that are
+ interested in it. The VLAN 1 is a pvid on SJA1105 switch 2 and does not need
+ reverse retagging.
+
+SJA1105 switch 2 also consumes 11 retagging entries, but organized as follows:
+- 7 retagging entries for the bridge VLANs on its user ports (``sw2p0`` -
+ ``sw2p3``).
+- 4 retagging entries for VLAN 100 installed on the user ports of SJA1105
+ switch 1 (``sw1p0`` - ``sw1p3``).
Switching features
==================
@@ -230,6 +389,122 @@ simultaneously on two ports. The driver checks the consistency of the schedules
against this restriction and errors out when appropriate. Schedule analysis is
needed to avoid this, which is outside the scope of the document.
+Routing actions (redirect, trap, drop)
+--------------------------------------
+
+The switch is able to offload flow-based redirection of packets to a set of
+destination ports specified by the user. Internally, this is implemented by
+making use of Virtual Links, a TTEthernet concept.
+
+The driver supports 2 types of keys for Virtual Links:
+
+- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and
+ VLAN PCP.
+- VLAN-unaware virtual links: these match on destination MAC address only.
+
+The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while
+there are virtual link rules installed.
+
+Composing multiple actions inside the same rule is supported. When only routing
+actions are requested, the driver creates a "non-critical" virtual link. When
+the action list also contains tc-gate (more details below), the virtual link
+becomes "time-critical" (draws frame buffers from a reserved memory partition,
+etc).
+
+The 3 routing actions that are supported are "trap", "drop" and "redirect".
+
+Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
+CPU and to swp3. This type of key (DA only) when the port's VLAN awareness
+state is off::
+
+ tc qdisc add dev swp2 clsact
+ tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
+ action mirred egress redirect dev swp3 \
+ action trap
+
+Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
+of 100 and a PCP of 0::
+
+ tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
+ dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop
+
+Time-based ingress policing
+---------------------------
+
+The TTEthernet hardware abilities of the switch can be constrained to act
+similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in
+IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform
+tight timing-based admission control for up to 1024 flows (identified by a
+tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which
+are received outside their expected reception window are dropped.
+
+This capability can be managed through the offload of the tc-gate action. As
+routing actions are intrinsic to virtual links in TTEthernet (which performs
+explicit routing of time-critical traffic and does not leave that in the hands
+of the FDB, flooding etc), the tc-gate action may never appear alone when
+asking sja1105 to offload it. One (or more) redirect or trap actions must also
+follow along.
+
+Example: create a tc-taprio schedule that is phase-aligned with a tc-gate
+schedule (the clocks must be synchronized by a 1588 application stack, which is
+outside the scope of this document). No packet delivered by the sender will be
+dropped. Note that the reception window is larger than the transmission window
+(and much more so, in this example) to compensate for the packet propagation
+delay of the link (which can be determined by the 1588 application stack).
+
+Receiver (sja1105)::
+
+ tc qdisc add dev swp2 clsact
+ now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
+ sec=$(echo $now | awk -F. '{print $1}') && \
+ base_time="$(((sec + 2) * 1000000000))" && \
+ echo "base time ${base_time}"
+ tc filter add dev swp2 ingress flower skip_sw \
+ dst_mac 42:be:24:9b:76:20 \
+ action gate base-time ${base_time} \
+ sched-entry OPEN 60000 -1 -1 \
+ sched-entry CLOSE 40000 -1 -1 \
+ action trap
+
+Sender::
+
+ now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
+ sec=$(echo $now | awk -F. '{print $1}') && \
+ base_time="$(((sec + 2) * 1000000000))" && \
+ echo "base time ${base_time}"
+ tc qdisc add dev eno0 parent root taprio \
+ num_tc 8 \
+ map 0 1 2 3 4 5 6 7 \
+ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
+ base-time ${base_time} \
+ sched-entry S 01 50000 \
+ sched-entry S 00 50000 \
+ flags 2
+
+The engine used to schedule the ingress gate operations is the same that the
+one used for the tc-taprio offload. Therefore, the restrictions regarding the
+fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at
+the same time (during the same 200 ns slot) still apply.
+
+To come in handy, it is possible to share time-triggered virtual links across
+more than 1 ingress port, via flow blocks. In this case, the restriction of
+firing at the same time does not apply because there is a single schedule in
+the system, that of the shared virtual link::
+
+ tc qdisc add dev swp2 ingress_block 1 clsact
+ tc qdisc add dev swp3 ingress_block 1 clsact
+ tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
+ action gate index 2 \
+ base-time 0 \
+ sched-entry OPEN 50000000 -1 -1 \
+ sched-entry CLOSE 50000000 -1 -1 \
+ action trap
+
+Hardware statistics for each flow are also available ("pkts" counts the number
+of dropped frames, which is a sum of frames dropped due to timing violations,
+lack of destination ports and MTU enforcement checks). Byte-level counters are
+not available.
+
Device Tree bindings and board design
=====================================
diff --git a/Documentation/networking/eql.txt b/Documentation/networking/eql.rst
index 0f1550150f05..a628c4c81166 100644
--- a/Documentation/networking/eql.txt
+++ b/Documentation/networking/eql.rst
@@ -1,5 +1,11 @@
- EQL Driver: Serial IP Load Balancing HOWTO
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
+EQL Driver: Serial IP Load Balancing HOWTO
+==========================================
+
Simon "Guru Aleph-Null" Janes, simon@ncm.com
+
v1.1, February 27, 1995
This is the manual for the EQL device driver. EQL is a software device
@@ -12,7 +18,8 @@
which was only created to patch cleanly in the very latest kernel
source trees. (Yes, it worked fine.)
- 1. Introduction
+1. Introduction
+===============
Which is worse? A huge fee for a 56K leased line or two phone lines?
It's probably the former. If you find yourself craving more bandwidth,
@@ -41,47 +48,40 @@
Hey, we can all dream you know...
- 2. Kernel Configuration
+2. Kernel Configuration
+=======================
Here I describe the general steps of getting a kernel up and working
with the eql driver. From patching, building, to installing.
- 2.1. Patching The Kernel
+2.1. Patching The Kernel
+------------------------
If you do not have or cannot get a copy of the kernel with the eql
driver folded into it, get your copy of the driver from
ftp://slaughter.ncm.com/pub/Linux/LOAD_BALANCING/eql-1.1.tar.gz.
Unpack this archive someplace obvious like /usr/local/src/. It will
- create the following files:
-
-
+ create the following files::
- ______________________________________________________________________
-rw-r--r-- guru/ncm 198 Jan 19 18:53 1995 eql-1.1/NO-WARRANTY
-rw-r--r-- guru/ncm 30620 Feb 27 21:40 1995 eql-1.1/eql-1.1.patch
-rwxr-xr-x guru/ncm 16111 Jan 12 22:29 1995 eql-1.1/eql_enslave
-rw-r--r-- guru/ncm 2195 Jan 10 21:48 1995 eql-1.1/eql_enslave.c
- ______________________________________________________________________
Unpack a recent kernel (something after 1.1.92) someplace convenient
like say /usr/src/linux-1.1.92.eql. Use symbolic links to point
/usr/src/linux to this development directory.
- Apply the patch by running the commands:
+ Apply the patch by running the commands::
-
- ______________________________________________________________________
cd /usr/src
patch </usr/local/src/eql-1.1/eql-1.1.patch
- ______________________________________________________________________
-
-
-
- 2.2. Building The Kernel
+2.2. Building The Kernel
+------------------------
After patching the kernel, run make config and configure the kernel
for your hardware.
@@ -90,7 +90,8 @@
After configuration, make and install according to your habit.
- 3. Network Configuration
+3. Network Configuration
+========================
So far, I have only used the eql device with the DSLIP SLIP connection
manager by Matt Dillon (-- "The man who sold his soul to code so much
@@ -100,37 +101,27 @@
connection.
- 3.1. /etc/rc.d/rc.inet1
+3.1. /etc/rc.d/rc.inet1
+-----------------------
In rc.inet1, ifconfig the eql device to the IP address you usually use
for your machine, and the MTU you prefer for your SLIP lines. One
could argue that MTU should be roughly half the usual size for two
modems, one-third for three, one-fourth for four, etc... But going
too far below 296 is probably overkill. Here is an example ifconfig
- command that sets up the eql device:
-
+ command that sets up the eql device::
-
- ______________________________________________________________________
ifconfig eql 198.67.33.239 mtu 1006
- ______________________________________________________________________
-
-
-
-
Once the eql device is up and running, add a static default route to
it in the routing table using the cool new route syntax that makes
- life so much easier:
+ life so much easier::
-
-
- ______________________________________________________________________
route add default eql
- ______________________________________________________________________
- 3.2. Enslaving Devices By Hand
+3.2. Enslaving Devices By Hand
+------------------------------
Enslaving devices by hand requires two utility programs: eql_enslave
and eql_emancipate (-- eql_emancipate hasn't been written because when
@@ -140,87 +131,56 @@
The syntax for enslaving a device is "eql_enslave <master-name>
- <slave-name> <estimated-bps>". Here are some example enslavings:
-
+ <slave-name> <estimated-bps>". Here are some example enslavings::
-
- ______________________________________________________________________
eql_enslave eql sl0 28800
eql_enslave eql ppp0 14400
eql_enslave eql sl1 57600
- ______________________________________________________________________
-
-
-
-
When you want to free a device from its life of slavery, you can
either down the device with ifconfig (eql will automatically bury the
dead slave and remove it from its queue) or use eql_emancipate to free
it. (-- Or just ifconfig it down, and the eql driver will take it out
- for you.--)
-
-
+ for you.--)::
- ______________________________________________________________________
eql_emancipate eql sl0
eql_emancipate eql ppp0
eql_emancipate eql sl1
- ______________________________________________________________________
-
-
-
- 3.3. DSLIP Configuration for the eql Device
+3.3. DSLIP Configuration for the eql Device
+-------------------------------------------
The general idea is to bring up and keep up as many SLIP connections
as you need, automatically.
- 3.3.1. /etc/slip/runslip.conf
-
- Here is an example runslip.conf:
-
-
-
-
-
-
-
-
-
-
-
+3.3.1. /etc/slip/runslip.conf
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ Here is an example runslip.conf::
+ name sl-line-1
+ enabled
+ baud 38400
+ mtu 576
+ ducmd -e /etc/slip/dialout/cua2-288.xp -t 9
+ command eql_enslave eql $interface 28800
+ address 198.67.33.239
+ line /dev/cua2
+ name sl-line-2
+ enabled
+ baud 38400
+ mtu 576
+ ducmd -e /etc/slip/dialout/cua3-288.xp -t 9
+ command eql_enslave eql $interface 28800
+ address 198.67.33.239
+ line /dev/cua3
- ______________________________________________________________________
- name sl-line-1
- enabled
- baud 38400
- mtu 576
- ducmd -e /etc/slip/dialout/cua2-288.xp -t 9
- command eql_enslave eql $interface 28800
- address 198.67.33.239
- line /dev/cua2
- name sl-line-2
- enabled
- baud 38400
- mtu 576
- ducmd -e /etc/slip/dialout/cua3-288.xp -t 9
- command eql_enslave eql $interface 28800
- address 198.67.33.239
- line /dev/cua3
- ______________________________________________________________________
-
-
-
-
-
- 3.4. Using PPP and the eql Device
+3.4. Using PPP and the eql Device
+---------------------------------
I have not yet done any load-balancing testing for PPP devices, mainly
because I don't have a PPP-connection manager like SLIP has with
@@ -235,7 +195,8 @@
year.
- 4. About the Slave Scheduler Algorithm
+4. About the Slave Scheduler Algorithm
+======================================
The slave scheduler probably could be replaced with a dozen other
things and push traffic much faster. The formula in the current set
@@ -254,7 +215,8 @@
traffic and the "slower" modem starved.
- 5. Testers' Reports
+5. Testers' Reports
+===================
Some people have experimented with the eql device with newer
kernels (than 1.1.75). I have since updated the driver to patch
@@ -262,87 +224,29 @@
balancing" driver config option.
- o icee from LinuxNET patched 1.1.86 without any rejects and was able
+ - icee from LinuxNET patched 1.1.86 without any rejects and was able
to boot the kernel and enslave a couple of ISDN PPP links.
- 5.1. Randolph Bentson's Test Report
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+5.1. Randolph Bentson's Test Report
+-----------------------------------
+ ::
+ From bentson@grieg.seaslug.org Wed Feb 8 19:08:09 1995
+ Date: Tue, 7 Feb 95 22:57 PST
+ From: Randolph Bentson <bentson@grieg.seaslug.org>
+ To: guru@ncm.com
+ Subject: EQL driver tests
+ I have been checking out your eql driver. (Nice work, that!)
+ Although you may already done this performance testing, here
+ are some data I've discovered.
+ Randolph Bentson
+ bentson@grieg.seaslug.org
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- From bentson@grieg.seaslug.org Wed Feb 8 19:08:09 1995
- Date: Tue, 7 Feb 95 22:57 PST
- From: Randolph Bentson <bentson@grieg.seaslug.org>
- To: guru@ncm.com
- Subject: EQL driver tests
-
-
- I have been checking out your eql driver. (Nice work, that!)
- Although you may already done this performance testing, here
- are some data I've discovered.
-
- Randolph Bentson
- bentson@grieg.seaslug.org
-
- ---------------------------------------------------------
+------------------------------------------------------------------
A pseudo-device driver, EQL, written by Simon Janes, can be used
@@ -363,7 +267,7 @@
Once a link was established, I timed a binary ftp transfer of
289284 bytes of data. If there were no overhead (packet headers,
inter-character and inter-packet delays, etc.) the transfers
- would take the following times:
+ would take the following times::
bits/sec seconds
345600 8.3
@@ -388,141 +292,82 @@
that the connection establishment seemed fragile for the higher
speeds. Once established, the connection seemed robust enough.)
- #lines speed mtu seconds theory actual %of
- kbit/sec duration speed speed max
- 3 115200 900 _ 345600
- 3 115200 400 18.1 345600 159825 46
- 2 115200 900 _ 230400
- 2 115200 600 18.1 230400 159825 69
- 2 115200 400 19.3 230400 149888 65
- 4 57600 900 _ 234600
- 4 57600 600 _ 234600
- 4 57600 400 _ 234600
- 3 57600 600 20.9 172800 138413 80
- 3 57600 900 21.2 172800 136455 78
- 3 115200 600 21.7 345600 133311 38
- 3 57600 400 22.5 172800 128571 74
- 4 38400 900 25.2 153600 114795 74
- 4 38400 600 26.4 153600 109577 71
- 4 38400 400 27.3 153600 105965 68
- 2 57600 900 29.1 115200 99410.3 86
- 1 115200 900 30.7 115200 94229.3 81
- 2 57600 600 30.2 115200 95789.4 83
- 3 38400 900 30.3 115200 95473.3 82
- 3 38400 600 31.2 115200 92719.2 80
- 1 115200 600 31.3 115200 92423 80
- 2 57600 400 32.3 115200 89561.6 77
- 1 115200 400 32.8 115200 88196.3 76
- 3 38400 400 33.5 115200 86353.4 74
- 2 38400 900 43.7 76800 66197.7 86
- 2 38400 600 44 76800 65746.4 85
- 2 38400 400 47.2 76800 61289 79
- 4 19200 900 50.8 76800 56945.7 74
- 4 19200 400 53.2 76800 54376.7 70
- 4 19200 600 53.7 76800 53870.4 70
- 1 57600 900 54.6 57600 52982.4 91
- 1 57600 600 56.2 57600 51474 89
- 3 19200 900 60.5 57600 47815.5 83
- 1 57600 400 60.2 57600 48053.8 83
- 3 19200 600 62 57600 46658.7 81
- 3 19200 400 64.7 57600 44711.6 77
- 1 38400 900 79.4 38400 36433.8 94
- 1 38400 600 82.4 38400 35107.3 91
- 2 19200 900 84.4 38400 34275.4 89
- 1 38400 400 86.8 38400 33327.6 86
- 2 19200 600 87.6 38400 33023.3 85
- 2 19200 400 91.2 38400 31719.7 82
- 4 9600 900 94.7 38400 30547.4 79
- 4 9600 400 106 38400 27290.9 71
- 4 9600 600 110 38400 26298.5 68
- 3 9600 900 118 28800 24515.6 85
- 3 9600 600 120 28800 24107 83
- 3 9600 400 131 28800 22082.7 76
- 1 19200 900 155 19200 18663.5 97
- 1 19200 600 161 19200 17968 93
- 1 19200 400 170 19200 17016.7 88
- 2 9600 600 176 19200 16436.6 85
- 2 9600 900 180 19200 16071.3 83
- 2 9600 400 181 19200 15982.5 83
- 1 9600 900 305 9600 9484.72 98
- 1 9600 600 314 9600 9212.87 95
- 1 9600 400 332 9600 8713.37 90
-
-
-
-
-
- 5.2. Anthony Healy's Report
-
-
-
-
-
-
-
- Date: Mon, 13 Feb 1995 16:17:29 +1100 (EST)
- From: Antony Healey <ahealey@st.nepean.uws.edu.au>
- To: Simon Janes <guru@ncm.com>
- Subject: Re: Load Balancing
-
- Hi Simon,
+ ====== ======== === ======== ======= ======= ===
+ #lines speed mtu seconds theory actual %of
+ kbit/sec duration speed speed max
+ ====== ======== === ======== ======= ======= ===
+ 3 115200 900 _ 345600
+ 3 115200 400 18.1 345600 159825 46
+ 2 115200 900 _ 230400
+ 2 115200 600 18.1 230400 159825 69
+ 2 115200 400 19.3 230400 149888 65
+ 4 57600 900 _ 234600
+ 4 57600 600 _ 234600
+ 4 57600 400 _ 234600
+ 3 57600 600 20.9 172800 138413 80
+ 3 57600 900 21.2 172800 136455 78
+ 3 115200 600 21.7 345600 133311 38
+ 3 57600 400 22.5 172800 128571 74
+ 4 38400 900 25.2 153600 114795 74
+ 4 38400 600 26.4 153600 109577 71
+ 4 38400 400 27.3 153600 105965 68
+ 2 57600 900 29.1 115200 99410.3 86
+ 1 115200 900 30.7 115200 94229.3 81
+ 2 57600 600 30.2 115200 95789.4 83
+ 3 38400 900 30.3 115200 95473.3 82
+ 3 38400 600 31.2 115200 92719.2 80
+ 1 115200 600 31.3 115200 92423 80
+ 2 57600 400 32.3 115200 89561.6 77
+ 1 115200 400 32.8 115200 88196.3 76
+ 3 38400 400 33.5 115200 86353.4 74
+ 2 38400 900 43.7 76800 66197.7 86
+ 2 38400 600 44 76800 65746.4 85
+ 2 38400 400 47.2 76800 61289 79
+ 4 19200 900 50.8 76800 56945.7 74
+ 4 19200 400 53.2 76800 54376.7 70
+ 4 19200 600 53.7 76800 53870.4 70
+ 1 57600 900 54.6 57600 52982.4 91
+ 1 57600 600 56.2 57600 51474 89
+ 3 19200 900 60.5 57600 47815.5 83
+ 1 57600 400 60.2 57600 48053.8 83
+ 3 19200 600 62 57600 46658.7 81
+ 3 19200 400 64.7 57600 44711.6 77
+ 1 38400 900 79.4 38400 36433.8 94
+ 1 38400 600 82.4 38400 35107.3 91
+ 2 19200 900 84.4 38400 34275.4 89
+ 1 38400 400 86.8 38400 33327.6 86
+ 2 19200 600 87.6 38400 33023.3 85
+ 2 19200 400 91.2 38400 31719.7 82
+ 4 9600 900 94.7 38400 30547.4 79
+ 4 9600 400 106 38400 27290.9 71
+ 4 9600 600 110 38400 26298.5 68
+ 3 9600 900 118 28800 24515.6 85
+ 3 9600 600 120 28800 24107 83
+ 3 9600 400 131 28800 22082.7 76
+ 1 19200 900 155 19200 18663.5 97
+ 1 19200 600 161 19200 17968 93
+ 1 19200 400 170 19200 17016.7 88
+ 2 9600 600 176 19200 16436.6 85
+ 2 9600 900 180 19200 16071.3 83
+ 2 9600 400 181 19200 15982.5 83
+ 1 9600 900 305 9600 9484.72 98
+ 1 9600 600 314 9600 9212.87 95
+ 1 9600 400 332 9600 8713.37 90
+ ====== ======== === ======== ======= ======= ===
+
+5.2. Anthony Healy's Report
+---------------------------
+
+ ::
+
+ Date: Mon, 13 Feb 1995 16:17:29 +1100 (EST)
+ From: Antony Healey <ahealey@st.nepean.uws.edu.au>
+ To: Simon Janes <guru@ncm.com>
+ Subject: Re: Load Balancing
+
+ Hi Simon,
I've installed your patch and it works great. I have trialed
it over twin SL/IP lines, just over null modems, but I was
able to data at over 48Kb/s [ISDN link -Simon]. I managed a
transfer of up to 7.5 Kbyte/s on one go, but averaged around
6.4 Kbyte/s, which I think is pretty cool. :)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 567326491f80..d42661b91128 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -204,6 +204,8 @@ Userspace to kernel:
``ETHTOOL_MSG_EEE_GET`` get EEE settings
``ETHTOOL_MSG_EEE_SET`` set EEE settings
``ETHTOOL_MSG_TSINFO_GET`` get timestamping info
+ ``ETHTOOL_MSG_CABLE_TEST_ACT`` action start cable test
+ ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT`` action start raw TDR cable test
===================================== ================================
Kernel to userspace:
@@ -235,6 +237,8 @@ Kernel to userspace:
``ETHTOOL_MSG_EEE_GET_REPLY`` EEE settings
``ETHTOOL_MSG_EEE_NTF`` EEE settings
``ETHTOOL_MSG_TSINFO_GET_REPLY`` timestamping info
+ ``ETHTOOL_MSG_CABLE_TEST_NTF`` Cable test results
+ ``ETHTOOL_MSG_CABLE_TEST_TDR_NTF`` Cable test TDR results
===================================== =================================
``GET`` requests are sent by userspace applications to retrieve device
@@ -392,14 +396,16 @@ Request contents:
Kernel response contents:
- ==================================== ====== ==========================
- ``ETHTOOL_A_LINKMODES_HEADER`` nested reply header
- ``ETHTOOL_A_LINKMODES_AUTONEG`` u8 autonegotiation status
- ``ETHTOOL_A_LINKMODES_OURS`` bitset advertised link modes
- ``ETHTOOL_A_LINKMODES_PEER`` bitset partner link modes
- ``ETHTOOL_A_LINKMODES_SPEED`` u32 link speed (Mb/s)
- ``ETHTOOL_A_LINKMODES_DUPLEX`` u8 duplex mode
- ==================================== ====== ==========================
+ ========================================== ====== ==========================
+ ``ETHTOOL_A_LINKMODES_HEADER`` nested reply header
+ ``ETHTOOL_A_LINKMODES_AUTONEG`` u8 autonegotiation status
+ ``ETHTOOL_A_LINKMODES_OURS`` bitset advertised link modes
+ ``ETHTOOL_A_LINKMODES_PEER`` bitset partner link modes
+ ``ETHTOOL_A_LINKMODES_SPEED`` u32 link speed (Mb/s)
+ ``ETHTOOL_A_LINKMODES_DUPLEX`` u8 duplex mode
+ ``ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG`` u8 Master/slave port mode
+ ``ETHTOOL_A_LINKMODES_MASTER_SLAVE_STATE`` u8 Master/slave port state
+ ========================================== ====== ==========================
For ``ETHTOOL_A_LINKMODES_OURS``, value represents advertised modes and mask
represents supported modes. ``ETHTOOL_A_LINKMODES_PEER`` in the reply is a bit
@@ -414,14 +420,15 @@ LINKMODES_SET
Request contents:
- ==================================== ====== ==========================
- ``ETHTOOL_A_LINKMODES_HEADER`` nested request header
- ``ETHTOOL_A_LINKMODES_AUTONEG`` u8 autonegotiation status
- ``ETHTOOL_A_LINKMODES_OURS`` bitset advertised link modes
- ``ETHTOOL_A_LINKMODES_PEER`` bitset partner link modes
- ``ETHTOOL_A_LINKMODES_SPEED`` u32 link speed (Mb/s)
- ``ETHTOOL_A_LINKMODES_DUPLEX`` u8 duplex mode
- ==================================== ====== ==========================
+ ========================================== ====== ==========================
+ ``ETHTOOL_A_LINKMODES_HEADER`` nested request header
+ ``ETHTOOL_A_LINKMODES_AUTONEG`` u8 autonegotiation status
+ ``ETHTOOL_A_LINKMODES_OURS`` bitset advertised link modes
+ ``ETHTOOL_A_LINKMODES_PEER`` bitset partner link modes
+ ``ETHTOOL_A_LINKMODES_SPEED`` u32 link speed (Mb/s)
+ ``ETHTOOL_A_LINKMODES_DUPLEX`` u8 duplex mode
+ ``ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG`` u8 Master/slave port mode
+ ========================================== ====== ==========================
``ETHTOOL_A_LINKMODES_OURS`` bit set allows setting advertised link modes. If
autonegotiation is on (either set now or kept from before), advertised modes
@@ -449,10 +456,12 @@ Request contents:
Kernel response contents:
- ==================================== ====== ==========================
+ ==================================== ====== ============================
``ETHTOOL_A_LINKSTATE_HEADER`` nested reply header
``ETHTOOL_A_LINKSTATE_LINK`` bool link state (up/down)
- ==================================== ====== ==========================
+ ``ETHTOOL_A_LINKSTATE_SQI`` u32 Current Signal Quality Index
+ ``ETHTOOL_A_LINKSTATE_SQI_MAX`` u32 Max support SQI value
+ ==================================== ====== ============================
For most NIC drivers, the value of ``ETHTOOL_A_LINKSTATE_LINK`` returns
carrier flag provided by ``netif_carrier_ok()`` but there are drivers which
@@ -955,13 +964,159 @@ Kernel response contents:
is no special value for this case). The bitset attributes are omitted if they
would be empty (no bit set).
+CABLE_TEST
+==========
+
+Start a cable test.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_CABLE_TEST_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Notification contents:
+
+An Ethernet cable typically contains 1, 2 or 4 pairs. The length of
+the pair can only be measured when there is a fault in the pair and
+hence a reflection. Information about the fault may not be available,
+depending on the specific hardware. Hence the contents of the notify
+message are mostly optional. The attributes can be repeated an
+arbitrary number of times, in an arbitrary order, for an arbitrary
+number of pairs.
+
+The example shows the notification sent when the test is completed for
+a T2 cable, i.e. two pairs. One pair is OK and hence has no length
+information. The second pair has a fault and does have length
+information.
+
+ +---------------------------------------------+--------+---------------------+
+ | ``ETHTOOL_A_CABLE_TEST_HEADER`` | nested | reply header |
+ +---------------------------------------------+--------+---------------------+
+ | ``ETHTOOL_A_CABLE_TEST_STATUS`` | u8 | completed |
+ +---------------------------------------------+--------+---------------------+
+ | ``ETHTOOL_A_CABLE_TEST_NTF_NEST`` | nested | all the results |
+ +-+-------------------------------------------+--------+---------------------+
+ | | ``ETHTOOL_A_CABLE_NEST_RESULT`` | nested | cable test result |
+ +-+-+-----------------------------------------+--------+---------------------+
+ | | | ``ETHTOOL_A_CABLE_RESULTS_PAIR`` | u8 | pair number |
+ +-+-+-----------------------------------------+--------+---------------------+
+ | | | ``ETHTOOL_A_CABLE_RESULTS_CODE`` | u8 | result code |
+ +-+-+-----------------------------------------+--------+---------------------+
+ | | ``ETHTOOL_A_CABLE_NEST_RESULT`` | nested | cable test results |
+ +-+-+-----------------------------------------+--------+---------------------+
+ | | | ``ETHTOOL_A_CABLE_RESULTS_PAIR`` | u8 | pair number |
+ +-+-+-----------------------------------------+--------+---------------------+
+ | | | ``ETHTOOL_A_CABLE_RESULTS_CODE`` | u8 | result code |
+ +-+-+-----------------------------------------+--------+---------------------+
+ | | ``ETHTOOL_A_CABLE_NEST_FAULT_LENGTH`` | nested | cable length |
+ +-+-+-----------------------------------------+--------+---------------------+
+ | | | ``ETHTOOL_A_CABLE_FAULT_LENGTH_PAIR`` | u8 | pair number |
+ +-+-+-----------------------------------------+--------+---------------------+
+ | | | ``ETHTOOL_A_CABLE_FAULT_LENGTH_CM`` | u32 | length in cm |
+ +-+-+-----------------------------------------+--------+---------------------+
+
+CABLE_TEST TDR
+==============
+
+Start a cable test and report raw TDR data
+
+Request contents:
+
+ +--------------------------------------------+--------+-----------------------+
+ | ``ETHTOOL_A_CABLE_TEST_TDR_HEADER`` | nested | reply header |
+ +--------------------------------------------+--------+-----------------------+
+ | ``ETHTOOL_A_CABLE_TEST_TDR_CFG`` | nested | test configuration |
+ +-+------------------------------------------+--------+-----------------------+
+ | | ``ETHTOOL_A_CABLE_STEP_FIRST_DISTANCE `` | u32 | first data distance |
+ +-+-+----------------------------------------+--------+-----------------------+
+ | | ``ETHTOOL_A_CABLE_STEP_LAST_DISTANCE `` | u32 | last data distance |
+ +-+-+----------------------------------------+--------+-----------------------+
+ | | ``ETHTOOL_A_CABLE_STEP_STEP_DISTANCE `` | u32 | distance of each step |
+ +-+-+----------------------------------------+--------+-----------------------+
+ | | ``ETHTOOL_A_CABLE_TEST_TDR_CFG_PAIR`` | u8 | pair to test |
+ +-+-+----------------------------------------+--------+-----------------------+
+
+The ETHTOOL_A_CABLE_TEST_TDR_CFG is optional, as well as all members
+of the nest. All distances are expressed in centimeters. The PHY takes
+the distances as a guide, and rounds to the nearest distance it
+actually supports. If a pair is passed, only that one pair will be
+tested. Otherwise all pairs are tested.
+
+Notification contents:
+
+Raw TDR data is gathered by sending a pulse down the cable and
+recording the amplitude of the reflected pulse for a given distance.
+
+It can take a number of seconds to collect TDR data, especial if the
+full 100 meters is probed at 1 meter intervals. When the test is
+started a notification will be sent containing just
+ETHTOOL_A_CABLE_TEST_TDR_STATUS with the value
+ETHTOOL_A_CABLE_TEST_NTF_STATUS_STARTED.
+
+When the test has completed a second notification will be sent
+containing ETHTOOL_A_CABLE_TEST_TDR_STATUS with the value
+ETHTOOL_A_CABLE_TEST_NTF_STATUS_COMPLETED and the TDR data.
+
+The message may optionally contain the amplitude of the pulse send
+down the cable. This is measured in mV. A reflection should not be
+bigger than transmitted pulse.
+
+Before the raw TDR data should be an ETHTOOL_A_CABLE_TDR_NEST_STEP
+nest containing information about the distance along the cable for the
+first reading, the last reading, and the step between each
+reading. Distances are measured in centimeters. These should be the
+exact values the PHY used. These may be different to what the user
+requested, if the native measurement resolution is greater than 1 cm.
+
+For each step along the cable, a ETHTOOL_A_CABLE_TDR_NEST_AMPLITUDE is
+used to report the amplitude of the reflection for a given pair.
+
+ +---------------------------------------------+--------+----------------------+
+ | ``ETHTOOL_A_CABLE_TEST_TDR_HEADER`` | nested | reply header |
+ +---------------------------------------------+--------+----------------------+
+ | ``ETHTOOL_A_CABLE_TEST_TDR_STATUS`` | u8 | completed |
+ +---------------------------------------------+--------+----------------------+
+ | ``ETHTOOL_A_CABLE_TEST_TDR_NTF_NEST`` | nested | all the results |
+ +-+-------------------------------------------+--------+----------------------+
+ | | ``ETHTOOL_A_CABLE_TDR_NEST_PULSE`` | nested | TX Pulse amplitude |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_PULSE_mV`` | s16 | Pulse amplitude |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | ``ETHTOOL_A_CABLE_NEST_STEP`` | nested | TDR step info |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_STEP_FIRST_DISTANCE ``| u32 | First data distance |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_STEP_LAST_DISTANCE `` | u32 | Last data distance |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_STEP_STEP_DISTANCE `` | u32 | distance of each step|
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | ``ETHTOOL_A_CABLE_TDR_NEST_AMPLITUDE`` | nested | Reflection amplitude |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_RESULTS_PAIR`` | u8 | pair number |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_AMPLITUDE_mV`` | s16 | Reflection amplitude |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | ``ETHTOOL_A_CABLE_TDR_NEST_AMPLITUDE`` | nested | Reflection amplitude |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_RESULTS_PAIR`` | u8 | pair number |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_AMPLITUDE_mV`` | s16 | Reflection amplitude |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | ``ETHTOOL_A_CABLE_TDR_NEST_AMPLITUDE`` | nested | Reflection amplitude |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_RESULTS_PAIR`` | u8 | pair number |
+ +-+-+-----------------------------------------+--------+----------------------+
+ | | | ``ETHTOOL_A_CABLE_AMPLITUDE_mV`` | s16 | Reflection amplitude |
+ +-+-+-----------------------------------------+--------+----------------------+
Request translation
===================
The following table maps ioctl commands to netlink commands providing their
functionality. Entries with "n/a" in right column are commands which do not
-have their netlink replacement yet.
+have their netlink replacement yet. Entries which "n/a" in the left column
+are netlink only.
=================================== =====================================
ioctl command netlink command
@@ -1050,4 +1205,6 @@ have their netlink replacement yet.
``ETHTOOL_PHY_STUNABLE`` n/a
``ETHTOOL_GFECPARAM`` n/a
``ETHTOOL_SFECPARAM`` n/a
+ n/a ''ETHTOOL_MSG_CABLE_TEST_ACT''
+ n/a ''ETHTOOL_MSG_CABLE_TEST_TDR_ACT''
=================================== =====================================
diff --git a/Documentation/networking/fib_trie.txt b/Documentation/networking/fib_trie.rst
index fe719388518b..f1435b7fcdb7 100644
--- a/Documentation/networking/fib_trie.txt
+++ b/Documentation/networking/fib_trie.rst
@@ -1,8 +1,12 @@
- LC-trie implementation notes.
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+LC-trie implementation notes
+============================
Node types
----------
-leaf
+leaf
An end node with data. This has a copy of the relevant key, along
with 'hlist' with routing table entries sorted by prefix length.
See struct leaf and struct leaf_info.
@@ -13,7 +17,7 @@ trie node or tnode
A few concepts explained
------------------------
-Bits (tnode)
+Bits (tnode)
The number of bits in the key segment used for indexing into the
child array - the "child index". See Level Compression.
@@ -23,7 +27,7 @@ Pos (tnode)
Path Compression / skipped bits
Any given tnode is linked to from the child array of its parent, using
- a segment of the key specified by the parent's "pos" and "bits"
+ a segment of the key specified by the parent's "pos" and "bits"
In certain cases, this tnode's own "pos" will not be immediately
adjacent to the parent (pos+bits), but there will be some bits
in the key skipped over because they represent a single path with no
@@ -56,8 +60,8 @@ full_children
Comments
---------
-We have tried to keep the structure of the code as close to fib_hash as
-possible to allow verification and help up reviewing.
+We have tried to keep the structure of the code as close to fib_hash as
+possible to allow verification and help up reviewing.
fib_find_node()
A good start for understanding this code. This function implements a
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.rst
index 2f0f8b17dade..a1d3e192b9fa 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================================
Linux Socket Filtering aka Berkeley Packet Filter (BPF)
=======================================================
@@ -42,10 +45,10 @@ displays what is being placed into this structure.
Although we were only speaking about sockets here, BPF in Linux is used
in many more places. There's xt_bpf for netfilter, cls_bpf in the kernel
-qdisc layer, SECCOMP-BPF (SECure COMPuting [1]), and lots of other places
+qdisc layer, SECCOMP-BPF (SECure COMPuting [1]_), and lots of other places
such as team driver, PTP code, etc where BPF is being used.
- [1] Documentation/userspace-api/seccomp_filter.rst
+.. [1] Documentation/userspace-api/seccomp_filter.rst
Original BPF paper:
@@ -59,23 +62,23 @@ Structure
---------
User space applications include <linux/filter.h> which contains the
-following relevant structures:
+following relevant structures::
-struct sock_filter { /* Filter block */
- __u16 code; /* Actual filter code */
- __u8 jt; /* Jump true */
- __u8 jf; /* Jump false */
- __u32 k; /* Generic multiuse field */
-};
+ struct sock_filter { /* Filter block */
+ __u16 code; /* Actual filter code */
+ __u8 jt; /* Jump true */
+ __u8 jf; /* Jump false */
+ __u32 k; /* Generic multiuse field */
+ };
Such a structure is assembled as an array of 4-tuples, that contains
a code, jt, jf and k value. jt and jf are jump offsets and k a generic
-value to be used for a provided code.
+value to be used for a provided code::
-struct sock_fprog { /* Required for SO_ATTACH_FILTER. */
- unsigned short len; /* Number of filter blocks */
- struct sock_filter __user *filter;
-};
+ struct sock_fprog { /* Required for SO_ATTACH_FILTER. */
+ unsigned short len; /* Number of filter blocks */
+ struct sock_filter __user *filter;
+ };
For socket filtering, a pointer to this structure (as shown in
follow-up example) is being passed to the kernel through setsockopt(2).
@@ -83,55 +86,57 @@ follow-up example) is being passed to the kernel through setsockopt(2).
Example
-------
-#include <sys/socket.h>
-#include <sys/types.h>
-#include <arpa/inet.h>
-#include <linux/if_ether.h>
-/* ... */
-
-/* From the example above: tcpdump -i em1 port 22 -dd */
-struct sock_filter code[] = {
- { 0x28, 0, 0, 0x0000000c },
- { 0x15, 0, 8, 0x000086dd },
- { 0x30, 0, 0, 0x00000014 },
- { 0x15, 2, 0, 0x00000084 },
- { 0x15, 1, 0, 0x00000006 },
- { 0x15, 0, 17, 0x00000011 },
- { 0x28, 0, 0, 0x00000036 },
- { 0x15, 14, 0, 0x00000016 },
- { 0x28, 0, 0, 0x00000038 },
- { 0x15, 12, 13, 0x00000016 },
- { 0x15, 0, 12, 0x00000800 },
- { 0x30, 0, 0, 0x00000017 },
- { 0x15, 2, 0, 0x00000084 },
- { 0x15, 1, 0, 0x00000006 },
- { 0x15, 0, 8, 0x00000011 },
- { 0x28, 0, 0, 0x00000014 },
- { 0x45, 6, 0, 0x00001fff },
- { 0xb1, 0, 0, 0x0000000e },
- { 0x48, 0, 0, 0x0000000e },
- { 0x15, 2, 0, 0x00000016 },
- { 0x48, 0, 0, 0x00000010 },
- { 0x15, 0, 1, 0x00000016 },
- { 0x06, 0, 0, 0x0000ffff },
- { 0x06, 0, 0, 0x00000000 },
-};
-
-struct sock_fprog bpf = {
- .len = ARRAY_SIZE(code),
- .filter = code,
-};
-
-sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
-if (sock < 0)
- /* ... bail out ... */
-
-ret = setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
-if (ret < 0)
- /* ... bail out ... */
-
-/* ... */
-close(sock);
+::
+
+ #include <sys/socket.h>
+ #include <sys/types.h>
+ #include <arpa/inet.h>
+ #include <linux/if_ether.h>
+ /* ... */
+
+ /* From the example above: tcpdump -i em1 port 22 -dd */
+ struct sock_filter code[] = {
+ { 0x28, 0, 0, 0x0000000c },
+ { 0x15, 0, 8, 0x000086dd },
+ { 0x30, 0, 0, 0x00000014 },
+ { 0x15, 2, 0, 0x00000084 },
+ { 0x15, 1, 0, 0x00000006 },
+ { 0x15, 0, 17, 0x00000011 },
+ { 0x28, 0, 0, 0x00000036 },
+ { 0x15, 14, 0, 0x00000016 },
+ { 0x28, 0, 0, 0x00000038 },
+ { 0x15, 12, 13, 0x00000016 },
+ { 0x15, 0, 12, 0x00000800 },
+ { 0x30, 0, 0, 0x00000017 },
+ { 0x15, 2, 0, 0x00000084 },
+ { 0x15, 1, 0, 0x00000006 },
+ { 0x15, 0, 8, 0x00000011 },
+ { 0x28, 0, 0, 0x00000014 },
+ { 0x45, 6, 0, 0x00001fff },
+ { 0xb1, 0, 0, 0x0000000e },
+ { 0x48, 0, 0, 0x0000000e },
+ { 0x15, 2, 0, 0x00000016 },
+ { 0x48, 0, 0, 0x00000010 },
+ { 0x15, 0, 1, 0x00000016 },
+ { 0x06, 0, 0, 0x0000ffff },
+ { 0x06, 0, 0, 0x00000000 },
+ };
+
+ struct sock_fprog bpf = {
+ .len = ARRAY_SIZE(code),
+ .filter = code,
+ };
+
+ sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+ if (sock < 0)
+ /* ... bail out ... */
+
+ ret = setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
+ if (ret < 0)
+ /* ... bail out ... */
+
+ /* ... */
+ close(sock);
The above example code attaches a socket filter for a PF_PACKET socket
in order to let all IPv4/IPv6 packets with port 22 pass. The rest will
@@ -178,15 +183,17 @@ closely modelled after Steven McCanne's and Van Jacobson's BPF paper.
The BPF architecture consists of the following basic elements:
+ ======= ====================================================
Element Description
-
+ ======= ====================================================
A 32 bit wide accumulator
X 32 bit wide X register
M[] 16 x 32 bit wide misc registers aka "scratch memory
- store", addressable from 0 to 15
+ store", addressable from 0 to 15
+ ======= ====================================================
A program, that is translated by bpf_asm into "opcodes" is an array that
-consists of the following elements (as already mentioned):
+consists of the following elements (as already mentioned)::
op:16, jt:8, jf:8, k:32
@@ -201,8 +208,9 @@ and return instructions that are also represented in bpf_asm syntax. This
table lists all bpf_asm instructions available resp. what their underlying
opcodes as defined in linux/filter.h stand for:
+ =========== =================== =====================
Instruction Addressing mode Description
-
+ =========== =================== =====================
ld 1, 2, 3, 4, 12 Load word into A
ldi 4 Load word into A
ldh 1, 2 Load half-word into A
@@ -241,11 +249,13 @@ opcodes as defined in linux/filter.h stand for:
txa Copy X into A
ret 4, 11 Return
+ =========== =================== =====================
The next table shows addressing formats from the 2nd column:
+ =============== =================== ===============================================
Addressing mode Syntax Description
-
+ =============== =================== ===============================================
0 x/%x Register X
1 [k] BHW at byte offset k in the packet
2 [x + k] BHW at the offset X + k in the packet
@@ -259,6 +269,7 @@ The next table shows addressing formats from the 2nd column:
10 x/%x,Lt Jump to Lt if predicate is true
11 a/%a Accumulator A
12 extension BPF extension
+ =============== =================== ===============================================
The Linux kernel also has a couple of BPF extensions that are used along
with the class of load instructions by "overloading" the k argument with
@@ -267,8 +278,9 @@ extensions are loaded into A.
Possible BPF extensions are shown in the following table:
+ =================================== =================================================
Extension Description
-
+ =================================== =================================================
len skb->len
proto skb->protocol
type skb->pkt_type
@@ -285,18 +297,19 @@ Possible BPF extensions are shown in the following table:
vlan_avail skb_vlan_tag_present(skb)
vlan_tpid skb->vlan_proto
rand prandom_u32()
+ =================================== =================================================
These extensions can also be prefixed with '#'.
Examples for low-level BPF:
-** ARP packets:
+**ARP packets**::
ldh [12]
jne #0x806, drop
ret #-1
drop: ret #0
-** IPv4 TCP packets:
+**IPv4 TCP packets**::
ldh [12]
jne #0x800, drop
@@ -305,14 +318,15 @@ Examples for low-level BPF:
ret #-1
drop: ret #0
-** (Accelerated) VLAN w/ id 10:
+**(Accelerated) VLAN w/ id 10**::
ld vlan_tci
jneq #10, drop
ret #-1
drop: ret #0
-** icmp random packet sampling, 1 in 4
+**icmp random packet sampling, 1 in 4**:
+
ldh [12]
jne #0x800, drop
ldb [23]
@@ -324,7 +338,7 @@ Examples for low-level BPF:
ret #-1
drop: ret #0
-** SECCOMP filter example:
+**SECCOMP filter example**::
ld [4] /* offsetof(struct seccomp_data, arch) */
jne #0xc000003e, bad /* AUDIT_ARCH_X86_64 */
@@ -345,18 +359,18 @@ Examples for low-level BPF:
The above example code can be placed into a file (here called "foo"), and
then be passed to the bpf_asm tool for generating opcodes, output that xt_bpf
and cls_bpf understands and can directly be loaded with. Example with above
-ARP code:
+ARP code::
-$ ./bpf_asm foo
-4,40 0 0 12,21 0 1 2054,6 0 0 4294967295,6 0 0 0,
+ $ ./bpf_asm foo
+ 4,40 0 0 12,21 0 1 2054,6 0 0 4294967295,6 0 0 0,
-In copy and paste C-like output:
+In copy and paste C-like output::
-$ ./bpf_asm -c foo
-{ 0x28, 0, 0, 0x0000000c },
-{ 0x15, 0, 1, 0x00000806 },
-{ 0x06, 0, 0, 0xffffffff },
-{ 0x06, 0, 0, 0000000000 },
+ $ ./bpf_asm -c foo
+ { 0x28, 0, 0, 0x0000000c },
+ { 0x15, 0, 1, 0x00000806 },
+ { 0x06, 0, 0, 0xffffffff },
+ { 0x06, 0, 0, 0000000000 },
In particular, as usage with xt_bpf or cls_bpf can result in more complex BPF
filters that might not be obvious at first, it's good to test filters before
@@ -365,9 +379,9 @@ bpf_dbg under tools/bpf/ in the kernel source directory. This debugger allows
for testing BPF filters against given pcap files, single stepping through the
BPF code on the pcap's packets and to do BPF machine register dumps.
-Starting bpf_dbg is trivial and just requires issuing:
+Starting bpf_dbg is trivial and just requires issuing::
-# ./bpf_dbg
+ # ./bpf_dbg
In case input and output do not equal stdin/stdout, bpf_dbg takes an
alternative stdin source as a first argument, and an alternative stdout
@@ -381,84 +395,100 @@ Interaction in bpf_dbg happens through a shell that also has auto-completion
support (follow-up example commands starting with '>' denote bpf_dbg shell).
The usual workflow would be to ...
-> load bpf 6,40 0 0 12,21 0 3 2048,48 0 0 23,21 0 1 1,6 0 0 65535,6 0 0 0
+* load bpf 6,40 0 0 12,21 0 3 2048,48 0 0 23,21 0 1 1,6 0 0 65535,6 0 0 0
Loads a BPF filter from standard output of bpf_asm, or transformed via
- e.g. `tcpdump -iem1 -ddd port 22 | tr '\n' ','`. Note that for JIT
+ e.g. ``tcpdump -iem1 -ddd port 22 | tr '\n' ','``. Note that for JIT
debugging (next section), this command creates a temporary socket and
loads the BPF code into the kernel. Thus, this will also be useful for
JIT developers.
-> load pcap foo.pcap
+* load pcap foo.pcap
+
Loads standard tcpdump pcap file.
-> run [<n>]
+* run [<n>]
+
bpf passes:1 fails:9
Runs through all packets from a pcap to account how many passes and fails
the filter will generate. A limit of packets to traverse can be given.
-> disassemble
-l0: ldh [12]
-l1: jeq #0x800, l2, l5
-l2: ldb [23]
-l3: jeq #0x1, l4, l5
-l4: ret #0xffff
-l5: ret #0
+* disassemble::
+
+ l0: ldh [12]
+ l1: jeq #0x800, l2, l5
+ l2: ldb [23]
+ l3: jeq #0x1, l4, l5
+ l4: ret #0xffff
+ l5: ret #0
+
Prints out BPF code disassembly.
-> dump
-/* { op, jt, jf, k }, */
-{ 0x28, 0, 0, 0x0000000c },
-{ 0x15, 0, 3, 0x00000800 },
-{ 0x30, 0, 0, 0x00000017 },
-{ 0x15, 0, 1, 0x00000001 },
-{ 0x06, 0, 0, 0x0000ffff },
-{ 0x06, 0, 0, 0000000000 },
+* dump::
+
+ /* { op, jt, jf, k }, */
+ { 0x28, 0, 0, 0x0000000c },
+ { 0x15, 0, 3, 0x00000800 },
+ { 0x30, 0, 0, 0x00000017 },
+ { 0x15, 0, 1, 0x00000001 },
+ { 0x06, 0, 0, 0x0000ffff },
+ { 0x06, 0, 0, 0000000000 },
+
Prints out C-style BPF code dump.
-> breakpoint 0
-breakpoint at: l0: ldh [12]
-> breakpoint 1
-breakpoint at: l1: jeq #0x800, l2, l5
+* breakpoint 0::
+
+ breakpoint at: l0: ldh [12]
+
+* breakpoint 1::
+
+ breakpoint at: l1: jeq #0x800, l2, l5
+
...
+
Sets breakpoints at particular BPF instructions. Issuing a `run` command
will walk through the pcap file continuing from the current packet and
break when a breakpoint is being hit (another `run` will continue from
the currently active breakpoint executing next instructions):
- > run
- -- register dump --
- pc: [0] <-- program counter
- code: [40] jt[0] jf[0] k[12] <-- plain BPF code of current instruction
- curr: l0: ldh [12] <-- disassembly of current instruction
- A: [00000000][0] <-- content of A (hex, decimal)
- X: [00000000][0] <-- content of X (hex, decimal)
- M[0,15]: [00000000][0] <-- folded content of M (hex, decimal)
- -- packet dump -- <-- Current packet from pcap (hex)
- len: 42
- 0: 00 19 cb 55 55 a4 00 14 a4 43 78 69 08 06 00 01
- 16: 08 00 06 04 00 01 00 14 a4 43 78 69 0a 3b 01 26
- 32: 00 00 00 00 00 00 0a 3b 01 01
- (breakpoint)
- >
-
-> breakpoint
-breakpoints: 0 1
- Prints currently set breakpoints.
-
-> step [-<n>, +<n>]
+ * run::
+
+ -- register dump --
+ pc: [0] <-- program counter
+ code: [40] jt[0] jf[0] k[12] <-- plain BPF code of current instruction
+ curr: l0: ldh [12] <-- disassembly of current instruction
+ A: [00000000][0] <-- content of A (hex, decimal)
+ X: [00000000][0] <-- content of X (hex, decimal)
+ M[0,15]: [00000000][0] <-- folded content of M (hex, decimal)
+ -- packet dump -- <-- Current packet from pcap (hex)
+ len: 42
+ 0: 00 19 cb 55 55 a4 00 14 a4 43 78 69 08 06 00 01
+ 16: 08 00 06 04 00 01 00 14 a4 43 78 69 0a 3b 01 26
+ 32: 00 00 00 00 00 00 0a 3b 01 01
+ (breakpoint)
+ >
+
+ * breakpoint::
+
+ breakpoints: 0 1
+
+ Prints currently set breakpoints.
+
+* step [-<n>, +<n>]
+
Performs single stepping through the BPF program from the current pc
offset. Thus, on each step invocation, above register dump is issued.
This can go forwards and backwards in time, a plain `step` will break
on the next BPF instruction, thus +1. (No `run` needs to be issued here.)
-> select <n>
+* select <n>
+
Selects a given packet from the pcap file to continue from. Thus, on
the next `run` or `step`, the BPF program is being evaluated against
the user pre-selected packet. Numbering starts just as in Wireshark
with index 1.
-> quit
-#
+* quit
+
Exits bpf_dbg.
JIT compiler
@@ -468,23 +498,23 @@ The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
PowerPC, ARM, ARM64, MIPS, RISC-V and s390 and can be enabled through
CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
attached filter from user space or for internal kernel users if it has
-been previously enabled by root:
+been previously enabled by root::
echo 1 > /proc/sys/net/core/bpf_jit_enable
For JIT developers, doing audits etc, each compile run can output the generated
-opcode image into the kernel log via:
+opcode image into the kernel log via::
echo 2 > /proc/sys/net/core/bpf_jit_enable
-Example output from dmesg:
+Example output from dmesg::
-[ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f
-[ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
-[ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
-[ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
-[ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
-[ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
+ [ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f
+ [ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
+ [ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
+ [ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
+ [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
+ [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
setting any other value than that will return in failure. This is even the case for
@@ -493,78 +523,78 @@ is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is t
generally recommended approach instead.
In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
-generating disassembly out of the kernel log's hexdump:
-
-# ./bpf_jit_disasm
-70 bytes emitted from JIT compiler (pass:3, flen:6)
-ffffffffa0069c8f + <x>:
- 0: push %rbp
- 1: mov %rsp,%rbp
- 4: sub $0x60,%rsp
- 8: mov %rbx,-0x8(%rbp)
- c: mov 0x68(%rdi),%r9d
- 10: sub 0x6c(%rdi),%r9d
- 14: mov 0xd8(%rdi),%r8
- 1b: mov $0xc,%esi
- 20: callq 0xffffffffe0ff9442
- 25: cmp $0x800,%eax
- 2a: jne 0x0000000000000042
- 2c: mov $0x17,%esi
- 31: callq 0xffffffffe0ff945e
- 36: cmp $0x1,%eax
- 39: jne 0x0000000000000042
- 3b: mov $0xffff,%eax
- 40: jmp 0x0000000000000044
- 42: xor %eax,%eax
- 44: leaveq
- 45: retq
-
-Issuing option `-o` will "annotate" opcodes to resulting assembler
-instructions, which can be very useful for JIT developers:
-
-# ./bpf_jit_disasm -o
-70 bytes emitted from JIT compiler (pass:3, flen:6)
-ffffffffa0069c8f + <x>:
- 0: push %rbp
- 55
- 1: mov %rsp,%rbp
- 48 89 e5
- 4: sub $0x60,%rsp
- 48 83 ec 60
- 8: mov %rbx,-0x8(%rbp)
- 48 89 5d f8
- c: mov 0x68(%rdi),%r9d
- 44 8b 4f 68
- 10: sub 0x6c(%rdi),%r9d
- 44 2b 4f 6c
- 14: mov 0xd8(%rdi),%r8
- 4c 8b 87 d8 00 00 00
- 1b: mov $0xc,%esi
- be 0c 00 00 00
- 20: callq 0xffffffffe0ff9442
- e8 1d 94 ff e0
- 25: cmp $0x800,%eax
- 3d 00 08 00 00
- 2a: jne 0x0000000000000042
- 75 16
- 2c: mov $0x17,%esi
- be 17 00 00 00
- 31: callq 0xffffffffe0ff945e
- e8 28 94 ff e0
- 36: cmp $0x1,%eax
- 83 f8 01
- 39: jne 0x0000000000000042
- 75 07
- 3b: mov $0xffff,%eax
- b8 ff ff 00 00
- 40: jmp 0x0000000000000044
- eb 02
- 42: xor %eax,%eax
- 31 c0
- 44: leaveq
- c9
- 45: retq
- c3
+generating disassembly out of the kernel log's hexdump::
+
+ # ./bpf_jit_disasm
+ 70 bytes emitted from JIT compiler (pass:3, flen:6)
+ ffffffffa0069c8f + <x>:
+ 0: push %rbp
+ 1: mov %rsp,%rbp
+ 4: sub $0x60,%rsp
+ 8: mov %rbx,-0x8(%rbp)
+ c: mov 0x68(%rdi),%r9d
+ 10: sub 0x6c(%rdi),%r9d
+ 14: mov 0xd8(%rdi),%r8
+ 1b: mov $0xc,%esi
+ 20: callq 0xffffffffe0ff9442
+ 25: cmp $0x800,%eax
+ 2a: jne 0x0000000000000042
+ 2c: mov $0x17,%esi
+ 31: callq 0xffffffffe0ff945e
+ 36: cmp $0x1,%eax
+ 39: jne 0x0000000000000042
+ 3b: mov $0xffff,%eax
+ 40: jmp 0x0000000000000044
+ 42: xor %eax,%eax
+ 44: leaveq
+ 45: retq
+
+ Issuing option `-o` will "annotate" opcodes to resulting assembler
+ instructions, which can be very useful for JIT developers:
+
+ # ./bpf_jit_disasm -o
+ 70 bytes emitted from JIT compiler (pass:3, flen:6)
+ ffffffffa0069c8f + <x>:
+ 0: push %rbp
+ 55
+ 1: mov %rsp,%rbp
+ 48 89 e5
+ 4: sub $0x60,%rsp
+ 48 83 ec 60
+ 8: mov %rbx,-0x8(%rbp)
+ 48 89 5d f8
+ c: mov 0x68(%rdi),%r9d
+ 44 8b 4f 68
+ 10: sub 0x6c(%rdi),%r9d
+ 44 2b 4f 6c
+ 14: mov 0xd8(%rdi),%r8
+ 4c 8b 87 d8 00 00 00
+ 1b: mov $0xc,%esi
+ be 0c 00 00 00
+ 20: callq 0xffffffffe0ff9442
+ e8 1d 94 ff e0
+ 25: cmp $0x800,%eax
+ 3d 00 08 00 00
+ 2a: jne 0x0000000000000042
+ 75 16
+ 2c: mov $0x17,%esi
+ be 17 00 00 00
+ 31: callq 0xffffffffe0ff945e
+ e8 28 94 ff e0
+ 36: cmp $0x1,%eax
+ 83 f8 01
+ 39: jne 0x0000000000000042
+ 75 07
+ 3b: mov $0xffff,%eax
+ b8 ff ff 00 00
+ 40: jmp 0x0000000000000044
+ eb 02
+ 42: xor %eax,%eax
+ 31 c0
+ 44: leaveq
+ c9
+ 45: retq
+ c3
For BPF JIT developers, bpf_jit_disasm, bpf_asm and bpf_dbg provides a useful
toolchain for developing and testing the kernel's JIT compiler.
@@ -663,9 +693,9 @@ Some core changes of the new internal format:
- Conditional jt/jf targets replaced with jt/fall-through:
- While the original design has constructs such as "if (cond) jump_true;
- else jump_false;", they are being replaced into alternative constructs like
- "if (cond) jump_true; /* else fall-through */".
+ While the original design has constructs such as ``if (cond) jump_true;
+ else jump_false;``, they are being replaced into alternative constructs like
+ ``if (cond) jump_true; /* else fall-through */``.
- Introduces bpf_call insn and register passing convention for zero overhead
calls from/to other kernel functions:
@@ -684,32 +714,32 @@ Some core changes of the new internal format:
a return value of the function. Since R6 - R9 are callee saved, their state
is preserved across the call.
- For example, consider three C functions:
+ For example, consider three C functions::
- u64 f1() { return (*_f2)(1); }
- u64 f2(u64 a) { return f3(a + 1, a); }
- u64 f3(u64 a, u64 b) { return a - b; }
+ u64 f1() { return (*_f2)(1); }
+ u64 f2(u64 a) { return f3(a + 1, a); }
+ u64 f3(u64 a, u64 b) { return a - b; }
- GCC can compile f1, f3 into x86_64:
+ GCC can compile f1, f3 into x86_64::
- f1:
- movl $1, %edi
- movq _f2(%rip), %rax
- jmp *%rax
- f3:
- movq %rdi, %rax
- subq %rsi, %rax
- ret
+ f1:
+ movl $1, %edi
+ movq _f2(%rip), %rax
+ jmp *%rax
+ f3:
+ movq %rdi, %rax
+ subq %rsi, %rax
+ ret
- Function f2 in eBPF may look like:
+ Function f2 in eBPF may look like::
- f2:
- bpf_mov R2, R1
- bpf_add R1, 1
- bpf_call f3
- bpf_exit
+ f2:
+ bpf_mov R2, R1
+ bpf_add R1, 1
+ bpf_call f3
+ bpf_exit
- If f2 is JITed and the pointer stored to '_f2'. The calls f1 -> f2 -> f3 and
+ If f2 is JITed and the pointer stored to ``_f2``. The calls f1 -> f2 -> f3 and
returns will be seamless. Without JIT, __bpf_prog_run() interpreter needs to
be used to call into f2.
@@ -722,6 +752,8 @@ Some core changes of the new internal format:
On 64-bit architectures all register map to HW registers one to one. For
example, x86_64 JIT compiler can map them as ...
+ ::
+
R0 - rax
R1 - rdi
R2 - rsi
@@ -737,7 +769,7 @@ Some core changes of the new internal format:
... since x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing
and rbx, r12 - r15 are callee saved.
- Then the following internal BPF pseudo-program:
+ Then the following internal BPF pseudo-program::
bpf_mov R6, R1 /* save ctx */
bpf_mov R2, 2
@@ -755,7 +787,7 @@ Some core changes of the new internal format:
bpf_add R0, R7
bpf_exit
- After JIT to x86_64 may look like:
+ After JIT to x86_64 may look like::
push %rbp
mov %rsp,%rbp
@@ -781,21 +813,21 @@ Some core changes of the new internal format:
leaveq
retq
- Which is in this example equivalent in C to:
+ Which is in this example equivalent in C to::
u64 bpf_filter(u64 ctx)
{
- return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9);
+ return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9);
}
In-kernel functions foo() and bar() with prototype: u64 (*)(u64 arg1, u64
arg2, u64 arg3, u64 arg4, u64 arg5); will receive arguments in proper
- registers and place their return value into '%rax' which is R0 in eBPF.
+ registers and place their return value into ``%rax`` which is R0 in eBPF.
Prologue and epilogue are emitted by JIT and are implicit in the
interpreter. R0-R5 are scratch registers, so eBPF program needs to preserve
them across the calls as defined by calling convention.
- For example the following program is invalid:
+ For example the following program is invalid::
bpf_mov R1, 1
bpf_call foo
@@ -814,7 +846,7 @@ The input context pointer for invoking the interpreter function is generic,
its content is defined by a specific use case. For seccomp register R1 points
to seccomp_data, for converted BPF filters R1 points to a skb.
-A program, that is translated internally consists of the following elements:
+A program, that is translated internally consists of the following elements::
op:16, jt:8, jf:8, k:32 ==> op:8, dst_reg:4, src_reg:4, off:16, imm:32
@@ -824,7 +856,7 @@ instructions must be multiple of 8 bytes to preserve backward compatibility.
Internal BPF is a general purpose RISC instruction set. Not every register and
every instruction are used during translation from original BPF to new format.
-For example, socket filters are not using 'exclusive add' instruction, but
+For example, socket filters are not using ``exclusive add`` instruction, but
tracing filters may do to maintain counters of events, for example. Register R9
is not used by socket filters either, but more complex filters may be running
out of registers and would have to resort to spill/fill to stack.
@@ -849,7 +881,7 @@ eBPF opcode encoding
eBPF is reusing most of the opcode encoding from classic to simplify conversion
of classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code'
-field is divided into three parts:
+field is divided into three parts::
+----------------+--------+--------------------+
| 4 bits | 1 bit | 3 bits |
@@ -859,8 +891,9 @@ field is divided into three parts:
Three LSB bits store instruction class which is one of:
- Classic BPF classes: eBPF classes:
-
+ =================== ===============
+ Classic BPF classes eBPF classes
+ =================== ===============
BPF_LD 0x00 BPF_LD 0x00
BPF_LDX 0x01 BPF_LDX 0x01
BPF_ST 0x02 BPF_ST 0x02
@@ -869,25 +902,28 @@ Three LSB bits store instruction class which is one of:
BPF_JMP 0x05 BPF_JMP 0x05
BPF_RET 0x06 BPF_JMP32 0x06
BPF_MISC 0x07 BPF_ALU64 0x07
+ =================== ===============
When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...
- BPF_K 0x00
- BPF_X 0x08
+ ::
+
+ BPF_K 0x00
+ BPF_X 0x08
- * in classic BPF, this means:
+ * in classic BPF, this means::
- BPF_SRC(code) == BPF_X - use register X as source operand
- BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
+ BPF_SRC(code) == BPF_X - use register X as source operand
+ BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
- * in eBPF, this means:
+ * in eBPF, this means::
- BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
- BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
+ BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
+ BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
... and four MSB bits store operation code.
-If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
+If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of::
BPF_ADD 0x00
BPF_SUB 0x10
@@ -904,7 +940,7 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */
BPF_END 0xd0 /* eBPF only: endianness conversion */
-If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of:
+If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of::
BPF_JA 0x00 /* BPF_JMP only */
BPF_JEQ 0x10
@@ -934,7 +970,7 @@ exactly the same operations as BPF_ALU, but with 64-bit wide operands
instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.:
dst_reg = dst_reg + src_reg
-Classic BPF wastes the whole BPF_RET class to represent a single 'ret'
+Classic BPF wastes the whole BPF_RET class to represent a single ``ret``
operation. Classic BPF_RET | BPF_K means copy imm32 into return register
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
in eBPF means function exit only. The eBPF program needs to store return
@@ -942,7 +978,7 @@ value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
operands for the comparisons instead.
-For load and store instructions the 8-bit 'code' field is divided as:
+For load and store instructions the 8-bit 'code' field is divided as::
+--------+--------+-------------------+
| 3 bits | 2 bits | 3 bits |
@@ -952,19 +988,21 @@ For load and store instructions the 8-bit 'code' field is divided as:
Size modifier is one of ...
+::
+
BPF_W 0x00 /* word */
BPF_H 0x08 /* half word */
BPF_B 0x10 /* byte */
BPF_DW 0x18 /* eBPF only, double word */
-... which encodes size of load/store operation:
+... which encodes size of load/store operation::
B - 1 byte
H - 2 byte
W - 4 byte
DW - 8 byte (eBPF only)
-Mode modifier is one of:
+Mode modifier is one of::
BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
BPF_ABS 0x20
@@ -979,7 +1017,7 @@ eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
They had to be carried over from classic to have strong performance of
socket filters running in eBPF interpreter. These instructions can only
-be used when interpreter context is a pointer to 'struct sk_buff' and
+be used when interpreter context is a pointer to ``struct sk_buff`` and
have seven implicit operands. Register R6 is an implicit input that must
contain pointer to sk_buff. Register R0 is an implicit output which contains
the data fetched from the packet. Registers R1-R5 are scratch registers
@@ -992,26 +1030,26 @@ the interpreter will abort the execution of the program. JIT compilers
therefore must preserve this property. src_reg and imm32 fields are
explicit inputs to these instructions.
-For example:
+For example::
BPF_IND | BPF_W | BPF_LD means:
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
and R1 - R5 were scratched.
-Unlike classic BPF instruction set, eBPF has generic load/store operations:
+Unlike classic BPF instruction set, eBPF has generic load/store operations::
-BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
-BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32
-BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
-BPF_XADD | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
-BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
+ BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
+ BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32
+ BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
+ BPF_XADD | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
+ BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
2 byte atomic increments are not supported.
eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
-of two consecutive 'struct bpf_insn' 8-byte blocks and interpreted as single
+of two consecutive ``struct bpf_insn`` 8-byte blocks and interpreted as single
instruction that loads 64-bit immediate value into a dst_reg.
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
32-bit immediate value into a register.
@@ -1037,38 +1075,48 @@ since addition of two valid pointers makes invalid pointer.
(In 'secure' mode verifier will reject any type of pointer arithmetic to make
sure that kernel addresses don't leak to unprivileged users)
-If register was never written to, it's not readable:
+If register was never written to, it's not readable::
+
bpf_mov R0 = R2
bpf_exit
+
will be rejected, since R2 is unreadable at the start of the program.
After kernel function call, R1-R5 are reset to unreadable and
R0 has a return type of the function.
Since R6-R9 are callee saved, their state is preserved across the call.
+
+::
+
bpf_mov R6 = 1
bpf_call foo
bpf_mov R0 = R6
bpf_exit
+
is a correct program. If there was R1 instead of R6, it would have
been rejected.
load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
-For example:
+For example::
+
bpf_mov R1 = 1
bpf_mov R2 = 2
bpf_xadd *(u32 *)(R1 + 3) += R2
bpf_exit
+
will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of instruction bpf_xadd.
-At the start R1 type is PTR_TO_CTX (a pointer to generic 'struct bpf_context')
+At the start R1 type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``)
A callback is used to customize verifier to restrict eBPF program access to only
certain fields within ctx structure with specified size and alignment.
-For example, the following insn:
+For example, the following insn::
+
bpf_ld R0 = *(u32 *)(R6 + 8)
+
intends to load a word from address R6 + 8 and store it into R0
If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading, otherwise
@@ -1079,10 +1127,13 @@ so it will fail verification, since it's out of bounds.
The verifier will allow eBPF program to read data from stack only after
it wrote into it.
+
Classic BPF verifier does similar check with M[0-15] memory slots.
-For example:
+For example::
+
bpf_ld R0 = *(u32 *)(R10 - 4)
bpf_exit
+
is invalid program.
Though R10 is correct read-only register and has type PTR_TO_STACK
and R10 - 4 is within stack bounds, there were no stores into that location.
@@ -1113,48 +1164,61 @@ Register value tracking
-----------------------
In order to determine the safety of an eBPF program, the verifier must track
the range of possible values in each register and also in each stack slot.
-This is done with 'struct bpf_reg_state', defined in include/linux/
+This is done with ``struct bpf_reg_state``, defined in include/linux/
bpf_verifier.h, which unifies tracking of scalar and pointer values. Each
register state has a type, which is either NOT_INIT (the register has not been
written to), SCALAR_VALUE (some value which is not usable as a pointer), or a
pointer type. The types of pointers describe their base, as follows:
- PTR_TO_CTX Pointer to bpf_context.
- CONST_PTR_TO_MAP Pointer to struct bpf_map. "Const" because arithmetic
- on these pointers is forbidden.
- PTR_TO_MAP_VALUE Pointer to the value stored in a map element.
+
+
+ PTR_TO_CTX
+ Pointer to bpf_context.
+ CONST_PTR_TO_MAP
+ Pointer to struct bpf_map. "Const" because arithmetic
+ on these pointers is forbidden.
+ PTR_TO_MAP_VALUE
+ Pointer to the value stored in a map element.
PTR_TO_MAP_VALUE_OR_NULL
- Either a pointer to a map value, or NULL; map accesses
- (see section 'eBPF maps', below) return this type,
- which becomes a PTR_TO_MAP_VALUE when checked != NULL.
- Arithmetic on these pointers is forbidden.
- PTR_TO_STACK Frame pointer.
- PTR_TO_PACKET skb->data.
- PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden.
- PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted.
+ Either a pointer to a map value, or NULL; map accesses
+ (see section 'eBPF maps', below) return this type,
+ which becomes a PTR_TO_MAP_VALUE when checked != NULL.
+ Arithmetic on these pointers is forbidden.
+ PTR_TO_STACK
+ Frame pointer.
+ PTR_TO_PACKET
+ skb->data.
+ PTR_TO_PACKET_END
+ skb->data + headlen; arithmetic forbidden.
+ PTR_TO_SOCKET
+ Pointer to struct bpf_sock_ops, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL
- Either a pointer to a socket, or NULL; socket lookup
- returns this type, which becomes a PTR_TO_SOCKET when
- checked != NULL. PTR_TO_SOCKET is reference-counted,
- so programs must release the reference through the
- socket release function before the end of the program.
- Arithmetic on these pointers is forbidden.
+ Either a pointer to a socket, or NULL; socket lookup
+ returns this type, which becomes a PTR_TO_SOCKET when
+ checked != NULL. PTR_TO_SOCKET is reference-counted,
+ so programs must release the reference through the
+ socket release function before the end of the program.
+ Arithmetic on these pointers is forbidden.
+
However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
operand) is added to a pointer, while the latter is used for values which are
not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
the range of possible values in the register.
+
The verifier's knowledge about the variable offset consists of:
+
* minimum and maximum values as unsigned
* minimum and maximum values as signed
+
* knowledge of the values of individual bits, in the form of a 'tnum': a u64
-'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
-1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
-mask and value; no bit should ever be 1 in both. For example, if a byte is read
-into a register from memory, the register's top 56 bits are known zero, while
-the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
-then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
-0x1ff), because of potential carries.
+ 'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
+ 1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
+ mask and value; no bit should ever be 1 in both. For example, if a byte is read
+ into a register from memory, the register's top 56 bits are known zero, while
+ the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
+ then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
+ 0x1ff), because of potential carries.
Besides arithmetic, the register state can also be updated by conditional
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
@@ -1188,7 +1252,7 @@ The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
-represents a reference to the corresponding 'struct sock'. To ensure that the
+represents a reference to the corresponding ``struct sock``. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and in
the non-NULL case, and pass the valid reference to the socket release function.
@@ -1196,17 +1260,18 @@ Direct packet access
--------------------
In cls_bpf and act_bpf programs the verifier allows direct access to the packet
data via skb->data and skb->data_end pointers.
-Ex:
-1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
-2: r3 = *(u32 *)(r1 +76) /* load skb->data */
-3: r5 = r3
-4: r5 += 14
-5: if r5 > r4 goto pc+16
-R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
-6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
+Ex::
+
+ 1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
+ 2: r3 = *(u32 *)(r1 +76) /* load skb->data */
+ 3: r5 = r3
+ 4: r5 += 14
+ 5: if r5 > r4 goto pc+16
+ R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
+ 6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
this 2byte load from the packet is safe to do, since the program author
-did check 'if (skb->data + 14 > skb->data_end) goto err' at insn #5 which
+did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5 which
means that in the fall-through case the register R3 (which points to skb->data)
has at least 14 directly accessible bytes. The verifier marks it
as R3=pkt(id=0,off=0,r=14).
@@ -1215,52 +1280,58 @@ off=0 means that no additional constants were added.
r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
to the packet data, but constant 14 was added to the register, so
-it now points to 'skb->data + 14' and accessible range is [R5, R5 + 14 - 14)
+it now points to ``skb->data + 14`` and accessible range is [R5, R5 + 14 - 14)
which is zero bytes.
-More complex packet access may look like:
- R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
- 6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
- 7: r4 = *(u8 *)(r3 +12)
- 8: r4 *= 14
- 9: r3 = *(u32 *)(r1 +76) /* load skb->data */
-10: r3 += r4
-11: r2 = r1
-12: r2 <<= 48
-13: r2 >>= 48
-14: r3 += r2
-15: r2 = r3
-16: r2 += 8
-17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
-18: if r2 > r1 goto pc+2
- R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
-19: r1 = *(u8 *)(r3 +4)
+More complex packet access may look like::
+
+
+ R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
+ 6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
+ 7: r4 = *(u8 *)(r3 +12)
+ 8: r4 *= 14
+ 9: r3 = *(u32 *)(r1 +76) /* load skb->data */
+ 10: r3 += r4
+ 11: r2 = r1
+ 12: r2 <<= 48
+ 13: r2 >>= 48
+ 14: r3 += r2
+ 15: r2 = r3
+ 16: r2 += 8
+ 17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
+ 18: if r2 > r1 goto pc+2
+ R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
+ 19: r1 = *(u8 *)(r3 +4)
+
The state of the register R3 is R3=pkt(id=2,off=0,r=8)
-id=2 means that two 'r3 += rX' instructions were seen, so r3 points to some
+id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
offset within a packet and since the program author did
-'if (r3 + 8 > r1) goto err' at insn #18, the safe range is [R3, R3 + 8).
+``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
The verifier only allows 'add'/'sub' operations on packet registers. Any other
operation will set the register state to 'SCALAR_VALUE' and it won't be
available for direct packet access.
-Operation 'r3 += rX' may overflow and become less than original skb->data,
-therefore the verifier has to prevent that. So when it sees 'r3 += rX'
+
+Operation ``r3 += rX`` may overflow and become less than original skb->data,
+therefore the verifier has to prevent that. So when it sees ``r3 += rX``
instruction and rX is more than 16-bit value, any subsequent bounds-check of r3
against skb->data_end will not give us 'range' information, so attempts to read
through the pointer will give "invalid access to packet" error.
-Ex. after insn 'r4 = *(u8 *)(r3 +12)' (insn #7 above) the state of r4 is
+
+Ex. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits
of the register are guaranteed to be zero, and nothing is known about the lower
-8 bits. After insn 'r4 *= 14' the state becomes
+8 bits. After insn ``r4 *= 14`` the state becomes
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
value by constant 14 will keep upper 52 bits as zero, also the least significant
-bit will be zero as 14 is even. Similarly 'r2 >>= 48' will make
+bit will be zero as 14 is even. Similarly ``r2 >>= 48`` will make
R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign
extending. This logic is implemented in adjust_reg_min_max_vals() function,
which calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice
versa) and adjust_scalar_min_max_vals() for operations on two scalars.
The end result is that bpf program author can access packet directly
-using normal C code as:
+using normal C code as::
+
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
struct eth_hdr *eth = data;
@@ -1268,13 +1339,14 @@ using normal C code as:
struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
- return 0;
+ return 0;
if (eth->h_proto != htons(ETH_P_IP))
- return 0;
+ return 0;
if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
- return 0;
+ return 0;
if (udp->dest == 53 || udp->source == 9)
- ...;
+ ...;
+
which makes such programs easier to write comparing to LD_ABS insn
and significantly faster.
@@ -1284,23 +1356,24 @@ eBPF maps
and userspace.
The maps are accessed from user space via BPF syscall, which has commands:
+
- create a map with given type and attributes
- map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
+ ``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
returns process-local file descriptor or negative error
- lookup key in a given map
- err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
+ ``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key, attr->value
returns zero and stores found elem into value or negative error
- create or update key/value pair in a given map
- err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
+ ``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key, attr->value
returns zero or negative error
- find and delete element by key in a given map
- err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
+ ``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key
- to delete map: close(fd)
@@ -1312,10 +1385,11 @@ are concurrently updating.
maps can have different types: hash, array, bloom filter, radix-tree, etc.
The map is defined by:
- . type
- . max number of elements
- . key size in bytes
- . value size in bytes
+
+ - type
+ - max number of elements
+ - key size in bytes
+ - value size in bytes
Pruning
-------
@@ -1339,57 +1413,75 @@ Understanding eBPF verifier messages
The following are few examples of invalid eBPF programs and verifier error
messages as seen in the log:
-Program with unreachable instructions:
-static struct bpf_insn prog[] = {
+Program with unreachable instructions::
+
+ static struct bpf_insn prog[] = {
BPF_EXIT_INSN(),
BPF_EXIT_INSN(),
-};
+ };
+
Error:
+
unreachable insn 1
-Program that reads uninitialized register:
+Program that reads uninitialized register::
+
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (bf) r0 = r2
R2 !read_ok
-Program that doesn't initialize R0 before exiting:
+Program that doesn't initialize R0 before exiting::
+
BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (bf) r2 = r1
1: (95) exit
R0 !read_ok
-Program that accesses stack out of bounds:
- BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
- BPF_EXIT_INSN(),
-Error:
- 0: (7a) *(u64 *)(r10 +8) = 0
- invalid stack off=8 size=8
+Program that accesses stack out of bounds::
+
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
+ BPF_EXIT_INSN(),
+
+Error::
+
+ 0: (7a) *(u64 *)(r10 +8) = 0
+ invalid stack off=8 size=8
+
+Program that doesn't initialize stack before passing its address into function::
-Program that doesn't initialize stack before passing its address into function:
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (bf) r2 = r10
1: (07) r2 += -8
2: (b7) r1 = 0x0
3: (85) call 1
invalid indirect read from stack off -8+0 size 8
-Program that uses invalid map_fd=0 while calling to map_lookup_elem() function:
+Program that uses invalid map_fd=0 while calling to map_lookup_elem() function::
+
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
@@ -1398,7 +1490,8 @@ Error:
fd 0 is not pointing to valid bpf_map
Program that doesn't check return value of map_lookup_elem() before accessing
-map element:
+map element::
+
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
@@ -1406,7 +1499,9 @@ map element:
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
@@ -1416,7 +1511,8 @@ Error:
R0 invalid mem access 'map_value_or_null'
Program that correctly checks map_lookup_elem() returned value for NULL, but
-accesses the memory with incorrect alignment:
+accesses the memory with incorrect alignment::
+
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
@@ -1425,7 +1521,9 @@ accesses the memory with incorrect alignment:
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
@@ -1438,7 +1536,8 @@ Error:
Program that correctly checks map_lookup_elem() returned value for NULL and
accesses memory with correct alignment in one side of 'if' branch, but fails
-to do so in the other side of 'if' branch:
+to do so in the other side of 'if' branch::
+
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
@@ -1449,7 +1548,9 @@ to do so in the other side of 'if' branch:
BPF_EXIT_INSN(),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
@@ -1465,8 +1566,8 @@ Error:
R0 invalid mem access 'imm'
Program that performs a socket lookup then sets the pointer to NULL without
-checking it:
-value:
+checking it::
+
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
@@ -1477,7 +1578,9 @@ value:
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
@@ -1491,7 +1594,8 @@ Error:
Unreleased reference id=1, alloc_insn=7
Program that performs a socket lookup but does not NULL-check the returned
-value:
+value::
+
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
@@ -1501,7 +1605,9 @@ value:
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_EXIT_INSN(),
-Error:
+
+Error::
+
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
@@ -1519,7 +1625,7 @@ Testing
Next to the BPF toolchain, the kernel also ships a test module that contains
various test cases for classic and internal BPF that can be executed against
the BPF interpreter and JIT compiler. It can be found in lib/test_bpf.c and
-enabled via Kconfig:
+enabled via Kconfig::
CONFIG_TEST_BPF=m
@@ -1540,6 +1646,6 @@ The document was written in the hope that it is found useful and in order
to give potential BPF hackers or security auditors a better overview of
the underlying architecture.
-Jay Schulist <jschlst@samba.org>
-Daniel Borkmann <daniel@iogearbox.net>
-Alexei Starovoitov <ast@kernel.org>
+- Jay Schulist <jschlst@samba.org>
+- Daniel Borkmann <daniel@iogearbox.net>
+- Alexei Starovoitov <ast@kernel.org>
diff --git a/Documentation/networking/fore200e.txt b/Documentation/networking/fore200e.rst
index 1f98f62b4370..55df9ec09ac8 100644
--- a/Documentation/networking/fore200e.txt
+++ b/Documentation/networking/fore200e.rst
@@ -1,6 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+=============================================
FORE Systems PCA-200E/SBA-200E ATM NIC driver
----------------------------------------------
+=============================================
This driver adds support for the FORE Systems 200E-series ATM adapters
to the Linux operating system. It is based on the earlier PCA-200E driver
@@ -27,8 +29,8 @@ in the linux/drivers/atm directory for details and restrictions.
Firmware Updates
----------------
-The FORE Systems 200E-series driver is shipped with firmware data being
-uploaded to the ATM adapters at system boot time or at module loading time.
+The FORE Systems 200E-series driver is shipped with firmware data being
+uploaded to the ATM adapters at system boot time or at module loading time.
The supplied firmware images should work with all adapters.
However, if you encounter problems (the firmware doesn't start or the driver
diff --git a/Documentation/networking/framerelay.txt b/Documentation/networking/framerelay.rst
index 1a0b720440dd..6d904399ec6d 100644
--- a/Documentation/networking/framerelay.txt
+++ b/Documentation/networking/framerelay.rst
@@ -1,4 +1,10 @@
-Frame Relay (FR) support for linux is built into a two tiered system of device
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Frame Relay (FR)
+================
+
+Frame Relay (FR) support for linux is built into a two tiered system of device
drivers. The upper layer implements RFC1490 FR specification, and uses the
Data Link Connection Identifier (DLCI) as its hardware address. Usually these
are assigned by your network supplier, they give you the number/numbers of
@@ -7,18 +13,18 @@ the Virtual Connections (VC) assigned to you.
Each DLCI is a point-to-point link between your machine and a remote one.
As such, a separate device is needed to accommodate the routing. Within the
net-tools archives is 'dlcicfg'. This program will communicate with the
-base "DLCI" device, and create new net devices named 'dlci00', 'dlci01'...
+base "DLCI" device, and create new net devices named 'dlci00', 'dlci01'...
The configuration script will ask you how many DLCIs you need, as well as
how many DLCIs you want to assign to each Frame Relay Access Device (FRAD).
The DLCI uses a number of function calls to communicate with the FRAD, all
-of which are stored in the FRAD's private data area. assoc/deassoc,
+of which are stored in the FRAD's private data area. assoc/deassoc,
activate/deactivate and dlci_config. The DLCI supplies a receive function
to the FRAD to accept incoming packets.
With this initial offering, only 1 FRAD driver is available. With many thanks
-to Sangoma Technologies, David Mandelstam & Gene Kozin, the S502A, S502E &
-S508 are supported. This driver is currently set up for only FR, but as
+to Sangoma Technologies, David Mandelstam & Gene Kozin, the S502A, S502E &
+S508 are supported. This driver is currently set up for only FR, but as
Sangoma makes more firmware modules available, it can be updated to provide
them as well.
@@ -32,8 +38,7 @@ an initial configuration.
Additional FRAD device drivers can be added as hardware is available.
At this time, the dlcicfg and fradcfg programs have not been incorporated into
-the net-tools distribution. They can be found at ftp.invlogic.com, in
+the net-tools distribution. They can be found at ftp.invlogic.com, in
/pub/linux. Note that with OS/2 FTPD, you end up in /pub by default, so just
-use 'cd linux'. v0.10 is for use on pre-2.0.3 and earlier, v0.15 is for
+use 'cd linux'. v0.10 is for use on pre-2.0.3 and earlier, v0.15 is for
pre-2.0.4 and later.
-
diff --git a/Documentation/networking/gen_stats.txt b/Documentation/networking/gen_stats.rst
index 179b18ce45ff..595a83b9a61b 100644
--- a/Documentation/networking/gen_stats.txt
+++ b/Documentation/networking/gen_stats.rst
@@ -1,67 +1,76 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================================
Generic networking statistics for netlink users
-======================================================================
+===============================================
Statistic counters are grouped into structs:
+==================== ===================== =====================
Struct TLV type Description
-----------------------------------------------------------------------
+==================== ===================== =====================
gnet_stats_basic TCA_STATS_BASIC Basic statistics
gnet_stats_rate_est TCA_STATS_RATE_EST Rate estimator
gnet_stats_queue TCA_STATS_QUEUE Queue statistics
none TCA_STATS_APP Application specific
+==================== ===================== =====================
Collecting:
-----------
-Declare the statistic structs you need:
-struct mystruct {
- struct gnet_stats_basic bstats;
- struct gnet_stats_queue qstats;
- ...
-};
+Declare the statistic structs you need::
+
+ struct mystruct {
+ struct gnet_stats_basic bstats;
+ struct gnet_stats_queue qstats;
+ ...
+ };
+
+Update statistics, in dequeue() methods only, (while owning qdisc->running)::
-Update statistics, in dequeue() methods only, (while owning qdisc->running)
-mystruct->tstats.packet++;
-mystruct->qstats.backlog += skb->pkt_len;
+ mystruct->tstats.packet++;
+ mystruct->qstats.backlog += skb->pkt_len;
Export to userspace (Dump):
---------------------------
-my_dumping_routine(struct sk_buff *skb, ...)
-{
- struct gnet_dump dump;
+::
- if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump,
- TCA_PAD) < 0)
- goto rtattr_failure;
+ my_dumping_routine(struct sk_buff *skb, ...)
+ {
+ struct gnet_dump dump;
- if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 ||
- gnet_stats_copy_queue(&dump, &mystruct->qstats) < 0 ||
- gnet_stats_copy_app(&dump, &xstats, sizeof(xstats)) < 0)
- goto rtattr_failure;
+ if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump,
+ TCA_PAD) < 0)
+ goto rtattr_failure;
- if (gnet_stats_finish_copy(&dump) < 0)
- goto rtattr_failure;
- ...
-}
+ if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 ||
+ gnet_stats_copy_queue(&dump, &mystruct->qstats) < 0 ||
+ gnet_stats_copy_app(&dump, &xstats, sizeof(xstats)) < 0)
+ goto rtattr_failure;
+
+ if (gnet_stats_finish_copy(&dump) < 0)
+ goto rtattr_failure;
+ ...
+ }
TCA_STATS/TCA_XSTATS backward compatibility:
--------------------------------------------
Prior users of struct tc_stats and xstats can maintain backward
compatibility by calling the compat wrappers to keep providing the
-existing TLV types.
+existing TLV types::
-my_dumping_routine(struct sk_buff *skb, ...)
-{
- if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS,
- TCA_XSTATS, &mystruct->lock, &dump,
- TCA_PAD) < 0)
- goto rtattr_failure;
- ...
-}
+ my_dumping_routine(struct sk_buff *skb, ...)
+ {
+ if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS,
+ TCA_XSTATS, &mystruct->lock, &dump,
+ TCA_PAD) < 0)
+ goto rtattr_failure;
+ ...
+ }
A struct tc_stats will be filled out during gnet_stats_copy_* calls
and appended to the skb. TCA_XSTATS is provided if gnet_stats_copy_app
@@ -77,7 +86,7 @@ are responsible for making sure that the lock is initialized.
Rate Estimator:
---------------
+---------------
0) Prepare an estimator attribute. Most likely this would be in user
space. The value of this TLV should contain a tc_estimator structure.
@@ -92,18 +101,19 @@ Rate Estimator:
TCA_RATE to your code in the kernel.
In the kernel when setting up:
+
1) make sure you have basic stats and rate stats setup first.
2) make sure you have initialized stats lock that is used to setup such
stats.
-3) Now initialize a new estimator:
+3) Now initialize a new estimator::
- int ret = gen_new_estimator(my_basicstats,my_rate_est_stats,
- mystats_lock, attr_with_tcestimator_struct);
+ int ret = gen_new_estimator(my_basicstats,my_rate_est_stats,
+ mystats_lock, attr_with_tcestimator_struct);
- if ret == 0
- success
- else
- failed
+ if ret == 0
+ success
+ else
+ failed
From now on, every time you dump my_rate_est_stats it will contain
up-to-date info.
@@ -115,5 +125,5 @@ are still valid (i.e still exist) at the time of making this call.
Authors:
--------
-Thomas Graf <tgraf@suug.ch>
-Jamal Hadi Salim <hadi@cyberus.ca>
+- Thomas Graf <tgraf@suug.ch>
+- Jamal Hadi Salim <hadi@cyberus.ca>
diff --git a/Documentation/networking/generic-hdlc.txt b/Documentation/networking/generic-hdlc.rst
index 4eb3cc40b702..1c3bb5cb98d4 100644
--- a/Documentation/networking/generic-hdlc.txt
+++ b/Documentation/networking/generic-hdlc.rst
@@ -1,14 +1,22 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
Generic HDLC layer
+==================
+
Krzysztof Halasa <khc@pm.waw.pl>
Generic HDLC layer currently supports:
+
1. Frame Relay (ANSI, CCITT, Cisco and no LMI)
+
- Normal (routed) and Ethernet-bridged (Ethernet device emulation)
interfaces can share a single PVC.
- ARP support (no InARP support in the kernel - there is an
experimental InARP user-space daemon available on:
http://www.kernel.org/pub/linux/utils/net/hdlc/).
+
2. raw HDLC - either IP (IPv4) interface or Ethernet device emulation
3. Cisco HDLC
4. PPP
@@ -24,19 +32,24 @@ with IEEE 802.1Q (VLANs) and 802.1D (Ethernet bridging).
Make sure the hdlc.o and the hardware driver are loaded. It should
create a number of "hdlc" (hdlc0 etc) network devices, one for each
WAN port. You'll need the "sethdlc" utility, get it from:
+
http://www.kernel.org/pub/linux/utils/net/hdlc/
-Compile sethdlc.c utility:
+Compile sethdlc.c utility::
+
gcc -O2 -Wall -o sethdlc sethdlc.c
+
Make sure you're using a correct version of sethdlc for your kernel.
Use sethdlc to set physical interface, clock rate, HDLC mode used,
and add any required PVCs if using Frame Relay.
-Usually you want something like:
+Usually you want something like::
sethdlc hdlc0 clock int rate 128000
sethdlc hdlc0 cisco interval 10 timeout 25
-or
+
+or::
+
sethdlc hdlc0 rs232 clock ext
sethdlc hdlc0 fr lmi ansi
sethdlc hdlc0 create 99
@@ -49,46 +62,63 @@ any IP address to it) before using pvc devices.
Setting interface:
-* v35 | rs232 | x21 | t1 | e1 - sets physical interface for a given port
- if the card has software-selectable interfaces
- loopback - activate hardware loopback (for testing only)
-* clock ext - both RX clock and TX clock external
-* clock int - both RX clock and TX clock internal
-* clock txint - RX clock external, TX clock internal
-* clock txfromrx - RX clock external, TX clock derived from RX clock
-* rate - sets clock rate in bps (for "int" or "txint" clock only)
+* v35 | rs232 | x21 | t1 | e1
+ - sets physical interface for a given port
+ if the card has software-selectable interfaces
+ loopback
+ - activate hardware loopback (for testing only)
+* clock ext
+ - both RX clock and TX clock external
+* clock int
+ - both RX clock and TX clock internal
+* clock txint
+ - RX clock external, TX clock internal
+* clock txfromrx
+ - RX clock external, TX clock derived from RX clock
+* rate
+ - sets clock rate in bps (for "int" or "txint" clock only)
Setting protocol:
* hdlc - sets raw HDLC (IP-only) mode
+
nrz / nrzi / fm-mark / fm-space / manchester - sets transmission code
+
no-parity / crc16 / crc16-pr0 (CRC16 with preset zeros) / crc32-itu
+
crc16-itu (CRC16 with ITU-T polynomial) / crc16-itu-pr0 - sets parity
* hdlc-eth - Ethernet device emulation using HDLC. Parity and encoding
as above.
* cisco - sets Cisco HDLC mode (IP, IPv6 and IPX supported)
+
interval - time in seconds between keepalive packets
+
timeout - time in seconds after last received keepalive packet before
- we assume the link is down
+ we assume the link is down
* ppp - sets synchronous PPP mode
* x25 - sets X.25 mode
* fr - Frame Relay mode
+
lmi ansi / ccitt / cisco / none - LMI (link management) type
+
dce - Frame Relay DCE (network) side LMI instead of default DTE (user).
+
It has nothing to do with clocks!
- t391 - link integrity verification polling timer (in seconds) - user
- t392 - polling verification timer (in seconds) - network
- n391 - full status polling counter - user
- n392 - error threshold - both user and network
- n393 - monitored events count - both user and network
+
+ - t391 - link integrity verification polling timer (in seconds) - user
+ - t392 - polling verification timer (in seconds) - network
+ - n391 - full status polling counter - user
+ - n392 - error threshold - both user and network
+ - n393 - monitored events count - both user and network
Frame-Relay only:
+
* create n | delete n - adds / deletes PVC interface with DLCI #n.
Newly created interface will be named pvc0, pvc1 etc.
@@ -101,26 +131,34 @@ Frame-Relay only:
Board-specific issues
---------------------
-n2.o and c101.o need parameters to work:
+n2.o and c101.o need parameters to work::
insmod n2 hw=io,irq,ram,ports[:io,irq,...]
-example:
+
+example::
+
insmod n2 hw=0x300,10,0xD0000,01
-or
+or::
+
insmod c101 hw=irq,ram[:irq,...]
-example:
+
+example::
+
insmod c101 hw=9,0xdc000
-If built into the kernel, these drivers need kernel (command line) parameters:
+If built into the kernel, these drivers need kernel (command line) parameters::
+
n2.hw=io,irq,ram,ports:...
-or
+
+or::
+
c101.hw=irq,ram:...
If you have a problem with N2, C101 or PLX200SYN card, you can issue the
-"private" command to see port's packet descriptor rings (in kernel logs):
+"private" command to see port's packet descriptor rings (in kernel logs)::
sethdlc hdlc0 private
diff --git a/Documentation/networking/generic_netlink.txt b/Documentation/networking/generic_netlink.rst
index 3e071115ca90..59e04ccf80c1 100644
--- a/Documentation/networking/generic_netlink.txt
+++ b/Documentation/networking/generic_netlink.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Generic Netlink
+===============
+
A wiki document on how to use Generic Netlink can be found here:
* http://www.linuxfoundation.org/collaborate/workgroups/networking/generic_netlink_howto
diff --git a/Documentation/networking/gtp.txt b/Documentation/networking/gtp.rst
index 6966bbec1ecb..1563fb94b289 100644
--- a/Documentation/networking/gtp.txt
+++ b/Documentation/networking/gtp.rst
@@ -1,12 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
The Linux kernel GTP tunneling module
-======================================================================
-Documentation by Harald Welte <laforge@gnumonks.org> and
- Andreas Schultz <aschultz@tpip.net>
+=====================================
+
+Documentation by
+ Harald Welte <laforge@gnumonks.org> and
+ Andreas Schultz <aschultz@tpip.net>
In 'drivers/net/gtp.c' you are finding a kernel-level implementation
of a GTP tunnel endpoint.
-== What is GTP ==
+What is GTP
+===========
GTP is the Generic Tunnel Protocol, which is a 3GPP protocol used for
tunneling User-IP payload between a mobile station (phone, modem)
@@ -41,7 +47,8 @@ publicly via the 3GPP website at http://www.3gpp.org/DynaReport/29060.htm
A direct PDF link to v13.6.0 is provided for convenience below:
http://www.etsi.org/deliver/etsi_ts/129000_129099/129060/13.06.00_60/ts_129060v130600p.pdf
-== The Linux GTP tunnelling module ==
+The Linux GTP tunnelling module
+===============================
The module implements the function of a tunnel endpoint, i.e. it is
able to decapsulate tunneled IP packets in the uplink originated by
@@ -70,7 +77,8 @@ Userspace :)
The official homepage of the module is at
https://osmocom.org/projects/linux-kernel-gtp-u/wiki
-== Userspace Programs with Linux Kernel GTP-U support ==
+Userspace Programs with Linux Kernel GTP-U support
+==================================================
At the time of this writing, there are at least two Free Software
implementations that implement GTP-C and can use the netlink interface
@@ -82,7 +90,8 @@ to make use of the Linux kernel GTP-U support:
* ergw (GGSN + P-GW in Erlang):
https://github.com/travelping/ergw
-== Userspace Library / Command Line Utilities ==
+Userspace Library / Command Line Utilities
+==========================================
There is a userspace library called 'libgtpnl' which is based on
libmnl and which implements a C-language API towards the netlink
@@ -90,7 +99,8 @@ interface provided by the Kernel GTP module:
http://git.osmocom.org/libgtpnl/
-== Protocol Versions ==
+Protocol Versions
+=================
There are two different versions of GTP-U: v0 [GSM TS 09.60] and v1
[3GPP TS 29.281]. Both are implemented in the Kernel GTP module.
@@ -105,7 +115,8 @@ doesn't implement GTP-C, we don't have to worry about this. It's the
responsibility of the control plane implementation in userspace to
implement that.
-== IPv6 ==
+IPv6
+====
The 3GPP specifications indicate either IPv4 or IPv6 can be used both
on the inner (user) IP layer, or on the outer (transport) layer.
@@ -114,22 +125,25 @@ Unfortunately, the Kernel module currently supports IPv6 neither for
the User IP payload, nor for the outer IP layer. Patches or other
Contributions to fix this are most welcome!
-== Mailing List ==
+Mailing List
+============
-If yo have questions regarding how to use the Kernel GTP module from
+If you have questions regarding how to use the Kernel GTP module from
your own software, or want to contribute to the code, please use the
osmocom-net-grps mailing list for related discussion. The list can be
reached at osmocom-net-gprs@lists.osmocom.org and the mailman
interface for managing your subscription is at
https://lists.osmocom.org/mailman/listinfo/osmocom-net-gprs
-== Issue Tracker ==
+Issue Tracker
+=============
The Osmocom project maintains an issue tracker for the Kernel GTP-U
module at
https://osmocom.org/projects/linux-kernel-gtp-u/issues
-== History / Acknowledgements ==
+History / Acknowledgements
+==========================
The Module was originally created in 2012 by Harald Welte, but never
completed. Pablo came in to finish the mess Harald left behind. But
@@ -139,9 +153,11 @@ In 2015, Andreas Schultz came to the rescue and fixed lots more bugs,
extended it with new features and finally pushed all of us to get it
mainline, where it was merged in 4.7.0.
-== Architectural Details ==
+Architectural Details
+=====================
-=== Local GTP-U entity and tunnel identification ===
+Local GTP-U entity and tunnel identification
+--------------------------------------------
GTP-U uses UDP for transporting PDU's. The receiving UDP port is 2152
for GTPv1-U and 3386 for GTPv0-U.
@@ -164,15 +180,15 @@ Therefore:
destination IP and the tunnel endpoint id. The source IP and port
have no meaning and can change at any time.
-[3GPP TS 29.281] Section 4.3.0 defines this so:
+[3GPP TS 29.281] Section 4.3.0 defines this so::
-> The TEID in the GTP-U header is used to de-multiplex traffic
-> incoming from remote tunnel endpoints so that it is delivered to the
-> User plane entities in a way that allows multiplexing of different
-> users, different packet protocols and different QoS levels.
-> Therefore no two remote GTP-U endpoints shall send traffic to a
-> GTP-U protocol entity using the same TEID value except
-> for data forwarding as part of mobility procedures.
+ The TEID in the GTP-U header is used to de-multiplex traffic
+ incoming from remote tunnel endpoints so that it is delivered to the
+ User plane entities in a way that allows multiplexing of different
+ users, different packet protocols and different QoS levels.
+ Therefore no two remote GTP-U endpoints shall send traffic to a
+ GTP-U protocol entity using the same TEID value except
+ for data forwarding as part of mobility procedures.
The definition above only defines that two remote GTP-U endpoints
*should not* send to the same TEID, it *does not* forbid or exclude
@@ -183,7 +199,8 @@ multiple or unknown peers.
Therefore, the receiving side identifies tunnels exclusively based on
TEIDs, not based on the source IP!
-== APN vs. Network Device ==
+APN vs. Network Device
+======================
The GTP-U driver creates a Linux network device for each Gi/SGi
interface.
@@ -201,29 +218,33 @@ number of Gi/SGi interfaces implemented by a GGSN/P-GW.
[3GPP TS 29.061] Section 11.3 makes it clear that the selection of a
specific Gi/SGi interfaces is made through the Access Point Name
-(APN):
-
-> 2. each private network manages its own addressing. In general this
-> will result in different private networks having overlapping
-> address ranges. A logically separate connection (e.g. an IP in IP
-> tunnel or layer 2 virtual circuit) is used between the GGSN/P-GW
-> and each private network.
->
-> In this case the IP address alone is not necessarily unique. The
-> pair of values, Access Point Name (APN) and IPv4 address and/or
-> IPv6 prefixes, is unique.
+(APN)::
+
+ 2. each private network manages its own addressing. In general this
+ will result in different private networks having overlapping
+ address ranges. A logically separate connection (e.g. an IP in IP
+ tunnel or layer 2 virtual circuit) is used between the GGSN/P-GW
+ and each private network.
+
+ In this case the IP address alone is not necessarily unique. The
+ pair of values, Access Point Name (APN) and IPv4 address and/or
+ IPv6 prefixes, is unique.
In order to support the overlapping address range use case, each APN
is mapped to a separate Gi/SGi interface (network device).
-NOTE: The Access Point Name is purely a control plane (GTP-C) concept.
-At the GTP-U level, only Tunnel Endpoint Identifiers are present in
-GTP-U packets and network devices are known
+.. note::
+
+ The Access Point Name is purely a control plane (GTP-C) concept.
+ At the GTP-U level, only Tunnel Endpoint Identifiers are present in
+ GTP-U packets and network devices are known
Therefore for a given UE the mapping in IP to PDN network is:
+
* network device + MS IP -> Peer IP + Peer TEID,
and from PDN to IP network:
+
* local GTP-U IP + TEID -> network device
Furthermore, before a received T-PDU is injected into the network
diff --git a/Documentation/networking/hinic.txt b/Documentation/networking/hinic.rst
index 989366a4039c..867ac8f4e04a 100644
--- a/Documentation/networking/hinic.txt
+++ b/Documentation/networking/hinic.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================================
Linux Kernel Driver for Huawei Intelligent NIC(HiNIC) family
============================================================
@@ -110,7 +113,7 @@ hinic_dev - de/constructs the Logical Tx and Rx Queues.
(hinic_main.c, hinic_dev.h)
-Miscellaneous:
+Miscellaneous
=============
Common functions that are used by HW and Logical Device.
diff --git a/Documentation/networking/ila.txt b/Documentation/networking/ila.rst
index a17dac9dc915..5ac0a6270b17 100644
--- a/Documentation/networking/ila.txt
+++ b/Documentation/networking/ila.rst
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
Identifier Locator Addressing (ILA)
+===================================
Introduction
@@ -26,11 +30,13 @@ The ILA protocol is described in Internet-Draft draft-herbert-intarea-ila.
ILA terminology
===============
- - Identifier A number that identifies an addressable node in the network
+ - Identifier
+ A number that identifies an addressable node in the network
independent of its location. ILA identifiers are sixty-four
bit values.
- - Locator A network prefix that routes to a physical host. Locators
+ - Locator
+ A network prefix that routes to a physical host. Locators
provide the topological location of an addressed node. ILA
locators are sixty-four bit prefixes.
@@ -51,17 +57,20 @@ ILA terminology
bits) and an identifier (low order sixty-four bits). ILA
addresses are never visible to an application.
- - ILA host An end host that is capable of performing ILA translations
+ - ILA host
+ An end host that is capable of performing ILA translations
on transmit or receive.
- - ILA router A network node that performs ILA translation and forwarding
+ - ILA router
+ A network node that performs ILA translation and forwarding
of translated packets.
- ILA forwarding cache
A type of ILA router that only maintains a working set
cache of mappings.
- - ILA node A network node capable of performing ILA translations. This
+ - ILA node
+ A network node capable of performing ILA translations. This
can be an ILA router, ILA forwarding cache, or ILA host.
@@ -82,18 +91,18 @@ Configuration and datapath for these two points of deployment is somewhat
different.
The diagram below illustrates the flow of packets through ILA as well
-as showing ILA hosts and routers.
+as showing ILA hosts and routers::
+--------+ +--------+
| Host A +-+ +--->| Host B |
| | | (2) ILA (') | |
+--------+ | ...addressed.... ( ) +--------+
- V +---+--+ . packet . +---+--+ (_)
+ V +---+--+ . packet . +---+--+ (_)
(1) SIR | | ILA |----->-------->---->| ILA | | (3) SIR
addressed +->|router| . . |router|->-+ addressed
packet +---+--+ . IPv6 . +---+--+ packet
- / . Network .
- / . . +--+-++--------+
+ / . Network .
+ / . . +--+-++--------+
+--------+ / . . |ILA || Host |
| Host +--+ . .- -|host|| |
| | . . +--+-++--------+
@@ -173,7 +182,7 @@ ILA address, never a SIR address.
In the simplest format the identifier types, C-bit, and checksum
adjustment value are not present so an identifier is considered an
-unstructured sixty-four bit value.
+unstructured sixty-four bit value::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier |
@@ -184,7 +193,7 @@ unstructured sixty-four bit value.
The checksum neutral adjustment may be configured to always be
present using neutral-map-auto. In this case there is no C-bit, but the
checksum adjustment is in the low order 16 bits. The identifier is
-still sixty-four bits.
+still sixty-four bits::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier |
@@ -193,7 +202,7 @@ still sixty-four bits.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The C-bit may used to explicitly indicate that checksum neutral
-mapping has been applied to an ILA address. The format is:
+mapping has been applied to an ILA address. The format is::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |C| Identifier |
@@ -204,7 +213,7 @@ mapping has been applied to an ILA address. The format is:
The identifier type field may be present to indicate the identifier
type. If it is not present then the type is inferred based on mapping
configuration. The checksum neutral adjustment may automatically
-used with the identifier type as illustrated below.
+used with the identifier type as illustrated below::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type| Identifier |
@@ -213,7 +222,7 @@ used with the identifier type as illustrated below.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
If the identifier type and the C-bit can be present simultaneously so
-the identifier format would be:
+the identifier format would be::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type|C| Identifier |
@@ -258,28 +267,30 @@ same meanings as described above.
Some examples
=============
-# Configure an ILA route that uses checksum neutral mapping as well
-# as type field. Note that the type field is set in the SIR address
-# (the 2000 implies type is 1 which is LUID).
-ip route add 3333:0:0:1:2000:0:1:87/128 encap ila 2001:0:87:0 \
- csum-mode neutral-map ident-type use-format
-
-# Configure an ILA LWT route that uses auto checksum neutral mapping
-# (no C-bit) and configure identifier type to be LUID so that the
-# identifier type field will not be present.
-ip route add 3333:0:0:1:2000:0:2:87/128 encap ila 2001:0:87:1 \
- csum-mode neutral-map-auto ident-type luid
-
-ila_xlat configuration
-
-# Configure an ILA to SIR mapping that matches a locator and overwrites
-# it with a SIR address (3333:0:0:1 in this example). The C-bit and
-# identifier field are used.
-ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
- csum-mode neutral-map-auto ident-type use-format
-
-# Configure an ILA to SIR mapping where checksum neutral is automatically
-# set without the C-bit and the identifier type is configured to be LUID
-# so that the identifier type field is not present.
-ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
- csum-mode neutral-map-auto ident-type use-format
+::
+
+ # Configure an ILA route that uses checksum neutral mapping as well
+ # as type field. Note that the type field is set in the SIR address
+ # (the 2000 implies type is 1 which is LUID).
+ ip route add 3333:0:0:1:2000:0:1:87/128 encap ila 2001:0:87:0 \
+ csum-mode neutral-map ident-type use-format
+
+ # Configure an ILA LWT route that uses auto checksum neutral mapping
+ # (no C-bit) and configure identifier type to be LUID so that the
+ # identifier type field will not be present.
+ ip route add 3333:0:0:1:2000:0:2:87/128 encap ila 2001:0:87:1 \
+ csum-mode neutral-map-auto ident-type luid
+
+ ila_xlat configuration
+
+ # Configure an ILA to SIR mapping that matches a locator and overwrites
+ # it with a SIR address (3333:0:0:1 in this example). The C-bit and
+ # identifier field are used.
+ ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
+ csum-mode neutral-map-auto ident-type use-format
+
+ # Configure an ILA to SIR mapping where checksum neutral is automatically
+ # set without the C-bit and the identifier type is configured to be LUID
+ # so that the identifier type field is not present.
+ ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
+ csum-mode neutral-map-auto ident-type use-format
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 6538ede29661..0186e276690a 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -15,6 +15,7 @@ Contents:
device_drivers/index
dsa/index
devlink/index
+ caif/index
ethtool-netlink
ieee802154
j1939
@@ -24,6 +25,7 @@ Contents:
failover
net_dim
net_failover
+ page_pool
phy
sfp-phylink
alias
@@ -36,6 +38,91 @@ Contents:
tls-offload
nfc
6lowpan
+ 6pack
+ altera_tse
+ arcnet-hardware
+ arcnet
+ atm
+ ax25
+ baycom
+ bonding
+ cdc_mbim
+ cops
+ cxacru
+ dccp
+ dctcp
+ decnet
+ defza
+ dns_resolver
+ driver
+ eql
+ fib_trie
+ filter
+ fore200e
+ framerelay
+ generic-hdlc
+ generic_netlink
+ gen_stats
+ gtp
+ hinic
+ ila
+ ipddp
+ ip_dynaddr
+ iphase
+ ipsec
+ ip-sysctl
+ ipv6
+ ipvlan
+ ipvs-sysctl
+ kcm
+ l2tp
+ lapb-module
+ ltpc
+ mac80211-injection
+ mpls-sysctl
+ multiqueue
+ netconsole
+ netdev-features
+ netdevices
+ netfilter-sysctl
+ netif-msg
+ nf_conntrack-sysctl
+ nf_flowtable
+ openvswitch
+ operstates
+ packet_mmap
+ phonet
+ pktgen
+ plip
+ ppp_generic
+ proc_net_tcp
+ radiotap-headers
+ ray_cs
+ rds
+ regulatory
+ rxrpc
+ sctp
+ secid
+ seg6-sysctl
+ skfp
+ strparser
+ switchdev
+ tc-actions-env-rules
+ tcp-thin
+ team
+ timestamping
+ tproxy
+ tuntap
+ udplite
+ vrf
+ vxlan
+ x25-iface
+ x25
+ xfrm_device
+ xfrm_proc
+ xfrm_sync
+ xfrm_sysctl
+ z8530drv
.. only:: subproject and html
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.rst
index 6fcfd313dbe4..b72f89d5694c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1,8 +1,15 @@
-/proc/sys/net/ipv4/* Variables:
+.. SPDX-License-Identifier: GPL-2.0
+
+=========
+IP Sysctl
+=========
+
+/proc/sys/net/ipv4/* Variables
+==============================
ip_forward - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
Forward Packets between interfaces.
@@ -38,6 +45,7 @@ ip_no_pmtu_disc - INTEGER
could break other protocols.
Possible values: 0-3
+
Default: FALSE
min_pmtu - INTEGER
@@ -51,16 +59,20 @@ ip_forward_use_pmtu - BOOLEAN
which tries to discover path mtus by itself and depends on the
kernel honoring this information. This is normally not the
case.
+
Default: 0 (disabled)
+
Possible values:
- 0 - disabled
- 1 - enabled
+
+ - 0 - disabled
+ - 1 - enabled
fwmark_reflect - BOOLEAN
Controls the fwmark of kernel-generated IPv4 reply packets that are not
associated with a socket for example, TCP RSTs or ICMP echo replies).
If unset, these packets have a fwmark of zero. If set, they have the
fwmark of the packet they are replying to.
+
Default: 0
fib_multipath_use_neigh - BOOLEAN
@@ -68,63 +80,80 @@ fib_multipath_use_neigh - BOOLEAN
multipath routes. If disabled, neighbor information is not used and
packets could be directed to a failed nexthop. Only valid for kernels
built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+
Default: 0 (disabled)
+
Possible values:
- 0 - disabled
- 1 - enabled
+
+ - 0 - disabled
+ - 1 - enabled
fib_multipath_hash_policy - INTEGER
Controls which hash policy to use for multipath routes. Only valid
for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+
Default: 0 (Layer 3)
+
Possible values:
- 0 - Layer 3
- 1 - Layer 4
- 2 - Layer 3 or inner Layer 3 if present
+
+ - 0 - Layer 3
+ - 1 - Layer 4
+ - 2 - Layer 3 or inner Layer 3 if present
fib_sync_mem - UNSIGNED INTEGER
Amount of dirty memory from fib entries that can be backlogged before
synchronize_rcu is forced.
- Default: 512kB Minimum: 64kB Maximum: 64MB
+
+ Default: 512kB Minimum: 64kB Maximum: 64MB
ip_forward_update_priority - INTEGER
Whether to update SKB priority from "TOS" field in IPv4 header after it
is forwarded. The new SKB priority is mapped from TOS field value
according to an rt_tos2priority table (see e.g. man tc-prio).
+
Default: 1 (Update priority.)
+
Possible values:
- 0 - Do not update priority.
- 1 - Update priority.
+
+ - 0 - Do not update priority.
+ - 1 - Update priority.
route/max_size - INTEGER
Maximum number of routes allowed in the kernel. Increase
this when using large numbers of interfaces and/or routes.
+
From linux kernel 3.6 onwards, this is deprecated for ipv4
as route cache is no longer used.
neigh/default/gc_thresh1 - INTEGER
Minimum number of entries to keep. Garbage collector will not
purge entries if there are fewer than this number.
+
Default: 128
neigh/default/gc_thresh2 - INTEGER
Threshold when garbage collector becomes more aggressive about
purging entries. Entries older than 5 seconds will be cleared
when over this number.
+
Default: 512
neigh/default/gc_thresh3 - INTEGER
Maximum number of non-PERMANENT neighbor entries allowed. Increase
this when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers.
+
Default: 1024
neigh/default/unres_qlen_bytes - INTEGER
The maximum number of bytes which may be used by packets
queued for each unresolved address by other network layers.
(added in linux 3.3)
+
Setting negative value is meaningless and will return error.
+
Default: SK_WMEM_MAX, (same as net.core.wmem_default).
+
Exact value depends on architecture and kernel options,
but should be enough to allow queuing 256 packets
of medium size.
@@ -132,11 +161,14 @@ neigh/default/unres_qlen_bytes - INTEGER
neigh/default/unres_qlen - INTEGER
The maximum number of packets which may be queued for each
unresolved address by other network layers.
+
(deprecated in linux 3.3) : use unres_qlen_bytes instead.
+
Prior to linux 3.3, the default value is 3 which may cause
unexpected packet loss. The current default value is calculated
according to default value of unres_qlen_bytes and true size of
packet.
+
Default: 101
mtu_expires - INTEGER
@@ -183,7 +215,8 @@ ipfrag_max_dist - INTEGER
from different IP datagrams, which could result in data corruption.
Default: 64
-INET peer storage:
+INET peer storage
+=================
inet_peer_threshold - INTEGER
The approximate size of the storage. Starting from this threshold
@@ -203,7 +236,8 @@ inet_peer_maxttl - INTEGER
when the number of entries in the pool is very small).
Measured in seconds.
-TCP variables:
+TCP variables
+=============
somaxconn - INTEGER
Limit of socket listen() backlog, known in userspace as SOMAXCONN.
@@ -222,18 +256,22 @@ tcp_adv_win_scale - INTEGER
Count buffering overhead as bytes/2^tcp_adv_win_scale
(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
if it is <= 0.
+
Possible values are [-31, 31], inclusive.
+
Default: 1
tcp_allowed_congestion_control - STRING
Show/set the congestion control choices available to non-privileged
processes. The list is a subset of those listed in
tcp_available_congestion_control.
+
Default is "reno" and the default setting (tcp_congestion_control).
tcp_app_win - INTEGER
Reserve max(window/2^tcp_app_win, mss) of window for application
buffer. Value 0 is special, it means that nothing is reserved.
+
Default: 31
tcp_autocorking - BOOLEAN
@@ -244,6 +282,7 @@ tcp_autocorking - BOOLEAN
packet for the flow is waiting in Qdisc queues or device transmit
queue. Applications can still use TCP_CORK for optimal behavior
when they know how/when to uncork their sockets.
+
Default : 1
tcp_available_congestion_control - STRING
@@ -265,6 +304,7 @@ tcp_mtu_probe_floor - INTEGER
tcp_min_snd_mss - INTEGER
TCP SYN and SYNACK messages usually advertise an ADVMSS option,
as described in RFC 1122 and RFC 6691.
+
If this ADVMSS option is smaller than tcp_min_snd_mss,
it is silently capped to tcp_min_snd_mss.
@@ -277,6 +317,7 @@ tcp_congestion_control - STRING
Default is set as part of kernel configuration.
For passive connections, the listener congestion control choice
is inherited.
+
[see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ]
tcp_dsack - BOOLEAN
@@ -286,9 +327,12 @@ tcp_early_retrans - INTEGER
Tail loss probe (TLP) converts RTOs occurring due to tail
losses into fast recovery (draft-ietf-tcpm-rack). Note that
TLP requires RACK to function properly (see tcp_recovery below)
+
Possible values:
- 0 disables TLP
- 3 or 4 enables TLP
+
+ - 0 disables TLP
+ - 3 or 4 enables TLP
+
Default: 3
tcp_ecn - INTEGER
@@ -297,12 +341,17 @@ tcp_ecn - INTEGER
support for it. This feature is useful in avoiding losses due
to congestion by allowing supporting routers to signal
congestion before having to drop packets.
+
Possible values are:
- 0 Disable ECN. Neither initiate nor accept ECN.
- 1 Enable ECN when requested by incoming connections and
- also request ECN on outgoing connection attempts.
- 2 Enable ECN when requested by incoming connections
- but do not request ECN on outgoing connections.
+
+ = =====================================================
+ 0 Disable ECN. Neither initiate nor accept ECN.
+ 1 Enable ECN when requested by incoming connections and
+ also request ECN on outgoing connection attempts.
+ 2 Enable ECN when requested by incoming connections
+ but do not request ECN on outgoing connections.
+ = =====================================================
+
Default: 2
tcp_ecn_fallback - BOOLEAN
@@ -312,6 +361,7 @@ tcp_ecn_fallback - BOOLEAN
additional detection mechanisms could be implemented under this
knob. The value is not used, if tcp_ecn or per route (or congestion
control) ECN settings are disabled.
+
Default: 1 (fallback enabled)
tcp_fack - BOOLEAN
@@ -324,7 +374,9 @@ tcp_fin_timeout - INTEGER
valid "receive only" state for an un-orphaned connection, an
orphaned connection in FIN_WAIT_2 state could otherwise wait
forever for the remote to close its end of the connection.
+
Cf. tcp_max_orphans
+
Default: 60 seconds
tcp_frto - INTEGER
@@ -390,7 +442,8 @@ tcp_l3mdev_accept - BOOLEAN
derived from the listen socket to be bound to the L3 domain in
which the packets originated. Only valid when the kernel was
compiled with CONFIG_NET_L3_MASTER_DEV.
- Default: 0 (disabled)
+
+ Default: 0 (disabled)
tcp_low_latency - BOOLEAN
This is a legacy option, it has no effect anymore.
@@ -410,10 +463,14 @@ tcp_max_orphans - INTEGER
tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests (SYN_RECV),
which have not received an acknowledgment from connecting client.
+
This is a per-listener limit.
+
The minimal value is 128 for low memory machines, and it will
increase in proportion to the memory of machine.
+
If server suffers from overload, try increasing this number.
+
Remember to also check /proc/sys/net/core/somaxconn
A SYN_RECV request socket consumes about 304 bytes of memory.
@@ -445,7 +502,9 @@ tcp_min_rtt_wlen - INTEGER
minimum RTT when it is moved to a longer path (e.g., due to traffic
engineering). A longer window makes the filter more resistant to RTT
inflations such as transient congestion. The unit is seconds.
+
Possible values: 0 - 86400 (1 day)
+
Default: 300
tcp_moderate_rcvbuf - BOOLEAN
@@ -457,9 +516,10 @@ tcp_moderate_rcvbuf - BOOLEAN
tcp_mtu_probing - INTEGER
Controls TCP Packetization-Layer Path MTU Discovery. Takes three
values:
- 0 - Disabled
- 1 - Disabled by default, enabled when an ICMP black hole detected
- 2 - Always enabled, use initial MSS of tcp_base_mss.
+
+ - 0 - Disabled
+ - 1 - Disabled by default, enabled when an ICMP black hole detected
+ - 2 - Always enabled, use initial MSS of tcp_base_mss.
tcp_probe_interval - UNSIGNED INTEGER
Controls how often to start TCP Packetization-Layer Path MTU
@@ -481,6 +541,7 @@ tcp_no_metrics_save - BOOLEAN
tcp_no_ssthresh_metrics_save - BOOLEAN
Controls whether TCP saves ssthresh metrics in the route cache.
+
Default is 1, which disables ssthresh metrics.
tcp_orphan_retries - INTEGER
@@ -489,6 +550,7 @@ tcp_orphan_retries - INTEGER
See tcp_retries2 for more details.
The default value is 8.
+
If your machine is a loaded WEB server,
you should think about lowering this value, such sockets
may consume significant resources. Cf. tcp_max_orphans.
@@ -497,11 +559,15 @@ tcp_recovery - INTEGER
This value is a bitmap to enable various experimental loss recovery
features.
- RACK: 0x1 enables the RACK loss detection for fast detection of lost
- retransmissions and tail drops. It also subsumes and disables
- RFC6675 recovery for SACK connections.
- RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
- RACK: 0x4 disables RACK's DUPACK threshold heuristic
+ ========= =============================================================
+ RACK: 0x1 enables the RACK loss detection for fast detection of lost
+ retransmissions and tail drops. It also subsumes and disables
+ RFC6675 recovery for SACK connections.
+
+ RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
+
+ RACK: 0x4 disables RACK's DUPACK threshold heuristic
+ ========= =============================================================
Default: 0x1
@@ -509,12 +575,14 @@ tcp_reordering - INTEGER
Initial reordering level of packets in a TCP stream.
TCP stack can then dynamically adjust flow reordering level
between this initial value and tcp_max_reordering
+
Default: 3
tcp_max_reordering - INTEGER
Maximal reordering level of packets in a TCP stream.
300 is a fairly conservative value, but you might increase it
if paths are using per packet load balancing (like bonding rr mode)
+
Default: 300
tcp_retrans_collapse - BOOLEAN
@@ -550,12 +618,14 @@ tcp_rfc1337 - BOOLEAN
If set, the TCP stack behaves conforming to RFC1337. If unset,
we are not conforming to RFC, but prevent TCP TIME_WAIT
assassination.
+
Default: 0
tcp_rmem - vector of 3 INTEGERs: min, default, max
min: Minimal size of receive buffer used by TCP sockets.
It is guaranteed to each TCP socket, even under moderate memory
pressure.
+
Default: 4K
default: initial size of receive buffer used by TCP sockets.
@@ -581,6 +651,14 @@ tcp_comp_sack_delay_ns - LONG INTEGER
Default : 1,000,000 ns (1 ms)
+tcp_comp_sack_slack_ns - LONG INTEGER
+ This sysctl control the slack used when arming the
+ timer used by SACK compression. This gives extra time
+ for small RTT flows, and reduces system overhead by allowing
+ opportunistic reduction of timer interrupts.
+
+ Default : 100,000 ns (100 us)
+
tcp_comp_sack_nr - INTEGER
Max number of SACK that can be compressed.
Using 0 disables SACK compression.
@@ -592,12 +670,14 @@ tcp_slow_start_after_idle - BOOLEAN
window after an idle period. An idle period is defined at
the current RTO. If unset, the congestion window will not
be timed out after an idle period.
+
Default: 1
tcp_stdurg - BOOLEAN
Use the Host requirements interpretation of the TCP urgent pointer field.
Most hosts use the older BSD interpretation, so if you turn this on
Linux might not communicate correctly with them.
+
Default: FALSE
tcp_synack_retries - INTEGER
@@ -646,15 +726,18 @@ tcp_fastopen - INTEGER
the option value being the length of the syn-data backlog.
The values (bitmap) are
- 0x1: (client) enables sending data in the opening SYN on the client.
- 0x2: (server) enables the server support, i.e., allowing data in
+
+ ===== ======== ======================================================
+ 0x1 (client) enables sending data in the opening SYN on the client.
+ 0x2 (server) enables the server support, i.e., allowing data in
a SYN packet to be accepted and passed to the
application before 3-way handshake finishes.
- 0x4: (client) send data in the opening SYN regardless of cookie
+ 0x4 (client) send data in the opening SYN regardless of cookie
availability and without a cookie option.
- 0x200: (server) accept data-in-SYN w/o any cookie option present.
- 0x400: (server) enable all listeners to support Fast Open by
+ 0x200 (server) accept data-in-SYN w/o any cookie option present.
+ 0x400 (server) enable all listeners to support Fast Open by
default without explicit TCP_FASTOPEN socket option.
+ ===== ======== ======================================================
Default: 0x1
@@ -668,6 +751,7 @@ tcp_fastopen_blackhole_timeout_sec - INTEGER
get detected right after Fastopen is re-enabled and will reset to
initial value when the blackhole issue goes away.
0 to disable the blackhole detection.
+
By default, it is set to 1hr.
tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs
@@ -698,20 +782,24 @@ tcp_syn_retries - INTEGER
for an active TCP connection attempt will happen after 127seconds.
tcp_timestamps - INTEGER
-Enable timestamps as defined in RFC1323.
- 0: Disabled.
- 1: Enable timestamps as defined in RFC1323 and use random offset for
- each connection rather than only using the current time.
- 2: Like 1, but without random offsets.
+ Enable timestamps as defined in RFC1323.
+
+ - 0: Disabled.
+ - 1: Enable timestamps as defined in RFC1323 and use random offset for
+ each connection rather than only using the current time.
+ - 2: Like 1, but without random offsets.
+
Default: 1
tcp_min_tso_segs - INTEGER
Minimal number of segments per TSO frame.
+
Since linux-3.12, TCP does an automatic sizing of TSO frames,
depending on flow rate, instead of filling 64Kbytes packets.
For specific usages, it's possible to force TCP to build big
TSO frames. Note that TCP stack might split too big TSO packets
if available window is too small.
+
Default: 2
tcp_pacing_ss_ratio - INTEGER
@@ -720,6 +808,7 @@ tcp_pacing_ss_ratio - INTEGER
If TCP is in slow start, tcp_pacing_ss_ratio is applied
to let TCP probe for bigger speeds, assuming cwnd can be
doubled every other RTT.
+
Default: 200
tcp_pacing_ca_ratio - INTEGER
@@ -727,6 +816,7 @@ tcp_pacing_ca_ratio - INTEGER
to current rate. (current_rate = cwnd * mss / srtt)
If TCP is in congestion avoidance phase, tcp_pacing_ca_ratio
is applied to conservatively probe for bigger throughput.
+
Default: 120
tcp_tso_win_divisor - INTEGER
@@ -734,16 +824,20 @@ tcp_tso_win_divisor - INTEGER
can be consumed by a single TSO frame.
The setting of this parameter is a choice between burstiness and
building larger TSO frames.
+
Default: 3
tcp_tw_reuse - INTEGER
Enable reuse of TIME-WAIT sockets for new connections when it is
safe from protocol viewpoint.
- 0 - disable
- 1 - global enable
- 2 - enable for loopback traffic only
+
+ - 0 - disable
+ - 1 - global enable
+ - 2 - enable for loopback traffic only
+
It should not be changed without advice/request of technical
experts.
+
Default: 2
tcp_window_scaling - BOOLEAN
@@ -752,11 +846,14 @@ tcp_window_scaling - BOOLEAN
tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets.
Each TCP socket has rights to use it due to fact of its birth.
+
Default: 4K
default: initial size of send buffer used by TCP sockets. This
value overrides net.core.wmem_default used by other protocols.
+
It is usually lower than net.core.wmem_default.
+
Default: 16K
max: Maximal amount of memory allowed for automatically tuned
@@ -764,6 +861,7 @@ tcp_wmem - vector of 3 INTEGERs: min, default, max
net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
automatic tuning of that socket's send buffer size, in which case
this value is ignored.
+
Default: between 64K and 4MB, depending on RAM size.
tcp_notsent_lowat - UNSIGNED INTEGER
@@ -784,6 +882,7 @@ tcp_workaround_signed_windows - BOOLEAN
remote TCP is broken and treats the window as a signed quantity.
If unset, assume the remote TCP is not broken even if we do
not receive a window scaling option from them.
+
Default: 0
tcp_thin_linear_timeouts - BOOLEAN
@@ -795,7 +894,8 @@ tcp_thin_linear_timeouts - BOOLEAN
initiated. This improves retransmission latency for
non-aggressive thin streams, often found to be time-dependent.
For more information on thin streams, see
- Documentation/networking/tcp-thin.txt
+ Documentation/networking/tcp-thin.rst
+
Default: 0
tcp_limit_output_bytes - INTEGER
@@ -807,6 +907,7 @@ tcp_limit_output_bytes - INTEGER
flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes
limits the number of bytes on qdisc or device to reduce artificial
RTT/cwnd and reduce bufferbloat.
+
Default: 1048576 (16 * 65536)
tcp_challenge_ack_limit - INTEGER
@@ -822,7 +923,8 @@ tcp_rx_skb_cache - BOOLEAN
Default: 0 (disabled)
-UDP variables:
+UDP variables
+=============
udp_l3mdev_accept - BOOLEAN
Enabling this option allows a "global" bound socket to work
@@ -830,7 +932,8 @@ udp_l3mdev_accept - BOOLEAN
being received regardless of the L3 domain in which they
originated. Only valid when the kernel was compiled with
CONFIG_NET_L3_MASTER_DEV.
- Default: 0 (disabled)
+
+ Default: 0 (disabled)
udp_mem - vector of 3 INTEGERs: min, pressure, max
Number of pages allowed for queueing by all UDP sockets.
@@ -849,15 +952,18 @@ udp_rmem_min - INTEGER
Minimal size of receive buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for receiving data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
+
Default: 4K
udp_wmem_min - INTEGER
Minimal size of send buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for sending data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
+
Default: 4K
-RAW variables:
+RAW variables
+=============
raw_l3mdev_accept - BOOLEAN
Enabling this option allows a "global" bound socket to work
@@ -865,9 +971,11 @@ raw_l3mdev_accept - BOOLEAN
being received regardless of the L3 domain in which they
originated. Only valid when the kernel was compiled with
CONFIG_NET_L3_MASTER_DEV.
+
Default: 1 (enabled)
-CIPSOv4 Variables:
+CIPSOv4 Variables
+=================
cipso_cache_enable - BOOLEAN
If set, enable additions to and lookups from the CIPSO label mapping
@@ -875,6 +983,7 @@ cipso_cache_enable - BOOLEAN
miss. However, regardless of the setting the cache is still
invalidated when required when means you can safely toggle this on and
off and the cache will always be "safe".
+
Default: 1
cipso_cache_bucket_size - INTEGER
@@ -884,6 +993,7 @@ cipso_cache_bucket_size - INTEGER
more CIPSO label mappings that can be cached. When the number of
entries in a given hash bucket reaches this limit adding new entries
causes the oldest entry in the bucket to be removed to make room.
+
Default: 10
cipso_rbm_optfmt - BOOLEAN
@@ -891,6 +1001,7 @@ cipso_rbm_optfmt - BOOLEAN
the CIPSO draft specification (see Documentation/netlabel for details).
This means that when set the CIPSO tag will be padded with empty
categories in order to make the packet data 32-bit aligned.
+
Default: 0
cipso_rbm_structvalid - BOOLEAN
@@ -900,9 +1011,11 @@ cipso_rbm_structvalid - BOOLEAN
where in the CIPSO processing code but setting this to 0 (False) should
result in less work (i.e. it should be faster) but could cause problems
with other implementations that require strict checking.
+
Default: 0
-IP Variables:
+IP Variables
+============
ip_local_port_range - 2 INTEGERS
Defines the local port range that is used by TCP and UDP to
@@ -931,12 +1044,12 @@ ip_local_reserved_ports - list of comma separated ranges
assignments.
You can reserve ports which are not in the current
- ip_local_port_range, e.g.:
+ ip_local_port_range, e.g.::
- $ cat /proc/sys/net/ipv4/ip_local_port_range
- 32000 60999
- $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
- 8080,9148
+ $ cat /proc/sys/net/ipv4/ip_local_port_range
+ 32000 60999
+ $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
+ 8080,9148
although this is redundant. However such a setting is useful
if later the port range is changed to a value that will
@@ -956,6 +1069,7 @@ ip_unprivileged_port_start - INTEGER
ip_nonlocal_bind - BOOLEAN
If set, allows processes to bind() to non-local IP addresses,
which can be quite useful - but may break some applications.
+
Default: 0
ip_autobind_reuse - BOOLEAN
@@ -972,6 +1086,7 @@ ip_dynaddr - BOOLEAN
If set to a non-zero value larger than 1, a kernel log
message will be printed when dynamic address rewriting
occurs.
+
Default: 0
ip_early_demux - BOOLEAN
@@ -981,25 +1096,37 @@ ip_early_demux - BOOLEAN
It may add an additional cost for pure routing workloads that
reduces overall throughput, in such case you should disable it.
+
Default: 1
+ping_group_range - 2 INTEGERS
+ Restrict ICMP_PROTO datagram sockets to users in the group range.
+ The default is "1 0", meaning, that nobody (not even root) may
+ create ping sockets. Setting it to "100 100" would grant permissions
+ to the single group. "0 4294967295" would enable it for the world, "100
+ 4294967295" would enable it for the users, but not daemons.
+
tcp_early_demux - BOOLEAN
Enable early demux for established TCP sockets.
+
Default: 1
udp_early_demux - BOOLEAN
Enable early demux for connected UDP sockets. Disable this if
your system could experience more unconnected load.
+
Default: 1
icmp_echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it.
+
Default: 0
icmp_echo_ignore_broadcasts - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO and
TIMESTAMP requests sent to it via broadcast/multicast.
+
Default: 1
icmp_ratelimit - INTEGER
@@ -1009,46 +1136,55 @@ icmp_ratelimit - INTEGER
otherwise the minimal space between responses in milliseconds.
Note that another sysctl, icmp_msgs_per_sec limits the number
of ICMP packets sent on all targets.
+
Default: 1000
icmp_msgs_per_sec - INTEGER
Limit maximal number of ICMP packets sent per second from this host.
Only messages whose type matches icmp_ratemask (see below) are
controlled by this limit.
+
Default: 1000
icmp_msgs_burst - INTEGER
icmp_msgs_per_sec controls number of ICMP packets sent per second,
while icmp_msgs_burst controls the burst size of these packets.
+
Default: 50
icmp_ratemask - INTEGER
Mask made of ICMP types for which rates are being limited.
+
Significant bits: IHGFEDCBA9876543210
+
Default mask: 0000001100000011000 (6168)
Bit definitions (see include/linux/icmp.h):
+
+ = =========================
0 Echo Reply
- 3 Destination Unreachable *
- 4 Source Quench *
+ 3 Destination Unreachable [1]_
+ 4 Source Quench [1]_
5 Redirect
8 Echo Request
- B Time Exceeded *
- C Parameter Problem *
+ B Time Exceeded [1]_
+ C Parameter Problem [1]_
D Timestamp Request
E Timestamp Reply
F Info Request
G Info Reply
H Address Mask Request
I Address Mask Reply
+ = =========================
- * These are rate limited by default (see default mask above)
+ .. [1] These are rate limited by default (see default mask above)
icmp_ignore_bogus_error_responses - BOOLEAN
Some routers violate RFC1122 by sending bogus responses to broadcast
frames. Such violations are normally logged via a kernel warning.
If this is set to TRUE, the kernel will not give such warnings, which
will avoid log file clutter.
+
Default: 1
icmp_errors_use_inbound_ifaddr - BOOLEAN
@@ -1093,32 +1229,39 @@ igmp_max_memberships - INTEGER
igmp_max_msf - INTEGER
Maximum number of addresses allowed in the source filter list for a
multicast group.
+
Default: 10
igmp_qrv - INTEGER
Controls the IGMP query robustness variable (see RFC2236 8.1).
+
Default: 2 (as specified by RFC2236 8.1)
+
Minimum: 1 (as specified by RFC6636 4.5)
force_igmp_version - INTEGER
- 0 - (default) No enforcement of a IGMP version, IGMPv1/v2 fallback
- allowed. Will back to IGMPv3 mode again if all IGMPv1/v2 Querier
- Present timer expires.
- 1 - Enforce to use IGMP version 1. Will also reply IGMPv1 report if
- receive IGMPv2/v3 query.
- 2 - Enforce to use IGMP version 2. Will fallback to IGMPv1 if receive
- IGMPv1 query message. Will reply report if receive IGMPv3 query.
- 3 - Enforce to use IGMP version 3. The same react with default 0.
+ - 0 - (default) No enforcement of a IGMP version, IGMPv1/v2 fallback
+ allowed. Will back to IGMPv3 mode again if all IGMPv1/v2 Querier
+ Present timer expires.
+ - 1 - Enforce to use IGMP version 1. Will also reply IGMPv1 report if
+ receive IGMPv2/v3 query.
+ - 2 - Enforce to use IGMP version 2. Will fallback to IGMPv1 if receive
+ IGMPv1 query message. Will reply report if receive IGMPv3 query.
+ - 3 - Enforce to use IGMP version 3. The same react with default 0.
+
+ .. note::
- Note: this is not the same with force_mld_version because IGMPv3 RFC3376
- Security Considerations does not have clear description that we could
- ignore other version messages completely as MLDv2 RFC3810. So make
- this value as default 0 is recommended.
+ this is not the same with force_mld_version because IGMPv3 RFC3376
+ Security Considerations does not have clear description that we could
+ ignore other version messages completely as MLDv2 RFC3810. So make
+ this value as default 0 is recommended.
-conf/interface/* changes special settings per interface (where
-"interface" is the name of your network interface)
+``conf/interface/*``
+ changes special settings per interface (where
+ interface" is the name of your network interface)
-conf/all/* is special, changes the settings for all interfaces
+``conf/all/*``
+ is special, changes the settings for all interfaces
log_martians - BOOLEAN
Log packets with impossible addresses to kernel log.
@@ -1129,14 +1272,21 @@ log_martians - BOOLEAN
accept_redirects - BOOLEAN
Accept ICMP redirect messages.
accept_redirects for the interface will be enabled if:
+
- both conf/{all,interface}/accept_redirects are TRUE in the case
forwarding for the interface is enabled
+
or
+
- at least one of conf/{all,interface}/accept_redirects is TRUE in the
case forwarding for the interface is disabled
+
accept_redirects for the interface will be disabled otherwise
- default TRUE (host)
- FALSE (router)
+
+ default:
+
+ - TRUE (host)
+ - FALSE (router)
forwarding - BOOLEAN
Enable IP forwarding on this interface. This controls whether packets
@@ -1161,12 +1311,14 @@ medium_id - INTEGER
proxy_arp - BOOLEAN
Do proxy arp.
+
proxy_arp for the interface will be enabled if at least one of
conf/{all,interface}/proxy_arp is set to TRUE,
it will be disabled otherwise
proxy_arp_pvlan - BOOLEAN
Private VLAN proxy arp.
+
Basically allow proxy arp replies back to the same interface
(from which the ARP request/solicitation was received).
@@ -1179,6 +1331,7 @@ proxy_arp_pvlan - BOOLEAN
proxy_arp.
This technology is known by different names:
+
In RFC 3069 it is called VLAN Aggregation.
Cisco and Allied Telesyn call it Private VLAN.
Hewlett-Packard call it Source-Port filtering or port-isolation.
@@ -1187,26 +1340,33 @@ proxy_arp_pvlan - BOOLEAN
shared_media - BOOLEAN
Send(router) or accept(host) RFC1620 shared media redirects.
Overrides secure_redirects.
+
shared_media for the interface will be enabled if at least one of
conf/{all,interface}/shared_media is set to TRUE,
it will be disabled otherwise
+
default TRUE
secure_redirects - BOOLEAN
Accept ICMP redirect messages only to gateways listed in the
interface's current gateway list. Even if disabled, RFC1122 redirect
rules still apply.
+
Overridden by shared_media.
+
secure_redirects for the interface will be enabled if at least one of
conf/{all,interface}/secure_redirects is set to TRUE,
it will be disabled otherwise
+
default TRUE
send_redirects - BOOLEAN
Send redirects, if router.
+
send_redirects for the interface will be enabled if at least one of
conf/{all,interface}/send_redirects is set to TRUE,
it will be disabled otherwise
+
Default: TRUE
bootp_relay - BOOLEAN
@@ -1215,15 +1375,20 @@ bootp_relay - BOOLEAN
BOOTP relay daemon will catch and forward such packets.
conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay
for the interface
+
default FALSE
+
Not Implemented Yet.
accept_source_route - BOOLEAN
Accept packets with SRR option.
conf/all/accept_source_route must also be set to TRUE to accept packets
with SRR option on the interface
- default TRUE (router)
- FALSE (host)
+
+ default
+
+ - TRUE (router)
+ - FALSE (host)
accept_local - BOOLEAN
Accept packets with local source addresses. In combination with
@@ -1234,18 +1399,19 @@ accept_local - BOOLEAN
route_localnet - BOOLEAN
Do not consider loopback addresses as martian source or destination
while routing. This enables the use of 127/8 for local routing purposes.
+
default FALSE
rp_filter - INTEGER
- 0 - No source validation.
- 1 - Strict mode as defined in RFC3704 Strict Reverse Path
- Each incoming packet is tested against the FIB and if the interface
- is not the best reverse path the packet check will fail.
- By default failed packets are discarded.
- 2 - Loose mode as defined in RFC3704 Loose Reverse Path
- Each incoming packet's source address is also tested against the FIB
- and if the source address is not reachable via any interface
- the packet check will fail.
+ - 0 - No source validation.
+ - 1 - Strict mode as defined in RFC3704 Strict Reverse Path
+ Each incoming packet is tested against the FIB and if the interface
+ is not the best reverse path the packet check will fail.
+ By default failed packets are discarded.
+ - 2 - Loose mode as defined in RFC3704 Loose Reverse Path
+ Each incoming packet's source address is also tested against the FIB
+ and if the source address is not reachable via any interface
+ the packet check will fail.
Current recommended practice in RFC3704 is to enable strict mode
to prevent IP spoofing from DDos attacks. If using asymmetric routing
@@ -1258,19 +1424,19 @@ rp_filter - INTEGER
in startup scripts.
arp_filter - BOOLEAN
- 1 - Allows you to have multiple network interfaces on the same
- subnet, and have the ARPs for each interface be answered
- based on whether or not the kernel would route a packet from
- the ARP'd IP out that interface (therefore you must use source
- based routing for this to work). In other words it allows control
- of which cards (usually 1) will respond to an arp request.
-
- 0 - (default) The kernel can respond to arp requests with addresses
- from other interfaces. This may seem wrong but it usually makes
- sense, because it increases the chance of successful communication.
- IP addresses are owned by the complete host on Linux, not by
- particular interfaces. Only for more complex setups like load-
- balancing, does this behaviour cause problems.
+ - 1 - Allows you to have multiple network interfaces on the same
+ subnet, and have the ARPs for each interface be answered
+ based on whether or not the kernel would route a packet from
+ the ARP'd IP out that interface (therefore you must use source
+ based routing for this to work). In other words it allows control
+ of which cards (usually 1) will respond to an arp request.
+
+ - 0 - (default) The kernel can respond to arp requests with addresses
+ from other interfaces. This may seem wrong but it usually makes
+ sense, because it increases the chance of successful communication.
+ IP addresses are owned by the complete host on Linux, not by
+ particular interfaces. Only for more complex setups like load-
+ balancing, does this behaviour cause problems.
arp_filter for the interface will be enabled if at least one of
conf/{all,interface}/arp_filter is set to TRUE,
@@ -1280,26 +1446,27 @@ arp_announce - INTEGER
Define different restriction levels for announcing the local
source IP address from IP packets in ARP requests sent on
interface:
- 0 - (default) Use any local address, configured on any interface
- 1 - Try to avoid local addresses that are not in the target's
- subnet for this interface. This mode is useful when target
- hosts reachable via this interface require the source IP
- address in ARP requests to be part of their logical network
- configured on the receiving interface. When we generate the
- request we will check all our subnets that include the
- target IP and will preserve the source address if it is from
- such subnet. If there is no such subnet we select source
- address according to the rules for level 2.
- 2 - Always use the best local address for this target.
- In this mode we ignore the source address in the IP packet
- and try to select local address that we prefer for talks with
- the target host. Such local address is selected by looking
- for primary IP addresses on all our subnets on the outgoing
- interface that include the target IP address. If no suitable
- local address is found we select the first local address
- we have on the outgoing interface or on all other interfaces,
- with the hope we will receive reply for our request and
- even sometimes no matter the source IP address we announce.
+
+ - 0 - (default) Use any local address, configured on any interface
+ - 1 - Try to avoid local addresses that are not in the target's
+ subnet for this interface. This mode is useful when target
+ hosts reachable via this interface require the source IP
+ address in ARP requests to be part of their logical network
+ configured on the receiving interface. When we generate the
+ request we will check all our subnets that include the
+ target IP and will preserve the source address if it is from
+ such subnet. If there is no such subnet we select source
+ address according to the rules for level 2.
+ - 2 - Always use the best local address for this target.
+ In this mode we ignore the source address in the IP packet
+ and try to select local address that we prefer for talks with
+ the target host. Such local address is selected by looking
+ for primary IP addresses on all our subnets on the outgoing
+ interface that include the target IP address. If no suitable
+ local address is found we select the first local address
+ we have on the outgoing interface or on all other interfaces,
+ with the hope we will receive reply for our request and
+ even sometimes no matter the source IP address we announce.
The max value from conf/{all,interface}/arp_announce is used.
@@ -1310,32 +1477,37 @@ arp_announce - INTEGER
arp_ignore - INTEGER
Define different modes for sending replies in response to
received ARP requests that resolve local target IP addresses:
- 0 - (default): reply for any local target IP address, configured
- on any interface
- 1 - reply only if the target IP address is local address
- configured on the incoming interface
- 2 - reply only if the target IP address is local address
- configured on the incoming interface and both with the
- sender's IP address are part from same subnet on this interface
- 3 - do not reply for local addresses configured with scope host,
- only resolutions for global and link addresses are replied
- 4-7 - reserved
- 8 - do not reply for all local addresses
+
+ - 0 - (default): reply for any local target IP address, configured
+ on any interface
+ - 1 - reply only if the target IP address is local address
+ configured on the incoming interface
+ - 2 - reply only if the target IP address is local address
+ configured on the incoming interface and both with the
+ sender's IP address are part from same subnet on this interface
+ - 3 - do not reply for local addresses configured with scope host,
+ only resolutions for global and link addresses are replied
+ - 4-7 - reserved
+ - 8 - do not reply for all local addresses
The max value from conf/{all,interface}/arp_ignore is used
when ARP request is received on the {interface}
arp_notify - BOOLEAN
Define mode for notification of address and device changes.
- 0 - (default): do nothing
- 1 - Generate gratuitous arp requests when device is brought up
- or hardware address changes.
+
+ == ==========================================================
+ 0 (default): do nothing
+ 1 Generate gratuitous arp requests when device is brought up
+ or hardware address changes.
+ == ==========================================================
arp_accept - BOOLEAN
Define behavior for gratuitous ARP frames who's IP is not
already present in the ARP table:
- 0 - don't create new entries in the ARP table
- 1 - create new entries in the ARP table
+
+ - 0 - don't create new entries in the ARP table
+ - 1 - create new entries in the ARP table
Both replies and requests type gratuitous arp will trigger the
ARP table to be updated, if this setting is on.
@@ -1371,11 +1543,13 @@ disable_xfrm - BOOLEAN
igmpv2_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
IGMPv1 or IGMPv2 report retransmit will take place.
+
Default: 10000 (10 seconds)
igmpv3_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
IGMPv3 report retransmit will take place.
+
Default: 1000 (1 seconds)
promote_secondaries - BOOLEAN
@@ -1386,19 +1560,23 @@ promote_secondaries - BOOLEAN
drop_unicast_in_l2_multicast - BOOLEAN
Drop any unicast IP packets that are received in link-layer
multicast (or broadcast) frames.
+
This behavior (for multicast) is actually a SHOULD in RFC
1122, but is disabled by default for compatibility reasons.
+
Default: off (0)
drop_gratuitous_arp - BOOLEAN
Drop all gratuitous ARP frames, for example if there's a known
good ARP proxy on the network and such frames need not be used
(or in the case of 802.11, must not be used to prevent attacks.)
+
Default: off (0)
tag - INTEGER
Allows you to write a number, which can be used as required.
+
Default value is 0.
xfrm4_gc_thresh - INTEGER
@@ -1410,21 +1588,24 @@ xfrm4_gc_thresh - INTEGER
igmp_link_local_mcast_reports - BOOLEAN
Enable IGMP reports for link local multicast groups in the
224.0.0.X range.
+
Default TRUE
Alexey Kuznetsov.
kuznet@ms2.inr.ac.ru
Updated by:
-Andi Kleen
-ak@muc.de
-Nicolas Delon
-delon.nicolas@wanadoo.fr
+- Andi Kleen
+ ak@muc.de
+- Nicolas Delon
+ delon.nicolas@wanadoo.fr
-/proc/sys/net/ipv6/* Variables:
+
+/proc/sys/net/ipv6/* Variables
+==============================
IPv6 has no global variables such as tcp_*. tcp_* settings under ipv4/ also
apply to IPv6 [XXX?].
@@ -1433,8 +1614,9 @@ bindv6only - BOOLEAN
Default value for IPV6_V6ONLY socket option,
which restricts use of the IPv6 socket to IPv6 communication
only.
- TRUE: disable IPv4-mapped address feature
- FALSE: enable IPv4-mapped address feature
+
+ - TRUE: disable IPv4-mapped address feature
+ - FALSE: enable IPv4-mapped address feature
Default: FALSE (as specified in RFC3493)
@@ -1442,8 +1624,10 @@ flowlabel_consistency - BOOLEAN
Protect the consistency (and unicity) of flow label.
You have to disable it to use IPV6_FL_F_REFLECT flag on the
flow label manager.
- TRUE: enabled
- FALSE: disabled
+
+ - TRUE: enabled
+ - FALSE: disabled
+
Default: TRUE
auto_flowlabels - INTEGER
@@ -1451,22 +1635,28 @@ auto_flowlabels - INTEGER
packet. This allows intermediate devices, such as routers, to
identify packet flows for mechanisms like Equal Cost Multipath
Routing (see RFC 6438).
- 0: automatic flow labels are completely disabled
- 1: automatic flow labels are enabled by default, they can be
+
+ = ===========================================================
+ 0 automatic flow labels are completely disabled
+ 1 automatic flow labels are enabled by default, they can be
disabled on a per socket basis using the IPV6_AUTOFLOWLABEL
socket option
- 2: automatic flow labels are allowed, they may be enabled on a
+ 2 automatic flow labels are allowed, they may be enabled on a
per socket basis using the IPV6_AUTOFLOWLABEL socket option
- 3: automatic flow labels are enabled and enforced, they cannot
+ 3 automatic flow labels are enabled and enforced, they cannot
be disabled by the socket option
+ = ===========================================================
+
Default: 1
flowlabel_state_ranges - BOOLEAN
Split the flow label number space into two ranges. 0-0x7FFFF is
reserved for the IPv6 flow manager facility, 0x80000-0xFFFFF
is reserved for stateless flow labels as described in RFC6437.
- TRUE: enabled
- FALSE: disabled
+
+ - TRUE: enabled
+ - FALSE: disabled
+
Default: true
flowlabel_reflect - INTEGER
@@ -1476,49 +1666,59 @@ flowlabel_reflect - INTEGER
https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
This is a bitmask.
- 1: enabled for established flows
- Note that this prevents automatic flowlabel changes, as done
- in "tcp: change IPv6 flow-label upon receiving spurious retransmission"
- and "tcp: Change txhash on every SYN and RTO retransmit"
+ - 1: enabled for established flows
+
+ Note that this prevents automatic flowlabel changes, as done
+ in "tcp: change IPv6 flow-label upon receiving spurious retransmission"
+ and "tcp: Change txhash on every SYN and RTO retransmit"
- 2: enabled for TCP RESET packets (no active listener)
- If set, a RST packet sent in response to a SYN packet on a closed
- port will reflect the incoming flow label.
+ - 2: enabled for TCP RESET packets (no active listener)
+ If set, a RST packet sent in response to a SYN packet on a closed
+ port will reflect the incoming flow label.
- 4: enabled for ICMPv6 echo reply messages.
+ - 4: enabled for ICMPv6 echo reply messages.
Default: 0
fib_multipath_hash_policy - INTEGER
Controls which hash policy to use for multipath routes.
+
Default: 0 (Layer 3)
+
Possible values:
- 0 - Layer 3 (source and destination addresses plus flow label)
- 1 - Layer 4 (standard 5-tuple)
- 2 - Layer 3 or inner Layer 3 if present
+
+ - 0 - Layer 3 (source and destination addresses plus flow label)
+ - 1 - Layer 4 (standard 5-tuple)
+ - 2 - Layer 3 or inner Layer 3 if present
anycast_src_echo_reply - BOOLEAN
Controls the use of anycast addresses as source addresses for ICMPv6
echo reply
- TRUE: enabled
- FALSE: disabled
+
+ - TRUE: enabled
+ - FALSE: disabled
+
Default: FALSE
idgen_delay - INTEGER
Controls the delay in seconds after which time to retry
privacy stable address generation if a DAD conflict is
detected.
+
Default: 1 (as specified in RFC7217)
idgen_retries - INTEGER
Controls the number of retries to generate a stable privacy
address if a DAD conflict is detected.
+
Default: 3 (as specified in RFC7217)
mld_qrv - INTEGER
Controls the MLD query robustness variable (see RFC3810 9.1).
+
Default: 2 (as specified by RFC3810 9.1)
+
Minimum: 1 (as specified by RFC6636 4.5)
max_dst_opts_number - INTEGER
@@ -1526,6 +1726,7 @@ max_dst_opts_number - INTEGER
options extension header. If this value is less than zero
then unknown options are disallowed and the number of known
TLVs allowed is the absolute value of this number.
+
Default: 8
max_hbh_opts_number - INTEGER
@@ -1533,16 +1734,19 @@ max_hbh_opts_number - INTEGER
options extension header. If this value is less than zero
then unknown options are disallowed and the number of known
TLVs allowed is the absolute value of this number.
+
Default: 8
max_dst_opts_length - INTEGER
Maximum length allowed for a Destination options extension
header.
+
Default: INT_MAX (unlimited)
max_hbh_length - INTEGER
Maximum length allowed for a Hop-by-Hop options extension
header.
+
Default: INT_MAX (unlimited)
skip_notify_on_dev_down - BOOLEAN
@@ -1551,8 +1755,21 @@ skip_notify_on_dev_down - BOOLEAN
generate this message; IPv6 does by default. Setting this sysctl
to true skips the message, making IPv4 and IPv6 on par in relying
on userspace caches to track link events and evict routes.
+
Default: false (generate message)
+nexthop_compat_mode - BOOLEAN
+ New nexthop API provides a means for managing nexthops independent of
+ prefixes. Backwards compatibilty with old route format is enabled by
+ default which means route dumps and notifications contain the new
+ nexthop attribute but also the full, expanded nexthop definition.
+ Further, updates or deletes of a nexthop configuration generate route
+ notifications for each fib entry using the nexthop. Once a system
+ understands the new API, this sysctl can be disabled to achieve full
+ performance benefits of the new API by disabling the nexthop expansion
+ and extraneous notifications.
+ Default: true (backward compat mode)
+
IPv6 Fragmentation:
ip6frag_high_thresh - INTEGER
@@ -1573,18 +1790,20 @@ seg6_flowlabel - INTEGER
Controls the behaviour of computing the flowlabel of outer
IPv6 header in case of SR T.encaps
- -1 set flowlabel to zero.
- 0 copy flowlabel from Inner packet in case of Inner IPv6
- (Set flowlabel to 0 in case IPv4/L2)
- 1 Compute the flowlabel using seg6_make_flowlabel()
+ == =======================================================
+ -1 set flowlabel to zero.
+ 0 copy flowlabel from Inner packet in case of Inner IPv6
+ (Set flowlabel to 0 in case IPv4/L2)
+ 1 Compute the flowlabel using seg6_make_flowlabel()
+ == =======================================================
Default is 0.
-conf/default/*:
+``conf/default/*``:
Change the interface-specific default settings.
-conf/all/*:
+``conf/all/*``:
Change all the interface-specific settings.
[XXX: Other special features than forwarding?]
@@ -1608,9 +1827,10 @@ fwmark_reflect - BOOLEAN
associated with a socket for example, TCP RSTs or ICMPv6 echo replies).
If unset, these packets have a fwmark of zero. If set, they have the
fwmark of the packet they are replying to.
+
Default: 0
-conf/interface/*:
+``conf/interface/*``:
Change special settings per interface.
The functional behaviour for certain settings is different
@@ -1625,31 +1845,40 @@ accept_ra - INTEGER
transmitted.
Possible values are:
- 0 Do not accept Router Advertisements.
- 1 Accept Router Advertisements if forwarding is disabled.
- 2 Overrule forwarding behaviour. Accept Router Advertisements
- even if forwarding is enabled.
- Functional default: enabled if local forwarding is disabled.
- disabled if local forwarding is enabled.
+ == ===========================================================
+ 0 Do not accept Router Advertisements.
+ 1 Accept Router Advertisements if forwarding is disabled.
+ 2 Overrule forwarding behaviour. Accept Router Advertisements
+ even if forwarding is enabled.
+ == ===========================================================
+
+ Functional default:
+
+ - enabled if local forwarding is disabled.
+ - disabled if local forwarding is enabled.
accept_ra_defrtr - BOOLEAN
Learn default router in Router Advertisement.
- Functional default: enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.
+ Functional default:
+
+ - enabled if accept_ra is enabled.
+ - disabled if accept_ra is disabled.
accept_ra_from_local - BOOLEAN
Accept RA with source-address that is found on local machine
- if the RA is otherwise proper and able to be accepted.
- Default is to NOT accept these as it may be an un-intended
- network loop.
+ if the RA is otherwise proper and able to be accepted.
+
+ Default is to NOT accept these as it may be an un-intended
+ network loop.
Functional default:
- enabled if accept_ra_from_local is enabled
- on a specific interface.
- disabled if accept_ra_from_local is disabled
- on a specific interface.
+
+ - enabled if accept_ra_from_local is enabled
+ on a specific interface.
+ - disabled if accept_ra_from_local is disabled
+ on a specific interface.
accept_ra_min_hop_limit - INTEGER
Minimum hop limit Information in Router Advertisement.
@@ -1662,8 +1891,10 @@ accept_ra_min_hop_limit - INTEGER
accept_ra_pinfo - BOOLEAN
Learn Prefix Information in Router Advertisement.
- Functional default: enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.
+ Functional default:
+
+ - enabled if accept_ra is enabled.
+ - disabled if accept_ra is disabled.
accept_ra_rt_info_min_plen - INTEGER
Minimum prefix length of Route Information in RA.
@@ -1671,8 +1902,10 @@ accept_ra_rt_info_min_plen - INTEGER
Route Information w/ prefix smaller than this variable shall
be ignored.
- Functional default: 0 if accept_ra_rtr_pref is enabled.
- -1 if accept_ra_rtr_pref is disabled.
+ Functional default:
+
+ * 0 if accept_ra_rtr_pref is enabled.
+ * -1 if accept_ra_rtr_pref is disabled.
accept_ra_rt_info_max_plen - INTEGER
Maximum prefix length of Route Information in RA.
@@ -1680,33 +1913,41 @@ accept_ra_rt_info_max_plen - INTEGER
Route Information w/ prefix larger than this variable shall
be ignored.
- Functional default: 0 if accept_ra_rtr_pref is enabled.
- -1 if accept_ra_rtr_pref is disabled.
+ Functional default:
+
+ * 0 if accept_ra_rtr_pref is enabled.
+ * -1 if accept_ra_rtr_pref is disabled.
accept_ra_rtr_pref - BOOLEAN
Accept Router Preference in RA.
- Functional default: enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.
+ Functional default:
+
+ - enabled if accept_ra is enabled.
+ - disabled if accept_ra is disabled.
accept_ra_mtu - BOOLEAN
Apply the MTU value specified in RA option 5 (RFC4861). If
disabled, the MTU specified in the RA will be ignored.
- Functional default: enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.
+ Functional default:
+
+ - enabled if accept_ra is enabled.
+ - disabled if accept_ra is disabled.
accept_redirects - BOOLEAN
Accept Redirects.
- Functional default: enabled if local forwarding is disabled.
- disabled if local forwarding is enabled.
+ Functional default:
+
+ - enabled if local forwarding is disabled.
+ - disabled if local forwarding is enabled.
accept_source_route - INTEGER
Accept source routing (routing extension header).
- >= 0: Accept only routing header type 2.
- < 0: Do not accept routing header.
+ - >= 0: Accept only routing header type 2.
+ - < 0: Do not accept routing header.
Default: 0
@@ -1714,24 +1955,30 @@ autoconf - BOOLEAN
Autoconfigure addresses using Prefix Information in Router
Advertisements.
- Functional default: enabled if accept_ra_pinfo is enabled.
- disabled if accept_ra_pinfo is disabled.
+ Functional default:
+
+ - enabled if accept_ra_pinfo is enabled.
+ - disabled if accept_ra_pinfo is disabled.
dad_transmits - INTEGER
The amount of Duplicate Address Detection probes to send.
+
Default: 1
forwarding - INTEGER
Configure interface-specific Host/Router behaviour.
- Note: It is recommended to have the same setting on all
- interfaces; mixed router/host scenarios are rather uncommon.
+ .. note::
+
+ It is recommended to have the same setting on all
+ interfaces; mixed router/host scenarios are rather uncommon.
Possible values are:
- 0 Forwarding disabled
- 1 Forwarding enabled
- FALSE (0):
+ - 0 Forwarding disabled
+ - 1 Forwarding enabled
+
+ **FALSE (0)**:
By default, Host behaviour is assumed. This means:
@@ -1742,7 +1989,7 @@ forwarding - INTEGER
Advertisements (and do autoconfiguration).
4. If accept_redirects is TRUE (default), accept Redirects.
- TRUE (1):
+ **TRUE (1)**:
If local forwarding is enabled, Router behaviour is assumed.
This means exactly the reverse from the above:
@@ -1753,19 +2000,22 @@ forwarding - INTEGER
4. Redirects are ignored.
Default: 0 (disabled) if global forwarding is disabled (default),
- otherwise 1 (enabled).
+ otherwise 1 (enabled).
hop_limit - INTEGER
Default Hop Limit to set.
+
Default: 64
mtu - INTEGER
Default Maximum Transfer Unit
+
Default: 1280 (IPv6 required minimum)
ip_nonlocal_bind - BOOLEAN
If set, allows processes to bind() to non-local IPv6 addresses,
which can be quite useful - but may break some applications.
+
Default: 0
router_probe_interval - INTEGER
@@ -1777,15 +2027,18 @@ router_probe_interval - INTEGER
router_solicitation_delay - INTEGER
Number of seconds to wait after interface is brought up
before sending Router Solicitations.
+
Default: 1
router_solicitation_interval - INTEGER
Number of seconds to wait between Router Solicitations.
+
Default: 4
router_solicitations - INTEGER
Number of Router Solicitations to send until assuming no
routers are present.
+
Default: 3
use_oif_addrs_only - BOOLEAN
@@ -1797,28 +2050,35 @@ use_oif_addrs_only - BOOLEAN
use_tempaddr - INTEGER
Preference for Privacy Extensions (RFC3041).
- <= 0 : disable Privacy Extensions
- == 1 : enable Privacy Extensions, but prefer public
- addresses over temporary addresses.
- > 1 : enable Privacy Extensions and prefer temporary
- addresses over public addresses.
- Default: 0 (for most devices)
- -1 (for point-to-point devices and loopback devices)
+
+ * <= 0 : disable Privacy Extensions
+ * == 1 : enable Privacy Extensions, but prefer public
+ addresses over temporary addresses.
+ * > 1 : enable Privacy Extensions and prefer temporary
+ addresses over public addresses.
+
+ Default:
+
+ * 0 (for most devices)
+ * -1 (for point-to-point devices and loopback devices)
temp_valid_lft - INTEGER
valid lifetime (in seconds) for temporary addresses.
- Default: 604800 (7 days)
+
+ Default: 172800 (2 days)
temp_prefered_lft - INTEGER
Preferred lifetime (in seconds) for temporary addresses.
+
Default: 86400 (1 day)
keep_addr_on_down - INTEGER
Keep all IPv6 addresses on an interface down event. If set static
global addresses with no expiration time are not flushed.
- >0 : enabled
- 0 : system default
- <0 : disabled
+
+ * >0 : enabled
+ * 0 : system default
+ * <0 : disabled
Default: 0 (addresses are removed)
@@ -1827,11 +2087,13 @@ max_desync_factor - INTEGER
that ensures that clients don't synchronize with each
other and generate new addresses at exactly the same time.
value is in seconds.
+
Default: 600
regen_max_retry - INTEGER
Number of attempts before give up attempting to generate
valid temporary addresses.
+
Default: 5
max_addresses - INTEGER
@@ -1839,12 +2101,14 @@ max_addresses - INTEGER
to zero disables the limitation. It is not recommended to set this
value too large (or to zero) because it would be an easy way to
crash the kernel by allowing too many addresses to be created.
+
Default: 16
disable_ipv6 - BOOLEAN
Disable IPv6 operation. If accept_dad is set to 2, this value
will be dynamically set to TRUE if DAD fails for the link-local
address.
+
Default: FALSE (enable IPv6 operation)
When this value is changed from 1 to 0 (IPv6 is being enabled),
@@ -1858,10 +2122,13 @@ disable_ipv6 - BOOLEAN
accept_dad - INTEGER
Whether to accept DAD (Duplicate Address Detection).
- 0: Disable DAD
- 1: Enable DAD (default)
- 2: Enable DAD, and disable IPv6 operation if MAC-based duplicate
- link-local address has been found.
+
+ == ==============================================================
+ 0 Disable DAD
+ 1 Enable DAD (default)
+ 2 Enable DAD, and disable IPv6 operation if MAC-based duplicate
+ link-local address has been found.
+ == ==============================================================
DAD operation and mode on a given interface will be selected according
to the maximum value of conf/{all,interface}/accept_dad.
@@ -1869,6 +2136,7 @@ accept_dad - INTEGER
force_tllao - BOOLEAN
Enable sending the target link-layer address option even when
responding to a unicast neighbor solicitation.
+
Default: FALSE
Quoting from RFC 2461, section 4.4, Target link-layer address:
@@ -1886,9 +2154,10 @@ force_tllao - BOOLEAN
ndisc_notify - BOOLEAN
Define mode for notification of address and device changes.
- 0 - (default): do nothing
- 1 - Generate unsolicited neighbour advertisements when device is brought
- up or hardware address changes.
+
+ * 0 - (default): do nothing
+ * 1 - Generate unsolicited neighbour advertisements when device is brought
+ up or hardware address changes.
ndisc_tclass - INTEGER
The IPv6 Traffic Class to use by default when sending IPv6 Neighbor
@@ -1897,33 +2166,38 @@ ndisc_tclass - INTEGER
These 8 bits can be interpreted as 6 high order bits holding the DSCP
value and 2 low order bits representing ECN (which you probably want
to leave cleared).
- 0 - (default)
+
+ * 0 - (default)
mldv1_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
MLDv1 report retransmit will take place.
+
Default: 10000 (10 seconds)
mldv2_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
MLDv2 report retransmit will take place.
+
Default: 1000 (1 second)
force_mld_version - INTEGER
- 0 - (default) No enforcement of a MLD version, MLDv1 fallback allowed
- 1 - Enforce to use MLD version 1
- 2 - Enforce to use MLD version 2
+ * 0 - (default) No enforcement of a MLD version, MLDv1 fallback allowed
+ * 1 - Enforce to use MLD version 1
+ * 2 - Enforce to use MLD version 2
suppress_frag_ndisc - INTEGER
Control RFC 6980 (Security Implications of IPv6 Fragmentation
with IPv6 Neighbor Discovery) behavior:
- 1 - (default) discard fragmented neighbor discovery packets
- 0 - allow fragmented neighbor discovery packets
+
+ * 1 - (default) discard fragmented neighbor discovery packets
+ * 0 - allow fragmented neighbor discovery packets
optimistic_dad - BOOLEAN
Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
- 0: disabled (default)
- 1: enabled
+
+ * 0: disabled (default)
+ * 1: enabled
Optimistic Duplicate Address Detection for the interface will be enabled
if at least one of conf/{all,interface}/optimistic_dad is set to 1,
@@ -1934,8 +2208,9 @@ use_optimistic - BOOLEAN
source address selection. Preferred addresses will still be chosen
before optimistic addresses, subject to other ranking in the source
address selection algorithm.
- 0: disabled (default)
- 1: enabled
+
+ * 0: disabled (default)
+ * 1: enabled
This will be enabled if at least one of
conf/{all,interface}/use_optimistic is set to 1, disabled otherwise.
@@ -1957,12 +2232,14 @@ stable_secret - IPv6 address
addr_gen_mode - INTEGER
Defines how link-local and autoconf addresses are generated.
- 0: generate address based on EUI64 (default)
- 1: do no generate a link-local address, use EUI64 for addresses generated
- from autoconf
- 2: generate stable privacy addresses, using the secret from
+ = =================================================================
+ 0 generate address based on EUI64 (default)
+ 1 do no generate a link-local address, use EUI64 for addresses
+ generated from autoconf
+ 2 generate stable privacy addresses, using the secret from
stable_secret (RFC7217)
- 3: generate stable privacy addresses, using a random secret if unset
+ 3 generate stable privacy addresses, using a random secret if unset
+ = =================================================================
drop_unicast_in_l2_multicast - BOOLEAN
Drop any unicast IPv6 packets that are received in link-layer
@@ -1984,13 +2261,18 @@ enhanced_dad - BOOLEAN
detection of duplicates due to loopback of the NS messages that we send.
The nonce option will be sent on an interface unless both of
conf/{all,interface}/enhanced_dad are set to FALSE.
+
Default: TRUE
-icmp/*:
+``icmp/*``:
+===========
+
ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 messages.
+
0 to disable any limiting,
otherwise the minimal space between responses in milliseconds.
+
Default: 1000
ratemask - list of comma separated ranges
@@ -2011,16 +2293,19 @@ ratemask - list of comma separated ranges
echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol.
+
Default: 0
echo_ignore_multicast - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol via multicast.
+
Default: 0
echo_ignore_anycast - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol destined to anycast address.
+
Default: 0
xfrm6_gc_thresh - INTEGER
@@ -2036,43 +2321,52 @@ YOSHIFUJI Hideaki / USAGI Project <yoshfuji@linux-ipv6.org>
/proc/sys/net/bridge/* Variables:
+=================================
bridge-nf-call-arptables - BOOLEAN
- 1 : pass bridged ARP traffic to arptables' FORWARD chain.
- 0 : disable this.
+ - 1 : pass bridged ARP traffic to arptables' FORWARD chain.
+ - 0 : disable this.
+
Default: 1
bridge-nf-call-iptables - BOOLEAN
- 1 : pass bridged IPv4 traffic to iptables' chains.
- 0 : disable this.
+ - 1 : pass bridged IPv4 traffic to iptables' chains.
+ - 0 : disable this.
+
Default: 1
bridge-nf-call-ip6tables - BOOLEAN
- 1 : pass bridged IPv6 traffic to ip6tables' chains.
- 0 : disable this.
+ - 1 : pass bridged IPv6 traffic to ip6tables' chains.
+ - 0 : disable this.
+
Default: 1
bridge-nf-filter-vlan-tagged - BOOLEAN
- 1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables.
- 0 : disable this.
+ - 1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables.
+ - 0 : disable this.
+
Default: 0
bridge-nf-filter-pppoe-tagged - BOOLEAN
- 1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables.
- 0 : disable this.
+ - 1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables.
+ - 0 : disable this.
+
Default: 0
bridge-nf-pass-vlan-input-dev - BOOLEAN
- 1: if bridge-nf-filter-vlan-tagged is enabled, try to find a vlan
- interface on the bridge and set the netfilter input device to the vlan.
- This allows use of e.g. "iptables -i br0.1" and makes the REDIRECT
- target work with vlan-on-top-of-bridge interfaces. When no matching
- vlan interface is found, or this switch is off, the input device is
- set to the bridge interface.
- 0: disable bridge netfilter vlan interface lookup.
+ - 1: if bridge-nf-filter-vlan-tagged is enabled, try to find a vlan
+ interface on the bridge and set the netfilter input device to the
+ vlan. This allows use of e.g. "iptables -i br0.1" and makes the
+ REDIRECT target work with vlan-on-top-of-bridge interfaces. When no
+ matching vlan interface is found, or this switch is off, the input
+ device is set to the bridge interface.
+
+ - 0: disable bridge netfilter vlan interface lookup.
+
Default: 0
-proc/sys/net/sctp/* Variables:
+``proc/sys/net/sctp/*`` Variables:
+==================================
addip_enable - BOOLEAN
Enable or disable extension of Dynamic Address Reconfiguration
@@ -2137,11 +2431,13 @@ addip_noauth_enable - BOOLEAN
we provide this variable to control the enforcement of the
authentication requirement.
- 1: Allow ADD-IP extension to be used without authentication. This
+ == ===============================================================
+ 1 Allow ADD-IP extension to be used without authentication. This
should only be set in a closed environment for interoperability
with older implementations.
- 0: Enforce the authentication requirement
+ 0 Enforce the authentication requirement
+ == ===============================================================
Default: 0
@@ -2151,8 +2447,8 @@ auth_enable - BOOLEAN
required for secure operation of Dynamic Address Reconfiguration
(ADD-IP) extension.
- 1: Enable this extension.
- 0: Disable this extension.
+ - 1: Enable this extension.
+ - 0: Disable this extension.
Default: 0
@@ -2160,8 +2456,8 @@ prsctp_enable - BOOLEAN
Enable or disable the Partial Reliability extension (RFC3758) which
is used to notify peers that a given DATA should no longer be expected.
- 1: Enable extension
- 0: Disable
+ - 1: Enable extension
+ - 0: Disable
Default: 1
@@ -2263,8 +2559,8 @@ cookie_preserve_enable - BOOLEAN
Enable or disable the ability to extend the lifetime of the SCTP cookie
that is used during the establishment phase of SCTP association
- 1: Enable cookie lifetime extension.
- 0: Disable
+ - 1: Enable cookie lifetime extension.
+ - 0: Disable
Default: 1
@@ -2272,9 +2568,11 @@ cookie_hmac_alg - STRING
Select the hmac algorithm used when generating the cookie value sent by
a listening sctp socket to a connecting client in the INIT-ACK chunk.
Valid values are:
+
* md5
* sha1
* none
+
Ability to assign md5 or sha1 as the selected alg is predicated on the
configuration of those algorithms at build time (CONFIG_CRYPTO_MD5 and
CONFIG_CRYPTO_SHA1).
@@ -2293,16 +2591,16 @@ rcvbuf_policy - INTEGER
to each association instead of the socket. This prevents the described
blocking.
- 1: rcvbuf space is per association
- 0: rcvbuf space is per socket
+ - 1: rcvbuf space is per association
+ - 0: rcvbuf space is per socket
Default: 0
sndbuf_policy - INTEGER
Similar to rcvbuf_policy above, this applies to send buffer space.
- 1: Send buffer is tracked per association
- 0: Send buffer is tracked per socket.
+ - 1: Send buffer is tracked per association
+ - 0: Send buffer is tracked per socket.
Default: 0
@@ -2335,19 +2633,23 @@ sctp_wmem - vector of 3 INTEGERs: min, default, max
addr_scope_policy - INTEGER
Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
- 0 - Disable IPv4 address scoping
- 1 - Enable IPv4 address scoping
- 2 - Follow draft but allow IPv4 private addresses
- 3 - Follow draft but allow IPv4 link local addresses
+ - 0 - Disable IPv4 address scoping
+ - 1 - Enable IPv4 address scoping
+ - 2 - Follow draft but allow IPv4 private addresses
+ - 3 - Follow draft but allow IPv4 link local addresses
Default: 1
-/proc/sys/net/core/*
+``/proc/sys/net/core/*``
+========================
+
Please see: Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
-/proc/sys/net/unix/*
+``/proc/sys/net/unix/*``
+========================
+
max_dgram_qlen - INTEGER
The maximum length of dgram socket receive queue
diff --git a/Documentation/networking/ip_dynaddr.txt b/Documentation/networking/ip_dynaddr.rst
index 45f3c1268e86..eacc0c780c7f 100644
--- a/Documentation/networking/ip_dynaddr.txt
+++ b/Documentation/networking/ip_dynaddr.rst
@@ -1,10 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
IP dynamic address hack-port v0.03
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+==================================
+
This stuff allows diald ONESHOT connections to get established by
dynamically changing packet source address (and socket's if local procs).
It is implemented for TCP diald-box connections(1) and IP_MASQuerading(2).
-If enabled[*] and forwarding interface has changed:
+If enabled\ [#]_ and forwarding interface has changed:
+
1) Socket (and packet) source address is rewritten ON RETRANSMISSIONS
while in SYN_SENT state (diald-box processes).
2) Out-bounded MASQueraded source address changes ON OUTPUT (when
@@ -12,18 +17,24 @@ If enabled[*] and forwarding interface has changed:
received by the tunnel.
This is specially helpful for auto dialup links (diald), where the
-``actual'' outgoing address is unknown at the moment the link is
+``actual`` outgoing address is unknown at the moment the link is
going up. So, the *same* (local AND masqueraded) connections requests that
bring the link up will be able to get established.
-[*] At boot, by default no address rewriting is attempted.
- To enable:
+.. [#] At boot, by default no address rewriting is attempted.
+
+ To enable::
+
# echo 1 > /proc/sys/net/ipv4/ip_dynaddr
- To enable verbose mode:
- # echo 2 > /proc/sys/net/ipv4/ip_dynaddr
- To disable (default)
+
+ To enable verbose mode::
+
+ # echo 2 > /proc/sys/net/ipv4/ip_dynaddr
+
+ To disable (default)::
+
# echo 0 > /proc/sys/net/ipv4/ip_dynaddr
Enjoy!
--- Juanjo <jjciarla@raiz.uncu.edu.ar>
+Juanjo <jjciarla@raiz.uncu.edu.ar>
diff --git a/Documentation/networking/ipddp.txt b/Documentation/networking/ipddp.rst
index ba5c217fffe0..be7091b77927 100644
--- a/Documentation/networking/ipddp.txt
+++ b/Documentation/networking/ipddp.rst
@@ -1,7 +1,12 @@
-Text file for ipddp.c:
- AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation
+.. SPDX-License-Identifier: GPL-2.0
-This text file is written by Jay Schulist <jschlst@samba.org>
+=========================================================
+AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation
+=========================================================
+
+Documentation ipddp.c
+
+This file is written by Jay Schulist <jschlst@samba.org>
Introduction
------------
@@ -21,7 +26,7 @@ kernel AppleTalk layer and drivers are available.
Each mode requires its own user space software.
Compiling AppleTalk-IP Decapsulation/Encapsulation
-=================================================
+==================================================
AppleTalk-IP decapsulation needs to be compiled into your kernel. You
will need to turn on AppleTalk-IP driver support. Then you will need to
diff --git a/Documentation/networking/iphase.txt b/Documentation/networking/iphase.rst
index 670b72f16585..92d9b757d75a 100644
--- a/Documentation/networking/iphase.txt
+++ b/Documentation/networking/iphase.rst
@@ -1,27 +1,35 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+ATM (i)Chip IA Linux Driver Source
+==================================
+
+ READ ME FISRT
- READ ME FISRT
- ATM (i)Chip IA Linux Driver Source
--------------------------------------------------------------------------------
- Read This Before You Begin!
+
+ Read This Before You Begin!
+
--------------------------------------------------------------------------------
Description
------------
+===========
-This is the README file for the Interphase PCI ATM (i)Chip IA Linux driver
+This is the README file for the Interphase PCI ATM (i)Chip IA Linux driver
source release.
The features and limitations of this driver are as follows:
+
- A single VPI (VPI value of 0) is supported.
- - Supports 4K VCs for the server board (with 512K control memory) and 1K
+ - Supports 4K VCs for the server board (with 512K control memory) and 1K
VCs for the client board (with 128K control memory).
- UBR, ABR and CBR service categories are supported.
- - Only AAL5 is supported.
- - Supports setting of PCR on the VCs.
+ - Only AAL5 is supported.
+ - Supports setting of PCR on the VCs.
- Multiple adapters in a system are supported.
- - All variants of Interphase ATM PCI (i)Chip adapter cards are supported,
- including x575 (OC3, control memory 128K , 512K and packet memory 128K,
- 512K and 1M), x525 (UTP25) and x531 (DS3 and E3). See
+ - All variants of Interphase ATM PCI (i)Chip adapter cards are supported,
+ including x575 (OC3, control memory 128K , 512K and packet memory 128K,
+ 512K and 1M), x525 (UTP25) and x531 (DS3 and E3). See
http://www.iphase.com/
for details.
- Only x86 platforms are supported.
@@ -29,128 +37,155 @@ The features and limitations of this driver are as follows:
Before You Start
-----------------
+================
Installation
------------
1. Installing the adapters in the system
+
To install the ATM adapters in the system, follow the steps below.
+
a. Login as root.
b. Shut down the system and power off the system.
c. Install one or more ATM adapters in the system.
- d. Connect each adapter to a port on an ATM switch. The green 'Link'
- LED on the front panel of the adapter will be on if the adapter is
- connected to the switch properly when the system is powered up.
+ d. Connect each adapter to a port on an ATM switch. The green 'Link'
+ LED on the front panel of the adapter will be on if the adapter is
+ connected to the switch properly when the system is powered up.
e. Power on and boot the system.
2. [ Removed ]
3. Rebuild kernel with ABR support
+
[ a. and b. removed ]
- c. Reconfigure the kernel, choose the Interphase ia driver through "make
+
+ c. Reconfigure the kernel, choose the Interphase ia driver through "make
menuconfig" or "make xconfig".
- d. Rebuild the kernel, loadable modules and the atm tools.
+ d. Rebuild the kernel, loadable modules and the atm tools.
e. Install the new built kernel and modules and reboot.
4. Load the adapter hardware driver (ia driver) if it is built as a module
+
a. Login as root.
b. Change directory to /lib/modules/<kernel-version>/atm.
c. Run "insmod suni.o;insmod iphase.o"
- The yellow 'status' LED on the front panel of the adapter will blink
- while the driver is loaded in the system.
- d. To verify that the 'ia' driver is loaded successfully, run the
- following command:
+ The yellow 'status' LED on the front panel of the adapter will blink
+ while the driver is loaded in the system.
+ d. To verify that the 'ia' driver is loaded successfully, run the
+ following command::
- cat /proc/atm/devices
+ cat /proc/atm/devices
- If the driver is loaded successfully, the output of the command will
- be similar to the following lines:
+ If the driver is loaded successfully, the output of the command will
+ be similar to the following lines::
- Itf Type ESI/"MAC"addr AAL(TX,err,RX,err,drop) ...
- 0 ia xxxxxxxxx 0 ( 0 0 0 0 0 ) 5 ( 0 0 0 0 0 )
+ Itf Type ESI/"MAC"addr AAL(TX,err,RX,err,drop) ...
+ 0 ia xxxxxxxxx 0 ( 0 0 0 0 0 ) 5 ( 0 0 0 0 0 )
- You can also check the system log file /var/log/messages for messages
- related to the ATM driver.
+ You can also check the system log file /var/log/messages for messages
+ related to the ATM driver.
-5. Ia Driver Configuration
+5. Ia Driver Configuration
5.1 Configuration of adapter buffers
The (i)Chip boards have 3 different packet RAM size variants: 128K, 512K and
- 1M. The RAM size decides the number of buffers and buffer size. The default
- size and number of buffers are set as following:
-
- Total Rx RAM Tx RAM Rx Buf Tx Buf Rx buf Tx buf
- RAM size size size size size cnt cnt
- -------- ------ ------ ------ ------ ------ ------
- 128K 64K 64K 10K 10K 6 6
- 512K 256K 256K 10K 10K 25 25
- 1M 512K 512K 10K 10K 51 51
+ 1M. The RAM size decides the number of buffers and buffer size. The default
+ size and number of buffers are set as following:
+
+ ========= ======= ====== ====== ====== ====== ======
+ Total Rx RAM Tx RAM Rx Buf Tx Buf Rx buf Tx buf
+ RAM size size size size size cnt cnt
+ ========= ======= ====== ====== ====== ====== ======
+ 128K 64K 64K 10K 10K 6 6
+ 512K 256K 256K 10K 10K 25 25
+ 1M 512K 512K 10K 10K 51 51
+ ========= ======= ====== ====== ====== ====== ======
These setting should work well in most environments, but can be
- changed by typing the following command:
-
- insmod <IA_DIR>/ia.o IA_RX_BUF=<RX_CNT> IA_RX_BUF_SZ=<RX_SIZE> \
- IA_TX_BUF=<TX_CNT> IA_TX_BUF_SZ=<TX_SIZE>
+ changed by typing the following command::
+
+ insmod <IA_DIR>/ia.o IA_RX_BUF=<RX_CNT> IA_RX_BUF_SZ=<RX_SIZE> \
+ IA_TX_BUF=<TX_CNT> IA_TX_BUF_SZ=<TX_SIZE>
+
Where:
- RX_CNT = number of receive buffers in the range (1-128)
- RX_SIZE = size of receive buffers in the range (48-64K)
- TX_CNT = number of transmit buffers in the range (1-128)
- TX_SIZE = size of transmit buffers in the range (48-64K)
- 1. Transmit and receive buffer size must be a multiple of 4.
- 2. Care should be taken so that the memory required for the
- transmit and receive buffers is less than or equal to the
- total adapter packet memory.
+ - RX_CNT = number of receive buffers in the range (1-128)
+ - RX_SIZE = size of receive buffers in the range (48-64K)
+ - TX_CNT = number of transmit buffers in the range (1-128)
+ - TX_SIZE = size of transmit buffers in the range (48-64K)
+
+ 1. Transmit and receive buffer size must be a multiple of 4.
+ 2. Care should be taken so that the memory required for the
+ transmit and receive buffers is less than or equal to the
+ total adapter packet memory.
5.2 Turn on ia debug trace
- When the ia driver is built with the CONFIG_ATM_IA_DEBUG flag, the driver
- can provide more debug trace if needed. There is a bit mask variable,
- IADebugFlag, which controls the output of the traces. You can find the bit
- map of the IADebugFlag in iphase.h.
- The debug trace can be turn on through the insmod command line option, for
- example, "insmod iphase.o IADebugFlag=0xffffffff" can turn on all the debug
+ When the ia driver is built with the CONFIG_ATM_IA_DEBUG flag, the driver
+ can provide more debug trace if needed. There is a bit mask variable,
+ IADebugFlag, which controls the output of the traces. You can find the bit
+ map of the IADebugFlag in iphase.h.
+ The debug trace can be turn on through the insmod command line option, for
+ example, "insmod iphase.o IADebugFlag=0xffffffff" can turn on all the debug
traces together with loading the driver.
6. Ia Driver Test Using ttcp_atm and PVC
- For the PVC setup, the test machines can either be connected back-to-back or
- through a switch. If connected through the switch, the switch must be
+ For the PVC setup, the test machines can either be connected back-to-back or
+ through a switch. If connected through the switch, the switch must be
configured for the PVC(s).
a. For UBR test:
- At the test machine intended to receive data, type:
- ttcp_atm -r -a -s 0.100
- At the other test machine, type:
- ttcp_atm -t -a -s 0.100 -n 10000
+
+ At the test machine intended to receive data, type::
+
+ ttcp_atm -r -a -s 0.100
+
+ At the other test machine, type::
+
+ ttcp_atm -t -a -s 0.100 -n 10000
+
Run "ttcp_atm -h" to display more options of the ttcp_atm tool.
b. For ABR test:
- It is the same as the UBR testing, but with an extra command option:
- -Pabr:max_pcr=<xxx>
- where:
- xxx = the maximum peak cell rate, from 170 - 353207.
- This option must be set on both the machines.
+
+ It is the same as the UBR testing, but with an extra command option::
+
+ -Pabr:max_pcr=<xxx>
+
+ where:
+
+ xxx = the maximum peak cell rate, from 170 - 353207.
+
+ This option must be set on both the machines.
+
c. For CBR test:
- It is the same as the UBR testing, but with an extra command option:
- -Pcbr:max_pcr=<xxx>
- where:
- xxx = the maximum peak cell rate, from 170 - 353207.
- This option may only be set on the transmit machine.
+ It is the same as the UBR testing, but with an extra command option::
+
+ -Pcbr:max_pcr=<xxx>
+
+ where:
+
+ xxx = the maximum peak cell rate, from 170 - 353207.
-OUTSTANDING ISSUES
-------------------
+ This option may only be set on the transmit machine.
+
+
+Outstanding Issues
+==================
Contact Information
-------------------
+::
+
Customer Support:
- United States: Telephone: (214) 654-5555
- Fax: (214) 654-5500
+ United States: Telephone: (214) 654-5555
+ Fax: (214) 654-5500
E-Mail: intouch@iphase.com
Europe: Telephone: 33 (0)1 41 15 44 00
Fax: 33 (0)1 41 15 12 13
diff --git a/Documentation/networking/ipsec.txt b/Documentation/networking/ipsec.rst
index ba794b7e51be..afe9d7b48be3 100644
--- a/Documentation/networking/ipsec.txt
+++ b/Documentation/networking/ipsec.rst
@@ -1,12 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====
+IPsec
+=====
+
Here documents known IPsec corner cases which need to be keep in mind when
deploy various IPsec configuration in real world production environment.
-1. IPcomp: Small IP packet won't get compressed at sender, and failed on
+1. IPcomp:
+ Small IP packet won't get compressed at sender, and failed on
policy check on receiver.
-Quote from RFC3173:
-2.2. Non-Expansion Policy
+Quote from RFC3173::
+
+ 2.2. Non-Expansion Policy
If the total size of a compressed payload and the IPComp header, as
defined in section 3, is not smaller than the size of the original
diff --git a/Documentation/networking/ipv6.txt b/Documentation/networking/ipv6.rst
index 6cd74fa55358..ba09c2f2dcc7 100644
--- a/Documentation/networking/ipv6.txt
+++ b/Documentation/networking/ipv6.rst
@@ -1,9 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+IPv6
+====
+
Options for the ipv6 module are supplied as parameters at load time.
Module options may be given as command line arguments to the insmod
or modprobe command, but are usually specified in either
-/etc/modules.d/*.conf configuration files, or in a distro-specific
+``/etc/modules.d/*.conf`` configuration files, or in a distro-specific
configuration file.
The available ipv6 module parameters are listed below. If a parameter
diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.rst
index 27a38e50c287..694adcba36b0 100644
--- a/Documentation/networking/ipvlan.txt
+++ b/Documentation/networking/ipvlan.rst
@@ -1,11 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
- IPVLAN Driver HOWTO
+===================
+IPVLAN Driver HOWTO
+===================
Initial Release:
Mahesh Bandewar <maheshb AT google.com>
1. Introduction:
- This is conceptually very similar to the macvlan driver with one major
+================
+This is conceptually very similar to the macvlan driver with one major
exception of using L3 for mux-ing /demux-ing among slaves. This property makes
the master device share the L2 with it's slave devices. I have developed this
driver in conjunction with network namespaces and not sure if there is use case
@@ -13,34 +17,48 @@ outside of it.
2. Building and Installation:
- In order to build the driver, please select the config item CONFIG_IPVLAN.
+=============================
+
+In order to build the driver, please select the config item CONFIG_IPVLAN.
The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
(CONFIG_IPVLAN=m).
3. Configuration:
- There are no module parameters for this driver and it can be configured
+=================
+
+There are no module parameters for this driver and it can be configured
using IProute2/ip utility.
+::
ip link add link <master> name <slave> type ipvlan [ mode MODE ] [ FLAGS ]
where
- MODE: l3 (default) | l3s | l2
- FLAGS: bridge (default) | private | vepa
+ MODE: l3 (default) | l3s | l2
+ FLAGS: bridge (default) | private | vepa
+
+e.g.
- e.g.
(a) Following will create IPvlan link with eth0 as master in
- L3 bridge mode
- bash# ip link add link eth0 name ipvl0 type ipvlan
- (b) This command will create IPvlan link in L2 bridge mode.
- bash# ip link add link eth0 name ipvl0 type ipvlan mode l2 bridge
- (c) This command will create an IPvlan device in L2 private mode.
- bash# ip link add link eth0 name ipvlan type ipvlan mode l2 private
- (d) This command will create an IPvlan device in L2 vepa mode.
- bash# ip link add link eth0 name ipvlan type ipvlan mode l2 vepa
+ L3 bridge mode::
+
+ bash# ip link add link eth0 name ipvl0 type ipvlan
+ (b) This command will create IPvlan link in L2 bridge mode::
+
+ bash# ip link add link eth0 name ipvl0 type ipvlan mode l2 bridge
+
+ (c) This command will create an IPvlan device in L2 private mode::
+
+ bash# ip link add link eth0 name ipvlan type ipvlan mode l2 private
+
+ (d) This command will create an IPvlan device in L2 vepa mode::
+
+ bash# ip link add link eth0 name ipvlan type ipvlan mode l2 vepa
4. Operating modes:
- IPvlan has two modes of operation - L2 and L3. For a given master device,
+===================
+
+IPvlan has two modes of operation - L2 and L3. For a given master device,
you can select one of these two modes and all slaves on that master will
operate in the same (selected) mode. The RX mode is almost identical except
that in L3 mode the slaves wont receive any multicast / broadcast traffic.
@@ -48,39 +66,50 @@ L3 mode is more restrictive since routing is controlled from the other (mostly)
default namespace.
4.1 L2 mode:
- In this mode TX processing happens on the stack instance attached to the
+------------
+
+In this mode TX processing happens on the stack instance attached to the
slave device and packets are switched and queued to the master device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
as well.
4.2 L3 mode:
- In this mode TX processing up to L3 happens on the stack instance attached
+------------
+
+In this mode TX processing up to L3 happens on the stack instance attached
to the slave device and packets are switched to the stack instance of the
master device for the L2 processing and routing from that instance will be
used before packets are queued on the outbound device. In this mode the slaves
will not receive nor can send multicast / broadcast traffic.
4.3 L3S mode:
- This is very similar to the L3 mode except that iptables (conn-tracking)
+-------------
+
+This is very similar to the L3 mode except that iptables (conn-tracking)
works in this mode and hence it is L3-symmetric (L3s). This will have slightly less
performance but that shouldn't matter since you are choosing this mode over plain-L3
mode to make conn-tracking work.
5. Mode flags:
- At this time following mode flags are available
+==============
+
+At this time following mode flags are available
5.1 bridge:
- This is the default option. To configure the IPvlan port in this mode,
+-----------
+This is the default option. To configure the IPvlan port in this mode,
user can choose to either add this option on the command-line or don't specify
anything. This is the traditional mode where slaves can cross-talk among
themselves apart from talking through the master device.
5.2 private:
- If this option is added to the command-line, the port is set in private
+------------
+If this option is added to the command-line, the port is set in private
mode. i.e. port won't allow cross communication between slaves.
5.3 vepa:
- If this is added to the command-line, the port is set in VEPA mode.
+---------
+If this is added to the command-line, the port is set in VEPA mode.
i.e. port will offload switching functionality to the external entity as
described in 802.1Qbg
Note: VEPA mode in IPvlan has limitations. IPvlan uses the mac-address of the
@@ -89,18 +118,25 @@ neighbor will have source and destination mac same. This will make the switch /
router send the redirect message.
6. What to choose (macvlan vs. ipvlan)?
- These two devices are very similar in many regards and the specific use
+=======================================
+
+These two devices are very similar in many regards and the specific use
case could very well define which device to choose. if one of the following
-situations defines your use case then you can choose to use ipvlan -
- (a) The Linux host that is connected to the external switch / router has
-policy configured that allows only one mac per port.
- (b) No of virtual devices created on a master exceed the mac capacity and
-puts the NIC in promiscuous mode and degraded performance is a concern.
- (c) If the slave device is to be put into the hostile / untrusted network
-namespace where L2 on the slave could be changed / misused.
+situations defines your use case then you can choose to use ipvlan:
+
+
+(a) The Linux host that is connected to the external switch / router has
+ policy configured that allows only one mac per port.
+(b) No of virtual devices created on a master exceed the mac capacity and
+ puts the NIC in promiscuous mode and degraded performance is a concern.
+(c) If the slave device is to be put into the hostile / untrusted network
+ namespace where L2 on the slave could be changed / misused.
6. Example configuration:
+=========================
+
+::
+=============================================================+
| Host: host1 |
@@ -117,30 +153,37 @@ namespace where L2 on the slave could be changed / misused.
+==============================#==============================+
- (a) Create two network namespaces - ns0, ns1
- ip netns add ns0
- ip netns add ns1
-
- (b) Create two ipvlan slaves on eth0 (master device)
- ip link add link eth0 ipvl0 type ipvlan mode l2
- ip link add link eth0 ipvl1 type ipvlan mode l2
-
- (c) Assign slaves to the respective network namespaces
- ip link set dev ipvl0 netns ns0
- ip link set dev ipvl1 netns ns1
-
- (d) Now switch to the namespace (ns0 or ns1) to configure the slave devices
- - For ns0
- (1) ip netns exec ns0 bash
- (2) ip link set dev ipvl0 up
- (3) ip link set dev lo up
- (4) ip -4 addr add 127.0.0.1 dev lo
- (5) ip -4 addr add $IPADDR dev ipvl0
- (6) ip -4 route add default via $ROUTER dev ipvl0
- - For ns1
- (1) ip netns exec ns1 bash
- (2) ip link set dev ipvl1 up
- (3) ip link set dev lo up
- (4) ip -4 addr add 127.0.0.1 dev lo
- (5) ip -4 addr add $IPADDR dev ipvl1
- (6) ip -4 route add default via $ROUTER dev ipvl1
+(a) Create two network namespaces - ns0, ns1::
+
+ ip netns add ns0
+ ip netns add ns1
+
+(b) Create two ipvlan slaves on eth0 (master device)::
+
+ ip link add link eth0 ipvl0 type ipvlan mode l2
+ ip link add link eth0 ipvl1 type ipvlan mode l2
+
+(c) Assign slaves to the respective network namespaces::
+
+ ip link set dev ipvl0 netns ns0
+ ip link set dev ipvl1 netns ns1
+
+(d) Now switch to the namespace (ns0 or ns1) to configure the slave devices
+
+ - For ns0::
+
+ (1) ip netns exec ns0 bash
+ (2) ip link set dev ipvl0 up
+ (3) ip link set dev lo up
+ (4) ip -4 addr add 127.0.0.1 dev lo
+ (5) ip -4 addr add $IPADDR dev ipvl0
+ (6) ip -4 route add default via $ROUTER dev ipvl0
+
+ - For ns1::
+
+ (1) ip netns exec ns1 bash
+ (2) ip link set dev ipvl1 up
+ (3) ip link set dev lo up
+ (4) ip -4 addr add 127.0.0.1 dev lo
+ (5) ip -4 addr add $IPADDR dev ipvl1
+ (6) ip -4 route add default via $ROUTER dev ipvl1
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.rst
index 056898685d40..be36c4600e8f 100644
--- a/Documentation/networking/ipvs-sysctl.txt
+++ b/Documentation/networking/ipvs-sysctl.rst
@@ -1,23 +1,30 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+IPvs-sysctl
+===========
+
/proc/sys/net/ipv4/vs/* Variables:
+==================================
am_droprate - INTEGER
- default 10
+ default 10
- It sets the always mode drop rate, which is used in the mode 3
- of the drop_rate defense.
+ It sets the always mode drop rate, which is used in the mode 3
+ of the drop_rate defense.
amemthresh - INTEGER
- default 1024
+ default 1024
- It sets the available memory threshold (in pages), which is
- used in the automatic modes of defense. When there is no
- enough available memory, the respective strategy will be
- enabled and the variable is automatically set to 2, otherwise
- the strategy is disabled and the variable is set to 1.
+ It sets the available memory threshold (in pages), which is
+ used in the automatic modes of defense. When there is no
+ enough available memory, the respective strategy will be
+ enabled and the variable is automatically set to 2, otherwise
+ the strategy is disabled and the variable is set to 1.
backup_only - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
If set, disable the director function while the server is
in backup mode to avoid packet loops for DR/TUN methods.
@@ -44,8 +51,8 @@ conn_reuse_mode - INTEGER
real servers to a very busy cluster.
conntrack - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
If set, maintain connection tracking entries for
connections handled by IPVS.
@@ -61,28 +68,28 @@ conntrack - BOOLEAN
Only available when IPVS is compiled with CONFIG_IP_VS_NFCT enabled.
cache_bypass - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
- If it is enabled, forward packets to the original destination
- directly when no cache server is available and destination
- address is not local (iph->daddr is RTN_UNICAST). It is mostly
- used in transparent web cache cluster.
+ If it is enabled, forward packets to the original destination
+ directly when no cache server is available and destination
+ address is not local (iph->daddr is RTN_UNICAST). It is mostly
+ used in transparent web cache cluster.
debug_level - INTEGER
- 0 - transmission error messages (default)
- 1 - non-fatal error messages
- 2 - configuration
- 3 - destination trash
- 4 - drop entry
- 5 - service lookup
- 6 - scheduling
- 7 - connection new/expire, lookup and synchronization
- 8 - state transition
- 9 - binding destination, template checks and applications
- 10 - IPVS packet transmission
- 11 - IPVS packet handling (ip_vs_in/ip_vs_out)
- 12 or more - packet traversal
+ - 0 - transmission error messages (default)
+ - 1 - non-fatal error messages
+ - 2 - configuration
+ - 3 - destination trash
+ - 4 - drop entry
+ - 5 - service lookup
+ - 6 - scheduling
+ - 7 - connection new/expire, lookup and synchronization
+ - 8 - state transition
+ - 9 - binding destination, template checks and applications
+ - 10 - IPVS packet transmission
+ - 11 - IPVS packet handling (ip_vs_in/ip_vs_out)
+ - 12 or more - packet traversal
Only available when IPVS is compiled with CONFIG_IP_VS_DEBUG enabled.
@@ -92,58 +99,58 @@ debug_level - INTEGER
the level.
drop_entry - INTEGER
- 0 - disabled (default)
-
- The drop_entry defense is to randomly drop entries in the
- connection hash table, just in order to collect back some
- memory for new connections. In the current code, the
- drop_entry procedure can be activated every second, then it
- randomly scans 1/32 of the whole and drops entries that are in
- the SYN-RECV/SYNACK state, which should be effective against
- syn-flooding attack.
-
- The valid values of drop_entry are from 0 to 3, where 0 means
- that this strategy is always disabled, 1 and 2 mean automatic
- modes (when there is no enough available memory, the strategy
- is enabled and the variable is automatically set to 2,
- otherwise the strategy is disabled and the variable is set to
- 1), and 3 means that that the strategy is always enabled.
+ - 0 - disabled (default)
+
+ The drop_entry defense is to randomly drop entries in the
+ connection hash table, just in order to collect back some
+ memory for new connections. In the current code, the
+ drop_entry procedure can be activated every second, then it
+ randomly scans 1/32 of the whole and drops entries that are in
+ the SYN-RECV/SYNACK state, which should be effective against
+ syn-flooding attack.
+
+ The valid values of drop_entry are from 0 to 3, where 0 means
+ that this strategy is always disabled, 1 and 2 mean automatic
+ modes (when there is no enough available memory, the strategy
+ is enabled and the variable is automatically set to 2,
+ otherwise the strategy is disabled and the variable is set to
+ 1), and 3 means that that the strategy is always enabled.
drop_packet - INTEGER
- 0 - disabled (default)
+ - 0 - disabled (default)
- The drop_packet defense is designed to drop 1/rate packets
- before forwarding them to real servers. If the rate is 1, then
- drop all the incoming packets.
+ The drop_packet defense is designed to drop 1/rate packets
+ before forwarding them to real servers. If the rate is 1, then
+ drop all the incoming packets.
- The value definition is the same as that of the drop_entry. In
- the automatic mode, the rate is determined by the follow
- formula: rate = amemthresh / (amemthresh - available_memory)
- when available memory is less than the available memory
- threshold. When the mode 3 is set, the always mode drop rate
- is controlled by the /proc/sys/net/ipv4/vs/am_droprate.
+ The value definition is the same as that of the drop_entry. In
+ the automatic mode, the rate is determined by the follow
+ formula: rate = amemthresh / (amemthresh - available_memory)
+ when available memory is less than the available memory
+ threshold. When the mode 3 is set, the always mode drop rate
+ is controlled by the /proc/sys/net/ipv4/vs/am_droprate.
expire_nodest_conn - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
-
- The default value is 0, the load balancer will silently drop
- packets when its destination server is not available. It may
- be useful, when user-space monitoring program deletes the
- destination server (because of server overload or wrong
- detection) and add back the server later, and the connections
- to the server can continue.
-
- If this feature is enabled, the load balancer will expire the
- connection immediately when a packet arrives and its
- destination server is not available, then the client program
- will be notified that the connection is closed. This is
- equivalent to the feature some people requires to flush
- connections when its destination is not available.
+ - 0 - disabled (default)
+ - not 0 - enabled
+
+ The default value is 0, the load balancer will silently drop
+ packets when its destination server is not available. It may
+ be useful, when user-space monitoring program deletes the
+ destination server (because of server overload or wrong
+ detection) and add back the server later, and the connections
+ to the server can continue.
+
+ If this feature is enabled, the load balancer will expire the
+ connection immediately when a packet arrives and its
+ destination server is not available, then the client program
+ will be notified that the connection is closed. This is
+ equivalent to the feature some people requires to flush
+ connections when its destination is not available.
expire_quiescent_template - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
When set to a non-zero value, the load balancer will expire
persistent templates when the destination server is quiescent.
@@ -158,8 +165,8 @@ expire_quiescent_template - BOOLEAN
connection and the destination server is quiescent.
ignore_tunneled - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
If set, ipvs will set the ipvs_property on all packets which are of
unrecognized protocols. This prevents us from routing tunneled
@@ -168,30 +175,30 @@ ignore_tunneled - BOOLEAN
ipvs routing loops when ipvs is also acting as a real server).
nat_icmp_send - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
- It controls sending icmp error messages (ICMP_DEST_UNREACH)
- for VS/NAT when the load balancer receives packets from real
- servers but the connection entries don't exist.
+ It controls sending icmp error messages (ICMP_DEST_UNREACH)
+ for VS/NAT when the load balancer receives packets from real
+ servers but the connection entries don't exist.
pmtu_disc - BOOLEAN
- 0 - disabled
- not 0 - enabled (default)
+ - 0 - disabled
+ - not 0 - enabled (default)
By default, reject with FRAG_NEEDED all DF packets that exceed
the PMTU, irrespective of the forwarding method. For TUN method
the flag can be disabled to fragment such packets.
secure_tcp - INTEGER
- 0 - disabled (default)
+ - 0 - disabled (default)
The secure_tcp defense is to use a more complicated TCP state
transition table. For VS/NAT, it also delays entering the
TCP ESTABLISHED state until the three way handshake is completed.
- The value definition is the same as that of drop_entry and
- drop_packet.
+ The value definition is the same as that of drop_entry and
+ drop_packet.
sync_threshold - vector of 2 INTEGERs: sync_threshold, sync_period
default 3 50
@@ -248,8 +255,8 @@ sync_ports - INTEGER
8848+sync_ports-1.
snat_reroute - BOOLEAN
- 0 - disabled
- not 0 - enabled (default)
+ - 0 - disabled
+ - not 0 - enabled (default)
If enabled, recalculate the route of SNATed packets from
realservers so that they are routed as if they originate from the
@@ -270,6 +277,7 @@ sync_persist_mode - INTEGER
Controls the synchronisation of connections when using persistence
0: All types of connections are synchronised
+
1: Attempt to reduce the synchronisation traffic depending on
the connection type. For persistent services avoid synchronisation
for normal connections, do it only for persistence templates.
diff --git a/Documentation/networking/kcm.txt b/Documentation/networking/kcm.rst
index b773a5278ac4..db0f5560ac1c 100644
--- a/Documentation/networking/kcm.txt
+++ b/Documentation/networking/kcm.rst
@@ -1,35 +1,38 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================
Kernel Connection Multiplexor
------------------------------
+=============================
Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based
interface over TCP for generic application protocols. With KCM an application
can efficiently send and receive application protocol messages over TCP using
datagram sockets.
-KCM implements an NxM multiplexor in the kernel as diagrammed below:
-
-+------------+ +------------+ +------------+ +------------+
-| KCM socket | | KCM socket | | KCM socket | | KCM socket |
-+------------+ +------------+ +------------+ +------------+
- | | | |
- +-----------+ | | +----------+
- | | | |
- +----------------------------------+
- | Multiplexor |
- +----------------------------------+
- | | | | |
- +---------+ | | | ------------+
- | | | | |
-+----------+ +----------+ +----------+ +----------+ +----------+
-| Psock | | Psock | | Psock | | Psock | | Psock |
-+----------+ +----------+ +----------+ +----------+ +----------+
- | | | | |
-+----------+ +----------+ +----------+ +----------+ +----------+
-| TCP sock | | TCP sock | | TCP sock | | TCP sock | | TCP sock |
-+----------+ +----------+ +----------+ +----------+ +----------+
+KCM implements an NxM multiplexor in the kernel as diagrammed below::
+
+ +------------+ +------------+ +------------+ +------------+
+ | KCM socket | | KCM socket | | KCM socket | | KCM socket |
+ +------------+ +------------+ +------------+ +------------+
+ | | | |
+ +-----------+ | | +----------+
+ | | | |
+ +----------------------------------+
+ | Multiplexor |
+ +----------------------------------+
+ | | | | |
+ +---------+ | | | ------------+
+ | | | | |
+ +----------+ +----------+ +----------+ +----------+ +----------+
+ | Psock | | Psock | | Psock | | Psock | | Psock |
+ +----------+ +----------+ +----------+ +----------+ +----------+
+ | | | | |
+ +----------+ +----------+ +----------+ +----------+ +----------+
+ | TCP sock | | TCP sock | | TCP sock | | TCP sock | | TCP sock |
+ +----------+ +----------+ +----------+ +----------+ +----------+
KCM sockets
------------
+===========
The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
bound to a multiplexor are considered to have equivalent function, and I/O
@@ -37,7 +40,7 @@ operations in different sockets may be done in parallel without the need for
synchronization between threads in userspace.
Multiplexor
------------
+===========
The multiplexor provides the message steering. In the transmit path, messages
written on a KCM socket are sent atomically on an appropriate TCP socket.
@@ -45,14 +48,14 @@ Similarly, in the receive path, messages are constructed on each TCP socket
(Psock) and complete messages are steered to a KCM socket.
TCP sockets & Psocks
---------------------
+====================
TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
for each bound TCP socket, this structure holds the state for constructing
messages on receive as well as other connection specific information for KCM.
Connected mode semantics
-------------------------
+========================
Each multiplexor assumes that all attached TCP connections are to the same
destination and can use the different connections for load balancing when
@@ -60,7 +63,7 @@ transmitting. The normal send and recv calls (include sendmmsg and recvmmsg)
can be used to send and receive messages from the KCM socket.
Socket types
-------------
+============
KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.
@@ -110,23 +113,23 @@ User interface
Creating a multiplexor
----------------------
-A new multiplexor and initial KCM socket is created by a socket call:
+A new multiplexor and initial KCM socket is created by a socket call::
socket(AF_KCM, type, protocol)
- - type is either SOCK_DGRAM or SOCK_SEQPACKET
- - protocol is KCMPROTO_CONNECTED
+- type is either SOCK_DGRAM or SOCK_SEQPACKET
+- protocol is KCMPROTO_CONNECTED
Cloning KCM sockets
-------------------
After the first KCM socket is created using the socket call as described
above, additional sockets for the multiplexor can be created by cloning
-a KCM socket. This is accomplished by an ioctl on a KCM socket:
+a KCM socket. This is accomplished by an ioctl on a KCM socket::
/* From linux/kcm.h */
struct kcm_clone {
- int fd;
+ int fd;
};
struct kcm_clone info;
@@ -142,11 +145,11 @@ Attach transport sockets
------------------------
Attaching of transport sockets to a multiplexor is performed by calling an
-ioctl on a KCM socket for the multiplexor. e.g.:
+ioctl on a KCM socket for the multiplexor. e.g.::
/* From linux/kcm.h */
struct kcm_attach {
- int fd;
+ int fd;
int bpf_fd;
};
@@ -160,18 +163,19 @@ ioctl on a KCM socket for the multiplexor. e.g.:
ioctl(kcmfd, SIOCKCMATTACH, &info);
The kcm_attach structure contains:
- fd: file descriptor for TCP socket being attached
- bpf_prog_fd: file descriptor for compiled BPF program downloaded
+
+ - fd: file descriptor for TCP socket being attached
+ - bpf_prog_fd: file descriptor for compiled BPF program downloaded
Unattach transport sockets
--------------------------
Unattaching a transport socket from a multiplexor is straightforward. An
-"unattach" ioctl is done with the kcm_unattach structure as the argument:
+"unattach" ioctl is done with the kcm_unattach structure as the argument::
/* From linux/kcm.h */
struct kcm_unattach {
- int fd;
+ int fd;
};
struct kcm_unattach info;
@@ -190,7 +194,7 @@ When receive is disabled, any pending messages in the socket's
receive buffer are moved to other sockets. This feature is useful
if an application thread knows that it will be doing a lot of
work on a request and won't be able to service new messages for a
-while. Example use:
+while. Example use::
int val = 1;
@@ -200,7 +204,7 @@ BFP programs for message delineation
------------------------------------
BPF programs can be compiled using the BPF LLVM backend. For example,
-the BPF program for parsing Thrift is:
+the BPF program for parsing Thrift is::
#include "bpf.h" /* for __sk_buff */
#include "bpf_helpers.h" /* for load_word intrinsic */
@@ -250,6 +254,7 @@ based on groups, or batches of messages, can be beneficial for performance.
On transmit, there are three ways an application can batch (pipeline)
messages on a KCM socket.
+
1) Send multiple messages in a single sendmmsg.
2) Send a group of messages each with a sendmsg call, where all messages
except the last have MSG_BATCH in the flags of sendmsg call.
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.rst
index 9bc271cdc9a8..a48238a2ec09 100644
--- a/Documentation/networking/l2tp.txt
+++ b/Documentation/networking/l2tp.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+L2TP
+====
+
This document describes how to use the kernel's L2TP drivers to
provide L2TP functionality. L2TP is a protocol that tunnels one or
more sessions over an IP tunnel. It is commonly used for VPNs
@@ -121,14 +127,16 @@ Userspace may control behavior of the tunnel or session using
setsockopt and ioctl on the PPPoX socket. The following socket
options are supported:-
-DEBUG - bitmask of debug message categories. See below.
-SENDSEQ - 0 => don't send packets with sequence numbers
- 1 => send packets with sequence numbers
-RECVSEQ - 0 => receive packet sequence numbers are optional
- 1 => drop receive packets without sequence numbers
-LNSMODE - 0 => act as LAC.
- 1 => act as LNS.
-REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder.
+========= ===========================================================
+DEBUG bitmask of debug message categories. See below.
+SENDSEQ - 0 => don't send packets with sequence numbers
+ - 1 => send packets with sequence numbers
+RECVSEQ - 0 => receive packet sequence numbers are optional
+ - 1 => drop receive packets without sequence numbers
+LNSMODE - 0 => act as LAC.
+ - 1 => act as LNS.
+REORDERTO reorder timeout (in millisecs). If 0, don't try to reorder.
+========= ===========================================================
Only the DEBUG option is supported by the special tunnel management
PPPoX socket.
@@ -177,20 +185,22 @@ setsockopt on the PPPoX socket to set a debug mask.
The following debug mask bits are available:
+================ ==============================
L2TP_MSG_DEBUG verbose debug (if compiled in)
L2TP_MSG_CONTROL userspace - kernel interface
L2TP_MSG_SEQ sequence numbers handling
L2TP_MSG_DATA data packets
+================ ==============================
If enabled, files under a l2tp debugfs directory can be used to dump
kernel state about L2TP tunnels and sessions. To access it, the
-debugfs filesystem must first be mounted.
+debugfs filesystem must first be mounted::
-# mount -t debugfs debugfs /debug
+ # mount -t debugfs debugfs /debug
-Files under the l2tp directory can then be accessed.
+Files under the l2tp directory can then be accessed::
-# cat /debug/l2tp/tunnels
+ # cat /debug/l2tp/tunnels
The debugfs files should not be used by applications to obtain L2TP
state information because the file format is subject to change. It is
@@ -211,14 +221,14 @@ iproute2's ip utility to support this.
To create an L2TPv3 ethernet pseudowire between local host 192.168.1.1
and peer 192.168.1.2, using IP addresses 10.5.1.1 and 10.5.1.2 for the
-tunnel endpoints:-
+tunnel endpoints::
-# ip l2tp add tunnel tunnel_id 1 peer_tunnel_id 1 udp_sport 5000 \
- udp_dport 5000 encap udp local 192.168.1.1 remote 192.168.1.2
-# ip l2tp add session tunnel_id 1 session_id 1 peer_session_id 1
-# ip -s -d show dev l2tpeth0
-# ip addr add 10.5.1.2/32 peer 10.5.1.1/32 dev l2tpeth0
-# ip li set dev l2tpeth0 up
+ # ip l2tp add tunnel tunnel_id 1 peer_tunnel_id 1 udp_sport 5000 \
+ udp_dport 5000 encap udp local 192.168.1.1 remote 192.168.1.2
+ # ip l2tp add session tunnel_id 1 session_id 1 peer_session_id 1
+ # ip -s -d show dev l2tpeth0
+ # ip addr add 10.5.1.2/32 peer 10.5.1.1/32 dev l2tpeth0
+ # ip li set dev l2tpeth0 up
Choose IP addresses to be the address of a local IP interface and that
of the remote system. The IP addresses of the l2tpeth0 interface can be
@@ -228,75 +238,78 @@ Repeat the above at the peer, with ports, tunnel/session ids and IP
addresses reversed. The tunnel and session IDs can be any non-zero
32-bit number, but the values must be reversed at the peer.
+======================== ===================
Host 1 Host2
+======================== ===================
udp_sport=5000 udp_sport=5001
udp_dport=5001 udp_dport=5000
tunnel_id=42 tunnel_id=45
peer_tunnel_id=45 peer_tunnel_id=42
session_id=128 session_id=5196755
peer_session_id=5196755 peer_session_id=128
+======================== ===================
When done at both ends of the tunnel, it should be possible to send
-data over the network. e.g.
+data over the network. e.g.::
-# ping 10.5.1.1
+ # ping 10.5.1.1
Sample Userspace Code
=====================
-1. Create tunnel management PPPoX socket
-
- kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
- if (kernel_fd >= 0) {
- struct sockaddr_pppol2tp sax;
- struct sockaddr_in const *peer_addr;
-
- peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
- memset(&sax, 0, sizeof(sax));
- sax.sa_family = AF_PPPOX;
- sax.sa_protocol = PX_PROTO_OL2TP;
- sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */
- sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
- sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
- sax.pppol2tp.addr.sin_family = AF_INET;
- sax.pppol2tp.s_tunnel = tunnel_id;
- sax.pppol2tp.s_session = 0; /* special case: mgmt socket */
- sax.pppol2tp.d_tunnel = 0;
- sax.pppol2tp.d_session = 0; /* special case: mgmt socket */
-
- if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) {
- perror("connect failed");
- result = -errno;
- goto err;
- }
- }
-
-2. Create session PPPoX data socket
-
- struct sockaddr_pppol2tp sax;
- int fd;
-
- /* Note, the target socket must be bound already, else it will not be ready */
- sax.sa_family = AF_PPPOX;
- sax.sa_protocol = PX_PROTO_OL2TP;
- sax.pppol2tp.fd = tunnel_fd;
- sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
- sax.pppol2tp.addr.sin_port = addr->sin_port;
- sax.pppol2tp.addr.sin_family = AF_INET;
- sax.pppol2tp.s_tunnel = tunnel_id;
- sax.pppol2tp.s_session = session_id;
- sax.pppol2tp.d_tunnel = peer_tunnel_id;
- sax.pppol2tp.d_session = peer_session_id;
-
- /* session_fd is the fd of the session's PPPoL2TP socket.
- * tunnel_fd is the fd of the tunnel UDP socket.
- */
- fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
- if (fd < 0 ) {
- return -errno;
- }
- return 0;
+1. Create tunnel management PPPoX socket::
+
+ kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
+ if (kernel_fd >= 0) {
+ struct sockaddr_pppol2tp sax;
+ struct sockaddr_in const *peer_addr;
+
+ peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
+ memset(&sax, 0, sizeof(sax));
+ sax.sa_family = AF_PPPOX;
+ sax.sa_protocol = PX_PROTO_OL2TP;
+ sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */
+ sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
+ sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
+ sax.pppol2tp.addr.sin_family = AF_INET;
+ sax.pppol2tp.s_tunnel = tunnel_id;
+ sax.pppol2tp.s_session = 0; /* special case: mgmt socket */
+ sax.pppol2tp.d_tunnel = 0;
+ sax.pppol2tp.d_session = 0; /* special case: mgmt socket */
+
+ if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) {
+ perror("connect failed");
+ result = -errno;
+ goto err;
+ }
+ }
+
+2. Create session PPPoX data socket::
+
+ struct sockaddr_pppol2tp sax;
+ int fd;
+
+ /* Note, the target socket must be bound already, else it will not be ready */
+ sax.sa_family = AF_PPPOX;
+ sax.sa_protocol = PX_PROTO_OL2TP;
+ sax.pppol2tp.fd = tunnel_fd;
+ sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
+ sax.pppol2tp.addr.sin_port = addr->sin_port;
+ sax.pppol2tp.addr.sin_family = AF_INET;
+ sax.pppol2tp.s_tunnel = tunnel_id;
+ sax.pppol2tp.s_session = session_id;
+ sax.pppol2tp.d_tunnel = peer_tunnel_id;
+ sax.pppol2tp.d_session = peer_session_id;
+
+ /* session_fd is the fd of the session's PPPoL2TP socket.
+ * tunnel_fd is the fd of the tunnel UDP socket.
+ */
+ fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
+ if (fd < 0 ) {
+ return -errno;
+ }
+ return 0;
Internal Implementation
=======================
diff --git a/Documentation/networking/lapb-module.txt b/Documentation/networking/lapb-module.rst
index d4fc8f221559..ff586bc9f005 100644
--- a/Documentation/networking/lapb-module.txt
+++ b/Documentation/networking/lapb-module.rst
@@ -1,8 +1,14 @@
- The Linux LAPB Module Interface 1.3
+.. SPDX-License-Identifier: GPL-2.0
- Jonathan Naylor 29.12.96
+===============================
+The Linux LAPB Module Interface
+===============================
-Changed (Henner Eisen, 2000-10-29): int return value for data_indication()
+Version 1.3
+
+Jonathan Naylor 29.12.96
+
+Changed (Henner Eisen, 2000-10-29): int return value for data_indication()
The LAPB module will be a separately compiled module for use by any parts of
the Linux operating system that require a LAPB service. This document
@@ -32,16 +38,16 @@ LAPB Initialisation Structure
This structure is used only once, in the call to lapb_register (see below).
It contains information about the device driver that requires the services
-of the LAPB module.
+of the LAPB module::
-struct lapb_register_struct {
- void (*connect_confirmation)(int token, int reason);
- void (*connect_indication)(int token, int reason);
- void (*disconnect_confirmation)(int token, int reason);
- void (*disconnect_indication)(int token, int reason);
- int (*data_indication)(int token, struct sk_buff *skb);
- void (*data_transmit)(int token, struct sk_buff *skb);
-};
+ struct lapb_register_struct {
+ void (*connect_confirmation)(int token, int reason);
+ void (*connect_indication)(int token, int reason);
+ void (*disconnect_confirmation)(int token, int reason);
+ void (*disconnect_indication)(int token, int reason);
+ int (*data_indication)(int token, struct sk_buff *skb);
+ void (*data_transmit)(int token, struct sk_buff *skb);
+ };
Each member of this structure corresponds to a function in the device driver
that is called when a particular event in the LAPB module occurs. These will
@@ -54,19 +60,19 @@ LAPB Parameter Structure
This structure is used with the lapb_getparms and lapb_setparms functions
(see below). They are used to allow the device driver to get and set the
-operational parameters of the LAPB implementation for a given connection.
-
-struct lapb_parms_struct {
- unsigned int t1;
- unsigned int t1timer;
- unsigned int t2;
- unsigned int t2timer;
- unsigned int n2;
- unsigned int n2count;
- unsigned int window;
- unsigned int state;
- unsigned int mode;
-};
+operational parameters of the LAPB implementation for a given connection::
+
+ struct lapb_parms_struct {
+ unsigned int t1;
+ unsigned int t1timer;
+ unsigned int t2;
+ unsigned int t2timer;
+ unsigned int n2;
+ unsigned int n2count;
+ unsigned int window;
+ unsigned int state;
+ unsigned int mode;
+ };
T1 and T2 are protocol timing parameters and are given in units of 100ms. N2
is the maximum number of tries on the link before it is declared a failure.
@@ -78,11 +84,14 @@ link.
The mode variable is a bit field used for setting (at present) three values.
The bit fields have the following meanings:
+====== =================================================
Bit Meaning
+====== =================================================
0 LAPB operation (0=LAPB_STANDARD 1=LAPB_EXTENDED).
1 [SM]LP operation (0=LAPB_SLP 1=LAPB=MLP).
2 DTE/DCE operation (0=LAPB_DTE 1=LAPB_DCE)
3-31 Reserved, must be 0.
+====== =================================================
Extended LAPB operation indicates the use of extended sequence numbers and
consequently larger window sizes, the default is standard LAPB operation.
@@ -99,8 +108,9 @@ Functions
The LAPB module provides a number of function entry points.
+::
-int lapb_register(void *token, struct lapb_register_struct);
+ int lapb_register(void *token, struct lapb_register_struct);
This must be called before the LAPB module may be used. If the call is
successful then LAPB_OK is returned. The token must be a unique identifier
@@ -111,33 +121,42 @@ For multiple LAPB links in a single device driver, multiple calls to
lapb_register must be made. The format of the lapb_register_struct is given
above. The return values are:
+============= =============================
LAPB_OK LAPB registered successfully.
LAPB_BADTOKEN Token is already registered.
LAPB_NOMEM Out of memory
+============= =============================
+::
-int lapb_unregister(void *token);
+ int lapb_unregister(void *token);
This releases all the resources associated with a LAPB link. Any current
LAPB link will be abandoned without further messages being passed. After
this call, the value of token is no longer valid for any calls to the LAPB
function. The valid return values are:
+============= ===============================
LAPB_OK LAPB unregistered successfully.
LAPB_BADTOKEN Invalid/unknown LAPB token.
+============= ===============================
+::
-int lapb_getparms(void *token, struct lapb_parms_struct *parms);
+ int lapb_getparms(void *token, struct lapb_parms_struct *parms);
This allows the device driver to get the values of the current LAPB
variables, the lapb_parms_struct is described above. The valid return values
are:
+============= =============================
LAPB_OK LAPB getparms was successful.
LAPB_BADTOKEN Invalid/unknown LAPB token.
+============= =============================
+::
-int lapb_setparms(void *token, struct lapb_parms_struct *parms);
+ int lapb_setparms(void *token, struct lapb_parms_struct *parms);
This allows the device driver to set the values of the current LAPB
variables, the lapb_parms_struct is described above. The values of t1timer,
@@ -145,42 +164,54 @@ t2timer and n2count are ignored, likewise changing the mode bits when
connected will be ignored. An error implies that none of the values have
been changed. The valid return values are:
+============= =================================================
LAPB_OK LAPB getparms was successful.
LAPB_BADTOKEN Invalid/unknown LAPB token.
LAPB_INVALUE One of the values was out of its allowable range.
+============= =================================================
+::
-int lapb_connect_request(void *token);
+ int lapb_connect_request(void *token);
Initiate a connect using the current parameter settings. The valid return
values are:
+============== =================================
LAPB_OK LAPB is starting to connect.
LAPB_BADTOKEN Invalid/unknown LAPB token.
LAPB_CONNECTED LAPB module is already connected.
+============== =================================
+::
-int lapb_disconnect_request(void *token);
+ int lapb_disconnect_request(void *token);
Initiate a disconnect. The valid return values are:
+================= ===============================
LAPB_OK LAPB is starting to disconnect.
LAPB_BADTOKEN Invalid/unknown LAPB token.
LAPB_NOTCONNECTED LAPB module is not connected.
+================= ===============================
+::
-int lapb_data_request(void *token, struct sk_buff *skb);
+ int lapb_data_request(void *token, struct sk_buff *skb);
Queue data with the LAPB module for transmitting over the link. If the call
is successful then the skbuff is owned by the LAPB module and may not be
used by the device driver again. The valid return values are:
+================= =============================
LAPB_OK LAPB has accepted the data.
LAPB_BADTOKEN Invalid/unknown LAPB token.
LAPB_NOTCONNECTED LAPB module is not connected.
+================= =============================
+::
-int lapb_data_received(void *token, struct sk_buff *skb);
+ int lapb_data_received(void *token, struct sk_buff *skb);
Queue data with the LAPB module which has been received from the device. It
is expected that the data passed to the LAPB module has skb->data pointing
@@ -188,9 +219,10 @@ to the beginning of the LAPB data. If the call is successful then the skbuff
is owned by the LAPB module and may not be used by the device driver again.
The valid return values are:
+============= ===========================
LAPB_OK LAPB has accepted the data.
LAPB_BADTOKEN Invalid/unknown LAPB token.
-
+============= ===========================
Callbacks
---------
@@ -200,49 +232,58 @@ module to call when an event occurs. They are registered with the LAPB
module with lapb_register (see above) in the structure lapb_register_struct
(see above).
+::
-void (*connect_confirmation)(void *token, int reason);
+ void (*connect_confirmation)(void *token, int reason);
This is called by the LAPB module when a connection is established after
being requested by a call to lapb_connect_request (see above). The reason is
always LAPB_OK.
+::
-void (*connect_indication)(void *token, int reason);
+ void (*connect_indication)(void *token, int reason);
This is called by the LAPB module when the link is established by the remote
system. The value of reason is always LAPB_OK.
+::
-void (*disconnect_confirmation)(void *token, int reason);
+ void (*disconnect_confirmation)(void *token, int reason);
This is called by the LAPB module when an event occurs after the device
driver has called lapb_disconnect_request (see above). The reason indicates
what has happened. In all cases the LAPB link can be regarded as being
terminated. The values for reason are:
+================= ====================================================
LAPB_OK The LAPB link was terminated normally.
LAPB_NOTCONNECTED The remote system was not connected.
LAPB_TIMEDOUT No response was received in N2 tries from the remote
system.
+================= ====================================================
+::
-void (*disconnect_indication)(void *token, int reason);
+ void (*disconnect_indication)(void *token, int reason);
This is called by the LAPB module when the link is terminated by the remote
system or another event has occurred to terminate the link. This may be
returned in response to a lapb_connect_request (see above) if the remote
system refused the request. The values for reason are:
+================= ====================================================
LAPB_OK The LAPB link was terminated normally by the remote
system.
LAPB_REFUSED The remote system refused the connect request.
LAPB_NOTCONNECTED The remote system was not connected.
LAPB_TIMEDOUT No response was received in N2 tries from the remote
system.
+================= ====================================================
+::
-int (*data_indication)(void *token, struct sk_buff *skb);
+ int (*data_indication)(void *token, struct sk_buff *skb);
This is called by the LAPB module when data has been received from the
remote system that should be passed onto the next layer in the protocol
@@ -254,8 +295,9 @@ This method should return NET_RX_DROP (as defined in the header
file include/linux/netdevice.h) if and only if the frame was dropped
before it could be delivered to the upper layer.
+::
-void (*data_transmit)(void *token, struct sk_buff *skb);
+ void (*data_transmit)(void *token, struct sk_buff *skb);
This is called by the LAPB module when data is to be transmitted to the
remote system by the device driver. The skbuff becomes the property of the
diff --git a/Documentation/networking/ltpc.txt b/Documentation/networking/ltpc.rst
index 0bf3220c715b..0ad197fd17ce 100644
--- a/Documentation/networking/ltpc.txt
+++ b/Documentation/networking/ltpc.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+LTPC Driver
+===========
+
This is the ALPHA version of the ltpc driver.
In order to use it, you will need at least version 1.3.3 of the
@@ -15,7 +21,7 @@ yourself. (see "Card Configuration" below for how to determine or
change the settings on your card)
When the driver is compiled into the kernel, you can add a line such
-as the following to your /etc/lilo.conf:
+as the following to your /etc/lilo.conf::
append="ltpc=0x240,9,1"
@@ -25,13 +31,13 @@ the driver will try to determine them itself.
If you load the driver as a module, you can pass the parameters "io=",
"irq=", and "dma=" on the command line with insmod or modprobe, or add
-them as options in a configuration file in /etc/modprobe.d/ directory:
+them as options in a configuration file in /etc/modprobe.d/ directory::
alias lt0 ltpc # autoload the module when the interface is configured
options ltpc io=0x240 irq=9 dma=1
Before starting up the netatalk demons (perhaps in rc.local), you
-need to add a line such as:
+need to add a line such as::
/sbin/ifconfig lt0 127.0.0.42
@@ -42,7 +48,7 @@ The appropriate netatalk configuration depends on whether you are
attached to a network that includes AppleTalk routers or not. If,
like me, you are simply connecting to your home Macintoshes and
printers, you need to set up netatalk to "seed". The way I do this
-is to have the lines
+is to have the lines::
dummy -seed -phase 2 -net 2000 -addr 2000.26 -zone "1033"
lt0 -seed -phase 1 -net 1033 -addr 1033.27 -zone "1033"
@@ -57,13 +63,13 @@ such.
If you are attached to an extended AppleTalk network, with routers on
it, then you don't need to fool around with this -- the appropriate
-line in atalkd.conf is
+line in atalkd.conf is::
lt0 -phase 1
---------------------------------------
-Card Configuration:
+Card Configuration
+==================
The interrupts and so forth are configured via the dipswitch on the
board. Set the switches so as not to conflict with other hardware.
@@ -73,38 +79,44 @@ board. Set the switches so as not to conflict with other hardware.
original documentation refers to IRQ2. Since you'll be running
this on an AT (or later) class machine, that really means IRQ9.
+ === ===========================================================
SW1 IRQ 4
SW2 IRQ 3
SW3 IRQ 9 (2 in original card documentation only applies to XT)
+ === ===========================================================
DMA -- choose DMA 1 or 3, and set both corresponding switches.
+ === =====
SW4 DMA 3
SW5 DMA 1
SW6 DMA 3
SW7 DMA 1
+ === =====
I/O address -- choose one.
+ === =========
SW8 220 / 240
+ === =========
---------------------------------------
-IP:
+IP
+==
Yes, it is possible to do IP over LocalTalk. However, you can't just
treat the LocalTalk device like an ordinary Ethernet device, even if
that's what it looks like to Netatalk.
Instead, you follow the same procedure as for doing IP in EtherTalk.
-See Documentation/networking/ipddp.txt for more information about the
+See Documentation/networking/ipddp.rst for more information about the
kernel driver and userspace tools needed.
---------------------------------------
-BUGS:
+Bugs
+====
IRQ autoprobing often doesn't work on a cold boot. To get around
this, either compile the driver as a module, or pass the parameters
@@ -120,12 +132,13 @@ It may theoretically be possible to use two LTPC cards in the same
machine, but this is unsupported, so if you really want to do this,
you'll probably have to hack the initialization code a bit.
-______________________________________
-THANKS:
- Thanks to Alan Cox for helpful discussions early on in this
+Thanks
+======
+
+Thanks to Alan Cox for helpful discussions early on in this
work, and to Denis Hainsworth for doing the bleeding-edge testing.
--- Bradford Johnson <bradford@math.umn.edu>
+Bradford Johnson <bradford@math.umn.edu>
--- Updated 11/09/1998 by David Huggins-Daines <dhd@debian.org>
+Updated 11/09/1998 by David Huggins-Daines <dhd@debian.org>
diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.rst
index d58d78df9ca2..be65f886ff1f 100644
--- a/Documentation/networking/mac80211-injection.txt
+++ b/Documentation/networking/mac80211-injection.rst
@@ -1,16 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
How to use packet injection with mac80211
=========================================
mac80211 now allows arbitrary packets to be injected down any Monitor Mode
interface from userland. The packet you inject needs to be composed in the
-following format:
+following format::
[ radiotap header ]
[ ieee80211 header ]
[ payload ]
The radiotap format is discussed in
-./Documentation/networking/radiotap-headers.txt.
+./Documentation/networking/radiotap-headers.rst.
Despite many radiotap parameters being currently defined, most only make sense
to appear on received packets. The following information is parsed from the
@@ -18,15 +21,19 @@ radiotap headers and used to control injection:
* IEEE80211_RADIOTAP_FLAGS
- IEEE80211_RADIOTAP_F_FCS: FCS will be removed and recalculated
- IEEE80211_RADIOTAP_F_WEP: frame will be encrypted if key available
- IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the
+ ========================= ===========================================
+ IEEE80211_RADIOTAP_F_FCS FCS will be removed and recalculated
+ IEEE80211_RADIOTAP_F_WEP frame will be encrypted if key available
+ IEEE80211_RADIOTAP_F_FRAG frame will be fragmented if longer than the
current fragmentation threshold.
+ ========================= ===========================================
* IEEE80211_RADIOTAP_TX_FLAGS
- IEEE80211_RADIOTAP_F_TX_NOACK: frame should be sent without waiting for
+ ============================= ========================================
+ IEEE80211_RADIOTAP_F_TX_NOACK frame should be sent without waiting for
an ACK even if it is a unicast frame
+ ============================= ========================================
* IEEE80211_RADIOTAP_RATE
@@ -37,8 +44,10 @@ radiotap headers and used to control injection:
HT rate for the transmission (only for devices without own rate control).
Also some flags are parsed
- IEEE80211_RADIOTAP_MCS_SGI: use short guard interval
- IEEE80211_RADIOTAP_MCS_BW_40: send in HT40 mode
+ ============================ ========================
+ IEEE80211_RADIOTAP_MCS_SGI use short guard interval
+ IEEE80211_RADIOTAP_MCS_BW_40 send in HT40 mode
+ ============================ ========================
* IEEE80211_RADIOTAP_DATA_RETRIES
@@ -51,17 +60,17 @@ radiotap headers and used to control injection:
without own rate control). Also other fields are parsed
flags field
- IEEE80211_RADIOTAP_VHT_FLAG_SGI: use short guard interval
+ IEEE80211_RADIOTAP_VHT_FLAG_SGI: use short guard interval
bandwidth field
- 1: send using 40MHz channel width
- 4: send using 80MHz channel width
- 11: send using 160MHz channel width
+ * 1: send using 40MHz channel width
+ * 4: send using 80MHz channel width
+ * 11: send using 160MHz channel width
The injection code can also skip all other currently defined radiotap fields
facilitating replay of captured radiotap headers directly.
-Here is an example valid radiotap header defining some parameters
+Here is an example valid radiotap header defining some parameters::
0x00, 0x00, // <-- radiotap version
0x0b, 0x00, // <- radiotap header length
@@ -71,7 +80,7 @@ Here is an example valid radiotap header defining some parameters
0x01 //<-- antenna
The ieee80211 header follows immediately afterwards, looking for example like
-this:
+this::
0x08, 0x01, 0x00, 0x00,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
@@ -84,10 +93,10 @@ Then lastly there is the payload.
After composing the packet contents, it is sent by send()-ing it to a logical
mac80211 interface that is in Monitor mode. Libpcap can also be used,
(which is easier than doing the work to bind the socket to the right
-interface), along the following lines:
+interface), along the following lines:::
ppcap = pcap_open_live(szInterfaceName, 800, 1, 20, szErrbuf);
-...
+ ...
r = pcap_inject(ppcap, u8aSendBuffer, nLength);
You can also find a link to a complete inject application here:
diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.rst
index 025cc9b96992..0a2ac88404d7 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.rst
@@ -1,4 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+MPLS Sysfs variables
+====================
+
/proc/sys/net/mpls/* Variables:
+===============================
platform_labels - INTEGER
Number of entries in the platform label table. It is not
@@ -17,6 +24,7 @@ platform_labels - INTEGER
no longer fit in the table.
Possible values: 0 - 1048575
+
Default: 0
ip_ttl_propagate - BOOL
@@ -27,8 +35,8 @@ ip_ttl_propagate - BOOL
If disabled, the MPLS transport network will appear as a
single hop to transit traffic.
- 0 - disabled / RFC 3443 [Short] Pipe Model
- 1 - enabled / RFC 3443 Uniform Model (default)
+ * 0 - disabled / RFC 3443 [Short] Pipe Model
+ * 1 - enabled / RFC 3443 Uniform Model (default)
default_ttl - INTEGER
Default TTL value to use for MPLS packets where it cannot be
@@ -36,6 +44,7 @@ default_ttl - INTEGER
or ip_ttl_propagate has been disabled.
Possible values: 1 - 255
+
Default: 255
conf/<interface>/input - BOOL
@@ -44,5 +53,5 @@ conf/<interface>/input - BOOL
If disabled, packets will be discarded without further
processing.
- 0 - disabled (default)
- not 0 - enabled
+ * 0 - disabled (default)
+ * not 0 - enabled
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.rst
index 4caa0e314cc2..0a576166e9dd 100644
--- a/Documentation/networking/multiqueue.txt
+++ b/Documentation/networking/multiqueue.rst
@@ -1,17 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
- HOWTO for multiqueue network device support
- ===========================================
+===========================================
+HOWTO for multiqueue network device support
+===========================================
Section 1: Base driver requirements for implementing multiqueue support
+=======================================================================
Intro: Kernel support for multiqueue devices
---------------------------------------------------------
Kernel support for multiqueue devices is always present.
-Section 1: Base driver requirements for implementing multiqueue support
------------------------------------------------------------------------
-
Base drivers are required to use the new alloc_etherdev_mq() or
alloc_netdev_mq() functions to allocate the subqueues for the device. The
underlying kernel API will take care of the allocation and deallocation of
@@ -26,8 +26,7 @@ comes online or when it's completely shut down (unregister_netdev(), etc.).
Section 2: Qdisc support for multiqueue devices
-
------------------------------------------------
+===============================================
Currently two qdiscs are optimized for multiqueue devices. The first is the
default pfifo_fast qdisc. This qdisc supports one qdisc per hardware queue.
@@ -46,22 +45,22 @@ will be queued to the band associated with the hardware queue.
Section 3: Brief howto using MULTIQ for multiqueue devices
----------------------------------------------------------------
+==========================================================
The userspace command 'tc,' part of the iproute2 package, is used to configure
qdiscs. To add the MULTIQ qdisc to your network device, assuming the device
-is called eth0, run the following command:
+is called eth0, run the following command::
-# tc qdisc add dev eth0 root handle 1: multiq
+ # tc qdisc add dev eth0 root handle 1: multiq
The qdisc will allocate the number of bands to equal the number of queues that
the device reports, and bring the qdisc online. Assuming eth0 has 4 Tx
-queues, the band mapping would look like:
+queues, the band mapping would look like::
-band 0 => queue 0
-band 1 => queue 1
-band 2 => queue 2
-band 3 => queue 3
+ band 0 => queue 0
+ band 1 => queue 1
+ band 2 => queue 2
+ band 3 => queue 3
Traffic will begin flowing through each queue based on either the simple_tx_hash
function or based on netdev->select_queue() if you have it defined.
@@ -69,11 +68,11 @@ function or based on netdev->select_queue() if you have it defined.
The behavior of tc filters remains the same. However a new tc action,
skbedit, has been added. Assuming you wanted to route all traffic to a
specific host, for example 192.168.0.3, through a specific queue you could use
-this action and establish a filter such as:
+this action and establish a filter such as::
-tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
- match ip dst 192.168.0.3 \
- action skbedit queue_mapping 3
+ tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
+ match ip dst 192.168.0.3 \
+ action skbedit queue_mapping 3
-Author: Alexander Duyck <alexander.h.duyck@intel.com>
-Original Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
+:Author: Alexander Duyck <alexander.h.duyck@intel.com>
+:Original Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.rst
index 296ea00fd3eb..1f5c4a04027c 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.rst
@@ -1,7 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+Netconsole
+==========
+
started by Ingo Molnar <mingo@redhat.com>, 2001.09.17
+
2.6 port and netpoll api by Matt Mackall <mpm@selenic.com>, Sep 9 2003
+
IPv6 support by Cong Wang <xiyou.wangcong@gmail.com>, Jan 1 2013
+
Extended console support by Tejun Heo <tj@kernel.org>, May 1 2015
Please send bug reports to Matt Mackall <mpm@selenic.com>
@@ -23,34 +32,34 @@ Sender and receiver configuration:
==================================
It takes a string configuration parameter "netconsole" in the
-following format:
+following format::
netconsole=[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
where
- + if present, enable extended console support
- src-port source for UDP packets (defaults to 6665)
- src-ip source IP to use (interface address)
- dev network interface (eth0)
- tgt-port port for logging agent (6666)
- tgt-ip IP address for logging agent
- tgt-macaddr ethernet MAC address for logging agent (broadcast)
+ + if present, enable extended console support
+ src-port source for UDP packets (defaults to 6665)
+ src-ip source IP to use (interface address)
+ dev network interface (eth0)
+ tgt-port port for logging agent (6666)
+ tgt-ip IP address for logging agent
+ tgt-macaddr ethernet MAC address for logging agent (broadcast)
-Examples:
+Examples::
linux netconsole=4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc
- or
+or::
insmod netconsole netconsole=@/,@10.0.0.2/
- or using IPv6
+or using IPv6::
insmod netconsole netconsole=@/,@fd00:1:2:3::1/
It also supports logging to multiple remote agents by specifying
parameters for the multiple agents separated by semicolons and the
-complete string enclosed in "quotes", thusly:
+complete string enclosed in "quotes", thusly::
modprobe netconsole netconsole="@/,@10.0.0.2/;@/eth1,6892@10.0.0.3/"
@@ -67,14 +76,19 @@ for example:
On distributions using a BSD-based netcat version (e.g. Fedora,
openSUSE and Ubuntu) the listening port must be specified without
- the -p switch:
+ the -p switch::
+
+ nc -u -l -p <port>' / 'nc -u -l <port>
+
+ or::
- 'nc -u -l -p <port>' / 'nc -u -l <port>' or
- 'netcat -u -l -p <port>' / 'netcat -u -l <port>'
+ netcat -u -l -p <port>' / 'netcat -u -l <port>
3) socat
- 'socat udp-recv:<port> -'
+::
+
+ socat udp-recv:<port> -
Dynamic reconfiguration:
========================
@@ -92,7 +106,7 @@ netconsole module (or kernel, if netconsole is built-in).
Some examples follow (where configfs is mounted at the /sys/kernel/config
mountpoint).
-To add a remote logging target (target names can be arbitrary):
+To add a remote logging target (target names can be arbitrary)::
cd /sys/kernel/config/netconsole/
mkdir target1
@@ -102,12 +116,13 @@ above) and are disabled by default -- they must first be enabled by writing
"1" to the "enabled" attribute (usually after setting parameters accordingly)
as described below.
-To remove a target:
+To remove a target::
rmdir /sys/kernel/config/netconsole/othertarget/
The interface exposes these parameters of a netconsole target to userspace:
+ ============== ================================= ============
enabled Is this target currently enabled? (read-write)
extended Extended mode enabled (read-write)
dev_name Local network interface name (read-write)
@@ -117,12 +132,13 @@ The interface exposes these parameters of a netconsole target to userspace:
remote_ip Remote agent's IP address (read-write)
local_mac Local interface's MAC address (read-only)
remote_mac Remote agent's MAC address (read-write)
+ ============== ================================= ============
The "enabled" attribute is also used to control whether the parameters of
a target can be updated or not -- you can modify the parameters of only
disabled targets (i.e. if "enabled" is 0).
-To update a target's parameters:
+To update a target's parameters::
cat enabled # check if enabled is 1
echo 0 > enabled # disable the target (if required)
@@ -140,12 +156,12 @@ Extended console:
If '+' is prefixed to the configuration line or "extended" config file
is set to 1, extended console support is enabled. An example boot
-param follows.
+param follows::
linux netconsole=+4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc
Log messages are transmitted with extended metadata header in the
-following format which is the same as /dev/kmsg.
+following format which is the same as /dev/kmsg::
<level>,<sequnum>,<timestamp>,<contflag>;<message text>
@@ -155,12 +171,12 @@ newline is used as the delimeter.
If a message doesn't fit in certain number of bytes (currently 1000),
the message is split into multiple fragments by netconsole. These
-fragments are transmitted with "ncfrag" header field added.
+fragments are transmitted with "ncfrag" header field added::
ncfrag=<byte-offset>/<total-bytes>
For example, assuming a lot smaller chunk size, a message "the first
-chunk, the 2nd chunk." may be split as follows.
+chunk, the 2nd chunk." may be split as follows::
6,416,1758426,-,ncfrag=0/31;the first chunk,
6,416,1758426,-,ncfrag=16/31; the 2nd chunk.
@@ -168,39 +184,52 @@ chunk, the 2nd chunk." may be split as follows.
Miscellaneous notes:
====================
-WARNING: the default target ethernet setting uses the broadcast
-ethernet address to send packets, which can cause increased load on
-other systems on the same ethernet segment.
+.. Warning::
+
+ the default target ethernet setting uses the broadcast
+ ethernet address to send packets, which can cause increased load on
+ other systems on the same ethernet segment.
+
+.. Tip::
+
+ some LAN switches may be configured to suppress ethernet broadcasts
+ so it is advised to explicitly specify the remote agents' MAC addresses
+ from the config parameters passed to netconsole.
+
+.. Tip::
+
+ to find out the MAC address of, say, 10.0.0.2, you may try using::
+
+ ping -c 1 10.0.0.2 ; /sbin/arp -n | grep 10.0.0.2
-TIP: some LAN switches may be configured to suppress ethernet broadcasts
-so it is advised to explicitly specify the remote agents' MAC addresses
-from the config parameters passed to netconsole.
+.. Tip::
-TIP: to find out the MAC address of, say, 10.0.0.2, you may try using:
+ in case the remote logging agent is on a separate LAN subnet than
+ the sender, it is suggested to try specifying the MAC address of the
+ default gateway (you may use /sbin/route -n to find it out) as the
+ remote MAC address instead.
- ping -c 1 10.0.0.2 ; /sbin/arp -n | grep 10.0.0.2
+.. note::
-TIP: in case the remote logging agent is on a separate LAN subnet than
-the sender, it is suggested to try specifying the MAC address of the
-default gateway (you may use /sbin/route -n to find it out) as the
-remote MAC address instead.
+ the network device (eth1 in the above case) can run any kind
+ of other network traffic, netconsole is not intrusive. Netconsole
+ might cause slight delays in other traffic if the volume of kernel
+ messages is high, but should have no other impact.
-NOTE: the network device (eth1 in the above case) can run any kind
-of other network traffic, netconsole is not intrusive. Netconsole
-might cause slight delays in other traffic if the volume of kernel
-messages is high, but should have no other impact.
+.. note::
-NOTE: if you find that the remote logging agent is not receiving or
-printing all messages from the sender, it is likely that you have set
-the "console_loglevel" parameter (on the sender) to only send high
-priority messages to the console. You can change this at runtime using:
+ if you find that the remote logging agent is not receiving or
+ printing all messages from the sender, it is likely that you have set
+ the "console_loglevel" parameter (on the sender) to only send high
+ priority messages to the console. You can change this at runtime using::
- dmesg -n 8
+ dmesg -n 8
-or by specifying "debug" on the kernel command line at boot, to send
-all kernel messages to the console. A specific value for this parameter
-can also be set using the "loglevel" kernel boot option. See the
-dmesg(8) man page and Documentation/admin-guide/kernel-parameters.rst for details.
+ or by specifying "debug" on the kernel command line at boot, to send
+ all kernel messages to the console. A specific value for this parameter
+ can also be set using the "loglevel" kernel boot option. See the
+ dmesg(8) man page and Documentation/admin-guide/kernel-parameters.rst
+ for details.
Netconsole was designed to be as instantaneous as possible, to
enable the logging of even the most critical kernel bugs. It works
diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.rst
index 58dd1c1e3c65..a2d7d7160e39 100644
--- a/Documentation/networking/netdev-features.txt
+++ b/Documentation/networking/netdev-features.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================================
Netdev features mess and how to get out from it alive
=====================================================
@@ -6,8 +9,8 @@ Author:
- Part I: Feature sets
-======================
+Part I: Feature sets
+====================
Long gone are the days when a network card would just take and give packets
verbatim. Today's devices add multiple features and bugs (read: offloads)
@@ -39,8 +42,8 @@ one used internally by network core:
- Part II: Controlling enabled features
-=======================================
+Part II: Controlling enabled features
+=====================================
When current feature set (netdev->features) is to be changed, new set
is calculated and filtered by calling ndo_fix_features callback
@@ -65,8 +68,8 @@ driver except by means of ndo_fix_features callback.
- Part III: Implementation hints
-================================
+Part III: Implementation hints
+==============================
* ndo_fix_features:
@@ -94,8 +97,8 @@ Errors returned are not (and cannot be) propagated anywhere except dmesg.
- Part IV: Features
-===================
+Part IV: Features
+=================
For current list of features, see include/linux/netdev_features.h.
This section describes semantics of some of them.
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.rst
index 7fec2061a334..5a85fcc80c76 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.rst
@@ -1,5 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+=====================================
Network Devices, the Kernel, and You!
+=====================================
Introduction
@@ -75,11 +78,12 @@ ndo_start_xmit:
Don't use it for new drivers.
Context: Process with BHs disabled or BH (timer),
- will be called with interrupts disabled by netconsole.
+ will be called with interrupts disabled by netconsole.
- Return codes:
- o NETDEV_TX_OK everything ok.
- o NETDEV_TX_BUSY Cannot transmit packet, try later
+ Return codes:
+
+ * NETDEV_TX_OK everything ok.
+ * NETDEV_TX_BUSY Cannot transmit packet, try later
Usually a bug, means queue start/stop flow control is broken in
the driver. Note: the driver must NOT put the skb in its DMA ring.
@@ -95,10 +99,13 @@ ndo_set_rx_mode:
struct napi_struct synchronization rules
========================================
napi->poll:
- Synchronization: NAPI_STATE_SCHED bit in napi->state. Device
+ Synchronization:
+ NAPI_STATE_SCHED bit in napi->state. Device
driver's ndo_stop method will invoke napi_disable() on
all NAPI instances which will do a sleeping poll on the
NAPI_STATE_SCHED napi->state bit, waiting for all pending
NAPI activity to cease.
- Context: softirq
- will be called with interrupts disabled by netconsole.
+
+ Context:
+ softirq
+ will be called with interrupts disabled by netconsole.
diff --git a/Documentation/networking/netfilter-sysctl.txt b/Documentation/networking/netfilter-sysctl.rst
index 55791e50e169..beb6d7b275d4 100644
--- a/Documentation/networking/netfilter-sysctl.txt
+++ b/Documentation/networking/netfilter-sysctl.rst
@@ -1,8 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+Netfilter Sysfs variables
+=========================
+
/proc/sys/net/netfilter/* Variables:
+====================================
nf_log_all_netns - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
By default, only init_net namespace can log packets into kernel log
with LOG target; this aims to prevent containers from flooding host
diff --git a/Documentation/networking/netif-msg.rst b/Documentation/networking/netif-msg.rst
new file mode 100644
index 000000000000..b20d265a734d
--- /dev/null
+++ b/Documentation/networking/netif-msg.rst
@@ -0,0 +1,95 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+NETIF Msg Level
+===============
+
+The design of the network interface message level setting.
+
+History
+-------
+
+ The design of the debugging message interface was guided and
+ constrained by backwards compatibility previous practice. It is useful
+ to understand the history and evolution in order to understand current
+ practice and relate it to older driver source code.
+
+ From the beginning of Linux, each network device driver has had a local
+ integer variable that controls the debug message level. The message
+ level ranged from 0 to 7, and monotonically increased in verbosity.
+
+ The message level was not precisely defined past level 3, but were
+ always implemented within +-1 of the specified level. Drivers tended
+ to shed the more verbose level messages as they matured.
+
+ - 0 Minimal messages, only essential information on fatal errors.
+ - 1 Standard messages, initialization status. No run-time messages
+ - 2 Special media selection messages, generally timer-driver.
+ - 3 Interface starts and stops, including normal status messages
+ - 4 Tx and Rx frame error messages, and abnormal driver operation
+ - 5 Tx packet queue information, interrupt events.
+ - 6 Status on each completed Tx packet and received Rx packets
+ - 7 Initial contents of Tx and Rx packets
+
+ Initially this message level variable was uniquely named in each driver
+ e.g. "lance_debug", so that a kernel symbolic debugger could locate and
+ modify the setting. When kernel modules became common, the variables
+ were consistently renamed to "debug" and allowed to be set as a module
+ parameter.
+
+ This approach worked well. However there is always a demand for
+ additional features. Over the years the following emerged as
+ reasonable and easily implemented enhancements
+
+ - Using an ioctl() call to modify the level.
+ - Per-interface rather than per-driver message level setting.
+ - More selective control over the type of messages emitted.
+
+ The netif_msg recommendation adds these features with only a minor
+ complexity and code size increase.
+
+ The recommendation is the following points
+
+ - Retaining the per-driver integer variable "debug" as a module
+ parameter with a default level of '1'.
+
+ - Adding a per-interface private variable named "msg_enable". The
+ variable is a bit map rather than a level, and is initialized as::
+
+ 1 << debug
+
+ Or more precisely::
+
+ debug < 0 ? 0 : 1 << min(sizeof(int)-1, debug)
+
+ Messages should changes from::
+
+ if (debug > 1)
+ printk(MSG_DEBUG "%s: ...
+
+ to::
+
+ if (np->msg_enable & NETIF_MSG_LINK)
+ printk(MSG_DEBUG "%s: ...
+
+
+The set of message levels is named
+
+
+ ========= =================== ============
+ Old level Name Bit position
+ ========= =================== ============
+ 0 NETIF_MSG_DRV 0x0001
+ 1 NETIF_MSG_PROBE 0x0002
+ 2 NETIF_MSG_LINK 0x0004
+ 2 NETIF_MSG_TIMER 0x0004
+ 3 NETIF_MSG_IFDOWN 0x0008
+ 3 NETIF_MSG_IFUP 0x0008
+ 4 NETIF_MSG_RX_ERR 0x0010
+ 4 NETIF_MSG_TX_ERR 0x0010
+ 5 NETIF_MSG_TX_QUEUED 0x0020
+ 5 NETIF_MSG_INTR 0x0020
+ 6 NETIF_MSG_TX_DONE 0x0040
+ 6 NETIF_MSG_RX_STATUS 0x0040
+ 7 NETIF_MSG_PKTDATA 0x0080
+ ========= =================== ============
diff --git a/Documentation/networking/netif-msg.txt b/Documentation/networking/netif-msg.txt
deleted file mode 100644
index c967ddb90d0b..000000000000
--- a/Documentation/networking/netif-msg.txt
+++ /dev/null
@@ -1,79 +0,0 @@
-
-________________
-NETIF Msg Level
-
-The design of the network interface message level setting.
-
-History
-
- The design of the debugging message interface was guided and
- constrained by backwards compatibility previous practice. It is useful
- to understand the history and evolution in order to understand current
- practice and relate it to older driver source code.
-
- From the beginning of Linux, each network device driver has had a local
- integer variable that controls the debug message level. The message
- level ranged from 0 to 7, and monotonically increased in verbosity.
-
- The message level was not precisely defined past level 3, but were
- always implemented within +-1 of the specified level. Drivers tended
- to shed the more verbose level messages as they matured.
- 0 Minimal messages, only essential information on fatal errors.
- 1 Standard messages, initialization status. No run-time messages
- 2 Special media selection messages, generally timer-driver.
- 3 Interface starts and stops, including normal status messages
- 4 Tx and Rx frame error messages, and abnormal driver operation
- 5 Tx packet queue information, interrupt events.
- 6 Status on each completed Tx packet and received Rx packets
- 7 Initial contents of Tx and Rx packets
-
- Initially this message level variable was uniquely named in each driver
- e.g. "lance_debug", so that a kernel symbolic debugger could locate and
- modify the setting. When kernel modules became common, the variables
- were consistently renamed to "debug" and allowed to be set as a module
- parameter.
-
- This approach worked well. However there is always a demand for
- additional features. Over the years the following emerged as
- reasonable and easily implemented enhancements
- Using an ioctl() call to modify the level.
- Per-interface rather than per-driver message level setting.
- More selective control over the type of messages emitted.
-
- The netif_msg recommendation adds these features with only a minor
- complexity and code size increase.
-
- The recommendation is the following points
- Retaining the per-driver integer variable "debug" as a module
- parameter with a default level of '1'.
-
- Adding a per-interface private variable named "msg_enable". The
- variable is a bit map rather than a level, and is initialized as
- 1 << debug
- Or more precisely
- debug < 0 ? 0 : 1 << min(sizeof(int)-1, debug)
-
- Messages should changes from
- if (debug > 1)
- printk(MSG_DEBUG "%s: ...
- to
- if (np->msg_enable & NETIF_MSG_LINK)
- printk(MSG_DEBUG "%s: ...
-
-
-The set of message levels is named
- Old level Name Bit position
- 0 NETIF_MSG_DRV 0x0001
- 1 NETIF_MSG_PROBE 0x0002
- 2 NETIF_MSG_LINK 0x0004
- 2 NETIF_MSG_TIMER 0x0004
- 3 NETIF_MSG_IFDOWN 0x0008
- 3 NETIF_MSG_IFUP 0x0008
- 4 NETIF_MSG_RX_ERR 0x0010
- 4 NETIF_MSG_TX_ERR 0x0010
- 5 NETIF_MSG_TX_QUEUED 0x0020
- 5 NETIF_MSG_INTR 0x0020
- 6 NETIF_MSG_TX_DONE 0x0040
- 6 NETIF_MSG_RX_STATUS 0x0040
- 7 NETIF_MSG_PKTDATA 0x0080
-
diff --git a/Documentation/networking/nf_conntrack-sysctl.txt b/Documentation/networking/nf_conntrack-sysctl.rst
index f75c2ce6e136..11a9b76786cb 100644
--- a/Documentation/networking/nf_conntrack-sysctl.txt
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -1,8 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Netfilter Conntrack Sysfs variables
+===================================
+
/proc/sys/net/netfilter/nf_conntrack_* Variables:
+=================================================
nf_conntrack_acct - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
Enable connection tracking flow accounting. 64-bit byte and packet
counters per flow are added.
@@ -16,8 +23,8 @@ nf_conntrack_buckets - INTEGER
This sysctl is only writeable in the initial net namespace.
nf_conntrack_checksum - BOOLEAN
- 0 - disabled
- not 0 - enabled (default)
+ - 0 - disabled
+ - not 0 - enabled (default)
Verify checksum of incoming packets. Packets with bad checksums are
in INVALID state. If this is enabled, such packets will not be
@@ -27,8 +34,8 @@ nf_conntrack_count - INTEGER (read-only)
Number of currently allocated flow entries.
nf_conntrack_events - BOOLEAN
- 0 - disabled
- not 0 - enabled (default)
+ - 0 - disabled
+ - not 0 - enabled (default)
If this option is enabled, the connection tracking code will
provide userspace with connection tracking events via ctnetlink.
@@ -62,8 +69,8 @@ nf_conntrack_generic_timeout - INTEGER (seconds)
protocols.
nf_conntrack_helper - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
Enable automatic conntrack helper assignment.
If disabled it is required to set up iptables rules to assign
@@ -81,14 +88,14 @@ nf_conntrack_icmpv6_timeout - INTEGER (seconds)
Default for ICMP6 timeout.
nf_conntrack_log_invalid - INTEGER
- 0 - disable (default)
- 1 - log ICMP packets
- 6 - log TCP packets
- 17 - log UDP packets
- 33 - log DCCP packets
- 41 - log ICMPv6 packets
- 136 - log UDPLITE packets
- 255 - log packets of any protocol
+ - 0 - disable (default)
+ - 1 - log ICMP packets
+ - 6 - log TCP packets
+ - 17 - log UDP packets
+ - 33 - log DCCP packets
+ - 41 - log ICMPv6 packets
+ - 136 - log UDPLITE packets
+ - 255 - log packets of any protocol
Log invalid packets of a type specified by value.
@@ -97,15 +104,15 @@ nf_conntrack_max - INTEGER
nf_conntrack_buckets value * 4.
nf_conntrack_tcp_be_liberal - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
Be conservative in what you do, be liberal in what you accept from others.
If it's non-zero, we mark only out of window RST segments as INVALID.
nf_conntrack_tcp_loose - BOOLEAN
- 0 - disabled
- not 0 - enabled (default)
+ - 0 - disabled
+ - not 0 - enabled (default)
If it is set to zero, we disable picking up already established
connections.
@@ -148,8 +155,8 @@ nf_conntrack_tcp_timeout_unacknowledged - INTEGER (seconds)
default 300
nf_conntrack_timestamp - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
+ - 0 - disabled (default)
+ - not 0 - enabled
Enable connection tracking flow timestamping.
diff --git a/Documentation/networking/nf_flowtable.txt b/Documentation/networking/nf_flowtable.rst
index 0bf32d1121be..b6e1fa141aae 100644
--- a/Documentation/networking/nf_flowtable.txt
+++ b/Documentation/networking/nf_flowtable.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
Netfilter's flowtable infrastructure
====================================
@@ -31,15 +34,17 @@ to use this new alternative forwarding path via nftables policy.
This is represented in Fig.1, which describes the classic forwarding path
including the Netfilter hooks and the flowtable fastpath bypass.
- userspace process
- ^ |
- | |
- _____|____ ____\/___
- / \ / \
- | input | | output |
- \__________/ \_________/
- ^ |
- | |
+::
+
+ userspace process
+ ^ |
+ | |
+ _____|____ ____\/___
+ / \ / \
+ | input | | output |
+ \__________/ \_________/
+ ^ |
+ | |
_________ __________ --------- _____\/_____
/ \ / \ |Routing | / \
--> ingress ---> prerouting ---> |decision| | postrouting |--> neigh_xmit
@@ -59,7 +64,7 @@ including the Netfilter hooks and the flowtable fastpath bypass.
\ / |
|__yes_________________fastpath bypass ____________________________|
- Fig.1 Netfilter hooks and flowtable interactions
+ Fig.1 Netfilter hooks and flowtable interactions
The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matches the initial packets that went
@@ -72,18 +77,18 @@ Example configuration
---------------------
Enabling the flowtable bypass is relatively easy, you only need to create a
-flowtable and add one rule to your forward chain.
+flowtable and add one rule to your forward chain::
- table inet x {
+ table inet x {
flowtable f {
hook ingress priority 0; devices = { eth0, eth1 };
}
- chain y {
- type filter hook forward priority 0; policy accept;
- ip protocol tcp flow offload @f
- counter packets 0 bytes 0
- }
- }
+ chain y {
+ type filter hook forward priority 0; policy accept;
+ ip protocol tcp flow offload @f
+ counter packets 0 bytes 0
+ }
+ }
This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
@@ -101,12 +106,12 @@ forwarding bypass.
More reading
------------
-This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also
-made a very complete and comprehensive summary called "A state of network
+This documentation is based on the LWN.net articles [1]_\ [2]_. Rafal Milecki
+also made a very complete and comprehensive summary called "A state of network
acceleration" that describes how things were before this infrastructure was
-mailined [3] and it also makes a rough summary of this work [4].
+mailined [3]_ and it also makes a rough summary of this work [4]_.
-[1] https://lwn.net/Articles/738214/
-[2] https://lwn.net/Articles/742164/
-[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
-[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html
+.. [1] https://lwn.net/Articles/738214/
+.. [2] https://lwn.net/Articles/742164/
+.. [3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
+.. [4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html
diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.rst
index b3b9ac61d29d..1a8353dbf1b6 100644
--- a/Documentation/networking/openvswitch.txt
+++ b/Documentation/networking/openvswitch.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================================
Open vSwitch datapath developer documentation
=============================================
@@ -80,13 +83,13 @@ The <linux/openvswitch.h> header file defines the exact format of the
flow key attributes. For informal explanatory purposes here, we write
them as comma-separated strings, with parentheses indicating arguments
and nesting. For example, the following could represent a flow key
-corresponding to a TCP packet that arrived on vport 1:
+corresponding to a TCP packet that arrived on vport 1::
in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0,
frag=no), tcp(src=49163, dst=80)
-Often we ellipsize arguments not important to the discussion, e.g.:
+Often we ellipsize arguments not important to the discussion, e.g.::
in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
@@ -151,20 +154,20 @@ Some care is needed to really maintain forward and backward
compatibility for applications that follow the rules listed under
"Flow key compatibility" above.
-The basic rule is obvious:
+The basic rule is obvious::
- ------------------------------------------------------------------
+ ==================================================================
New network protocol support must only supplement existing flow
key attributes. It must not change the meaning of already defined
flow key attributes.
- ------------------------------------------------------------------
+ ==================================================================
This rule does have less-obvious consequences so it is worth working
through a few examples. Suppose, for example, that the kernel module
did not already implement VLAN parsing. Instead, it just interpreted
the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
packet. The flow key for any packet with an 802.1Q header would look
-essentially like this, ignoring metadata:
+essentially like this, ignoring metadata::
eth(...), eth_type(0x8100)
@@ -172,7 +175,7 @@ Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
definitions. With this change, a TCP packet in VLAN 10 would have a
-flow key much like this:
+flow key much like this::
eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
@@ -187,7 +190,7 @@ across kernel versions even though it follows the compatibility rules.
The solution is to use a set of nested attributes. This is, for
example, why 802.1Q support uses nested attributes. A TCP packet in
-VLAN 10 is actually expressed as:
+VLAN 10 is actually expressed as::
eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
ip(proto=6, ...), tcp(...)))
@@ -215,14 +218,14 @@ For example, consider a packet that contains an IP header that
indicates protocol 6 for TCP, but which is truncated just after the IP
header, so that the TCP header is missing. The flow key for this
packet would include a tcp attribute with all-zero src and dst, like
-this:
+this::
eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
As another example, consider a packet with an Ethernet type of 0x8100,
indicating that a VLAN TCI should follow, but which is truncated just
after the Ethernet type. The flow key for this packet would include
-an all-zero-bits vlan and an empty encap attribute, like this:
+an all-zero-bits vlan and an empty encap attribute, like this::
eth(...), eth_type(0x8100), vlan(0), encap()
diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.rst
index b203d1334822..9c918f7cb0e8 100644
--- a/Documentation/networking/operstates.txt
+++ b/Documentation/networking/operstates.rst
@@ -1,5 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+Operational States
+==================
+
1. Introduction
+===============
Linux distinguishes between administrative and operational state of an
interface. Administrative state is the result of "ip link set dev
@@ -20,6 +27,7 @@ and changeable from userspace under certain rules.
2. Querying from userspace
+==========================
Both admin and operational state can be queried via the netlink
operation RTM_GETLINK. It is also possible to subscribe to RTNLGRP_LINK
@@ -30,16 +38,20 @@ These values contain interface state:
ifinfomsg::if_flags & IFF_UP:
Interface is admin up
+
ifinfomsg::if_flags & IFF_RUNNING:
Interface is in RFC2863 operational state UP or UNKNOWN. This is for
backward compatibility, routing daemons, dhcp clients can use this
flag to determine whether they should use the interface.
+
ifinfomsg::if_flags & IFF_LOWER_UP:
Driver has signaled netif_carrier_on()
+
ifinfomsg::if_flags & IFF_DORMANT:
Driver has signaled netif_dormant_on()
TLV IFLA_OPERSTATE
+------------------
contains RFC2863 state of the interface in numeric representation:
@@ -47,26 +59,33 @@ IF_OPER_UNKNOWN (0):
Interface is in unknown state, neither driver nor userspace has set
operational state. Interface must be considered for user data as
setting operational state has not been implemented in every driver.
+
IF_OPER_NOTPRESENT (1):
Unused in current kernel (notpresent interfaces normally disappear),
just a numerical placeholder.
+
IF_OPER_DOWN (2):
Interface is unable to transfer data on L1, f.e. ethernet is not
plugged or interface is ADMIN down.
+
IF_OPER_LOWERLAYERDOWN (3):
Interfaces stacked on an interface that is IF_OPER_DOWN show this
state (f.e. VLAN).
+
IF_OPER_TESTING (4):
Unused in current kernel.
+
IF_OPER_DORMANT (5):
Interface is L1 up, but waiting for an external event, f.e. for a
protocol to establish. (802.1X)
+
IF_OPER_UP (6):
Interface is operational up and can be used.
This TLV can also be queried via sysfs.
TLV IFLA_LINKMODE
+-----------------
contains link policy. This is needed for userspace interaction
described below.
@@ -75,6 +94,7 @@ This TLV can also be queried via sysfs.
3. Kernel driver API
+====================
Kernel drivers have access to two flags that map to IFF_LOWER_UP and
IFF_DORMANT. These flags can be set from everywhere, even from
@@ -126,6 +146,7 @@ netif_carrier_ok() && !netif_dormant():
4. Setting from userspace
+=========================
Applications have to use the netlink interface to influence the
RFC2863 operational state of an interface. Setting IFLA_LINKMODE to 1
@@ -139,18 +160,18 @@ are multicasted on the netlink group RTNLGRP_LINK.
So basically a 802.1X supplicant interacts with the kernel like this:
--subscribe to RTNLGRP_LINK
--set IFLA_LINKMODE to 1 via RTM_SETLINK
--query RTM_GETLINK once to get initial state
--if initial flags are not (IFF_LOWER_UP && !IFF_DORMANT), wait until
- netlink multicast signals this state
--do 802.1X, eventually abort if flags go down again
--send RTM_SETLINK to set operstate to IF_OPER_UP if authentication
- succeeds, IF_OPER_DORMANT otherwise
--see how operstate and IFF_RUNNING is echoed via netlink multicast
--set interface back to IF_OPER_DORMANT if 802.1X reauthentication
- fails
--restart if kernel changes IFF_LOWER_UP or IFF_DORMANT flag
+- subscribe to RTNLGRP_LINK
+- set IFLA_LINKMODE to 1 via RTM_SETLINK
+- query RTM_GETLINK once to get initial state
+- if initial flags are not (IFF_LOWER_UP && !IFF_DORMANT), wait until
+ netlink multicast signals this state
+- do 802.1X, eventually abort if flags go down again
+- send RTM_SETLINK to set operstate to IF_OPER_UP if authentication
+ succeeds, IF_OPER_DORMANT otherwise
+- see how operstate and IFF_RUNNING is echoed via netlink multicast
+- set interface back to IF_OPER_DORMANT if 802.1X reauthentication
+ fails
+- restart if kernel changes IFF_LOWER_UP or IFF_DORMANT flag
if supplicant goes down, bring back IFLA_LINKMODE to 0 and
IFLA_OPERSTATE to a sane value.
diff --git a/Documentation/networking/packet_mmap.rst b/Documentation/networking/packet_mmap.rst
new file mode 100644
index 000000000000..6c009ceb1183
--- /dev/null
+++ b/Documentation/networking/packet_mmap.rst
@@ -0,0 +1,1084 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+Packet MMAP
+===========
+
+Abstract
+========
+
+This file documents the mmap() facility available with the PACKET
+socket interface on 2.4/2.6/3.x kernels. This type of sockets is used for
+
+i) capture network traffic with utilities like tcpdump,
+ii) transmit network traffic, or any other that needs raw
+ access to network interface.
+
+Howto can be found at:
+
+ https://sites.google.com/site/packetmmap/
+
+Please send your comments to
+ - Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
+ - Johann Baudy
+
+Why use PACKET_MMAP
+===================
+
+In Linux 2.4/2.6/3.x if PACKET_MMAP is not enabled, the capture process is very
+inefficient. It uses very limited buffers and requires one system call to
+capture each packet, it requires two if you want to get packet's timestamp
+(like libpcap always does).
+
+In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
+configurable circular buffer mapped in user space that can be used to either
+send or receive packets. This way reading packets just needs to wait for them,
+most of the time there is no need to issue a single system call. Concerning
+transmission, multiple packets can be sent through one system call to get the
+highest bandwidth. By using a shared buffer between the kernel and the user
+also has the benefit of minimizing packet copies.
+
+It's fine to use PACKET_MMAP to improve the performance of the capture and
+transmission process, but it isn't everything. At least, if you are capturing
+at high speeds (this is relative to the cpu speed), you should check if the
+device driver of your network interface card supports some sort of interrupt
+load mitigation or (even better) if it supports NAPI, also make sure it is
+enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
+supported by devices of your network. CPU IRQ pinning of your network interface
+card can also be an advantage.
+
+How to use mmap() to improve capture process
+============================================
+
+From the user standpoint, you should use the higher level libpcap library, which
+is a de facto standard, portable across nearly all operating systems
+including Win32.
+
+Packet MMAP support was integrated into libpcap around the time of version 1.3.0;
+TPACKET_V3 support was added in version 1.5.0
+
+How to use mmap() directly to improve capture process
+=====================================================
+
+From the system calls stand point, the use of PACKET_MMAP involves
+the following process::
+
+
+ [setup] socket() -------> creation of the capture socket
+ setsockopt() ---> allocation of the circular buffer (ring)
+ option: PACKET_RX_RING
+ mmap() ---------> mapping of the allocated buffer to the
+ user process
+
+ [capture] poll() ---------> to wait for incoming packets
+
+ [shutdown] close() --------> destruction of the capture socket and
+ deallocation of all associated
+ resources.
+
+
+socket creation and destruction is straight forward, and is done
+the same way with or without PACKET_MMAP::
+
+ int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL));
+
+where mode is SOCK_RAW for the raw interface were link level
+information can be captured or SOCK_DGRAM for the cooked
+interface where link level information capture is not
+supported and a link level pseudo-header is provided
+by the kernel.
+
+The destruction of the socket and all associated resources
+is done by a simple call to close(fd).
+
+Similarly as without PACKET_MMAP, it is possible to use one socket
+for capture and transmission. This can be done by mapping the
+allocated RX and TX buffer ring with a single mmap() call.
+See "Mapping and use of the circular buffer (ring)".
+
+Next I will describe PACKET_MMAP settings and its constraints,
+also the mapping of the circular buffer in the user process and
+the use of this buffer.
+
+How to use mmap() directly to improve transmission process
+==========================================================
+Transmission process is similar to capture as shown below::
+
+ [setup] socket() -------> creation of the transmission socket
+ setsockopt() ---> allocation of the circular buffer (ring)
+ option: PACKET_TX_RING
+ bind() ---------> bind transmission socket with a network interface
+ mmap() ---------> mapping of the allocated buffer to the
+ user process
+
+ [transmission] poll() ---------> wait for free packets (optional)
+ send() ---------> send all packets that are set as ready in
+ the ring
+ The flag MSG_DONTWAIT can be used to return
+ before end of transfer.
+
+ [shutdown] close() --------> destruction of the transmission socket and
+ deallocation of all associated resources.
+
+Socket creation and destruction is also straight forward, and is done
+the same way as in capturing described in the previous paragraph::
+
+ int fd = socket(PF_PACKET, mode, 0);
+
+The protocol can optionally be 0 in case we only want to transmit
+via this socket, which avoids an expensive call to packet_rcv().
+In this case, you also need to bind(2) the TX_RING with sll_protocol = 0
+set. Otherwise, htons(ETH_P_ALL) or any other protocol, for example.
+
+Binding the socket to your network interface is mandatory (with zero copy) to
+know the header size of frames used in the circular buffer.
+
+As capture, each frame contains two parts::
+
+ --------------------
+ | struct tpacket_hdr | Header. It contains the status of
+ | | of this frame
+ |--------------------|
+ | data buffer |
+ . . Data that will be sent over the network interface.
+ . .
+ --------------------
+
+ bind() associates the socket to your network interface thanks to
+ sll_ifindex parameter of struct sockaddr_ll.
+
+ Initialization example::
+
+ struct sockaddr_ll my_addr;
+ struct ifreq s_ifr;
+ ...
+
+ strncpy (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name));
+
+ /* get interface index of eth0 */
+ ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
+
+ /* fill sockaddr_ll struct to prepare binding */
+ my_addr.sll_family = AF_PACKET;
+ my_addr.sll_protocol = htons(ETH_P_ALL);
+ my_addr.sll_ifindex = s_ifr.ifr_ifindex;
+
+ /* bind socket to eth0 */
+ bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
+
+ A complete tutorial is available at: https://sites.google.com/site/packetmmap/
+
+By default, the user should put data at::
+
+ frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)
+
+So, whatever you choose for the socket mode (SOCK_DGRAM or SOCK_RAW),
+the beginning of the user data will be at::
+
+ frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
+
+If you wish to put user data at a custom offset from the beginning of
+the frame (for payload alignment with SOCK_RAW mode for instance) you
+can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). In order
+to make this work it must be enabled previously with setsockopt()
+and the PACKET_TX_HAS_OFF option.
+
+PACKET_MMAP settings
+====================
+
+To setup PACKET_MMAP from user level code is done with a call like
+
+ - Capture process::
+
+ setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))
+
+ - Transmission process::
+
+ setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))
+
+The most significant argument in the previous call is the req parameter,
+this parameter must to have the following structure::
+
+ struct tpacket_req
+ {
+ unsigned int tp_block_size; /* Minimal size of contiguous block */
+ unsigned int tp_block_nr; /* Number of blocks */
+ unsigned int tp_frame_size; /* Size of frame */
+ unsigned int tp_frame_nr; /* Total number of frames */
+ };
+
+This structure is defined in /usr/include/linux/if_packet.h and establishes a
+circular buffer (ring) of unswappable memory.
+Being mapped in the capture process allows reading the captured frames and
+related meta-information like timestamps without requiring a system call.
+
+Frames are grouped in blocks. Each block is a physically contiguous
+region of memory and holds tp_block_size/tp_frame_size frames. The total number
+of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because::
+
+ frames_per_block = tp_block_size/tp_frame_size
+
+indeed, packet_set_ring checks that the following condition is true::
+
+ frames_per_block * tp_block_nr == tp_frame_nr
+
+Lets see an example, with the following values::
+
+ tp_block_size= 4096
+ tp_frame_size= 2048
+ tp_block_nr = 4
+ tp_frame_nr = 8
+
+we will get the following buffer structure::
+
+ block #1 block #2
+ +---------+---------+ +---------+---------+
+ | frame 1 | frame 2 | | frame 3 | frame 4 |
+ +---------+---------+ +---------+---------+
+
+ block #3 block #4
+ +---------+---------+ +---------+---------+
+ | frame 5 | frame 6 | | frame 7 | frame 8 |
+ +---------+---------+ +---------+---------+
+
+A frame can be of any size with the only condition it can fit in a block. A block
+can only hold an integer number of frames, or in other words, a frame cannot
+be spawned across two blocks, so there are some details you have to take into
+account when choosing the frame_size. See "Mapping and use of the circular
+buffer (ring)".
+
+PACKET_MMAP setting constraints
+===============================
+
+In kernel versions prior to 2.4.26 (for the 2.4 branch) and 2.6.5 (2.6 branch),
+the PACKET_MMAP buffer could hold only 32768 frames in a 32 bit architecture or
+16384 in a 64 bit architecture. For information on these kernel versions
+see http://pusa.uv.es/~ulisses/packet_mmap/packet_mmap.pre-2.4.26_2.6.5.txt
+
+Block size limit
+----------------
+
+As stated earlier, each block is a contiguous physical region of memory. These
+memory regions are allocated with calls to the __get_free_pages() function. As
+the name indicates, this function allocates pages of memory, and the second
+argument is "order" or a power of two number of pages, that is
+(for PAGE_SIZE == 4096) order=0 ==> 4096 bytes, order=1 ==> 8192 bytes,
+order=2 ==> 16384 bytes, etc. The maximum size of a
+region allocated by __get_free_pages is determined by the MAX_ORDER macro. More
+precisely the limit can be calculated as::
+
+ PAGE_SIZE << MAX_ORDER
+
+ In a i386 architecture PAGE_SIZE is 4096 bytes
+ In a 2.4/i386 kernel MAX_ORDER is 10
+ In a 2.6/i386 kernel MAX_ORDER is 11
+
+So get_free_pages can allocate as much as 4MB or 8MB in a 2.4/2.6 kernel
+respectively, with an i386 architecture.
+
+User space programs can include /usr/include/sys/user.h and
+/usr/include/linux/mmzone.h to get PAGE_SIZE MAX_ORDER declarations.
+
+The pagesize can also be determined dynamically with the getpagesize (2)
+system call.
+
+Block number limit
+------------------
+
+To understand the constraints of PACKET_MMAP, we have to see the structure
+used to hold the pointers to each block.
+
+Currently, this structure is a dynamically allocated vector with kmalloc
+called pg_vec, its size limits the number of blocks that can be allocated::
+
+ +---+---+---+---+
+ | x | x | x | x |
+ +---+---+---+---+
+ | | | |
+ | | | v
+ | | v block #4
+ | v block #3
+ v block #2
+ block #1
+
+kmalloc allocates any number of bytes of physically contiguous memory from
+a pool of pre-determined sizes. This pool of memory is maintained by the slab
+allocator which is at the end the responsible for doing the allocation and
+hence which imposes the maximum memory that kmalloc can allocate.
+
+In a 2.4/2.6 kernel and the i386 architecture, the limit is 131072 bytes. The
+predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
+entries of /proc/slabinfo
+
+In a 32 bit architecture, pointers are 4 bytes long, so the total number of
+pointers to blocks is::
+
+ 131072/4 = 32768 blocks
+
+PACKET_MMAP buffer size calculator
+==================================
+
+Definitions:
+
+============== ================================================================
+<size-max> is the maximum size of allocable with kmalloc
+ (see /proc/slabinfo)
+<pointer size> depends on the architecture -- ``sizeof(void *)``
+<page size> depends on the architecture -- PAGE_SIZE or getpagesize (2)
+<max-order> is the value defined with MAX_ORDER
+<frame size> it's an upper bound of frame's capture size (more on this later)
+============== ================================================================
+
+from these definitions we will derive::
+
+ <block number> = <size-max>/<pointer size>
+ <block size> = <pagesize> << <max-order>
+
+so, the max buffer size is::
+
+ <block number> * <block size>
+
+and, the number of frames be::
+
+ <block number> * <block size> / <frame size>
+
+Suppose the following parameters, which apply for 2.6 kernel and an
+i386 architecture::
+
+ <size-max> = 131072 bytes
+ <pointer size> = 4 bytes
+ <pagesize> = 4096 bytes
+ <max-order> = 11
+
+and a value for <frame size> of 2048 bytes. These parameters will yield::
+
+ <block number> = 131072/4 = 32768 blocks
+ <block size> = 4096 << 11 = 8 MiB.
+
+and hence the buffer will have a 262144 MiB size. So it can hold
+262144 MiB / 2048 bytes = 134217728 frames
+
+Actually, this buffer size is not possible with an i386 architecture.
+Remember that the memory is allocated in kernel space, in the case of
+an i386 kernel's memory size is limited to 1GiB.
+
+All memory allocations are not freed until the socket is closed. The memory
+allocations are done with GFP_KERNEL priority, this basically means that
+the allocation can wait and swap other process' memory in order to allocate
+the necessary memory, so normally limits can be reached.
+
+Other constraints
+-----------------
+
+If you check the source code you will see that what I draw here as a frame
+is not only the link level frame. At the beginning of each frame there is a
+header called struct tpacket_hdr used in PACKET_MMAP to hold link level's frame
+meta information like timestamp. So what we draw here a frame it's really
+the following (from include/linux/if_packet.h)::
+
+ /*
+ Frame structure:
+
+ - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
+ - struct tpacket_hdr
+ - pad to TPACKET_ALIGNMENT=16
+ - struct sockaddr_ll
+ - Gap, chosen so that packet data (Start+tp_net) aligns to
+ TPACKET_ALIGNMENT=16
+ - Start+tp_mac: [ Optional MAC header ]
+ - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
+ - Pad to align to TPACKET_ALIGNMENT=16
+ */
+
+The following are conditions that are checked in packet_set_ring
+
+ - tp_block_size must be a multiple of PAGE_SIZE (1)
+ - tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
+ - tp_frame_size must be a multiple of TPACKET_ALIGNMENT
+ - tp_frame_nr must be exactly frames_per_block*tp_block_nr
+
+Note that tp_block_size should be chosen to be a power of two or there will
+be a waste of memory.
+
+Mapping and use of the circular buffer (ring)
+---------------------------------------------
+
+The mapping of the buffer in the user process is done with the conventional
+mmap function. Even the circular buffer is compound of several physically
+discontiguous blocks of memory, they are contiguous to the user space, hence
+just one call to mmap is needed::
+
+ mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+
+If tp_frame_size is a divisor of tp_block_size frames will be
+contiguously spaced by tp_frame_size bytes. If not, each
+tp_block_size/tp_frame_size frames there will be a gap between
+the frames. This is because a frame cannot be spawn across two
+blocks.
+
+To use one socket for capture and transmission, the mapping of both the
+RX and TX buffer ring has to be done with one call to mmap::
+
+ ...
+ setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &foo, sizeof(foo));
+ setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &bar, sizeof(bar));
+ ...
+ rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ tx_ring = rx_ring + size;
+
+RX must be the first as the kernel maps the TX ring memory right
+after the RX one.
+
+At the beginning of each frame there is an status field (see
+struct tpacket_hdr). If this field is 0 means that the frame is ready
+to be used for the kernel, If not, there is a frame the user can read
+and the following flags apply:
+
+Capture process
+^^^^^^^^^^^^^^^
+
+ from include/linux/if_packet.h
+
+ #define TP_STATUS_COPY (1 << 1)
+ #define TP_STATUS_LOSING (1 << 2)
+ #define TP_STATUS_CSUMNOTREADY (1 << 3)
+ #define TP_STATUS_CSUM_VALID (1 << 7)
+
+====================== =======================================================
+TP_STATUS_COPY This flag indicates that the frame (and associated
+ meta information) has been truncated because it's
+ larger than tp_frame_size. This packet can be
+ read entirely with recvfrom().
+
+ In order to make this work it must to be
+ enabled previously with setsockopt() and
+ the PACKET_COPY_THRESH option.
+
+ The number of frames that can be buffered to
+ be read with recvfrom is limited like a normal socket.
+ See the SO_RCVBUF option in the socket (7) man page.
+
+TP_STATUS_LOSING indicates there were packet drops from last time
+ statistics where checked with getsockopt() and
+ the PACKET_STATISTICS option.
+
+TP_STATUS_CSUMNOTREADY currently it's used for outgoing IP packets which
+ its checksum will be done in hardware. So while
+ reading the packet we should not try to check the
+ checksum.
+
+TP_STATUS_CSUM_VALID This flag indicates that at least the transport
+ header checksum of the packet has been already
+ validated on the kernel side. If the flag is not set
+ then we are free to check the checksum by ourselves
+ provided that TP_STATUS_CSUMNOTREADY is also not set.
+====================== =======================================================
+
+for convenience there are also the following defines::
+
+ #define TP_STATUS_KERNEL 0
+ #define TP_STATUS_USER 1
+
+The kernel initializes all frames to TP_STATUS_KERNEL, when the kernel
+receives a packet it puts in the buffer and updates the status with
+at least the TP_STATUS_USER flag. Then the user can read the packet,
+once the packet is read the user must zero the status field, so the kernel
+can use again that frame buffer.
+
+The user can use poll (any other variant should apply too) to check if new
+packets are in the ring::
+
+ struct pollfd pfd;
+
+ pfd.fd = fd;
+ pfd.revents = 0;
+ pfd.events = POLLIN|POLLRDNORM|POLLERR;
+
+ if (status == TP_STATUS_KERNEL)
+ retval = poll(&pfd, 1, timeout);
+
+It doesn't incur in a race condition to first check the status value and
+then poll for frames.
+
+Transmission process
+^^^^^^^^^^^^^^^^^^^^
+
+Those defines are also used for transmission::
+
+ #define TP_STATUS_AVAILABLE 0 // Frame is available
+ #define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send()
+ #define TP_STATUS_SENDING 2 // Frame is currently in transmission
+ #define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct
+
+First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
+packet, the user fills a data buffer of an available frame, sets tp_len to
+current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
+This can be done on multiple frames. Once the user is ready to transmit, it
+calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are
+forwarded to the network device. The kernel updates each status of sent
+frames with TP_STATUS_SENDING until the end of transfer.
+
+At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE.
+
+::
+
+ header->tp_len = in_i_size;
+ header->tp_status = TP_STATUS_SEND_REQUEST;
+ retval = send(this->socket, NULL, 0, 0);
+
+The user can also use poll() to check if a buffer is available:
+
+(status == TP_STATUS_SENDING)
+
+::
+
+ struct pollfd pfd;
+ pfd.fd = fd;
+ pfd.revents = 0;
+ pfd.events = POLLOUT;
+ retval = poll(&pfd, 1, timeout);
+
+What TPACKET versions are available and when to use them?
+=========================================================
+
+::
+
+ int val = tpacket_version;
+ setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
+ getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
+
+where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
+
+TPACKET_V1:
+ - Default if not otherwise specified by setsockopt(2)
+ - RX_RING, TX_RING available
+
+TPACKET_V1 --> TPACKET_V2:
+ - Made 64 bit clean due to unsigned long usage in TPACKET_V1
+ structures, thus this also works on 64 bit kernel with 32 bit
+ userspace and the like
+ - Timestamp resolution in nanoseconds instead of microseconds
+ - RX_RING, TX_RING available
+ - VLAN metadata information available for packets
+ (TP_STATUS_VLAN_VALID, TP_STATUS_VLAN_TPID_VALID),
+ in the tpacket2_hdr structure:
+
+ - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates
+ that the tp_vlan_tci field has valid VLAN TCI value
+ - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field
+ indicates that the tp_vlan_tpid field has valid VLAN TPID value
+
+ - How to switch to TPACKET_V2:
+
+ 1. Replace struct tpacket_hdr by struct tpacket2_hdr
+ 2. Query header len and save
+ 3. Set protocol version to 2, set up ring as usual
+ 4. For getting the sockaddr_ll,
+ use ``(void *)hdr + TPACKET_ALIGN(hdrlen)`` instead of
+ ``(void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))``
+
+TPACKET_V2 --> TPACKET_V3:
+ - Flexible buffer implementation for RX_RING:
+ 1. Blocks can be configured with non-static frame-size
+ 2. Read/poll is at a block-level (as opposed to packet-level)
+ 3. Added poll timeout to avoid indefinite user-space wait
+ on idle links
+ 4. Added user-configurable knobs:
+
+ 4.1 block::timeout
+ 4.2 tpkt_hdr::sk_rxhash
+
+ - RX Hash data available in user space
+ - TX_RING semantics are conceptually similar to TPACKET_V2;
+ use tpacket3_hdr instead of tpacket2_hdr, and TPACKET3_HDRLEN
+ instead of TPACKET2_HDRLEN. In the current implementation,
+ the tp_next_offset field in the tpacket3_hdr MUST be set to
+ zero, indicating that the ring does not hold variable sized frames.
+ Packets with non-zero values of tp_next_offset will be dropped.
+
+AF_PACKET fanout mode
+=====================
+
+In the AF_PACKET fanout mode, packet reception can be load balanced among
+processes. This also works in combination with mmap(2) on packet sockets.
+
+Currently implemented fanout policies are:
+
+ - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
+ - PACKET_FANOUT_LB: schedule to socket by round-robin
+ - PACKET_FANOUT_CPU: schedule to socket by CPU packet arrives on
+ - PACKET_FANOUT_RND: schedule to socket by random selection
+ - PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another
+ - PACKET_FANOUT_QM: schedule to socket by skbs recorded queue_mapping
+
+Minimal example code by David S. Miller (try things like "./test eth0 hash",
+"./test eth0 lb", etc.)::
+
+ #include <stddef.h>
+ #include <stdlib.h>
+ #include <stdio.h>
+ #include <string.h>
+
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include <sys/socket.h>
+ #include <sys/ioctl.h>
+
+ #include <unistd.h>
+
+ #include <linux/if_ether.h>
+ #include <linux/if_packet.h>
+
+ #include <net/if.h>
+
+ static const char *device_name;
+ static int fanout_type;
+ static int fanout_id;
+
+ #ifndef PACKET_FANOUT
+ # define PACKET_FANOUT 18
+ # define PACKET_FANOUT_HASH 0
+ # define PACKET_FANOUT_LB 1
+ #endif
+
+ static int setup_socket(void)
+ {
+ int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
+ struct sockaddr_ll ll;
+ struct ifreq ifr;
+ int fanout_arg;
+
+ if (fd < 0) {
+ perror("socket");
+ return EXIT_FAILURE;
+ }
+
+ memset(&ifr, 0, sizeof(ifr));
+ strcpy(ifr.ifr_name, device_name);
+ err = ioctl(fd, SIOCGIFINDEX, &ifr);
+ if (err < 0) {
+ perror("SIOCGIFINDEX");
+ return EXIT_FAILURE;
+ }
+
+ memset(&ll, 0, sizeof(ll));
+ ll.sll_family = AF_PACKET;
+ ll.sll_ifindex = ifr.ifr_ifindex;
+ err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
+ if (err < 0) {
+ perror("bind");
+ return EXIT_FAILURE;
+ }
+
+ fanout_arg = (fanout_id | (fanout_type << 16));
+ err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
+ &fanout_arg, sizeof(fanout_arg));
+ if (err) {
+ perror("setsockopt");
+ return EXIT_FAILURE;
+ }
+
+ return fd;
+ }
+
+ static void fanout_thread(void)
+ {
+ int fd = setup_socket();
+ int limit = 10000;
+
+ if (fd < 0)
+ exit(fd);
+
+ while (limit-- > 0) {
+ char buf[1600];
+ int err;
+
+ err = read(fd, buf, sizeof(buf));
+ if (err < 0) {
+ perror("read");
+ exit(EXIT_FAILURE);
+ }
+ if ((limit % 10) == 0)
+ fprintf(stdout, "(%d) \n", getpid());
+ }
+
+ fprintf(stdout, "%d: Received 10000 packets\n", getpid());
+
+ close(fd);
+ exit(0);
+ }
+
+ int main(int argc, char **argp)
+ {
+ int fd, err;
+ int i;
+
+ if (argc != 3) {
+ fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]);
+ return EXIT_FAILURE;
+ }
+
+ if (!strcmp(argp[2], "hash"))
+ fanout_type = PACKET_FANOUT_HASH;
+ else if (!strcmp(argp[2], "lb"))
+ fanout_type = PACKET_FANOUT_LB;
+ else {
+ fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]);
+ exit(EXIT_FAILURE);
+ }
+
+ device_name = argp[1];
+ fanout_id = getpid() & 0xffff;
+
+ for (i = 0; i < 4; i++) {
+ pid_t pid = fork();
+
+ switch (pid) {
+ case 0:
+ fanout_thread();
+
+ case -1:
+ perror("fork");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ for (i = 0; i < 4; i++) {
+ int status;
+
+ wait(&status);
+ }
+
+ return 0;
+ }
+
+AF_PACKET TPACKET_V3 example
+============================
+
+AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
+sizes by doing it's own memory management. It is based on blocks where polling
+works on a per block basis instead of per ring as in TPACKET_V2 and predecessor.
+
+It is said that TPACKET_V3 brings the following benefits:
+
+ * ~15% - 20% reduction in CPU-usage
+ * ~20% increase in packet capture rate
+ * ~2x increase in packet density
+ * Port aggregation analysis
+ * Non static frame size to capture entire packet payload
+
+So it seems to be a good candidate to be used with packet fanout.
+
+Minimal example code by Daniel Borkmann based on Chetan Loke's lolpcap (compile
+it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.)::
+
+ /* Written from scratch, but kernel-to-user space API usage
+ * dissected from lolpcap:
+ * Copyright 2011, Chetan Loke <loke.chetan@gmail.com>
+ * License: GPL, version 2.0
+ */
+
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <stdint.h>
+ #include <string.h>
+ #include <assert.h>
+ #include <net/if.h>
+ #include <arpa/inet.h>
+ #include <netdb.h>
+ #include <poll.h>
+ #include <unistd.h>
+ #include <signal.h>
+ #include <inttypes.h>
+ #include <sys/socket.h>
+ #include <sys/mman.h>
+ #include <linux/if_packet.h>
+ #include <linux/if_ether.h>
+ #include <linux/ip.h>
+
+ #ifndef likely
+ # define likely(x) __builtin_expect(!!(x), 1)
+ #endif
+ #ifndef unlikely
+ # define unlikely(x) __builtin_expect(!!(x), 0)
+ #endif
+
+ struct block_desc {
+ uint32_t version;
+ uint32_t offset_to_priv;
+ struct tpacket_hdr_v1 h1;
+ };
+
+ struct ring {
+ struct iovec *rd;
+ uint8_t *map;
+ struct tpacket_req3 req;
+ };
+
+ static unsigned long packets_total = 0, bytes_total = 0;
+ static sig_atomic_t sigint = 0;
+
+ static void sighandler(int num)
+ {
+ sigint = 1;
+ }
+
+ static int setup_socket(struct ring *ring, char *netdev)
+ {
+ int err, i, fd, v = TPACKET_V3;
+ struct sockaddr_ll ll;
+ unsigned int blocksiz = 1 << 22, framesiz = 1 << 11;
+ unsigned int blocknum = 64;
+
+ fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+ if (fd < 0) {
+ perror("socket");
+ exit(1);
+ }
+
+ err = setsockopt(fd, SOL_PACKET, PACKET_VERSION, &v, sizeof(v));
+ if (err < 0) {
+ perror("setsockopt");
+ exit(1);
+ }
+
+ memset(&ring->req, 0, sizeof(ring->req));
+ ring->req.tp_block_size = blocksiz;
+ ring->req.tp_frame_size = framesiz;
+ ring->req.tp_block_nr = blocknum;
+ ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
+ ring->req.tp_retire_blk_tov = 60;
+ ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
+
+ err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
+ sizeof(ring->req));
+ if (err < 0) {
+ perror("setsockopt");
+ exit(1);
+ }
+
+ ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
+ PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
+ if (ring->map == MAP_FAILED) {
+ perror("mmap");
+ exit(1);
+ }
+
+ ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
+ assert(ring->rd);
+ for (i = 0; i < ring->req.tp_block_nr; ++i) {
+ ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
+ ring->rd[i].iov_len = ring->req.tp_block_size;
+ }
+
+ memset(&ll, 0, sizeof(ll));
+ ll.sll_family = PF_PACKET;
+ ll.sll_protocol = htons(ETH_P_ALL);
+ ll.sll_ifindex = if_nametoindex(netdev);
+ ll.sll_hatype = 0;
+ ll.sll_pkttype = 0;
+ ll.sll_halen = 0;
+
+ err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
+ if (err < 0) {
+ perror("bind");
+ exit(1);
+ }
+
+ return fd;
+ }
+
+ static void display(struct tpacket3_hdr *ppd)
+ {
+ struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
+ struct iphdr *ip = (struct iphdr *) ((uint8_t *) eth + ETH_HLEN);
+
+ if (eth->h_proto == htons(ETH_P_IP)) {
+ struct sockaddr_in ss, sd;
+ char sbuff[NI_MAXHOST], dbuff[NI_MAXHOST];
+
+ memset(&ss, 0, sizeof(ss));
+ ss.sin_family = PF_INET;
+ ss.sin_addr.s_addr = ip->saddr;
+ getnameinfo((struct sockaddr *) &ss, sizeof(ss),
+ sbuff, sizeof(sbuff), NULL, 0, NI_NUMERICHOST);
+
+ memset(&sd, 0, sizeof(sd));
+ sd.sin_family = PF_INET;
+ sd.sin_addr.s_addr = ip->daddr;
+ getnameinfo((struct sockaddr *) &sd, sizeof(sd),
+ dbuff, sizeof(dbuff), NULL, 0, NI_NUMERICHOST);
+
+ printf("%s -> %s, ", sbuff, dbuff);
+ }
+
+ printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
+ }
+
+ static void walk_block(struct block_desc *pbd, const int block_num)
+ {
+ int num_pkts = pbd->h1.num_pkts, i;
+ unsigned long bytes = 0;
+ struct tpacket3_hdr *ppd;
+
+ ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd +
+ pbd->h1.offset_to_first_pkt);
+ for (i = 0; i < num_pkts; ++i) {
+ bytes += ppd->tp_snaplen;
+ display(ppd);
+
+ ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd +
+ ppd->tp_next_offset);
+ }
+
+ packets_total += num_pkts;
+ bytes_total += bytes;
+ }
+
+ static void flush_block(struct block_desc *pbd)
+ {
+ pbd->h1.block_status = TP_STATUS_KERNEL;
+ }
+
+ static void teardown_socket(struct ring *ring, int fd)
+ {
+ munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
+ free(ring->rd);
+ close(fd);
+ }
+
+ int main(int argc, char **argp)
+ {
+ int fd, err;
+ socklen_t len;
+ struct ring ring;
+ struct pollfd pfd;
+ unsigned int block_num = 0, blocks = 64;
+ struct block_desc *pbd;
+ struct tpacket_stats_v3 stats;
+
+ if (argc != 2) {
+ fprintf(stderr, "Usage: %s INTERFACE\n", argp[0]);
+ return EXIT_FAILURE;
+ }
+
+ signal(SIGINT, sighandler);
+
+ memset(&ring, 0, sizeof(ring));
+ fd = setup_socket(&ring, argp[argc - 1]);
+ assert(fd > 0);
+
+ memset(&pfd, 0, sizeof(pfd));
+ pfd.fd = fd;
+ pfd.events = POLLIN | POLLERR;
+ pfd.revents = 0;
+
+ while (likely(!sigint)) {
+ pbd = (struct block_desc *) ring.rd[block_num].iov_base;
+
+ if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
+ poll(&pfd, 1, -1);
+ continue;
+ }
+
+ walk_block(pbd, block_num);
+ flush_block(pbd);
+ block_num = (block_num + 1) % blocks;
+ }
+
+ len = sizeof(stats);
+ err = getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len);
+ if (err < 0) {
+ perror("getsockopt");
+ exit(1);
+ }
+
+ fflush(stdout);
+ printf("\nReceived %u packets, %lu bytes, %u dropped, freeze_q_cnt: %u\n",
+ stats.tp_packets, bytes_total, stats.tp_drops,
+ stats.tp_freeze_q_cnt);
+
+ teardown_socket(&ring, fd);
+ return 0;
+ }
+
+PACKET_QDISC_BYPASS
+===================
+
+If there is a requirement to load the network with many packets in a similar
+fashion as pktgen does, you might set the following option after socket
+creation::
+
+ int one = 1;
+ setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));
+
+This has the side-effect, that packets sent through PF_PACKET will bypass the
+kernel's qdisc layer and are forcedly pushed to the driver directly. Meaning,
+packet are not buffered, tc disciplines are ignored, increased loss can occur
+and such packets are also not visible to other PF_PACKET sockets anymore. So,
+you have been warned; generally, this can be useful for stress testing various
+components of a system.
+
+On default, PACKET_QDISC_BYPASS is disabled and needs to be explicitly enabled
+on PF_PACKET sockets.
+
+PACKET_TIMESTAMP
+================
+
+The PACKET_TIMESTAMP setting determines the source of the timestamp in
+the packet meta information for mmap(2)ed RX_RING and TX_RINGs. If your
+NIC is capable of timestamping packets in hardware, you can request those
+hardware timestamps to be used. Note: you may need to enable the generation
+of hardware timestamps with SIOCSHWTSTAMP (see related information from
+Documentation/networking/timestamping.rst).
+
+PACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING::
+
+ int req = SOF_TIMESTAMPING_RAW_HARDWARE;
+ setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req))
+
+For the mmap(2)ed ring buffers, such timestamps are stored in the
+``tpacket{,2,3}_hdr`` structure's tp_sec and ``tp_{n,u}sec`` members.
+To determine what kind of timestamp has been reported, the tp_status field
+is binary or'ed with the following possible bits ...
+
+::
+
+ TP_STATUS_TS_RAW_HARDWARE
+ TP_STATUS_TS_SOFTWARE
+
+... that are equivalent to its ``SOF_TIMESTAMPING_*`` counterparts. For the
+RX_RING, if neither is set (i.e. PACKET_TIMESTAMP is not set), then a
+software fallback was invoked *within* PF_PACKET's processing code (less
+precise).
+
+Getting timestamps for the TX_RING works as follows: i) fill the ring frames,
+ii) call sendto() e.g. in blocking mode, iii) wait for status of relevant
+frames to be updated resp. the frame handed over to the application, iv) walk
+through the frames to pick up the individual hw/sw timestamps.
+
+Only (!) if transmit timestamping is enabled, then these bits are combined
+with binary | with TP_STATUS_AVAILABLE, so you must check for that in your
+application (e.g. !(tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING))
+in a first step to see if the frame belongs to the application, and then
+one can extract the type of timestamp in a second step from tp_status)!
+
+If you don't care about them, thus having it disabled, checking for
+TP_STATUS_AVAILABLE resp. TP_STATUS_WRONG_FORMAT is sufficient. If in the
+TX_RING part only TP_STATUS_AVAILABLE is set, then the tp_sec and tp_{n,u}sec
+members do not contain a valid value. For TX_RINGs, by default no timestamp
+is generated!
+
+See include/linux/net_tstamp.h and Documentation/networking/timestamping.rst
+for more information on hardware timestamps.
+
+Miscellaneous bits
+==================
+
+- Packet sockets work well together with Linux socket filters, thus you also
+ might want to have a look at Documentation/networking/filter.rst
+
+THANKS
+======
+
+ Jesse Brandeburg, for fixing my grammathical/spelling errors
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
deleted file mode 100644
index 999eb41da81d..000000000000
--- a/Documentation/networking/packet_mmap.txt
+++ /dev/null
@@ -1,1061 +0,0 @@
---------------------------------------------------------------------------------
-+ ABSTRACT
---------------------------------------------------------------------------------
-
-This file documents the mmap() facility available with the PACKET
-socket interface on 2.4/2.6/3.x kernels. This type of sockets is used for
-i) capture network traffic with utilities like tcpdump, ii) transmit network
-traffic, or any other that needs raw access to network interface.
-
-Howto can be found at:
- https://sites.google.com/site/packetmmap/
-
-Please send your comments to
- Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
- Johann Baudy
-
--------------------------------------------------------------------------------
-+ Why use PACKET_MMAP
---------------------------------------------------------------------------------
-
-In Linux 2.4/2.6/3.x if PACKET_MMAP is not enabled, the capture process is very
-inefficient. It uses very limited buffers and requires one system call to
-capture each packet, it requires two if you want to get packet's timestamp
-(like libpcap always does).
-
-In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
-configurable circular buffer mapped in user space that can be used to either
-send or receive packets. This way reading packets just needs to wait for them,
-most of the time there is no need to issue a single system call. Concerning
-transmission, multiple packets can be sent through one system call to get the
-highest bandwidth. By using a shared buffer between the kernel and the user
-also has the benefit of minimizing packet copies.
-
-It's fine to use PACKET_MMAP to improve the performance of the capture and
-transmission process, but it isn't everything. At least, if you are capturing
-at high speeds (this is relative to the cpu speed), you should check if the
-device driver of your network interface card supports some sort of interrupt
-load mitigation or (even better) if it supports NAPI, also make sure it is
-enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
-supported by devices of your network. CPU IRQ pinning of your network interface
-card can also be an advantage.
-
---------------------------------------------------------------------------------
-+ How to use mmap() to improve capture process
---------------------------------------------------------------------------------
-
-From the user standpoint, you should use the higher level libpcap library, which
-is a de facto standard, portable across nearly all operating systems
-including Win32.
-
-Packet MMAP support was integrated into libpcap around the time of version 1.3.0;
-TPACKET_V3 support was added in version 1.5.0
-
---------------------------------------------------------------------------------
-+ How to use mmap() directly to improve capture process
---------------------------------------------------------------------------------
-
-From the system calls stand point, the use of PACKET_MMAP involves
-the following process:
-
-
-[setup] socket() -------> creation of the capture socket
- setsockopt() ---> allocation of the circular buffer (ring)
- option: PACKET_RX_RING
- mmap() ---------> mapping of the allocated buffer to the
- user process
-
-[capture] poll() ---------> to wait for incoming packets
-
-[shutdown] close() --------> destruction of the capture socket and
- deallocation of all associated
- resources.
-
-
-socket creation and destruction is straight forward, and is done
-the same way with or without PACKET_MMAP:
-
- int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL));
-
-where mode is SOCK_RAW for the raw interface were link level
-information can be captured or SOCK_DGRAM for the cooked
-interface where link level information capture is not
-supported and a link level pseudo-header is provided
-by the kernel.
-
-The destruction of the socket and all associated resources
-is done by a simple call to close(fd).
-
-Similarly as without PACKET_MMAP, it is possible to use one socket
-for capture and transmission. This can be done by mapping the
-allocated RX and TX buffer ring with a single mmap() call.
-See "Mapping and use of the circular buffer (ring)".
-
-Next I will describe PACKET_MMAP settings and its constraints,
-also the mapping of the circular buffer in the user process and
-the use of this buffer.
-
---------------------------------------------------------------------------------
-+ How to use mmap() directly to improve transmission process
---------------------------------------------------------------------------------
-Transmission process is similar to capture as shown below.
-
-[setup] socket() -------> creation of the transmission socket
- setsockopt() ---> allocation of the circular buffer (ring)
- option: PACKET_TX_RING
- bind() ---------> bind transmission socket with a network interface
- mmap() ---------> mapping of the allocated buffer to the
- user process
-
-[transmission] poll() ---------> wait for free packets (optional)
- send() ---------> send all packets that are set as ready in
- the ring
- The flag MSG_DONTWAIT can be used to return
- before end of transfer.
-
-[shutdown] close() --------> destruction of the transmission socket and
- deallocation of all associated resources.
-
-Socket creation and destruction is also straight forward, and is done
-the same way as in capturing described in the previous paragraph:
-
- int fd = socket(PF_PACKET, mode, 0);
-
-The protocol can optionally be 0 in case we only want to transmit
-via this socket, which avoids an expensive call to packet_rcv().
-In this case, you also need to bind(2) the TX_RING with sll_protocol = 0
-set. Otherwise, htons(ETH_P_ALL) or any other protocol, for example.
-
-Binding the socket to your network interface is mandatory (with zero copy) to
-know the header size of frames used in the circular buffer.
-
-As capture, each frame contains two parts:
-
- --------------------
-| struct tpacket_hdr | Header. It contains the status of
-| | of this frame
-|--------------------|
-| data buffer |
-. . Data that will be sent over the network interface.
-. .
- --------------------
-
- bind() associates the socket to your network interface thanks to
- sll_ifindex parameter of struct sockaddr_ll.
-
- Initialization example:
-
- struct sockaddr_ll my_addr;
- struct ifreq s_ifr;
- ...
-
- strncpy (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name));
-
- /* get interface index of eth0 */
- ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
-
- /* fill sockaddr_ll struct to prepare binding */
- my_addr.sll_family = AF_PACKET;
- my_addr.sll_protocol = htons(ETH_P_ALL);
- my_addr.sll_ifindex = s_ifr.ifr_ifindex;
-
- /* bind socket to eth0 */
- bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
-
- A complete tutorial is available at: https://sites.google.com/site/packetmmap/
-
-By default, the user should put data at :
- frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)
-
-So, whatever you choose for the socket mode (SOCK_DGRAM or SOCK_RAW),
-the beginning of the user data will be at :
- frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
-
-If you wish to put user data at a custom offset from the beginning of
-the frame (for payload alignment with SOCK_RAW mode for instance) you
-can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). In order
-to make this work it must be enabled previously with setsockopt()
-and the PACKET_TX_HAS_OFF option.
-
---------------------------------------------------------------------------------
-+ PACKET_MMAP settings
---------------------------------------------------------------------------------
-
-To setup PACKET_MMAP from user level code is done with a call like
-
- - Capture process
- setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))
- - Transmission process
- setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))
-
-The most significant argument in the previous call is the req parameter,
-this parameter must to have the following structure:
-
- struct tpacket_req
- {
- unsigned int tp_block_size; /* Minimal size of contiguous block */
- unsigned int tp_block_nr; /* Number of blocks */
- unsigned int tp_frame_size; /* Size of frame */
- unsigned int tp_frame_nr; /* Total number of frames */
- };
-
-This structure is defined in /usr/include/linux/if_packet.h and establishes a
-circular buffer (ring) of unswappable memory.
-Being mapped in the capture process allows reading the captured frames and
-related meta-information like timestamps without requiring a system call.
-
-Frames are grouped in blocks. Each block is a physically contiguous
-region of memory and holds tp_block_size/tp_frame_size frames. The total number
-of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because
-
- frames_per_block = tp_block_size/tp_frame_size
-
-indeed, packet_set_ring checks that the following condition is true
-
- frames_per_block * tp_block_nr == tp_frame_nr
-
-Lets see an example, with the following values:
-
- tp_block_size= 4096
- tp_frame_size= 2048
- tp_block_nr = 4
- tp_frame_nr = 8
-
-we will get the following buffer structure:
-
- block #1 block #2
-+---------+---------+ +---------+---------+
-| frame 1 | frame 2 | | frame 3 | frame 4 |
-+---------+---------+ +---------+---------+
-
- block #3 block #4
-+---------+---------+ +---------+---------+
-| frame 5 | frame 6 | | frame 7 | frame 8 |
-+---------+---------+ +---------+---------+
-
-A frame can be of any size with the only condition it can fit in a block. A block
-can only hold an integer number of frames, or in other words, a frame cannot
-be spawned across two blocks, so there are some details you have to take into
-account when choosing the frame_size. See "Mapping and use of the circular
-buffer (ring)".
-
---------------------------------------------------------------------------------
-+ PACKET_MMAP setting constraints
---------------------------------------------------------------------------------
-
-In kernel versions prior to 2.4.26 (for the 2.4 branch) and 2.6.5 (2.6 branch),
-the PACKET_MMAP buffer could hold only 32768 frames in a 32 bit architecture or
-16384 in a 64 bit architecture. For information on these kernel versions
-see http://pusa.uv.es/~ulisses/packet_mmap/packet_mmap.pre-2.4.26_2.6.5.txt
-
- Block size limit
-------------------
-
-As stated earlier, each block is a contiguous physical region of memory. These
-memory regions are allocated with calls to the __get_free_pages() function. As
-the name indicates, this function allocates pages of memory, and the second
-argument is "order" or a power of two number of pages, that is
-(for PAGE_SIZE == 4096) order=0 ==> 4096 bytes, order=1 ==> 8192 bytes,
-order=2 ==> 16384 bytes, etc. The maximum size of a
-region allocated by __get_free_pages is determined by the MAX_ORDER macro. More
-precisely the limit can be calculated as:
-
- PAGE_SIZE << MAX_ORDER
-
- In a i386 architecture PAGE_SIZE is 4096 bytes
- In a 2.4/i386 kernel MAX_ORDER is 10
- In a 2.6/i386 kernel MAX_ORDER is 11
-
-So get_free_pages can allocate as much as 4MB or 8MB in a 2.4/2.6 kernel
-respectively, with an i386 architecture.
-
-User space programs can include /usr/include/sys/user.h and
-/usr/include/linux/mmzone.h to get PAGE_SIZE MAX_ORDER declarations.
-
-The pagesize can also be determined dynamically with the getpagesize (2)
-system call.
-
- Block number limit
---------------------
-
-To understand the constraints of PACKET_MMAP, we have to see the structure
-used to hold the pointers to each block.
-
-Currently, this structure is a dynamically allocated vector with kmalloc
-called pg_vec, its size limits the number of blocks that can be allocated.
-
- +---+---+---+---+
- | x | x | x | x |
- +---+---+---+---+
- | | | |
- | | | v
- | | v block #4
- | v block #3
- v block #2
- block #1
-
-kmalloc allocates any number of bytes of physically contiguous memory from
-a pool of pre-determined sizes. This pool of memory is maintained by the slab
-allocator which is at the end the responsible for doing the allocation and
-hence which imposes the maximum memory that kmalloc can allocate.
-
-In a 2.4/2.6 kernel and the i386 architecture, the limit is 131072 bytes. The
-predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
-entries of /proc/slabinfo
-
-In a 32 bit architecture, pointers are 4 bytes long, so the total number of
-pointers to blocks is
-
- 131072/4 = 32768 blocks
-
- PACKET_MMAP buffer size calculator
-------------------------------------
-
-Definitions:
-
-<size-max> : is the maximum size of allocable with kmalloc (see /proc/slabinfo)
-<pointer size>: depends on the architecture -- sizeof(void *)
-<page size> : depends on the architecture -- PAGE_SIZE or getpagesize (2)
-<max-order> : is the value defined with MAX_ORDER
-<frame size> : it's an upper bound of frame's capture size (more on this later)
-
-from these definitions we will derive
-
- <block number> = <size-max>/<pointer size>
- <block size> = <pagesize> << <max-order>
-
-so, the max buffer size is
-
- <block number> * <block size>
-
-and, the number of frames be
-
- <block number> * <block size> / <frame size>
-
-Suppose the following parameters, which apply for 2.6 kernel and an
-i386 architecture:
-
- <size-max> = 131072 bytes
- <pointer size> = 4 bytes
- <pagesize> = 4096 bytes
- <max-order> = 11
-
-and a value for <frame size> of 2048 bytes. These parameters will yield
-
- <block number> = 131072/4 = 32768 blocks
- <block size> = 4096 << 11 = 8 MiB.
-
-and hence the buffer will have a 262144 MiB size. So it can hold
-262144 MiB / 2048 bytes = 134217728 frames
-
-Actually, this buffer size is not possible with an i386 architecture.
-Remember that the memory is allocated in kernel space, in the case of
-an i386 kernel's memory size is limited to 1GiB.
-
-All memory allocations are not freed until the socket is closed. The memory
-allocations are done with GFP_KERNEL priority, this basically means that
-the allocation can wait and swap other process' memory in order to allocate
-the necessary memory, so normally limits can be reached.
-
- Other constraints
--------------------
-
-If you check the source code you will see that what I draw here as a frame
-is not only the link level frame. At the beginning of each frame there is a
-header called struct tpacket_hdr used in PACKET_MMAP to hold link level's frame
-meta information like timestamp. So what we draw here a frame it's really
-the following (from include/linux/if_packet.h):
-
-/*
- Frame structure:
-
- - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
- - struct tpacket_hdr
- - pad to TPACKET_ALIGNMENT=16
- - struct sockaddr_ll
- - Gap, chosen so that packet data (Start+tp_net) aligns to
- TPACKET_ALIGNMENT=16
- - Start+tp_mac: [ Optional MAC header ]
- - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
- - Pad to align to TPACKET_ALIGNMENT=16
- */
-
- The following are conditions that are checked in packet_set_ring
-
- tp_block_size must be a multiple of PAGE_SIZE (1)
- tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
- tp_frame_size must be a multiple of TPACKET_ALIGNMENT
- tp_frame_nr must be exactly frames_per_block*tp_block_nr
-
-Note that tp_block_size should be chosen to be a power of two or there will
-be a waste of memory.
-
---------------------------------------------------------------------------------
-+ Mapping and use of the circular buffer (ring)
---------------------------------------------------------------------------------
-
-The mapping of the buffer in the user process is done with the conventional
-mmap function. Even the circular buffer is compound of several physically
-discontiguous blocks of memory, they are contiguous to the user space, hence
-just one call to mmap is needed:
-
- mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
-
-If tp_frame_size is a divisor of tp_block_size frames will be
-contiguously spaced by tp_frame_size bytes. If not, each
-tp_block_size/tp_frame_size frames there will be a gap between
-the frames. This is because a frame cannot be spawn across two
-blocks.
-
-To use one socket for capture and transmission, the mapping of both the
-RX and TX buffer ring has to be done with one call to mmap:
-
- ...
- setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &foo, sizeof(foo));
- setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &bar, sizeof(bar));
- ...
- rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
- tx_ring = rx_ring + size;
-
-RX must be the first as the kernel maps the TX ring memory right
-after the RX one.
-
-At the beginning of each frame there is an status field (see
-struct tpacket_hdr). If this field is 0 means that the frame is ready
-to be used for the kernel, If not, there is a frame the user can read
-and the following flags apply:
-
-+++ Capture process:
- from include/linux/if_packet.h
-
- #define TP_STATUS_COPY (1 << 1)
- #define TP_STATUS_LOSING (1 << 2)
- #define TP_STATUS_CSUMNOTREADY (1 << 3)
- #define TP_STATUS_CSUM_VALID (1 << 7)
-
-TP_STATUS_COPY : This flag indicates that the frame (and associated
- meta information) has been truncated because it's
- larger than tp_frame_size. This packet can be
- read entirely with recvfrom().
-
- In order to make this work it must to be
- enabled previously with setsockopt() and
- the PACKET_COPY_THRESH option.
-
- The number of frames that can be buffered to
- be read with recvfrom is limited like a normal socket.
- See the SO_RCVBUF option in the socket (7) man page.
-
-TP_STATUS_LOSING : indicates there were packet drops from last time
- statistics where checked with getsockopt() and
- the PACKET_STATISTICS option.
-
-TP_STATUS_CSUMNOTREADY: currently it's used for outgoing IP packets which
- its checksum will be done in hardware. So while
- reading the packet we should not try to check the
- checksum.
-
-TP_STATUS_CSUM_VALID : This flag indicates that at least the transport
- header checksum of the packet has been already
- validated on the kernel side. If the flag is not set
- then we are free to check the checksum by ourselves
- provided that TP_STATUS_CSUMNOTREADY is also not set.
-
-for convenience there are also the following defines:
-
- #define TP_STATUS_KERNEL 0
- #define TP_STATUS_USER 1
-
-The kernel initializes all frames to TP_STATUS_KERNEL, when the kernel
-receives a packet it puts in the buffer and updates the status with
-at least the TP_STATUS_USER flag. Then the user can read the packet,
-once the packet is read the user must zero the status field, so the kernel
-can use again that frame buffer.
-
-The user can use poll (any other variant should apply too) to check if new
-packets are in the ring:
-
- struct pollfd pfd;
-
- pfd.fd = fd;
- pfd.revents = 0;
- pfd.events = POLLIN|POLLRDNORM|POLLERR;
-
- if (status == TP_STATUS_KERNEL)
- retval = poll(&pfd, 1, timeout);
-
-It doesn't incur in a race condition to first check the status value and
-then poll for frames.
-
-++ Transmission process
-Those defines are also used for transmission:
-
- #define TP_STATUS_AVAILABLE 0 // Frame is available
- #define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send()
- #define TP_STATUS_SENDING 2 // Frame is currently in transmission
- #define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct
-
-First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
-packet, the user fills a data buffer of an available frame, sets tp_len to
-current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
-This can be done on multiple frames. Once the user is ready to transmit, it
-calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are
-forwarded to the network device. The kernel updates each status of sent
-frames with TP_STATUS_SENDING until the end of transfer.
-At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE.
-
- header->tp_len = in_i_size;
- header->tp_status = TP_STATUS_SEND_REQUEST;
- retval = send(this->socket, NULL, 0, 0);
-
-The user can also use poll() to check if a buffer is available:
-(status == TP_STATUS_SENDING)
-
- struct pollfd pfd;
- pfd.fd = fd;
- pfd.revents = 0;
- pfd.events = POLLOUT;
- retval = poll(&pfd, 1, timeout);
-
--------------------------------------------------------------------------------
-+ What TPACKET versions are available and when to use them?
--------------------------------------------------------------------------------
-
- int val = tpacket_version;
- setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
- getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
-
-where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
-
-TPACKET_V1:
- - Default if not otherwise specified by setsockopt(2)
- - RX_RING, TX_RING available
-
-TPACKET_V1 --> TPACKET_V2:
- - Made 64 bit clean due to unsigned long usage in TPACKET_V1
- structures, thus this also works on 64 bit kernel with 32 bit
- userspace and the like
- - Timestamp resolution in nanoseconds instead of microseconds
- - RX_RING, TX_RING available
- - VLAN metadata information available for packets
- (TP_STATUS_VLAN_VALID, TP_STATUS_VLAN_TPID_VALID),
- in the tpacket2_hdr structure:
- - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates
- that the tp_vlan_tci field has valid VLAN TCI value
- - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field
- indicates that the tp_vlan_tpid field has valid VLAN TPID value
- - How to switch to TPACKET_V2:
- 1. Replace struct tpacket_hdr by struct tpacket2_hdr
- 2. Query header len and save
- 3. Set protocol version to 2, set up ring as usual
- 4. For getting the sockaddr_ll,
- use (void *)hdr + TPACKET_ALIGN(hdrlen) instead of
- (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
-
-TPACKET_V2 --> TPACKET_V3:
- - Flexible buffer implementation for RX_RING:
- 1. Blocks can be configured with non-static frame-size
- 2. Read/poll is at a block-level (as opposed to packet-level)
- 3. Added poll timeout to avoid indefinite user-space wait
- on idle links
- 4. Added user-configurable knobs:
- 4.1 block::timeout
- 4.2 tpkt_hdr::sk_rxhash
- - RX Hash data available in user space
- - TX_RING semantics are conceptually similar to TPACKET_V2;
- use tpacket3_hdr instead of tpacket2_hdr, and TPACKET3_HDRLEN
- instead of TPACKET2_HDRLEN. In the current implementation,
- the tp_next_offset field in the tpacket3_hdr MUST be set to
- zero, indicating that the ring does not hold variable sized frames.
- Packets with non-zero values of tp_next_offset will be dropped.
-
--------------------------------------------------------------------------------
-+ AF_PACKET fanout mode
--------------------------------------------------------------------------------
-
-In the AF_PACKET fanout mode, packet reception can be load balanced among
-processes. This also works in combination with mmap(2) on packet sockets.
-
-Currently implemented fanout policies are:
-
- - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
- - PACKET_FANOUT_LB: schedule to socket by round-robin
- - PACKET_FANOUT_CPU: schedule to socket by CPU packet arrives on
- - PACKET_FANOUT_RND: schedule to socket by random selection
- - PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another
- - PACKET_FANOUT_QM: schedule to socket by skbs recorded queue_mapping
-
-Minimal example code by David S. Miller (try things like "./test eth0 hash",
-"./test eth0 lb", etc.):
-
-#include <stddef.h>
-#include <stdlib.h>
-#include <stdio.h>
-#include <string.h>
-
-#include <sys/types.h>
-#include <sys/wait.h>
-#include <sys/socket.h>
-#include <sys/ioctl.h>
-
-#include <unistd.h>
-
-#include <linux/if_ether.h>
-#include <linux/if_packet.h>
-
-#include <net/if.h>
-
-static const char *device_name;
-static int fanout_type;
-static int fanout_id;
-
-#ifndef PACKET_FANOUT
-# define PACKET_FANOUT 18
-# define PACKET_FANOUT_HASH 0
-# define PACKET_FANOUT_LB 1
-#endif
-
-static int setup_socket(void)
-{
- int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
- struct sockaddr_ll ll;
- struct ifreq ifr;
- int fanout_arg;
-
- if (fd < 0) {
- perror("socket");
- return EXIT_FAILURE;
- }
-
- memset(&ifr, 0, sizeof(ifr));
- strcpy(ifr.ifr_name, device_name);
- err = ioctl(fd, SIOCGIFINDEX, &ifr);
- if (err < 0) {
- perror("SIOCGIFINDEX");
- return EXIT_FAILURE;
- }
-
- memset(&ll, 0, sizeof(ll));
- ll.sll_family = AF_PACKET;
- ll.sll_ifindex = ifr.ifr_ifindex;
- err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
- if (err < 0) {
- perror("bind");
- return EXIT_FAILURE;
- }
-
- fanout_arg = (fanout_id | (fanout_type << 16));
- err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
- &fanout_arg, sizeof(fanout_arg));
- if (err) {
- perror("setsockopt");
- return EXIT_FAILURE;
- }
-
- return fd;
-}
-
-static void fanout_thread(void)
-{
- int fd = setup_socket();
- int limit = 10000;
-
- if (fd < 0)
- exit(fd);
-
- while (limit-- > 0) {
- char buf[1600];
- int err;
-
- err = read(fd, buf, sizeof(buf));
- if (err < 0) {
- perror("read");
- exit(EXIT_FAILURE);
- }
- if ((limit % 10) == 0)
- fprintf(stdout, "(%d) \n", getpid());
- }
-
- fprintf(stdout, "%d: Received 10000 packets\n", getpid());
-
- close(fd);
- exit(0);
-}
-
-int main(int argc, char **argp)
-{
- int fd, err;
- int i;
-
- if (argc != 3) {
- fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]);
- return EXIT_FAILURE;
- }
-
- if (!strcmp(argp[2], "hash"))
- fanout_type = PACKET_FANOUT_HASH;
- else if (!strcmp(argp[2], "lb"))
- fanout_type = PACKET_FANOUT_LB;
- else {
- fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]);
- exit(EXIT_FAILURE);
- }
-
- device_name = argp[1];
- fanout_id = getpid() & 0xffff;
-
- for (i = 0; i < 4; i++) {
- pid_t pid = fork();
-
- switch (pid) {
- case 0:
- fanout_thread();
-
- case -1:
- perror("fork");
- exit(EXIT_FAILURE);
- }
- }
-
- for (i = 0; i < 4; i++) {
- int status;
-
- wait(&status);
- }
-
- return 0;
-}
-
--------------------------------------------------------------------------------
-+ AF_PACKET TPACKET_V3 example
--------------------------------------------------------------------------------
-
-AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
-sizes by doing it's own memory management. It is based on blocks where polling
-works on a per block basis instead of per ring as in TPACKET_V2 and predecessor.
-
-It is said that TPACKET_V3 brings the following benefits:
- *) ~15 - 20% reduction in CPU-usage
- *) ~20% increase in packet capture rate
- *) ~2x increase in packet density
- *) Port aggregation analysis
- *) Non static frame size to capture entire packet payload
-
-So it seems to be a good candidate to be used with packet fanout.
-
-Minimal example code by Daniel Borkmann based on Chetan Loke's lolpcap (compile
-it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.):
-
-/* Written from scratch, but kernel-to-user space API usage
- * dissected from lolpcap:
- * Copyright 2011, Chetan Loke <loke.chetan@gmail.com>
- * License: GPL, version 2.0
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <stdint.h>
-#include <string.h>
-#include <assert.h>
-#include <net/if.h>
-#include <arpa/inet.h>
-#include <netdb.h>
-#include <poll.h>
-#include <unistd.h>
-#include <signal.h>
-#include <inttypes.h>
-#include <sys/socket.h>
-#include <sys/mman.h>
-#include <linux/if_packet.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-
-#ifndef likely
-# define likely(x) __builtin_expect(!!(x), 1)
-#endif
-#ifndef unlikely
-# define unlikely(x) __builtin_expect(!!(x), 0)
-#endif
-
-struct block_desc {
- uint32_t version;
- uint32_t offset_to_priv;
- struct tpacket_hdr_v1 h1;
-};
-
-struct ring {
- struct iovec *rd;
- uint8_t *map;
- struct tpacket_req3 req;
-};
-
-static unsigned long packets_total = 0, bytes_total = 0;
-static sig_atomic_t sigint = 0;
-
-static void sighandler(int num)
-{
- sigint = 1;
-}
-
-static int setup_socket(struct ring *ring, char *netdev)
-{
- int err, i, fd, v = TPACKET_V3;
- struct sockaddr_ll ll;
- unsigned int blocksiz = 1 << 22, framesiz = 1 << 11;
- unsigned int blocknum = 64;
-
- fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
- if (fd < 0) {
- perror("socket");
- exit(1);
- }
-
- err = setsockopt(fd, SOL_PACKET, PACKET_VERSION, &v, sizeof(v));
- if (err < 0) {
- perror("setsockopt");
- exit(1);
- }
-
- memset(&ring->req, 0, sizeof(ring->req));
- ring->req.tp_block_size = blocksiz;
- ring->req.tp_frame_size = framesiz;
- ring->req.tp_block_nr = blocknum;
- ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
- ring->req.tp_retire_blk_tov = 60;
- ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
-
- err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
- sizeof(ring->req));
- if (err < 0) {
- perror("setsockopt");
- exit(1);
- }
-
- ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
- PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
- if (ring->map == MAP_FAILED) {
- perror("mmap");
- exit(1);
- }
-
- ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
- assert(ring->rd);
- for (i = 0; i < ring->req.tp_block_nr; ++i) {
- ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
- ring->rd[i].iov_len = ring->req.tp_block_size;
- }
-
- memset(&ll, 0, sizeof(ll));
- ll.sll_family = PF_PACKET;
- ll.sll_protocol = htons(ETH_P_ALL);
- ll.sll_ifindex = if_nametoindex(netdev);
- ll.sll_hatype = 0;
- ll.sll_pkttype = 0;
- ll.sll_halen = 0;
-
- err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
- if (err < 0) {
- perror("bind");
- exit(1);
- }
-
- return fd;
-}
-
-static void display(struct tpacket3_hdr *ppd)
-{
- struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
- struct iphdr *ip = (struct iphdr *) ((uint8_t *) eth + ETH_HLEN);
-
- if (eth->h_proto == htons(ETH_P_IP)) {
- struct sockaddr_in ss, sd;
- char sbuff[NI_MAXHOST], dbuff[NI_MAXHOST];
-
- memset(&ss, 0, sizeof(ss));
- ss.sin_family = PF_INET;
- ss.sin_addr.s_addr = ip->saddr;
- getnameinfo((struct sockaddr *) &ss, sizeof(ss),
- sbuff, sizeof(sbuff), NULL, 0, NI_NUMERICHOST);
-
- memset(&sd, 0, sizeof(sd));
- sd.sin_family = PF_INET;
- sd.sin_addr.s_addr = ip->daddr;
- getnameinfo((struct sockaddr *) &sd, sizeof(sd),
- dbuff, sizeof(dbuff), NULL, 0, NI_NUMERICHOST);
-
- printf("%s -> %s, ", sbuff, dbuff);
- }
-
- printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
-}
-
-static void walk_block(struct block_desc *pbd, const int block_num)
-{
- int num_pkts = pbd->h1.num_pkts, i;
- unsigned long bytes = 0;
- struct tpacket3_hdr *ppd;
-
- ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd +
- pbd->h1.offset_to_first_pkt);
- for (i = 0; i < num_pkts; ++i) {
- bytes += ppd->tp_snaplen;
- display(ppd);
-
- ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd +
- ppd->tp_next_offset);
- }
-
- packets_total += num_pkts;
- bytes_total += bytes;
-}
-
-static void flush_block(struct block_desc *pbd)
-{
- pbd->h1.block_status = TP_STATUS_KERNEL;
-}
-
-static void teardown_socket(struct ring *ring, int fd)
-{
- munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
- free(ring->rd);
- close(fd);
-}
-
-int main(int argc, char **argp)
-{
- int fd, err;
- socklen_t len;
- struct ring ring;
- struct pollfd pfd;
- unsigned int block_num = 0, blocks = 64;
- struct block_desc *pbd;
- struct tpacket_stats_v3 stats;
-
- if (argc != 2) {
- fprintf(stderr, "Usage: %s INTERFACE\n", argp[0]);
- return EXIT_FAILURE;
- }
-
- signal(SIGINT, sighandler);
-
- memset(&ring, 0, sizeof(ring));
- fd = setup_socket(&ring, argp[argc - 1]);
- assert(fd > 0);
-
- memset(&pfd, 0, sizeof(pfd));
- pfd.fd = fd;
- pfd.events = POLLIN | POLLERR;
- pfd.revents = 0;
-
- while (likely(!sigint)) {
- pbd = (struct block_desc *) ring.rd[block_num].iov_base;
-
- if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
- poll(&pfd, 1, -1);
- continue;
- }
-
- walk_block(pbd, block_num);
- flush_block(pbd);
- block_num = (block_num + 1) % blocks;
- }
-
- len = sizeof(stats);
- err = getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len);
- if (err < 0) {
- perror("getsockopt");
- exit(1);
- }
-
- fflush(stdout);
- printf("\nReceived %u packets, %lu bytes, %u dropped, freeze_q_cnt: %u\n",
- stats.tp_packets, bytes_total, stats.tp_drops,
- stats.tp_freeze_q_cnt);
-
- teardown_socket(&ring, fd);
- return 0;
-}
-
--------------------------------------------------------------------------------
-+ PACKET_QDISC_BYPASS
--------------------------------------------------------------------------------
-
-If there is a requirement to load the network with many packets in a similar
-fashion as pktgen does, you might set the following option after socket
-creation:
-
- int one = 1;
- setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));
-
-This has the side-effect, that packets sent through PF_PACKET will bypass the
-kernel's qdisc layer and are forcedly pushed to the driver directly. Meaning,
-packet are not buffered, tc disciplines are ignored, increased loss can occur
-and such packets are also not visible to other PF_PACKET sockets anymore. So,
-you have been warned; generally, this can be useful for stress testing various
-components of a system.
-
-On default, PACKET_QDISC_BYPASS is disabled and needs to be explicitly enabled
-on PF_PACKET sockets.
-
--------------------------------------------------------------------------------
-+ PACKET_TIMESTAMP
--------------------------------------------------------------------------------
-
-The PACKET_TIMESTAMP setting determines the source of the timestamp in
-the packet meta information for mmap(2)ed RX_RING and TX_RINGs. If your
-NIC is capable of timestamping packets in hardware, you can request those
-hardware timestamps to be used. Note: you may need to enable the generation
-of hardware timestamps with SIOCSHWTSTAMP (see related information from
-Documentation/networking/timestamping.txt).
-
-PACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING:
-
- int req = SOF_TIMESTAMPING_RAW_HARDWARE;
- setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req))
-
-For the mmap(2)ed ring buffers, such timestamps are stored in the
-tpacket{,2,3}_hdr structure's tp_sec and tp_{n,u}sec members. To determine
-what kind of timestamp has been reported, the tp_status field is binary |'ed
-with the following possible bits ...
-
- TP_STATUS_TS_RAW_HARDWARE
- TP_STATUS_TS_SOFTWARE
-
-... that are equivalent to its SOF_TIMESTAMPING_* counterparts. For the
-RX_RING, if neither is set (i.e. PACKET_TIMESTAMP is not set), then a
-software fallback was invoked *within* PF_PACKET's processing code (less
-precise).
-
-Getting timestamps for the TX_RING works as follows: i) fill the ring frames,
-ii) call sendto() e.g. in blocking mode, iii) wait for status of relevant
-frames to be updated resp. the frame handed over to the application, iv) walk
-through the frames to pick up the individual hw/sw timestamps.
-
-Only (!) if transmit timestamping is enabled, then these bits are combined
-with binary | with TP_STATUS_AVAILABLE, so you must check for that in your
-application (e.g. !(tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING))
-in a first step to see if the frame belongs to the application, and then
-one can extract the type of timestamp in a second step from tp_status)!
-
-If you don't care about them, thus having it disabled, checking for
-TP_STATUS_AVAILABLE resp. TP_STATUS_WRONG_FORMAT is sufficient. If in the
-TX_RING part only TP_STATUS_AVAILABLE is set, then the tp_sec and tp_{n,u}sec
-members do not contain a valid value. For TX_RINGs, by default no timestamp
-is generated!
-
-See include/linux/net_tstamp.h and Documentation/networking/timestamping.txt
-for more information on hardware timestamps.
-
--------------------------------------------------------------------------------
-+ Miscellaneous bits
--------------------------------------------------------------------------------
-
-- Packet sockets work well together with Linux socket filters, thus you also
- might want to have a look at Documentation/networking/filter.txt
-
---------------------------------------------------------------------------------
-+ THANKS
---------------------------------------------------------------------------------
-
- Jesse Brandeburg, for fixing my grammathical/spelling errors
-
diff --git a/Documentation/networking/phonet.txt b/Documentation/networking/phonet.rst
index 81003581f47a..8668dcbc5e6a 100644
--- a/Documentation/networking/phonet.txt
+++ b/Documentation/networking/phonet.rst
@@ -1,3 +1,7 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+============================
Linux Phonet protocol family
============================
@@ -11,6 +15,7 @@ device attached to the modem. The modem takes care of routing.
Phonet packets can be exchanged through various hardware connections
depending on the device, such as:
+
- USB with the CDC Phonet interface,
- infrared,
- Bluetooth,
@@ -21,7 +26,7 @@ depending on the device, such as:
Packets format
--------------
-Phonet packets have a common header as follows:
+Phonet packets have a common header as follows::
struct phonethdr {
uint8_t pn_media; /* Media type (link-layer identifier) */
@@ -72,7 +77,7 @@ only the (default) Linux FIFO qdisc should be used with them.
Network layer
-------------
-The Phonet socket address family maps the Phonet packet header:
+The Phonet socket address family maps the Phonet packet header::
struct sockaddr_pn {
sa_family_t spn_family; /* AF_PHONET */
@@ -94,6 +99,8 @@ protocol from the PF_PHONET family. Each socket is bound to one of the
2^10 object IDs available, and can send and receive packets with any
other peer.
+::
+
struct sockaddr_pn addr = { .spn_family = AF_PHONET, };
ssize_t len;
socklen_t addrlen = sizeof(addr);
@@ -105,7 +112,7 @@ other peer.
sendto(fd, msg, msglen, 0, (struct sockaddr *)&addr, sizeof(addr));
len = recvfrom(fd, buf, sizeof(buf), 0,
- (struct sockaddr *)&addr, &addrlen);
+ (struct sockaddr *)&addr, &addrlen);
This protocol follows the SOCK_DGRAM connection-less semantics.
However, connect() and getpeername() are not supported, as they did
@@ -116,7 +123,7 @@ Resource subscription
---------------------
A Phonet datagram socket can be subscribed to any number of 8-bits
-Phonet resources, as follow:
+Phonet resources, as follow::
uint32_t res = 0xXX;
ioctl(fd, SIOCPNADDRESOURCE, &res);
@@ -137,6 +144,8 @@ socket paradigm. The listening socket is bound to an unique free object
ID. Each listening socket can handle up to 255 simultaneous
connections, one per accept()'d socket.
+::
+
int lfd, cfd;
lfd = socket(PF_PHONET, SOCK_SEQPACKET, PN_PROTO_PIPE);
@@ -161,7 +170,7 @@ Connections are traditionally established between two endpoints by a
As of Linux kernel version 2.6.39, it is also possible to connect
two endpoints directly, using connect() on the active side. This is
intended to support the newer Nokia Wireless Modem API, as found in
-e.g. the Nokia Slim Modem in the ST-Ericsson U8500 platform:
+e.g. the Nokia Slim Modem in the ST-Ericsson U8500 platform::
struct sockaddr_spn spn;
int fd;
@@ -177,38 +186,45 @@ e.g. the Nokia Slim Modem in the ST-Ericsson U8500 platform:
close(fd);
-WARNING:
-When polling a connected pipe socket for writability, there is an
-intrinsic race condition whereby writability might be lost between the
-polling and the writing system calls. In this case, the socket will
-block until write becomes possible again, unless non-blocking mode
-is enabled.
+.. Warning:
+
+ When polling a connected pipe socket for writability, there is an
+ intrinsic race condition whereby writability might be lost between the
+ polling and the writing system calls. In this case, the socket will
+ block until write becomes possible again, unless non-blocking mode
+ is enabled.
The pipe protocol provides two socket options at the SOL_PNPIPE level:
PNPIPE_ENCAP accepts one integer value (int) of:
- PNPIPE_ENCAP_NONE: The socket operates normally (default).
+ PNPIPE_ENCAP_NONE:
+ The socket operates normally (default).
- PNPIPE_ENCAP_IP: The socket is used as a backend for a virtual IP
+ PNPIPE_ENCAP_IP:
+ The socket is used as a backend for a virtual IP
interface. This requires CAP_NET_ADMIN capability. GPRS data
support on Nokia modems can use this. Note that the socket cannot
be reliably poll()'d or read() from while in this mode.
- PNPIPE_IFINDEX is a read-only integer value. It contains the
- interface index of the network interface created by PNPIPE_ENCAP,
- or zero if encapsulation is off.
+ PNPIPE_IFINDEX
+ is a read-only integer value. It contains the
+ interface index of the network interface created by PNPIPE_ENCAP,
+ or zero if encapsulation is off.
- PNPIPE_HANDLE is a read-only integer value. It contains the underlying
- identifier ("pipe handle") of the pipe. This is only defined for
- socket descriptors that are already connected or being connected.
+ PNPIPE_HANDLE
+ is a read-only integer value. It contains the underlying
+ identifier ("pipe handle") of the pipe. This is only defined for
+ socket descriptors that are already connected or being connected.
Authors
-------
Linux Phonet was initially written by Sakari Ailus.
+
Other contributors include Mikä Liljeberg, Andras Domokos,
Carlos Chinea and Rémi Denis-Courmont.
-Copyright (C) 2008 Nokia Corporation.
+
+Copyright |copy| 2008 Nokia Corporation.
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.rst
index d2fd78f85aa4..7afa1c9f1183 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.rst
@@ -1,7 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
-
- HOWTO for the linux packet generator
- ------------------------------------
+====================================
+HOWTO for the linux packet generator
+====================================
Enable CONFIG_NET_PKTGEN to compile and build pktgen either in-kernel
or as a module. A module is preferred; modprobe pktgen if needed. Once
@@ -9,17 +10,18 @@ running, pktgen creates a thread for each CPU with affinity to that CPU.
Monitoring and controlling is done via /proc. It is easiest to select a
suitable sample script and configure that.
-On a dual CPU:
+On a dual CPU::
+
+ ps aux | grep pkt
+ root 129 0.3 0.0 0 0 ? SW 2003 523:20 [kpktgend_0]
+ root 130 0.3 0.0 0 0 ? SW 2003 509:50 [kpktgend_1]
-ps aux | grep pkt
-root 129 0.3 0.0 0 0 ? SW 2003 523:20 [kpktgend_0]
-root 130 0.3 0.0 0 0 ? SW 2003 509:50 [kpktgend_1]
+For monitoring and control pktgen creates::
-For monitoring and control pktgen creates:
/proc/net/pktgen/pgctrl
/proc/net/pktgen/kpktgend_X
- /proc/net/pktgen/ethX
+ /proc/net/pktgen/ethX
Tuning NIC for max performance
@@ -28,7 +30,8 @@ Tuning NIC for max performance
The default NIC settings are (likely) not tuned for pktgen's artificial
overload type of benchmarking, as this could hurt the normal use-case.
-Specifically increasing the TX ring buffer in the NIC:
+Specifically increasing the TX ring buffer in the NIC::
+
# ethtool -G ethX tx 1024
A larger TX ring can improve pktgen's performance, while it can hurt
@@ -46,7 +49,8 @@ This cleanup issue is specifically the case for the driver ixgbe
and the cleanup interval is affected by the ethtool --coalesce setting
of parameter "rx-usecs".
-For ixgbe use e.g. "30" resulting in approx 33K interrupts/sec (1/30*10^6):
+For ixgbe use e.g. "30" resulting in approx 33K interrupts/sec (1/30*10^6)::
+
# ethtool -C ethX rx-usecs 30
@@ -55,7 +59,7 @@ Kernel threads
Pktgen creates a thread for each CPU with affinity to that CPU.
Which is controlled through procfile /proc/net/pktgen/kpktgend_X.
-Example: /proc/net/pktgen/kpktgend_0
+Example: /proc/net/pktgen/kpktgend_0::
Running:
Stopped: eth4@0
@@ -64,6 +68,7 @@ Example: /proc/net/pktgen/kpktgend_0
Most important are the devices assigned to the thread.
The two basic thread commands are:
+
* add_device DEVICE@NAME -- adds a single device
* rem_device_all -- remove all associated devices
@@ -73,7 +78,7 @@ be unique.
To support adding the same device to multiple threads, which is useful
with multi queue NICs, the device naming scheme is extended with "@":
- device@something
+device@something
The part after "@" can be anything, but it is custom to use the thread
number.
@@ -83,30 +88,30 @@ Viewing devices
The Params section holds configured information. The Current section
holds running statistics. The Result is printed after a run or after
-interruption. Example:
-
-/proc/net/pktgen/eth4@0
-
- Params: count 100000 min_pkt_size: 60 max_pkt_size: 60
- frags: 0 delay: 0 clone_skb: 64 ifname: eth4@0
- flows: 0 flowlen: 0
- queue_map_min: 0 queue_map_max: 0
- dst_min: 192.168.81.2 dst_max:
- src_min: src_max:
- src_mac: 90:e2:ba:0a:56:b4 dst_mac: 00:1b:21:3c:9d:f8
- udp_src_min: 9 udp_src_max: 109 udp_dst_min: 9 udp_dst_max: 9
- src_mac_count: 0 dst_mac_count: 0
- Flags: UDPSRC_RND NO_TIMESTAMP QUEUE_MAP_CPU
- Current:
- pkts-sofar: 100000 errors: 0
- started: 623913381008us stopped: 623913396439us idle: 25us
- seq_num: 100001 cur_dst_mac_offset: 0 cur_src_mac_offset: 0
- cur_saddr: 192.168.8.3 cur_daddr: 192.168.81.2
- cur_udp_dst: 9 cur_udp_src: 42
- cur_queue_map: 0
- flows: 0
- Result: OK: 15430(c15405+d25) usec, 100000 (60byte,0frags)
- 6480562pps 3110Mb/sec (3110669760bps) errors: 0
+interruption. Example::
+
+ /proc/net/pktgen/eth4@0
+
+ Params: count 100000 min_pkt_size: 60 max_pkt_size: 60
+ frags: 0 delay: 0 clone_skb: 64 ifname: eth4@0
+ flows: 0 flowlen: 0
+ queue_map_min: 0 queue_map_max: 0
+ dst_min: 192.168.81.2 dst_max:
+ src_min: src_max:
+ src_mac: 90:e2:ba:0a:56:b4 dst_mac: 00:1b:21:3c:9d:f8
+ udp_src_min: 9 udp_src_max: 109 udp_dst_min: 9 udp_dst_max: 9
+ src_mac_count: 0 dst_mac_count: 0
+ Flags: UDPSRC_RND NO_TIMESTAMP QUEUE_MAP_CPU
+ Current:
+ pkts-sofar: 100000 errors: 0
+ started: 623913381008us stopped: 623913396439us idle: 25us
+ seq_num: 100001 cur_dst_mac_offset: 0 cur_src_mac_offset: 0
+ cur_saddr: 192.168.8.3 cur_daddr: 192.168.81.2
+ cur_udp_dst: 9 cur_udp_src: 42
+ cur_queue_map: 0
+ flows: 0
+ Result: OK: 15430(c15405+d25) usec, 100000 (60byte,0frags)
+ 6480562pps 3110Mb/sec (3110669760bps) errors: 0
Configuring devices
@@ -114,11 +119,12 @@ Configuring devices
This is done via the /proc interface, and most easily done via pgset
as defined in the sample scripts.
You need to specify PGDEV environment variable to use functions from sample
-scripts, i.e.:
-export PGDEV=/proc/net/pktgen/eth4@0
-source samples/pktgen/functions.sh
+scripts, i.e.::
+
+ export PGDEV=/proc/net/pktgen/eth4@0
+ source samples/pktgen/functions.sh
-Examples:
+Examples::
pg_ctrl start starts injection.
pg_ctrl stop aborts injection. Also, ^C aborts generator.
@@ -126,17 +132,17 @@ Examples:
pgset "clone_skb 1" sets the number of copies of the same packet
pgset "clone_skb 0" use single SKB for all transmits
pgset "burst 8" uses xmit_more API to queue 8 copies of the same
- packet and update HW tx queue tail pointer once.
- "burst 1" is the default
+ packet and update HW tx queue tail pointer once.
+ "burst 1" is the default
pgset "pkt_size 9014" sets packet size to 9014
pgset "frags 5" packet will consist of 5 fragments
pgset "count 200000" sets number of packets to send, set to zero
- for continuous sends until explicitly stopped.
+ for continuous sends until explicitly stopped.
pgset "delay 5000" adds delay to hard_start_xmit(). nanoseconds
pgset "dst 10.0.0.1" sets IP destination address
- (BEWARE! This generator is very aggressive!)
+ (BEWARE! This generator is very aggressive!)
pgset "dst_min 10.0.0.1" Same as dst
pgset "dst_max 10.0.0.254" Set the maximum destination IP.
@@ -149,46 +155,46 @@ Examples:
pgset "queue_map_min 0" Sets the min value of tx queue interval
pgset "queue_map_max 7" Sets the max value of tx queue interval, for multiqueue devices
- To select queue 1 of a given device,
- use queue_map_min=1 and queue_map_max=1
+ To select queue 1 of a given device,
+ use queue_map_min=1 and queue_map_max=1
pgset "src_mac_count 1" Sets the number of MACs we'll range through.
- The 'minimum' MAC is what you set with srcmac.
+ The 'minimum' MAC is what you set with srcmac.
pgset "dst_mac_count 1" Sets the number of MACs we'll range through.
- The 'minimum' MAC is what you set with dstmac.
+ The 'minimum' MAC is what you set with dstmac.
pgset "flag [name]" Set a flag to determine behaviour. Current flags
- are: IPSRC_RND # IP source is random (between min/max)
- IPDST_RND # IP destination is random
- UDPSRC_RND, UDPDST_RND,
- MACSRC_RND, MACDST_RND
- TXSIZE_RND, IPV6,
- MPLS_RND, VID_RND, SVID_RND
- FLOW_SEQ,
- QUEUE_MAP_RND # queue map random
- QUEUE_MAP_CPU # queue map mirrors smp_processor_id()
- UDPCSUM,
- IPSEC # IPsec encapsulation (needs CONFIG_XFRM)
- NODE_ALLOC # node specific memory allocation
- NO_TIMESTAMP # disable timestamping
+ are: IPSRC_RND # IP source is random (between min/max)
+ IPDST_RND # IP destination is random
+ UDPSRC_RND, UDPDST_RND,
+ MACSRC_RND, MACDST_RND
+ TXSIZE_RND, IPV6,
+ MPLS_RND, VID_RND, SVID_RND
+ FLOW_SEQ,
+ QUEUE_MAP_RND # queue map random
+ QUEUE_MAP_CPU # queue map mirrors smp_processor_id()
+ UDPCSUM,
+ IPSEC # IPsec encapsulation (needs CONFIG_XFRM)
+ NODE_ALLOC # node specific memory allocation
+ NO_TIMESTAMP # disable timestamping
pgset 'flag ![name]' Clear a flag to determine behaviour.
- Note that you might need to use single quote in
- interactive mode, so that your shell wouldn't expand
- the specified flag as a history command.
+ Note that you might need to use single quote in
+ interactive mode, so that your shell wouldn't expand
+ the specified flag as a history command.
pgset "spi [SPI_VALUE]" Set specific SA used to transform packet.
pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then
- cycle through the port range.
+ cycle through the port range.
pgset "udp_src_max 9" set UDP source port max.
pgset "udp_dst_min 9" set UDP destination port min, If < udp_dst_max, then
- cycle through the port range.
+ cycle through the port range.
pgset "udp_dst_max 9" set UDP destination port max.
pgset "mpls 0001000a,0002000a,0000000a" set MPLS labels (in this example
- outer label=16,middle label=32,
+ outer label=16,middle label=32,
inner label=0 (IPv4 NULL)) Note that
there must be no spaces between the
arguments. Leading zeros are required.
@@ -232,10 +238,14 @@ A collection of tutorial scripts and helpers for pktgen is in the
samples/pktgen directory. The helper parameters.sh file support easy
and consistent parameter parsing across the sample scripts.
-Usage example and help:
+Usage example and help::
+
./pktgen_sample01_simple.sh -i eth4 -m 00:1B:21:3C:9D:F8 -d 192.168.8.2
-Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
+Usage:::
+
+ ./pktgen_sample01_simple.sh [-vx] -i ethX
+
-i : ($DEV) output interface/device (required)
-s : ($PKT_SIZE) packet size
-d : ($DEST_IP) destination IP
@@ -250,13 +260,13 @@ The global variables being set are also listed. E.g. the required
interface/device parameter "-i" sets variable $DEV. Copy the
pktgen_sampleXX scripts and modify them to fit your own needs.
-The old scripts:
+The old scripts::
-pktgen.conf-1-2 # 1 CPU 2 dev
-pktgen.conf-1-1-rdos # 1 CPU 1 dev w. route DoS
-pktgen.conf-1-1-ip6 # 1 CPU 1 dev ipv6
-pktgen.conf-1-1-ip6-rdos # 1 CPU 1 dev ipv6 w. route DoS
-pktgen.conf-1-1-flows # 1 CPU 1 dev multiple flows.
+ pktgen.conf-1-2 # 1 CPU 2 dev
+ pktgen.conf-1-1-rdos # 1 CPU 1 dev w. route DoS
+ pktgen.conf-1-1-ip6 # 1 CPU 1 dev ipv6
+ pktgen.conf-1-1-ip6-rdos # 1 CPU 1 dev ipv6 w. route DoS
+ pktgen.conf-1-1-flows # 1 CPU 1 dev multiple flows.
Interrupt affinity
@@ -271,10 +281,10 @@ to the running threads CPU (directly from smp_processor_id()).
Enable IPsec
============
Default IPsec transformation with ESP encapsulation plus transport mode
-can be enabled by simply setting:
+can be enabled by simply setting::
-pgset "flag IPSEC"
-pgset "flows 1"
+ pgset "flag IPSEC"
+ pgset "flows 1"
To avoid breaking existing testbed scripts for using AH type and tunnel mode,
you can use "pgset spi SPI_VALUE" to specify which transformation mode
@@ -284,115 +294,117 @@ to employ.
Current commands and configuration options
==========================================
-** Pgcontrol commands:
+**Pgcontrol commands**::
-start
-stop
-reset
+ start
+ stop
+ reset
-** Thread commands:
+**Thread commands**::
-add_device
-rem_device_all
+ add_device
+ rem_device_all
-** Device commands:
+**Device commands**::
-count
-clone_skb
-burst
-debug
+ count
+ clone_skb
+ burst
+ debug
-frags
-delay
+ frags
+ delay
-src_mac_count
-dst_mac_count
+ src_mac_count
+ dst_mac_count
-pkt_size
-min_pkt_size
-max_pkt_size
+ pkt_size
+ min_pkt_size
+ max_pkt_size
-queue_map_min
-queue_map_max
-skb_priority
+ queue_map_min
+ queue_map_max
+ skb_priority
-tos (ipv4)
-traffic_class (ipv6)
+ tos (ipv4)
+ traffic_class (ipv6)
-mpls
+ mpls
-udp_src_min
-udp_src_max
+ udp_src_min
+ udp_src_max
-udp_dst_min
-udp_dst_max
+ udp_dst_min
+ udp_dst_max
-node
+ node
-flag
- IPSRC_RND
- IPDST_RND
- UDPSRC_RND
- UDPDST_RND
- MACSRC_RND
- MACDST_RND
- TXSIZE_RND
- IPV6
- MPLS_RND
- VID_RND
- SVID_RND
- FLOW_SEQ
- QUEUE_MAP_RND
- QUEUE_MAP_CPU
- UDPCSUM
- IPSEC
- NODE_ALLOC
- NO_TIMESTAMP
+ flag
+ IPSRC_RND
+ IPDST_RND
+ UDPSRC_RND
+ UDPDST_RND
+ MACSRC_RND
+ MACDST_RND
+ TXSIZE_RND
+ IPV6
+ MPLS_RND
+ VID_RND
+ SVID_RND
+ FLOW_SEQ
+ QUEUE_MAP_RND
+ QUEUE_MAP_CPU
+ UDPCSUM
+ IPSEC
+ NODE_ALLOC
+ NO_TIMESTAMP
-spi (ipsec)
+ spi (ipsec)
-dst_min
-dst_max
+ dst_min
+ dst_max
-src_min
-src_max
+ src_min
+ src_max
-dst_mac
-src_mac
+ dst_mac
+ src_mac
-clear_counters
+ clear_counters
-src6
-dst6
-dst6_max
-dst6_min
+ src6
+ dst6
+ dst6_max
+ dst6_min
-flows
-flowlen
+ flows
+ flowlen
-rate
-ratep
+ rate
+ ratep
-xmit_mode <start_xmit|netif_receive>
+ xmit_mode <start_xmit|netif_receive>
-vlan_cfi
-vlan_id
-vlan_p
+ vlan_cfi
+ vlan_id
+ vlan_p
-svlan_cfi
-svlan_id
-svlan_p
+ svlan_cfi
+ svlan_id
+ svlan_p
References:
-ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/
-ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/examples/
+
+- ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/
+- tp://robur.slu.se/pub/Linux/net-development/pktgen-testing/examples/
Paper from Linux-Kongress in Erlangen 2004.
-ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/pktgen_paper.pdf
+- ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/pktgen_paper.pdf
Thanks to:
+
Grant Grundler for testing on IA-64 and parisc, Harald Welte, Lennert Buytenhek
Stephen Hemminger, Andi Kleen, Dave Miller and many others.
diff --git a/Documentation/networking/PLIP.txt b/Documentation/networking/plip.rst
index ad7e3f7c3bbf..0eda745050ff 100644
--- a/Documentation/networking/PLIP.txt
+++ b/Documentation/networking/plip.rst
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================================
PLIP: The Parallel Line Internet Protocol Device
+================================================
Donald Becker (becker@super.org)
I.D.A. Supercomputing Research Center, Bowie MD 20715
@@ -83,7 +87,7 @@ When the PLIP driver is used in IRQ mode, the timeout used for triggering a
data transfer (the maximal time the PLIP driver would allow the other side
before announcing a timeout, when trying to handshake a transfer of some
data) is, by default, 500usec. As IRQ delivery is more or less immediate,
-this timeout is quite sufficient.
+this timeout is quite sufficient.
When in IRQ-less mode, the PLIP driver polls the parallel port HZ times
per second (where HZ is typically 100 on most platforms, and 1024 on an
@@ -115,7 +119,7 @@ printer "null" cable to transfer data four bits at a time using
data bit outputs connected to status bit inputs.
The second data transfer method relies on both machines having
-bi-directional parallel ports, rather than output-only ``printer''
+bi-directional parallel ports, rather than output-only ``printer``
ports. This allows byte-wide transfers and avoids reconstructing
nibbles into bytes, leading to much faster transfers.
@@ -132,7 +136,7 @@ bits with standard status register implementation.
A cable that implements this protocol is available commercially as a
"Null Printer" or "Turbo Laplink" cable. It can be constructed with
-two DB-25 male connectors symmetrically connected as follows:
+two DB-25 male connectors symmetrically connected as follows::
STROBE output 1*
D0->ERROR 2 - 15 15 - 2
@@ -146,7 +150,8 @@ two DB-25 male connectors symmetrically connected as follows:
SLCTIN 17 - 17
extra grounds are 18*,19*,20*,21*,22*,23*,24*
GROUND 25 - 25
-* Do not connect these pins on either end
+
+ * Do not connect these pins on either end
If the cable you are using has a metallic shield it should be
connected to the metallic DB-25 shell at one end only.
@@ -155,14 +160,14 @@ Parallel Transfer Mode 1
========================
The second data transfer method relies on both machines having
-bi-directional parallel ports, rather than output-only ``printer''
+bi-directional parallel ports, rather than output-only ``printer``
ports. This allows byte-wide transfers, and avoids reconstructing
nibbles into bytes. This cable should not be used on unidirectional
-``printer'' (as opposed to ``parallel'') ports or when the machine
+``printer`` (as opposed to ``parallel``) ports or when the machine
isn't configured for PLIP, as it will result in output driver
conflicts and the (unlikely) possibility of damage.
-The cable for this transfer mode should be constructed as follows:
+The cable for this transfer mode should be constructed as follows::
STROBE->BUSY 1 - 11
D0->D0 2 - 2
@@ -179,7 +184,8 @@ The cable for this transfer mode should be constructed as follows:
GND->ERROR 18 - 15
extra grounds are 19*,20*,21*,22*,23*,24*
GROUND 25 - 25
-* Do not connect these pins on either end
+
+ * Do not connect these pins on either end
Once again, if the cable you are using has a metallic shield it should
be connected to the metallic DB-25 shell at one end only.
@@ -188,7 +194,7 @@ PLIP Mode 0 transfer protocol
=============================
The PLIP driver is compatible with the "Crynwr" parallel port transfer
-standard in Mode 0. That standard specifies the following protocol:
+standard in Mode 0. That standard specifies the following protocol::
send header nibble '0x8'
count-low octet
@@ -196,20 +202,21 @@ standard in Mode 0. That standard specifies the following protocol:
... data octets
checksum octet
-Each octet is sent as
+Each octet is sent as::
+
<wait for rx. '0x1?'> <send 0x10+(octet&0x0F)>
<wait for rx. '0x0?'> <send 0x00+((octet>>4)&0x0F)>
To start a transfer the transmitting machine outputs a nibble 0x08.
That raises the ACK line, triggering an interrupt in the receiving
machine. The receiving machine disables interrupts and raises its own ACK
-line.
+line.
-Restated:
+Restated::
-(OUT is bit 0-4, OUT.j is bit j from OUT. IN likewise)
-Send_Byte:
- OUT := low nibble, OUT.4 := 1
- WAIT FOR IN.4 = 1
- OUT := high nibble, OUT.4 := 0
- WAIT FOR IN.4 = 0
+ (OUT is bit 0-4, OUT.j is bit j from OUT. IN likewise)
+ Send_Byte:
+ OUT := low nibble, OUT.4 := 1
+ WAIT FOR IN.4 = 1
+ OUT := high nibble, OUT.4 := 0
+ WAIT FOR IN.4 = 0
diff --git a/Documentation/networking/ppp_generic.txt b/Documentation/networking/ppp_generic.rst
index fd563aff5fc9..e60504377900 100644
--- a/Documentation/networking/ppp_generic.txt
+++ b/Documentation/networking/ppp_generic.rst
@@ -1,8 +1,12 @@
- PPP Generic Driver and Channel Interface
- ----------------------------------------
+.. SPDX-License-Identifier: GPL-2.0
- Paul Mackerras
+========================================
+PPP Generic Driver and Channel Interface
+========================================
+
+ Paul Mackerras
paulus@samba.org
+
7 Feb 2002
The generic PPP driver in linux-2.4 provides an implementation of the
@@ -19,7 +23,7 @@ functionality which is of use in any PPP implementation, including:
* simple packet filtering
For sending and receiving PPP frames, the generic PPP driver calls on
-the services of PPP `channels'. A PPP channel encapsulates a
+the services of PPP ``channels``. A PPP channel encapsulates a
mechanism for transporting PPP frames from one machine to another. A
PPP channel implementation can be arbitrarily complex internally but
has a very simple interface with the generic PPP code: it merely has
@@ -102,7 +106,7 @@ communications medium and prepare it to do PPP. For example, with an
async tty, this can involve setting the tty speed and modes, issuing
modem commands, and then going through some sort of dialog with the
remote system to invoke PPP service there. We refer to this process
-as `discovery'. Then the user-level process tells the medium to
+as ``discovery``. Then the user-level process tells the medium to
become a PPP channel and register itself with the generic PPP layer.
The channel then has to report the channel number assigned to it back
to the user-level process. From that point, the PPP negotiation code
@@ -111,8 +115,8 @@ negotiation, accessing the channel through the /dev/ppp interface.
At the interface to the PPP generic layer, PPP frames are stored in
skbuff structures and start with the two-byte PPP protocol number.
-The frame does *not* include the 0xff `address' byte or the 0x03
-`control' byte that are optionally used in async PPP. Nor is there
+The frame does *not* include the 0xff ``address`` byte or the 0x03
+``control`` byte that are optionally used in async PPP. Nor is there
any escaping of control characters, nor are there any FCS or framing
characters included. That is all the responsibility of the channel
code, if it is needed for the particular medium. That is, the skbuffs
@@ -121,16 +125,16 @@ protocol number and the data, and the skbuffs presented to ppp_input()
must be in the same format.
The channel must provide an instance of a ppp_channel struct to
-represent the channel. The channel is free to use the `private' field
-however it wishes. The channel should initialize the `mtu' and
-`hdrlen' fields before calling ppp_register_channel() and not change
-them until after ppp_unregister_channel() returns. The `mtu' field
+represent the channel. The channel is free to use the ``private`` field
+however it wishes. The channel should initialize the ``mtu`` and
+``hdrlen`` fields before calling ppp_register_channel() and not change
+them until after ppp_unregister_channel() returns. The ``mtu`` field
represents the maximum size of the data part of the PPP frames, that
is, it does not include the 2-byte protocol number.
If the channel needs some headroom in the skbuffs presented to it for
transmission (i.e., some space free in the skbuff data area before the
-start of the PPP frame), it should set the `hdrlen' field of the
+start of the PPP frame), it should set the ``hdrlen`` field of the
ppp_channel struct to the amount of headroom required. The generic
PPP layer will attempt to provide that much headroom but the channel
should still check if there is sufficient headroom and copy the skbuff
@@ -322,6 +326,8 @@ an interface unit are:
interface. The argument should be a pointer to an int containing
the new flags value. The bits in the flags value that can be set
are:
+
+ ================ ========================================
SC_COMP_TCP enable transmit TCP header compression
SC_NO_TCP_CCID disable connection-id compression for
TCP header compression
@@ -335,6 +341,7 @@ an interface unit are:
SC_MP_SHORTSEQ expect short multilink sequence
numbers on received multilink fragments
SC_MP_XSHORTSEQ transmit short multilink sequence nos.
+ ================ ========================================
The values of these flags are defined in <linux/ppp-ioctl.h>. Note
that the values of the SC_MULTILINK, SC_MP_SHORTSEQ and
@@ -345,17 +352,20 @@ an interface unit are:
interface unit. The argument should point to an int where the ioctl
will store the flags value. As well as the values listed above for
PPPIOCSFLAGS, the following bits may be set in the returned value:
+
+ ================ =========================================
SC_COMP_RUN CCP compressor is running
SC_DECOMP_RUN CCP decompressor is running
SC_DC_ERROR CCP decompressor detected non-fatal error
SC_DC_FERROR CCP decompressor detected fatal error
+ ================ =========================================
* PPPIOCSCOMPRESS sets the parameters for packet compression or
decompression. The argument should point to a ppp_option_data
structure (defined in <linux/ppp-ioctl.h>), which contains a
pointer/length pair which should describe a block of memory
containing a CCP option specifying a compression method and its
- parameters. The ppp_option_data struct also contains a `transmit'
+ parameters. The ppp_option_data struct also contains a ``transmit``
field. If this is 0, the ioctl will affect the receive path,
otherwise the transmit path.
@@ -377,7 +387,7 @@ an interface unit are:
ppp_idle structure (defined in <linux/ppp_defs.h>). If the
CONFIG_PPP_FILTER option is enabled, the set of packets which reset
the transmit and receive idle timers is restricted to those which
- pass the `active' packet filter.
+ pass the ``active`` packet filter.
Two versions of this command exist, to deal with user space
expecting times as either 32-bit or 64-bit time_t seconds.
@@ -391,31 +401,33 @@ an interface unit are:
* PPPIOCSNPMODE sets the network-protocol mode for a given network
protocol. The argument should point to an npioctl struct (defined
- in <linux/ppp-ioctl.h>). The `protocol' field gives the PPP protocol
- number for the protocol to be affected, and the `mode' field
+ in <linux/ppp-ioctl.h>). The ``protocol`` field gives the PPP protocol
+ number for the protocol to be affected, and the ``mode`` field
specifies what to do with packets for that protocol:
+ ============= ==============================================
NPMODE_PASS normal operation, transmit and receive packets
NPMODE_DROP silently drop packets for this protocol
NPMODE_ERROR drop packets and return an error on transmit
NPMODE_QUEUE queue up packets for transmit, drop received
packets
+ ============= ==============================================
At present NPMODE_ERROR and NPMODE_QUEUE have the same effect as
NPMODE_DROP.
* PPPIOCGNPMODE returns the network-protocol mode for a given
protocol. The argument should point to an npioctl struct with the
- `protocol' field set to the PPP protocol number for the protocol of
- interest. On return the `mode' field will be set to the network-
+ ``protocol`` field set to the PPP protocol number for the protocol of
+ interest. On return the ``mode`` field will be set to the network-
protocol mode for that protocol.
-* PPPIOCSPASS and PPPIOCSACTIVE set the `pass' and `active' packet
+* PPPIOCSPASS and PPPIOCSACTIVE set the ``pass`` and ``active`` packet
filters. These ioctls are only available if the CONFIG_PPP_FILTER
option is selected. The argument should point to a sock_fprog
structure (defined in <linux/filter.h>) containing the compiled BPF
instructions for the filter. Packets are dropped if they fail the
- `pass' filter; otherwise, if they fail the `active' filter they are
+ ``pass`` filter; otherwise, if they fail the ``active`` filter they are
passed but they do not reset the transmit or receive idle timer.
* PPPIOCSMRRU enables or disables multilink processing for received
diff --git a/Documentation/networking/proc_net_tcp.txt b/Documentation/networking/proc_net_tcp.rst
index 4a79209e77a7..7d9dfe36af45 100644
--- a/Documentation/networking/proc_net_tcp.txt
+++ b/Documentation/networking/proc_net_tcp.rst
@@ -1,15 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
+The proc/net/tcp and proc/net/tcp6 variables
+============================================
+
This document describes the interfaces /proc/net/tcp and /proc/net/tcp6.
Note that these interfaces are deprecated in favor of tcp_diag.
-These /proc interfaces provide information about currently active TCP
+These /proc interfaces provide information about currently active TCP
connections, and are implemented by tcp4_seq_show() in net/ipv4/tcp_ipv4.c
and tcp6_seq_show() in net/ipv6/tcp_ipv6.c, respectively.
It will first list all listening TCP sockets, and next list all established
-TCP connections. A typical entry of /proc/net/tcp would look like this (split
-up into 3 parts because of the length of the line):
+TCP connections. A typical entry of /proc/net/tcp would look like this (split
+up into 3 parts because of the length of the line)::
- 46: 010310AC:9C4C 030310AC:1770 01
+ 46: 010310AC:9C4C 030310AC:1770 01
| | | | | |--> connection state
| | | | |------> remote TCP port number
| | | |-------------> remote IPv4 address
@@ -17,7 +23,7 @@ up into 3 parts because of the length of the line):
| |---------------------------> local IPv4 address
|----------------------------------> number of entry
- 00000150:00000000 01:00000019 00000000
+ 00000150:00000000 01:00000019 00000000
| | | | |--> number of unrecovered RTO timeouts
| | | |----------> number of jiffies until timer expires
| | |----------------> timer_active (see below)
@@ -25,7 +31,7 @@ up into 3 parts because of the length of the line):
|-------------------------------> transmit-queue
1000 0 54165785 4 cd1e6040 25 4 27 3 -1
- | | | | | | | | | |--> slow start size threshold,
+ | | | | | | | | | |--> slow start size threshold,
| | | | | | | | | or -1 if the threshold
| | | | | | | | | is >= 0xFFFF
| | | | | | | | |----> sending congestion window
@@ -40,9 +46,12 @@ up into 3 parts because of the length of the line):
|---------------------------------------------> uid
timer_active:
+
+ == ================================================================
0 no timer is pending
1 retransmit-timer is pending
2 another timer (e.g. delayed ack or keepalive) is pending
- 3 this is a socket in TIME_WAIT state. Not all fields will contain
+ 3 this is a socket in TIME_WAIT state. Not all fields will contain
data (or even exist)
4 zero window probe timer is pending
+ == ================================================================
diff --git a/Documentation/networking/radiotap-headers.txt b/Documentation/networking/radiotap-headers.rst
index 953331c7984f..1a1bd1ec0650 100644
--- a/Documentation/networking/radiotap-headers.txt
+++ b/Documentation/networking/radiotap-headers.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
How to use radiotap headers
===========================
@@ -5,9 +8,9 @@ Pointer to the radiotap include file
------------------------------------
Radiotap headers are variable-length and extensible, you can get most of the
-information you need to know on them from:
+information you need to know on them from::
-./include/net/ieee80211_radiotap.h
+ ./include/net/ieee80211_radiotap.h
This document gives an overview and warns on some corner cases.
@@ -21,6 +24,8 @@ of the it_present member of ieee80211_radiotap_header is set, it means that
the header for argument index 0 (IEEE80211_RADIOTAP_TSFT) is present in the
argument area.
+::
+
< 8-byte ieee80211_radiotap_header >
[ <possible argument bitmap extensions ... > ]
[ <argument> ... ]
@@ -76,6 +81,8 @@ ieee80211_radiotap_header.
Example valid radiotap header
-----------------------------
+::
+
0x00, 0x00, // <-- radiotap version + pad byte
0x0b, 0x00, // <- radiotap header length
0x04, 0x0c, 0x00, 0x00, // <-- bitmap
@@ -89,64 +96,64 @@ Using the Radiotap Parser
If you are having to parse a radiotap struct, you can radically simplify the
job by using the radiotap parser that lives in net/wireless/radiotap.c and has
-its prototypes available in include/net/cfg80211.h. You use it like this:
+its prototypes available in include/net/cfg80211.h. You use it like this::
-#include <net/cfg80211.h>
+ #include <net/cfg80211.h>
-/* buf points to the start of the radiotap header part */
+ /* buf points to the start of the radiotap header part */
-int MyFunction(u8 * buf, int buflen)
-{
- int pkt_rate_100kHz = 0, antenna = 0, pwr = 0;
- struct ieee80211_radiotap_iterator iterator;
- int ret = ieee80211_radiotap_iterator_init(&iterator, buf, buflen);
+ int MyFunction(u8 * buf, int buflen)
+ {
+ int pkt_rate_100kHz = 0, antenna = 0, pwr = 0;
+ struct ieee80211_radiotap_iterator iterator;
+ int ret = ieee80211_radiotap_iterator_init(&iterator, buf, buflen);
- while (!ret) {
+ while (!ret) {
- ret = ieee80211_radiotap_iterator_next(&iterator);
+ ret = ieee80211_radiotap_iterator_next(&iterator);
- if (ret)
- continue;
+ if (ret)
+ continue;
- /* see if this argument is something we can use */
+ /* see if this argument is something we can use */
- switch (iterator.this_arg_index) {
- /*
- * You must take care when dereferencing iterator.this_arg
- * for multibyte types... the pointer is not aligned. Use
- * get_unaligned((type *)iterator.this_arg) to dereference
- * iterator.this_arg for type "type" safely on all arches.
- */
- case IEEE80211_RADIOTAP_RATE:
- /* radiotap "rate" u8 is in
- * 500kbps units, eg, 0x02=1Mbps
- */
- pkt_rate_100kHz = (*iterator.this_arg) * 5;
- break;
+ switch (iterator.this_arg_index) {
+ /*
+ * You must take care when dereferencing iterator.this_arg
+ * for multibyte types... the pointer is not aligned. Use
+ * get_unaligned((type *)iterator.this_arg) to dereference
+ * iterator.this_arg for type "type" safely on all arches.
+ */
+ case IEEE80211_RADIOTAP_RATE:
+ /* radiotap "rate" u8 is in
+ * 500kbps units, eg, 0x02=1Mbps
+ */
+ pkt_rate_100kHz = (*iterator.this_arg) * 5;
+ break;
- case IEEE80211_RADIOTAP_ANTENNA:
- /* radiotap uses 0 for 1st ant */
- antenna = *iterator.this_arg);
- break;
+ case IEEE80211_RADIOTAP_ANTENNA:
+ /* radiotap uses 0 for 1st ant */
+ antenna = *iterator.this_arg);
+ break;
- case IEEE80211_RADIOTAP_DBM_TX_POWER:
- pwr = *iterator.this_arg;
- break;
+ case IEEE80211_RADIOTAP_DBM_TX_POWER:
+ pwr = *iterator.this_arg;
+ break;
- default:
- break;
- }
- } /* while more rt headers */
+ default:
+ break;
+ }
+ } /* while more rt headers */
- if (ret != -ENOENT)
- return TXRX_DROP;
+ if (ret != -ENOENT)
+ return TXRX_DROP;
- /* discard the radiotap header part */
- buf += iterator.max_length;
- buflen -= iterator.max_length;
+ /* discard the radiotap header part */
+ buf += iterator.max_length;
+ buflen -= iterator.max_length;
- ...
+ ...
-}
+ }
Andy Green <andy@warmcat.com>
diff --git a/Documentation/networking/ray_cs.txt b/Documentation/networking/ray_cs.rst
index c0c12307ed9d..9a46d1ae8f20 100644
--- a/Documentation/networking/ray_cs.txt
+++ b/Documentation/networking/ray_cs.rst
@@ -1,6 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+=========================
+Raylink wireless LAN card
+=========================
+
September 21, 1999
-Copyright (c) 1998 Corey Thomas (corey@world.std.com)
+Copyright |copy| 1998 Corey Thomas (corey@world.std.com)
This file is the documentation for the Raylink Wireless LAN card driver for
Linux. The Raylink wireless LAN card is a PCMCIA card which provides IEEE
@@ -13,7 +21,7 @@ wireless LAN cards.
As of kernel 2.3.18, the ray_cs driver is part of the Linux kernel
source. My web page for the development of ray_cs is at
-http://web.ralinktech.com/ralink/Home/Support/Linux.html
+http://web.ralinktech.com/ralink/Home/Support/Linux.html
and I can be emailed at corey@world.std.com
The kernel driver is based on ray_cs-1.62.tgz
@@ -29,6 +37,7 @@ with nondefault parameters, they can be edited in
will find them all.
Information on card services is available at:
+
http://pcmcia-cs.sourceforge.net/
@@ -39,72 +48,78 @@ the driver.
Currently, ray_cs is not part of David Hinds card services package,
so the following magic is required.
-At the end of the /etc/pcmcia/config.opts file, add the line:
-source ./ray_cs.opts
+At the end of the /etc/pcmcia/config.opts file, add the line:
+source ./ray_cs.opts
This will make card services read the ray_cs.opts file
when starting. Create the file /etc/pcmcia/ray_cs.opts containing the
-following:
+following::
-#### start of /etc/pcmcia/ray_cs.opts ###################
-# Configuration options for Raylink Wireless LAN PCMCIA card
-device "ray_cs"
- class "network" module "misc/ray_cs"
+ #### start of /etc/pcmcia/ray_cs.opts ###################
+ # Configuration options for Raylink Wireless LAN PCMCIA card
+ device "ray_cs"
+ class "network" module "misc/ray_cs"
-card "RayLink PC Card WLAN Adapter"
- manfid 0x01a6, 0x0000
- bind "ray_cs"
+ card "RayLink PC Card WLAN Adapter"
+ manfid 0x01a6, 0x0000
+ bind "ray_cs"
-module "misc/ray_cs" opts ""
-#### end of /etc/pcmcia/ray_cs.opts #####################
+ module "misc/ray_cs" opts ""
+ #### end of /etc/pcmcia/ray_cs.opts #####################
To join an existing network with
-different parameters, contact the network administrator for the
+different parameters, contact the network administrator for the
configuration information, and edit /etc/pcmcia/ray_cs.opts.
Add the parameters below between the empty quotes.
Parameters for ray_cs driver which may be specified in ray_cs.opts:
-bc integer 0 = normal mode (802.11 timing)
- 1 = slow down inter frame timing to allow
- operation with older breezecom access
- points.
-
-beacon_period integer beacon period in Kilo-microseconds
- legal values = must be integer multiple
- of hop dwell
- default = 256
-
-country integer 1 = USA (default)
- 2 = Europe
- 3 = Japan
- 4 = Korea
- 5 = Spain
- 6 = France
- 7 = Israel
- 8 = Australia
+=============== =============== =============================================
+bc integer 0 = normal mode (802.11 timing),
+ 1 = slow down inter frame timing to allow
+ operation with older breezecom access
+ points.
+
+beacon_period integer beacon period in Kilo-microseconds,
+
+ legal values = must be integer multiple
+ of hop dwell
+
+ default = 256
+
+country integer 1 = USA (default),
+ 2 = Europe,
+ 3 = Japan,
+ 4 = Korea,
+ 5 = Spain,
+ 6 = France,
+ 7 = Israel,
+ 8 = Australia
essid string ESS ID - network name to join
+
string with maximum length of 32 chars
default value = "ADHOC_ESSID"
-hop_dwell integer hop dwell time in Kilo-microseconds
+hop_dwell integer hop dwell time in Kilo-microseconds
+
legal values = 16,32,64,128(default),256
irq_mask integer linux standard 16 bit value 1bit/IRQ
+
lsb is IRQ 0, bit 1 is IRQ 1 etc.
Used to restrict choice of IRQ's to use.
- Recommended method for controlling
- interrupts is in /etc/pcmcia/config.opts
+ Recommended method for controlling
+ interrupts is in /etc/pcmcia/config.opts
-net_type integer 0 (default) = adhoc network,
+net_type integer 0 (default) = adhoc network,
1 = infrastructure
phy_addr string string containing new MAC address in
hex, must start with x eg
x00008f123456
-psm integer 0 = continuously active
+psm integer 0 = continuously active,
1 = power save mode (not useful yet)
pc_debug integer (0-5) larger values for more verbose
@@ -114,14 +129,14 @@ ray_debug integer Replaced with pc_debug
ray_mem_speed integer defaults to 500
-sniffer integer 0 = not sniffer (default)
- 1 = sniffer which can be used to record all
- network traffic using tcpdump or similar,
- but no normal network use is allowed.
+sniffer integer 0 = not sniffer (default),
+ 1 = sniffer which can be used to record all
+ network traffic using tcpdump or similar,
+ but no normal network use is allowed.
-translate integer 0 = no translation (encapsulate frames)
+translate integer 0 = no translation (encapsulate frames),
1 = translation (RFC1042/802.1)
-
+=============== =============== =============================================
More on sniffer mode:
@@ -136,7 +151,7 @@ package which parses the 802.11 headers.
Known Problems and missing features
- Does not work with non x86
+ Does not work with non x86
Does not work with SMP
diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.rst
index eec61694e894..44936c27ab3a 100644
--- a/Documentation/networking/rds.txt
+++ b/Documentation/networking/rds.rst
@@ -1,3 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==
+RDS
+===
Overview
========
@@ -24,36 +29,39 @@ as IB.
The high-level semantics of RDS from the application's point of view are
* Addressing
- RDS uses IPv4 addresses and 16bit port numbers to identify
- the end point of a connection. All socket operations that involve
- passing addresses between kernel and user space generally
- use a struct sockaddr_in.
- The fact that IPv4 addresses are used does not mean the underlying
- transport has to be IP-based. In fact, RDS over IB uses a
- reliable IB connection; the IP address is used exclusively to
- locate the remote node's GID (by ARPing for the given IP).
+ RDS uses IPv4 addresses and 16bit port numbers to identify
+ the end point of a connection. All socket operations that involve
+ passing addresses between kernel and user space generally
+ use a struct sockaddr_in.
+
+ The fact that IPv4 addresses are used does not mean the underlying
+ transport has to be IP-based. In fact, RDS over IB uses a
+ reliable IB connection; the IP address is used exclusively to
+ locate the remote node's GID (by ARPing for the given IP).
- The port space is entirely independent of UDP, TCP or any other
- protocol.
+ The port space is entirely independent of UDP, TCP or any other
+ protocol.
* Socket interface
- RDS sockets work *mostly* as you would expect from a BSD
- socket. The next section will cover the details. At any rate,
- all I/O is performed through the standard BSD socket API.
- Some additions like zerocopy support are implemented through
- control messages, while other extensions use the getsockopt/
- setsockopt calls.
-
- Sockets must be bound before you can send or receive data.
- This is needed because binding also selects a transport and
- attaches it to the socket. Once bound, the transport assignment
- does not change. RDS will tolerate IPs moving around (eg in
- a active-active HA scenario), but only as long as the address
- doesn't move to a different transport.
+
+ RDS sockets work *mostly* as you would expect from a BSD
+ socket. The next section will cover the details. At any rate,
+ all I/O is performed through the standard BSD socket API.
+ Some additions like zerocopy support are implemented through
+ control messages, while other extensions use the getsockopt/
+ setsockopt calls.
+
+ Sockets must be bound before you can send or receive data.
+ This is needed because binding also selects a transport and
+ attaches it to the socket. Once bound, the transport assignment
+ does not change. RDS will tolerate IPs moving around (eg in
+ a active-active HA scenario), but only as long as the address
+ doesn't move to a different transport.
* sysctls
- RDS supports a number of sysctls in /proc/sys/net/rds
+
+ RDS supports a number of sysctls in /proc/sys/net/rds
Socket Interface
@@ -66,89 +74,88 @@ Socket Interface
options.
fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
- This creates a new, unbound RDS socket.
+ This creates a new, unbound RDS socket.
setsockopt(SOL_SOCKET): send and receive buffer size
- RDS honors the send and receive buffer size socket options.
- You are not allowed to queue more than SO_SNDSIZE bytes to
- a socket. A message is queued when sendmsg is called, and
- it leaves the queue when the remote system acknowledges
- its arrival.
-
- The SO_RCVSIZE option controls the maximum receive queue length.
- This is a soft limit rather than a hard limit - RDS will
- continue to accept and queue incoming messages, even if that
- takes the queue length over the limit. However, it will also
- mark the port as "congested" and send a congestion update to
- the source node. The source node is supposed to throttle any
- processes sending to this congested port.
+ RDS honors the send and receive buffer size socket options.
+ You are not allowed to queue more than SO_SNDSIZE bytes to
+ a socket. A message is queued when sendmsg is called, and
+ it leaves the queue when the remote system acknowledges
+ its arrival.
+
+ The SO_RCVSIZE option controls the maximum receive queue length.
+ This is a soft limit rather than a hard limit - RDS will
+ continue to accept and queue incoming messages, even if that
+ takes the queue length over the limit. However, it will also
+ mark the port as "congested" and send a congestion update to
+ the source node. The source node is supposed to throttle any
+ processes sending to this congested port.
bind(fd, &sockaddr_in, ...)
- This binds the socket to a local IP address and port, and a
- transport, if one has not already been selected via the
+ This binds the socket to a local IP address and port, and a
+ transport, if one has not already been selected via the
SO_RDS_TRANSPORT socket option
sendmsg(fd, ...)
- Sends a message to the indicated recipient. The kernel will
- transparently establish the underlying reliable connection
- if it isn't up yet.
+ Sends a message to the indicated recipient. The kernel will
+ transparently establish the underlying reliable connection
+ if it isn't up yet.
- An attempt to send a message that exceeds SO_SNDSIZE will
- return with -EMSGSIZE
+ An attempt to send a message that exceeds SO_SNDSIZE will
+ return with -EMSGSIZE
- An attempt to send a message that would take the total number
- of queued bytes over the SO_SNDSIZE threshold will return
- EAGAIN.
+ An attempt to send a message that would take the total number
+ of queued bytes over the SO_SNDSIZE threshold will return
+ EAGAIN.
- An attempt to send a message to a destination that is marked
- as "congested" will return ENOBUFS.
+ An attempt to send a message to a destination that is marked
+ as "congested" will return ENOBUFS.
recvmsg(fd, ...)
- Receives a message that was queued to this socket. The sockets
- recv queue accounting is adjusted, and if the queue length
- drops below SO_SNDSIZE, the port is marked uncongested, and
- a congestion update is sent to all peers.
-
- Applications can ask the RDS kernel module to receive
- notifications via control messages (for instance, there is a
- notification when a congestion update arrived, or when a RDMA
- operation completes). These notifications are received through
- the msg.msg_control buffer of struct msghdr. The format of the
- messages is described in manpages.
+ Receives a message that was queued to this socket. The sockets
+ recv queue accounting is adjusted, and if the queue length
+ drops below SO_SNDSIZE, the port is marked uncongested, and
+ a congestion update is sent to all peers.
+
+ Applications can ask the RDS kernel module to receive
+ notifications via control messages (for instance, there is a
+ notification when a congestion update arrived, or when a RDMA
+ operation completes). These notifications are received through
+ the msg.msg_control buffer of struct msghdr. The format of the
+ messages is described in manpages.
poll(fd)
- RDS supports the poll interface to allow the application
- to implement async I/O.
+ RDS supports the poll interface to allow the application
+ to implement async I/O.
- POLLIN handling is pretty straightforward. When there's an
- incoming message queued to the socket, or a pending notification,
- we signal POLLIN.
+ POLLIN handling is pretty straightforward. When there's an
+ incoming message queued to the socket, or a pending notification,
+ we signal POLLIN.
- POLLOUT is a little harder. Since you can essentially send
- to any destination, RDS will always signal POLLOUT as long as
- there's room on the send queue (ie the number of bytes queued
- is less than the sendbuf size).
+ POLLOUT is a little harder. Since you can essentially send
+ to any destination, RDS will always signal POLLOUT as long as
+ there's room on the send queue (ie the number of bytes queued
+ is less than the sendbuf size).
- However, the kernel will refuse to accept messages to
- a destination marked congested - in this case you will loop
- forever if you rely on poll to tell you what to do.
- This isn't a trivial problem, but applications can deal with
- this - by using congestion notifications, and by checking for
- ENOBUFS errors returned by sendmsg.
+ However, the kernel will refuse to accept messages to
+ a destination marked congested - in this case you will loop
+ forever if you rely on poll to tell you what to do.
+ This isn't a trivial problem, but applications can deal with
+ this - by using congestion notifications, and by checking for
+ ENOBUFS errors returned by sendmsg.
setsockopt(SOL_RDS, RDS_CANCEL_SENT_TO, &sockaddr_in)
- This allows the application to discard all messages queued to a
- specific destination on this particular socket.
-
- This allows the application to cancel outstanding messages if
- it detects a timeout. For instance, if it tried to send a message,
- and the remote host is unreachable, RDS will keep trying forever.
- The application may decide it's not worth it, and cancel the
- operation. In this case, it would use RDS_CANCEL_SENT_TO to
- nuke any pending messages.
-
- setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, (int *)&transport ..)
- getsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, (int *)&transport ..)
+ This allows the application to discard all messages queued to a
+ specific destination on this particular socket.
+
+ This allows the application to cancel outstanding messages if
+ it detects a timeout. For instance, if it tried to send a message,
+ and the remote host is unreachable, RDS will keep trying forever.
+ The application may decide it's not worth it, and cancel the
+ operation. In this case, it would use RDS_CANCEL_SENT_TO to
+ nuke any pending messages.
+
+ ``setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, (int *)&transport ..), getsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, (int *)&transport ..)``
Set or read an integer defining the underlying
encapsulating transport to be used for RDS packets on the
socket. When setting the option, integer argument may be
@@ -180,32 +187,39 @@ RDS Protocol
Message header
The message header is a 'struct rds_header' (see rds.h):
+
Fields:
+
h_sequence:
- per-packet sequence number
+ per-packet sequence number
h_ack:
- piggybacked acknowledgment of last packet received
+ piggybacked acknowledgment of last packet received
h_len:
- length of data, not including header
+ length of data, not including header
h_sport:
- source port
+ source port
h_dport:
- destination port
+ destination port
h_flags:
- CONG_BITMAP - this is a congestion update bitmap
- ACK_REQUIRED - receiver must ack this packet
- RETRANSMITTED - packet has previously been sent
+ Can be:
+
+ ============= ==================================
+ CONG_BITMAP this is a congestion update bitmap
+ ACK_REQUIRED receiver must ack this packet
+ RETRANSMITTED packet has previously been sent
+ ============= ==================================
+
h_credit:
- indicate to other end of connection that
- it has more credits available (i.e. there is
- more send room)
+ indicate to other end of connection that
+ it has more credits available (i.e. there is
+ more send room)
h_padding[4]:
- unused, for future use
+ unused, for future use
h_csum:
- header checksum
+ header checksum
h_exthdr:
- optional data can be passed here. This is currently used for
- passing RDMA-related information.
+ optional data can be passed here. This is currently used for
+ passing RDMA-related information.
ACK and retransmit handling
@@ -260,7 +274,7 @@ RDS Protocol
RDS Transport Layer
-==================
+===================
As mentioned above, RDS is not IB-specific. Its code is divided
into a general RDS layer and a transport layer.
@@ -281,19 +295,25 @@ RDS Kernel Structures
be sent and sets header fields as needed, based on the socket API.
This is then queued for the individual connection and sent by the
connection's transport.
+
struct rds_incoming
a generic struct referring to incoming data that can be handed from
the transport to the general code and queued by the general code
while the socket is awoken. It is then passed back to the transport
code to handle the actual copy-to-user.
+
struct rds_socket
per-socket information
+
struct rds_connection
per-connection information
+
struct rds_transport
pointers to transport-specific functions
+
struct rds_statistics
non-transport-specific statistics
+
struct rds_cong_map
wraps the raw congestion bitmap, contains rbnode, waitq, etc.
@@ -317,53 +337,58 @@ The send path
=============
rds_sendmsg()
- struct rds_message built from incoming data
- CMSGs parsed (e.g. RDMA ops)
- transport connection alloced and connected if not already
- rds_message placed on send queue
- send worker awoken
+ - struct rds_message built from incoming data
+ - CMSGs parsed (e.g. RDMA ops)
+ - transport connection alloced and connected if not already
+ - rds_message placed on send queue
+ - send worker awoken
+
rds_send_worker()
- calls rds_send_xmit() until queue is empty
+ - calls rds_send_xmit() until queue is empty
+
rds_send_xmit()
- transmits congestion map if one is pending
- may set ACK_REQUIRED
- calls transport to send either non-RDMA or RDMA message
- (RDMA ops never retransmitted)
+ - transmits congestion map if one is pending
+ - may set ACK_REQUIRED
+ - calls transport to send either non-RDMA or RDMA message
+ (RDMA ops never retransmitted)
+
rds_ib_xmit()
- allocs work requests from send ring
- adds any new send credits available to peer (h_credits)
- maps the rds_message's sg list
- piggybacks ack
- populates work requests
- post send to connection's queue pair
+ - allocs work requests from send ring
+ - adds any new send credits available to peer (h_credits)
+ - maps the rds_message's sg list
+ - piggybacks ack
+ - populates work requests
+ - post send to connection's queue pair
The recv path
=============
rds_ib_recv_cq_comp_handler()
- looks at write completions
- unmaps recv buffer from device
- no errors, call rds_ib_process_recv()
- refill recv ring
+ - looks at write completions
+ - unmaps recv buffer from device
+ - no errors, call rds_ib_process_recv()
+ - refill recv ring
+
rds_ib_process_recv()
- validate header checksum
- copy header to rds_ib_incoming struct if start of a new datagram
- add to ibinc's fraglist
- if competed datagram:
- update cong map if datagram was cong update
- call rds_recv_incoming() otherwise
- note if ack is required
+ - validate header checksum
+ - copy header to rds_ib_incoming struct if start of a new datagram
+ - add to ibinc's fraglist
+ - if competed datagram:
+ - update cong map if datagram was cong update
+ - call rds_recv_incoming() otherwise
+ - note if ack is required
+
rds_recv_incoming()
- drop duplicate packets
- respond to pings
- find the sock associated with this datagram
- add to sock queue
- wake up sock
- do some congestion calculations
+ - drop duplicate packets
+ - respond to pings
+ - find the sock associated with this datagram
+ - add to sock queue
+ - wake up sock
+ - do some congestion calculations
rds_recvmsg
- copy data into user iovec
- handle CMSGs
- return to application
+ - copy data into user iovec
+ - handle CMSGs
+ - return to application
Multipath RDS (mprds)
=====================
diff --git a/Documentation/networking/regulatory.txt b/Documentation/networking/regulatory.rst
index 381e5b23d61d..8701b91e81ee 100644
--- a/Documentation/networking/regulatory.txt
+++ b/Documentation/networking/regulatory.rst
@@ -1,5 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
Linux wireless regulatory documentation
----------------------------------------
+=======================================
This document gives a brief review over how the Linux wireless
regulatory infrastructure works.
@@ -57,7 +60,7 @@ Users can use iw:
http://wireless.kernel.org/en/users/Documentation/iw
-An example:
+An example::
# set regulatory domain to "Costa Rica"
iw reg set CR
@@ -104,9 +107,9 @@ Example code - drivers hinting an alpha2:
This example comes from the zd1211rw device driver. You can start
by having a mapping of your device's EEPROM country/regulatory
-domain value to a specific alpha2 as follows:
+domain value to a specific alpha2 as follows::
-static struct zd_reg_alpha2_map reg_alpha2_map[] = {
+ static struct zd_reg_alpha2_map reg_alpha2_map[] = {
{ ZD_REGDOMAIN_FCC, "US" },
{ ZD_REGDOMAIN_IC, "CA" },
{ ZD_REGDOMAIN_ETSI, "DE" }, /* Generic ETSI, use most restrictive */
@@ -116,10 +119,10 @@ static struct zd_reg_alpha2_map reg_alpha2_map[] = {
{ ZD_REGDOMAIN_FRANCE, "FR" },
Then you can define a routine to map your read EEPROM value to an alpha2,
-as follows:
+as follows::
-static int zd_reg2alpha2(u8 regdomain, char *alpha2)
-{
+ static int zd_reg2alpha2(u8 regdomain, char *alpha2)
+ {
unsigned int i;
struct zd_reg_alpha2_map *reg_map;
for (i = 0; i < ARRAY_SIZE(reg_alpha2_map); i++) {
@@ -131,12 +134,14 @@ static int zd_reg2alpha2(u8 regdomain, char *alpha2)
}
}
return 1;
-}
+ }
Lastly, you can then hint to the core of your discovered alpha2, if a match
was found. You need to do this after you have registered your wiphy. You
are expected to do this during initialization.
+::
+
r = zd_reg2alpha2(mac->regdomain, alpha2);
if (!r)
regulatory_hint(hw->wiphy, alpha2);
@@ -156,9 +161,9 @@ call regulatory_hint() with the regulatory domain structure in it.
Bellow is a simple example, with a regulatory domain cached using the stack.
Your implementation may vary (read EEPROM cache instead, for example).
-Example cache of some regulatory domain
+Example cache of some regulatory domain::
-struct ieee80211_regdomain mydriver_jp_regdom = {
+ struct ieee80211_regdomain mydriver_jp_regdom = {
.n_reg_rules = 3,
.alpha2 = "JP",
//.alpha2 = "99", /* If I have no alpha2 to map it to */
@@ -173,9 +178,9 @@ struct ieee80211_regdomain mydriver_jp_regdom = {
NL80211_RRF_NO_IR|
NL80211_RRF_DFS),
}
-};
+ };
-Then in some part of your code after your wiphy has been registered:
+Then in some part of your code after your wiphy has been registered::
struct ieee80211_regdomain *rd;
int size_of_regd;
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.rst
index 180e07d956a7..68552b92dc44 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.rst
@@ -1,6 +1,8 @@
- ======================
- RxRPC NETWORK PROTOCOL
- ======================
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+RxRPC Network Protocol
+======================
The RxRPC protocol driver provides a reliable two-phase transport on top of UDP
that can be used to perform RxRPC remote operations. This is done over sockets
@@ -9,36 +11,35 @@ receive data, aborts and errors.
Contents of this document:
- (*) Overview.
+ (#) Overview.
- (*) RxRPC protocol summary.
+ (#) RxRPC protocol summary.
- (*) AF_RXRPC driver model.
+ (#) AF_RXRPC driver model.
- (*) Control messages.
+ (#) Control messages.
- (*) Socket options.
+ (#) Socket options.
- (*) Security.
+ (#) Security.
- (*) Example client usage.
+ (#) Example client usage.
- (*) Example server usage.
+ (#) Example server usage.
- (*) AF_RXRPC kernel interface.
+ (#) AF_RXRPC kernel interface.
- (*) Configurable parameters.
+ (#) Configurable parameters.
-========
-OVERVIEW
+Overview
========
RxRPC is a two-layer protocol. There is a session layer which provides
reliable virtual connections using UDP over IPv4 (or IPv6) as the transport
layer, but implements a real network protocol; and there's the presentation
layer which renders structured data to binary blobs and back again using XDR
-(as does SunRPC):
+(as does SunRPC)::
+-------------+
| Application |
@@ -85,31 +86,30 @@ The Andrew File System (AFS) is an example of an application that uses this and
that has both kernel (filesystem) and userspace (utility) components.
-======================
-RXRPC PROTOCOL SUMMARY
+RxRPC Protocol Summary
======================
An overview of the RxRPC protocol:
- (*) RxRPC sits on top of another networking protocol (UDP is the only option
+ (#) RxRPC sits on top of another networking protocol (UDP is the only option
currently), and uses this to provide network transport. UDP ports, for
example, provide transport endpoints.
- (*) RxRPC supports multiple virtual "connections" from any given transport
+ (#) RxRPC supports multiple virtual "connections" from any given transport
endpoint, thus allowing the endpoints to be shared, even to the same
remote endpoint.
- (*) Each connection goes to a particular "service". A connection may not go
+ (#) Each connection goes to a particular "service". A connection may not go
to multiple services. A service may be considered the RxRPC equivalent of
a port number. AF_RXRPC permits multiple services to share an endpoint.
- (*) Client-originating packets are marked, thus a transport endpoint can be
+ (#) Client-originating packets are marked, thus a transport endpoint can be
shared between client and server connections (connections have a
direction).
- (*) Up to a billion connections may be supported concurrently between one
+ (#) Up to a billion connections may be supported concurrently between one
local transport endpoint and one service on one remote endpoint. An RxRPC
- connection is described by seven numbers:
+ connection is described by seven numbers::
Local address }
Local port } Transport (UDP) address
@@ -119,22 +119,22 @@ An overview of the RxRPC protocol:
Connection ID
Service ID
- (*) Each RxRPC operation is a "call". A connection may make up to four
+ (#) Each RxRPC operation is a "call". A connection may make up to four
billion calls, but only up to four calls may be in progress on a
connection at any one time.
- (*) Calls are two-phase and asymmetric: the client sends its request data,
+ (#) Calls are two-phase and asymmetric: the client sends its request data,
which the service receives; then the service transmits the reply data
which the client receives.
- (*) The data blobs are of indefinite size, the end of a phase is marked with a
+ (#) The data blobs are of indefinite size, the end of a phase is marked with a
flag in the packet. The number of packets of data making up one blob may
not exceed 4 billion, however, as this would cause the sequence number to
wrap.
- (*) The first four bytes of the request data are the service operation ID.
+ (#) The first four bytes of the request data are the service operation ID.
- (*) Security is negotiated on a per-connection basis. The connection is
+ (#) Security is negotiated on a per-connection basis. The connection is
initiated by the first data packet on it arriving. If security is
requested, the server then issues a "challenge" and then the client
replies with a "response". If the response is successful, the security is
@@ -143,146 +143,145 @@ An overview of the RxRPC protocol:
connection lapse before the client, the security will be renegotiated if
the client uses the connection again.
- (*) Calls use ACK packets to handle reliability. Data packets are also
+ (#) Calls use ACK packets to handle reliability. Data packets are also
explicitly sequenced per call.
- (*) There are two types of positive acknowledgment: hard-ACKs and soft-ACKs.
+ (#) There are two types of positive acknowledgment: hard-ACKs and soft-ACKs.
A hard-ACK indicates to the far side that all the data received to a point
has been received and processed; a soft-ACK indicates that the data has
been received but may yet be discarded and re-requested. The sender may
not discard any transmittable packets until they've been hard-ACK'd.
- (*) Reception of a reply data packet implicitly hard-ACK's all the data
+ (#) Reception of a reply data packet implicitly hard-ACK's all the data
packets that make up the request.
- (*) An call is complete when the request has been sent, the reply has been
+ (#) An call is complete when the request has been sent, the reply has been
received and the final hard-ACK on the last packet of the reply has
reached the server.
- (*) An call may be aborted by either end at any time up to its completion.
+ (#) An call may be aborted by either end at any time up to its completion.
-=====================
-AF_RXRPC DRIVER MODEL
+AF_RXRPC Driver Model
=====================
About the AF_RXRPC driver:
- (*) The AF_RXRPC protocol transparently uses internal sockets of the transport
+ (#) The AF_RXRPC protocol transparently uses internal sockets of the transport
protocol to represent transport endpoints.
- (*) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC
+ (#) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC
connections are handled transparently. One client socket may be used to
make multiple simultaneous calls to the same service. One server socket
may handle calls from many clients.
- (*) Additional parallel client connections will be initiated to support extra
+ (#) Additional parallel client connections will be initiated to support extra
concurrent calls, up to a tunable limit.
- (*) Each connection is retained for a certain amount of time [tunable] after
+ (#) Each connection is retained for a certain amount of time [tunable] after
the last call currently using it has completed in case a new call is made
that could reuse it.
- (*) Each internal UDP socket is retained [tunable] for a certain amount of
+ (#) Each internal UDP socket is retained [tunable] for a certain amount of
time [tunable] after the last connection using it discarded, in case a new
connection is made that could use it.
- (*) A client-side connection is only shared between calls if they have have
+ (#) A client-side connection is only shared between calls if they have have
the same key struct describing their security (and assuming the calls
would otherwise share the connection). Non-secured calls would also be
able to share connections with each other.
- (*) A server-side connection is shared if the client says it is.
+ (#) A server-side connection is shared if the client says it is.
- (*) ACK'ing is handled by the protocol driver automatically, including ping
+ (#) ACK'ing is handled by the protocol driver automatically, including ping
replying.
- (*) SO_KEEPALIVE automatically pings the other side to keep the connection
+ (#) SO_KEEPALIVE automatically pings the other side to keep the connection
alive [TODO].
- (*) If an ICMP error is received, all calls affected by that error will be
+ (#) If an ICMP error is received, all calls affected by that error will be
aborted with an appropriate network error passed through recvmsg().
Interaction with the user of the RxRPC socket:
- (*) A socket is made into a server socket by binding an address with a
+ (#) A socket is made into a server socket by binding an address with a
non-zero service ID.
- (*) In the client, sending a request is achieved with one or more sendmsgs,
+ (#) In the client, sending a request is achieved with one or more sendmsgs,
followed by the reply being received with one or more recvmsgs.
- (*) The first sendmsg for a request to be sent from a client contains a tag to
+ (#) The first sendmsg for a request to be sent from a client contains a tag to
be used in all other sendmsgs or recvmsgs associated with that call. The
tag is carried in the control data.
- (*) connect() is used to supply a default destination address for a client
+ (#) connect() is used to supply a default destination address for a client
socket. This may be overridden by supplying an alternate address to the
first sendmsg() of a call (struct msghdr::msg_name).
- (*) If connect() is called on an unbound client, a random local port will
+ (#) If connect() is called on an unbound client, a random local port will
bound before the operation takes place.
- (*) A server socket may also be used to make client calls. To do this, the
+ (#) A server socket may also be used to make client calls. To do this, the
first sendmsg() of the call must specify the target address. The server's
transport endpoint is used to send the packets.
- (*) Once the application has received the last message associated with a call,
+ (#) Once the application has received the last message associated with a call,
the tag is guaranteed not to be seen again, and so it can be used to pin
client resources. A new call can then be initiated with the same tag
without fear of interference.
- (*) In the server, a request is received with one or more recvmsgs, then the
+ (#) In the server, a request is received with one or more recvmsgs, then the
the reply is transmitted with one or more sendmsgs, and then the final ACK
is received with a last recvmsg.
- (*) When sending data for a call, sendmsg is given MSG_MORE if there's more
+ (#) When sending data for a call, sendmsg is given MSG_MORE if there's more
data to come on that call.
- (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more
+ (#) When receiving data for a call, recvmsg flags MSG_MORE if there's more
data to come for that call.
- (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg
+ (#) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg
to indicate the terminal message for that call.
- (*) A call may be aborted by adding an abort control message to the control
+ (#) A call may be aborted by adding an abort control message to the control
data. Issuing an abort terminates the kernel's use of that call's tag.
Any messages waiting in the receive queue for that call will be discarded.
- (*) Aborts, busy notifications and challenge packets are delivered by recvmsg,
+ (#) Aborts, busy notifications and challenge packets are delivered by recvmsg,
and control data messages will be set to indicate the context. Receiving
an abort or a busy message terminates the kernel's use of that call's tag.
- (*) The control data part of the msghdr struct is used for a number of things:
+ (#) The control data part of the msghdr struct is used for a number of things:
- (*) The tag of the intended or affected call.
+ (#) The tag of the intended or affected call.
- (*) Sending or receiving errors, aborts and busy notifications.
+ (#) Sending or receiving errors, aborts and busy notifications.
- (*) Notifications of incoming calls.
+ (#) Notifications of incoming calls.
- (*) Sending debug requests and receiving debug replies [TODO].
+ (#) Sending debug requests and receiving debug replies [TODO].
- (*) When the kernel has received and set up an incoming call, it sends a
+ (#) When the kernel has received and set up an incoming call, it sends a
message to server application to let it know there's a new call awaiting
its acceptance [recvmsg reports a special control message]. The server
application then uses sendmsg to assign a tag to the new call. Once that
is done, the first part of the request data will be delivered by recvmsg.
- (*) The server application has to provide the server socket with a keyring of
+ (#) The server application has to provide the server socket with a keyring of
secret keys corresponding to the security types it permits. When a secure
connection is being set up, the kernel looks up the appropriate secret key
in the keyring and then sends a challenge packet to the client and
receives a response packet. The kernel then checks the authorisation of
the packet and either aborts the connection or sets up the security.
- (*) The name of the key a client will use to secure its communications is
+ (#) The name of the key a client will use to secure its communications is
nominated by a socket option.
Notes on sendmsg:
- (*) MSG_WAITALL can be set to tell sendmsg to ignore signals if the peer is
+ (#) MSG_WAITALL can be set to tell sendmsg to ignore signals if the peer is
making progress at accepting packets within a reasonable time such that we
manage to queue up all the data for transmission. This requires the
client to accept at least one packet per 2*RTT time period.
@@ -294,7 +293,7 @@ Notes on sendmsg:
Notes on recvmsg:
- (*) If there's a sequence of data messages belonging to a particular call on
+ (#) If there's a sequence of data messages belonging to a particular call on
the receive queue, then recvmsg will keep working through them until:
(a) it meets the end of that call's received data,
@@ -320,13 +319,13 @@ Notes on recvmsg:
flagged.
-================
-CONTROL MESSAGES
+Control Messages
================
AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex
calls, to invoke certain actions and to report certain conditions. These are:
+ ======================= === =========== ===============================
MESSAGE ID SRT DATA MEANING
======================= === =========== ===============================
RXRPC_USER_CALL_ID sr- User ID App's call specifier
@@ -340,10 +339,11 @@ calls, to invoke certain actions and to report certain conditions. These are:
RXRPC_EXCLUSIVE_CALL s-- n/a Make an exclusive client call
RXRPC_UPGRADE_SERVICE s-- n/a Client call can be upgraded
RXRPC_TX_LENGTH s-- data len Total length of Tx data
+ ======================= === =========== ===============================
(SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message)
- (*) RXRPC_USER_CALL_ID
+ (#) RXRPC_USER_CALL_ID
This is used to indicate the application's call ID. It's an unsigned long
that the app specifies in the client by attaching it to the first data
@@ -351,7 +351,7 @@ calls, to invoke certain actions and to report certain conditions. These are:
message. recvmsg() passes it in conjunction with all messages except
those of the RXRPC_NEW_CALL message.
- (*) RXRPC_ABORT
+ (#) RXRPC_ABORT
This is can be used by an application to abort a call by passing it to
sendmsg, or it can be delivered by recvmsg to indicate a remote abort was
@@ -359,13 +359,13 @@ calls, to invoke certain actions and to report certain conditions. These are:
specify the call affected. If an abort is being sent, then error EBADSLT
will be returned if there is no call with that user ID.
- (*) RXRPC_ACK
+ (#) RXRPC_ACK
This is delivered to a server application to indicate that the final ACK
of a call was received from the client. It will be associated with an
RXRPC_USER_CALL_ID to indicate the call that's now complete.
- (*) RXRPC_NET_ERROR
+ (#) RXRPC_NET_ERROR
This is delivered to an application to indicate that an ICMP error message
was encountered in the process of trying to talk to the peer. An
@@ -373,13 +373,13 @@ calls, to invoke certain actions and to report certain conditions. These are:
indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
affected.
- (*) RXRPC_BUSY
+ (#) RXRPC_BUSY
This is delivered to a client application to indicate that a call was
rejected by the server due to the server being busy. It will be
associated with an RXRPC_USER_CALL_ID to indicate the rejected call.
- (*) RXRPC_LOCAL_ERROR
+ (#) RXRPC_LOCAL_ERROR
This is delivered to an application to indicate that a local error was
encountered and that a call has been aborted because of it. An
@@ -387,13 +387,13 @@ calls, to invoke certain actions and to report certain conditions. These are:
indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
affected.
- (*) RXRPC_NEW_CALL
+ (#) RXRPC_NEW_CALL
This is delivered to indicate to a server application that a new call has
arrived and is awaiting acceptance. No user ID is associated with this,
as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT.
- (*) RXRPC_ACCEPT
+ (#) RXRPC_ACCEPT
This is used by a server application to attempt to accept a call and
assign it a user ID. It should be associated with an RXRPC_USER_CALL_ID
@@ -402,12 +402,12 @@ calls, to invoke certain actions and to report certain conditions. These are:
return error ENODATA. If the user ID is already in use by another call,
then error EBADSLT will be returned.
- (*) RXRPC_EXCLUSIVE_CALL
+ (#) RXRPC_EXCLUSIVE_CALL
This is used to indicate that a client call should be made on a one-off
connection. The connection is discarded once the call has terminated.
- (*) RXRPC_UPGRADE_SERVICE
+ (#) RXRPC_UPGRADE_SERVICE
This is used to make a client call to probe if the specified service ID
may be upgraded by the server. The caller must check msg_name returned to
@@ -419,7 +419,7 @@ calls, to invoke certain actions and to report certain conditions. These are:
future communication to that server and RXRPC_UPGRADE_SERVICE should no
longer be set.
- (*) RXRPC_TX_LENGTH
+ (#) RXRPC_TX_LENGTH
This is used to inform the kernel of the total amount of data that is
going to be transmitted by a call (whether in a client request or a
@@ -443,7 +443,7 @@ SOCKET OPTIONS
AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
- (*) RXRPC_SECURITY_KEY
+ (#) RXRPC_SECURITY_KEY
This is used to specify the description of the key to be used. The key is
extracted from the calling process's keyrings with request_key() and
@@ -452,17 +452,17 @@ AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
The optval pointer points to the description string, and optlen indicates
how long the string is, without the NUL terminator.
- (*) RXRPC_SECURITY_KEYRING
+ (#) RXRPC_SECURITY_KEYRING
Similar to above but specifies a keyring of server secret keys to use (key
type "keyring"). See the "Security" section.
- (*) RXRPC_EXCLUSIVE_CONNECTION
+ (#) RXRPC_EXCLUSIVE_CONNECTION
This is used to request that new connections should be used for each call
made subsequently on this socket. optval should be NULL and optlen 0.
- (*) RXRPC_MIN_SECURITY_LEVEL
+ (#) RXRPC_MIN_SECURITY_LEVEL
This is used to specify the minimum security level required for calls on
this socket. optval must point to an int containing one of the following
@@ -477,19 +477,19 @@ AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
Encrypted checksum plus packet padded and first eight bytes of packet
encrypted - which includes the actual packet length.
- (c) RXRPC_SECURITY_ENCRYPTED
+ (c) RXRPC_SECURITY_ENCRYPT
Encrypted checksum plus entire packet padded and encrypted, including
actual packet length.
- (*) RXRPC_UPGRADEABLE_SERVICE
+ (#) RXRPC_UPGRADEABLE_SERVICE
This is used to indicate that a service socket with two bindings may
upgrade one bound service to the other if requested by the client. optval
must point to an array of two unsigned short ints. The first is the
service ID to upgrade from and the second the service ID to upgrade to.
- (*) RXRPC_SUPPORTED_CMSG
+ (#) RXRPC_SUPPORTED_CMSG
This is a read-only option that writes an int into the buffer indicating
the highest control message type supported.
@@ -509,7 +509,7 @@ found at:
http://people.redhat.com/~dhowells/rxrpc/klog.c
The payload provided to add_key() on the client should be of the following
-form:
+form::
struct rxrpc_key_sec2_v1 {
uint16_t security_index; /* 2 */
@@ -546,14 +546,14 @@ EXAMPLE CLIENT USAGE
A client would issue an operation by:
- (1) An RxRPC socket is set up by:
+ (1) An RxRPC socket is set up by::
client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
Where the third parameter indicates the protocol family of the transport
socket used - usually IPv4 but it can also be IPv6 [TODO].
- (2) A local address can optionally be bound:
+ (2) A local address can optionally be bound::
struct sockaddr_rxrpc srx = {
.srx_family = AF_RXRPC,
@@ -570,20 +570,20 @@ A client would issue an operation by:
several unrelated RxRPC sockets. Security is handled on a basis of
per-RxRPC virtual connection.
- (3) The security is set:
+ (3) The security is set::
const char *key = "AFS:cambridge.redhat.com";
setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));
This issues a request_key() to get the key representing the security
- context. The minimum security level can be set:
+ context. The minimum security level can be set::
- unsigned int sec = RXRPC_SECURITY_ENCRYPTED;
+ unsigned int sec = RXRPC_SECURITY_ENCRYPT;
setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
&sec, sizeof(sec));
(4) The server to be contacted can then be specified (alternatively this can
- be done through sendmsg):
+ be done through sendmsg)::
struct sockaddr_rxrpc srx = {
.srx_family = AF_RXRPC,
@@ -598,7 +598,9 @@ A client would issue an operation by:
(5) The request data should then be posted to the server socket using a series
of sendmsg() calls, each with the following control message attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
+ ================== ===================================
+ RXRPC_USER_CALL_ID specifies the user ID for this call
+ ================== ===================================
MSG_MORE should be set in msghdr::msg_flags on all but the last part of
the request. Multiple requests may be made simultaneously.
@@ -635,13 +637,12 @@ any more calls (further calls to the same destination will be blocked until the
probe is concluded).
-====================
-EXAMPLE SERVER USAGE
+Example Server Usage
====================
A server would be set up to accept operations in the following manner:
- (1) An RxRPC socket is created by:
+ (1) An RxRPC socket is created by::
server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
@@ -649,7 +650,7 @@ A server would be set up to accept operations in the following manner:
socket used - usually IPv4.
(2) Security is set up if desired by giving the socket a keyring with server
- secret keys in it:
+ secret keys in it::
keyring = add_key("keyring", "AFSkeys", NULL, 0,
KEY_SPEC_PROCESS_KEYRING);
@@ -663,7 +664,7 @@ A server would be set up to accept operations in the following manner:
The keyring can be manipulated after it has been given to the socket. This
permits the server to add more keys, replace keys, etc. while it is live.
- (3) A local address must then be bound:
+ (3) A local address must then be bound::
struct sockaddr_rxrpc srx = {
.srx_family = AF_RXRPC,
@@ -680,7 +681,7 @@ A server would be set up to accept operations in the following manner:
should be called twice.
(4) If service upgrading is required, first two service IDs must have been
- bound and then the following option must be set:
+ bound and then the following option must be set::
unsigned short service_ids[2] = { from_ID, to_ID };
setsockopt(server, SOL_RXRPC, RXRPC_UPGRADEABLE_SERVICE,
@@ -690,14 +691,14 @@ A server would be set up to accept operations in the following manner:
to_ID if they request it. This will be reflected in msg_name obtained
through recvmsg() when the request data is delivered to userspace.
- (5) The server is then set to listen out for incoming calls:
+ (5) The server is then set to listen out for incoming calls::
listen(server, 100);
(6) The kernel notifies the server of pending incoming connections by sending
it a message for each. This is received with recvmsg() on the server
socket. It has no data, and has a single dataless control message
- attached:
+ attached::
RXRPC_NEW_CALL
@@ -709,8 +710,10 @@ A server would be set up to accept operations in the following manner:
(7) The server then accepts the new call by issuing a sendmsg() with two
pieces of control data and no actual data:
- RXRPC_ACCEPT - indicate connection acceptance
- RXRPC_USER_CALL_ID - specify user ID for this call
+ ================== ==============================
+ RXRPC_ACCEPT indicate connection acceptance
+ RXRPC_USER_CALL_ID specify user ID for this call
+ ================== ==============================
(8) The first request data packet will then be posted to the server socket for
recvmsg() to pick up. At that point, the RxRPC address for the call can
@@ -722,12 +725,17 @@ A server would be set up to accept operations in the following manner:
All data will be delivered with the following control message attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
+
+ ================== ===================================
+ RXRPC_USER_CALL_ID specifies the user ID for this call
+ ================== ===================================
(9) The reply data should then be posted to the server socket using a series
of sendmsg() calls, each with the following control messages attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
+ ================== ===================================
+ RXRPC_USER_CALL_ID specifies the user ID for this call
+ ================== ===================================
MSG_MORE should be set in msghdr::msg_flags on all but the last message
for a particular call.
@@ -736,8 +744,10 @@ A server would be set up to accept operations in the following manner:
when it is received. It will take the form of a dataless message with two
control messages attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
- RXRPC_ACK - indicates final ACK (no data)
+ ================== ===================================
+ RXRPC_USER_CALL_ID specifies the user ID for this call
+ RXRPC_ACK indicates final ACK (no data)
+ ================== ===================================
MSG_EOR will be flagged to indicate that this is the final message for
this call.
@@ -746,8 +756,10 @@ A server would be set up to accept operations in the following manner:
aborted by calling sendmsg() with a dataless message with the following
control messages attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
- RXRPC_ABORT - indicates abort code (4 byte data)
+ ================== ===================================
+ RXRPC_USER_CALL_ID specifies the user ID for this call
+ RXRPC_ABORT indicates abort code (4 byte data)
+ ================== ===================================
Any packets waiting in the socket's receive queue will be discarded if
this is issued.
@@ -757,8 +769,7 @@ the one server socket, using control messages on sendmsg() and recvmsg() to
determine the call affected.
-=========================
-AF_RXRPC KERNEL INTERFACE
+AF_RXRPC Kernel Interface
=========================
The AF_RXRPC module also provides an interface for use by in-kernel utilities
@@ -786,7 +797,7 @@ then it passes this to the kernel interface functions.
The kernel interface functions are as follows:
- (*) Begin a new client call.
+ (#) Begin a new client call::
struct rxrpc_call *
rxrpc_kernel_begin_call(struct socket *sock,
@@ -837,7 +848,7 @@ The kernel interface functions are as follows:
returned. The caller now holds a reference on this and it must be
properly ended.
- (*) End a client call.
+ (#) End a client call::
void rxrpc_kernel_end_call(struct socket *sock,
struct rxrpc_call *call);
@@ -846,7 +857,7 @@ The kernel interface functions are as follows:
from AF_RXRPC's knowledge and will not be seen again in association with
the specified call.
- (*) Send data through a call.
+ (#) Send data through a call::
typedef void (*rxrpc_notify_end_tx_t)(struct sock *sk,
unsigned long user_call_ID,
@@ -872,7 +883,7 @@ The kernel interface functions are as follows:
called with the call-state spinlock held to prevent any reply or final ACK
from being delivered first.
- (*) Receive data from a call.
+ (#) Receive data from a call::
int rxrpc_kernel_recv_data(struct socket *sock,
struct rxrpc_call *call,
@@ -902,12 +913,14 @@ The kernel interface functions are as follows:
more data was available, EMSGSIZE is returned.
If a remote ABORT is detected, the abort code received will be stored in
- *_abort and ECONNABORTED will be returned.
+ ``*_abort`` and ECONNABORTED will be returned.
The service ID that the call ended up with is returned into *_service.
This can be used to see if a call got a service upgrade.
- (*) Abort a call.
+ (#) Abort a call??
+
+ ::
void rxrpc_kernel_abort_call(struct socket *sock,
struct rxrpc_call *call,
@@ -916,7 +929,7 @@ The kernel interface functions are as follows:
This is used to abort a call if it's still in an abortable state. The
abort code specified will be placed in the ABORT message sent.
- (*) Intercept received RxRPC messages.
+ (#) Intercept received RxRPC messages::
typedef void (*rxrpc_interceptor_t)(struct sock *sk,
unsigned long user_call_ID,
@@ -937,7 +950,8 @@ The kernel interface functions are as follows:
The skb->mark field indicates the type of message:
- MARK MEANING
+ =============================== =======================================
+ Mark Meaning
=============================== =======================================
RXRPC_SKB_MARK_DATA Data message
RXRPC_SKB_MARK_FINAL_ACK Final ACK received for an incoming call
@@ -946,6 +960,7 @@ The kernel interface functions are as follows:
RXRPC_SKB_MARK_NET_ERROR Network error detected
RXRPC_SKB_MARK_LOCAL_ERROR Local error encountered
RXRPC_SKB_MARK_NEW_CALL New incoming call awaiting acceptance
+ =============================== =======================================
The remote abort message can be probed with rxrpc_kernel_get_abort_code().
The two error messages can be probed with rxrpc_kernel_get_error_number().
@@ -961,7 +976,7 @@ The kernel interface functions are as follows:
is possible to get extra refs on all types of message for later freeing,
but this may pin the state of a call until the message is finally freed.
- (*) Accept an incoming call.
+ (#) Accept an incoming call::
struct rxrpc_call *
rxrpc_kernel_accept_call(struct socket *sock,
@@ -975,7 +990,7 @@ The kernel interface functions are as follows:
returned. The caller now holds a reference on this and it must be
properly ended.
- (*) Reject an incoming call.
+ (#) Reject an incoming call::
int rxrpc_kernel_reject_call(struct socket *sock);
@@ -984,21 +999,21 @@ The kernel interface functions are as follows:
Other errors may be returned if the call had been aborted (-ECONNABORTED)
or had timed out (-ETIME).
- (*) Allocate a null key for doing anonymous security.
+ (#) Allocate a null key for doing anonymous security::
struct key *rxrpc_get_null_key(const char *keyname);
This is used to allocate a null RxRPC key that can be used to indicate
anonymous security for a particular domain.
- (*) Get the peer address of a call.
+ (#) Get the peer address of a call::
void rxrpc_kernel_get_peer(struct socket *sock, struct rxrpc_call *call,
struct sockaddr_rxrpc *_srx);
This is used to find the remote peer address of a call.
- (*) Set the total transmit data size on a call.
+ (#) Set the total transmit data size on a call::
void rxrpc_kernel_set_tx_length(struct socket *sock,
struct rxrpc_call *call,
@@ -1009,14 +1024,14 @@ The kernel interface functions are as follows:
size should be set when the call is begun. tx_total_len may not be less
than zero.
- (*) Get call RTT.
+ (#) Get call RTT::
u64 rxrpc_kernel_get_rtt(struct socket *sock, struct rxrpc_call *call);
Get the RTT time to the peer in use by a call. The value returned is in
nanoseconds.
- (*) Check call still alive.
+ (#) Check call still alive::
bool rxrpc_kernel_check_life(struct socket *sock,
struct rxrpc_call *call,
@@ -1024,7 +1039,7 @@ The kernel interface functions are as follows:
void rxrpc_kernel_probe_life(struct socket *sock,
struct rxrpc_call *call);
- The first function passes back in *_life a number that is updated when
+ The first function passes back in ``*_life`` a number that is updated when
ACKs are received from the peer (notably including PING RESPONSE ACKs
which we can elicit by sending PING ACKs to see if the call still exists
on the server). The caller should compare the numbers of two calls to see
@@ -1040,7 +1055,7 @@ The kernel interface functions are as follows:
first function to change. Note that this must be called in TASK_RUNNING
state.
- (*) Get reply timestamp.
+ (#) Get reply timestamp::
bool rxrpc_kernel_get_reply_time(struct socket *sock,
struct rxrpc_call *call,
@@ -1048,10 +1063,10 @@ The kernel interface functions are as follows:
This allows the timestamp on the first DATA packet of the reply of a
client call to be queried, provided that it is still in the Rx ring. If
- successful, the timestamp will be stored into *_ts and true will be
+ successful, the timestamp will be stored into ``*_ts`` and true will be
returned; false will be returned otherwise.
- (*) Get remote client epoch.
+ (#) Get remote client epoch::
u32 rxrpc_kernel_get_epoch(struct socket *sock,
struct rxrpc_call *call)
@@ -1065,7 +1080,7 @@ The kernel interface functions are as follows:
This value can be used to determine if the remote client has been
restarted as it shouldn't change otherwise.
- (*) Set the maxmimum lifespan on a call.
+ (#) Set the maxmimum lifespan on a call::
void rxrpc_kernel_set_max_life(struct socket *sock,
struct rxrpc_call *call,
@@ -1075,15 +1090,23 @@ The kernel interface functions are as follows:
jiffies). In the event of the timeout occurring, the call will be
aborted and -ETIME or -ETIMEDOUT will be returned.
+ (#) Apply the RXRPC_MIN_SECURITY_LEVEL sockopt to a socket from within in the
+ kernel::
-=======================
-CONFIGURABLE PARAMETERS
+ int rxrpc_sock_set_min_security_level(struct sock *sk,
+ unsigned int val);
+
+ This specifies the minimum security level required for calls on this
+ socket.
+
+
+Configurable Parameters
=======================
The RxRPC protocol driver has a number of configurable parameters that can be
adjusted through sysctls in /proc/net/rxrpc/:
- (*) req_ack_delay
+ (#) req_ack_delay
The amount of time in milliseconds after receiving a packet with the
request-ack flag set before we honour the flag and actually send the
@@ -1093,60 +1116,60 @@ adjusted through sysctls in /proc/net/rxrpc/:
reception window is full (to a maximum of 255 packets), so delaying the
ACK permits several packets to be ACK'd in one go.
- (*) soft_ack_delay
+ (#) soft_ack_delay
The amount of time in milliseconds after receiving a new packet before we
generate a soft-ACK to tell the sender that it doesn't need to resend.
- (*) idle_ack_delay
+ (#) idle_ack_delay
The amount of time in milliseconds after all the packets currently in the
received queue have been consumed before we generate a hard-ACK to tell
the sender it can free its buffers, assuming no other reason occurs that
we would send an ACK.
- (*) resend_timeout
+ (#) resend_timeout
The amount of time in milliseconds after transmitting a packet before we
transmit it again, assuming no ACK is received from the receiver telling
us they got it.
- (*) max_call_lifetime
+ (#) max_call_lifetime
The maximum amount of time in seconds that a call may be in progress
before we preemptively kill it.
- (*) dead_call_expiry
+ (#) dead_call_expiry
The amount of time in seconds before we remove a dead call from the call
list. Dead calls are kept around for a little while for the purpose of
repeating ACK and ABORT packets.
- (*) connection_expiry
+ (#) connection_expiry
The amount of time in seconds after a connection was last used before we
remove it from the connection list. While a connection is in existence,
it serves as a placeholder for negotiated security; when it is deleted,
the security must be renegotiated.
- (*) transport_expiry
+ (#) transport_expiry
The amount of time in seconds after a transport was last used before we
remove it from the transport list. While a transport is in existence, it
serves to anchor the peer data and keeps the connection ID counter.
- (*) rxrpc_rx_window_size
+ (#) rxrpc_rx_window_size
The size of the receive window in packets. This is the maximum number of
unconsumed received packets we're willing to hold in memory for any
particular call.
- (*) rxrpc_rx_mtu
+ (#) rxrpc_rx_mtu
The maximum packet MTU size that we're willing to receive in bytes. This
indicates to the peer whether we're willing to accept jumbo packets.
- (*) rxrpc_rx_jumbo_max
+ (#) rxrpc_rx_jumbo_max
The maximum number of packets that we're willing to accept in a jumbo
packet. Non-terminal packets in a jumbo packet must contain a four byte
diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index f78d7bf27ff5..8f0347b9fb3d 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -81,7 +81,7 @@ of queues to IRQs can be determined from /proc/interrupts. By default,
an IRQ may be handled on any CPU. Because a non-negligible part of packet
processing takes place in receive interrupt handling, it is advantageous
to spread receive interrupts between CPUs. To manually adjust the IRQ
-affinity of each interrupt see Documentation/IRQ-affinity.txt. Some systems
+affinity of each interrupt see Documentation/core-api/irq/irq-affinity.rst. Some systems
will be running irqbalance, a daemon that dynamically optimizes IRQ
assignments and as a result may override any manual settings.
@@ -160,7 +160,7 @@ can be configured for each receive queue using a sysfs file entry::
This file implements a bitmap of CPUs. RPS is disabled when it is zero
(the default), in which case packets are processed on the interrupting
-CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to
+CPU. Documentation/core-api/irq/irq-affinity.rst explains how CPUs are assigned to
the bitmap.
diff --git a/Documentation/networking/sctp.txt b/Documentation/networking/sctp.rst
index 97b810ca9082..9f4d9c8a925b 100644
--- a/Documentation/networking/sctp.txt
+++ b/Documentation/networking/sctp.rst
@@ -1,35 +1,42 @@
-Linux Kernel SCTP
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Linux Kernel SCTP
+=================
This is the current BETA release of the Linux Kernel SCTP reference
-implementation.
+implementation.
SCTP (Stream Control Transmission Protocol) is a IP based, message oriented,
reliable transport protocol, with congestion control, support for
transparent multi-homing, and multiple ordered streams of messages.
RFC2960 defines the core protocol. The IETF SIGTRAN working group originally
-developed the SCTP protocol and later handed the protocol over to the
-Transport Area (TSVWG) working group for the continued evolvement of SCTP as a
-general purpose transport.
+developed the SCTP protocol and later handed the protocol over to the
+Transport Area (TSVWG) working group for the continued evolvement of SCTP as a
+general purpose transport.
-See the IETF website (http://www.ietf.org) for further documents on SCTP.
-See http://www.ietf.org/rfc/rfc2960.txt
+See the IETF website (http://www.ietf.org) for further documents on SCTP.
+See http://www.ietf.org/rfc/rfc2960.txt
The initial project goal is to create an Linux kernel reference implementation
-of SCTP that is RFC 2960 compliant and provides an programming interface
-referred to as the UDP-style API of the Sockets Extensions for SCTP, as
-proposed in IETF Internet-Drafts.
+of SCTP that is RFC 2960 compliant and provides an programming interface
+referred to as the UDP-style API of the Sockets Extensions for SCTP, as
+proposed in IETF Internet-Drafts.
-Caveats:
+Caveats
+=======
--lksctp can be built as statically or as a module. However, be aware that
-module removal of lksctp is not yet a safe activity.
+- lksctp can be built as statically or as a module. However, be aware that
+ module removal of lksctp is not yet a safe activity.
--There is tentative support for IPv6, but most work has gone towards
-implementation and testing lksctp on IPv4.
+- There is tentative support for IPv6, but most work has gone towards
+ implementation and testing lksctp on IPv4.
For more information, please visit the lksctp project website:
+
http://www.sf.net/projects/lksctp
Or contact the lksctp developers through the mailing list:
+
<linux-sctp@vger.kernel.org>
diff --git a/Documentation/networking/secid.txt b/Documentation/networking/secid.rst
index 95ea06784333..b45141a98027 100644
--- a/Documentation/networking/secid.txt
+++ b/Documentation/networking/secid.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+LSM/SeLinux secid
+=================
+
flowi structure:
The secid member in the flow structure is used in LSMs (e.g. SELinux) to indicate
diff --git a/Documentation/networking/seg6-sysctl.rst b/Documentation/networking/seg6-sysctl.rst
new file mode 100644
index 000000000000..ec73e1445030
--- /dev/null
+++ b/Documentation/networking/seg6-sysctl.rst
@@ -0,0 +1,26 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Seg6 Sysfs variables
+====================
+
+
+/proc/sys/net/conf/<iface>/seg6_* variables:
+============================================
+
+seg6_enabled - BOOL
+ Accept or drop SR-enabled IPv6 packets on this interface.
+
+ Relevant packets are those with SRH present and DA = local.
+
+ * 0 - disabled (default)
+ * not 0 - enabled
+
+seg6_require_hmac - INTEGER
+ Define HMAC policy for ingress SR-enabled packets on this interface.
+
+ * -1 - Ignore HMAC field
+ * 0 - Accept SR packets without HMAC, validate SR packets with HMAC
+ * 1 - Drop SR packets without HMAC, validate SR packets with HMAC
+
+ Default is 0.
diff --git a/Documentation/networking/seg6-sysctl.txt b/Documentation/networking/seg6-sysctl.txt
deleted file mode 100644
index bdbde23b19cb..000000000000
--- a/Documentation/networking/seg6-sysctl.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-/proc/sys/net/conf/<iface>/seg6_* variables:
-
-seg6_enabled - BOOL
- Accept or drop SR-enabled IPv6 packets on this interface.
-
- Relevant packets are those with SRH present and DA = local.
-
- 0 - disabled (default)
- not 0 - enabled
-
-seg6_require_hmac - INTEGER
- Define HMAC policy for ingress SR-enabled packets on this interface.
-
- -1 - Ignore HMAC field
- 0 - Accept SR packets without HMAC, validate SR packets with HMAC
- 1 - Drop SR packets without HMAC, validate SR packets with HMAC
-
- Default is 0.
diff --git a/Documentation/networking/skfp.txt b/Documentation/networking/skfp.rst
index 203ec66c9fb4..58f548105c1d 100644
--- a/Documentation/networking/skfp.txt
+++ b/Documentation/networking/skfp.rst
@@ -1,35 +1,41 @@
-(C)Copyright 1998-2000 SysKonnect,
-===========================================================================
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+========================
+SysKonnect driver - SKFP
+========================
+
+|copy| Copyright 1998-2000 SysKonnect,
skfp.txt created 11-May-2000
Readme File for skfp.o v2.06
-This file contains
-(1) OVERVIEW
-(2) SUPPORTED ADAPTERS
-(3) GENERAL INFORMATION
-(4) INSTALLATION
-(5) INCLUSION OF THE ADAPTER IN SYSTEM START
-(6) TROUBLESHOOTING
-(7) FUNCTION OF THE ADAPTER LEDS
-(8) HISTORY
+.. This file contains
-===========================================================================
+ (1) OVERVIEW
+ (2) SUPPORTED ADAPTERS
+ (3) GENERAL INFORMATION
+ (4) INSTALLATION
+ (5) INCLUSION OF THE ADAPTER IN SYSTEM START
+ (6) TROUBLESHOOTING
+ (7) FUNCTION OF THE ADAPTER LEDS
+ (8) HISTORY
-
-(1) OVERVIEW
-============
+1. Overview
+===========
This README explains how to use the driver 'skfp' for Linux with your
network adapter.
Chapter 2: Contains a list of all network adapters that are supported by
- this driver.
+this driver.
-Chapter 3: Gives some general information.
+Chapter 3:
+ Gives some general information.
Chapter 4: Describes common problems and solutions.
@@ -37,14 +43,13 @@ Chapter 5: Shows the changed functionality of the adapter LEDs.
Chapter 6: History of development.
-***
-
-(2) SUPPORTED ADAPTERS
-======================
+2. Supported adapters
+=====================
The network driver 'skfp' supports the following network adapters:
SysKonnect adapters:
+
- SK-5521 (SK-NET FDDI-UP)
- SK-5522 (SK-NET FDDI-UP DAS)
- SK-5541 (SK-NET FDDI-FP)
@@ -55,157 +60,187 @@ SysKonnect adapters:
- SK-5841 (SK-NET FDDI-FP64)
- SK-5843 (SK-NET FDDI-LP64)
- SK-5844 (SK-NET FDDI-LP64 DAS)
+
Compaq adapters (not tested):
+
- Netelligent 100 FDDI DAS Fibre SC
- Netelligent 100 FDDI SAS Fibre SC
- Netelligent 100 FDDI DAS UTP
- Netelligent 100 FDDI SAS UTP
- Netelligent 100 FDDI SAS Fibre MIC
-***
-(3) GENERAL INFORMATION
-=======================
+3. General Information
+======================
From v2.01 on, the driver is integrated in the linux kernel sources.
Therefore, the installation is the same as for any other adapter
supported by the kernel.
+
Refer to the manual of your distribution about the installation
of network adapters.
-Makes my life much easier :-)
-***
+Makes my life much easier :-)
-(4) TROUBLESHOOTING
-===================
+4. Troubleshooting
+==================
If you run into problems during installation, check those items:
-Problem: The FDDI adapter cannot be found by the driver.
-Reason: Look in /proc/pci for the following entry:
- 'FDDI network controller: SysKonnect SK-FDDI-PCI ...'
+Problem:
+ The FDDI adapter cannot be found by the driver.
+
+Reason:
+ Look in /proc/pci for the following entry:
+
+ 'FDDI network controller: SysKonnect SK-FDDI-PCI ...'
+
If this entry exists, then the FDDI adapter has been
found by the system and should be able to be used.
+
If this entry does not exist or if the file '/proc/pci'
is not there, then you may have a hardware problem or PCI
support may not be enabled in your kernel.
+
The adapter can be checked using the diagnostic program
which is available from the SysKonnect web site:
+
www.syskonnect.de
+
Some COMPAQ machines have a problem with PCI under
Linux. This is described in the 'PCI howto' document
(included in some distributions or available from the
www, e.g. at 'www.linux.org') and no workaround is available.
-Problem: You want to use your computer as a router between
- multiple IP subnetworks (using multiple adapters), but
+Problem:
+ You want to use your computer as a router between
+ multiple IP subnetworks (using multiple adapters), but
you cannot reach computers in other subnetworks.
-Reason: Either the router's kernel is not configured for IP
+
+Reason:
+ Either the router's kernel is not configured for IP
forwarding or there is a problem with the routing table
and gateway configuration in at least one of the
computers.
If your problem is not listed here, please contact our
-technical support for help.
-You can send email to:
- linux@syskonnect.de
+technical support for help.
+
+You can send email to: linux@syskonnect.de
+
When contacting our technical support,
please ensure that the following information is available:
+
- System Manufacturer and Model
- Boards in your system
- Distribution
- Kernel version
-***
-
-
-(5) FUNCTION OF THE ADAPTER LEDS
-================================
- The functionality of the LED's on the FDDI network adapters was
- changed in SMT version v2.82. With this new SMT version, the yellow
- LED works as a ring operational indicator. An active yellow LED
- indicates that the ring is down. The green LED on the adapter now
- works as a link indicator where an active GREEN LED indicates that
- the respective port has a physical connection.
+5. Function of the Adapter LEDs
+===============================
- With versions of SMT prior to v2.82 a ring up was indicated if the
- yellow LED was off while the green LED(s) showed the connection
- status of the adapter. During a ring down the green LED was off and
- the yellow LED was on.
+ The functionality of the LED's on the FDDI network adapters was
+ changed in SMT version v2.82. With this new SMT version, the yellow
+ LED works as a ring operational indicator. An active yellow LED
+ indicates that the ring is down. The green LED on the adapter now
+ works as a link indicator where an active GREEN LED indicates that
+ the respective port has a physical connection.
- All implementations indicate that a driver is not loaded if
- all LEDs are off.
+ With versions of SMT prior to v2.82 a ring up was indicated if the
+ yellow LED was off while the green LED(s) showed the connection
+ status of the adapter. During a ring down the green LED was off and
+ the yellow LED was on.
-***
+ All implementations indicate that a driver is not loaded if
+ all LEDs are off.
-(6) HISTORY
-===========
+6. History
+==========
v2.06 (20000511) (In-Kernel version)
New features:
+
- 64 bit support
- new pci dma interface
- in kernel 2.3.99
v2.05 (20000217) (In-Kernel version)
New features:
+
- Changes for 2.3.45 kernel
v2.04 (20000207) (Standalone version)
New features:
+
- Added rx/tx byte counter
v2.03 (20000111) (Standalone version)
Problems fixed:
+
- Fixed printk statements from v2.02
v2.02 (991215) (Standalone version)
Problems fixed:
+
- Removed unnecessary output
- Fixed path for "printver.sh" in makefile
v2.01 (991122) (In-Kernel version)
New features:
+
- Integration in Linux kernel sources
- Support for memory mapped I/O.
v2.00 (991112)
New features:
+
- Full source released under GPL
v1.05 (991023)
Problems fixed:
+
- Compilation with kernel version 2.2.13 failed
v1.04 (990427)
Changes:
+
- New SMT module included, changing LED functionality
+
Problems fixed:
+
- Synchronization on SMP machines was buggy
v1.03 (990325)
Problems fixed:
+
- Interrupt routing on SMP machines could be incorrect
v1.02 (990310)
New features:
+
- Support for kernel versions 2.2.x added
- Kernel patch instead of private duplicate of kernel functions
v1.01 (980812)
Problems fixed:
+
Connection hangup with telnet
Slow telnet connection
v1.00 beta 01 (980507)
New features:
+
None.
+
Problems fixed:
+
None.
+
Known limitations:
- - tar archive instead of standard package format (rpm).
+
+ - tar archive instead of standard package format (rpm).
- FDDI statistic is empty.
- not tested with 2.1.xx kernels
- integration in kernel not tested
@@ -216,5 +251,3 @@ v1.00 beta 01 (980507)
- does not work on some COMPAQ machines. See the PCI howto
document for details about this problem.
- data corruption with kernel versions below 2.0.33.
-
-*** End of information file ***
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
index 10e11099e74a..4edd0d38779e 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -792,7 +792,7 @@ counters to indicate the ACK is skipped in which scenario. The ACK
would only be skipped if the received packet is either a SYN packet or
it has no data.
-.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
+.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst
* TcpExtTCPACKSkippedSynRecv
diff --git a/Documentation/networking/strparser.txt b/Documentation/networking/strparser.rst
index a7d354ddda7b..6cab1f74ae05 100644
--- a/Documentation/networking/strparser.txt
+++ b/Documentation/networking/strparser.rst
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
Stream Parser (strparser)
+=========================
Introduction
============
@@ -34,8 +38,10 @@ that is called when a full message has been completed.
Functions
=========
-strp_init(struct strparser *strp, struct sock *sk,
- const struct strp_callbacks *cb)
+ ::
+
+ strp_init(struct strparser *strp, struct sock *sk,
+ const struct strp_callbacks *cb)
Called to initialize a stream parser. strp is a struct of type
strparser that is allocated by the upper layer. sk is the TCP
@@ -43,31 +49,41 @@ strp_init(struct strparser *strp, struct sock *sk,
callback mode; in general mode this is set to NULL. Callbacks
are called by the stream parser (the callbacks are listed below).
-void strp_pause(struct strparser *strp)
+ ::
+
+ void strp_pause(struct strparser *strp)
Temporarily pause a stream parser. Message parsing is suspended
and no new messages are delivered to the upper layer.
-void strp_unpause(struct strparser *strp)
+ ::
+
+ void strp_unpause(struct strparser *strp)
Unpause a paused stream parser.
-void strp_stop(struct strparser *strp);
+ ::
+
+ void strp_stop(struct strparser *strp);
strp_stop is called to completely stop stream parser operations.
This is called internally when the stream parser encounters an
error, and it is called from the upper layer to stop parsing
operations.
-void strp_done(struct strparser *strp);
+ ::
+
+ void strp_done(struct strparser *strp);
strp_done is called to release any resources held by the stream
parser instance. This must be called after the stream processor
has been stopped.
-int strp_process(struct strparser *strp, struct sk_buff *orig_skb,
- unsigned int orig_offset, size_t orig_len,
- size_t max_msg_size, long timeo)
+ ::
+
+ int strp_process(struct strparser *strp, struct sk_buff *orig_skb,
+ unsigned int orig_offset, size_t orig_len,
+ size_t max_msg_size, long timeo)
strp_process is called in general mode for a stream parser to
parse an sk_buff. The number of bytes processed or a negative
@@ -75,7 +91,9 @@ int strp_process(struct strparser *strp, struct sk_buff *orig_skb,
consume the sk_buff. max_msg_size is maximum size the stream
parser will parse. timeo is timeout for completing a message.
-void strp_data_ready(struct strparser *strp);
+ ::
+
+ void strp_data_ready(struct strparser *strp);
The upper layer calls strp_tcp_data_ready when data is ready on
the lower socket for strparser to process. This should be called
@@ -83,7 +101,9 @@ void strp_data_ready(struct strparser *strp);
maximum messages size is the limit of the receive socket
buffer and message timeout is the receive timeout for the socket.
-void strp_check_rcv(struct strparser *strp);
+ ::
+
+ void strp_check_rcv(struct strparser *strp);
strp_check_rcv is called to check for new messages on the socket.
This is normally called at initialization of a stream parser
@@ -94,7 +114,9 @@ Callbacks
There are six callbacks:
-int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
+ ::
+
+ int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
parse_msg is called to determine the length of the next message
in the stream. The upper layer must implement this function. It
@@ -107,14 +129,16 @@ int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
The return values of this function are:
- >0 : indicates length of successfully parsed message
- 0 : indicates more data must be received to parse the message
- -ESTRPIPE : current message should not be processed by the
- kernel, return control of the socket to userspace which
- can proceed to read the messages itself
- other < 0 : Error in parsing, give control back to userspace
- assuming that synchronization is lost and the stream
- is unrecoverable (application expected to close TCP socket)
+ ========= ===========================================================
+ >0 indicates length of successfully parsed message
+ 0 indicates more data must be received to parse the message
+ -ESTRPIPE current message should not be processed by the
+ kernel, return control of the socket to userspace which
+ can proceed to read the messages itself
+ other < 0 Error in parsing, give control back to userspace
+ assuming that synchronization is lost and the stream
+ is unrecoverable (application expected to close TCP socket)
+ ========= ===========================================================
In the case that an error is returned (return value is less than
zero) and the parser is in receive callback mode, then it will set
@@ -123,7 +147,9 @@ int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
the current message, then the error set on the attached socket is
ENODATA since the stream is unrecoverable in that case.
-void (*lock)(struct strparser *strp)
+ ::
+
+ void (*lock)(struct strparser *strp)
The lock callback is called to lock the strp structure when
the strparser is performing an asynchronous operation (such as
@@ -131,14 +157,18 @@ void (*lock)(struct strparser *strp)
function is to lock_sock for the associated socket. In general
mode the callback must be set appropriately.
-void (*unlock)(struct strparser *strp)
+ ::
+
+ void (*unlock)(struct strparser *strp)
The unlock callback is called to release the lock obtained
by the lock callback. In receive callback mode the default
function is release_sock for the associated socket. In general
mode the callback must be set appropriately.
-void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb);
+ ::
+
+ void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb);
rcv_msg is called when a full message has been received and
is queued. The callee must consume the sk_buff; it can
@@ -152,7 +182,9 @@ void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb);
the length of the message. skb->len - offset may be greater
then full_len since strparser does not trim the skb.
-int (*read_sock_done)(struct strparser *strp, int err);
+ ::
+
+ int (*read_sock_done)(struct strparser *strp, int err);
read_sock_done is called when the stream parser is done reading
the TCP socket in receive callback mode. The stream parser may
@@ -160,7 +192,9 @@ int (*read_sock_done)(struct strparser *strp, int err);
to occur when exiting the loop. If the callback is not set (NULL
in strp_init) a default function is used.
-void (*abort_parser)(struct strparser *strp, int err);
+ ::
+
+ void (*abort_parser)(struct strparser *strp, int err);
This function is called when stream parser encounters an error
in parsing. The default function stops the stream parser and
@@ -204,4 +238,3 @@ Author
======
Tom Herbert (tom@quantonium.net)
-
diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.rst
index 86174ce8cd13..ddc3f35775dc 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.rst
@@ -1,7 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===============================================
Ethernet switch device driver model (switchdev)
===============================================
-Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
-Copyright (c) 2014-2015 Scott Feldman <sfeldma@gmail.com>
+
+Copyright |copy| 2014 Jiri Pirko <jiri@resnulli.us>
+
+Copyright |copy| 2014-2015 Scott Feldman <sfeldma@gmail.com>
The Ethernet switch device driver model (switchdev) is an in-kernel driver
@@ -12,53 +18,57 @@ Figure 1 is a block diagram showing the components of the switchdev model for
an example setup using a data-center-class switch ASIC chip. Other setups
with SR-IOV or soft switches, such as OVS, are possible.
+::
- User-space tools
+
+ User-space tools
user space |
+-------------------------------------------------------------------+
kernel | Netlink
- |
- +--------------+-------------------------------+
- | Network stack |
- | (Linux) |
- | |
- +----------------------------------------------+
-
- sw1p2 sw1p4 sw1p6
- sw1p1 + sw1p3 + sw1p5 + eth1
- + | + | + | +
- | | | | | | |
- +--+----+----+----+----+----+---+ +-----+-----+
- | Switch driver | | mgmt |
- | (this document) | | driver |
- | | | |
- +--------------+----------------+ +-----------+
- |
+ |
+ +--------------+-------------------------------+
+ | Network stack |
+ | (Linux) |
+ | |
+ +----------------------------------------------+
+
+ sw1p2 sw1p4 sw1p6
+ sw1p1 + sw1p3 + sw1p5 + eth1
+ + | + | + | +
+ | | | | | | |
+ +--+----+----+----+----+----+---+ +-----+-----+
+ | Switch driver | | mgmt |
+ | (this document) | | driver |
+ | | | |
+ +--------------+----------------+ +-----------+
+ |
kernel | HW bus (eg PCI)
+-------------------------------------------------------------------+
hardware |
- +--------------+----------------+
- | Switch device (sw1) |
- | +----+ +--------+
- | | v offloaded data path | mgmt port
- | | | |
- +--|----|----+----+----+----+---+
- | | | | | |
- + + + + + +
- p1 p2 p3 p4 p5 p6
+ +--------------+----------------+
+ | Switch device (sw1) |
+ | +----+ +--------+
+ | | v offloaded data path | mgmt port
+ | | | |
+ +--|----|----+----+----+----+---+
+ | | | | | |
+ + + + + + +
+ p1 p2 p3 p4 p5 p6
- front-panel ports
+ front-panel ports
- Fig 1.
+ Fig 1.
Include Files
-------------
-#include <linux/netdevice.h>
-#include <net/switchdev.h>
+::
+
+ #include <linux/netdevice.h>
+ #include <net/switchdev.h>
Configuration
@@ -114,10 +124,10 @@ Using port PHYS name (ndo_get_phys_port_name) for the key is particularly
useful for dynamically-named ports where the device names its ports based on
external configuration. For example, if a physical 40G port is split logically
into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
-name for each port using port PHYS name. The udev rule would be:
+name for each port using port PHYS name. The udev rule would be::
-SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
- ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
+ SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
+ ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
is the port name or ID, and Z is the sub-port name or ID. For example, sw1p1s0
@@ -173,7 +183,7 @@ Static FDB Entries
The switchdev driver should implement ndo_fdb_add, ndo_fdb_del and ndo_fdb_dump
to support static FDB entries installed to the device. Static bridge FDB
-entries are installed, for example, using iproute2 bridge cmd:
+entries are installed, for example, using iproute2 bridge cmd::
bridge fdb add ADDR dev DEV [vlan VID] [self]
@@ -185,7 +195,7 @@ XXX: what should be done if offloading this rule to hardware fails (for
example, due to full capacity in hardware tables) ?
Note: by default, the bridge does not filter on VLAN and only bridges untagged
-traffic. To enable VLAN support, turn on VLAN filtering:
+traffic. To enable VLAN support, turn on VLAN filtering::
echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering
@@ -194,7 +204,7 @@ Notification of Learned/Forgotten Source MAC/VLANs
The switch device will learn/forget source MAC address/VLAN on ingress packets
and notify the switch driver of the mac/vlan/port tuples. The switch driver,
-in turn, will notify the bridge driver using the switchdev notifier call:
+in turn, will notify the bridge driver using the switchdev notifier call::
err = call_switchdev_notifiers(val, dev, info, extack);
@@ -202,7 +212,7 @@ Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
forgetting, and info points to a struct switchdev_notifier_fdb_info. On
SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
bridge's FDB and mark the entry as NTF_EXT_LEARNED. The iproute2 bridge
-command will label these entries "offload":
+command will label these entries "offload"::
$ bridge fdb
52:54:00:12:35:01 dev sw1p1 master br0 permanent
@@ -219,11 +229,11 @@ command will label these entries "offload":
01:00:5e:00:00:01 dev br0 self permanent
33:33:ff:12:35:01 dev br0 self permanent
-Learning on the port should be disabled on the bridge using the bridge command:
+Learning on the port should be disabled on the bridge using the bridge command::
bridge link set dev DEV learning off
-Learning on the device port should be enabled, as well as learning_sync:
+Learning on the device port should be enabled, as well as learning_sync::
bridge link set dev DEV learning on self
bridge link set dev DEV learning_sync on self
@@ -314,12 +324,16 @@ forwards the packet to the matching FIB entry's nexthop(s) egress ports.
To program the device, the driver has to register a FIB notifier handler
using register_fib_notifier. The following events are available:
-FIB_EVENT_ENTRY_ADD: used for both adding a new FIB entry to the device,
- or modifying an existing entry on the device.
-FIB_EVENT_ENTRY_DEL: used for removing a FIB entry
-FIB_EVENT_RULE_ADD, FIB_EVENT_RULE_DEL: used to propagate FIB rule changes
-FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass:
+=================== ===================================================
+FIB_EVENT_ENTRY_ADD used for both adding a new FIB entry to the device,
+ or modifying an existing entry on the device.
+FIB_EVENT_ENTRY_DEL used for removing a FIB entry
+FIB_EVENT_RULE_ADD,
+FIB_EVENT_RULE_DEL used to propagate FIB rule changes
+=================== ===================================================
+
+FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass::
struct fib_entry_notifier_info {
struct fib_notifier_info info; /* must be first */
@@ -332,12 +346,12 @@ FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass:
u32 nlflags;
};
-to add/modify/delete IPv4 dst/dest_len prefix on table tb_id. The *fi
-structure holds details on the route and route's nexthops. *dev is one of the
-port netdevs mentioned in the route's next hop list.
+to add/modify/delete IPv4 dst/dest_len prefix on table tb_id. The ``*fi``
+structure holds details on the route and route's nexthops. ``*dev`` is one
+of the port netdevs mentioned in the route's next hop list.
Routes offloaded to the device are labeled with "offload" in the ip route
-listing:
+listing::
$ ip route show
default via 192.168.0.2 dev eth0
diff --git a/Documentation/networking/tc-actions-env-rules.rst b/Documentation/networking/tc-actions-env-rules.rst
new file mode 100644
index 000000000000..86884b8fb4e0
--- /dev/null
+++ b/Documentation/networking/tc-actions-env-rules.rst
@@ -0,0 +1,29 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+TC Actions - Environmental Rules
+================================
+
+
+The "environmental" rules for authors of any new tc actions are:
+
+1) If you stealeth or borroweth any packet thou shalt be branching
+ from the righteous path and thou shalt cloneth.
+
+ For example if your action queues a packet to be processed later,
+ or intentionally branches by redirecting a packet, then you need to
+ clone the packet.
+
+2) If you munge any packet thou shalt call pskb_expand_head in the case
+ someone else is referencing the skb. After that you "own" the skb.
+
+3) Dropping packets you don't own is a no-no. You simply return
+ TC_ACT_SHOT to the caller and they will drop it.
+
+The "environmental" rules for callers of actions (qdiscs etc) are:
+
+#) Thou art responsible for freeing anything returned as being
+ TC_ACT_SHOT/STOLEN/QUEUED. If none of TC_ACT_SHOT/STOLEN/QUEUED is
+ returned, then all is great and you don't need to do anything.
+
+Post on netdev if something is unclear.
diff --git a/Documentation/networking/tc-actions-env-rules.txt b/Documentation/networking/tc-actions-env-rules.txt
deleted file mode 100644
index f37814693ad3..000000000000
--- a/Documentation/networking/tc-actions-env-rules.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-
-The "environmental" rules for authors of any new tc actions are:
-
-1) If you stealeth or borroweth any packet thou shalt be branching
-from the righteous path and thou shalt cloneth.
-
-For example if your action queues a packet to be processed later,
-or intentionally branches by redirecting a packet, then you need to
-clone the packet.
-
-2) If you munge any packet thou shalt call pskb_expand_head in the case
-someone else is referencing the skb. After that you "own" the skb.
-
-3) Dropping packets you don't own is a no-no. You simply return
-TC_ACT_SHOT to the caller and they will drop it.
-
-The "environmental" rules for callers of actions (qdiscs etc) are:
-
-*) Thou art responsible for freeing anything returned as being
-TC_ACT_SHOT/STOLEN/QUEUED. If none of TC_ACT_SHOT/STOLEN/QUEUED is
-returned, then all is great and you don't need to do anything.
-
-Post on netdev if something is unclear.
-
diff --git a/Documentation/networking/tcp-thin.txt b/Documentation/networking/tcp-thin.rst
index 151e229980f1..b06765c96ea1 100644
--- a/Documentation/networking/tcp-thin.txt
+++ b/Documentation/networking/tcp-thin.rst
@@ -1,5 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
Thin-streams and TCP
====================
+
A wide range of Internet-based services that use reliable transport
protocols display what we call thin-stream properties. This means
that the application sends data with such a low rate that the
@@ -42,6 +46,7 @@ References
==========
More information on the modifications, as well as a wide range of
experimental data can be found here:
+
"Improving latency for interactive, thin-stream applications over
reliable transport"
http://simula.no/research/nd/publications/Simula.nd.477/simula_pdf_file
diff --git a/Documentation/networking/team.txt b/Documentation/networking/team.rst
index 5a013686b9ea..0a7f3a059586 100644
--- a/Documentation/networking/team.txt
+++ b/Documentation/networking/team.rst
@@ -1,2 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+Team
+====
+
Team devices are driven from userspace via libteam library which is here:
https://github.com/jpirko/libteam
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.rst
index 8dd6333c3270..1adead6a4527 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.rst
@@ -1,9 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Timestamping
+============
+
1. Control Interfaces
+=====================
The interfaces for receiving network packages timestamps are:
-* SO_TIMESTAMP
+SO_TIMESTAMP
Generates a timestamp for each incoming packet in (not necessarily
monotonic) system time. Reports the timestamp via recvmsg() in a
control message in usec resolution.
@@ -13,7 +20,7 @@ The interfaces for receiving network packages timestamps are:
SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for
SO_TIMESTAMP_NEW options respectively.
-* SO_TIMESTAMPNS
+SO_TIMESTAMPNS
Same timestamping mechanism as SO_TIMESTAMP, but reports the
timestamp as struct timespec in nsec resolution.
SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD
@@ -22,17 +29,18 @@ The interfaces for receiving network packages timestamps are:
and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options
respectively.
-* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
+IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
Only for multicast:approximate transmit timestamp obtained by
reading the looped packet receive timestamp.
-* SO_TIMESTAMPING
+SO_TIMESTAMPING
Generates timestamps on reception, transmission or both. Supports
multiple timestamp sources, including hardware. Supports generating
timestamps for stream sockets.
-1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW):
+1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW)
+-------------------------------------------------------------
This socket option enables timestamping of datagrams on the reception
path. Because the destination socket, if any, is not known early in
@@ -59,10 +67,11 @@ struct __kernel_timespec format.
SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038
on 32 bit machines.
-1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW):
+1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW)
+----------------------------------------------------------------------
Supports multiple types of timestamp requests. As a result, this
-socket option takes a bitmap of flags, not a boolean. In
+socket option takes a bitmap of flags, not a boolean. In::
err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
@@ -76,6 +85,7 @@ be enabled for individual sendmsg calls using cmsg (1.3.4).
1.3.1 Timestamp Generation
+^^^^^^^^^^^^^^^^^^^^^^^^^^
Some bits are requests to the stack to try to generate timestamps. Any
combination of them is valid. Changes to these bits apply to newly
@@ -106,7 +116,6 @@ SOF_TIMESTAMPING_TX_SOFTWARE:
require driver support and may not be available for all devices.
This flag can be enabled via both socket options and control messages.
-
SOF_TIMESTAMPING_TX_SCHED:
Request tx timestamps prior to entering the packet scheduler. Kernel
transmit latency is, if long, often dominated by queuing delay. The
@@ -132,6 +141,7 @@ SOF_TIMESTAMPING_TX_ACK:
1.3.2 Timestamp Reporting
+^^^^^^^^^^^^^^^^^^^^^^^^^
The other three bits control which timestamps will be reported in a
generated control message. Changes to the bits take immediate
@@ -151,11 +161,11 @@ SOF_TIMESTAMPING_RAW_HARDWARE:
1.3.3 Timestamp Options
+^^^^^^^^^^^^^^^^^^^^^^^
The interface supports the options
SOF_TIMESTAMPING_OPT_ID:
-
Generate a unique identifier along with each packet. A process can
have multiple concurrent timestamping requests outstanding. Packets
can be reordered in the transmit path, for instance in the packet
@@ -183,7 +193,6 @@ SOF_TIMESTAMPING_OPT_ID:
SOF_TIMESTAMPING_OPT_CMSG:
-
Support recv() cmsg for all timestamped packets. Control messages
are already supported unconditionally on all packets with receive
timestamps and on IPv6 packets with transmit timestamp. This option
@@ -193,7 +202,6 @@ SOF_TIMESTAMPING_OPT_CMSG:
SOF_TIMESTAMPING_OPT_TSONLY:
-
Applies to transmit timestamps only. Makes the kernel return the
timestamp as a cmsg alongside an empty packet, as opposed to
alongside the original packet. This reduces the amount of memory
@@ -202,7 +210,6 @@ SOF_TIMESTAMPING_OPT_TSONLY:
This option disables SOF_TIMESTAMPING_OPT_CMSG.
SOF_TIMESTAMPING_OPT_STATS:
-
Optional stats that are obtained along with the transmit timestamps.
It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
transmit timestamp is available, the stats are available in a
@@ -213,7 +220,6 @@ SOF_TIMESTAMPING_OPT_STATS:
data was limited by peer's receiver window.
SOF_TIMESTAMPING_OPT_PKTINFO:
-
Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming
packets with hardware timestamps. The message contains struct
scm_ts_pktinfo, which supplies the index of the real interface which
@@ -223,7 +229,6 @@ SOF_TIMESTAMPING_OPT_PKTINFO:
other fields, but they are reserved and undefined.
SOF_TIMESTAMPING_OPT_TX_SWHW:
-
Request both hardware and software timestamps for outgoing packets
when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE
are enabled at the same time. If both timestamps are generated,
@@ -242,12 +247,13 @@ combined with SOF_TIMESTAMPING_OPT_TSONLY.
1.3.4. Enabling timestamps via control messages
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In addition to socket options, timestamp generation can be requested
per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1).
Using this feature, applications can sample timestamps per sendmsg()
without paying the overhead of enabling and disabling timestamps via
-setsockopt:
+setsockopt::
struct msghdr *msg;
...
@@ -264,7 +270,7 @@ The SOF_TIMESTAMPING_TX_* flags set via cmsg will override
the SOF_TIMESTAMPING_TX_* flags set via setsockopt.
Moreover, applications must still enable timestamp reporting via
-setsockopt to receive timestamps:
+setsockopt to receive timestamps::
__u32 val = SOF_TIMESTAMPING_SOFTWARE |
SOF_TIMESTAMPING_OPT_ID /* or any other flag */;
@@ -272,6 +278,7 @@ setsockopt to receive timestamps:
1.4 Bytestream Timestamps
+-------------------------
The SO_TIMESTAMPING interface supports timestamping of bytes in a
bytestream. Each request is interpreted as a request for when the
@@ -331,6 +338,7 @@ unusual.
2 Data Interfaces
+==================
Timestamps are read using the ancillary data feature of recvmsg().
See `man 3 cmsg` for details of this interface. The socket manual
@@ -339,20 +347,21 @@ SO_TIMESTAMP and SO_TIMESTAMPNS records can be retrieved.
2.1 SCM_TIMESTAMPING records
+----------------------------
These timestamps are returned in a control message with cmsg_level
SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and payload of type
-For SO_TIMESTAMPING_OLD:
+For SO_TIMESTAMPING_OLD::
-struct scm_timestamping {
- struct timespec ts[3];
-};
+ struct scm_timestamping {
+ struct timespec ts[3];
+ };
-For SO_TIMESTAMPING_NEW:
+For SO_TIMESTAMPING_NEW::
-struct scm_timestamping64 {
- struct __kernel_timespec ts[3];
+ struct scm_timestamping64 {
+ struct __kernel_timespec ts[3];
Always use SO_TIMESTAMPING_NEW timestamp to always get timestamp in
struct scm_timestamping64 format.
@@ -377,6 +386,7 @@ in ts[0] when a real software timestamp is missing. This happens also
on hardware transmit timestamps.
2.1.1 Transmit timestamps with MSG_ERRQUEUE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For transmit timestamps the outgoing packet is looped back to the
socket's error queue with the send timestamp(s) attached. A process
@@ -393,6 +403,7 @@ embeds the struct scm_timestamping.
2.1.1.2 Timestamp types
+~~~~~~~~~~~~~~~~~~~~~~~
The semantics of the three struct timespec are defined by field
ee_info in the extended error structure. It contains a value of
@@ -408,6 +419,7 @@ case the timestamp is stored in ts[0].
2.1.1.3 Fragmentation
+~~~~~~~~~~~~~~~~~~~~~
Fragmentation of outgoing datagrams is rare, but is possible, e.g., by
explicitly disabling PMTU discovery. If an outgoing packet is fragmented,
@@ -416,6 +428,7 @@ socket.
2.1.1.4 Packet Payload
+~~~~~~~~~~~~~~~~~~~~~~
The calling application is often not interested in receiving the whole
packet payload that it passed to the stack originally: the socket
@@ -427,6 +440,7 @@ however, the full packet is queued, taking up budget from SO_RCVBUF.
2.1.1.5 Blocking Read
+~~~~~~~~~~~~~~~~~~~~~
Reading from the error queue is always a non-blocking operation. To
block waiting on a timestamp, use poll or select. poll() will return
@@ -436,6 +450,7 @@ ignored on request. See also `man 2 poll`.
2.1.2 Receive timestamps
+^^^^^^^^^^^^^^^^^^^^^^^^
On reception, there is no reason to read from the socket error queue.
The SCM_TIMESTAMPING ancillary data is sent along with the packet data
@@ -447,16 +462,17 @@ is again deprecated and ts[2] holds a hardware timestamp if set.
3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP
+=======================================================================
Hardware time stamping must also be initialized for each device driver
that is expected to do hardware time stamping. The parameter is defined in
-include/uapi/linux/net_tstamp.h as:
+include/uapi/linux/net_tstamp.h as::
-struct hwtstamp_config {
- int flags; /* no flags defined right now, must be zero */
- int tx_type; /* HWTSTAMP_TX_* */
- int rx_filter; /* HWTSTAMP_FILTER_* */
-};
+ struct hwtstamp_config {
+ int flags; /* no flags defined right now, must be zero */
+ int tx_type; /* HWTSTAMP_TX_* */
+ int rx_filter; /* HWTSTAMP_FILTER_* */
+ };
Desired behavior is passed into the kernel and to a specific device by
calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
@@ -487,44 +503,47 @@ Any process can read the actual configuration by passing this
structure to ioctl(SIOCGHWTSTAMP) in the same way. However, this has
not been implemented in all drivers.
-/* possible values for hwtstamp_config->tx_type */
-enum {
- /*
- * no outgoing packet will need hardware time stamping;
- * should a packet arrive which asks for it, no hardware
- * time stamping will be done
- */
- HWTSTAMP_TX_OFF,
-
- /*
- * enables hardware time stamping for outgoing packets;
- * the sender of the packet decides which are to be
- * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
- * before sending the packet
- */
- HWTSTAMP_TX_ON,
-};
-
-/* possible values for hwtstamp_config->rx_filter */
-enum {
- /* time stamp no incoming packet at all */
- HWTSTAMP_FILTER_NONE,
-
- /* time stamp any incoming packet */
- HWTSTAMP_FILTER_ALL,
-
- /* return value: time stamp all packets requested plus some others */
- HWTSTAMP_FILTER_SOME,
-
- /* PTP v1, UDP, any kind of event packet */
- HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
-
- /* for the complete list of values, please check
- * the include file include/uapi/linux/net_tstamp.h
- */
-};
+::
+
+ /* possible values for hwtstamp_config->tx_type */
+ enum {
+ /*
+ * no outgoing packet will need hardware time stamping;
+ * should a packet arrive which asks for it, no hardware
+ * time stamping will be done
+ */
+ HWTSTAMP_TX_OFF,
+
+ /*
+ * enables hardware time stamping for outgoing packets;
+ * the sender of the packet decides which are to be
+ * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
+ * before sending the packet
+ */
+ HWTSTAMP_TX_ON,
+ };
+
+ /* possible values for hwtstamp_config->rx_filter */
+ enum {
+ /* time stamp no incoming packet at all */
+ HWTSTAMP_FILTER_NONE,
+
+ /* time stamp any incoming packet */
+ HWTSTAMP_FILTER_ALL,
+
+ /* return value: time stamp all packets requested plus some others */
+ HWTSTAMP_FILTER_SOME,
+
+ /* PTP v1, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
+
+ /* for the complete list of values, please check
+ * the include file include/uapi/linux/net_tstamp.h
+ */
+ };
3.1 Hardware Timestamping Implementation: Device Drivers
+--------------------------------------------------------
A driver which supports hardware time stamping must support the
SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
@@ -533,22 +552,23 @@ should also support SIOCGHWTSTAMP.
Time stamps for received packets must be stored in the skb. To get a pointer
to the shared time stamp structure of the skb call skb_hwtstamps(). Then
-set the time stamps in the structure:
+set the time stamps in the structure::
-struct skb_shared_hwtstamps {
- /* hardware time stamp transformed into duration
- * since arbitrary point in time
- */
- ktime_t hwtstamp;
-};
+ struct skb_shared_hwtstamps {
+ /* hardware time stamp transformed into duration
+ * since arbitrary point in time
+ */
+ ktime_t hwtstamp;
+ };
Time stamps for outgoing packets are to be generated as follows:
+
- In hard_start_xmit(), check if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)
is set no-zero. If yes, then the driver is expected to do hardware time
stamping.
- If this is possible for the skb and requested, then declare
that the driver is doing the time stamping by setting the flag
- SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags , e.g. with
+ SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags , e.g. with::
skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
diff --git a/Documentation/networking/tproxy.txt b/Documentation/networking/tproxy.rst
index b9a188823d9f..00dc3a1a66b4 100644
--- a/Documentation/networking/tproxy.txt
+++ b/Documentation/networking/tproxy.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
Transparent proxy support
=========================
@@ -11,39 +14,39 @@ From Linux 4.18 transparent proxy support is also available in nf_tables.
================================
The idea is that you identify packets with destination address matching a local
-socket on your box, set the packet mark to a certain value:
+socket on your box, set the packet mark to a certain value::
-# iptables -t mangle -N DIVERT
-# iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
-# iptables -t mangle -A DIVERT -j MARK --set-mark 1
-# iptables -t mangle -A DIVERT -j ACCEPT
+ # iptables -t mangle -N DIVERT
+ # iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
+ # iptables -t mangle -A DIVERT -j MARK --set-mark 1
+ # iptables -t mangle -A DIVERT -j ACCEPT
-Alternatively you can do this in nft with the following commands:
+Alternatively you can do this in nft with the following commands::
-# nft add table filter
-# nft add chain filter divert "{ type filter hook prerouting priority -150; }"
-# nft add rule filter divert meta l4proto tcp socket transparent 1 meta mark set 1 accept
+ # nft add table filter
+ # nft add chain filter divert "{ type filter hook prerouting priority -150; }"
+ # nft add rule filter divert meta l4proto tcp socket transparent 1 meta mark set 1 accept
And then match on that value using policy routing to have those packets
-delivered locally:
+delivered locally::
-# ip rule add fwmark 1 lookup 100
-# ip route add local 0.0.0.0/0 dev lo table 100
+ # ip rule add fwmark 1 lookup 100
+ # ip route add local 0.0.0.0/0 dev lo table 100
Because of certain restrictions in the IPv4 routing output code you'll have to
modify your application to allow it to send datagrams _from_ non-local IP
addresses. All you have to do is enable the (SOL_IP, IP_TRANSPARENT) socket
-option before calling bind:
-
-fd = socket(AF_INET, SOCK_STREAM, 0);
-/* - 8< -*/
-int value = 1;
-setsockopt(fd, SOL_IP, IP_TRANSPARENT, &value, sizeof(value));
-/* - 8< -*/
-name.sin_family = AF_INET;
-name.sin_port = htons(0xCAFE);
-name.sin_addr.s_addr = htonl(0xDEADBEEF);
-bind(fd, &name, sizeof(name));
+option before calling bind::
+
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ /* - 8< -*/
+ int value = 1;
+ setsockopt(fd, SOL_IP, IP_TRANSPARENT, &value, sizeof(value));
+ /* - 8< -*/
+ name.sin_family = AF_INET;
+ name.sin_port = htons(0xCAFE);
+ name.sin_addr.s_addr = htonl(0xDEADBEEF);
+ bind(fd, &name, sizeof(name));
A trivial patch for netcat is available here:
http://people.netfilter.org/hidden/tproxy/netcat-ip_transparent-support.patch
@@ -61,10 +64,10 @@ be able to find out the original destination address. Even in case of TCP
getting the original destination address is racy.)
The 'TPROXY' target provides similar functionality without relying on NAT. Simply
-add rules like this to the iptables ruleset above:
+add rules like this to the iptables ruleset above::
-# iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY \
- --tproxy-mark 0x1/0x1 --on-port 50080
+ # iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY \
+ --tproxy-mark 0x1/0x1 --on-port 50080
Or the following rule to nft:
@@ -82,10 +85,12 @@ nf_tables implementation.
====================================
To use tproxy you'll need to have the following modules compiled for iptables:
+
- NETFILTER_XT_MATCH_SOCKET
- NETFILTER_XT_TARGET_TPROXY
Or the floowing modules for nf_tables:
+
- NFT_SOCKET
- NFT_TPROXY
diff --git a/Documentation/networking/tuntap.txt b/Documentation/networking/tuntap.rst
index 0104830d5075..a59d1dd6fdcc 100644
--- a/Documentation/networking/tuntap.txt
+++ b/Documentation/networking/tuntap.rst
@@ -1,20 +1,28 @@
-Universal TUN/TAP device driver.
-Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
- Linux, Solaris drivers
- Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
+===============================
+Universal TUN/TAP device driver
+===============================
- FreeBSD TAP driver
- Copyright (c) 1999-2000 Maksim Yevmenkin <m_evmenkin@yahoo.com>
+Copyright |copy| 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
+
+ Linux, Solaris drivers
+ Copyright |copy| 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
+
+ FreeBSD TAP driver
+ Copyright |copy| 1999-2000 Maksim Yevmenkin <m_evmenkin@yahoo.com>
Revision of this document 2002 by Florian Thiel <florian.thiel@gmx.net>
1. Description
- TUN/TAP provides packet reception and transmission for user space programs.
+==============
+
+ TUN/TAP provides packet reception and transmission for user space programs.
It can be seen as a simple Point-to-Point or Ethernet device, which,
- instead of receiving packets from physical media, receives them from
- user space program and instead of sending packets via physical media
- writes them to the user space program.
+ instead of receiving packets from physical media, receives them from
+ user space program and instead of sending packets via physical media
+ writes them to the user space program.
In order to use the driver a program has to open /dev/net/tun and issue a
corresponding ioctl() to register a network device with the kernel. A network
@@ -33,41 +41,51 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
br_sigio.c - bridge based on async io and SIGIO signal.
However, the best example is VTun http://vtun.sourceforge.net :))
-2. Configuration
- Create device node:
+2. Configuration
+================
+
+ Create device node::
+
mkdir /dev/net (if it doesn't exist already)
mknod /dev/net/tun c 10 200
-
- Set permissions:
+
+ Set permissions::
+
e.g. chmod 0666 /dev/net/tun
- There's no harm in allowing the device to be accessible by non-root users,
- since CAP_NET_ADMIN is required for creating network devices or for
- connecting to network devices which aren't owned by the user in question.
- If you want to create persistent devices and give ownership of them to
- unprivileged users, then you need the /dev/net/tun device to be usable by
- those users.
+
+ There's no harm in allowing the device to be accessible by non-root users,
+ since CAP_NET_ADMIN is required for creating network devices or for
+ connecting to network devices which aren't owned by the user in question.
+ If you want to create persistent devices and give ownership of them to
+ unprivileged users, then you need the /dev/net/tun device to be usable by
+ those users.
Driver module autoloading
Make sure that "Kernel module loader" - module auto-loading
support is enabled in your kernel. The kernel should load it on
first access.
-
- Manual loading
- insert the module by hand:
- modprobe tun
+
+ Manual loading
+
+ insert the module by hand::
+
+ modprobe tun
If you do it the latter way, you have to load the module every time you
need it, if you do it the other way it will be automatically loaded when
/dev/net/tun is being opened.
-3. Program interface
- 3.1 Network device allocation:
+3. Program interface
+====================
+
+3.1 Network device allocation
+-----------------------------
- char *dev should be the name of the device with a format string (e.g.
- "tun%d"), but (as far as I can see) this can be any valid network device name.
- Note that the character pointer becomes overwritten with the real device name
- (e.g. "tun0")
+``char *dev`` should be the name of the device with a format string (e.g.
+"tun%d"), but (as far as I can see) this can be any valid network device name.
+Note that the character pointer becomes overwritten with the real device name
+(e.g. "tun0")::
#include <linux/if.h>
#include <linux/if_tun.h>
@@ -78,45 +96,51 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
int fd, err;
if( (fd = open("/dev/net/tun", O_RDWR)) < 0 )
- return tun_alloc_old(dev);
+ return tun_alloc_old(dev);
memset(&ifr, 0, sizeof(ifr));
- /* Flags: IFF_TUN - TUN device (no Ethernet headers)
- * IFF_TAP - TAP device
+ /* Flags: IFF_TUN - TUN device (no Ethernet headers)
+ * IFF_TAP - TAP device
*
- * IFF_NO_PI - Do not provide packet information
- */
- ifr.ifr_flags = IFF_TUN;
+ * IFF_NO_PI - Do not provide packet information
+ */
+ ifr.ifr_flags = IFF_TUN;
if( *dev )
- strncpy(ifr.ifr_name, dev, IFNAMSIZ);
+ strncpy(ifr.ifr_name, dev, IFNAMSIZ);
if( (err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0 ){
- close(fd);
- return err;
+ close(fd);
+ return err;
}
strcpy(dev, ifr.ifr_name);
return fd;
- }
-
- 3.2 Frame format:
- If flag IFF_NO_PI is not set each frame format is:
+ }
+
+3.2 Frame format
+----------------
+
+If flag IFF_NO_PI is not set each frame format is::
+
Flags [2 bytes]
Proto [2 bytes]
Raw protocol(IP, IPv6, etc) frame.
- 3.3 Multiqueue tuntap interface:
+3.3 Multiqueue tuntap interface
+-------------------------------
+
+From version 3.8, Linux supports multiqueue tuntap which can uses multiple
+file descriptors (queues) to parallelize packets sending or receiving. The
+device allocation is the same as before, and if user wants to create multiple
+queues, TUNSETIFF with the same device name must be called many times with
+IFF_MULTI_QUEUE flag.
- From version 3.8, Linux supports multiqueue tuntap which can uses multiple
- file descriptors (queues) to parallelize packets sending or receiving. The
- device allocation is the same as before, and if user wants to create multiple
- queues, TUNSETIFF with the same device name must be called many times with
- IFF_MULTI_QUEUE flag.
+``char *dev`` should be the name of the device, queues is the number of queues
+to be created, fds is used to store and return the file descriptors (queues)
+created to the caller. Each file descriptor were served as the interface of a
+queue which could be accessed by userspace.
- char *dev should be the name of the device, queues is the number of queues to
- be created, fds is used to store and return the file descriptors (queues)
- created to the caller. Each file descriptor were served as the interface of a
- queue which could be accessed by userspace.
+::
#include <linux/if.h>
#include <linux/if_tun.h>
@@ -127,7 +151,7 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
int fd, err, i;
if (!dev)
- return -1;
+ return -1;
memset(&ifr, 0, sizeof(ifr));
/* Flags: IFF_TUN - TUN device (no Ethernet headers)
@@ -140,30 +164,30 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
strcpy(ifr.ifr_name, dev);
for (i = 0; i < queues; i++) {
- if ((fd = open("/dev/net/tun", O_RDWR)) < 0)
- goto err;
- err = ioctl(fd, TUNSETIFF, (void *)&ifr);
- if (err) {
- close(fd);
- goto err;
- }
- fds[i] = fd;
+ if ((fd = open("/dev/net/tun", O_RDWR)) < 0)
+ goto err;
+ err = ioctl(fd, TUNSETIFF, (void *)&ifr);
+ if (err) {
+ close(fd);
+ goto err;
+ }
+ fds[i] = fd;
}
return 0;
err:
for (--i; i >= 0; i--)
- close(fds[i]);
+ close(fds[i]);
return err;
}
- A new ioctl(TUNSETQUEUE) were introduced to enable or disable a queue. When
- calling it with IFF_DETACH_QUEUE flag, the queue were disabled. And when
- calling it with IFF_ATTACH_QUEUE flag, the queue were enabled. The queue were
- enabled by default after it was created through TUNSETIFF.
+A new ioctl(TUNSETQUEUE) were introduced to enable or disable a queue. When
+calling it with IFF_DETACH_QUEUE flag, the queue were disabled. And when
+calling it with IFF_ATTACH_QUEUE flag, the queue were enabled. The queue were
+enabled by default after it was created through TUNSETIFF.
- fd is the file descriptor (queue) that we want to enable or disable, when
- enable is true we enable it, otherwise we disable it
+fd is the file descriptor (queue) that we want to enable or disable, when
+enable is true we enable it, otherwise we disable it::
#include <linux/if.h>
#include <linux/if_tun.h>
@@ -175,53 +199,61 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
memset(&ifr, 0, sizeof(ifr));
if (enable)
- ifr.ifr_flags = IFF_ATTACH_QUEUE;
+ ifr.ifr_flags = IFF_ATTACH_QUEUE;
else
- ifr.ifr_flags = IFF_DETACH_QUEUE;
+ ifr.ifr_flags = IFF_DETACH_QUEUE;
return ioctl(fd, TUNSETQUEUE, (void *)&ifr);
}
-Universal TUN/TAP device driver Frequently Asked Question.
-
+Universal TUN/TAP device driver Frequently Asked Question
+=========================================================
+
1. What platforms are supported by TUN/TAP driver ?
+
Currently driver has been written for 3 Unices:
- Linux kernels 2.2.x, 2.4.x
- FreeBSD 3.x, 4.x, 5.x
- Solaris 2.6, 7.0, 8.0
+
+ - Linux kernels 2.2.x, 2.4.x
+ - FreeBSD 3.x, 4.x, 5.x
+ - Solaris 2.6, 7.0, 8.0
2. What is TUN/TAP driver used for?
-As mentioned above, main purpose of TUN/TAP driver is tunneling.
+
+As mentioned above, main purpose of TUN/TAP driver is tunneling.
It is used by VTun (http://vtun.sourceforge.net).
Another interesting application using TUN/TAP is pipsecd
(http://perso.enst.fr/~beyssac/pipsec/), a userspace IPSec
implementation that can use complete kernel routing (unlike FreeS/WAN).
-3. How does Virtual network device actually work ?
+3. How does Virtual network device actually work ?
+
Virtual network device can be viewed as a simple Point-to-Point or
-Ethernet device, which instead of receiving packets from a physical
-media, receives them from user space program and instead of sending
-packets via physical media sends them to the user space program.
+Ethernet device, which instead of receiving packets from a physical
+media, receives them from user space program and instead of sending
+packets via physical media sends them to the user space program.
Let's say that you configured IPv6 on the tap0, then whenever
the kernel sends an IPv6 packet to tap0, it is passed to the application
-(VTun for example). The application encrypts, compresses and sends it to
+(VTun for example). The application encrypts, compresses and sends it to
the other side over TCP or UDP. The application on the other side decompresses
-and decrypts the data received and writes the packet to the TAP device,
+and decrypts the data received and writes the packet to the TAP device,
the kernel handles the packet like it came from real physical device.
4. What is the difference between TUN driver and TAP driver?
+
TUN works with IP frames. TAP works with Ethernet frames.
This means that you have to read/write IP packets when you are using tun and
ethernet frames when using tap.
5. What is the difference between BPF and TUN/TAP driver?
+
BPF is an advanced packet filter. It can be attached to existing
network interface. It does not provide a virtual network interface.
A TUN/TAP driver does provide a virtual network interface and it is possible
to attach BPF to this interface.
6. Does TAP driver support kernel Ethernet bridging?
-Yes. Linux and FreeBSD drivers support Ethernet bridging.
+
+Yes. Linux and FreeBSD drivers support Ethernet bridging.
diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.rst
index 53a726855e49..2c225f28b7b2 100644
--- a/Documentation/networking/udplite.txt
+++ b/Documentation/networking/udplite.rst
@@ -1,6 +1,8 @@
- ===========================================================================
- The UDP-Lite protocol (RFC 3828)
- ===========================================================================
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+The UDP-Lite protocol (RFC 3828)
+================================
UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
@@ -11,39 +13,43 @@
This file briefly describes the existing kernel support and the socket API.
For in-depth information, you can consult:
- o The UDP-Lite Homepage:
- http://web.archive.org/web/*/http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
- From here you can also download some example application source code.
+ - The UDP-Lite Homepage:
+ http://web.archive.org/web/%2E/http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
+
+ From here you can also download some example application source code.
- o The UDP-Lite HOWTO on
- http://web.archive.org/web/*/http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
- files/UDP-Lite-HOWTO.txt
+ - The UDP-Lite HOWTO on
+ http://web.archive.org/web/%2E/http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
- o The Wireshark UDP-Lite WiKi (with capture files):
- https://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+ - The Wireshark UDP-Lite WiKi (with capture files):
+ https://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
- o The Protocol Spec, RFC 3828, http://www.ietf.org/rfc/rfc3828.txt
+ - The Protocol Spec, RFC 3828, http://www.ietf.org/rfc/rfc3828.txt
- I) APPLICATIONS
+1. Applications
+===============
Several applications have been ported successfully to UDP-Lite. Ethereal
- (now called wireshark) has UDP-Litev4/v6 support by default.
+ (now called wireshark) has UDP-Litev4/v6 support by default.
+
Porting applications to UDP-Lite is straightforward: only socket level and
IPPROTO need to be changed; senders additionally set the checksum coverage
length (default = header length = 8). Details are in the next section.
-
- II) PROGRAMMING API
+2. Programming API
+==================
UDP-Lite provides a connectionless, unreliable datagram service and hence
uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
- very easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
- call so that the statement looks like:
+ very easy: simply add ``IPPROTO_UDPLITE`` as the last argument of the
+ socket(2) call so that the statement looks like::
s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
- or, respectively,
+ or, respectively,
+
+ ::
s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
@@ -56,10 +62,10 @@
* Sender checksum coverage: UDPLITE_SEND_CSCOV
- For example,
+ For example::
- int val = 20;
- setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
+ int val = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
sets the checksum coverage length to 20 bytes (12b data + 8b header).
Of each packet only the first 20 bytes (plus the pseudo-header) will be
@@ -74,10 +80,10 @@
that of a traffic filter: when enabled, it instructs the kernel to drop
all packets which have a coverage _less_ than this value. For example, if
RTP and UDP headers are to be protected, a receiver can enforce that only
- packets with a minimum coverage of 20 are admitted:
+ packets with a minimum coverage of 20 are admitted::
- int min = 20;
- setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
+ int min = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
The calls to getsockopt(2) are analogous. Being an extension and not a stand-
alone protocol, all socket options known from UDP can be used in exactly the
@@ -85,18 +91,18 @@
A detailed discussion of UDP-Lite checksum coverage options is in section IV.
-
- III) HEADER FILES
+3. Header Files
+===============
The socket API requires support through header files in /usr/include:
* /usr/include/netinet/in.h
- to define IPPROTO_UDPLITE
+ to define IPPROTO_UDPLITE
* /usr/include/netinet/udplite.h
- for UDP-Lite header fields and protocol constants
+ for UDP-Lite header fields and protocol constants
- For testing purposes, the following can serve as a `mini' header file:
+ For testing purposes, the following can serve as a ``mini`` header file::
#define IPPROTO_UDPLITE 136
#define SOL_UDPLITE 136
@@ -105,8 +111,9 @@
Ready-made header files for various distros are in the UDP-Lite tarball.
+4. Kernel Behaviour with Regards to the Various Socket Options
+==============================================================
- IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
To enable debugging messages, the log level need to be set to 8, as most
messages use the KERN_DEBUG level (7).
@@ -136,13 +143,13 @@
3) Disabling the Checksum Computation
On both sender and receiver, checksumming will always be performed
- and cannot be disabled using SO_NO_CHECK. Thus
+ and cannot be disabled using SO_NO_CHECK. Thus::
- setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
+ setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
- will always will be ignored, while the value of
+ will always will be ignored, while the value of::
- getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
+ getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
is meaningless (as in TCP). Packets with a zero checksum field are
illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded.
@@ -167,15 +174,15 @@
first one contains the L4 header.
The send buffer size has implications on the checksum coverage length.
- Consider the following example:
+ Consider the following example::
- Payload: 1536 bytes Send Buffer: 1024 bytes
- MTU: 1500 bytes Coverage Length: 856 bytes
+ Payload: 1536 bytes Send Buffer: 1024 bytes
+ MTU: 1500 bytes Coverage Length: 856 bytes
- UDP-Lite will ship the 1536 bytes in two separate packets:
+ UDP-Lite will ship the 1536 bytes in two separate packets::
- Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
- Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
+ Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
+ Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
The coverage packet covers the UDP-Lite header and 848 bytes of the
payload in the first packet, the second packet is fully covered. Note
@@ -184,17 +191,17 @@
length in such cases.
As an example of what happens when one UDP-Lite packet is split into
- several tiny fragments, consider the following example.
+ several tiny fragments, consider the following example::
- Payload: 1024 bytes Send buffer size: 1024 bytes
- MTU: 300 bytes Coverage length: 575 bytes
+ Payload: 1024 bytes Send buffer size: 1024 bytes
+ MTU: 300 bytes Coverage length: 575 bytes
- +-+-----------+--------------+--------------+--------------+
- |8| 272 | 280 | 280 | 280 |
- +-+-----------+--------------+--------------+--------------+
- 280 560 840 1032
- ^
- *****checksum coverage*************
+ +-+-----------+--------------+--------------+--------------+
+ |8| 272 | 280 | 280 | 280 |
+ +-+-----------+--------------+--------------+--------------+
+ 280 560 840 1032
+ ^
+ *****checksum coverage*************
The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
header). According to the interface MTU, these are split into 4 IP
@@ -208,7 +215,7 @@
lengths), only the first fragment needs to be considered. When using
larger checksum coverage lengths, each eligible fragment needs to be
checksummed. Suppose we have a checksum coverage of 3062. The buffer
- of 3356 bytes will be split into the following fragments:
+ of 3356 bytes will be split into the following fragments::
Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
@@ -222,57 +229,63 @@
performance over wireless (or generally noisy) links and thus smaller
coverage lengths are likely to be expected.
-
- V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
+5. UDP-Lite Runtime Statistics and their Meaning
+================================================
Exceptional and error conditions are logged to syslog at the KERN_DEBUG
level. Live statistics about UDP-Lite are available in /proc/net/snmp
- and can (with newer versions of netstat) be viewed using
+ and can (with newer versions of netstat) be viewed using::
- netstat -svu
+ netstat -svu
This displays UDP-Lite statistics variables, whose meaning is as follows.
- InDatagrams: The total number of datagrams delivered to users.
+ ============ =====================================================
+ InDatagrams The total number of datagrams delivered to users.
- NoPorts: Number of packets received to an unknown port.
- These cases are counted separately (not as InErrors).
+ NoPorts Number of packets received to an unknown port.
+ These cases are counted separately (not as InErrors).
- InErrors: Number of erroneous UDP-Lite packets. Errors include:
- * internal socket queue receive errors
- * packet too short (less than 8 bytes or stated
- coverage length exceeds received length)
- * xfrm4_policy_check() returned with error
- * application has specified larger min. coverage
- length than that of incoming packet
- * checksum coverage violated
- * bad checksum
+ InErrors Number of erroneous UDP-Lite packets. Errors include:
- OutDatagrams: Total number of sent datagrams.
+ * internal socket queue receive errors
+ * packet too short (less than 8 bytes or stated
+ coverage length exceeds received length)
+ * xfrm4_policy_check() returned with error
+ * application has specified larger min. coverage
+ length than that of incoming packet
+ * checksum coverage violated
+ * bad checksum
- These statistics derive from the UDP MIB (RFC 2013).
+ OutDatagrams Total number of sent datagrams.
+ ============ =====================================================
+ These statistics derive from the UDP MIB (RFC 2013).
- VI) IPTABLES
+6. IPtables
+===========
There is packet match support for UDP-Lite as well as support for the LOG target.
- If you copy and paste the following line into /etc/protocols,
+ If you copy and paste the following line into /etc/protocols::
- udplite 136 UDP-Lite # UDP-Lite [RFC 3828]
+ udplite 136 UDP-Lite # UDP-Lite [RFC 3828]
- then
- iptables -A INPUT -p udplite -j LOG
+ then::
- will produce logging output to syslog. Dropping and rejecting packets also works.
+ iptables -A INPUT -p udplite -j LOG
+ will produce logging output to syslog. Dropping and rejecting packets also works.
- VII) MAINTAINER ADDRESS
+7. Maintainer Address
+=====================
The UDP-Lite patch was developed at
- University of Aberdeen
- Electronics Research Group
- Department of Engineering
- Fraser Noble Building
- Aberdeen AB24 3UE; UK
+
+ University of Aberdeen
+ Electronics Research Group
+ Department of Engineering
+ Fraser Noble Building
+ Aberdeen AB24 3UE; UK
+
The current maintainer is Gerrit Renker, <gerrit@erg.abdn.ac.uk>. Initial
code was developed by William Stanislaus, <william@erg.abdn.ac.uk>.
diff --git a/Documentation/networking/vrf.rst b/Documentation/networking/vrf.rst
new file mode 100644
index 000000000000..0dde145043bc
--- /dev/null
+++ b/Documentation/networking/vrf.rst
@@ -0,0 +1,451 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+Virtual Routing and Forwarding (VRF)
+====================================
+
+The VRF Device
+==============
+
+The VRF device combined with ip rules provides the ability to create virtual
+routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
+Linux network stack. One use case is the multi-tenancy problem where each
+tenant has their own unique routing tables and in the very least need
+different default gateways.
+
+Processes can be "VRF aware" by binding a socket to the VRF device. Packets
+through the socket then use the routing table associated with the VRF
+device. An important feature of the VRF device implementation is that it
+impacts only Layer 3 and above so L2 tools (e.g., LLDP) are not affected
+(ie., they do not need to be run in each VRF). The design also allows
+the use of higher priority ip rules (Policy Based Routing, PBR) to take
+precedence over the VRF device rules directing specific traffic as desired.
+
+In addition, VRF devices allow VRFs to be nested within namespaces. For
+example network namespaces provide separation of network interfaces at the
+device layer, VLANs on the interfaces within a namespace provide L2 separation
+and then VRF devices provide L3 separation.
+
+Design
+------
+A VRF device is created with an associated route table. Network interfaces
+are then enslaved to a VRF device::
+
+ +-----------------------------+
+ | vrf-blue | ===> route table 10
+ +-----------------------------+
+ | | |
+ +------+ +------+ +-------------+
+ | eth1 | | eth2 | ... | bond1 |
+ +------+ +------+ +-------------+
+ | |
+ +------+ +------+
+ | eth8 | | eth9 |
+ +------+ +------+
+
+Packets received on an enslaved device and are switched to the VRF device
+in the IPv4 and IPv6 processing stacks giving the impression that packets
+flow through the VRF device. Similarly on egress routing rules are used to
+send packets to the VRF device driver before getting sent out the actual
+interface. This allows tcpdump on a VRF device to capture all packets into
+and out of the VRF as a whole\ [1]_. Similarly, netfilter\ [2]_ and tc rules
+can be applied using the VRF device to specify rules that apply to the VRF
+domain as a whole.
+
+.. [1] Packets in the forwarded state do not flow through the device, so those
+ packets are not seen by tcpdump. Will revisit this limitation in a
+ future release.
+
+.. [2] Iptables on ingress supports PREROUTING with skb->dev set to the real
+ ingress device and both INPUT and PREROUTING rules with skb->dev set to
+ the VRF device. For egress POSTROUTING and OUTPUT rules can be written
+ using either the VRF device or real egress device.
+
+Setup
+-----
+1. VRF device is created with an association to a FIB table.
+ e.g,::
+
+ ip link add vrf-blue type vrf table 10
+ ip link set dev vrf-blue up
+
+2. An l3mdev FIB rule directs lookups to the table associated with the device.
+ A single l3mdev rule is sufficient for all VRFs. The VRF device adds the
+ l3mdev rule for IPv4 and IPv6 when the first device is created with a
+ default preference of 1000. Users may delete the rule if desired and add
+ with a different priority or install per-VRF rules.
+
+ Prior to the v4.8 kernel iif and oif rules are needed for each VRF device::
+
+ ip ru add oif vrf-blue table 10
+ ip ru add iif vrf-blue table 10
+
+3. Set the default route for the table (and hence default route for the VRF)::
+
+ ip route add table 10 unreachable default metric 4278198272
+
+ This high metric value ensures that the default unreachable route can
+ be overridden by a routing protocol suite. FRRouting interprets
+ kernel metrics as a combined admin distance (upper byte) and priority
+ (lower 3 bytes). Thus the above metric translates to [255/8192].
+
+4. Enslave L3 interfaces to a VRF device::
+
+ ip link set dev eth1 master vrf-blue
+
+ Local and connected routes for enslaved devices are automatically moved to
+ the table associated with VRF device. Any additional routes depending on
+ the enslaved device are dropped and will need to be reinserted to the VRF
+ FIB table following the enslavement.
+
+ The IPv6 sysctl option keep_addr_on_down can be enabled to keep IPv6 global
+ addresses as VRF enslavement changes::
+
+ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
+
+5. Additional VRF routes are added to associated table::
+
+ ip route add table 10 ...
+
+
+Applications
+------------
+Applications that are to work within a VRF need to bind their socket to the
+VRF device::
+
+ setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);
+
+or to specify the output device using cmsg and IP_PKTINFO.
+
+By default the scope of the port bindings for unbound sockets is
+limited to the default VRF. That is, it will not be matched by packets
+arriving on interfaces enslaved to an l3mdev and processes may bind to
+the same port if they bind to an l3mdev.
+
+TCP & UDP services running in the default VRF context (ie., not bound
+to any VRF device) can work across all VRF domains by enabling the
+tcp_l3mdev_accept and udp_l3mdev_accept sysctl options::
+
+ sysctl -w net.ipv4.tcp_l3mdev_accept=1
+ sysctl -w net.ipv4.udp_l3mdev_accept=1
+
+These options are disabled by default so that a socket in a VRF is only
+selected for packets in that VRF. There is a similar option for RAW
+sockets, which is enabled by default for reasons of backwards compatibility.
+This is so as to specify the output device with cmsg and IP_PKTINFO, but
+using a socket not bound to the corresponding VRF. This allows e.g. older ping
+implementations to be run with specifying the device but without executing it
+in the VRF. This option can be disabled so that packets received in a VRF
+context are only handled by a raw socket bound to the VRF, and packets in the
+default VRF are only handled by a socket not bound to any VRF::
+
+ sysctl -w net.ipv4.raw_l3mdev_accept=0
+
+netfilter rules on the VRF device can be used to limit access to services
+running in the default VRF context as well.
+
+--------------------------------------------------------------------------------
+
+Using iproute2 for VRFs
+=======================
+iproute2 supports the vrf keyword as of v4.7. For backwards compatibility this
+section lists both commands where appropriate -- with the vrf keyword and the
+older form without it.
+
+1. Create a VRF
+
+ To instantiate a VRF device and associate it with a table::
+
+ $ ip link add dev NAME type vrf table ID
+
+ As of v4.8 the kernel supports the l3mdev FIB rule where a single rule
+ covers all VRFs. The l3mdev rule is created for IPv4 and IPv6 on first
+ device create.
+
+2. List VRFs
+
+ To list VRFs that have been created::
+
+ $ ip [-d] link show type vrf
+ NOTE: The -d option is needed to show the table id
+
+ For example::
+
+ $ ip -d link show type vrf
+ 11: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
+ link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0
+ vrf table 1 addrgenmode eui64
+ 12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
+ link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0
+ vrf table 10 addrgenmode eui64
+ 13: blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
+ link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0
+ vrf table 66 addrgenmode eui64
+ 14: green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
+ link/ether e6:28:b8:63:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0
+ vrf table 81 addrgenmode eui64
+
+
+ Or in brief output::
+
+ $ ip -br link show type vrf
+ mgmt UP 72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP>
+ red UP b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP>
+ blue UP 36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP>
+ green UP e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP>
+
+
+3. Assign a Network Interface to a VRF
+
+ Network interfaces are assigned to a VRF by enslaving the netdevice to a
+ VRF device::
+
+ $ ip link set dev NAME master NAME
+
+ On enslavement connected and local routes are automatically moved to the
+ table associated with the VRF device.
+
+ For example::
+
+ $ ip link set dev eth0 master mgmt
+
+
+4. Show Devices Assigned to a VRF
+
+ To show devices that have been assigned to a specific VRF add the master
+ option to the ip command::
+
+ $ ip link show vrf NAME
+ $ ip link show master NAME
+
+ For example::
+
+ $ ip link show vrf red
+ 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000
+ link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff
+ 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000
+ link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff
+ 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN mode DEFAULT group default qlen 1000
+ link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff
+
+
+ Or using the brief output::
+
+ $ ip -br link show vrf red
+ eth1 UP 02:00:00:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
+ eth2 UP 02:00:00:00:02:03 <BROADCAST,MULTICAST,UP,LOWER_UP>
+ eth5 DOWN 02:00:00:00:02:06 <BROADCAST,MULTICAST>
+
+
+5. Show Neighbor Entries for a VRF
+
+ To list neighbor entries associated with devices enslaved to a VRF device
+ add the master option to the ip command::
+
+ $ ip [-6] neigh show vrf NAME
+ $ ip [-6] neigh show master NAME
+
+ For example::
+
+ $ ip neigh show vrf red
+ 10.2.1.254 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE
+ 10.2.2.254 dev eth2 lladdr 5e:54:01:6a:ee:80 REACHABLE
+
+ $ ip -6 neigh show vrf red
+ 2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE
+
+
+6. Show Addresses for a VRF
+
+ To show addresses for interfaces associated with a VRF add the master
+ option to the ip command::
+
+ $ ip addr show vrf NAME
+ $ ip addr show master NAME
+
+ For example::
+
+ $ ip addr show vrf red
+ 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
+ link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff
+ inet 10.2.1.2/24 brd 10.2.1.255 scope global eth1
+ valid_lft forever preferred_lft forever
+ inet6 2002:1::2/120 scope global
+ valid_lft forever preferred_lft forever
+ inet6 fe80::ff:fe00:202/64 scope link
+ valid_lft forever preferred_lft forever
+ 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
+ link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff
+ inet 10.2.2.2/24 brd 10.2.2.255 scope global eth2
+ valid_lft forever preferred_lft forever
+ inet6 2002:2::2/120 scope global
+ valid_lft forever preferred_lft forever
+ inet6 fe80::ff:fe00:203/64 scope link
+ valid_lft forever preferred_lft forever
+ 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN group default qlen 1000
+ link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff
+
+ Or in brief format::
+
+ $ ip -br addr show vrf red
+ eth1 UP 10.2.1.2/24 2002:1::2/120 fe80::ff:fe00:202/64
+ eth2 UP 10.2.2.2/24 2002:2::2/120 fe80::ff:fe00:203/64
+ eth5 DOWN
+
+
+7. Show Routes for a VRF
+
+ To show routes for a VRF use the ip command to display the table associated
+ with the VRF device::
+
+ $ ip [-6] route show vrf NAME
+ $ ip [-6] route show table ID
+
+ For example::
+
+ $ ip route show vrf red
+ unreachable default metric 4278198272
+ broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2
+ 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2
+ local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2
+ broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2
+ broadcast 10.2.2.0 dev eth2 proto kernel scope link src 10.2.2.2
+ 10.2.2.0/24 dev eth2 proto kernel scope link src 10.2.2.2
+ local 10.2.2.2 dev eth2 proto kernel scope host src 10.2.2.2
+ broadcast 10.2.2.255 dev eth2 proto kernel scope link src 10.2.2.2
+
+ $ ip -6 route show vrf red
+ local 2002:1:: dev lo proto none metric 0 pref medium
+ local 2002:1::2 dev lo proto none metric 0 pref medium
+ 2002:1::/120 dev eth1 proto kernel metric 256 pref medium
+ local 2002:2:: dev lo proto none metric 0 pref medium
+ local 2002:2::2 dev lo proto none metric 0 pref medium
+ 2002:2::/120 dev eth2 proto kernel metric 256 pref medium
+ local fe80:: dev lo proto none metric 0 pref medium
+ local fe80:: dev lo proto none metric 0 pref medium
+ local fe80::ff:fe00:202 dev lo proto none metric 0 pref medium
+ local fe80::ff:fe00:203 dev lo proto none metric 0 pref medium
+ fe80::/64 dev eth1 proto kernel metric 256 pref medium
+ fe80::/64 dev eth2 proto kernel metric 256 pref medium
+ ff00::/8 dev red metric 256 pref medium
+ ff00::/8 dev eth1 metric 256 pref medium
+ ff00::/8 dev eth2 metric 256 pref medium
+ unreachable default dev lo metric 4278198272 error -101 pref medium
+
+8. Route Lookup for a VRF
+
+ A test route lookup can be done for a VRF::
+
+ $ ip [-6] route get vrf NAME ADDRESS
+ $ ip [-6] route get oif NAME ADDRESS
+
+ For example::
+
+ $ ip route get 10.2.1.40 vrf red
+ 10.2.1.40 dev eth1 table red src 10.2.1.2
+ cache
+
+ $ ip -6 route get 2002:1::32 vrf red
+ 2002:1::32 from :: dev eth1 table red proto kernel src 2002:1::2 metric 256 pref medium
+
+
+9. Removing Network Interface from a VRF
+
+ Network interfaces are removed from a VRF by breaking the enslavement to
+ the VRF device::
+
+ $ ip link set dev NAME nomaster
+
+ Connected routes are moved back to the default table and local entries are
+ moved to the local table.
+
+ For example::
+
+ $ ip link set dev eth0 nomaster
+
+--------------------------------------------------------------------------------
+
+Commands used in this example::
+
+ cat >> /etc/iproute2/rt_tables.d/vrf.conf <<EOF
+ 1 mgmt
+ 10 red
+ 66 blue
+ 81 green
+ EOF
+
+ function vrf_create
+ {
+ VRF=$1
+ TBID=$2
+
+ # create VRF device
+ ip link add ${VRF} type vrf table ${TBID}
+
+ if [ "${VRF}" != "mgmt" ]; then
+ ip route add table ${TBID} unreachable default metric 4278198272
+ fi
+ ip link set dev ${VRF} up
+ }
+
+ vrf_create mgmt 1
+ ip link set dev eth0 master mgmt
+
+ vrf_create red 10
+ ip link set dev eth1 master red
+ ip link set dev eth2 master red
+ ip link set dev eth5 master red
+
+ vrf_create blue 66
+ ip link set dev eth3 master blue
+
+ vrf_create green 81
+ ip link set dev eth4 master green
+
+
+ Interface addresses from /etc/network/interfaces:
+ auto eth0
+ iface eth0 inet static
+ address 10.0.0.2
+ netmask 255.255.255.0
+ gateway 10.0.0.254
+
+ iface eth0 inet6 static
+ address 2000:1::2
+ netmask 120
+
+ auto eth1
+ iface eth1 inet static
+ address 10.2.1.2
+ netmask 255.255.255.0
+
+ iface eth1 inet6 static
+ address 2002:1::2
+ netmask 120
+
+ auto eth2
+ iface eth2 inet static
+ address 10.2.2.2
+ netmask 255.255.255.0
+
+ iface eth2 inet6 static
+ address 2002:2::2
+ netmask 120
+
+ auto eth3
+ iface eth3 inet static
+ address 10.2.3.2
+ netmask 255.255.255.0
+
+ iface eth3 inet6 static
+ address 2002:3::2
+ netmask 120
+
+ auto eth4
+ iface eth4 inet static
+ address 10.2.4.2
+ netmask 255.255.255.0
+
+ iface eth4 inet6 static
+ address 2002:4::2
+ netmask 120
diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt
deleted file mode 100644
index a5f103b083a0..000000000000
--- a/Documentation/networking/vrf.txt
+++ /dev/null
@@ -1,418 +0,0 @@
-Virtual Routing and Forwarding (VRF)
-====================================
-The VRF device combined with ip rules provides the ability to create virtual
-routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
-Linux network stack. One use case is the multi-tenancy problem where each
-tenant has their own unique routing tables and in the very least need
-different default gateways.
-
-Processes can be "VRF aware" by binding a socket to the VRF device. Packets
-through the socket then use the routing table associated with the VRF
-device. An important feature of the VRF device implementation is that it
-impacts only Layer 3 and above so L2 tools (e.g., LLDP) are not affected
-(ie., they do not need to be run in each VRF). The design also allows
-the use of higher priority ip rules (Policy Based Routing, PBR) to take
-precedence over the VRF device rules directing specific traffic as desired.
-
-In addition, VRF devices allow VRFs to be nested within namespaces. For
-example network namespaces provide separation of network interfaces at the
-device layer, VLANs on the interfaces within a namespace provide L2 separation
-and then VRF devices provide L3 separation.
-
-Design
-------
-A VRF device is created with an associated route table. Network interfaces
-are then enslaved to a VRF device:
-
- +-----------------------------+
- | vrf-blue | ===> route table 10
- +-----------------------------+
- | | |
- +------+ +------+ +-------------+
- | eth1 | | eth2 | ... | bond1 |
- +------+ +------+ +-------------+
- | |
- +------+ +------+
- | eth8 | | eth9 |
- +------+ +------+
-
-Packets received on an enslaved device and are switched to the VRF device
-in the IPv4 and IPv6 processing stacks giving the impression that packets
-flow through the VRF device. Similarly on egress routing rules are used to
-send packets to the VRF device driver before getting sent out the actual
-interface. This allows tcpdump on a VRF device to capture all packets into
-and out of the VRF as a whole.[1] Similarly, netfilter[2] and tc rules can be
-applied using the VRF device to specify rules that apply to the VRF domain
-as a whole.
-
-[1] Packets in the forwarded state do not flow through the device, so those
- packets are not seen by tcpdump. Will revisit this limitation in a
- future release.
-
-[2] Iptables on ingress supports PREROUTING with skb->dev set to the real
- ingress device and both INPUT and PREROUTING rules with skb->dev set to
- the VRF device. For egress POSTROUTING and OUTPUT rules can be written
- using either the VRF device or real egress device.
-
-Setup
------
-1. VRF device is created with an association to a FIB table.
- e.g, ip link add vrf-blue type vrf table 10
- ip link set dev vrf-blue up
-
-2. An l3mdev FIB rule directs lookups to the table associated with the device.
- A single l3mdev rule is sufficient for all VRFs. The VRF device adds the
- l3mdev rule for IPv4 and IPv6 when the first device is created with a
- default preference of 1000. Users may delete the rule if desired and add
- with a different priority or install per-VRF rules.
-
- Prior to the v4.8 kernel iif and oif rules are needed for each VRF device:
- ip ru add oif vrf-blue table 10
- ip ru add iif vrf-blue table 10
-
-3. Set the default route for the table (and hence default route for the VRF).
- ip route add table 10 unreachable default metric 4278198272
-
- This high metric value ensures that the default unreachable route can
- be overridden by a routing protocol suite. FRRouting interprets
- kernel metrics as a combined admin distance (upper byte) and priority
- (lower 3 bytes). Thus the above metric translates to [255/8192].
-
-4. Enslave L3 interfaces to a VRF device.
- ip link set dev eth1 master vrf-blue
-
- Local and connected routes for enslaved devices are automatically moved to
- the table associated with VRF device. Any additional routes depending on
- the enslaved device are dropped and will need to be reinserted to the VRF
- FIB table following the enslavement.
-
- The IPv6 sysctl option keep_addr_on_down can be enabled to keep IPv6 global
- addresses as VRF enslavement changes.
- sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
-
-5. Additional VRF routes are added to associated table.
- ip route add table 10 ...
-
-
-Applications
-------------
-Applications that are to work within a VRF need to bind their socket to the
-VRF device:
-
- setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);
-
-or to specify the output device using cmsg and IP_PKTINFO.
-
-By default the scope of the port bindings for unbound sockets is
-limited to the default VRF. That is, it will not be matched by packets
-arriving on interfaces enslaved to an l3mdev and processes may bind to
-the same port if they bind to an l3mdev.
-
-TCP & UDP services running in the default VRF context (ie., not bound
-to any VRF device) can work across all VRF domains by enabling the
-tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:
-
- sysctl -w net.ipv4.tcp_l3mdev_accept=1
- sysctl -w net.ipv4.udp_l3mdev_accept=1
-
-These options are disabled by default so that a socket in a VRF is only
-selected for packets in that VRF. There is a similar option for RAW
-sockets, which is enabled by default for reasons of backwards compatibility.
-This is so as to specify the output device with cmsg and IP_PKTINFO, but
-using a socket not bound to the corresponding VRF. This allows e.g. older ping
-implementations to be run with specifying the device but without executing it
-in the VRF. This option can be disabled so that packets received in a VRF
-context are only handled by a raw socket bound to the VRF, and packets in the
-default VRF are only handled by a socket not bound to any VRF:
-
- sysctl -w net.ipv4.raw_l3mdev_accept=0
-
-netfilter rules on the VRF device can be used to limit access to services
-running in the default VRF context as well.
-
-################################################################################
-
-Using iproute2 for VRFs
-=======================
-iproute2 supports the vrf keyword as of v4.7. For backwards compatibility this
-section lists both commands where appropriate -- with the vrf keyword and the
-older form without it.
-
-1. Create a VRF
-
- To instantiate a VRF device and associate it with a table:
- $ ip link add dev NAME type vrf table ID
-
- As of v4.8 the kernel supports the l3mdev FIB rule where a single rule
- covers all VRFs. The l3mdev rule is created for IPv4 and IPv6 on first
- device create.
-
-2. List VRFs
-
- To list VRFs that have been created:
- $ ip [-d] link show type vrf
- NOTE: The -d option is needed to show the table id
-
- For example:
- $ ip -d link show type vrf
- 11: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
- link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0
- vrf table 1 addrgenmode eui64
- 12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
- link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0
- vrf table 10 addrgenmode eui64
- 13: blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
- link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0
- vrf table 66 addrgenmode eui64
- 14: green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
- link/ether e6:28:b8:63:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0
- vrf table 81 addrgenmode eui64
-
-
- Or in brief output:
-
- $ ip -br link show type vrf
- mgmt UP 72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP>
- red UP b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP>
- blue UP 36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP>
- green UP e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP>
-
-
-3. Assign a Network Interface to a VRF
-
- Network interfaces are assigned to a VRF by enslaving the netdevice to a
- VRF device:
- $ ip link set dev NAME master NAME
-
- On enslavement connected and local routes are automatically moved to the
- table associated with the VRF device.
-
- For example:
- $ ip link set dev eth0 master mgmt
-
-
-4. Show Devices Assigned to a VRF
-
- To show devices that have been assigned to a specific VRF add the master
- option to the ip command:
- $ ip link show vrf NAME
- $ ip link show master NAME
-
- For example:
- $ ip link show vrf red
- 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000
- link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff
- 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000
- link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff
- 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN mode DEFAULT group default qlen 1000
- link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff
-
-
- Or using the brief output:
- $ ip -br link show vrf red
- eth1 UP 02:00:00:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
- eth2 UP 02:00:00:00:02:03 <BROADCAST,MULTICAST,UP,LOWER_UP>
- eth5 DOWN 02:00:00:00:02:06 <BROADCAST,MULTICAST>
-
-
-5. Show Neighbor Entries for a VRF
-
- To list neighbor entries associated with devices enslaved to a VRF device
- add the master option to the ip command:
- $ ip [-6] neigh show vrf NAME
- $ ip [-6] neigh show master NAME
-
- For example:
- $ ip neigh show vrf red
- 10.2.1.254 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE
- 10.2.2.254 dev eth2 lladdr 5e:54:01:6a:ee:80 REACHABLE
-
- $ ip -6 neigh show vrf red
- 2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE
-
-
-6. Show Addresses for a VRF
-
- To show addresses for interfaces associated with a VRF add the master
- option to the ip command:
- $ ip addr show vrf NAME
- $ ip addr show master NAME
-
- For example:
- $ ip addr show vrf red
- 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
- link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff
- inet 10.2.1.2/24 brd 10.2.1.255 scope global eth1
- valid_lft forever preferred_lft forever
- inet6 2002:1::2/120 scope global
- valid_lft forever preferred_lft forever
- inet6 fe80::ff:fe00:202/64 scope link
- valid_lft forever preferred_lft forever
- 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
- link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff
- inet 10.2.2.2/24 brd 10.2.2.255 scope global eth2
- valid_lft forever preferred_lft forever
- inet6 2002:2::2/120 scope global
- valid_lft forever preferred_lft forever
- inet6 fe80::ff:fe00:203/64 scope link
- valid_lft forever preferred_lft forever
- 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN group default qlen 1000
- link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff
-
- Or in brief format:
- $ ip -br addr show vrf red
- eth1 UP 10.2.1.2/24 2002:1::2/120 fe80::ff:fe00:202/64
- eth2 UP 10.2.2.2/24 2002:2::2/120 fe80::ff:fe00:203/64
- eth5 DOWN
-
-
-7. Show Routes for a VRF
-
- To show routes for a VRF use the ip command to display the table associated
- with the VRF device:
- $ ip [-6] route show vrf NAME
- $ ip [-6] route show table ID
-
- For example:
- $ ip route show vrf red
- unreachable default metric 4278198272
- broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2
- 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2
- local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2
- broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2
- broadcast 10.2.2.0 dev eth2 proto kernel scope link src 10.2.2.2
- 10.2.2.0/24 dev eth2 proto kernel scope link src 10.2.2.2
- local 10.2.2.2 dev eth2 proto kernel scope host src 10.2.2.2
- broadcast 10.2.2.255 dev eth2 proto kernel scope link src 10.2.2.2
-
- $ ip -6 route show vrf red
- local 2002:1:: dev lo proto none metric 0 pref medium
- local 2002:1::2 dev lo proto none metric 0 pref medium
- 2002:1::/120 dev eth1 proto kernel metric 256 pref medium
- local 2002:2:: dev lo proto none metric 0 pref medium
- local 2002:2::2 dev lo proto none metric 0 pref medium
- 2002:2::/120 dev eth2 proto kernel metric 256 pref medium
- local fe80:: dev lo proto none metric 0 pref medium
- local fe80:: dev lo proto none metric 0 pref medium
- local fe80::ff:fe00:202 dev lo proto none metric 0 pref medium
- local fe80::ff:fe00:203 dev lo proto none metric 0 pref medium
- fe80::/64 dev eth1 proto kernel metric 256 pref medium
- fe80::/64 dev eth2 proto kernel metric 256 pref medium
- ff00::/8 dev red metric 256 pref medium
- ff00::/8 dev eth1 metric 256 pref medium
- ff00::/8 dev eth2 metric 256 pref medium
- unreachable default dev lo metric 4278198272 error -101 pref medium
-
-8. Route Lookup for a VRF
-
- A test route lookup can be done for a VRF:
- $ ip [-6] route get vrf NAME ADDRESS
- $ ip [-6] route get oif NAME ADDRESS
-
- For example:
- $ ip route get 10.2.1.40 vrf red
- 10.2.1.40 dev eth1 table red src 10.2.1.2
- cache
-
- $ ip -6 route get 2002:1::32 vrf red
- 2002:1::32 from :: dev eth1 table red proto kernel src 2002:1::2 metric 256 pref medium
-
-
-9. Removing Network Interface from a VRF
-
- Network interfaces are removed from a VRF by breaking the enslavement to
- the VRF device:
- $ ip link set dev NAME nomaster
-
- Connected routes are moved back to the default table and local entries are
- moved to the local table.
-
- For example:
- $ ip link set dev eth0 nomaster
-
---------------------------------------------------------------------------------
-
-Commands used in this example:
-
-cat >> /etc/iproute2/rt_tables.d/vrf.conf <<EOF
-1 mgmt
-10 red
-66 blue
-81 green
-EOF
-
-function vrf_create
-{
- VRF=$1
- TBID=$2
-
- # create VRF device
- ip link add ${VRF} type vrf table ${TBID}
-
- if [ "${VRF}" != "mgmt" ]; then
- ip route add table ${TBID} unreachable default metric 4278198272
- fi
- ip link set dev ${VRF} up
-}
-
-vrf_create mgmt 1
-ip link set dev eth0 master mgmt
-
-vrf_create red 10
-ip link set dev eth1 master red
-ip link set dev eth2 master red
-ip link set dev eth5 master red
-
-vrf_create blue 66
-ip link set dev eth3 master blue
-
-vrf_create green 81
-ip link set dev eth4 master green
-
-
-Interface addresses from /etc/network/interfaces:
-auto eth0
-iface eth0 inet static
- address 10.0.0.2
- netmask 255.255.255.0
- gateway 10.0.0.254
-
-iface eth0 inet6 static
- address 2000:1::2
- netmask 120
-
-auto eth1
-iface eth1 inet static
- address 10.2.1.2
- netmask 255.255.255.0
-
-iface eth1 inet6 static
- address 2002:1::2
- netmask 120
-
-auto eth2
-iface eth2 inet static
- address 10.2.2.2
- netmask 255.255.255.0
-
-iface eth2 inet6 static
- address 2002:2::2
- netmask 120
-
-auto eth3
-iface eth3 inet static
- address 10.2.3.2
- netmask 255.255.255.0
-
-iface eth3 inet6 static
- address 2002:3::2
- netmask 120
-
-auto eth4
-iface eth4 inet static
- address 10.2.4.2
- netmask 255.255.255.0
-
-iface eth4 inet6 static
- address 2002:4::2
- netmask 120
diff --git a/Documentation/networking/vxlan.txt b/Documentation/networking/vxlan.rst
index c28f4989c3f0..ce239fa01848 100644
--- a/Documentation/networking/vxlan.txt
+++ b/Documentation/networking/vxlan.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
Virtual eXtensible Local Area Networking documentation
======================================================
@@ -21,8 +24,9 @@ neighbors GRE and VLAN. Configuring VXLAN requires the version of
iproute2 that matches the kernel release where VXLAN was first merged
upstream.
-1. Create vxlan device
- # ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth1 dstport 4789
+1. Create vxlan device::
+
+ # ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth1 dstport 4789
This creates a new device named vxlan0. The device uses the multicast
group 239.1.1.1 over eth1 to handle traffic for which there is no
@@ -32,20 +36,25 @@ pre-dates the IANA's selection of a standard destination port number
and uses the Linux-selected value by default to maintain backwards
compatibility.
-2. Delete vxlan device
- # ip link delete vxlan0
+2. Delete vxlan device::
+
+ # ip link delete vxlan0
-3. Show vxlan info
- # ip -d link show vxlan0
+3. Show vxlan info::
+
+ # ip -d link show vxlan0
It is possible to create, destroy and display the vxlan
forwarding table using the new bridge command.
-1. Create forwarding table entry
- # bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0
+1. Create forwarding table entry::
+
+ # bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0
+
+2. Delete forwarding table entry::
+
+ # bridge fdb delete 00:17:42:8a:b4:05 dev vxlan0
-2. Delete forwarding table entry
- # bridge fdb delete 00:17:42:8a:b4:05 dev vxlan0
+3. Show forwarding table::
-3. Show forwarding table
- # bridge fdb show dev vxlan0
+ # bridge fdb show dev vxlan0
diff --git a/Documentation/networking/x25-iface.txt b/Documentation/networking/x25-iface.rst
index 7f213b556e85..df401891dce6 100644
--- a/Documentation/networking/x25-iface.txt
+++ b/Documentation/networking/x25-iface.rst
@@ -1,4 +1,10 @@
- X.25 Device Driver Interface 1.1
+.. SPDX-License-Identifier: GPL-2.0
+
+============================-
+X.25 Device Driver Interface
+============================-
+
+Version 1.1
Jonathan Naylor 26.12.96
@@ -99,7 +105,7 @@ reduced by the following measures or a combination thereof:
(1) Drivers for kernel versions 2.4.x and above should always check the
return value of netif_rx(). If it returns NET_RX_DROP, the
driver's LAPB protocol must not confirm reception of the frame
- to the peer.
+ to the peer.
This will reliably suppress packet loss. The LAPB protocol will
automatically cause the peer to re-transmit the dropped packet
later.
diff --git a/Documentation/networking/x25.txt b/Documentation/networking/x25.rst
index c91c6d7159ff..00e45d384ba0 100644
--- a/Documentation/networking/x25.txt
+++ b/Documentation/networking/x25.rst
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
Linux X.25 Project
+==================
As my third year dissertation at University I have taken it upon myself to
write an X.25 implementation for Linux. My aim is to provide a complete X.25
diff --git a/Documentation/networking/xfrm_device.txt b/Documentation/networking/xfrm_device.rst
index a1c904dc70dc..da1073acda96 100644
--- a/Documentation/networking/xfrm_device.txt
+++ b/Documentation/networking/xfrm_device.rst
@@ -1,7 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
===============================================
XFRM device - offloading the IPsec computations
===============================================
+
Shannon Nelson <shannon.nelson@oracle.com>
@@ -19,7 +21,7 @@ hardware offload.
Userland access to the offload is typically through a system such as
libreswan or KAME/raccoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting. An example command might look something
-like this:
+like this::
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
reqid 0x07 replay-window 32 \
@@ -34,15 +36,17 @@ Yes, that's ugly, but that's what shell scripts and/or libreswan are for.
Callbacks to implement
======================
-/* from include/linux/netdevice.h */
-struct xfrmdev_ops {
+::
+
+ /* from include/linux/netdevice.h */
+ struct xfrmdev_ops {
int (*xdo_dev_state_add) (struct xfrm_state *x);
void (*xdo_dev_state_delete) (struct xfrm_state *x);
void (*xdo_dev_state_free) (struct xfrm_state *x);
bool (*xdo_dev_offload_ok) (struct sk_buff *skb,
struct xfrm_state *x);
void (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
-};
+ };
The NIC driver offering ipsec offload will need to implement these
callbacks to make the offload available to the network stack's
@@ -58,6 +62,8 @@ At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.
+::
+
adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
adapter->netdev->features |= NETIF_F_HW_ESP;
adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;
@@ -65,16 +71,20 @@ The XFRM code's listener will finish the setup on NETDEV_REGISTER.
When new SAs are set up with a request for "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx. The driver should
+
- verify the algorithm is supported for offloads
- store the SA information (key, salt, target-ip, protocol, etc)
- enable the HW offload of the SA
- return status value:
+
+ =========== ===================================
0 success
-EOPNETSUPP offload not supported, try SW IPsec
other fail the request
+ =========== ===================================
The driver can also set an offload_handle in the SA, an opaque void pointer
-that can be used to convey context into the fast-path offload requests.
+that can be used to convey context into the fast-path offload requests::
xs->xso.offload_handle = context;
@@ -88,7 +98,7 @@ return true of false to signify its support.
When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
-send accordingly.
+send accordingly::
xs = xfrm_input_state(skb);
context = xs->xso.offload_handle;
@@ -105,18 +115,21 @@ the packet's skb. At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().
- find and hold the SA that was used to the Rx skb
+ find and hold the SA that was used to the Rx skb::
+
get spi, protocol, and destination IP from packet headers
xs = find xs from (spi, protocol, dest_IP)
xfrm_state_hold(xs);
- store the state information into the skb
+ store the state information into the skb::
+
sp = secpath_set(skb);
if (!sp) return;
sp->xvec[sp->len++] = xs;
sp->olen++;
- indicate the success and/or error status of the offload
+ indicate the success and/or error status of the offload::
+
xo = xfrm_offload(skb);
xo->flags = CRYPTO_DONE;
xo->status = crypto_status;
@@ -136,5 +149,3 @@ hardware needs.
As a netdev is set to DOWN the XFRM stack's netdev listener will call
xdo_dev_state_delete() and xdo_dev_state_free() on any remaining offloaded
states.
-
-
diff --git a/Documentation/networking/xfrm_proc.txt b/Documentation/networking/xfrm_proc.rst
index 2eae619ab67b..0a771c5a7399 100644
--- a/Documentation/networking/xfrm_proc.txt
+++ b/Documentation/networking/xfrm_proc.rst
@@ -1,5 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
XFRM proc - /proc/net/xfrm_* files
==================================
+
Masahide NAKAMURA <nakam@linux-ipv6.org>
@@ -14,42 +18,58 @@ as part of the linux private MIB. These counters can be viewed in
Inbound errors
~~~~~~~~~~~~~~
+
XfrmInError:
All errors which is not matched others
+
XfrmInBufferError:
No buffer is left
+
XfrmInHdrError:
Header error
+
XfrmInNoStates:
No state is found
i.e. Either inbound SPI, address, or IPsec protocol at SA is wrong
+
XfrmInStateProtoError:
Transformation protocol specific error
e.g. SA key is wrong
+
XfrmInStateModeError:
Transformation mode specific error
+
XfrmInStateSeqError:
Sequence error
i.e. Sequence number is out of window
+
XfrmInStateExpired:
State is expired
+
XfrmInStateMismatch:
State has mismatch option
e.g. UDP encapsulation type is mismatch
+
XfrmInStateInvalid:
State is invalid
+
XfrmInTmplMismatch:
No matching template for states
e.g. Inbound SAs are correct but SP rule is wrong
+
XfrmInNoPols:
No policy is found for states
e.g. Inbound SAs are correct but no SP is found
+
XfrmInPolBlock:
Policy discards
+
XfrmInPolError:
Policy error
+
XfrmAcquireError:
State hasn't been fully acquired before use
+
XfrmFwdHdrError:
Forward routing of a packet is not allowed
@@ -57,26 +77,37 @@ Outbound errors
~~~~~~~~~~~~~~~
XfrmOutError:
All errors which is not matched others
+
XfrmOutBundleGenError:
Bundle generation error
+
XfrmOutBundleCheckError:
Bundle check error
+
XfrmOutNoStates:
No state is found
+
XfrmOutStateProtoError:
Transformation protocol specific error
+
XfrmOutStateModeError:
Transformation mode specific error
+
XfrmOutStateSeqError:
Sequence error
i.e. Sequence number overflow
+
XfrmOutStateExpired:
State is expired
+
XfrmOutPolBlock:
Policy discards
+
XfrmOutPolDead:
Policy is dead
+
XfrmOutPolError:
Policy error
+
XfrmOutStateInvalid:
State is invalid, perhaps expired
diff --git a/Documentation/networking/xfrm_sync.txt b/Documentation/networking/xfrm_sync.rst
index 8d88e0f2ec49..6246503ceab2 100644
--- a/Documentation/networking/xfrm_sync.txt
+++ b/Documentation/networking/xfrm_sync.rst
@@ -1,3 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+XFRM
+====
The sync patches work is based on initial patches from
Krisztian <hidden@balabit.hu> and others and additional patches
@@ -40,30 +45,32 @@ The netlink message types are:
XFRM_MSG_NEWAE and XFRM_MSG_GETAE.
A XFRM_MSG_GETAE does not have TLVs.
+
A XFRM_MSG_NEWAE will have at least two TLVs (as is
discussed further below).
-aevent_id structure looks like:
+aevent_id structure looks like::
struct xfrm_aevent_id {
- struct xfrm_usersa_id sa_id;
- xfrm_address_t saddr;
- __u32 flags;
- __u32 reqid;
+ struct xfrm_usersa_id sa_id;
+ xfrm_address_t saddr;
+ __u32 flags;
+ __u32 reqid;
};
The unique SA is identified by the combination of xfrm_usersa_id,
reqid and saddr.
flags are used to indicate different things. The possible
-flags are:
- XFRM_AE_RTHR=1, /* replay threshold*/
- XFRM_AE_RVAL=2, /* replay value */
- XFRM_AE_LVAL=4, /* lifetime value */
- XFRM_AE_ETHR=8, /* expiry timer threshold */
- XFRM_AE_CR=16, /* Event cause is replay update */
- XFRM_AE_CE=32, /* Event cause is timer expiry */
- XFRM_AE_CU=64, /* Event cause is policy update */
+flags are::
+
+ XFRM_AE_RTHR=1, /* replay threshold*/
+ XFRM_AE_RVAL=2, /* replay value */
+ XFRM_AE_LVAL=4, /* lifetime value */
+ XFRM_AE_ETHR=8, /* expiry timer threshold */
+ XFRM_AE_CR=16, /* Event cause is replay update */
+ XFRM_AE_CE=32, /* Event cause is timer expiry */
+ XFRM_AE_CU=64, /* Event cause is policy update */
How these flags are used is dependent on the direction of the
message (kernel<->user) as well the cause (config, query or event).
@@ -80,23 +87,27 @@ to get notified of these events.
-----------------------------------------
a) byte value (XFRMA_LTIME_VAL)
+
This TLV carries the running/current counter for byte lifetime since
last event.
b)replay value (XFRMA_REPLAY_VAL)
+
This TLV carries the running/current counter for replay sequence since
last event.
c)replay threshold (XFRMA_REPLAY_THRESH)
+
This TLV carries the threshold being used by the kernel to trigger events
when the replay sequence is exceeded.
d) expiry timer (XFRMA_ETIMER_THRESH)
+
This is a timer value in milliseconds which is used as the nagle
value to rate limit the events.
3) Default configurations for the parameters:
-----------------------------------------------
+---------------------------------------------
By default these events should be turned off unless there is
at least one listener registered to listen to the multicast
@@ -108,6 +119,7 @@ we also provide default threshold values for these different parameters
in case they are not specified.
the two sysctls/proc entries are:
+
a) /proc/sys/net/core/sysctl_xfrm_aevent_etime
used to provide default values for the XFRMA_ETIMER_THRESH in incremental
units of time of 100ms. The default is 10 (1 second)
@@ -120,37 +132,45 @@ in incremental packet count. The default is two packets.
----------------
a) XFRM_MSG_GETAE issued by user-->kernel.
-XFRM_MSG_GETAE does not carry any TLVs.
+ XFRM_MSG_GETAE does not carry any TLVs.
+
The response is a XFRM_MSG_NEWAE which is formatted based on what
XFRM_MSG_GETAE queried for.
+
The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-*if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
-*if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved
+* if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
+* if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved
b) XFRM_MSG_NEWAE is issued by either user space to configure
-or kernel to announce events or respond to a XFRM_MSG_GETAE.
+ or kernel to announce events or respond to a XFRM_MSG_GETAE.
i) user --> kernel to configure a specific SA.
+
any of the values or threshold parameters can be updated by passing the
appropriate TLV.
+
A response is issued back to the sender in user space to indicate success
or failure.
+
In the case of success, additionally an event with
XFRM_MSG_NEWAE is also issued to any listeners as described in iii).
ii) kernel->user direction as a response to XFRM_MSG_GETAE
+
The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
+
The threshold TLVs will be included if explicitly requested in
the XFRM_MSG_GETAE message.
iii) kernel->user to report as event if someone sets any values or
-thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above).
-In such a case XFRM_AE_CU flag is set to inform the user that
-the change happened as a result of an update.
-The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
+ thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above).
+ In such a case XFRM_AE_CU flag is set to inform the user that
+ the change happened as a result of an update.
+ The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
iv) kernel->user to report event when replay threshold or a timeout
-is exceeded.
+ is exceeded.
+
In such a case either XFRM_AE_CR (replay exceeded) or XFRM_AE_CE (timeout
happened) is set to inform the user what happened.
Note the two flags are mutually exclusive.
diff --git a/Documentation/networking/xfrm_sysctl.txt b/Documentation/networking/xfrm_sysctl.rst
index 5bbd16792fe1..47b9bbdd0179 100644
--- a/Documentation/networking/xfrm_sysctl.txt
+++ b/Documentation/networking/xfrm_sysctl.rst
@@ -1,4 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+XFRM Syscall
+============
+
/proc/sys/net/core/xfrm_* Variables:
+====================================
xfrm_acq_expires - INTEGER
default 30 - hard timeout in seconds for acquire requests
diff --git a/Documentation/networking/z8530drv.txt b/Documentation/networking/z8530drv.rst
index 2206abbc3e1b..d2942760f167 100644
--- a/Documentation/networking/z8530drv.txt
+++ b/Documentation/networking/z8530drv.rst
@@ -1,33 +1,30 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=========================================================
+SCC.C - Linux driver for Z8530 based HDLC cards for AX.25
+=========================================================
+
+
This is a subset of the documentation. To use this driver you MUST have the
full package from:
Internet:
-=========
-1. ftp://ftp.ccac.rwth-aachen.de/pub/jr/z8530drv-utils_3.0-3.tar.gz
+ 1. ftp://ftp.ccac.rwth-aachen.de/pub/jr/z8530drv-utils_3.0-3.tar.gz
-2. ftp://ftp.pspt.fi/pub/ham/linux/ax25/z8530drv-utils_3.0-3.tar.gz
+ 2. ftp://ftp.pspt.fi/pub/ham/linux/ax25/z8530drv-utils_3.0-3.tar.gz
Please note that the information in this document may be hopelessly outdated.
A new version of the documentation, along with links to other important
Linux Kernel AX.25 documentation and programs, is available on
http://yaina.de/jreuter
------------------------------------------------------------------------------
-
-
- SCC.C - Linux driver for Z8530 based HDLC cards for AX.25
-
- ********************************************************************
-
- (c) 1993,2000 by Joerg Reuter DL1BKE <jreuter@yaina.de>
-
- portions (c) 1993 Guido ten Dolle PE1NNZ
-
- for the complete copyright notice see >> Copying.Z8530DRV <<
+Copyright |copy| 1993,2000 by Joerg Reuter DL1BKE <jreuter@yaina.de>
- ********************************************************************
+portions Copyright |copy| 1993 Guido ten Dolle PE1NNZ
+for the complete copyright notice see >> Copying.Z8530DRV <<
1. Initialization of the driver
===============================
@@ -50,7 +47,7 @@ AX.25-HOWTO on how to emulate a KISS TNC on network device drivers.
(If you're going to compile the driver as a part of the kernel image,
skip this chapter and continue with 1.2)
-Before you can use a module, you'll have to load it with
+Before you can use a module, you'll have to load it with::
insmod scc.o
@@ -75,61 +72,73 @@ The file itself consists of two main sections.
==========================================
The hardware setup section defines the following parameters for each
-Z8530:
-
-chip 1
-data_a 0x300 # data port A
-ctrl_a 0x304 # control port A
-data_b 0x301 # data port B
-ctrl_b 0x305 # control port B
-irq 5 # IRQ No. 5
-pclock 4915200 # clock
-board BAYCOM # hardware type
-escc no # enhanced SCC chip? (8580/85180/85280)
-vector 0 # latch for interrupt vector
-special no # address of special function register
-option 0 # option to set via sfr
-
-
-chip - this is just a delimiter to make sccinit a bit simpler to
+Z8530::
+
+ chip 1
+ data_a 0x300 # data port A
+ ctrl_a 0x304 # control port A
+ data_b 0x301 # data port B
+ ctrl_b 0x305 # control port B
+ irq 5 # IRQ No. 5
+ pclock 4915200 # clock
+ board BAYCOM # hardware type
+ escc no # enhanced SCC chip? (8580/85180/85280)
+ vector 0 # latch for interrupt vector
+ special no # address of special function register
+ option 0 # option to set via sfr
+
+
+chip
+ - this is just a delimiter to make sccinit a bit simpler to
program. A parameter has no effect.
-data_a - the address of the data port A of this Z8530 (needed)
-ctrl_a - the address of the control port A (needed)
-data_b - the address of the data port B (needed)
-ctrl_b - the address of the control port B (needed)
-
-irq - the used IRQ for this chip. Different chips can use different
- IRQs or the same. If they share an interrupt, it needs to be
+data_a
+ - the address of the data port A of this Z8530 (needed)
+ctrl_a
+ - the address of the control port A (needed)
+data_b
+ - the address of the data port B (needed)
+ctrl_b
+ - the address of the control port B (needed)
+
+irq
+ - the used IRQ for this chip. Different chips can use different
+ IRQs or the same. If they share an interrupt, it needs to be
specified within one chip-definition only.
pclock - the clock at the PCLK pin of the Z8530 (option, 4915200 is
- default), measured in Hertz
+ default), measured in Hertz
-board - the "type" of the board:
+board
+ - the "type" of the board:
+ ======================= ========
SCC type value
- ---------------------------------
+ ======================= ========
PA0HZP SCC card PA0HZP
EAGLE card EAGLE
PC100 card PC100
PRIMUS-PC (DG9BL) card PRIMUS
BayCom (U)SCC card BAYCOM
+ ======================= ========
-escc - if you want support for ESCC chips (8580, 85180, 85280), set
- this to "yes" (option, defaults to "no")
+escc
+ - if you want support for ESCC chips (8580, 85180, 85280), set
+ this to "yes" (option, defaults to "no")
-vector - address of the vector latch (aka "intack port") for PA0HZP
- cards. There can be only one vector latch for all chips!
+vector
+ - address of the vector latch (aka "intack port") for PA0HZP
+ cards. There can be only one vector latch for all chips!
(option, defaults to 0)
-special - address of the special function register on several cards.
- (option, defaults to 0)
+special
+ - address of the special function register on several cards.
+ (option, defaults to 0)
option - The value you write into that register (option, default is 0)
You can specify up to four chips (8 channels). If this is not enough,
-just change
+just change::
#define MAXSCC 4
@@ -138,75 +147,81 @@ to a higher value.
Example for the BAYCOM USCC:
----------------------------
-chip 1
-data_a 0x300 # data port A
-ctrl_a 0x304 # control port A
-data_b 0x301 # data port B
-ctrl_b 0x305 # control port B
-irq 5 # IRQ No. 5 (#)
-board BAYCOM # hardware type (*)
-#
-# SCC chip 2
-#
-chip 2
-data_a 0x302
-ctrl_a 0x306
-data_b 0x303
-ctrl_b 0x307
-board BAYCOM
+::
+
+ chip 1
+ data_a 0x300 # data port A
+ ctrl_a 0x304 # control port A
+ data_b 0x301 # data port B
+ ctrl_b 0x305 # control port B
+ irq 5 # IRQ No. 5 (#)
+ board BAYCOM # hardware type (*)
+ #
+ # SCC chip 2
+ #
+ chip 2
+ data_a 0x302
+ ctrl_a 0x306
+ data_b 0x303
+ ctrl_b 0x307
+ board BAYCOM
An example for a PA0HZP card:
-----------------------------
-chip 1
-data_a 0x153
-data_b 0x151
-ctrl_a 0x152
-ctrl_b 0x150
-irq 9
-pclock 4915200
-board PA0HZP
-vector 0x168
-escc no
-#
-#
-#
-chip 2
-data_a 0x157
-data_b 0x155
-ctrl_a 0x156
-ctrl_b 0x154
-irq 9
-pclock 4915200
-board PA0HZP
-vector 0x168
-escc no
+::
+
+ chip 1
+ data_a 0x153
+ data_b 0x151
+ ctrl_a 0x152
+ ctrl_b 0x150
+ irq 9
+ pclock 4915200
+ board PA0HZP
+ vector 0x168
+ escc no
+ #
+ #
+ #
+ chip 2
+ data_a 0x157
+ data_b 0x155
+ ctrl_a 0x156
+ ctrl_b 0x154
+ irq 9
+ pclock 4915200
+ board PA0HZP
+ vector 0x168
+ escc no
A DRSI would should probably work with this:
--------------------------------------------
(actually: two DRSI cards...)
-chip 1
-data_a 0x303
-data_b 0x301
-ctrl_a 0x302
-ctrl_b 0x300
-irq 7
-pclock 4915200
-board DRSI
-escc no
-#
-#
-#
-chip 2
-data_a 0x313
-data_b 0x311
-ctrl_a 0x312
-ctrl_b 0x310
-irq 7
-pclock 4915200
-board DRSI
-escc no
+::
+
+ chip 1
+ data_a 0x303
+ data_b 0x301
+ ctrl_a 0x302
+ ctrl_b 0x300
+ irq 7
+ pclock 4915200
+ board DRSI
+ escc no
+ #
+ #
+ #
+ chip 2
+ data_a 0x313
+ data_b 0x311
+ ctrl_a 0x312
+ ctrl_b 0x310
+ irq 7
+ pclock 4915200
+ board DRSI
+ escc no
Note that you cannot use the on-board baudrate generator off DRSI
cards. Use "mode dpll" for clock source (see below).
@@ -220,17 +235,19 @@ The utility "gencfg"
If you only know the parameters for the PE1CHL driver for DOS,
run gencfg. It will generate the correct port addresses (I hope).
Its parameters are exactly the same as the ones you use with
-the "attach scc" command in net, except that the string "init" must
-not appear. Example:
+the "attach scc" command in net, except that the string "init" must
+not appear. Example::
-gencfg 2 0x150 4 2 0 1 0x168 9 4915200
+ gencfg 2 0x150 4 2 0 1 0x168 9 4915200
will print a skeleton z8530drv.conf for the OptoSCC to stdout.
-gencfg 2 0x300 2 4 5 -4 0 7 4915200 0x10
+::
+
+ gencfg 2 0x300 2 4 5 -4 0 7 4915200 0x10
does the same for the BAYCOM USCC card. In my opinion it is much easier
-to edit scc_config.h...
+to edit scc_config.h...
1.2.2 channel configuration
@@ -239,58 +256,58 @@ to edit scc_config.h...
The channel definition is divided into three sub sections for each
channel:
-An example for scc0:
-
-# DEVICE
-
-device scc0 # the device for the following params
-
-# MODEM / BUFFERS
-
-speed 1200 # the default baudrate
-clock dpll # clock source:
- # dpll = normal half duplex operation
- # external = MODEM provides own Rx/Tx clock
- # divider = use full duplex divider if
- # installed (1)
-mode nrzi # HDLC encoding mode
- # nrzi = 1k2 MODEM, G3RUH 9k6 MODEM
- # nrz = DF9IC 9k6 MODEM
- #
-bufsize 384 # size of buffers. Note that this must include
- # the AX.25 header, not only the data field!
- # (optional, defaults to 384)
-
-# KISS (Layer 1)
-
-txdelay 36 # (see chapter 1.4)
-persist 64
-slot 8
-tail 8
-fulldup 0
-wait 12
-min 3
-maxkey 7
-idle 3
-maxdef 120
-group 0
-txoff off
-softdcd on
-slip off
+An example for scc0::
+
+ # DEVICE
+
+ device scc0 # the device for the following params
+
+ # MODEM / BUFFERS
+
+ speed 1200 # the default baudrate
+ clock dpll # clock source:
+ # dpll = normal half duplex operation
+ # external = MODEM provides own Rx/Tx clock
+ # divider = use full duplex divider if
+ # installed (1)
+ mode nrzi # HDLC encoding mode
+ # nrzi = 1k2 MODEM, G3RUH 9k6 MODEM
+ # nrz = DF9IC 9k6 MODEM
+ #
+ bufsize 384 # size of buffers. Note that this must include
+ # the AX.25 header, not only the data field!
+ # (optional, defaults to 384)
+
+ # KISS (Layer 1)
+
+ txdelay 36 # (see chapter 1.4)
+ persist 64
+ slot 8
+ tail 8
+ fulldup 0
+ wait 12
+ min 3
+ maxkey 7
+ idle 3
+ maxdef 120
+ group 0
+ txoff off
+ softdcd on
+ slip off
The order WITHIN these sections is unimportant. The order OF these
sections IS important. The MODEM parameters are set with the first
recognized KISS parameter...
Please note that you can initialize the board only once after boot
-(or insmod). You can change all parameters but "mode" and "clock"
-later with the Sccparam program or through KISS. Just to avoid
-security holes...
+(or insmod). You can change all parameters but "mode" and "clock"
+later with the Sccparam program or through KISS. Just to avoid
+security holes...
(1) this divider is usually mounted on the SCC-PBC (PA0HZP) or not
- present at all (BayCom). It feeds back the output of the DPLL
- (digital pll) as transmit clock. Using this mode without a divider
- installed will normally result in keying the transceiver until
+ present at all (BayCom). It feeds back the output of the DPLL
+ (digital pll) as transmit clock. Using this mode without a divider
+ installed will normally result in keying the transceiver until
maxkey expires --- of course without sending anything (useful).
2. Attachment of a channel by your AX.25 software
@@ -299,15 +316,15 @@ security holes...
2.1 Kernel AX.25
================
-To set up an AX.25 device you can simply type:
+To set up an AX.25 device you can simply type::
ifconfig scc0 44.128.1.1 hw ax25 dl0tha-7
-This will create a network interface with the IP number 44.128.20.107
-and the callsign "dl0tha". If you do not have any IP number (yet) you
-can use any of the 44.128.0.0 network. Note that you do not need
-axattach. The purpose of axattach (like slattach) is to create a KISS
-network device linked to a TTY. Please read the documentation of the
+This will create a network interface with the IP number 44.128.20.107
+and the callsign "dl0tha". If you do not have any IP number (yet) you
+can use any of the 44.128.0.0 network. Note that you do not need
+axattach. The purpose of axattach (like slattach) is to create a KISS
+network device linked to a TTY. Please read the documentation of the
ax25-utils and the AX.25-HOWTO to learn how to set the parameters of
the kernel AX.25.
@@ -318,16 +335,16 @@ Since the TTY driver (aka KISS TNC emulation) is gone you need
to emulate the old behaviour. The cost of using these programs is
that you probably need to compile the kernel AX.25, regardless of whether
you actually use it or not. First setup your /etc/ax25/axports,
-for example:
+for example::
9k6 dl0tha-9 9600 255 4 9600 baud port (scc3)
axlink dl0tha-15 38400 255 4 Link to NOS
-Now "ifconfig" the scc device:
+Now "ifconfig" the scc device::
ifconfig scc3 44.128.1.1 hw ax25 dl0tha-9
-You can now axattach a pseudo-TTY:
+You can now axattach a pseudo-TTY::
axattach /dev/ptys0 axlink
@@ -335,11 +352,11 @@ and start your NOS and attach /dev/ptys0 there. The problem is that
NOS is reachable only via digipeating through the kernel AX.25
(disastrous on a DAMA controlled channel). To solve this problem,
configure "rxecho" to echo the incoming frames from "9k6" to "axlink"
-and outgoing frames from "axlink" to "9k6" and start:
+and outgoing frames from "axlink" to "9k6" and start::
rxecho
-Or simply use "kissbridge" coming with z8530drv-utils:
+Or simply use "kissbridge" coming with z8530drv-utils::
ifconfig scc3 hw ax25 dl0tha-9
kissbridge scc3 /dev/ptys0
@@ -351,55 +368,57 @@ Or simply use "kissbridge" coming with z8530drv-utils:
3.1 Displaying SCC Parameters:
==============================
-Once a SCC channel has been attached, the parameter settings and
-some statistic information can be shown using the param program:
+Once a SCC channel has been attached, the parameter settings and
+some statistic information can be shown using the param program::
-dl1bke-u:~$ sccstat scc0
+ dl1bke-u:~$ sccstat scc0
-Parameters:
+ Parameters:
-speed : 1200 baud
-txdelay : 36
-persist : 255
-slottime : 0
-txtail : 8
-fulldup : 1
-waittime : 12
-mintime : 3 sec
-maxkeyup : 7 sec
-idletime : 3 sec
-maxdefer : 120 sec
-group : 0x00
-txoff : off
-softdcd : on
-SLIP : off
+ speed : 1200 baud
+ txdelay : 36
+ persist : 255
+ slottime : 0
+ txtail : 8
+ fulldup : 1
+ waittime : 12
+ mintime : 3 sec
+ maxkeyup : 7 sec
+ idletime : 3 sec
+ maxdefer : 120 sec
+ group : 0x00
+ txoff : off
+ softdcd : on
+ SLIP : off
-Status:
+ Status:
-HDLC Z8530 Interrupts Buffers
------------------------------------------------------------------------
-Sent : 273 RxOver : 0 RxInts : 125074 Size : 384
-Received : 1095 TxUnder: 0 TxInts : 4684 NoSpace : 0
-RxErrors : 1591 ExInts : 11776
-TxErrors : 0 SpInts : 1503
-Tx State : idle
+ HDLC Z8530 Interrupts Buffers
+ -----------------------------------------------------------------------
+ Sent : 273 RxOver : 0 RxInts : 125074 Size : 384
+ Received : 1095 TxUnder: 0 TxInts : 4684 NoSpace : 0
+ RxErrors : 1591 ExInts : 11776
+ TxErrors : 0 SpInts : 1503
+ Tx State : idle
The status info shown is:
-Sent - number of frames transmitted
-Received - number of frames received
-RxErrors - number of receive errors (CRC, ABORT)
-TxErrors - number of discarded Tx frames (due to various reasons)
-Tx State - status of the Tx interrupt handler: idle/busy/active/tail (2)
-RxOver - number of receiver overruns
-TxUnder - number of transmitter underruns
-RxInts - number of receiver interrupts
-TxInts - number of transmitter interrupts
-EpInts - number of receiver special condition interrupts
-SpInts - number of external/status interrupts
-Size - maximum size of an AX.25 frame (*with* AX.25 headers!)
-NoSpace - number of times a buffer could not get allocated
+============== ==============================================================
+Sent number of frames transmitted
+Received number of frames received
+RxErrors number of receive errors (CRC, ABORT)
+TxErrors number of discarded Tx frames (due to various reasons)
+Tx State status of the Tx interrupt handler: idle/busy/active/tail (2)
+RxOver number of receiver overruns
+TxUnder number of transmitter underruns
+RxInts number of receiver interrupts
+TxInts number of transmitter interrupts
+EpInts number of receiver special condition interrupts
+SpInts number of external/status interrupts
+Size maximum size of an AX.25 frame (*with* AX.25 headers!)
+NoSpace number of times a buffer could not get allocated
+============== ==============================================================
An overrun is abnormal. If lots of these occur, the product of
baudrate and number of interfaces is too high for the processing
@@ -411,32 +430,34 @@ driver or the kernel AX.25.
======================
-The setting of parameters of the emulated KISS TNC is done in the
+The setting of parameters of the emulated KISS TNC is done in the
same way in the SCC driver. You can change parameters by using
-the kissparms program from the ax25-utils package or use the program
-"sccparam":
+the kissparms program from the ax25-utils package or use the program
+"sccparam"::
sccparam <device> <paramname> <decimal-|hexadecimal value>
You can change the following parameters:
-param : value
-------------------------
-speed : 1200
-txdelay : 36
-persist : 255
-slottime : 0
-txtail : 8
-fulldup : 1
-waittime : 12
-mintime : 3
-maxkeyup : 7
-idletime : 3
-maxdefer : 120
-group : 0x00
-txoff : off
-softdcd : on
-SLIP : off
+=========== =====
+param value
+=========== =====
+speed 1200
+txdelay 36
+persist 255
+slottime 0
+txtail 8
+fulldup 1
+waittime 12
+mintime 3
+maxkeyup 7
+idletime 3
+maxdefer 120
+group 0x00
+txoff off
+softdcd on
+SLIP off
+=========== =====
The parameters have the following meaning:
@@ -447,92 +468,92 @@ speed:
Example: sccparam /dev/scc3 speed 9600
txdelay:
- The delay (in units of 10 ms) after keying of the
- transmitter, until the first byte is sent. This is usually
- called "TXDELAY" in a TNC. When 0 is specified, the driver
- will just wait until the CTS signal is asserted. This
- assumes the presence of a timer or other circuitry in the
- MODEM and/or transmitter, that asserts CTS when the
+ The delay (in units of 10 ms) after keying of the
+ transmitter, until the first byte is sent. This is usually
+ called "TXDELAY" in a TNC. When 0 is specified, the driver
+ will just wait until the CTS signal is asserted. This
+ assumes the presence of a timer or other circuitry in the
+ MODEM and/or transmitter, that asserts CTS when the
transmitter is ready for data.
A normal value of this parameter is 30-36.
Example: sccparam /dev/scc0 txd 20
persist:
- This is the probability that the transmitter will be keyed
- when the channel is found to be free. It is a value from 0
- to 255, and the probability is (value+1)/256. The value
- should be somewhere near 50-60, and should be lowered when
+ This is the probability that the transmitter will be keyed
+ when the channel is found to be free. It is a value from 0
+ to 255, and the probability is (value+1)/256. The value
+ should be somewhere near 50-60, and should be lowered when
the channel is used more heavily.
Example: sccparam /dev/scc2 persist 20
slottime:
- This is the time between samples of the channel. It is
- expressed in units of 10 ms. About 200-300 ms (value 20-30)
+ This is the time between samples of the channel. It is
+ expressed in units of 10 ms. About 200-300 ms (value 20-30)
seems to be a good value.
Example: sccparam /dev/scc0 slot 20
tail:
- The time the transmitter will remain keyed after the last
- byte of a packet has been transferred to the SCC. This is
- necessary because the CRC and a flag still have to leave the
- SCC before the transmitter is keyed down. The value depends
- on the baudrate selected. A few character times should be
+ The time the transmitter will remain keyed after the last
+ byte of a packet has been transferred to the SCC. This is
+ necessary because the CRC and a flag still have to leave the
+ SCC before the transmitter is keyed down. The value depends
+ on the baudrate selected. A few character times should be
sufficient, e.g. 40ms at 1200 baud. (value 4)
The value of this parameter is in 10 ms units.
Example: sccparam /dev/scc2 4
full:
- The full-duplex mode switch. This can be one of the following
+ The full-duplex mode switch. This can be one of the following
values:
- 0: The interface will operate in CSMA mode (the normal
- half-duplex packet radio operation)
- 1: Fullduplex mode, i.e. the transmitter will be keyed at
- any time, without checking the received carrier. It
- will be unkeyed when there are no packets to be sent.
- 2: Like 1, but the transmitter will remain keyed, also
- when there are no packets to be sent. Flags will be
- sent in that case, until a timeout (parameter 10)
- occurs.
+ 0: The interface will operate in CSMA mode (the normal
+ half-duplex packet radio operation)
+ 1: Fullduplex mode, i.e. the transmitter will be keyed at
+ any time, without checking the received carrier. It
+ will be unkeyed when there are no packets to be sent.
+ 2: Like 1, but the transmitter will remain keyed, also
+ when there are no packets to be sent. Flags will be
+ sent in that case, until a timeout (parameter 10)
+ occurs.
Example: sccparam /dev/scc0 fulldup off
wait:
- The initial waittime before any transmit attempt, after the
- frame has been queue for transmit. This is the length of
+ The initial waittime before any transmit attempt, after the
+ frame has been queue for transmit. This is the length of
the first slot in CSMA mode. In full duplex modes it is
set to 0 for maximum performance.
- The value of this parameter is in 10 ms units.
+ The value of this parameter is in 10 ms units.
Example: sccparam /dev/scc1 wait 4
maxkey:
- The maximal time the transmitter will be keyed to send
- packets, in seconds. This can be useful on busy CSMA
- channels, to avoid "getting a bad reputation" when you are
- generating a lot of traffic. After the specified time has
+ The maximal time the transmitter will be keyed to send
+ packets, in seconds. This can be useful on busy CSMA
+ channels, to avoid "getting a bad reputation" when you are
+ generating a lot of traffic. After the specified time has
elapsed, no new frame will be started. Instead, the trans-
- mitter will be switched off for a specified time (parameter
- min), and then the selected algorithm for keyup will be
+ mitter will be switched off for a specified time (parameter
+ min), and then the selected algorithm for keyup will be
started again.
- The value 0 as well as "off" will disable this feature,
- and allow infinite transmission time.
+ The value 0 as well as "off" will disable this feature,
+ and allow infinite transmission time.
Example: sccparam /dev/scc0 maxk 20
min:
- This is the time the transmitter will be switched off when
+ This is the time the transmitter will be switched off when
the maximum transmission time is exceeded.
Example: sccparam /dev/scc3 min 10
-idle
- This parameter specifies the maximum idle time in full duplex
- 2 mode, in seconds. When no frames have been sent for this
+idle:
+ This parameter specifies the maximum idle time in full duplex
+ 2 mode, in seconds. When no frames have been sent for this
time, the transmitter will be keyed down. A value of 0 is
has same result as the fullduplex mode 1. This parameter
can be disabled.
@@ -541,7 +562,7 @@ idle
maxdefer
This is the maximum time (in seconds) to wait for a free channel
- to send. When this timer expires the transmitter will be keyed
+ to send. When this timer expires the transmitter will be keyed
IMMEDIATELY. If you love to get trouble with other users you
should set this to a very low value ;-)
@@ -555,32 +576,38 @@ txoff:
Example: sccparam /dev/scc2 txoff on
group:
- It is possible to build special radio equipment to use more than
- one frequency on the same band, e.g. using several receivers and
+ It is possible to build special radio equipment to use more than
+ one frequency on the same band, e.g. using several receivers and
only one transmitter that can be switched between frequencies.
- Also, you can connect several radios that are active on the same
- band. In these cases, it is not possible, or not a good idea, to
- transmit on more than one frequency. The SCC driver provides a
- method to lock transmitters on different interfaces, using the
- "param <interface> group <x>" command. This will only work when
+ Also, you can connect several radios that are active on the same
+ band. In these cases, it is not possible, or not a good idea, to
+ transmit on more than one frequency. The SCC driver provides a
+ method to lock transmitters on different interfaces, using the
+ "param <interface> group <x>" command. This will only work when
you are using CSMA mode (parameter full = 0).
- The number <x> must be 0 if you want no group restrictions, and
+
+ The number <x> must be 0 if you want no group restrictions, and
can be computed as follows to create restricted groups:
<x> is the sum of some OCTAL numbers:
- 200 This transmitter will only be keyed when all other
- transmitters in the group are off.
- 100 This transmitter will only be keyed when the carrier
- detect of all other interfaces in the group is off.
- 0xx A byte that can be used to define different groups.
- Interfaces are in the same group, when the logical AND
- between their xx values is nonzero.
+
+ === =======================================================
+ 200 This transmitter will only be keyed when all other
+ transmitters in the group are off.
+ 100 This transmitter will only be keyed when the carrier
+ detect of all other interfaces in the group is off.
+ 0xx A byte that can be used to define different groups.
+ Interfaces are in the same group, when the logical AND
+ between their xx values is nonzero.
+ === =======================================================
Examples:
- When 2 interfaces use group 201, their transmitters will never be
+
+ When 2 interfaces use group 201, their transmitters will never be
keyed at the same time.
- When 2 interfaces use group 101, the transmitters will only key
- when both channels are clear at the same time. When group 301,
+
+ When 2 interfaces use group 101, the transmitters will only key
+ when both channels are clear at the same time. When group 301,
the transmitters will not be keyed at the same time.
Don't forget to convert the octal numbers into decimal before
@@ -595,19 +622,19 @@ softdcd:
Example: sccparam /dev/scc0 soft on
-4. Problems
+4. Problems
===========
If you have tx-problems with your BayCom USCC card please check
the manufacturer of the 8530. SGS chips have a slightly
-different timing. Try Zilog... A solution is to write to register 8
-instead to the data port, but this won't work with the ESCC chips.
+different timing. Try Zilog... A solution is to write to register 8
+instead to the data port, but this won't work with the ESCC chips.
*SIGH!*
A very common problem is that the PTT locks until the maxkeyup timer
expires, although interrupts and clock source are correct. In most
cases compiling the driver with CONFIG_SCC_DELAY (set with
-make config) solves the problems. For more hints read the (pseudo) FAQ
+make config) solves the problems. For more hints read the (pseudo) FAQ
and the documentation coming with z8530drv-utils.
I got reports that the driver has problems on some 386-based systems.
@@ -651,7 +678,9 @@ got it up-and-running?
Many thanks to Linus Torvalds and Alan Cox for including the driver
in the Linux standard distribution and their support.
-Joerg Reuter ampr-net: dl1bke@db0pra.ampr.org
- AX-25 : DL1BKE @ DB0ABH.#BAY.DEU.EU
- Internet: jreuter@yaina.de
- WWW : http://yaina.de/jreuter
+::
+
+ Joerg Reuter ampr-net: dl1bke@db0pra.ampr.org
+ AX-25 : DL1BKE @ DB0ABH.#BAY.DEU.EU
+ Internet: jreuter@yaina.de
+ WWW : http://yaina.de/jreuter
diff --git a/Documentation/nvdimm/maintainer-entry-profile.rst b/Documentation/nvdimm/maintainer-entry-profile.rst
index efe37adadcea..9da748e42623 100644
--- a/Documentation/nvdimm/maintainer-entry-profile.rst
+++ b/Documentation/nvdimm/maintainer-entry-profile.rst
@@ -4,15 +4,15 @@ LIBNVDIMM Maintainer Entry Profile
Overview
--------
The libnvdimm subsystem manages persistent memory across multiple
-architectures. The mailing list, is tracked by patchwork here:
+architectures. The mailing list is tracked by patchwork here:
https://patchwork.kernel.org/project/linux-nvdimm/list/
...and that instance is configured to give feedback to submitters on
patch acceptance and upstream merge. Patches are merged to either the
-'libnvdimm-fixes', or 'libnvdimm-for-next' branch. Those branches are
+'libnvdimm-fixes' or 'libnvdimm-for-next' branch. Those branches are
available here:
https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/
-In general patches can be submitted against the latest -rc, however if
+In general patches can be submitted against the latest -rc; however, if
the incoming code change is dependent on other pending changes then the
patch should be based on the libnvdimm-for-next branch. However, since
persistent memory sits at the intersection of storage and memory there
@@ -35,12 +35,12 @@ getting the test environment set up.
ACPI Device Specific Methods (_DSM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Before patches enabling for a new _DSM family will be considered it must
+Before patches enabling a new _DSM family will be considered, it must
be assigned a format-interface-code from the NVDIMM Sub-team of the ACPI
Specification Working Group. In general, the stance of the subsystem is
-to push back on the proliferation of NVDIMM command sets, do strongly
+to push back on the proliferation of NVDIMM command sets, so do strongly
consider implementing support for an existing command set. See
-drivers/acpi/nfit/nfit.h for the set of support command sets.
+drivers/acpi/nfit/nfit.h for the set of supported command sets.
Key Cycle Dates
@@ -48,7 +48,7 @@ Key Cycle Dates
New submissions can be sent at any time, but if they intend to hit the
next merge window they should be sent before -rc4, and ideally
stabilized in the libnvdimm-for-next branch by -rc6. Of course if a
-patch set requires more than 2 weeks of review -rc4 is already too late
+patch set requires more than 2 weeks of review, -rc4 is already too late
and some patches may require multiple development cycles to review.
diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
index 0924d29636ad..1831e431f725 100644
--- a/Documentation/power/pci.rst
+++ b/Documentation/power/pci.rst
@@ -1004,41 +1004,39 @@ including the PCI bus type. The flags should be set once at the driver probe
time with the help of the dev_pm_set_driver_flags() function and they should not
be updated directly afterwards.
-The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
-mechanism allowing device suspend/resume callbacks to be skipped if the device
-is in runtime suspend when the system suspend starts. That also affects all of
-the ancestors of the device, so this flag should only be used if absolutely
-necessary.
-
-The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
-positive value from pci_pm_prepare() if the ->prepare callback provided by the
+The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the
+direct-complete mechanism allowing device suspend/resume callbacks to be skipped
+if the device is in runtime suspend when the system suspend starts. That also
+affects all of the ancestors of the device, so this flag should only be used if
+absolutely necessary.
+
+The DPM_FLAG_SMART_PREPARE flag causes the PCI bus type to return a positive
+value from pci_pm_prepare() only if the ->prepare callback provided by the
driver of the device returns a positive value. That allows the driver to opt
-out from using the direct-complete mechanism dynamically.
+out from using the direct-complete mechanism dynamically (whereas setting
+DPM_FLAG_NO_DIRECT_COMPLETE means permanent opt-out).
The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
perspective the device can be safely left in runtime suspend during system
suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
-to skip resuming the device from runtime suspend unless there are PCI-specific
-reasons for doing that. Also, it causes pci_pm_suspend_late/noirq(),
-pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early
-if the device remains in runtime suspend in the beginning of the "late" phase
-of the system-wide transition under way. Moreover, if the device is in
-runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime
-power management status will be changed to "active" (as it is going to be put
-into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(),
-the function will set the power.direct_complete flag for it (to make the PM core
-skip the subsequent "thaw" callbacks for it) and return.
-
-Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the
-device to be left in suspend after system-wide transitions to the working state.
-This flag is checked by the PM core, but the PCI bus type informs the PM core
-which devices may be left in suspend from its perspective (that happens during
-the "noirq" phase of system-wide suspend and analogous transitions) and next it
-uses the dev_pm_may_skip_resume() helper to decide whether or not to return from
-pci_pm_resume_noirq() early, as the PM core will skip the remaining resume
-callbacks for the device during the transition under way and will set its
-runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for
-it.
+to avoid resuming the device from runtime suspend unless there are PCI-specific
+reasons for doing that. Also, it causes pci_pm_suspend_late/noirq() and
+pci_pm_poweroff_late/noirq() to return early if the device remains in runtime
+suspend during the "late" phase of the system-wide transition under way.
+Moreover, if the device is in runtime suspend in pci_pm_resume_noirq() or
+pci_pm_restore_noirq(), its runtime PM status will be changed to "active" (as it
+is going to be put into D0 going forward).
+
+Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows its
+"noirq" and "early" resume callbacks to be skipped if the device can be left
+in suspend after a system-wide transition into the working state. This flag is
+taken into consideration by the PM core along with the power.may_skip_resume
+status bit of the device which is set by pci_pm_suspend_noirq() in certain
+situations. If the PM core determines that the driver's "noirq" and "early"
+resume callbacks should be skipped, the dev_pm_skip_resume() helper function
+will return "true" and that will cause pci_pm_resume_noirq() and
+pci_pm_resume_early() to return upfront without touching the device and
+executing the driver callbacks.
3.2. Device Runtime Power Management
------------------------------------
diff --git a/Documentation/power/suspend-and-cpuhotplug.rst b/Documentation/power/suspend-and-cpuhotplug.rst
index 572d968c5375..ebedb6c75db9 100644
--- a/Documentation/power/suspend-and-cpuhotplug.rst
+++ b/Documentation/power/suspend-and-cpuhotplug.rst
@@ -48,7 +48,7 @@ More details follow::
|
|
v
- disable_nonboot_cpus()
+ freeze_secondary_cpus()
/* start */
|
v
@@ -83,7 +83,7 @@ More details follow::
Release cpu_add_remove_lock
|
v
- /* disable_nonboot_cpus() complete */
+ /* freeze_secondary_cpus() complete */
|
v
Do suspend
@@ -93,7 +93,7 @@ More details follow::
Resuming back is likewise, with the counterparts being (in the order of
execution during resume):
-* enable_nonboot_cpus() which involves::
+* thaw_secondary_cpus() which involves::
| Acquire cpu_add_remove_lock
| Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug
diff --git a/Documentation/powerpc/cxl.rst b/Documentation/powerpc/cxl.rst
index 920546d81326..d2d77057610e 100644
--- a/Documentation/powerpc/cxl.rst
+++ b/Documentation/powerpc/cxl.rst
@@ -133,6 +133,7 @@ User API
========
1. AFU character devices
+^^^^^^^^^^^^^^^^^^^^^^^^
For AFUs operating in AFU directed mode, two character device
files will be created. /dev/cxl/afu0.0m will correspond to a
@@ -395,6 +396,7 @@ read
2. Card character device (powerVM guest only)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In a powerVM guest, an extra character device is created for the
card. The device is only used to write (flash) a new image on the
diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index b3f3ee135dbe..20ea8cdee0aa 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -344,7 +344,7 @@ Here is the list of files under powerpc debugfs:
NOTE:
- Please refer to Documentation/filesystems/debugfs.txt on
+ Please refer to Documentation/filesystems/debugfs.rst on
how to mount the debugfs filesystem.
diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst
index 1c3a840d06b9..a6b4a3a5bf3f 100644
--- a/Documentation/process/adding-syscalls.rst
+++ b/Documentation/process/adding-syscalls.rst
@@ -33,7 +33,7 @@ interface.
to a somewhat opaque API.
- If you're just exposing runtime system information, a new node in sysfs
- (see ``Documentation/filesystems/sysfs.txt``) or the ``/proc`` filesystem may
+ (see ``Documentation/filesystems/sysfs.rst``) or the ``/proc`` filesystem may
be more appropriate. However, access to these mechanisms requires that the
relevant filesystem is mounted, which might not always be the case (e.g.
in a namespaced/sandboxed/chrooted environment). Avoid adding any API to
diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index acb2f1b36350..17a8e584f15f 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -84,15 +84,20 @@ Get a decent editor and don't leave whitespace at the end of lines.
Coding style is all about readability and maintainability using commonly
available tools.
-The limit on the length of lines is 80 columns and this is a strongly
-preferred limit.
-
-Statements longer than 80 columns will be broken into sensible chunks, unless
-exceeding 80 columns significantly increases readability and does not hide
-information. Descendants are always substantially shorter than the parent and
-are placed substantially to the right. The same applies to function headers
-with a long argument list. However, never break user-visible strings such as
-printk messages, because that breaks the ability to grep for them.
+The preferred limit on the length of a single line is 80 columns.
+
+Statements longer than 80 columns should be broken into sensible chunks,
+unless exceeding 80 columns significantly increases readability and does
+not hide information.
+
+Descendants are always substantially shorter than the parent and are
+are placed substantially to the right. A very commonly used style
+is to align descendants to a function open parenthesis.
+
+These same rules are applied to function headers with a long argument list.
+
+However, never break user-visible strings such as printk messages because
+that breaks the ability to grep for them.
3) Placing Braces and Spaces
diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst
index 6399d92f0b21..f07c9250c3ac 100644
--- a/Documentation/process/index.rst
+++ b/Documentation/process/index.rst
@@ -61,6 +61,7 @@ lack of a better place.
botching-up-ioctls
clang-format
../riscv/patch-acceptance
+ unaligned-memory-access
.. only:: subproject and html
diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst
index 8e56337d422d..3f8e9d5d95c2 100644
--- a/Documentation/process/submit-checklist.rst
+++ b/Documentation/process/submit-checklist.rst
@@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches.
and why.
26) If any ioctl's are added by the patch, then also update
- ``Documentation/ioctl/ioctl-number.rst``.
+ ``Documentation/userspace-api/ioctl/ioctl-number.rst``.
27) If your modified source code depends on or uses any of the kernel
APIs or features that are related to the following ``Kconfig`` symbols,
diff --git a/Documentation/unaligned-memory-access.txt b/Documentation/process/unaligned-memory-access.rst
index 1ee82419d8aa..1ee82419d8aa 100644
--- a/Documentation/unaligned-memory-access.txt
+++ b/Documentation/process/unaligned-memory-access.rst
diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index b5c51f7c748d..367e27ec3c50 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -484,7 +484,7 @@ CARD.DOMAIN TYPE MODE
05.00ff CEX5A Accelerator
=========== ===== ============
-Guest2
+Guest3
------
=========== ===== ============
CARD.DOMAIN TYPE MODE
diff --git a/Documentation/scheduler/sched-domains.rst b/Documentation/scheduler/sched-domains.rst
index f7504226f445..5c4b7f4f0062 100644
--- a/Documentation/scheduler/sched-domains.rst
+++ b/Documentation/scheduler/sched-domains.rst
@@ -19,10 +19,12 @@ CPUs".
Each scheduling domain must have one or more CPU groups (struct sched_group)
which are organised as a circular one way linked list from the ->groups
pointer. The union of cpumasks of these groups MUST be the same as the
-domain's span. The intersection of cpumasks from any two of these groups
-MUST be the empty set. The group pointed to by the ->groups pointer MUST
-contain the CPU to which the domain belongs. Groups may be shared among
-CPUs as they contain read only data after they have been set up.
+domain's span. The group pointed to by the ->groups pointer MUST contain the CPU
+to which the domain belongs. Groups may be shared among CPUs as they contain
+read only data after they have been set up. The intersection of cpumasks from
+any two of these groups may be non empty. If this is the case the SD_OVERLAP
+flag is set on the corresponding scheduling domain and its groups may not be
+shared between CPUs.
Balancing within a sched domain occurs between groups. That is, each group
is treated as one entity. The load of a group is defined as the sum of the
diff --git a/Documentation/digsig.txt b/Documentation/security/digsig.rst
index f6a8902d3ef7..f6a8902d3ef7 100644
--- a/Documentation/digsig.txt
+++ b/Documentation/security/digsig.rst
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index fc503dd689a7..8129405eb2cc 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -15,3 +15,4 @@ Security Documentation
self-protection
siphash
tpm/index
+ digsig
diff --git a/Documentation/security/lsm.rst b/Documentation/security/lsm.rst
index aadf47c808c0..6a2a2e973080 100644
--- a/Documentation/security/lsm.rst
+++ b/Documentation/security/lsm.rst
@@ -35,47 +35,50 @@ desired model of security. Linus also suggested the possibility of
migrating the Linux capabilities code into such a module.
The Linux Security Modules (LSM) project was started by WireX to develop
-such a framework. LSM is a joint development effort by several security
+such a framework. LSM was a joint development effort by several security
projects, including Immunix, SELinux, SGI and Janus, and several
individuals, including Greg Kroah-Hartman and James Morris, to develop a
-Linux kernel patch that implements this framework. The patch is
-currently tracking the 2.4 series and is targeted for integration into
-the 2.5 development series. This technical report provides an overview
-of the framework and the example capabilities security module provided
-by the LSM kernel patch.
+Linux kernel patch that implements this framework. The work was
+incorporated in the mainstream in December of 2003. This technical
+report provides an overview of the framework and the capabilities
+security module.
LSM Framework
=============
-The LSM kernel patch provides a general kernel framework to support
+The LSM framework provides a general kernel framework to support
security modules. In particular, the LSM framework is primarily focused
on supporting access control modules, although future development is
-likely to address other security needs such as auditing. By itself, the
+likely to address other security needs such as sandboxing. By itself, the
framework does not provide any additional security; it merely provides
-the infrastructure to support security modules. The LSM kernel patch
-also moves most of the capabilities logic into an optional security
-module, with the system defaulting to the traditional superuser logic.
+the infrastructure to support security modules. The LSM framework is
+optional, requiring `CONFIG_SECURITY` to be enabled. The capabilities
+logic is implemented as a security module.
This capabilities module is discussed further in
`LSM Capabilities Module`_.
-The LSM kernel patch adds security fields to kernel data structures and
-inserts calls to hook functions at critical points in the kernel code to
-manage the security fields and to perform access control. It also adds
-functions for registering and unregistering security modules, and adds a
-general :c:func:`security()` system call to support new system calls
-for security-aware applications.
-
-The LSM security fields are simply ``void*`` pointers. For process and
-program execution security information, security fields were added to
+The LSM framework includes security fields in kernel data structures and
+calls to hook functions at critical points in the kernel code to
+manage the security fields and to perform access control.
+It also adds functions for registering security modules.
+An interface `/sys/kernel/security/lsm` reports a comma separated list
+of security modules that are active on the system.
+
+The LSM security fields are simply ``void*`` pointers.
+The data is referred to as a blob, which may be managed by
+the framework or by the individual security modules that use it.
+Security blobs that are used by more than one security module are
+typically managed by the framework.
+For process and
+program execution security information, security fields are included in
:c:type:`struct task_struct <task_struct>` and
-:c:type:`struct linux_binprm <linux_binprm>`. For filesystem
-security information, a security field was added to :c:type:`struct
+:c:type:`struct cred <cred>`.
+For filesystem
+security information, a security field is included in :c:type:`struct
super_block <super_block>`. For pipe, file, and socket security
-information, security fields were added to :c:type:`struct inode
-<inode>` and :c:type:`struct file <file>`. For packet and
-network device security information, security fields were added to
-:c:type:`struct sk_buff <sk_buff>` and :c:type:`struct
-net_device <net_device>`. For System V IPC security information,
+information, security fields are included in :c:type:`struct inode
+<inode>` and :c:type:`struct file <file>`.
+For System V IPC security information,
security fields were added to :c:type:`struct kern_ipc_perm
<kern_ipc_perm>` and :c:type:`struct msg_msg
<msg_msg>`; additionally, the definitions for :c:type:`struct
@@ -84,118 +87,45 @@ were moved to header files (``include/linux/msg.h`` and
``include/linux/shm.h`` as appropriate) to allow the security modules to
use these definitions.
-Each LSM hook is a function pointer in a global table, security_ops.
-This table is a :c:type:`struct security_operations
-<security_operations>` structure as defined by
-``include/linux/security.h``. Detailed documentation for each hook is
-included in this header file. At present, this structure consists of a
-collection of substructures that group related hooks based on the kernel
-object (e.g. task, inode, file, sk_buff, etc) as well as some top-level
-hook function pointers for system operations. This structure is likely
-to be flattened in the future for performance. The placement of the hook
-calls in the kernel code is described by the "called:" lines in the
-per-hook documentation in the header file. The hook calls can also be
-easily found in the kernel code by looking for the string
-"security_ops->".
-
-Linus mentioned per-process security hooks in his original remarks as a
-possible alternative to global security hooks. However, if LSM were to
-start from the perspective of per-process hooks, then the base framework
-would have to deal with how to handle operations that involve multiple
-processes (e.g. kill), since each process might have its own hook for
-controlling the operation. This would require a general mechanism for
-composing hooks in the base framework. Additionally, LSM would still
-need global hooks for operations that have no process context (e.g.
-network input operations). Consequently, LSM provides global security
-hooks, but a security module is free to implement per-process hooks
-(where that makes sense) by storing a security_ops table in each
-process' security field and then invoking these per-process hooks from
-the global hooks. The problem of composition is thus deferred to the
-module.
-
-The global security_ops table is initialized to a set of hook functions
-provided by a dummy security module that provides traditional superuser
-logic. A :c:func:`register_security()` function (in
-``security/security.c``) is provided to allow a security module to set
-security_ops to refer to its own hook functions, and an
-:c:func:`unregister_security()` function is provided to revert
-security_ops to the dummy module hooks. This mechanism is used to set
-the primary security module, which is responsible for making the final
-decision for each hook.
-
-LSM also provides a simple mechanism for stacking additional security
-modules with the primary security module. It defines
-:c:func:`register_security()` and
-:c:func:`unregister_security()` hooks in the :c:type:`struct
-security_operations <security_operations>` structure and
-provides :c:func:`mod_reg_security()` and
-:c:func:`mod_unreg_security()` functions that invoke these hooks
-after performing some sanity checking. A security module can call these
-functions in order to stack with other modules. However, the actual
-details of how this stacking is handled are deferred to the module,
-which can implement these hooks in any way it wishes (including always
-returning an error if it does not wish to support stacking). In this
-manner, LSM again defers the problem of composition to the module.
-
-Although the LSM hooks are organized into substructures based on kernel
-object, all of the hooks can be viewed as falling into two major
+For packet and
+network device security information, security fields were added to
+:c:type:`struct sk_buff <sk_buff>` and
+:c:type:`struct scm_cookie <scm_cookie>`.
+Unlike the other security module data, the data used here is a
+32-bit integer. The security modules are required to map or otherwise
+associate these values with real security attributes.
+
+LSM hooks are maintained in lists. A list is maintained for each
+hook, and the hooks are called in the order specified by CONFIG_LSM.
+Detailed documentation for each hook is
+included in the `include/linux/lsm_hooks.h` header file.
+
+The LSM framework provides for a close approximation of
+general security module stacking. It defines
+security_add_hooks() to which each security module passes a
+:c:type:`struct security_hooks_list <security_hooks_list>`,
+which are added to the lists.
+The LSM framework does not provide a mechanism for removing hooks that
+have been registered. The SELinux security module has implemented
+a way to remove itself, however the feature has been deprecated.
+
+The hooks can be viewed as falling into two major
categories: hooks that are used to manage the security fields and hooks
that are used to perform access control. Examples of the first category
-of hooks include the :c:func:`alloc_security()` and
-:c:func:`free_security()` hooks defined for each kernel data
-structure that has a security field. These hooks are used to allocate
-and free security structures for kernel objects. The first category of
-hooks also includes hooks that set information in the security field
-after allocation, such as the :c:func:`post_lookup()` hook in
-:c:type:`struct inode_security_ops <inode_security_ops>`.
-This hook is used to set security information for inodes after
-successful lookup operations. An example of the second category of hooks
-is the :c:func:`permission()` hook in :c:type:`struct
-inode_security_ops <inode_security_ops>`. This hook checks
-permission when accessing an inode.
+of hooks include the security_inode_alloc() and security_inode_free()
+These hooks are used to allocate
+and free security structures for inode objects.
+An example of the second category of hooks
+is the security_inode_permission() hook.
+This hook checks permission when accessing an inode.
LSM Capabilities Module
=======================
-The LSM kernel patch moves most of the existing POSIX.1e capabilities
-logic into an optional security module stored in the file
-``security/capability.c``. This change allows users who do not want to
-use capabilities to omit this code entirely from their kernel, instead
-using the dummy module for traditional superuser logic or any other
-module that they desire. This change also allows the developers of the
-capabilities logic to maintain and enhance their code more freely,
-without needing to integrate patches back into the base kernel.
-
-In addition to moving the capabilities logic, the LSM kernel patch could
-move the capability-related fields from the kernel data structures into
-the new security fields managed by the security modules. However, at
-present, the LSM kernel patch leaves the capability fields in the kernel
-data structures. In his original remarks, Linus suggested that this
-might be preferable so that other security modules can be easily stacked
-with the capabilities module without needing to chain multiple security
-structures on the security field. It also avoids imposing extra overhead
-on the capabilities module to manage the security fields. However, the
-LSM framework could certainly support such a move if it is determined to
-be desirable, with only a few additional changes described below.
-
-At present, the capabilities logic for computing process capabilities on
-:c:func:`execve()` and :c:func:`set\*uid()`, checking
-capabilities for a particular process, saving and checking capabilities
-for netlink messages, and handling the :c:func:`capget()` and
-:c:func:`capset()` system calls have been moved into the
-capabilities module. There are still a few locations in the base kernel
-where capability-related fields are directly examined or modified, but
-the current version of the LSM patch does allow a security module to
-completely replace the assignment and testing of capabilities. These few
-locations would need to be changed if the capability-related fields were
-moved into the security field. The following is a list of known
-locations that still perform such direct examination or modification of
-capability-related fields:
-
-- ``fs/open.c``::c:func:`sys_access()`
-
-- ``fs/lockd/host.c``::c:func:`nlm_bind_host()`
-
-- ``fs/nfsd/auth.c``::c:func:`nfsd_setuser()`
-
-- ``fs/proc/array.c``::c:func:`task_cap()`
+The POSIX.1e capabilities logic is maintained as a security module
+stored in the file ``security/commoncap.c``. The capabilities
+module uses the order field of the :c:type:`lsm_info` description
+to identify it as the first security module to be registered.
+The capabilities security module does not use the general security
+blobs, unlike other modules. The reasons are historical and are
+based on overhead, complexity and performance concerns.
diff --git a/Documentation/security/siphash.rst b/Documentation/security/siphash.rst
index 4eba68cdf0a1..bd9363025fcb 100644
--- a/Documentation/security/siphash.rst
+++ b/Documentation/security/siphash.rst
@@ -7,7 +7,7 @@ SipHash - a short input PRF
SipHash is a cryptographically secure PRF -- a keyed hash function -- that
performs very well for short inputs, hence the name. It was designed by
cryptographers Daniel J. Bernstein and Jean-Philippe Aumasson. It is intended
-as a replacement for some uses of: `jhash`, `md5_transform`, `sha_transform`,
+as a replacement for some uses of: `jhash`, `md5_transform`, `sha1_transform`,
and so forth.
SipHash takes a secret key filled with randomly generated numbers and either
diff --git a/Documentation/sphinx/requirements.txt b/Documentation/sphinx/requirements.txt
index 14e29a0ae480..489f6626de67 100644
--- a/Documentation/sphinx/requirements.txt
+++ b/Documentation/sphinx/requirements.txt
@@ -1,3 +1,3 @@
docutils
-Sphinx==1.7.9
+Sphinx==2.4.4
sphinx_rtd_theme
diff --git a/Documentation/timers/timers-howto.rst b/Documentation/timers/timers-howto.rst
index 7e3167bec2b1..afb0a43b8cdf 100644
--- a/Documentation/timers/timers-howto.rst
+++ b/Documentation/timers/timers-howto.rst
@@ -110,3 +110,6 @@ NON-ATOMIC CONTEXT:
short, the difference is whether the sleep can be ended
early by a signal. In general, just use msleep unless
you know you have a need for the interruptible variant.
+
+ FLEXIBLE SLEEPING (any delay, uninterruptible)
+ * Use fsleep
diff --git a/Documentation/trace/coresight/coresight-ect.rst b/Documentation/trace/coresight/coresight-ect.rst
index ecc1e57012ef..a93e52abcf46 100644
--- a/Documentation/trace/coresight/coresight-ect.rst
+++ b/Documentation/trace/coresight/coresight-ect.rst
@@ -1,4 +1,5 @@
.. SPDX-License-Identifier: GPL-2.0
+
=============================================
CoreSight Embedded Cross Trigger (CTI & CTM).
=============================================
diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index 4a2ebe0bd19b..f792b1959a33 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -527,8 +527,8 @@ The following commands are supported:
See Documentation/trace/histogram.rst for details and examples.
-6.3 In-kernel trace event API
------------------------------
+7. In-kernel trace event API
+============================
In most cases, the command-line interface to trace events is more than
sufficient. Sometimes, however, applications might find the need for
@@ -560,8 +560,8 @@ following:
- tracing synthetic events from in-kernel code
- the low-level "dynevent_cmd" API
-6.3.1 Dyamically creating synthetic event definitions
------------------------------------------------------
+7.1 Dyamically creating synthetic event definitions
+---------------------------------------------------
There are a couple ways to create a new synthetic event from a kernel
module or other kernel code.
@@ -666,8 +666,8 @@ registered by calling the synth_event_gen_cmd_end() function::
At this point, the event object is ready to be used for tracing new
events.
-6.3.3 Tracing synthetic events from in-kernel code
---------------------------------------------------
+7.2 Tracing synthetic events from in-kernel code
+------------------------------------------------
To trace a synthetic event, there are several options. The first
option is to trace the event in one call, using synth_event_trace()
@@ -678,8 +678,8 @@ synth_event_trace_start() and synth_event_trace_end() along with
synth_event_add_next_val() or synth_event_add_val() to add the values
piecewise.
-6.3.3.1 Tracing a synthetic event all at once
----------------------------------------------
+7.2.1 Tracing a synthetic event all at once
+-------------------------------------------
To trace a synthetic event all at once, the synth_event_trace() or
synth_event_trace_array() functions can be used.
@@ -780,8 +780,8 @@ remove the event::
ret = synth_event_delete("schedtest");
-6.3.3.1 Tracing a synthetic event piecewise
--------------------------------------------
+7.2.2 Tracing a synthetic event piecewise
+-----------------------------------------
To trace a synthetic using the piecewise method described above, the
synth_event_trace_start() function is used to 'open' the synthetic
@@ -864,8 +864,8 @@ Note that synth_event_trace_end() must be called at the end regardless
of whether any of the add calls failed (say due to a bad field name
being passed in).
-6.3.4 Dyamically creating kprobe and kretprobe event definitions
-----------------------------------------------------------------
+7.3 Dyamically creating kprobe and kretprobe event definitions
+--------------------------------------------------------------
To create a kprobe or kretprobe trace event from kernel code, the
kprobe_event_gen_cmd_start() or kretprobe_event_gen_cmd_start()
@@ -941,8 +941,8 @@ used to give the kprobe event file back and delete the event::
ret = kprobe_event_delete("gen_kprobe_test");
-6.3.4 The "dynevent_cmd" low-level API
---------------------------------------
+7.4 The "dynevent_cmd" low-level API
+------------------------------------
Both the in-kernel synthetic event and kprobe interfaces are built on
top of a lower-level "dynevent_cmd" interface. This interface is
diff --git a/Documentation/trace/ftrace-design.rst b/Documentation/trace/ftrace-design.rst
index a8e22e0db63c..6893399157f0 100644
--- a/Documentation/trace/ftrace-design.rst
+++ b/Documentation/trace/ftrace-design.rst
@@ -229,14 +229,6 @@ Adding support for it is easy: just define the macro in asm/ftrace.h and
pass the return address pointer as the 'retp' argument to
ftrace_push_return_trace().
-HAVE_FTRACE_NMI_ENTER
----------------------
-
-If you can't trace NMI functions, then skip this option.
-
-<details to be filled>
-
-
HAVE_SYSCALL_TRACEPOINTS
------------------------
diff --git a/Documentation/translations/it_IT/doc-guide/kernel-doc.rst b/Documentation/translations/it_IT/doc-guide/kernel-doc.rst
index a4ecd8f27631..524ad86cadbb 100644
--- a/Documentation/translations/it_IT/doc-guide/kernel-doc.rst
+++ b/Documentation/translations/it_IT/doc-guide/kernel-doc.rst
@@ -515,6 +515,22 @@ internal: *[source-pattern ...]*
.. kernel-doc:: drivers/gpu/drm/i915/intel_audio.c
:internal:
+identifiers: *[ function/type ...]*
+ Include la documentazione per ogni *function* e *type* in *source*.
+ Se non vengono esplicitamente specificate le funzioni da includere, allora
+ verranno incluse tutte quelle disponibili in *source*.
+
+ Esempi::
+
+ .. kernel-doc:: lib/bitmap.c
+ :identifiers: bitmap_parselist bitmap_parselist_user
+
+ .. kernel-doc:: lib/idr.c
+ :identifiers:
+
+functions: *[ function ...]*
+ Questo è uno pseudonimo, deprecato, per la direttiva 'identifiers'.
+
doc: *title*
Include la documentazione del paragrafo ``DOC:`` identificato dal titolo
(*title*) all'interno del file sorgente (*source*). Gli spazi in *title* sono
@@ -528,15 +544,6 @@ doc: *title*
.. kernel-doc:: drivers/gpu/drm/i915/intel_audio.c
:doc: High Definition Audio over HDMI and Display Port
-functions: *function* *[...]*
- Dal file sorgente (*source*) include la documentazione per le funzioni
- elencate (*function*).
-
- Esempio::
-
- .. kernel-doc:: lib/bitmap.c
- :functions: bitmap_parselist bitmap_parselist_user
-
Senza alcuna opzione, la direttiva kernel-doc include tutti i commenti di
documentazione presenti nel file sorgente (*source*).
diff --git a/Documentation/translations/it_IT/kernel-hacking/hacking.rst b/Documentation/translations/it_IT/kernel-hacking/hacking.rst
index 24c592852bf1..6aab27a8d323 100644
--- a/Documentation/translations/it_IT/kernel-hacking/hacking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/hacking.rst
@@ -627,6 +627,24 @@ Alcuni manutentori e sviluppatori potrebbero comunque richiedere
:c:func:`EXPORT_SYMBOL_GPL()` quando si aggiungono nuove funzionalità o
interfacce.
+:c:func:`EXPORT_SYMBOL_NS()`
+----------------------------
+
+Definita in ``include/linux/export.h``
+
+Questa è una variate di `EXPORT_SYMBOL()` che permette di specificare uno
+spazio dei nomi. Lo spazio dei nomi è documentato in
+:doc:`../core-api/symbol-namespaces`
+
+:c:func:`EXPORT_SYMBOL_NS_GPL()`
+--------------------------------
+
+Definita in ``include/linux/export.h``
+
+Questa è una variate di `EXPORT_SYMBOL_GPL()` che permette di specificare uno
+spazio dei nomi. Lo spazio dei nomi è documentato in
+:doc:`../core-api/symbol-namespaces`
+
Procedure e convenzioni
=======================
diff --git a/Documentation/translations/it_IT/kernel-hacking/locking.rst b/Documentation/translations/it_IT/kernel-hacking/locking.rst
index b9a6be4b8499..4615df5723fb 100644
--- a/Documentation/translations/it_IT/kernel-hacking/locking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/locking.rst
@@ -159,17 +159,17 @@ Sincronizzazione in contesto utente
Se avete una struttura dati che verrà utilizzata solo dal contesto utente,
allora, per proteggerla, potete utilizzare un semplice mutex
(``include/linux/mutex.h``). Questo è il caso più semplice: inizializzate il
-mutex; invocate :c:func:`mutex_lock_interruptible()` per trattenerlo e
-:c:func:`mutex_unlock()` per rilasciarlo. C'è anche :c:func:`mutex_lock()`
+mutex; invocate mutex_lock_interruptible() per trattenerlo e
+mutex_unlock() per rilasciarlo. C'è anche mutex_lock()
ma questa dovrebbe essere evitata perché non ritorna in caso di segnali.
Per esempio: ``net/netfilter/nf_sockopt.c`` permette la registrazione
-di nuove chiamate per :c:func:`setsockopt()` e :c:func:`getsockopt()`
-usando la funzione :c:func:`nf_register_sockopt()`. La registrazione e
+di nuove chiamate per setsockopt() e getsockopt()
+usando la funzione nf_register_sockopt(). La registrazione e
la rimozione vengono eseguite solamente quando il modulo viene caricato
o scaricato (e durante l'avvio del sistema, qui non abbiamo concorrenza),
e la lista delle funzioni registrate viene consultata solamente quando
-:c:func:`setsockopt()` o :c:func:`getsockopt()` sono sconosciute al sistema.
+setsockopt() o getsockopt() sono sconosciute al sistema.
In questo caso ``nf_sockopt_mutex`` è perfetto allo scopo, in particolar modo
visto che setsockopt e getsockopt potrebbero dormire.
@@ -179,19 +179,19 @@ Sincronizzazione fra il contesto utente e i softirq
Se un softirq condivide dati col contesto utente, avete due problemi.
Primo, il contesto utente corrente potrebbe essere interroto da un softirq,
e secondo, la sezione critica potrebbe essere eseguita da un altro
-processore. Questo è quando :c:func:`spin_lock_bh()`
+processore. Questo è quando spin_lock_bh()
(``include/linux/spinlock.h``) viene utilizzato. Questo disabilita i softirq
-sul processore e trattiene il *lock*. Invece, :c:func:`spin_unlock_bh()` fa
+sul processore e trattiene il *lock*. Invece, spin_unlock_bh() fa
l'opposto. (Il suffisso '_bh' è un residuo storico che fa riferimento al
"Bottom Halves", il vecchio nome delle interruzioni software. In un mondo
perfetto questa funzione si chiamerebbe 'spin_lock_softirq()').
-Da notare che in questo caso potete utilizzare anche :c:func:`spin_lock_irq()`
-o :c:func:`spin_lock_irqsave()`, queste fermano anche le interruzioni hardware:
+Da notare che in questo caso potete utilizzare anche spin_lock_irq()
+o spin_lock_irqsave(), queste fermano anche le interruzioni hardware:
vedere :ref:`Contesto di interruzione hardware <it_hardirq-context>`.
Questo funziona alla perfezione anche sui sistemi monoprocessore: gli spinlock
-svaniscono e questa macro diventa semplicemente :c:func:`local_bh_disable()`
+svaniscono e questa macro diventa semplicemente local_bh_disable()
(``include/linux/interrupt.h``), la quale impedisce ai softirq d'essere
eseguiti.
@@ -224,8 +224,8 @@ Differenti tasklet/timer
~~~~~~~~~~~~~~~~~~~~~~~~
Se un altro tasklet/timer vuole condividere dati col vostro tasklet o timer,
-allora avrete bisogno entrambe di :c:func:`spin_lock()` e
-:c:func:`spin_unlock()`. Qui :c:func:`spin_lock_bh()` è inutile, siete già
+allora avrete bisogno entrambe di spin_lock() e
+spin_unlock(). Qui spin_lock_bh() è inutile, siete già
in un tasklet ed avete la garanzia che nessun altro verrà eseguito sullo
stesso processore.
@@ -243,13 +243,13 @@ processore (vedere :ref:`Dati per processore <it_per-cpu>`). Se siete arrivati
fino a questo punto nell'uso dei softirq, probabilmente tenete alla scalabilità
delle prestazioni abbastanza da giustificarne la complessità aggiuntiva.
-Dovete utilizzare :c:func:`spin_lock()` e :c:func:`spin_unlock()` per
+Dovete utilizzare spin_lock() e spin_unlock() per
proteggere i dati condivisi.
Diversi Softirqs
~~~~~~~~~~~~~~~~
-Dovete utilizzare :c:func:`spin_lock()` e :c:func:`spin_unlock()` per
+Dovete utilizzare spin_lock() e spin_unlock() per
proteggere i dati condivisi, che siano timer, tasklet, diversi softirq o
lo stesso o altri softirq: uno qualsiasi di essi potrebbe essere in esecuzione
su un diverso processore.
@@ -270,40 +270,40 @@ Se un gestore di interruzioni hardware condivide dati con un softirq, allora
avrete due preoccupazioni. Primo, il softirq può essere interrotto da
un'interruzione hardware, e secondo, la sezione critica potrebbe essere
eseguita da un'interruzione hardware su un processore diverso. Questo è il caso
-dove :c:func:`spin_lock_irq()` viene utilizzato. Disabilita le interruzioni
-sul processore che l'esegue, poi trattiene il lock. :c:func:`spin_unlock_irq()`
+dove spin_lock_irq() viene utilizzato. Disabilita le interruzioni
+sul processore che l'esegue, poi trattiene il lock. spin_unlock_irq()
fa l'opposto.
-Il gestore d'interruzione hardware non usa :c:func:`spin_lock_irq()` perché
-i softirq non possono essere eseguiti quando il gestore d'interruzione hardware
-è in esecuzione: per questo si può usare :c:func:`spin_lock()`, che è un po'
+Il gestore d'interruzione hardware non ha bisogno di usare spin_lock_irq()
+perché i softirq non possono essere eseguiti quando il gestore d'interruzione
+hardware è in esecuzione: per questo si può usare spin_lock(), che è un po'
più veloce. L'unica eccezione è quando un altro gestore d'interruzioni
-hardware utilizza lo stesso *lock*: :c:func:`spin_lock_irq()` impedirà a questo
+hardware utilizza lo stesso *lock*: spin_lock_irq() impedirà a questo
secondo gestore di interrompere quello in esecuzione.
Questo funziona alla perfezione anche sui sistemi monoprocessore: gli spinlock
-svaniscono e questa macro diventa semplicemente :c:func:`local_irq_disable()`
+svaniscono e questa macro diventa semplicemente local_irq_disable()
(``include/asm/smp.h``), la quale impedisce a softirq/tasklet/BH d'essere
eseguiti.
-:c:func:`spin_lock_irqsave()` (``include/linux/spinlock.h``) è una variante che
+spin_lock_irqsave() (``include/linux/spinlock.h``) è una variante che
salva lo stato delle interruzioni in una variabile, questa verrà poi passata
-a :c:func:`spin_unlock_irqrestore()`. Questo significa che lo stesso codice
+a spin_unlock_irqrestore(). Questo significa che lo stesso codice
potrà essere utilizzato in un'interruzione hardware (dove le interruzioni sono
già disabilitate) e in un softirq (dove la disabilitazione delle interruzioni
è richiesta).
Da notare che i softirq (e quindi tasklet e timer) sono eseguiti al ritorno
-da un'interruzione hardware, quindi :c:func:`spin_lock_irq()` interrompe
+da un'interruzione hardware, quindi spin_lock_irq() interrompe
anche questi. Tenuto conto di questo si può dire che
-:c:func:`spin_lock_irqsave()` è la funzione di sincronizzazione più generica
+spin_lock_irqsave() è la funzione di sincronizzazione più generica
e potente.
Sincronizzazione fra due gestori d'interruzioni hardware
--------------------------------------------------------
Condividere dati fra due gestori di interruzione hardware è molto raro, ma se
-succede, dovreste usare :c:func:`spin_lock_irqsave()`: è una specificità
+succede, dovreste usare spin_lock_irqsave(): è una specificità
dell'architettura il fatto che tutte le interruzioni vengano interrotte
quando si eseguono di gestori di interruzioni.
@@ -317,11 +317,11 @@ Pete Zaitcev ci offre il seguente riassunto:
il mutex e dormire (``copy_from_user*(`` o ``kmalloc(x,GFP_KERNEL)``).
- Altrimenti (== i dati possono essere manipolati da un'interruzione) usate
- :c:func:`spin_lock_irqsave()` e :c:func:`spin_unlock_irqrestore()`.
+ spin_lock_irqsave() e spin_unlock_irqrestore().
- Evitate di trattenere uno spinlock per più di 5 righe di codice incluse
le chiamate a funzione (ad eccezione di quell per l'accesso come
- :c:func:`readb()`).
+ readb()).
Tabella dei requisiti minimi
----------------------------
@@ -334,7 +334,7 @@ processore alla volta, ma se deve condividere dati con un altro thread, allora
la sincronizzazione è necessaria).
Ricordatevi il suggerimento qui sopra: potete sempre usare
-:c:func:`spin_lock_irqsave()`, che è un sovrainsieme di tutte le altre funzioni
+spin_lock_irqsave(), che è un sovrainsieme di tutte le altre funzioni
per spinlock.
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
@@ -378,13 +378,13 @@ protetti dal *lock* quando qualche altro thread lo sta già facendo
trattenendo il *lock*. Potrete acquisire il *lock* più tardi se vi
serve accedere ai dati protetti da questo *lock*.
-La funzione :c:func:`spin_trylock()` non ritenta di acquisire il *lock*,
+La funzione spin_trylock() non ritenta di acquisire il *lock*,
se ci riesce al primo colpo ritorna un valore diverso da zero, altrimenti
se fallisce ritorna 0. Questa funzione può essere utilizzata in un qualunque
-contesto, ma come :c:func:`spin_lock()`: dovete disabilitare i contesti che
+contesto, ma come spin_lock(): dovete disabilitare i contesti che
potrebbero interrompervi e quindi trattenere lo spinlock.
-La funzione :c:func:`mutex_trylock()` invece di sospendere il vostro processo
+La funzione mutex_trylock() invece di sospendere il vostro processo
ritorna un valore diverso da zero se è possibile trattenere il lock al primo
colpo, altrimenti se fallisce ritorna 0. Nonostante non dorma, questa funzione
non può essere usata in modo sicuro in contesti di interruzione hardware o
@@ -506,7 +506,7 @@ della memoria che il suo contenuto sono protetti dal *lock*. Questo
caso è semplice dato che copiamo i dati dall'utente e non permettiamo
mai loro di accedere direttamente agli oggetti.
-C'è una piccola ottimizzazione qui: nella funzione :c:func:`cache_add()`
+C'è una piccola ottimizzazione qui: nella funzione cache_add()
impostiamo i campi dell'oggetto prima di acquisire il *lock*. Questo è
sicuro perché nessun altro potrà accedervi finché non lo inseriremo
nella memoria.
@@ -514,7 +514,7 @@ nella memoria.
Accesso dal contesto utente
---------------------------
-Ora consideriamo il caso in cui :c:func:`cache_find()` può essere invocata
+Ora consideriamo il caso in cui cache_find() può essere invocata
dal contesto d'interruzione: sia hardware che software. Un esempio potrebbe
essere un timer che elimina oggetti dalla memoria.
@@ -583,15 +583,15 @@ sono quelle rimosse, mentre quelle ``+`` sono quelle aggiunte.
return ret;
}
-Da notare che :c:func:`spin_lock_irqsave()` disabiliterà le interruzioni
+Da notare che spin_lock_irqsave() disabiliterà le interruzioni
se erano attive, altrimenti non farà niente (quando siamo già in un contesto
d'interruzione); dunque queste funzioni possono essere chiamante in
sicurezza da qualsiasi contesto.
-Sfortunatamente, :c:func:`cache_add()` invoca :c:func:`kmalloc()` con
+Sfortunatamente, cache_add() invoca kmalloc() con
l'opzione ``GFP_KERNEL`` che è permessa solo in contesto utente. Ho supposto
-che :c:func:`cache_add()` venga chiamata dal contesto utente, altrimenti
-questa opzione deve diventare un parametro di :c:func:`cache_add()`.
+che cache_add() venga chiamata dal contesto utente, altrimenti
+questa opzione deve diventare un parametro di cache_add().
Esporre gli oggetti al di fuori del file
----------------------------------------
@@ -610,7 +610,7 @@ Il secondo problema è il problema del ciclo di vita: se un'altra struttura
mantiene un puntatore ad un oggetto, presumibilmente si aspetta che questo
puntatore rimanga valido. Sfortunatamente, questo è garantito solo mentre
si trattiene il *lock*, altrimenti qualcuno potrebbe chiamare
-:c:func:`cache_delete()` o peggio, aggiungere un oggetto che riutilizza lo
+cache_delete() o peggio, aggiungere un oggetto che riutilizza lo
stesso indirizzo.
Dato che c'è un solo *lock*, non potete trattenerlo a vita: altrimenti
@@ -710,9 +710,9 @@ Ecco il codice::
}
Abbiamo incapsulato il contatore di riferimenti nelle tipiche funzioni
-di 'get' e 'put'. Ora possiamo ritornare l'oggetto da :c:func:`cache_find()`
+di 'get' e 'put'. Ora possiamo ritornare l'oggetto da cache_find()
col vantaggio che l'utente può dormire trattenendo l'oggetto (per esempio,
-:c:func:`copy_to_user()` per copiare il nome verso lo spazio utente).
+copy_to_user() per copiare il nome verso lo spazio utente).
Un altro punto da notare è che ho detto che il contatore dovrebbe incrementarsi
per ogni puntatore ad un oggetto: quindi il contatore di riferimenti è 1
@@ -727,8 +727,8 @@ Ci sono un certo numbero di operazioni atomiche definite
in ``include/asm/atomic.h``: queste sono garantite come atomiche su qualsiasi
processore del sistema, quindi non sono necessari i *lock*. In questo caso è
più semplice rispetto all'uso degli spinlock, benché l'uso degli spinlock
-sia più elegante per casi non banali. Le funzioni :c:func:`atomic_inc()` e
-:c:func:`atomic_dec_and_test()` vengono usate al posto dei tipici operatori di
+sia più elegante per casi non banali. Le funzioni atomic_inc() e
+atomic_dec_and_test() vengono usate al posto dei tipici operatori di
incremento e decremento, e i *lock* non sono più necessari per proteggere il
contatore stesso.
@@ -820,7 +820,7 @@ al nome di cambiare abbiamo tre possibilità:
- Si può togliere static da ``cache_lock`` e dire agli utenti che devono
trattenere il *lock* prima di modificare il nome di un oggetto.
-- Si può fornire una funzione :c:func:`cache_obj_rename()` che prende il
+- Si può fornire una funzione cache_obj_rename() che prende il
*lock* e cambia il nome per conto del chiamante; si dirà poi agli utenti
di usare questa funzione.
@@ -878,11 +878,11 @@ Da notare che ho deciso che il contatore di popolarità dovesse essere
protetto da ``cache_lock`` piuttosto che dal *lock* dell'oggetto; questo
perché è logicamente parte dell'infrastruttura (come
:c:type:`struct list_head <list_head>` nell'oggetto). In questo modo,
-in :c:func:`__cache_add()`, non ho bisogno di trattenere il *lock* di ogni
+in __cache_add(), non ho bisogno di trattenere il *lock* di ogni
oggetto mentre si cerca il meno popolare.
Ho anche deciso che il campo id è immutabile, quindi non ho bisogno di
-trattenere il lock dell'oggetto quando si usa :c:func:`__cache_find()`
+trattenere il lock dell'oggetto quando si usa __cache_find()
per leggere questo campo; il *lock* dell'oggetto è usato solo dal chiamante
che vuole leggere o scrivere il campo name.
@@ -907,7 +907,7 @@ Questo è facile da diagnosticare: non è uno di quei problemi che ti tengono
sveglio 5 notti a parlare da solo.
Un caso un pochino più complesso; immaginate d'avere una spazio condiviso
-fra un softirq ed il contesto utente. Se usate :c:func:`spin_lock()` per
+fra un softirq ed il contesto utente. Se usate spin_lock() per
proteggerlo, il contesto utente potrebbe essere interrotto da un softirq
mentre trattiene il lock, da qui il softirq rimarrà in attesa attiva provando
ad acquisire il *lock* già trattenuto nel contesto utente.
@@ -1006,12 +1006,12 @@ potreste fare come segue::
spin_unlock_bh(&list_lock);
Primo o poi, questo esploderà su un sistema multiprocessore perché un
-temporizzatore potrebbe essere già partiro prima di :c:func:`spin_lock_bh()`,
-e prenderà il *lock* solo dopo :c:func:`spin_unlock_bh()`, e cercherà
+temporizzatore potrebbe essere già partiro prima di spin_lock_bh(),
+e prenderà il *lock* solo dopo spin_unlock_bh(), e cercherà
di eliminare il suo oggetto (che però è già stato eliminato).
Questo può essere evitato controllando il valore di ritorno di
-:c:func:`del_timer()`: se ritorna 1, il temporizzatore è stato già
+del_timer(): se ritorna 1, il temporizzatore è stato già
rimosso. Se 0, significa (in questo caso) che il temporizzatore è in
esecuzione, quindi possiamo fare come segue::
@@ -1032,9 +1032,9 @@ esecuzione, quindi possiamo fare come segue::
spin_unlock_bh(&list_lock);
Un altro problema è l'eliminazione dei temporizzatori che si riavviano
-da soli (chiamando :c:func:`add_timer()` alla fine della loro esecuzione).
+da soli (chiamando add_timer() alla fine della loro esecuzione).
Dato che questo è un problema abbastanza comune con una propensione
-alle corse critiche, dovreste usare :c:func:`del_timer_sync()`
+alle corse critiche, dovreste usare del_timer_sync()
(``include/linux/timer.h``) per gestire questo caso. Questa ritorna il
numero di volte che il temporizzatore è stato interrotto prima che
fosse in grado di fermarlo senza che si riavviasse.
@@ -1116,7 +1116,7 @@ chiamata ``list``::
wmb();
list->next = new;
-La funzione :c:func:`wmb()` è una barriera di sincronizzazione delle
+La funzione wmb() è una barriera di sincronizzazione delle
scritture. Questa garantisce che la prima operazione (impostare l'elemento
``next`` del nuovo elemento) venga completata e vista da tutti i processori
prima che venga eseguita la seconda operazione (che sarebbe quella di mettere
@@ -1127,7 +1127,7 @@ completamente il nuovo elemento; oppure che lo vedano correttamente e quindi
il puntatore ``next`` deve puntare al resto della lista.
Fortunatamente, c'è una funzione che fa questa operazione sulle liste
-:c:type:`struct list_head <list_head>`: :c:func:`list_add_rcu()`
+:c:type:`struct list_head <list_head>`: list_add_rcu()
(``include/linux/list.h``).
Rimuovere un elemento dalla lista è anche più facile: sostituiamo il puntatore
@@ -1138,7 +1138,7 @@ l'elemento o lo salteranno.
list->next = old->next;
-La funzione :c:func:`list_del_rcu()` (``include/linux/list.h``) fa esattamente
+La funzione list_del_rcu() (``include/linux/list.h``) fa esattamente
questo (la versione normale corrompe il vecchio oggetto, e non vogliamo che
accada).
@@ -1146,9 +1146,9 @@ Anche i lettori devono stare attenti: alcuni processori potrebbero leggere
attraverso il puntatore ``next`` il contenuto dell'elemento successivo
troppo presto, ma non accorgersi che il contenuto caricato è sbagliato quando
il puntatore ``next`` viene modificato alla loro spalle. Ancora una volta
-c'è una funzione che viene in vostro aiuto :c:func:`list_for_each_entry_rcu()`
+c'è una funzione che viene in vostro aiuto list_for_each_entry_rcu()
(``include/linux/list.h``). Ovviamente, gli scrittori possono usare
-:c:func:`list_for_each_entry()` dato che non ci possono essere due scrittori
+list_for_each_entry() dato che non ci possono essere due scrittori
in contemporanea.
Il nostro ultimo dilemma è il seguente: quando possiamo realmente distruggere
@@ -1156,15 +1156,15 @@ l'elemento rimosso? Ricordate, un lettore potrebbe aver avuto accesso a questo
elemento proprio ora: se eliminiamo questo elemento ed il puntatore ``next``
cambia, il lettore salterà direttamente nella spazzatura e scoppierà. Dobbiamo
aspettare finché tutti i lettori che stanno attraversando la lista abbiano
-finito. Utilizziamo :c:func:`call_rcu()` per registrare una funzione di
+finito. Utilizziamo call_rcu() per registrare una funzione di
richiamo che distrugga l'oggetto quando tutti i lettori correnti hanno
terminato. In alternative, potrebbe essere usata la funzione
-:c:func:`synchronize_rcu()` che blocca l'esecuzione finché tutti i lettori
+synchronize_rcu() che blocca l'esecuzione finché tutti i lettori
non terminano di ispezionare la lista.
Ma come fa l'RCU a sapere quando i lettori sono finiti? Il meccanismo è
il seguente: innanzi tutto i lettori accedono alla lista solo fra la coppia
-:c:func:`rcu_read_lock()`/:c:func:`rcu_read_unlock()` che disabilita la
+rcu_read_lock()/rcu_read_unlock() che disabilita la
prelazione così che i lettori non vengano sospesi mentre stanno leggendo
la lista.
@@ -1253,12 +1253,12 @@ codice RCU è un po' più ottimizzato di così, ma questa è l'idea di fondo.
}
Da notare che i lettori modificano il campo popularity nella funzione
-:c:func:`__cache_find()`, e ora non trattiene alcun *lock*. Una soluzione
+__cache_find(), e ora non trattiene alcun *lock*. Una soluzione
potrebbe essere quella di rendere la variabile ``atomic_t``, ma per l'uso
che ne abbiamo fatto qui, non ci interessano queste corse critiche perché un
risultato approssimativo è comunque accettabile, quindi non l'ho cambiato.
-Il risultato è che la funzione :c:func:`cache_find()` non ha bisogno di alcuna
+Il risultato è che la funzione cache_find() non ha bisogno di alcuna
sincronizzazione con le altre funzioni, quindi è veloce su un sistema
multi-processore tanto quanto lo sarebbe su un sistema mono-processore.
@@ -1271,9 +1271,9 @@ riferimenti.
Ora, dato che il '*lock* di lettura' di un RCU non fa altro che disabilitare
la prelazione, un chiamante che ha sempre la prelazione disabilitata fra le
-chiamate :c:func:`cache_find()` e :c:func:`object_put()` non necessita
+chiamate cache_find() e object_put() non necessita
di incrementare e decrementare il contatore di riferimenti. Potremmo
-esporre la funzione :c:func:`__cache_find()` dichiarandola non-static,
+esporre la funzione __cache_find() dichiarandola non-static,
e quel chiamante potrebbe usare direttamente questa funzione.
Il beneficio qui sta nel fatto che il contatore di riferimenti no
@@ -1293,10 +1293,10 @@ singolo contatore. Facile e pulito.
Se questo dovesse essere troppo lento (solitamente non lo è, ma se avete
dimostrato che lo è devvero), potreste usare un contatore per ogni processore
e quindi non sarebbe più necessaria la mutua esclusione. Vedere
-:c:func:`DEFINE_PER_CPU()`, :c:func:`get_cpu_var()` e :c:func:`put_cpu_var()`
+DEFINE_PER_CPU(), get_cpu_var() e put_cpu_var()
(``include/linux/percpu.h``).
-Il tipo di dato ``local_t``, la funzione :c:func:`cpu_local_inc()` e tutte
+Il tipo di dato ``local_t``, la funzione cpu_local_inc() e tutte
le altre funzioni associate, sono di particolare utilità per semplici contatori
per-processore; su alcune architetture sono anche più efficienti
(``include/asm/local.h``).
@@ -1324,11 +1324,11 @@ da un'interruzione software. Il gestore d'interruzione non utilizza alcun
enable_irq(irq);
spin_unlock(&lock);
-La funzione :c:func:`disable_irq()` impedisce al gestore d'interruzioni
+La funzione disable_irq() impedisce al gestore d'interruzioni
d'essere eseguito (e aspetta che finisca nel caso fosse in esecuzione su
un altro processore). Lo spinlock, invece, previene accessi simultanei.
Naturalmente, questo è più lento della semplice chiamata
-:c:func:`spin_lock_irq()`, quindi ha senso solo se questo genere di accesso
+spin_lock_irq(), quindi ha senso solo se questo genere di accesso
è estremamente raro.
.. _`it_sleeping-things`:
@@ -1336,7 +1336,7 @@ Naturalmente, questo è più lento della semplice chiamata
Quali funzioni possono essere chiamate in modo sicuro dalle interruzioni?
=========================================================================
-Molte funzioni del kernel dormono (in sostanza, chiamano ``schedule()``)
+Molte funzioni del kernel dormono (in sostanza, chiamano schedule())
direttamente od indirettamente: non potete chiamarle se trattenere uno
spinlock o avete la prelazione disabilitata, mai. Questo significa che
dovete necessariamente essere nel contesto utente: chiamarle da un
@@ -1354,23 +1354,23 @@ dormire.
- Accessi allo spazio utente:
- - :c:func:`copy_from_user()`
+ - copy_from_user()
- - :c:func:`copy_to_user()`
+ - copy_to_user()
- - :c:func:`get_user()`
+ - get_user()
- - :c:func:`put_user()`
+ - put_user()
-- :c:func:`kmalloc(GFP_KERNEL) <kmalloc>`
+- kmalloc(GFP_KERNEL) <kmalloc>`
-- :c:func:`mutex_lock_interruptible()` and
- :c:func:`mutex_lock()`
+- mutex_lock_interruptible() and
+ mutex_lock()
- C'è anche :c:func:`mutex_trylock()` che però non dorme.
+ C'è anche mutex_trylock() che però non dorme.
Comunque, non deve essere usata in un contesto d'interruzione dato
che la sua implementazione non è sicura in quel contesto.
- Anche :c:func:`mutex_unlock()` non dorme mai. Non può comunque essere
+ Anche mutex_unlock() non dorme mai. Non può comunque essere
usata in un contesto d'interruzione perché un mutex deve essere rilasciato
dallo stesso processo che l'ha acquisito.
@@ -1380,11 +1380,11 @@ Alcune funzioni che non dormono
Alcune funzioni possono essere chiamate tranquillamente da qualsiasi
contesto, o trattenendo un qualsiasi *lock*.
-- :c:func:`printk()`
+- printk()
-- :c:func:`kfree()`
+- kfree()
-- :c:func:`add_timer()` e :c:func:`del_timer()`
+- add_timer() e del_timer()
Riferimento per l'API dei Mutex
===============================
@@ -1444,14 +1444,14 @@ prelazione
bh
Bottom Half: per ragioni storiche, le funzioni che contengono '_bh' nel
loro nome ora si riferiscono a qualsiasi interruzione software; per esempio,
- :c:func:`spin_lock_bh()` blocca qualsiasi interuzione software sul processore
+ spin_lock_bh() blocca qualsiasi interuzione software sul processore
corrente. I *Bottom Halves* sono deprecati, e probabilmente verranno
sostituiti dai tasklet. In un dato momento potrà esserci solo un
*bottom half* in esecuzione.
contesto d'interruzione
Non è il contesto utente: qui si processano le interruzioni hardware e
- software. La macro :c:func:`in_interrupt()` ritorna vero.
+ software. La macro in_interrupt() ritorna vero.
contesto utente
Il kernel che esegue qualcosa per conto di un particolare processo (per
@@ -1461,12 +1461,12 @@ contesto utente
che hardware.
interruzione hardware
- Richiesta di interruzione hardware. :c:func:`in_irq()` ritorna vero in un
+ Richiesta di interruzione hardware. in_irq() ritorna vero in un
gestore d'interruzioni hardware.
interruzione software / softirq
- Gestore di interruzioni software: :c:func:`in_irq()` ritorna falso;
- :c:func:`in_softirq()` ritorna vero. I tasklet e le softirq sono entrambi
+ Gestore di interruzioni software: in_irq() ritorna falso;
+ in_softirq() ritorna vero. I tasklet e le softirq sono entrambi
considerati 'interruzioni software'.
In soldoni, un softirq è uno delle 32 interruzioni software che possono
diff --git a/Documentation/translations/it_IT/process/2.Process.rst b/Documentation/translations/it_IT/process/2.Process.rst
index 9af4d01617c4..30dc172f06b0 100644
--- a/Documentation/translations/it_IT/process/2.Process.rst
+++ b/Documentation/translations/it_IT/process/2.Process.rst
@@ -23,18 +23,18 @@ ogni due o tre mesi viene effettuata un rilascio importante del kernel.
I rilasci più recenti sono stati:
====== =================
- 4.11 Aprile 30, 2017
- 4.12 Luglio 2, 2017
- 4.13 Settembre 3, 2017
- 4.14 Novembre 12, 2017
- 4.15 Gennaio 28, 2018
- 4.16 Aprile 1, 2018
+ 5.0 3 marzo, 2019
+ 5.1 5 maggio, 2019
+ 5.2 7 luglio, 2019
+ 5.3 15 settembre, 2019
+ 5.4 24 novembre, 2019
+ 5.5 6 gennaio, 2020
====== =================
-Ciascun rilascio 4.x è un importante rilascio del kernel con nuove
+Ciascun rilascio 5.x è un importante rilascio del kernel con nuove
funzionalità, modifiche interne dell'API, e molto altro. Un tipico
-rilascio 4.x contiene quasi 13,000 gruppi di modifiche con ulteriori
-modifiche a parecchie migliaia di linee di codice. La 4.x. è pertanto la
+rilascio contiene quasi 13,000 gruppi di modifiche con ulteriori
+modifiche a parecchie migliaia di linee di codice. La 5.x. è pertanto la
linea di confine nello sviluppo del kernel Linux; il kernel utilizza un sistema
di sviluppo continuo che integra costantemente nuove importanti modifiche.
@@ -55,8 +55,8 @@ verrà descritto dettagliatamente più avanti).
La finestra di inclusione resta attiva approssimativamente per due settimane.
Al termine di questo periodo, Linus Torvald dichiarerà che la finestra è
chiusa e rilascerà il primo degli "rc" del kernel.
-Per il kernel che è destinato ad essere 2.6.40, per esempio, il rilascio
-che emerge al termine della finestra d'inclusione si chiamerà 2.6.40-rc1.
+Per il kernel che è destinato ad essere 5.6, per esempio, il rilascio
+che emerge al termine della finestra d'inclusione si chiamerà 5.6-rc1.
Questo rilascio indica che il momento di aggiungere nuovi componenti è
passato, e che è iniziato il periodo di stabilizzazione del prossimo kernel.
@@ -76,22 +76,23 @@ Mentre le correzioni si aprono la loro strada all'interno del ramo principale,
il ritmo delle modifiche rallenta col tempo. Linus rilascia un nuovo
kernel -rc circa una volta alla settimana; e ne usciranno circa 6 o 9 prima
che il kernel venga considerato sufficientemente stabile e che il rilascio
-finale 2.6.x venga fatto. A quel punto tutto il processo ricomincerà.
+finale venga fatto. A quel punto tutto il processo ricomincerà.
-Esempio: ecco com'è andato il ciclo di sviluppo della versione 4.16
+Esempio: ecco com'è andato il ciclo di sviluppo della versione 5.4
(tutte le date si collocano nel 2018)
============== =======================================
- Gennaio 28 4.15 rilascio stabile
- Febbraio 11 4.16-rc1, finestra di inclusione chiusa
- Febbraio 18 4.16-rc2
- Febbraio 25 4.16-rc3
- Marzo 4 4.16-rc4
- Marzo 11 4.16-rc5
- Marzo 18 4.16-rc6
- Marzo 25 4.16-rc7
- Aprile 1 4.17 rilascio stabile
+ 15 settembre 5.3 rilascio stabile
+ 30 settembre 5.4-rc1, finestra di inclusione chiusa
+ 6 ottobre 5.4-rc2
+ 13 ottobre 5.4-rc3
+ 20 ottobre 5.4-rc4
+ 27 ottobre 5.4-rc5
+ 3 novembre 5.4-rc6
+ 10 novembre 5.4-rc7
+ 17 novembre 5.4-rc8
+ 24 novembre 5.4 rilascio stabile
============== =======================================
In che modo gli sviluppatori decidono quando chiudere il ciclo di sviluppo e
@@ -108,43 +109,44 @@ tipo di perfezione difficilmente viene raggiunta; esistono troppe variabili
in un progetto di questa portata. Arriva un punto dove ritardare il rilascio
finale peggiora la situazione; la quantità di modifiche in attesa della
prossima finestra di inclusione crescerà enormemente, creando ancor più
-regressioni al giro successivo. Quindi molti kernel 4.x escono con una
+regressioni al giro successivo. Quindi molti kernel 5.x escono con una
manciata di regressioni delle quali, si spera, nessuna è grave.
Una volta che un rilascio stabile è fatto, il suo costante mantenimento è
affidato al "squadra stabilità", attualmente composta da Greg Kroah-Hartman.
Questa squadra rilascia occasionalmente degli aggiornamenti relativi al
-rilascio stabile usando la numerazione 4.x.y. Per essere presa in
+rilascio stabile usando la numerazione 5.x.y. Per essere presa in
considerazione per un rilascio d'aggiornamento, una modifica deve:
(1) correggere un baco importante (2) essere già inserita nel ramo principale
per il prossimo sviluppo del kernel. Solitamente, passato il loro rilascio
iniziale, i kernel ricevono aggiornamenti per più di un ciclo di sviluppo.
-Quindi, per esempio, la storia del kernel 4.13 appare così:
+Quindi, per esempio, la storia del kernel 5.2 appare così (anno 2019):
============== ===============================
- Settembre 3 4.13 rilascio stabile
- Settembre 13 4.13.1
- Settembre 20 4.13.2
- Settembre 27 4.13.3
- Ottobre 5 4.13.4
- Ottobre 12 4.13.5
+ 15 settembre 5.2 rilascio stabile FIXME settembre è sbagliato
+ 14 luglio 5.2.1
+ 21 luglio 5.2.2
+ 26 luglio 5.2.3
+ 28 luglio 5.2.4
+ 31 luglio 5.2.5
... ...
- Novembre 24 4.13.16
+ 11 ottobre 5.2.21
============== ===============================
-La 4.13.16 fu l'aggiornamento finale per la versione 4.13.
+La 5.2.21 fu l'aggiornamento finale per la versione 5.2.
Alcuni kernel sono destinati ad essere kernel a "lungo termine"; questi
riceveranno assistenza per un lungo periodo di tempo. Al momento in cui
scriviamo, i manutentori dei kernel stabili a lungo termine sono:
- ====== ====================== ==========================================
- 3.16 Ben Hutchings (kernel stabile molto più a lungo termine)
- 4.1 Sasha Levin
- 4.4 Greg Kroah-Hartman (kernel stabile molto più a lungo termine)
- 4.9 Greg Kroah-Hartman
- 4.14 Greg Kroah-Hartman
- ====== ====================== ==========================================
+ ====== ================================ ==========================================
+ 3.16 Ben Hutchings (kernel stabile molto più a lungo termine)
+ 4.4 Greg Kroah-Hartman e Sasha Levin (kernel stabile molto più a lungo termine)
+ 4.9 Greg Kroah-Hartman e Sasha Levin
+ 4.14 Greg Kroah-Hartman e Sasha Levin
+ 4.19 Greg Kroah-Hartman e Sasha Levin
+ 5.4i Greg Kroah-Hartman e Sasha Levin
+ ====== ================================ ==========================================
Questa selezione di kernel di lungo periodo sono puramente dovuti ai loro
@@ -229,12 +231,13 @@ Come le modifiche finiscono nel Kernel
--------------------------------------
Esiste una sola persona che può inserire le patch nel repositorio principale
-del kernel: Linus Torvalds. Ma, di tutte le 9500 patch che entrarono nella
-versione 2.6.38 del kernel, solo 112 (circa l'1,3%) furono scelte direttamente
-da Linus in persona. Il progetto del kernel è cresciuto fino a raggiungere
-una dimensione tale per cui un singolo sviluppatore non può controllare e
-selezionare indipendentemente ogni modifica senza essere supportato.
-La via scelta dagli sviluppatori per indirizzare tale crescita è stata quella
+del kernel: Linus Torvalds. Ma, per esempio, di tutte le 9500 patch
+che entrarono nella versione 2.6.38 del kernel, solo 112 (circa
+l'1,3%) furono scelte direttamente da Linus in persona. Il progetto
+del kernel è cresciuto fino a raggiungere una dimensione tale per cui
+un singolo sviluppatore non può controllare e selezionare
+indipendentemente ogni modifica senza essere supportato. La via
+scelta dagli sviluppatori per indirizzare tale crescita è stata quella
di utilizzare un sistema di "sottotenenti" basato sulla fiducia.
Il codice base del kernel è spezzato in una serie si sottosistemi: rete,
diff --git a/Documentation/translations/it_IT/process/adding-syscalls.rst b/Documentation/translations/it_IT/process/adding-syscalls.rst
index c3a3439595a6..bff0a82bf127 100644
--- a/Documentation/translations/it_IT/process/adding-syscalls.rst
+++ b/Documentation/translations/it_IT/process/adding-syscalls.rst
@@ -39,7 +39,7 @@ vostra interfaccia.
un qualche modo opaca.
- Se dovete esporre solo delle informazioni sul sistema, un nuovo nodo in
- sysfs (vedere ``Documentation/filesystems/sysfs.txt``) o
+ sysfs (vedere ``Documentation/filesystems/sysfs.rst``) o
in procfs potrebbe essere sufficiente. Tuttavia, l'accesso a questi
meccanismi richiede che il filesystem sia montato, il che potrebbe non
essere sempre vero (per esempio, in ambienti come namespace/sandbox/chroot).
diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst
index 8725f2b9e960..6f4f85832dee 100644
--- a/Documentation/translations/it_IT/process/coding-style.rst
+++ b/Documentation/translations/it_IT/process/coding-style.rst
@@ -313,7 +313,7 @@ che conta gli utenti attivi, dovreste chiamarla ``count_active_users()`` o
qualcosa di simile, **non** dovreste chiamarla ``cntusr()``.
Codificare il tipo di funzione nel suo nome (quella cosa chiamata notazione
-ungherese) fa male al cervello - il compilatore conosce comunque il tipo e
+ungherese) è stupido - il compilatore conosce comunque il tipo e
può verificarli, e inoltre confonde i programmatori. Non c'è da
sorprendersi che MicroSoft faccia programmi bacati.
@@ -825,8 +825,8 @@ linguaggio assembler.
Agli sviluppatori del kernel piace essere visti come dotti. Tenete un occhio
di riguardo per l'ortografia e farete una belle figura. In inglese, evitate
-l'uso di parole mozzate come ``dont``: usate ``do not`` oppure ``don't``.
-Scrivete messaggi concisi, chiari, e inequivocabili.
+l'uso incorretto di abbreviazioni come ``dont``: usate ``do not`` oppure
+``don't``. Scrivete messaggi concisi, chiari, e inequivocabili.
I messaggi del kernel non devono terminare con un punto fermo.
diff --git a/Documentation/translations/it_IT/process/deprecated.rst b/Documentation/translations/it_IT/process/deprecated.rst
index 776f26732a94..e108eaf82cf6 100644
--- a/Documentation/translations/it_IT/process/deprecated.rst
+++ b/Documentation/translations/it_IT/process/deprecated.rst
@@ -34,6 +34,33 @@ interfaccia come 'vecchia', questa non è una soluzione completa. L'interfaccia
deve essere rimossa dal kernel, o aggiunta a questo documento per scoraggiarne
l'uso.
+BUG() e BUG_ON()
+----------------
+Al loro posto usate WARN() e WARN_ON() per gestire le
+condizioni "impossibili" e gestitele come se fosse possibile farlo.
+Nonostante le funzioni della famiglia BUG() siano state progettate
+per asserire "situazioni impossibili" e interrompere in sicurezza un
+thread del kernel, queste si sono rivelate essere troppo rischiose
+(per esempio, in quale ordine rilasciare i *lock*? Ci sono stati che
+sono stati ripristinati?). Molto spesso l'uso di BUG()
+destabilizza il sistema o lo corrompe del tutto, il che rende
+impossibile un'attività di debug o anche solo leggere un rapporto
+circa l'errore. Linus ha un'opinione molto critica al riguardo:
+`email 1
+<https://lore.kernel.org/lkml/CA+55aFy6jNLsywVYdGp83AMrXBo_P-pkjkphPGrO=82SPKCpLQ@mail.gmail.com/>`_,
+`email 2
+<https://lore.kernel.org/lkml/CAHk-=whDHsbK3HTOpTF=ue_o04onRwTEaK_ZoJp_fjbqq4+=Jw@mail.gmail.com/>`_
+
+Tenete presente che la famiglia di funzioni WARN() dovrebbe essere
+usato solo per situazioni che si suppone siano "impossibili". Se
+volete avvisare gli utenti riguardo a qualcosa di possibile anche se
+indesiderato, usare le funzioni della famiglia pr_warn(). Chi
+amministra il sistema potrebbe aver attivato l'opzione sysctl
+*panic_on_warn* per essere sicuri che il sistema smetta di funzionare
+in caso si verifichino delle condizioni "inaspettate". (per esempio,
+date un'occhiata al questo `commit
+<https://git.kernel.org/linus/d4689846881d160a4d12a514e991a740bcb5d65a>`_)
+
Calcoli codificati negli argomenti di un allocatore
----------------------------------------------------
Il calcolo dinamico delle dimensioni (specialmente le moltiplicazioni) non
@@ -68,52 +95,81 @@ Invece, usate la seguente funzione::
header = kzalloc(struct_size(header, item, count), GFP_KERNEL);
-Per maggiori dettagli fate riferimento a :c:func:`array_size`,
-:c:func:`array3_size`, e :c:func:`struct_size`, così come la famiglia di
-funzioni :c:func:`check_add_overflow` e :c:func:`check_mul_overflow`.
+Per maggiori dettagli fate riferimento a array_size(),
+array3_size(), e struct_size(), così come la famiglia di
+funzioni check_add_overflow() e check_mul_overflow().
simple_strtol(), simple_strtoll(), simple_strtoul(), simple_strtoull()
----------------------------------------------------------------------
-Le funzioni :c:func:`simple_strtol`, :c:func:`simple_strtoll`,
-:c:func:`simple_strtoul`, e :c:func:`simple_strtoull` ignorano volutamente
+Le funzioni simple_strtol(), simple_strtoll(),
+simple_strtoul(), e simple_strtoull() ignorano volutamente
i possibili overflow, e questo può portare il chiamante a generare risultati
-inaspettati. Le rispettive funzioni :c:func:`kstrtol`, :c:func:`kstrtoll`,
-:c:func:`kstrtoul`, e :c:func:`kstrtoull` sono da considerarsi le corrette
+inaspettati. Le rispettive funzioni kstrtol(), kstrtoll(),
+kstrtoul(), e kstrtoull() sono da considerarsi le corrette
sostitute; tuttavia va notato che queste richiedono che la stringa sia
terminata con il carattere NUL o quello di nuova riga.
strcpy()
--------
-La funzione :c:func:`strcpy` non fa controlli agli estremi del buffer
+La funzione strcpy() non fa controlli agli estremi del buffer
di destinazione. Questo può portare ad un overflow oltre i limiti del
buffer e generare svariati tipi di malfunzionamenti. Nonostante l'opzione
`CONFIG_FORTIFY_SOURCE=y` e svariate opzioni del compilatore aiutano
a ridurne il rischio, non c'è alcuna buona ragione per continuare ad usare
-questa funzione. La versione sicura da usare è :c:func:`strscpy`.
+questa funzione. La versione sicura da usare è strscpy().
strncpy() su stringe terminate con NUL
--------------------------------------
-L'utilizzo di :c:func:`strncpy` non fornisce alcuna garanzia sul fatto che
+L'utilizzo di strncpy() non fornisce alcuna garanzia sul fatto che
il buffer di destinazione verrà terminato con il carattere NUL. Questo
potrebbe portare a diversi overflow di lettura o altri malfunzionamenti
causati, appunto, dalla mancanza del terminatore. Questa estende la
terminazione nel buffer di destinazione quando la stringa d'origine è più
corta; questo potrebbe portare ad una penalizzazione delle prestazioni per
chi usa solo stringe terminate. La versione sicura da usare è
-:c:func:`strscpy`. (chi usa :c:func:`strscpy` e necessita di estendere la
-terminazione con NUL deve aggiungere una chiamata a :c:func:`memset`)
+strscpy(). (chi usa strscpy() e necessita di estendere la
+terminazione con NUL deve aggiungere una chiamata a memset())
-Se il chiamate no usa stringhe terminate con NUL, allore :c:func:`strncpy()`
+Se il chiamate no usa stringhe terminate con NUL, allore strncpy()()
può continuare ad essere usata, ma i buffer di destinazione devono essere
marchiati con l'attributo `__nonstring <https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html>`_
per evitare avvisi durante la compilazione.
strlcpy()
---------
-La funzione :c:func:`strlcpy`, per prima cosa, legge interamente il buffer di
+La funzione strlcpy(), per prima cosa, legge interamente il buffer di
origine, magari leggendo più di quanto verrà effettivamente copiato. Questo
è inefficiente e può portare a overflow di lettura quando la stringa non è
-terminata con NUL. La versione sicura da usare è :c:func:`strscpy`.
+terminata con NUL. La versione sicura da usare è strscpy().
+
+Segnaposto %p nella stringa di formato
+--------------------------------------
+
+Tradizionalmente, l'uso del segnaposto "%p" nella stringa di formato
+esponne un indirizzo di memoria in dmesg, proc, sysfs, eccetera. Per
+evitare che questi indirizzi vengano sfruttati da malintenzionati,
+tutto gli usi di "%p" nel kernel rappresentano l'hash dell'indirizzo,
+rendendolo di fatto inutilizzabile. Nuovi usi di "%p" non dovrebbero
+essere aggiunti al kernel. Per una rappresentazione testuale di un
+indirizzo usate "%pS", l'output è migliore perché mostrerà il nome del
+simbolo. Per tutto il resto, semplicemente non usate "%p".
+
+Parafrasando la `guida
+<https://lore.kernel.org/lkml/CA+55aFwQEd_d40g4mUCSsVRZzrFPUJt74vc6PPpb675hYNXcKw@mail.gmail.com/>`_
+di Linus:
+
+- Se il valore hash di "%p" è inutile, chiediti se il puntatore stesso
+ è importante. Forse dovrebbe essere rimosso del tutto?
+- Se credi davvero che il vero valore del puntatore sia importante,
+ perché alcuni stati del sistema o i livelli di privilegi di un
+ utente sono considerati "special"? Se pensi di poterlo giustificare
+ (in un commento e nel messaggio del commit) abbastanza bene da
+ affrontare il giudizio di Linus, allora forse potrai usare "%px",
+ assicurandosi anche di averne il permesso.
+
+Infine, sappi che un cambio in favore di "%p" con hash `non verrà
+accettato
+<https://lore.kernel.org/lkml/CA+55aFwieC1-nAs+NFq9RTwaR8ef9hWa4MjNBWL41F-8wM49eA@mail.gmail.com/>`_.
Vettori a dimensione variabile (VLA)
------------------------------------
@@ -127,3 +183,47 @@ Questo può portare a dei malfunzionamenti, potrebbe sovrascrivere
dati importanti alla fine dello stack (quando il kernel è compilato senza
`CONFIG_THREAD_INFO_IN_TASK=y`), o sovrascrivere un pezzo di memoria adiacente
allo stack (quando il kernel è compilato senza `CONFIG_VMAP_STACK=y`).
+
+Salto implicito nell'istruzione switch-case
+-------------------------------------------
+
+Il linguaggio C permette ai casi di un'istruzione `switch` di saltare al
+prossimo caso quando l'istruzione "break" viene omessa alla fine del caso
+corrente. Tuttavia questo rende il codice ambiguo perché non è sempre ovvio se
+l'istruzione "break" viene omessa intenzionalmente o è un baco. Per esempio,
+osservando il seguente pezzo di codice non è chiaro se lo stato
+`STATE_ONE` è stato progettato apposta per eseguire anche `STATE_TWO`::
+
+ switch (value) {
+ case STATE_ONE:
+ do_something();
+ case STATE_TWO:
+ do_other();
+ break;
+ default:
+ WARN("unknown state");
+ }
+
+Dato che c'è stata una lunga lista di problemi `dovuti alla mancanza dell'istruzione
+"break" <https://cwe.mitre.org/data/definitions/484.html>`_, oggigiorno non
+permettiamo più che vi sia un "salto implicito" (*fall-through*). Per
+identificare un salto implicito intenzionale abbiamo adottato la pseudo
+parola chiave 'fallthrough' che viene espansa nell'estensione di gcc
+`__attribute__((fallthrough))` `Statement Attributes
+<https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html>`_.
+(Quando la sintassi C17/C18 `[[fallthrough]]` sarà più comunemente
+supportata dai compilatori C, analizzatori statici, e dagli IDE,
+allora potremo usare quella sintassi per la pseudo parola chiave)
+
+Quando la sintassi [[fallthrough]] sarà più comunemente supportata dai
+compilatori, analizzatori statici, e ambienti di sviluppo IDE,
+allora potremo usarla anche noi.
+
+Ne consegue che tutti i blocchi switch/case devono finire in uno dei seguenti
+modi:
+
+* ``break;``
+* `fallthrough;``
+* ``continue;``
+* ``goto <label>;``
+* ``return [expression];``
diff --git a/Documentation/translations/it_IT/process/email-clients.rst b/Documentation/translations/it_IT/process/email-clients.rst
index 224ab031ffd3..89abf6d325f2 100644
--- a/Documentation/translations/it_IT/process/email-clients.rst
+++ b/Documentation/translations/it_IT/process/email-clients.rst
@@ -1,12 +1,334 @@
.. include:: ../disclaimer-ita.rst
-:Original: :ref:`Documentation/process/email-clients.rst <email_clients>`
-
-.. _it_email_clients:
+:Original: :doc:`../../../process/email-clients`
+:Translator: Alessia Mantegazza <amantegazza@vaga.pv.it>
Informazioni sui programmi di posta elettronica per Linux
=========================================================
-.. warning::
+Git
+---
+
+Oggigiorno, la maggior parte degli sviluppatori utilizza ``git send-email``
+al posto dei classici programmi di posta elettronica. Le pagine man sono
+abbastanza buone. Dal lato del ricevente, i manutentori utilizzano ``git am``
+per applicare le patch.
+
+Se siete dei novelli utilizzatori di ``git`` allora inviate la patch a voi
+stessi. Salvatela come testo includendo tutte le intestazioni. Poi eseguite
+il comando ``git am messaggio-formato-testo.txt`` e revisionatene il risultato
+con ``git log``. Quando tutto funziona correttamente, allora potete inviare
+la patch alla lista di discussione più appropriata.
+
+Panoramica delle opzioni
+------------------------
+
+Le patch per il kernel vengono inviate per posta elettronica, preferibilmente
+come testo integrante del messaggio. Alcuni manutentori accettano gli
+allegati, ma in questo caso gli allegati devono avere il *content-type*
+impostato come ``text/plain``. Tuttavia, generalmente gli allegati non sono
+ben apprezzati perché rende più difficile citare porzioni di patch durante il
+processo di revisione.
+
+I programmi di posta elettronica che vengono usati per inviare le patch per il
+kernel Linux dovrebbero inviarle senza alterazioni. Per esempio, non
+dovrebbero modificare o rimuovere tabulazioni o spazi, nemmeno all'inizio o
+alla fine delle righe.
+
+Non inviate patch con ``format=flowed``. Questo potrebbe introdurre
+interruzioni di riga inaspettate e indesiderate.
+
+Non lasciate che il vostro programma di posta vada a capo automaticamente.
+Questo può corrompere le patch.
+
+I programmi di posta non dovrebbero modificare la codifica dei caratteri nel
+testo. Le patch inviate per posta elettronica dovrebbero essere codificate in
+ASCII o UTF-8.
+Se configurate il vostro programma per inviare messaggi codificati con UTF-8
+eviterete possibili problemi di codifica.
+
+I programmi di posta dovrebbero generare e mantenere le intestazioni
+"References" o "In-Reply-To:" cosicché la discussione non venga interrotta.
+
+Di solito, il copia-e-incolla (o taglia-e-incolla) non funziona con le patch
+perché le tabulazioni vengono convertite in spazi. Usando xclipboard, xclip
+e/o xcutsel potrebbe funzionare, ma è meglio che lo verifichiate o meglio
+ancora: non usate il copia-e-incolla.
+
+Non usate firme PGP/GPG nei messaggi che contengono delle patch. Questo
+impedisce il corretto funzionamento di alcuni script per leggere o applicare
+patch (questo si dovrebbe poter correggere).
+
+Prima di inviare le patch sulle liste di discussione Linux, può essere una
+buona idea quella di inviare la patch a voi stessi, salvare il messaggio
+ricevuto, e applicarlo ai sorgenti con successo.
+
+
+Alcuni suggerimenti per i programmi di posta elettronica (MUA)
+--------------------------------------------------------------
+
+Qui troverete alcuni suggerimenti per configurare i vostri MUA allo scopo
+di modificare ed inviare patch per il kernel Linux. Tuttavia, questi
+suggerimenti non sono da considerarsi come un riassunto di una configurazione
+completa.
+
+Legenda:
+
+- TUI = interfaccia utente testuale (*text-based user interface*)
+- GUI = interfaccia utente grafica (*graphical user interface*)
+
+Alpine (TUI)
+************
+
+Opzioni per la configurazione:
+
+Nella sezione :menuselection:`Sending Preferences`:
+
+- :menuselection:`Do Not Send Flowed Text` deve essere ``enabled``
+- :menuselection:`Strip Whitespace Before Sending` deve essere ``disabled``
+
+Quando state scrivendo un messaggio, il cursore dev'essere posizionato
+dove volete che la patch inizi, poi premendo :kbd:`CTRL-R` vi verrà chiesto
+di selezionare il file patch da inserire nel messaggio.
+
+Claws Mail (GUI)
+****************
+
+Funziona. Alcune persone riescono ad usarlo con successo per inviare le patch.
+
+Per inserire una patch usate :menuselection:`Messaggio-->Inserisci file`
+(:kbd:`CTRL-I`) oppure un editor esterno.
+
+Se la patch che avete inserito dev'essere modificata usato la finestra di
+scrittura di Claws, allora assicuratevi che l'"auto-interruzione" sia
+disabilitata :menuselection:`Configurazione-->Preferenze-->Composizione-->Interruzione riga`.
+
+Evolution (GUI)
+***************
+
+Alcune persone riescono ad usarlo con successo per inviare le patch.
+
+Quando state scrivendo una lettera selezionate: Preformattato
+ da :menuselection:`Formato-->Stile del paragrafo-->Preformattato`
+ (:kbd:`CTRL-7`) o dalla barra degli strumenti
+
+Poi per inserire la patch usate:
+:menuselection:`Inserisci--> File di testo...` (:kbd:`ALT-N x`)
+
+Potete anche eseguire ``diff -Nru old.c new.c | xclip``, selezionare
+:menuselection:`Preformattato`, e poi usare il tasto centrale del mouse.
+
+Kmail (GUI)
+***********
+
+Alcune persone riescono ad usarlo con successo per inviare le patch.
+
+La configurazione base che disabilita la composizione di messaggi HTML è
+corretta; non abilitatela.
+
+Quando state scrivendo un messaggio, nel menu opzioni, togliete la selezione a
+"A capo automatico". L'unico svantaggio sarà che qualsiasi altra cosa scriviate
+nel messaggio non verrà mandata a capo in automatico ma dovrete farlo voi.
+Il modo più semplice per ovviare a questo problema è quello di scrivere il
+messaggio con l'opzione abilitata e poi di salvarlo nelle bozze. Riaprendo ora
+il messaggio dalle bozze le andate a capo saranno parte integrante del
+messaggio, per cui togliendo l'opzione "A capo automatico" non perderete nulla.
+
+Alla fine del vostro messaggio, appena prima di inserire la vostra patch,
+aggiungete il delimitatore di patch: tre trattini (``---``).
+
+Ora, dal menu :menuselection:`Messaggio`, selezionate :menuselection:`Inserisci file di testo...`
+quindi scegliete la vostra patch.
+Come soluzione aggiuntiva potreste personalizzare la vostra barra degli
+strumenti aggiungendo un'icona per :menuselection:`Inserisci file di testo...`.
+
+Allargate la finestra di scrittura abbastanza da evitare andate a capo.
+Questo perché in Kmail 1.13.5 (KDE 4.5.4), Kmail aggiunge andate a capo
+automaticamente al momento dell'invio per tutte quelle righe che graficamente,
+nella vostra finestra di composizione, si sono estete su una riga successiva.
+Disabilitare l'andata a capo automatica non è sufficiente. Dunque, se la vostra
+patch contiene delle righe molto lunghe, allora dovrete allargare la finestra
+di composizione per evitare che quelle righe vadano a capo. Vedere:
+https://bugs.kde.org/show_bug.cgi?id=174034
+
+Potete firmare gli allegati con GPG, ma per le patch si preferisce aggiungerle
+al testo del messaggio per cui non usate la firma GPG. Firmare le patch
+inserite come testo del messaggio le rende più difficili da estrarre dalla loro
+codifica a 7-bit.
+
+Se dovete assolutamente inviare delle patch come allegati invece di integrarle
+nel testo del messaggio, allora premete il tasto destro sull'allegato e
+selezionate :menuselection:`Proprietà`, e poi attivate
+:menuselection:`Suggerisci visualizzazione automatica` per far si che
+l'allegato sia più leggibile venendo visualizzato come parte del messaggio.
+
+Per salvare le patch inviate come parte di un messaggio, selezionate il
+messaggio che la contiene, premete il tasto destro e selezionate
+:menuselection:`Salva come`. Se il messaggio fu ben preparato, allora potrete
+usarlo interamente senza alcuna modifica.
+I messaggi vengono salvati con permessi di lettura-scrittura solo per l'utente,
+nel caso in cui vogliate copiarli altrove per renderli disponibili ad altri
+gruppi o al mondo, ricordatevi di usare ``chmod`` per cambiare i permessi.
+
+Lotus Notes (GUI)
+*****************
+
+Scappate finché potete.
+
+IBM Verse (Web GUI)
+*******************
+
+Vedi il commento per Lotus Notes.
+
+Mutt (TUI)
+**********
+
+Un sacco di sviluppatori Linux usano ``mutt``, per cui deve funzionare
+abbastanza bene.
+
+Mutt non ha un proprio editor, quindi qualunque sia il vostro editor dovrete
+configurarlo per non aggiungere automaticamente le andate a capo. Molti
+editor hanno un'opzione :menuselection:`Inserisci file` che inserisce il
+contenuto di un file senza alterarlo.
+
+Per usare ``vim`` come editor per mutt::
+
+ set editor="vi"
+
+Se per inserire la patch nel messaggio usate xclip, scrivete il comando::
+
+ :set paste
+
+prima di premere il tasto centrale o shift-insert. Oppure usate il
+comando::
+
+ :r filename
+
+(a)llega funziona bene senza ``set paste``
+
+Potete generare le patch con ``git format-patch`` e usare Mutt per inviarle::
+
+ $ mutt -H 0001-some-bug-fix.patch
+
+Opzioni per la configurazione:
+
+Tutto dovrebbe funzionare già nella configurazione base.
+Tuttavia, è una buona idea quella di impostare ``send_charset``::
+
+ set send_charset="us-ascii:utf-8"
+
+Mutt è molto personalizzabile. Qui di seguito trovate la configurazione minima
+per iniziare ad usare Mutt per inviare patch usando Gmail::
+
+ # .muttrc
+ # ================ IMAP ====================
+ set imap_user = 'yourusername@gmail.com'
+ set imap_pass = 'yourpassword'
+ set spoolfile = imaps://imap.gmail.com/INBOX
+ set folder = imaps://imap.gmail.com/
+ set record="imaps://imap.gmail.com/[Gmail]/Sent Mail"
+ set postponed="imaps://imap.gmail.com/[Gmail]/Drafts"
+ set mbox="imaps://imap.gmail.com/[Gmail]/All Mail"
+
+ # ================ SMTP ====================
+ set smtp_url = "smtp://username@smtp.gmail.com:587/"
+ set smtp_pass = $imap_pass
+ set ssl_force_tls = yes # Require encrypted connection
+
+ # ================ Composition ====================
+ set editor = `echo \$EDITOR`
+ set edit_headers = yes # See the headers when editing
+ set charset = UTF-8 # value of $LANG; also fallback for send_charset
+ # Sender, email address, and sign-off line must match
+ unset use_domain # because joe@localhost is just embarrassing
+ set realname = "YOUR NAME"
+ set from = "username@gmail.com"
+ set use_from = yes
+
+La documentazione di Mutt contiene molte più informazioni:
+
+ https://gitlab.com/muttmua/mutt/-/wikis/UseCases/Gmail
+
+ http://www.mutt.org/doc/manual/
+
+Pine (TUI)
+**********
+
+Pine aveva alcuni problemi con gli spazi vuoti, ma questi dovrebbero essere
+stati risolti.
+
+Se potete usate alpine (il successore di pine).
+
+Opzioni di configurazione:
+
+- Nelle versioni più recenti è necessario avere ``quell-flowed-text``
+- l'opzione ``no-strip-whitespace-before-send`` è necessaria
+
+Sylpheed (GUI)
+**************
+
+- funziona bene per aggiungere testo in linea (o usando allegati)
+- permette di utilizzare editor esterni
+- è lento su cartelle grandi
+- non farà l'autenticazione TSL SMTP su una connessione non SSL
+- ha un utile righello nella finestra di scrittura
+- la rubrica non comprende correttamente il nome da visualizzare e
+ l'indirizzo associato
+
+Thunderbird (GUI)
+*****************
+
+Thunderbird è un clone di Outlook a cui piace maciullare il testo, ma esistono
+modi per impedirglielo.
+
+- permettere l'uso di editor esterni:
+ La cosa più semplice da fare con Thunderbird e le patch è quello di usare
+ l'estensione "external editor" e di usare il vostro ``$EDITOR`` preferito per
+ leggere/includere patch nel vostro messaggio. Per farlo, scaricate ed
+ installate l'estensione e aggiungete un bottone per chiamarla rapidamente
+ usando :menuselection:`Visualizza-->Barra degli strumenti-->Personalizza...`;
+ una volta fatto potrete richiamarlo premendo sul bottone mentre siete nella
+ finestra :menuselection:`Scrivi`
+
+ Tenete presente che "external editor" richiede che il vostro editor non
+ faccia alcun fork, in altre parole, l'editor non deve ritornare prima di
+ essere stato chiuso. Potreste dover passare dei parametri aggiuntivi al
+ vostro editor oppure cambiargli la configurazione. Per esempio, usando
+ gvim dovrete aggiungere l'opzione -f ``/usr/bin/gvim -f`` (Se il binario
+ si trova in ``/usr/bin``) nell'apposito campo nell'interfaccia di
+ configurazione di :menuselection:`external editor`. Se usate altri editor
+ consultate il loro manuale per sapere come configurarli.
+
+Per rendere l'editor interno un po' più sensato, fate così:
+
+- Modificate le impostazioni di Thunderbird per far si che non usi
+ ``format=flowed``. Andate in :menuselection:`Modifica-->Preferenze-->Avanzate-->Editor di configurazione`
+ per invocare il registro delle impostazioni.
+
+- impostate ``mailnews.send_plaintext_flowed`` a ``false``
+
+- impostate ``mailnews.wraplength`` da ``72`` a ``0``
+
+- :menuselection:`Visualizza-->Corpo del messaggio come-->Testo semplice`
+
+- :menuselection:`Visualizza-->Codifica del testo-->Unicode`
+
+
+TkRat (GUI)
+***********
+
+Funziona. Usare "Inserisci file..." o un editor esterno.
+
+Gmail (Web GUI)
+***************
+
+Non funziona per inviare le patch.
+
+Il programma web Gmail converte automaticamente i tab in spazi.
+
+Allo stesso tempo aggiunge andata a capo ogni 78 caratteri. Comunque
+il problema della conversione fra spazi e tab può essere risolto usando
+un editor esterno.
- TODO ancora da tradurre
+Un altro problema è che Gmail usa la codifica base64 per tutti quei messaggi
+che contengono caratteri non ASCII. Questo include cose tipo i nomi europei.
diff --git a/Documentation/translations/it_IT/process/index.rst b/Documentation/translations/it_IT/process/index.rst
index 012de0f3154a..c4c867132c88 100644
--- a/Documentation/translations/it_IT/process/index.rst
+++ b/Documentation/translations/it_IT/process/index.rst
@@ -59,6 +59,7 @@ perché non si è trovato un posto migliore.
magic-number
volatile-considered-harmful
clang-format
+ ../riscv/patch-acceptance
.. only:: subproject and html
diff --git a/Documentation/translations/it_IT/process/management-style.rst b/Documentation/translations/it_IT/process/management-style.rst
index 07e68bfb8402..c709285138a7 100644
--- a/Documentation/translations/it_IT/process/management-style.rst
+++ b/Documentation/translations/it_IT/process/management-style.rst
@@ -1,12 +1,293 @@
.. include:: ../disclaimer-ita.rst
-:Original: :ref:`Documentation/process/management-style.rst <managementstyle>`
+:Original: :doc:`../../../process/management-style`
+:Translator: Alessia Mantegazza <amantegazza@vaga.pv.it>
-.. _it_managementstyle:
+Il modello di gestione del kernel Linux
+=======================================
-Tipo di gestione del kernel Linux
-=================================
+Questo breve documento descrive il modello di gestione del kernel Linux.
+Per certi versi, esso rispecchia il documento
+:ref:`translations/it_IT/process/coding-style.rst <it_codingstyle>`,
+ed è principalmente scritto per evitare di rispondere [#f1]_ in continuazione
+alle stesse identiche (o quasi) domande.
-.. warning::
+Il modello di gestione è qualcosa di molto personale e molto più difficile da
+qualificare rispetto a delle semplici regole di codifica, quindi questo
+documento potrebbe avere più o meno a che fare con la realtà. È cominciato
+come un gioco, ma ciò non significa che non possa essere vero.
+Lo dovrete decidere voi stessi.
- TODO ancora da tradurre
+In ogni caso, quando si parla del "dirigente del kernel", ci si riferisce
+sempre alla persona che dirige tecnicamente, e non a coloro che
+tradizionalmente hanno un ruolo direttivo all'interno delle aziende. Se vi
+occupate di convalidare acquisti o avete una qualche idea sul budget del vostro
+gruppo, probabilmente non siete un dirigente del kernel. Quindi i suggerimenti
+qui indicati potrebbero fare al caso vostro, oppure no.
+
+Prima di tutto, suggerirei di acquistare "Le sette regole per avere successo",
+e di non leggerlo. Bruciatelo, è un grande gesto simbolico.
+
+.. [#f1] Questo documento non fa molto per risponde alla domanda, ma rende
+ così dannatamente ovvio a chi la pone che non abbiamo la minima idea
+ di come rispondere.
+
+Comunque, partiamo:
+
+.. _it_decisions:
+
+1) Le decisioni
+---------------
+
+Tutti pensano che i dirigenti decidano, e che questo prendere decisioni
+sia importante. Più grande e dolorosa è la decisione, più importante deve
+essere il dirigente che la prende. Questo è molto profondo ed ovvio, ma non è
+del tutto vero.
+
+Il gioco consiste nell'"evitare" di dover prendere decisioni. In particolare
+se qualcuno vi chiede di "Decidere" tra (a) o (b), e vi dice che ha
+davvero bisogno di voi per questo, come dirigenti siete nei guai.
+Le persone che gestite devono conoscere i dettagli più di quanto li conosciate
+voi, quindi se vengono da voi per una decisione tecnica, siete fottuti.
+Non sarete chiaramente competente per prendere quella decisione per loro.
+
+(Corollario: se le persone che gestite non conoscono i dettagli meglio di voi,
+anche in questo caso sarete fregati, tuttavia per altre ragioni. Ossia state
+facendo il lavoro sbagliato, e che invece dovrebbero essere "loro" a gestirvi)
+
+Quindi il gioco si chiama "evitare" decisioni, almeno le più grandi e
+difficili. Prendere decisioni piccoli e senza conseguenze va bene, e vi fa
+sembrare competenti in quello che state facendo, quindi quello che un dirigente
+del kernel ha bisogno di fare è trasformare le decisioni grandi e difficili
+in minuzie delle quali nessuno importa.
+
+Ciò aiuta a capire che la differenza chiave tra una grande decisione ed una
+piccola sta nella possibilità di modificare tale decisione in seguito.
+Qualsiasi decisione importante può essere ridotta in decisioni meno importanti,
+ma dovete assicurarvi che possano essere reversibili in caso di errori
+(presenti o futuri). Improvvisamente, dovrete essere doppiamente dirigenti
+per **due** decisioni non sequenziali - quella sbagliata **e** quella giusta.
+
+E le persone vedranno tutto ciò come prova di vera capacità di comando
+(*cough* cavolata *cough*)
+
+Così la chiave per evitare le decisioni difficili diviene l'evitare
+di fare cose che non possono essere disfatte. Non infilatevi in un angolo
+dal quale non potrete sfuggire. Un topo messo all'angolo può rivelarsi
+pericoloso - un dirigente messo all'angolo è solo pietoso.
+
+**In ogni caso** dato che nessuno è stupido al punto da lasciare veramente ad
+un dirigente del kernel un enorme responsabilità, solitamente è facile fare
+marcia indietro. Annullare una decisione è molto facile: semplicemente dite a
+tutti che siete stati degli scemi incompetenti, dite che siete dispiaciuti, ed
+annullate tutto l'inutile lavoro sul quale gli altri hanno lavorato nell'ultimo
+anno. Improvvisamente la decisione che avevate preso un anno fa non era poi
+così grossa, dato che può essere facilmente annullata.
+
+È emerso che alcune persone hanno dei problemi con questo tipo di approccio,
+questo per due ragioni:
+
+ - ammettere di essere degli idioti è più difficile di quanto sembri. A tutti
+ noi piace mantenere le apparenze, ed uscire allo scoperto in pubblico per
+ ammettere che ci si è sbagliati è qualcosa di davvero impegnativo.
+ - avere qualcuno che ti dice che ciò su cui hai lavorato nell'ultimo anno
+ non era del tutto valido, può rivelarsi difficile anche per un povero ed
+ umile ingegnere, e mentre il **lavoro** vero era abbastanza facile da
+ cancellare, dall'altro canto potreste aver irrimediabilmente perso la
+ fiducia di quell'ingegnere. E ricordate che l'"irrevocabile" era quello
+ che avevamo cercato di evitare fin dall'inizio, e la vostra decisione
+ ha finito per esserlo.
+
+Fortunatamente, entrambe queste ragioni posso essere mitigate semplicemente
+ammettendo fin dal principio che non avete una cavolo di idea, dicendo
+agli altri in anticipo che la vostra decisione è puramente ipotetica, e che
+potrebbe essere sbagliata. Dovreste sempre riservarvi il diritto di cambiare
+la vostra opinione, e rendere gli altri ben **consapevoli** di ciò.
+Ed è molto più facile ammettere di essere stupidi quando non avete **ancora**
+fatto quella cosa stupida.
+
+Poi, quando è realmente emersa la vostra stupidità, le persone semplicemente
+roteeranno gli occhi e diranno "Uffa, no, ancora".
+
+Questa ammissione preventiva di incompetenza potrebbe anche portare le persone
+che stanno facendo il vero lavoro, a pensarci due volte. Dopo tutto, se
+**loro** non sono certi se sia una buona idea, voi, sicuro come la morte,
+non dovreste incoraggiarli promettendogli che ciò su cui stanno lavorando
+verrà incluso. Fate si che ci pensino due volte prima che si imbarchino in un
+grosso lavoro.
+
+Ricordate: loro devono sapere più cose sui dettagli rispetto a voi, e
+solitamente pensano di avere già la risposta a tutto. La miglior cosa che
+potete fare in qualità di dirigente è di non instillare troppa fiducia, ma
+invece fornire una salutare dose di pensiero critico su quanto stanno facendo.
+
+Comunque, un altro modo di evitare una decisione è quello di lamentarsi
+malinconicamente dicendo : "non possiamo farli entrambi e basta?" e con uno
+sguardo pietoso. Fidatevi, funziona. Se non è chiaro quale sia il miglior
+approccio, lo scopriranno. La risposta potrebbe essere data dal fatto che
+entrambe i gruppi di lavoro diventano frustati al punto di rinunciarvi.
+
+Questo può suonare come un fallimento, ma di solito questo è un segno che
+c'era qualcosa che non andava in entrambe i progetti, e il motivo per
+il quale le persone coinvolte non abbiano potuto decidere era che entrambe
+sbagliavano. Voi ne uscirete freschi come una rosa, e avrete evitato un'altra
+decisione con la quale avreste potuto fregarvi.
+
+
+2) Le persone
+-------------
+
+Ci sono molte persone stupide, ed essere un dirigente significa che dovrete
+scendere a patti con questo, e molto più importate, che **loro** devono avere
+a che fare con **voi**.
+
+Ne emerge che mentre è facile annullare degli errori tecnici, non è invece
+così facile rimuovere i disordini della personalità. Dovrete semplicemente
+convivere con i loro, ed i vostri, problemi.
+
+Comunque, al fine di preparavi in qualità di dirigenti del kernel, è meglio
+ricordare di non abbattere alcun ponte, bombardare alcun paesano innocente,
+o escludere troppi sviluppatori kernel. Ne emerge che escludere le persone
+è piuttosto facile, mentre includerle nuovamente è difficile. Così
+"l'esclusione" immediatamente cade sotto il titolo di "non reversibile", e
+diviene un no-no secondo la sezione :ref:`it_decisions`.
+
+Esistono alcune semplici regole qui:
+
+ (1) non chiamate le persone teste di c*** (al meno, non in pubblico)
+ (2) imparate a scusarvi quando dimenticate la regola (1)
+
+Il problema del punto numero 1 è che è molto facile da rispettare, dato che
+è possibile dire "sei una testa di c***" in milioni di modi differenti [#f2]_,
+a volte senza nemmeno pensarci, e praticamente sempre con la calda convinzione
+di essere nel giusto.
+
+E più convinti sarete che avete ragione (e diciamolo, potete chiamare
+praticamente **tutti** testa di c**, e spesso **sarete** nel giusto), più
+difficile sarà scusarvi successivamente.
+
+Per risolvere questo problema, avete due possibilità:
+
+ - diventare davvero bravi nello scusarsi
+ - essere amabili così che nessuno finirà col sentirsi preso di mira. Siate
+ creativi abbastanza, e potrebbero esserne divertiti.
+
+L'opzione dell'essere immancabilmente educati non esiste proprio. Nessuno
+si fiderà di qualcuno che chiaramente sta nascondendo il suo vero carattere.
+
+.. [#f2] Paul Simon cantava: "50 modi per lasciare il vostro amante", perché,
+ molto francamente, "Un milione di modi per dire ad uno sviluppatore
+ Testa di c***" non avrebbe funzionato. Ma sono sicuro che ci abbia
+ pensato.
+
+
+3) Le persone II - quelle buone
+-------------------------------
+
+Mentre emerge che la maggior parte delle persone sono stupide, il corollario
+a questo è il triste fatto che anche voi siete fra queste, e che mentre
+possiamo tutti crogiolarci nella sicurezza di essere migliori della media
+delle persone (diciamocelo, nessuno crede di essere nelle media o sotto di
+essa), dovremmo anche ammettere che non siamo il "coltello più affilato" del
+circondario, e che ci saranno altre persone che sono meno stupide di quanto
+lo siete voi.
+
+Molti reagiscono male davanti alle persone intelligenti. Altri le usano a
+proprio vantaggio.
+
+Assicuratevi che voi, in qualità di manutentori del kernel, siate nel secondo
+gruppo. Inchinatevi dinanzi a loro perché saranno le persone che vi renderanno
+il lavoro più facile. In particolare, prenderanno le decisioni per voi, che è
+l'oggetto di questo gioco.
+
+Quindi quando trovate qualcuno più sveglio di voi, prendetevela comoda.
+Le vostre responsabilità dirigenziali si ridurranno in gran parte nel dire
+"Sembra una buona idea - Vai", oppure "Sembra buono, ma invece circa questo e
+quello?". La seconda versione in particolare è una gran modo per imparare
+qualcosa di nuovo circa "questo e quello" o di sembrare **extra** dirigenziali
+sottolineando qualcosa alla quale i più svegli non avevano pensato. In
+entrambe i casi, vincete.
+
+Una cosa alla quale dovete fare attenzione è che l'essere grandi in qualcosa
+non si traduce automaticamente nell'essere grandi anche in altre cose. Quindi
+dovreste dare una spintarella alle persone in una specifica direzione, ma
+diciamocelo, potrebbero essere bravi in ciò che fanno e far schifo in tutto
+il resto. La buona notizia è che le persone tendono a gravitare attorno a ciò
+in cui sono bravi, quindi non state facendo nulla di irreversibile quando li
+spingete verso una certa direzione, solo non spingete troppo.
+
+
+4) Addossare le colpe
+---------------------
+
+Le cose andranno male, e le persone vogliono qualcuno da incolpare. Sarete voi.
+
+Non è poi così difficile accettare la colpa, specialmente se le persone
+riescono a capire che non era **tutta** colpa vostra. Il che ci porta
+sulla miglior strada per assumersi la colpa: fatelo per qualcun'altro.
+Vi sentirete bene nel assumervi la responsabilità, e loro si sentiranno
+bene nel non essere incolpati, e coloro che hanno perso i loro 36GB di
+pornografia a causa della vostra incompetenza ammetteranno a malincuore che
+almeno non avete cercato di fare il furbetto.
+
+Successivamente fate in modo che gli sviluppatori che in realtà hanno fallito
+(se riuscite a trovarli) sappiano **in privato** che sono "fottuti".
+Questo non per fargli sapere che la prossima volta possono evitarselo ma per
+fargli capire che sono in debito. E, forse cosa più importante, sono loro che
+devono sistemare la cosa. Perché, ammettiamolo, è sicuro non sarete voi a
+farlo.
+
+Assumersi la colpa è anche ciò che vi rendere dirigenti in prima battuta.
+È parte di ciò che spinge gli altri a fidarsi di voi, e vi garantisce
+la gloria potenziale, perché siete gli unici a dire "Ho fatto una cavolata".
+E se avete seguito le regole precedenti, sarete decisamente bravi nel dirlo.
+
+
+5) Le cose da evitare
+---------------------
+
+Esiste una cosa che le persone odiano più che essere chiamate "teste di c****",
+ed è essere chiamate "teste di c****" con fare da bigotto. Se per il primo
+caso potrete comunque scusarvi, per il secondo non ve ne verrà data nemmeno
+l'opportunità. Probabilmente smetteranno di ascoltarvi anche se tutto sommato
+state svolgendo un buon lavoro.
+
+Tutti crediamo di essere migliori degli altri, il che significa che quando
+qualcuno inizia a darsi delle arie, ci da **davvero** fastidio. Potreste anche
+essere moralmente ed intellettualmente superiore a tutti quelli attorno a voi,
+ma non cercate di renderlo ovvio per gli altri a meno che non **vogliate**
+veramente far arrabbiare qualcuno [#f3]_.
+
+Allo stesso modo evitate di essere troppo gentili e pacati. Le buone maniere
+facilmente finiscono per strabordare e nascondere i problemi, e come si usa
+dire, "su internet nessuno può sentire la vostra pacatezza". Usate argomenti
+diretti per farvi capire, non potete sperare che la gente capisca in altro
+modo.
+
+Un po' di umorismo può aiutare a smorzare sia la franchezza che la moralità.
+Andare oltre i limiti al punto d'essere ridicolo può portare dei punti a casa
+senza renderlo spiacevole per i riceventi, i quali penseranno che stavate
+facendo gli scemi. Può anche aiutare a lasciare andare quei blocchi mentali
+che abbiamo nei confronti delle critiche.
+
+.. [#f3] Suggerimento: i forum di discussione su internet, che non sono
+ collegati col vostro lavoro, sono ottimi modi per sfogare la frustrazione
+ verso altre persone. Di tanto in tanto scrivete messaggi offensivi col ghigno
+ in faccia per infiammare qualche discussione: vi sentirete purificati. Solo
+ cercate di non cagare troppo vicino a casa.
+
+6) Perché io?
+-------------
+
+Dato che la vostra responsabilità principale è quella di prendervi le colpe
+d'altri, e rendere dolorosamente ovvio a tutti che siete degli incompetenti,
+la domanda naturale che ne segue sarà : perché dovrei fare tutto ciò?
+
+Innanzitutto, potreste diventare o no popolari al punto da avere la fila di
+ragazzine (o ragazzini, evitiamo pregiudizi o sessismo) che gridano e bussano
+alla porta del vostro camerino, ma comunque **proverete** un immenso senso di
+realizzazione personale dall'essere "in carica". Dimenticate il fatto che voi
+state discutendo con tutti e che cercate di inseguirli il più velocemente che
+potete. Tutti continueranno a pensare che voi siete la persona in carica.
+
+È un bel lavoro se riuscite ad adattarlo a voi.
diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst
index 995ee69fab11..3e575502690f 100644
--- a/Documentation/translations/it_IT/process/submit-checklist.rst
+++ b/Documentation/translations/it_IT/process/submit-checklist.rst
@@ -117,7 +117,7 @@ sottomissione delle patch, in particolare
sorgenti che ne spieghi la logica: cosa fanno e perché.
25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate
- ``Documentation/ioctl/ioctl-number.rst``.
+ ``Documentation/userspace-api/ioctl/ioctl-number.rst``.
26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o
funzionalità del kernel che è associata a uno dei seguenti simboli
diff --git a/Documentation/translations/it_IT/riscv/patch-acceptance.rst b/Documentation/translations/it_IT/riscv/patch-acceptance.rst
new file mode 100644
index 000000000000..edf67252b3fb
--- /dev/null
+++ b/Documentation/translations/it_IT/riscv/patch-acceptance.rst
@@ -0,0 +1,40 @@
+.. include:: ../disclaimer-ita.rst
+
+:Original: :doc:`../../../riscv/patch-acceptance`
+:Translator: Federico Vaga <federico.vaga@vaga.pv.it>
+
+arch/riscv linee guida alla manutenzione per gli sviluppatori
+=============================================================
+
+Introduzione
+------------
+
+L'insieme di istruzioni RISC-V sono sviluppate in modo aperto: le
+bozze in fase di sviluppo sono disponibili a tutti per essere
+revisionate e per essere sperimentare nelle implementazioni. Le bozze
+dei nuovi moduli o estensioni possono cambiare in fase di sviluppo - a
+volte in modo incompatibile rispetto a bozze precedenti. Questa
+flessibilità può portare a dei problemi di manutenzioni per il
+supporto RISC-V nel kernel Linux. I manutentori Linux non amano
+l'abbandono del codice, e il processo di sviluppo del kernel
+preferisce codice ben revisionato e testato rispetto a quello
+sperimentale. Desideriamo estendere questi stessi principi al codice
+relativo all'architettura RISC-V che verrà accettato per l'inclusione
+nel kernel.
+
+In aggiunta alla lista delle verifiche da fare prima di inviare una patch
+-------------------------------------------------------------------------
+
+Accetteremo le patch per un nuovo modulo o estensione se la fondazione
+RISC-V li classifica come "Frozen" o "Retified". (Ovviamente, gli
+sviluppatori sono liberi di mantenere una copia del kernel Linux
+contenente il codice per una bozza di estensione).
+
+In aggiunta, la specifica RISC-V permette agli implementatori di
+creare le proprie estensioni. Queste estensioni non passano
+attraverso il processo di revisione della fondazione RISC-V. Per
+questo motivo, al fine di evitare complicazioni o problemi di
+prestazioni, accetteremo patch solo per quelle estensioni che sono
+state ufficialmente accettate dalla fondazione RISC-V. (Ovviamente,
+gli implementatori sono liberi di mantenere una copia del kernel Linux
+contenente il codice per queste specifiche estensioni).
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index 2e831ece6e26..e50fe6541335 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -641,7 +641,7 @@ P 는 짝수 번호 캐시 라인에 저장되어 있고, 변수 B 는 홀수
리눅스 커널이 지원하는 CPU 들은 (1) 쓰기가 정말로 일어날지, (2) 쓰기가 어디에
이루어질지, 그리고 (3) 쓰여질 값을 확실히 알기 전까지는 쓰기를 수행하지 않기
때문입니다. 하지만 "컨트롤 의존성" 섹션과
-Documentation/RCU/rcu_dereference.txt 파일을 주의 깊게 읽어 주시기 바랍니다:
+Documentation/RCU/rcu_dereference.rst 파일을 주의 깊게 읽어 주시기 바랍니다:
컴파일러는 매우 창의적인 많은 방법으로 종속성을 깰 수 있습니다.
CPU 1 CPU 2
diff --git a/Documentation/translations/zh_CN/IRQ.txt b/Documentation/translations/zh_CN/IRQ.txt
index 956026d5cf82..9aec8dca4fcf 100644
--- a/Documentation/translations/zh_CN/IRQ.txt
+++ b/Documentation/translations/zh_CN/IRQ.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/IRQ.txt
+Chinese translated version of Documentation/core-api/irq/index.rst
If you have any comment or update to the content, please contact the
original document maintainer directly. However, if you have a problem
@@ -9,7 +9,7 @@ or if there is a problem with the translation.
Maintainer: Eric W. Biederman <ebiederman@xmission.com>
Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
---------------------------------------------------------------------
-Documentation/IRQ.txt 的中文翻译
+Documentation/core-api/irq/index.rst 的中文翻译
如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文
交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/Documentation/translations/zh_CN/filesystems/debugfs.rst b/Documentation/translations/zh_CN/filesystems/debugfs.rst
new file mode 100644
index 000000000000..f8a28793c277
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/debugfs.rst
@@ -0,0 +1,221 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/filesystems/debugfs.txt <debugfs_index>`
+
+=======
+Debugfs
+=======
+
+译者
+::
+
+ 中文版维护者: 罗楚成 Chucheng Luo <luochucheng@vivo.com>
+ 中文版翻译者: 罗楚成 Chucheng Luo <luochucheng@vivo.com>
+ 中文版校译者: 罗楚成 Chucheng Luo <luochucheng@vivo.com>
+
+
+
+版权所有2020 罗楚成 <luochucheng@vivo.com>
+
+
+Debugfs是内核开发人员在用户空间获取信息的简单方法。与/proc不同,proc只提供进程
+信息。也不像sysfs,具有严格的“每个文件一个值“的规则。debugfs根本没有规则,开发
+人员可以在这里放置他们想要的任何信息。debugfs文件系统也不能用作稳定的ABI接口。
+从理论上讲,debugfs导出文件的时候没有任何约束。但是[1]实际情况并不总是那么
+简单。即使是debugfs接口,也最好根据需要进行设计,并尽量保持接口不变。
+
+
+Debugfs通常使用以下命令安装::
+
+ mount -t debugfs none /sys/kernel/debug
+
+(或等效的/etc/fstab行)。
+debugfs根目录默认仅可由root用户访问。要更改对文件树的访问,请使用“ uid”,“ gid”
+和“ mode”挂载选项。请注意,debugfs API仅按照GPL协议导出到模块。
+
+使用debugfs的代码应包含<linux/debugfs.h>。然后,首先是创建至少一个目录来保存
+一组debugfs文件::
+
+ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
+
+如果成功,此调用将在指定的父目录下创建一个名为name的目录。如果parent参数为空,
+则会在debugfs根目录中创建。创建目录成功时,返回值是一个指向dentry结构体的指针。
+该dentry结构体的指针可用于在目录中创建文件(以及最后将其清理干净)。ERR_PTR
+(-ERROR)返回值表明出错。如果返回ERR_PTR(-ENODEV),则表明内核是在没有debugfs
+支持的情况下构建的,并且下述函数都不会起作用。
+
+在debugfs目录中创建文件的最通用方法是::
+
+ struct dentry *debugfs_create_file(const char *name, umode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops);
+
+在这里,name是要创建的文件的名称,mode描述了访问文件应具有的权限,parent指向
+应该保存文件的目录,data将存储在产生的inode结构体的i_private字段中,而fops是
+一组文件操作函数,这些函数中实现文件操作的具体行为。至少,read()和/或
+write()操作应提供;其他可以根据需要包括在内。同样的,返回值将是指向创建文件
+的dentry指针,错误时返回ERR_PTR(-ERROR),系统不支持debugfs时返回值为ERR_PTR
+(-ENODEV)。创建一个初始大小的文件,可以使用以下函数代替::
+
+ struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops,
+ loff_t file_size);
+
+file_size是初始文件大小。其他参数跟函数debugfs_create_file的相同。
+
+在许多情况下,没必要自己去创建一组文件操作;对于一些简单的情况,debugfs代码提供
+了许多帮助函数。包含单个整数值的文件可以使用以下任何一项创建::
+
+ void debugfs_create_u8(const char *name, umode_t mode,
+ struct dentry *parent, u8 *value);
+ void debugfs_create_u16(const char *name, umode_t mode,
+ struct dentry *parent, u16 *value);
+ struct dentry *debugfs_create_u32(const char *name, umode_t mode,
+ struct dentry *parent, u32 *value);
+ void debugfs_create_u64(const char *name, umode_t mode,
+ struct dentry *parent, u64 *value);
+
+这些文件支持读取和写入给定值。如果某个文件不支持写入,只需根据需要设置mode
+参数位。这些文件中的值以十进制表示;如果需要使用十六进制,可以使用以下函数
+替代::
+
+ void debugfs_create_x8(const char *name, umode_t mode,
+ struct dentry *parent, u8 *value);
+ void debugfs_create_x16(const char *name, umode_t mode,
+ struct dentry *parent, u16 *value);
+ void debugfs_create_x32(const char *name, umode_t mode,
+ struct dentry *parent, u32 *value);
+ void debugfs_create_x64(const char *name, umode_t mode,
+ struct dentry *parent, u64 *value);
+
+这些功能只有在开发人员知道导出值的大小的时候才有用。某些数据类型在不同的架构上
+有不同的宽度,这样会使情况变得有些复杂。在这种特殊情况下可以使用以下函数::
+
+ void debugfs_create_size_t(const char *name, umode_t mode,
+ struct dentry *parent, size_t *value);
+
+不出所料,此函数将创建一个debugfs文件来表示类型为size_t的变量。
+
+同样地,也有导出无符号长整型变量的函数,分别以十进制和十六进制表示如下::
+
+ struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
+ struct dentry *parent,
+ unsigned long *value);
+ void debugfs_create_xul(const char *name, umode_t mode,
+ struct dentry *parent, unsigned long *value);
+
+布尔值可以通过以下方式放置在debugfs中::
+
+ struct dentry *debugfs_create_bool(const char *name, umode_t mode,
+ struct dentry *parent, bool *value);
+
+
+读取结果文件将产生Y(对于非零值)或N,后跟换行符写入的时候,它只接受大写或小写
+值或1或0。任何其他输入将被忽略。
+
+同样,atomic_t类型的值也可以放置在debugfs中::
+
+ void debugfs_create_atomic_t(const char *name, umode_t mode,
+ struct dentry *parent, atomic_t *value)
+
+读取此文件将获得atomic_t值,写入此文件将设置atomic_t值。
+
+另一个选择是通过以下结构体和函数导出一个任意二进制数据块::
+
+ struct debugfs_blob_wrapper {
+ void *data;
+ unsigned long size;
+ };
+
+ struct dentry *debugfs_create_blob(const char *name, umode_t mode,
+ struct dentry *parent,
+ struct debugfs_blob_wrapper *blob);
+
+读取此文件将返回由指针指向debugfs_blob_wrapper结构体的数据。一些驱动使用“blobs”
+作为一种返回几行(静态)格式化文本的简单方法。这个函数可用于导出二进制信息,但
+似乎在主线中没有任何代码这样做。请注意,使用debugfs_create_blob()命令创建的
+所有文件是只读的。
+
+如果您要转储一个寄存器块(在开发过程中经常会这么做,但是这样的调试代码很少上传
+到主线中。Debugfs提供两个函数:一个用于创建仅寄存器文件,另一个把一个寄存器块
+插入一个顺序文件中::
+
+ struct debugfs_reg32 {
+ char *name;
+ unsigned long offset;
+ };
+
+ struct debugfs_regset32 {
+ struct debugfs_reg32 *regs;
+ int nregs;
+ void __iomem *base;
+ };
+
+ struct dentry *debugfs_create_regset32(const char *name, umode_t mode,
+ struct dentry *parent,
+ struct debugfs_regset32 *regset);
+
+ void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs,
+ int nregs, void __iomem *base, char *prefix);
+
+“base”参数可能为0,但您可能需要使用__stringify构建reg32数组,实际上有许多寄存器
+名称(宏)是寄存器块在基址上的字节偏移量。
+
+如果要在debugfs中转储u32数组,可以使用以下函数创建文件::
+
+ void debugfs_create_u32_array(const char *name, umode_t mode,
+ struct dentry *parent,
+ u32 *array, u32 elements);
+
+“array”参数提供数据,而“elements”参数为数组中元素的数量。注意:数组创建后,数组
+大小无法更改。
+
+有一个函数来创建与设备相关的seq_file::
+
+ struct dentry *debugfs_create_devm_seqfile(struct device *dev,
+ const char *name,
+ struct dentry *parent,
+ int (*read_fn)(struct seq_file *s,
+ void *data));
+
+“dev”参数是与此debugfs文件相关的设备,并且“read_fn”是一个函数指针,这个函数在
+打印seq_file内容的时候被回调。
+
+还有一些其他的面向目录的函数::
+
+ struct dentry *debugfs_rename(struct dentry *old_dir,
+ struct dentry *old_dentry,
+ struct dentry *new_dir,
+ const char *new_name);
+
+ struct dentry *debugfs_create_symlink(const char *name,
+ struct dentry *parent,
+ const char *target);
+
+调用debugfs_rename()将为现有的debugfs文件重命名,可能同时切换目录。 new_name
+函数调用之前不能存在;返回值为old_dentry,其中包含更新的信息。可以使用
+debugfs_create_symlink()创建符号链接。
+
+所有debugfs用户必须考虑的一件事是:
+
+debugfs不会自动清除在其中创建的任何目录。如果一个模块在不显式删除debugfs目录的
+情况下卸载模块,结果将会遗留很多野指针,从而导致系统不稳定。因此,所有debugfs
+用户-至少是那些可以作为模块构建的用户-必须做模块卸载的时候准备删除在此创建的
+所有文件和目录。一份文件可以通过以下方式删除::
+
+ void debugfs_remove(struct dentry *dentry);
+
+dentry值可以为NULL或错误值,在这种情况下,不会有任何文件被删除。
+
+很久以前,内核开发者使用debugfs时需要记录他们创建的每个dentry指针,以便最后所有
+文件都可以被清理掉。但是,现在debugfs用户能调用以下函数递归清除之前创建的文件::
+
+ void debugfs_remove_recursive(struct dentry *dentry);
+
+如果将对应顶层目录的dentry传递给以上函数,则该目录下的整个层次结构将会被删除。
+
+注释:
+[1] http://lwn.net/Articles/309298/
diff --git a/Documentation/translations/zh_CN/filesystems/index.rst b/Documentation/translations/zh_CN/filesystems/index.rst
index 14f155edaf69..186501d13bc1 100644
--- a/Documentation/translations/zh_CN/filesystems/index.rst
+++ b/Documentation/translations/zh_CN/filesystems/index.rst
@@ -24,4 +24,5 @@ Linux Kernel中的文件系统
:maxdepth: 2
virtiofs
+ debugfs
diff --git a/Documentation/translations/zh_CN/filesystems/sysfs.txt b/Documentation/translations/zh_CN/filesystems/sysfs.txt
index ee1f37da5b23..fcf620049d11 100644
--- a/Documentation/translations/zh_CN/filesystems/sysfs.txt
+++ b/Documentation/translations/zh_CN/filesystems/sysfs.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/filesystems/sysfs.txt
+Chinese translated version of Documentation/filesystems/sysfs.rst
If you have any comment or update to the content, please contact the
original document maintainer directly. However, if you have a problem
@@ -10,7 +10,7 @@ Maintainer: Patrick Mochel <mochel@osdl.org>
Mike Murphy <mamurph@cs.clemson.edu>
Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
---------------------------------------------------------------------
-Documentation/filesystems/sysfs.txt 的中文翻译
+Documentation/filesystems/sysfs.rst 的中文翻译
如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文
交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻
@@ -40,7 +40,7 @@ sysfs 是一个最初基于 ramfs 且位于内存的文件系统。它提供导
数据结构及其属性,以及它们之间的关联到用户空间的方法。
sysfs 始终与 kobject 的底层结构紧密相关。请阅读
-Documentation/kobject.txt 文档以获得更多关于 kobject 接口的
+Documentation/core-api/kobject.rst 文档以获得更多关于 kobject 接口的
信息。
@@ -281,7 +281,7 @@ drivers/ 包含了每个已为特定总线上的设备而挂载的驱动程序
假定驱动没有跨越多个总线类型)。
fs/ 包含了一个为文件系统设立的目录。现在每个想要导出属性的文件系统必须
-在 fs/ 下创建自己的层次结构(参见Documentation/filesystems/fuse.txt)。
+在 fs/ 下创建自己的层次结构(参见Documentation/filesystems/fuse.rst)。
dev/ 包含两个子目录: char/ 和 block/。在这两个子目录中,有以
<major>:<minor> 格式命名的符号链接。这些符号链接指向 sysfs 目录
diff --git a/Documentation/translations/zh_CN/process/submit-checklist.rst b/Documentation/translations/zh_CN/process/submit-checklist.rst
index 8738c55e42a2..50386e0e42e7 100644
--- a/Documentation/translations/zh_CN/process/submit-checklist.rst
+++ b/Documentation/translations/zh_CN/process/submit-checklist.rst
@@ -97,7 +97,7 @@ Linux内核补丁提交清单
24) 所有内存屏障例如 ``barrier()``, ``rmb()``, ``wmb()`` 都需要源代码中的注
释来解释它们正在执行的操作及其原因的逻辑。
-25) 如果补丁添加了任何ioctl,那么也要更新 ``Documentation/ioctl/ioctl-number.rst``
+25) 如果补丁添加了任何ioctl,那么也要更新 ``Documentation/userspace-api/ioctl/ioctl-number.rst``
26) 如果修改后的源代码依赖或使用与以下 ``Kconfig`` 符号相关的任何内核API或
功能,则在禁用相关 ``Kconfig`` 符号和/或 ``=m`` (如果该选项可用)的情况
diff --git a/Documentation/translations/zh_CN/video4linux/v4l2-framework.txt b/Documentation/translations/zh_CN/video4linux/v4l2-framework.txt
index 4d2af480a112..a88fcbc11eca 100644
--- a/Documentation/translations/zh_CN/video4linux/v4l2-framework.txt
+++ b/Documentation/translations/zh_CN/video4linux/v4l2-framework.txt
@@ -488,7 +488,7 @@ struct v4l2_subdev *sd = v4l2_i2c_new_subdev(v4l2_dev, adapter,
这个函数会加载给定的模块(如果没有模块需要加载,可以为 NULL),
并用给定的 i2c 适配器结构体指针(i2c_adapter)和 器件地址(chip/address)
-作为参数调用 i2c_new_device()。如果一切顺利,则就在 v4l2_device
+作为参数调用 i2c_new_client_device()。如果一切顺利,则就在 v4l2_device
中注册了子设备。
你也可以利用 v4l2_i2c_new_subdev()的最后一个参数,传递一个可能的
diff --git a/Documentation/usb/gadget_configfs.rst b/Documentation/usb/gadget_configfs.rst
index 54fb08baae22..158e48dab586 100644
--- a/Documentation/usb/gadget_configfs.rst
+++ b/Documentation/usb/gadget_configfs.rst
@@ -24,7 +24,7 @@ Linux provides a number of functions for gadgets to use.
Creating a gadget means deciding what configurations there will be
and which functions each configuration will provide.
-Configfs (please see `Documentation/filesystems/configfs/*`) lends itself nicely
+Configfs (please see `Documentation/filesystems/configfs.rst`) lends itself nicely
for the purpose of telling the kernel about the above mentioned decision.
This document is about how to do it.
@@ -354,7 +354,7 @@ the directories in general can be named at will. A group can have
a number of its default sub-groups created automatically.
For more information on configfs please see
-`Documentation/filesystems/configfs/*`.
+`Documentation/filesystems/configfs.rst`.
The concepts described above translate to USB gadgets like this:
diff --git a/Documentation/usb/raw-gadget.rst b/Documentation/usb/raw-gadget.rst
index 9e78cb858f86..68d879a8009e 100644
--- a/Documentation/usb/raw-gadget.rst
+++ b/Documentation/usb/raw-gadget.rst
@@ -27,9 +27,8 @@ differences are:
3. Raw Gadget provides a way to select a UDC device/driver to bind to,
while GadgetFS currently binds to the first available UDC.
-4. Raw Gadget uses predictable endpoint names (handles) across different
- UDCs (as long as UDCs have enough endpoints of each required transfer
- type).
+4. Raw Gadget explicitly exposes information about endpoints addresses and
+ capabilities allowing a user to write UDC-agnostic gadgets.
5. Raw Gadget has ioctl-based interface instead of a filesystem-based one.
@@ -50,12 +49,36 @@ The typical usage of Raw Gadget looks like:
Raw Gadget and react to those depending on what kind of USB device
needs to be emulated.
+Note, that some UDC drivers have fixed addresses assigned to endpoints, and
+therefore arbitrary endpoint addresses can't be used in the descriptors.
+Nevertheles, Raw Gadget provides a UDC-agnostic way to write USB gadgets.
+Once a USB_RAW_EVENT_CONNECT event is received via USB_RAW_IOCTL_EVENT_FETCH,
+the USB_RAW_IOCTL_EPS_INFO ioctl can be used to find out information about
+endpoints that the UDC driver has. Based on that information, the user must
+chose UDC endpoints that will be used for the gadget being emulated, and
+properly assign addresses in endpoint descriptors.
+
+You can find usage examples (along with a test suite) here:
+
+https://github.com/xairy/raw-gadget
+
+Internal details
+~~~~~~~~~~~~~~~~
+
+Currently every endpoint read/write ioctl submits a USB request and waits until
+its completion. This is the desired mode for coverage-guided fuzzing (as we'd
+like all USB request processing happen during the lifetime of a syscall),
+and must be kept in the implementation. (This might be slow for real world
+applications, thus the O_NONBLOCK improvement suggestion below.)
+
Potential future improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Implement ioctl's for setting/clearing halt status on endpoints.
-
-- Reporting more events (suspend, resume, etc.) through
- USB_RAW_IOCTL_EVENT_FETCH.
+- Report more events (suspend, resume, etc.) through USB_RAW_IOCTL_EVENT_FETCH.
- Support O_NONBLOCK I/O.
+
+- Support USB 3 features (accept SS endpoint companion descriptor when
+ enabling endpoints; allow providing stream_id for bulk transfers).
+
+- Support ISO transfer features (expose frame_number for completed requests).
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index f759edafd938..52bf58417653 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -146,6 +146,7 @@ Code Seq# Include File Comments
'H' 40-4F sound/hdspm.h conflict!
'H' 40-4F sound/hdsp.h conflict!
'H' 90 sound/usb/usx2y/usb_stream.h
+'H' 00-0F uapi/misc/habanalabs.h conflict!
'H' A0 uapi/linux/usb/cdc-wdm.h
'H' C0-F0 net/bluetooth/hci.h conflict!
'H' C0-DF net/bluetooth/hidp/hidp.h conflict!
diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index c3129b9ba5cb..57c01f531e61 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -74,7 +74,7 @@ should point to a file descriptor that is opened on the ``/dev/sev``
device, if needed (see individual commands).
On output, ``error`` is zero on success, or an error code. Error codes
-are defined in ``<linux/psp-dev.h>`.
+are defined in ``<linux/psp-dev.h>``.
KVM implements the following commands to support common lifecycle events of SEV
guests, such as launching, running, snapshotting, migrating and decommissioning.
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index efbbe570aa9b..426f94582b7a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2572,13 +2572,15 @@ list in 4.68.
:Parameters: None
:Returns: 0 on success, -1 on error
-This signals to the host kernel that the specified guest is being paused by
-userspace. The host will set a flag in the pvclock structure that is checked
-from the soft lockup watchdog. The flag is part of the pvclock structure that
-is shared between guest and host, specifically the second bit of the flags
+This ioctl sets a flag accessible to the guest indicating that the specified
+vCPU has been paused by the host userspace.
+
+The host will set a flag in the pvclock structure that is checked from the
+soft lockup watchdog. The flag is part of the pvclock structure that is
+shared between guest and host, specifically the second bit of the flags
field of the pvclock_vcpu_time_info structure. It will be set exclusively by
the host and read/cleared exclusively by the guest. The guest operation of
-checking and clearing the flag must an atomic operation so
+checking and clearing the flag must be an atomic operation so
load-link/store-conditional, or equivalent must be used. There are two cases
where the guest will clear the flag: when the soft lockup watchdog timer resets
itself or when a soft lockup is detected. This ioctl can be called any time
@@ -4334,9 +4336,13 @@ Errors:
#define KVM_STATE_NESTED_VMX_SMM_GUEST_MODE 0x00000001
#define KVM_STATE_NESTED_VMX_SMM_VMXON 0x00000002
+#define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001
+
struct kvm_vmx_nested_state_hdr {
+ __u32 flags;
__u64 vmxon_pa;
__u64 vmcs12_pa;
+ __u64 preemption_timer_deadline;
struct {
__u16 flags;
@@ -5066,10 +5072,13 @@ EOI was received.
struct kvm_hyperv_exit {
#define KVM_EXIT_HYPERV_SYNIC 1
#define KVM_EXIT_HYPERV_HCALL 2
+ #define KVM_EXIT_HYPERV_SYNDBG 3
__u32 type;
+ __u32 pad1;
union {
struct {
__u32 msr;
+ __u32 pad2;
__u64 control;
__u64 evt_page;
__u64 msg_page;
@@ -5079,6 +5088,15 @@ EOI was received.
__u64 result;
__u64 params[2];
} hcall;
+ struct {
+ __u32 msr;
+ __u32 pad2;
+ __u64 control;
+ __u64 status;
+ __u64 send_page;
+ __u64 recv_page;
+ __u64 pending_page;
+ } syndbg;
} u;
};
/* KVM_EXIT_HYPERV */
@@ -5095,6 +5113,12 @@ Hyper-V SynIC state change. Notification is used to remap SynIC
event/message pages and to enable/disable SynIC messages/events processing
in userspace.
+ - KVM_EXIT_HYPERV_SYNDBG -- synchronously notify user-space about
+
+Hyper-V Synthetic debugger state change. Notification is used to either update
+the pending_page location or to send a control command (send the buffer located
+in send_page or recv a buffer to recv_page).
+
::
/* KVM_EXIT_ARM_NISV */
@@ -5777,7 +5801,7 @@ will be initialized to 1 when created. This also improves performance because
dirty logging can be enabled gradually in small chunks on the first call
to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on
KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
-x86 for now).
+x86 and arm64 for now).
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
@@ -5802,6 +5826,23 @@ If present, this capability can be enabled for a VM, meaning that KVM
will allow the transition to secure guest mode. Otherwise KVM will
veto the transition.
+7.20 KVM_CAP_HALT_POLL
+----------------------
+
+:Architectures: all
+:Target: VM
+:Parameters: args[0] is the maximum poll time in nanoseconds
+:Returns: 0 on success; -1 on error
+
+This capability overrides the kvm module parameter halt_poll_ns for the
+target VM.
+
+VCPU polling allows a VCPU to poll for wakeup events instead of immediately
+scheduling during guest halts. The maximum time a VCPU can spend polling is
+controlled by the kvm module parameter halt_poll_ns. This capability allows
+the maximum halt time to specified on a per-VM basis, effectively overriding
+the module parameter for the target VM.
+
8. Other capabilities.
======================
diff --git a/Documentation/virt/kvm/arm/pvtime.rst b/Documentation/virt/kvm/arm/pvtime.rst
index 2357dd2d8655..687b60d76ca9 100644
--- a/Documentation/virt/kvm/arm/pvtime.rst
+++ b/Documentation/virt/kvm/arm/pvtime.rst
@@ -76,5 +76,5 @@ It is advisable that one or more 64k pages are set aside for the purpose of
these structures and not used for other purposes, this enables the guest to map
the region using 64k pages and avoids conflicting attributes with other memory.
-For the user space interface see Documentation/virt/kvm/devices/vcpu.txt
+For the user space interface see Documentation/virt/kvm/devices/vcpu.rst
section "3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL".
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..a7dff9186bed 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -50,8 +50,8 @@ KVM_FEATURE_NOP_IO_DELAY 1 not necessary to perform delays
KVM_FEATURE_MMU_OP 2 deprecated
KVM_FEATURE_CLOCKSOURCE2 3 kvmclock available at msrs
-
0x4b564d00 and 0x4b564d01
+
KVM_FEATURE_ASYNC_PF 4 async pf can be enabled by
writing to msr 0x4b564d02
@@ -86,6 +86,12 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
before using paravirtualized
sched yield.
+KVM_FEATURE_ASYNC_PF_INT 14 guest checks this feature bit
+ before using the second async
+ pf control msr 0x4b564d06 and
+ async pf acknowledgment msr
+ 0x4b564d07.
+
KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
per-cpu warps are expeced in
kvmclock
diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 9963e680770a..ca374d3fe085 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -110,5 +110,5 @@ Returns:
Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64 byte aligned and exist within a valid guest memory
-region. See Documentation/virt/kvm/arm/pvtime.txt for more information
+region. See Documentation/virt/kvm/arm/pvtime.rst for more information
including the layout of the stolen time structure.
diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
index dbaf207e560d..ed4fddd364ea 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/hypercalls.rst
@@ -22,7 +22,7 @@ S390:
number in R1.
For further information on the S390 diagnose call as supported by KVM,
- refer to Documentation/virt/kvm/s390-diag.txt.
+ refer to Documentation/virt/kvm/s390-diag.rst.
PowerPC:
It uses R3-R10 and hypercall number in R11. R4-R11 are used as output registers.
@@ -30,7 +30,7 @@ PowerPC:
KVM hypercalls uses 4 byte opcode, that are patched with 'hypercall-instructions'
property inside the device tree's /hypervisor node.
- For more information refer to Documentation/virt/kvm/ppc-pv.txt
+ For more information refer to Documentation/virt/kvm/ppc-pv.rst
MIPS:
KVM hypercalls use the HYPCALL instruction with code 0 and the hypercall
diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst
index dcc252634cf9..b6833c7bb474 100644
--- a/Documentation/virt/kvm/index.rst
+++ b/Documentation/virt/kvm/index.rst
@@ -28,3 +28,5 @@ KVM
arm/index
devices/index
+
+ running-nested-guests
diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 60981887d20b..46126ecc70f7 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -319,7 +319,7 @@ Handling a page fault is performed as follows:
- If both P bit and R/W bit of error code are set, this could possibly
be handled as a "fast page fault" (fixed without taking the MMU lock). See
- the description in Documentation/virt/kvm/locking.txt.
+ the description in Documentation/virt/kvm/locking.rst.
- if needed, walk the guest page tables to determine the guest translation
(gva->gpa or ngpa->gpa)
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 33892036672d..e37a14c323d2 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -190,41 +190,72 @@ MSR_KVM_ASYNC_PF_EN:
0x4b564d02
data:
- Bits 63-6 hold 64-byte aligned physical address of a
- 64 byte memory area which must be in guest RAM and must be
- zeroed. Bits 5-3 are reserved and should be zero. Bit 0 is 1
- when asynchronous page faults are enabled on the vcpu 0 when
- disabled. Bit 1 is 1 if asynchronous page faults can be injected
- when vcpu is in cpl == 0. Bit 2 is 1 if asynchronous page faults
- are delivered to L1 as #PF vmexits. Bit 2 can be set only if
- KVM_FEATURE_ASYNC_PF_VMEXIT is present in CPUID.
-
- First 4 byte of 64 byte memory location will be written to by
- the hypervisor at the time of asynchronous page fault (APF)
- injection to indicate type of asynchronous page fault. Value
- of 1 means that the page referred to by the page fault is not
- present. Value 2 means that the page is now available. Disabling
- interrupt inhibits APFs. Guest must not enable interrupt
- before the reason is read, or it may be overwritten by another
- APF. Since APF uses the same exception vector as regular page
- fault guest must reset the reason to 0 before it does
- something that can generate normal page fault. If during page
- fault APF reason is 0 it means that this is regular page
- fault.
-
- During delivery of type 1 APF cr2 contains a token that will
- be used to notify a guest when missing page becomes
- available. When page becomes available type 2 APF is sent with
- cr2 set to the token associated with the page. There is special
- kind of token 0xffffffff which tells vcpu that it should wake
- up all processes waiting for APFs and no individual type 2 APFs
- will be sent.
+ Asynchronous page fault (APF) control MSR.
+
+ Bits 63-6 hold 64-byte aligned physical address of a 64 byte memory area
+ which must be in guest RAM and must be zeroed. This memory is expected
+ to hold a copy of the following structure::
+
+ struct kvm_vcpu_pv_apf_data {
+ /* Used for 'page not present' events delivered via #PF */
+ __u32 flags;
+
+ /* Used for 'page ready' events delivered via interrupt notification */
+ __u32 token;
+
+ __u8 pad[56];
+ __u32 enabled;
+ };
+
+ Bits 5-4 of the MSR are reserved and should be zero. Bit 0 is set to 1
+ when asynchronous page faults are enabled on the vcpu, 0 when disabled.
+ Bit 1 is 1 if asynchronous page faults can be injected when vcpu is in
+ cpl == 0. Bit 2 is 1 if asynchronous page faults are delivered to L1 as
+ #PF vmexits. Bit 2 can be set only if KVM_FEATURE_ASYNC_PF_VMEXIT is
+ present in CPUID. Bit 3 enables interrupt based delivery of 'page ready'
+ events. Bit 3 can only be set if KVM_FEATURE_ASYNC_PF_INT is present in
+ CPUID.
+
+ 'Page not present' events are currently always delivered as synthetic
+ #PF exception. During delivery of these events APF CR2 register contains
+ a token that will be used to notify the guest when missing page becomes
+ available. Also, to make it possible to distinguish between real #PF and
+ APF, first 4 bytes of 64 byte memory location ('flags') will be written
+ to by the hypervisor at the time of injection. Only first bit of 'flags'
+ is currently supported, when set, it indicates that the guest is dealing
+ with asynchronous 'page not present' event. If during a page fault APF
+ 'flags' is '0' it means that this is regular page fault. Guest is
+ supposed to clear 'flags' when it is done handling #PF exception so the
+ next event can be delivered.
+
+ Note, since APF 'page not present' events use the same exception vector
+ as regular page fault, guest must reset 'flags' to '0' before it does
+ something that can generate normal page fault.
+
+ Bytes 5-7 of 64 byte memory location ('token') will be written to by the
+ hypervisor at the time of APF 'page ready' event injection. The content
+ of these bytes is a token which was previously delivered as 'page not
+ present' event. The event indicates the page in now available. Guest is
+ supposed to write '0' to 'token' when it is done handling 'page ready'
+ event and to write 1' to MSR_KVM_ASYNC_PF_ACK after clearing the location;
+ writing to the MSR forces KVM to re-scan its queue and deliver the next
+ pending notification.
+
+ Note, MSR_KVM_ASYNC_PF_INT MSR specifying the interrupt vector for 'page
+ ready' APF delivery needs to be written to before enabling APF mechanism
+ in MSR_KVM_ASYNC_PF_EN or interrupt #0 can get injected. The MSR is
+ available if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
+
+ Note, previously, 'page ready' events were delivered via the same #PF
+ exception as 'page not present' events but this is now deprecated. If
+ bit 3 (interrupt based delivery) is not set APF events are not delivered.
If APF is disabled while there are outstanding APFs, they will
not be delivered.
- Currently type 2 APF will be always delivered on the same vcpu as
- type 1 was, but guest should not rely on that.
+ Currently 'page ready' APF events will be always delivered on the
+ same vcpu as 'page not present' event was, but guest should not rely on
+ that.
MSR_KVM_STEAL_TIME:
0x4b564d03
@@ -319,3 +350,29 @@ data:
KVM guests can request the host not to poll on HLT, for example if
they are performing polling themselves.
+
+MSR_KVM_ASYNC_PF_INT:
+ 0x4b564d06
+
+data:
+ Second asynchronous page fault (APF) control MSR.
+
+ Bits 0-7: APIC vector for delivery of 'page ready' APF events.
+ Bits 8-63: Reserved
+
+ Interrupt vector for asynchnonous 'page ready' notifications delivery.
+ The vector has to be set up before asynchronous page fault mechanism
+ is enabled in MSR_KVM_ASYNC_PF_EN. The MSR is only available if
+ KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
+
+MSR_KVM_ASYNC_PF_ACK:
+ 0x4b564d07
+
+data:
+ Asynchronous page fault (APF) acknowledgment.
+
+ When the guest is done processing 'page ready' APF event and 'token'
+ field in 'struct kvm_vcpu_pv_apf_data' is cleared it is supposed to
+ write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
+ and check if there are more notifications pending. The MSR is available
+ if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
diff --git a/Documentation/virt/kvm/nested-vmx.rst b/Documentation/virt/kvm/nested-vmx.rst
index 592b0ab6970b..89851cbb7df9 100644
--- a/Documentation/virt/kvm/nested-vmx.rst
+++ b/Documentation/virt/kvm/nested-vmx.rst
@@ -116,10 +116,7 @@ struct shadow_vmcs is ever changed.
natural_width cr4_guest_host_mask;
natural_width cr0_read_shadow;
natural_width cr4_read_shadow;
- natural_width cr3_target_value0;
- natural_width cr3_target_value1;
- natural_width cr3_target_value2;
- natural_width cr3_target_value3;
+ natural_width dead_space[4]; /* Last remnants of cr3_target_value[0-3]. */
natural_width exit_qualification;
natural_width guest_linear_address;
natural_width guest_cr0;
diff --git a/Documentation/virt/kvm/review-checklist.rst b/Documentation/virt/kvm/review-checklist.rst
index 1f86a9d3f705..dc01aea4057b 100644
--- a/Documentation/virt/kvm/review-checklist.rst
+++ b/Documentation/virt/kvm/review-checklist.rst
@@ -10,7 +10,7 @@ Review checklist for kvm patches
2. Patches should be against kvm.git master branch.
3. If the patch introduces or modifies a new userspace API:
- - the API must be documented in Documentation/virt/kvm/api.txt
+ - the API must be documented in Documentation/virt/kvm/api.rst
- the API must be discoverable using KVM_CHECK_EXTENSION
4. New state must include support for save/restore.
diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
new file mode 100644
index 000000000000..d0a1fc754c84
--- /dev/null
+++ b/Documentation/virt/kvm/running-nested-guests.rst
@@ -0,0 +1,276 @@
+==============================
+Running nested guests with KVM
+==============================
+
+A nested guest is the ability to run a guest inside another guest (it
+can be KVM-based or a different hypervisor). The straightforward
+example is a KVM guest that in turn runs on a KVM guest (the rest of
+this document is built on this example)::
+
+ .----------------. .----------------.
+ | | | |
+ | L2 | | L2 |
+ | (Nested Guest) | | (Nested Guest) |
+ | | | |
+ |----------------'--'----------------|
+ | |
+ | L1 (Guest Hypervisor) |
+ | KVM (/dev/kvm) |
+ | |
+ .------------------------------------------------------.
+ | L0 (Host Hypervisor) |
+ | KVM (/dev/kvm) |
+ |------------------------------------------------------|
+ | Hardware (with virtualization extensions) |
+ '------------------------------------------------------'
+
+Terminology:
+
+- L0 – level-0; the bare metal host, running KVM
+
+- L1 – level-1 guest; a VM running on L0; also called the "guest
+ hypervisor", as it itself is capable of running KVM.
+
+- L2 – level-2 guest; a VM running on L1, this is the "nested guest"
+
+.. note:: The above diagram is modelled after the x86 architecture;
+ s390x, ppc64 and other architectures are likely to have
+ a different design for nesting.
+
+ For example, s390x always has an LPAR (LogicalPARtition)
+ hypervisor running on bare metal, adding another layer and
+ resulting in at least four levels in a nested setup — L0 (bare
+ metal, running the LPAR hypervisor), L1 (host hypervisor), L2
+ (guest hypervisor), L3 (nested guest).
+
+ This document will stick with the three-level terminology (L0,
+ L1, and L2) for all architectures; and will largely focus on
+ x86.
+
+
+Use Cases
+---------
+
+There are several scenarios where nested KVM can be useful, to name a
+few:
+
+- As a developer, you want to test your software on different operating
+ systems (OSes). Instead of renting multiple VMs from a Cloud
+ Provider, using nested KVM lets you rent a large enough "guest
+ hypervisor" (level-1 guest). This in turn allows you to create
+ multiple nested guests (level-2 guests), running different OSes, on
+ which you can develop and test your software.
+
+- Live migration of "guest hypervisors" and their nested guests, for
+ load balancing, disaster recovery, etc.
+
+- VM image creation tools (e.g. ``virt-install``, etc) often run
+ their own VM, and users expect these to work inside a VM.
+
+- Some OSes use virtualization internally for security (e.g. to let
+ applications run safely in isolation).
+
+
+Enabling "nested" (x86)
+-----------------------
+
+From Linux kernel v4.19 onwards, the ``nested`` KVM parameter is enabled
+by default for Intel and AMD. (Though your Linux distribution might
+override this default.)
+
+In case you are running a Linux kernel older than v4.19, to enable
+nesting, set the ``nested`` KVM module parameter to ``Y`` or ``1``. To
+persist this setting across reboots, you can add it in a config file, as
+shown below:
+
+1. On the bare metal host (L0), list the kernel modules and ensure that
+ the KVM modules::
+
+ $ lsmod | grep -i kvm
+ kvm_intel 133627 0
+ kvm 435079 1 kvm_intel
+
+2. Show information for ``kvm_intel`` module::
+
+ $ modinfo kvm_intel | grep -i nested
+ parm: nested:bool
+
+3. For the nested KVM configuration to persist across reboots, place the
+ below in ``/etc/modprobed/kvm_intel.conf`` (create the file if it
+ doesn't exist)::
+
+ $ cat /etc/modprobe.d/kvm_intel.conf
+ options kvm-intel nested=y
+
+4. Unload and re-load the KVM Intel module::
+
+ $ sudo rmmod kvm-intel
+ $ sudo modprobe kvm-intel
+
+5. Verify if the ``nested`` parameter for KVM is enabled::
+
+ $ cat /sys/module/kvm_intel/parameters/nested
+ Y
+
+For AMD hosts, the process is the same as above, except that the module
+name is ``kvm-amd``.
+
+
+Additional nested-related kernel parameters (x86)
+-------------------------------------------------
+
+If your hardware is sufficiently advanced (Intel Haswell processor or
+higher, which has newer hardware virt extensions), the following
+additional features will also be enabled by default: "Shadow VMCS
+(Virtual Machine Control Structure)", APIC Virtualization on your bare
+metal host (L0). Parameters for Intel hosts::
+
+ $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
+ Y
+
+ $ cat /sys/module/kvm_intel/parameters/enable_apicv
+ Y
+
+ $ cat /sys/module/kvm_intel/parameters/ept
+ Y
+
+.. note:: If you suspect your L2 (i.e. nested guest) is running slower,
+ ensure the above are enabled (particularly
+ ``enable_shadow_vmcs`` and ``ept``).
+
+
+Starting a nested guest (x86)
+-----------------------------
+
+Once your bare metal host (L0) is configured for nesting, you should be
+able to start an L1 guest with::
+
+ $ qemu-kvm -cpu host [...]
+
+The above will pass through the host CPU's capabilities as-is to the
+gues); or for better live migration compatibility, use a named CPU
+model supported by QEMU. e.g.::
+
+ $ qemu-kvm -cpu Haswell-noTSX-IBRS,vmx=on
+
+then the guest hypervisor will subsequently be capable of running a
+nested guest with accelerated KVM.
+
+
+Enabling "nested" (s390x)
+-------------------------
+
+1. On the host hypervisor (L0), enable the ``nested`` parameter on
+ s390x::
+
+ $ rmmod kvm
+ $ modprobe kvm nested=1
+
+.. note:: On s390x, the kernel parameter ``hpage`` is mutually exclusive
+ with the ``nested`` paramter — i.e. to be able to enable
+ ``nested``, the ``hpage`` parameter *must* be disabled.
+
+2. The guest hypervisor (L1) must be provided with the ``sie`` CPU
+ feature — with QEMU, this can be done by using "host passthrough"
+ (via the command-line ``-cpu host``).
+
+3. Now the KVM module can be loaded in the L1 (guest hypervisor)::
+
+ $ modprobe kvm
+
+
+Live migration with nested KVM
+------------------------------
+
+Migrating an L1 guest, with a *live* nested guest in it, to another
+bare metal host, works as of Linux kernel 5.3 and QEMU 4.2.0 for
+Intel x86 systems, and even on older versions for s390x.
+
+On AMD systems, once an L1 guest has started an L2 guest, the L1 guest
+should no longer be migrated or saved (refer to QEMU documentation on
+"savevm"/"loadvm") until the L2 guest shuts down. Attempting to migrate
+or save-and-load an L1 guest while an L2 guest is running will result in
+undefined behavior. You might see a ``kernel BUG!`` entry in ``dmesg``, a
+kernel 'oops', or an outright kernel panic. Such a migrated or loaded L1
+guest can no longer be considered stable or secure, and must be restarted.
+Migrating an L1 guest merely configured to support nesting, while not
+actually running L2 guests, is expected to function normally even on AMD
+systems but may fail once guests are started.
+
+Migrating an L2 guest is always expected to succeed, so all the following
+scenarios should work even on AMD systems:
+
+- Migrating a nested guest (L2) to another L1 guest on the *same* bare
+ metal host.
+
+- Migrating a nested guest (L2) to another L1 guest on a *different*
+ bare metal host.
+
+- Migrating a nested guest (L2) to a bare metal host.
+
+Reporting bugs from nested setups
+-----------------------------------
+
+Debugging "nested" problems can involve sifting through log files across
+L0, L1 and L2; this can result in tedious back-n-forth between the bug
+reporter and the bug fixer.
+
+- Mention that you are in a "nested" setup. If you are running any kind
+ of "nesting" at all, say so. Unfortunately, this needs to be called
+ out because when reporting bugs, people tend to forget to even
+ *mention* that they're using nested virtualization.
+
+- Ensure you are actually running KVM on KVM. Sometimes people do not
+ have KVM enabled for their guest hypervisor (L1), which results in
+ them running with pure emulation or what QEMU calls it as "TCG", but
+ they think they're running nested KVM. Thus confusing "nested Virt"
+ (which could also mean, QEMU on KVM) with "nested KVM" (KVM on KVM).
+
+Information to collect (generic)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following is not an exhaustive list, but a very good starting point:
+
+ - Kernel, libvirt, and QEMU version from L0
+
+ - Kernel, libvirt and QEMU version from L1
+
+ - QEMU command-line of L1 -- when using libvirt, you'll find it here:
+ ``/var/log/libvirt/qemu/instance.log``
+
+ - QEMU command-line of L2 -- as above, when using libvirt, get the
+ complete libvirt-generated QEMU command-line
+
+ - ``cat /sys/cpuinfo`` from L0
+
+ - ``cat /sys/cpuinfo`` from L1
+
+ - ``lscpu`` from L0
+
+ - ``lscpu`` from L1
+
+ - Full ``dmesg`` output from L0
+
+ - Full ``dmesg`` output from L1
+
+x86-specific info to collect
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Both the below commands, ``x86info`` and ``dmidecode``, should be
+available on most Linux distributions with the same name:
+
+ - Output of: ``x86info -a`` from L0
+
+ - Output of: ``x86info -a`` from L1
+
+ - Output of: ``dmidecode`` from L0
+
+ - Output of: ``dmidecode`` from L1
+
+s390x-specific info to collect
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Along with the earlier mentioned generic details, the below is
+also recommended:
+
+ - ``/proc/sysinfo`` from L1; this will also include the info from L0
diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 4e3e9362afeb..561969754bc0 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -161,7 +161,7 @@ device must complete the update before the driver callback returns.
When the device driver wants to populate a range of virtual addresses, it can
use::
- long hmm_range_fault(struct hmm_range *range);
+ int hmm_range_fault(struct hmm_range *range);
It will trigger a page fault on missing or read-only entries if write access is
requested (see below). Page faults use the generic mm page fault code path just
@@ -184,10 +184,7 @@ The usage pattern is::
range.notifier = &interval_sub;
range.start = ...;
range.end = ...;
- range.pfns = ...;
- range.flags = ...;
- range.values = ...;
- range.pfn_shift = ...;
+ range.hmm_pfns = ...;
if (!mmget_not_zero(interval_sub->notifier.mm))
return -EFAULT;
@@ -229,15 +226,10 @@ The hmm_range struct has 2 fields, default_flags and pfn_flags_mask, that specif
fault or snapshot policy for the whole range instead of having to set them
for each entry in the pfns array.
-For instance, if the device flags for range.flags are::
+For instance if the device driver wants pages for a range with at least read
+permission, it sets::
- range.flags[HMM_PFN_VALID] = (1 << 63);
- range.flags[HMM_PFN_WRITE] = (1 << 62);
-
-and the device driver wants pages for a range with at least read permission,
-it sets::
-
- range->default_flags = (1 << 63);
+ range->default_flags = HMM_PFN_REQ_FAULT;
range->pfn_flags_mask = 0;
and calls hmm_range_fault() as described above. This will fill fault all pages
@@ -246,18 +238,18 @@ in the range with at least read permission.
Now let's say the driver wants to do the same except for one page in the range for
which it wants to have write permission. Now driver set::
- range->default_flags = (1 << 63);
- range->pfn_flags_mask = (1 << 62);
- range->pfns[index_of_write] = (1 << 62);
+ range->default_flags = HMM_PFN_REQ_FAULT;
+ range->pfn_flags_mask = HMM_PFN_REQ_WRITE;
+ range->pfns[index_of_write] = HMM_PFN_REQ_WRITE;
With this, HMM will fault in all pages with at least read (i.e., valid) and for the
address == range->start + (index_of_write << PAGE_SHIFT) it will fault with
write permission i.e., if the CPU pte does not have write permission set then HMM
will call handle_mm_fault().
-Note that HMM will populate the pfns array with write permission for any page
-that is mapped with CPU write permission no matter what values are set
-in default_flags or pfn_flags_mask.
+After hmm_range_fault completes the flag bits are set to the current state of
+the page tables, ie HMM_PFN_VALID | HMM_PFN_WRITE will be set if the page is
+writable.
Represent and manage device memory from core kernel point of view
diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index e8d943b21cf9..611140ffef7e 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -31,6 +31,7 @@ descriptions of data structures and algorithms.
active_mm
balance
cleancache
+ free_page_reporting
frontswap
highmem
hmm
diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
index 58a12376b7df..91228044ed16 100644
--- a/Documentation/vm/memory-model.rst
+++ b/Documentation/vm/memory-model.rst
@@ -46,11 +46,10 @@ maps the entire physical memory. For most architectures, the holes
have entries in the `mem_map` array. The `struct page` objects
corresponding to the holes are never fully initialized.
-To allocate the `mem_map` array, architecture specific setup code
-should call :c:func:`free_area_init_node` function or its convenience
-wrapper :c:func:`free_area_init`. Yet, the mappings array is not
-usable until the call to :c:func:`memblock_free_all` that hands all
-the memory to the page allocator.
+To allocate the `mem_map` array, architecture specific setup code should
+call :c:func:`free_area_init` function. Yet, the mappings array is not
+usable until the call to :c:func:`memblock_free_all` that hands all the
+memory to the page allocator.
If an architecture enables `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL` option,
it may free parts of the `mem_map` array that do not cover the
diff --git a/Documentation/vm/page_frags.rst b/Documentation/vm/page_frags.rst
index 637cc49d1b2f..7d6f9385d129 100644
--- a/Documentation/vm/page_frags.rst
+++ b/Documentation/vm/page_frags.rst
@@ -26,7 +26,7 @@ to be disabled when executing the fragment allocation.
The network stack uses two separate caches per CPU to handle fragment
allocation. The netdev_alloc_cache is used by callers making use of the
-__netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is
+netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is
used by callers of the __napi_alloc_frag and __napi_alloc_skb calls. The
main difference between these two calls is the context in which they may be
called. The "netdev" prefixed functions are usable in any context as these
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst
index 0ed5ab8c7ab4..079f3f8c4784 100644
--- a/Documentation/vm/page_owner.rst
+++ b/Documentation/vm/page_owner.rst
@@ -83,8 +83,7 @@ Usage
4) Analyze information from page owner::
cat /sys/kernel/debug/page_owner > page_owner_full.txt
- grep -v ^PFN page_owner_full.txt > page_owner.txt
- ./page_owner_sort page_owner.txt sorted_page_owner.txt
+ ./page_owner_sort page_owner_full.txt sorted_page_owner.txt
See the result about who allocated each page
in the ``sorted_page_owner.txt``.
diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 933ada4368ff..4eee598555c9 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -49,7 +49,7 @@ Possible debug options are::
P Poisoning (object and padding)
U User tracking (free and alloc)
T Trace (please only use on single slabs)
- A Toggle failslab filter mark for the cache
+ A Enable failslab filter mark for the cache
O Switch debugging off for caches that would have
caused higher minimum slab orders
- Switch all debugging off (useful if the kernel is
diff --git a/Documentation/vm/zswap.rst b/Documentation/vm/zswap.rst
index f8c6a79d7c70..d8d9fa4a1f0d 100644
--- a/Documentation/vm/zswap.rst
+++ b/Documentation/vm/zswap.rst
@@ -140,10 +140,10 @@ without any real benefit but with a performance drop for the system), a
special parameter has been introduced to implement a sort of hysteresis to
refuse taking pages into zswap pool until it has sufficient space if the limit
has been hit. To set the threshold at which zswap would start accepting pages
-again after it became full, use the sysfs ``accept_threhsold_percent``
+again after it became full, use the sysfs ``accept_threshold_percent``
attribute, e. g.::
- echo 80 > /sys/module/zswap/parameters/accept_threhsold_percent
+ echo 80 > /sys/module/zswap/parameters/accept_threshold_percent
Setting this parameter to 100 will disable the hysteresis.
diff --git a/Documentation/watchdog/convert_drivers_to_kernel_api.rst b/Documentation/watchdog/convert_drivers_to_kernel_api.rst
index dd934cc08e40..a1c3f038ce0e 100644
--- a/Documentation/watchdog/convert_drivers_to_kernel_api.rst
+++ b/Documentation/watchdog/convert_drivers_to_kernel_api.rst
@@ -2,7 +2,7 @@
Converting old watchdog drivers to the watchdog framework
=========================================================
-by Wolfram Sang <w.sang@pengutronix.de>
+by Wolfram Sang <wsa@kernel.org>
Before the watchdog framework came into the kernel, every driver had to
implement the API on its own. Now, as the framework factored out the common
@@ -115,7 +115,7 @@ Add the watchdog operations
---------------------------
All possible callbacks are defined in 'struct watchdog_ops'. You can find it
-explained in 'watchdog-kernel-api.txt' in this directory. start(), stop() and
+explained in 'watchdog-kernel-api.txt' in this directory. start() and
owner must be set, the rest are optional. You will easily find corresponding
functions in the old driver. Note that you will now get a pointer to the
watchdog_device as a parameter to these functions, so you probably have to
diff --git a/Documentation/watchdog/watchdog-kernel-api.rst b/Documentation/watchdog/watchdog-kernel-api.rst
index 864edbe932c1..068a55ee0d4a 100644
--- a/Documentation/watchdog/watchdog-kernel-api.rst
+++ b/Documentation/watchdog/watchdog-kernel-api.rst
@@ -123,8 +123,8 @@ The list of watchdog operations is defined as::
struct module *owner;
/* mandatory operations */
int (*start)(struct watchdog_device *);
- int (*stop)(struct watchdog_device *);
/* optional operations */
+ int (*stop)(struct watchdog_device *);
int (*ping)(struct watchdog_device *);
unsigned int (*status)(struct watchdog_device *);
int (*set_timeout)(struct watchdog_device *, unsigned int);
diff --git a/Documentation/x86/x86_64/uefi.rst b/Documentation/x86/x86_64/uefi.rst
index 88c3ba32546f..3b894103a734 100644
--- a/Documentation/x86/x86_64/uefi.rst
+++ b/Documentation/x86/x86_64/uefi.rst
@@ -36,7 +36,7 @@ Mechanics
elilo bootloader with x86_64 support, elilo configuration file,
kernel image built in first step and corresponding
- initrd. Instructions on building elilo and its dependencies
+ initrd. Instructions on building elilo and its dependencies
can be found in the elilo sourceforge project.
- Boot to EFI shell and invoke elilo choosing the kernel image built