diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-06-30 21:12:56 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-06-30 21:12:56 +0200 |
commit | df668a5fe461bb9d7e899c538acc7197746038f4 (patch) | |
tree | 315a71104f5cea7feeb56c9f2c768453408b72f7 /Documentation | |
parent | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid... (diff) | |
parent | block: fix discard request merge (diff) | |
download | linux-df668a5fe461bb9d7e899c538acc7197746038f4.tar.xz linux-df668a5fe461bb9d7e899c538acc7197746038f4.zip |
Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
- disk events cleanup (Christoph)
- gendisk and request queue allocation simplifications (Christoph)
- bdev_disk_changed cleanups (Christoph)
- IO priority improvements (Bart)
- Chained bio completion trace fix (Edward)
- blk-wbt fixes (Jan)
- blk-wbt enable/disable fix (Zhang)
- Scheduler dispatch improvements (Jan, Ming)
- Shared tagset scheduler improvements (John)
- BFQ updates (Paolo, Luca, Pietro)
- BFQ lock inversion fix (Jan)
- Documentation improvements (Kir)
- CLONE_IO block cgroup fix (Tejun)
- Remove of ancient and deprecated block dump feature (zhangyi)
- Discard merge fix (Ming)
- Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas,
Yang)
* tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits)
block: fix discard request merge
block/mq-deadline: Remove a WARN_ON_ONCE() call
blk-mq: update hctx->dispatch_busy in case of real scheduler
blk: Fix lock inversion between ioc lock and bfqd lock
bfq: Remove merged request already in bfq_requests_merged()
block: pass a gendisk to bdev_disk_changed
block: move bdev_disk_changed
block: add the events* attributes to disk_attrs
block: move the disk events code to a separate file
block: fix trace completion for chained bio
block/partitions/msdos: Fix typo inidicator -> indicator
block, bfq: reset waker pointer with shared queues
block, bfq: check waker only for queues with no in-flight I/O
block, bfq: avoid delayed merge of async queues
block, bfq: boost throughput by extending queue-merging times
block, bfq: consider also creation time in delayed stable merge
block, bfq: fix delayed stable merge check
block, bfq: let also stably merged queues enjoy weight raising
blk-wbt: make sure throttle is enabled properly
blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/blkio-controller.rst | 155 | ||||
-rw-r--r-- | Documentation/admin-guide/cgroup-v2.rst | 55 | ||||
-rw-r--r-- | Documentation/admin-guide/laptops/laptop-mode.rst | 11 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/vm.rst | 8 | ||||
-rw-r--r-- | Documentation/block/bfq-iosched.rst | 38 | ||||
-rw-r--r-- | Documentation/filesystems/locking.rst | 2 |
6 files changed, 163 insertions, 106 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst index 36d43ae7dc13..16253eda192e 100644 --- a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst +++ b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst @@ -17,36 +17,37 @@ level logical devices like device mapper. HOWTO ===== + Throttling/Upper Limit policy ----------------------------- -- Enable Block IO controller:: +Enable Block IO controller:: CONFIG_BLK_CGROUP=y -- Enable throttling in block layer:: +Enable throttling in block layer:: CONFIG_BLK_DEV_THROTTLING=y -- Mount blkio controller (see cgroups.txt, Why are cgroups needed?):: +Mount blkio controller (see cgroups.txt, Why are cgroups needed?):: mount -t cgroup -o blkio none /sys/fs/cgroup/blkio -- Specify a bandwidth rate on particular device for root group. The format - for policy is "<major>:<minor> <bytes_per_second>":: +Specify a bandwidth rate on particular device for root group. The format +for policy is "<major>:<minor> <bytes_per_second>":: echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device - Above will put a limit of 1MB/second on reads happening for root group - on device having major/minor number 8:16. +This will put a limit of 1MB/second on reads happening for root group +on device having major/minor number 8:16. -- Run dd to read a file and see if rate is throttled to 1MB/s or not:: +Run dd to read a file and see if rate is throttled to 1MB/s or not:: # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024 1024+0 records in 1024+0 records out 4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s - Limits for writes can be put using blkio.throttle.write_bps_device file. +Limits for writes can be put using blkio.throttle.write_bps_device file. Hierarchical Cgroups ==================== @@ -79,85 +80,89 @@ following:: Various user visible config options =================================== -CONFIG_BLK_CGROUP - - Block IO controller. -CONFIG_BFQ_CGROUP_DEBUG - - Debug help. Right now some additional stats file show up in cgroup + CONFIG_BLK_CGROUP + Block IO controller. + + CONFIG_BFQ_CGROUP_DEBUG + Debug help. Right now some additional stats file show up in cgroup if this option is enabled. -CONFIG_BLK_DEV_THROTTLING - - Enable block device throttling support in block layer. + CONFIG_BLK_DEV_THROTTLING + Enable block device throttling support in block layer. Details of cgroup files ======================= + Proportional weight policy files -------------------------------- -- blkio.weight - - Specifies per cgroup weight. This is default weight of the group - on all the devices until and unless overridden by per device rule. - (See blkio.weight_device). - Currently allowed range of weights is from 10 to 1000. -- blkio.weight_device - - One can specify per cgroup per device rules using this interface. - These rules override the default value of group weight as specified - by blkio.weight. + blkio.bfq.weight + Specifies per cgroup weight. This is default weight of the group + on all the devices until and unless overridden by per device rule + (see `blkio.bfq.weight_device` below). + + Currently allowed range of weights is from 1 to 1000. For more details, + see Documentation/block/bfq-iosched.rst. + + blkio.bfq.weight_device + Specifes per cgroup per device weights, overriding the default group + weight. For more details, see Documentation/block/bfq-iosched.rst. Following is the format:: - # echo dev_maj:dev_minor weight > blkio.weight_device + # echo dev_maj:dev_minor weight > blkio.bfq.weight_device Configure weight=300 on /dev/sdb (8:16) in this cgroup:: - # echo 8:16 300 > blkio.weight_device - # cat blkio.weight_device + # echo 8:16 300 > blkio.bfq.weight_device + # cat blkio.bfq.weight_device dev weight 8:16 300 Configure weight=500 on /dev/sda (8:0) in this cgroup:: - # echo 8:0 500 > blkio.weight_device - # cat blkio.weight_device + # echo 8:0 500 > blkio.bfq.weight_device + # cat blkio.bfq.weight_device dev weight 8:0 500 8:16 300 Remove specific weight for /dev/sda in this cgroup:: - # echo 8:0 0 > blkio.weight_device - # cat blkio.weight_device + # echo 8:0 0 > blkio.bfq.weight_device + # cat blkio.bfq.weight_device dev weight 8:16 300 -- blkio.time - - disk time allocated to cgroup per device in milliseconds. First + blkio.time + Disk time allocated to cgroup per device in milliseconds. First two fields specify the major and minor number of the device and third field specifies the disk time allocated to group in milliseconds. -- blkio.sectors - - number of sectors transferred to/from disk by the group. First + blkio.sectors + Number of sectors transferred to/from disk by the group. First two fields specify the major and minor number of the device and third field specifies the number of sectors transferred by the group to/from the device. -- blkio.io_service_bytes - - Number of bytes transferred to/from the disk by the group. These + blkio.io_service_bytes + Number of bytes transferred to/from the disk by the group. These are further divided by the type of operation - read or write, sync or async. First two fields specify the major and minor number of the device, third field specifies the operation type and the fourth field specifies the number of bytes. -- blkio.io_serviced - - Number of IOs (bio) issued to the disk by the group. These + blkio.io_serviced + Number of IOs (bio) issued to the disk by the group. These are further divided by the type of operation - read or write, sync or async. First two fields specify the major and minor number of the device, third field specifies the operation type and the fourth field specifies the number of IOs. -- blkio.io_service_time - - Total amount of time between request dispatch and request completion + blkio.io_service_time + Total amount of time between request dispatch and request completion for the IOs done by this cgroup. This is in nanoseconds to make it meaningful for flash devices too. For devices with queue depth of 1, this time represents the actual service time. When queue_depth > 1, @@ -170,8 +175,8 @@ Proportional weight policy files specifies the operation type and the fourth field specifies the io_service_time in ns. -- blkio.io_wait_time - - Total amount of time the IOs for this cgroup spent waiting in the + blkio.io_wait_time + Total amount of time the IOs for this cgroup spent waiting in the scheduler queues for service. This can be greater than the total time elapsed since it is cumulative io_wait_time for all IOs. It is not a measure of total time the cgroup spent waiting but rather a measure of @@ -185,24 +190,24 @@ Proportional weight policy files minor number of the device, third field specifies the operation type and the fourth field specifies the io_wait_time in ns. -- blkio.io_merged - - Total number of bios/requests merged into requests belonging to this + blkio.io_merged + Total number of bios/requests merged into requests belonging to this cgroup. This is further divided by the type of operation - read or write, sync or async. -- blkio.io_queued - - Total number of requests queued up at any given instant for this + blkio.io_queued + Total number of requests queued up at any given instant for this cgroup. This is further divided by the type of operation - read or write, sync or async. -- blkio.avg_queue_size - - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. + blkio.avg_queue_size + Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. The average queue size for this cgroup over the entire time of this cgroup's existence. Queue size samples are taken each time one of the queues of this cgroup gets a timeslice. -- blkio.group_wait_time - - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. + blkio.group_wait_time + Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This is the amount of time the cgroup had to wait since it became busy (i.e., went from 0 to 1 request queued) to get a timeslice for one of its queues. This is different from the io_wait_time which is the @@ -212,8 +217,8 @@ Proportional weight policy files will only report the group_wait_time accumulated till the last time it got a timeslice and will not include the current delta. -- blkio.empty_time - - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. + blkio.empty_time + Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This is the amount of time a cgroup spends without any pending requests when not being served, i.e., it does not include any time spent idling for one of the queues of the cgroup. This is in @@ -221,8 +226,8 @@ Proportional weight policy files the stat will only report the empty_time accumulated till the last time it had a pending request and will not include the current delta. -- blkio.idle_time - - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. + blkio.idle_time + Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This is the amount of time spent by the IO scheduler idling for a given cgroup in anticipation of a better request than the existing ones from other queues/cgroups. This is in nanoseconds. If this is read @@ -230,60 +235,60 @@ Proportional weight policy files idle_time accumulated till the last idle period and will not include the current delta. -- blkio.dequeue - - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This + blkio.dequeue + Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This gives the statistics about how many a times a group was dequeued from service tree of the device. First two fields specify the major and minor number of the device and third field specifies the number of times a group was dequeued from a particular device. -- blkio.*_recursive - - Recursive version of various stats. These files show the + blkio.*_recursive + Recursive version of various stats. These files show the same information as their non-recursive counterparts but include stats from all the descendant cgroups. Throttling/Upper limit policy files ----------------------------------- -- blkio.throttle.read_bps_device - - Specifies upper limit on READ rate from the device. IO rate is + blkio.throttle.read_bps_device + Specifies upper limit on READ rate from the device. IO rate is specified in bytes per second. Rules are per device. Following is the format:: echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device -- blkio.throttle.write_bps_device - - Specifies upper limit on WRITE rate to the device. IO rate is + blkio.throttle.write_bps_device + Specifies upper limit on WRITE rate to the device. IO rate is specified in bytes per second. Rules are per device. Following is the format:: echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device -- blkio.throttle.read_iops_device - - Specifies upper limit on READ rate from the device. IO rate is + blkio.throttle.read_iops_device + Specifies upper limit on READ rate from the device. IO rate is specified in IO per second. Rules are per device. Following is the format:: echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device -- blkio.throttle.write_iops_device - - Specifies upper limit on WRITE rate to the device. IO rate is + blkio.throttle.write_iops_device + Specifies upper limit on WRITE rate to the device. IO rate is specified in io per second. Rules are per device. Following is the format:: echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device -Note: If both BW and IOPS rules are specified for a device, then IO is - subjected to both the constraints. + Note: If both BW and IOPS rules are specified for a device, then IO is + subjected to both the constraints. -- blkio.throttle.io_serviced - - Number of IOs (bio) issued to the disk by the group. These + blkio.throttle.io_serviced + Number of IOs (bio) issued to the disk by the group. These are further divided by the type of operation - read or write, sync or async. First two fields specify the major and minor number of the device, third field specifies the operation type and the fourth field specifies the number of IOs. -- blkio.throttle.io_service_bytes - - Number of bytes transferred to/from the disk by the group. These + blkio.throttle.io_service_bytes + Number of bytes transferred to/from the disk by the group. These are further divided by the type of operation - read or write, sync or async. First two fields specify the major and minor number of the device, third field specifies the operation type and the fourth field @@ -291,6 +296,6 @@ Note: If both BW and IOPS rules are specified for a device, then IO is Common files among various policies ----------------------------------- -- blkio.reset_stats - - Writing an int to this file will result in resetting all the stats + blkio.reset_stats + Writing an int to this file will result in resetting all the stats for that cgroup. diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index b1e81aa8598a..4e59925e6583 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -56,6 +56,7 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou 5-3-3. IO Latency 5-3-3-1. How IO Latency Throttling Works 5-3-3-2. IO Latency Interface Files + 5-3-4. IO Priority 5-4. PID 5-4-1. PID Interface Files 5-5. Cpuset @@ -1866,6 +1867,60 @@ IO Latency Interface Files duration of time between evaluation events. Windows only elapse with IO activity. Idle periods extend the most recent window. +IO Priority +~~~~~~~~~~~ + +A single attribute controls the behavior of the I/O priority cgroup policy, +namely the blkio.prio.class attribute. The following values are accepted for +that attribute: + + no-change + Do not modify the I/O priority class. + + none-to-rt + For requests that do not have an I/O priority class (NONE), + change the I/O priority class into RT. Do not modify + the I/O priority class of other requests. + + restrict-to-be + For requests that do not have an I/O priority class or that have I/O + priority class RT, change it into BE. Do not modify the I/O priority + class of requests that have priority class IDLE. + + idle + Change the I/O priority class of all requests into IDLE, the lowest + I/O priority class. + +The following numerical values are associated with the I/O priority policies: + ++-------------+---+ +| no-change | 0 | ++-------------+---+ +| none-to-rt | 1 | ++-------------+---+ +| rt-to-be | 2 | ++-------------+---+ +| all-to-idle | 3 | ++-------------+---+ + +The numerical value that corresponds to each I/O priority class is as follows: + ++-------------------------------+---+ +| IOPRIO_CLASS_NONE | 0 | ++-------------------------------+---+ +| IOPRIO_CLASS_RT (real-time) | 1 | ++-------------------------------+---+ +| IOPRIO_CLASS_BE (best effort) | 2 | ++-------------------------------+---+ +| IOPRIO_CLASS_IDLE | 3 | ++-------------------------------+---+ + +The algorithm to set the I/O priority class for a request is as follows: + +- Translate the I/O priority class policy into a number. +- Change the request I/O priority class into the maximum of the I/O priority + class policy number and the numerical I/O priority class. + PID --- diff --git a/Documentation/admin-guide/laptops/laptop-mode.rst b/Documentation/admin-guide/laptops/laptop-mode.rst index c984c4262f2e..b61cc601d298 100644 --- a/Documentation/admin-guide/laptops/laptop-mode.rst +++ b/Documentation/admin-guide/laptops/laptop-mode.rst @@ -101,17 +101,6 @@ this results in concentration of disk activity in a small time interval which occurs only once every 10 minutes, or whenever the disk is forced to spin up by a cache miss. The disk can then be spun down in the periods of inactivity. -If you want to find out which process caused the disk to spin up, you can -gather information by setting the flag /proc/sys/vm/block_dump. When this flag -is set, Linux reports all disk read and write operations that take place, and -all block dirtyings done to files. This makes it possible to debug why a disk -needs to spin up, and to increase battery life even more. The output of -block_dump is written to the kernel output, and it can be retrieved using -"dmesg". When you use block_dump and your kernel logging level also includes -kernel debugging messages, you probably want to turn off klogd, otherwise -the output of block_dump will be logged, causing disk activity that is not -normally there. - Configuration ------------- diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 8387ad0b0b83..003d5cc3751b 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -25,7 +25,6 @@ files can be found in mm/swap.c. Currently, these files are in /proc/sys/vm: - admin_reserve_kbytes -- block_dump - compact_memory - compaction_proactiveness - compact_unevictable_allowed @@ -106,13 +105,6 @@ On x86_64 this is about 128MB. Changing this takes effect whenever an application requests memory. -block_dump -========== - -block_dump enables block I/O debugging when set to a nonzero value. More -information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst. - - compact_memory ============== diff --git a/Documentation/block/bfq-iosched.rst b/Documentation/block/bfq-iosched.rst index 66c5a4e54130..df3a8a47f58c 100644 --- a/Documentation/block/bfq-iosched.rst +++ b/Documentation/block/bfq-iosched.rst @@ -553,20 +553,36 @@ throughput sustainable with bfq, because updating the blkio.bfq.* stats is rather costly, especially for some of the stats enabled by CONFIG_BFQ_CGROUP_DEBUG. -Parameters to set ------------------ +Parameters +---------- -For each group, there is only the following parameter to set. +For each group, the following parameters can be set: -weight (namely blkio.bfq.weight or io.bfq-weight): the weight of the -group inside its parent. Available values: 1..1000 (default 100). The -linear mapping between ioprio and weights, described at the beginning -of the tunable section, is still valid, but all weights higher than -IOPRIO_BE_NR*10 are mapped to ioprio 0. + weight + This specifies the default weight for the cgroup inside its parent. + Available values: 1..1000 (default: 100). -Recall that, if low-latency is set, then BFQ automatically raises the -weight of the queues associated with interactive and soft real-time -applications. Unset this tunable if you need/want to control weights. + For cgroup v1, it is set by writing the value to `blkio.bfq.weight`. + + For cgroup v2, it is set by writing the value to `io.bfq.weight`. + (with an optional prefix of `default` and a space). + + The linear mapping between ioprio and weights, described at the beginning + of the tunable section, is still valid, but all weights higher than + IOPRIO_BE_NR*10 are mapped to ioprio 0. + + Recall that, if low-latency is set, then BFQ automatically raises the + weight of the queues associated with interactive and soft real-time + applications. Unset this tunable if you need/want to control weights. + + weight_device + This specifies a per-device weight for the cgroup. The syntax is + `minor:major weight`. A weight of `0` may be used to reset to the default + weight. + + For cgroup v1, it is set by writing the value to `blkio.bfq.weight_device`. + + For cgroup v2, the file name is `io.bfq.weight`. [1] diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 1e894480115b..2183fd8cc350 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -480,7 +480,7 @@ prototypes:: locking rules: ======================= =================== -ops bd_mutex +ops open_mutex ======================= =================== open: yes release: yes |