summaryrefslogtreecommitdiffstats
path: root/block/cfq-iosched.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* cfq_get_queue: fix possible NULL pointer accessOleg Nesterov2007-10-291-1/+4
| | | | | | | | | | | | cfq_get_queue()->cfq_find_alloc_queue() can fail, check the returned value. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Note that this isn't a bug at the moment, since the regular IO path does not call this path without __GFP_WAIT set. However, it could be a future bug, so I've applied it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq_exit_queue() should cancel cfq_data->unplug_workOleg Nesterov2007-10-291-1/+1
| | | | | | | | | | | | | Spotted by Nick <gentuu@gmail.com>, perhaps explains the first trace in http://bugzilla.kernel.org/show_bug.cgi?id=9180. cfq_exit_queue() should cancel cfqd->unplug_work before freeing cfqd. blk_sync_queue() seems unneeded, removed. Q: why cfq_exit_queue() calls cfq_shutdown_timer_wq() twice? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [BLOCK] Get rid of request_queue_t typedefJens Axboe2007-07-241-19/+20
| | | | | | | | | Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweet of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq: Write-only stuff in CFQ data structuresAlexey Dobriyan2007-07-201-11/+0
| | | | | | | | There are some leftover bits from the task cooperator patch, that was yanked out again. While it will get reintroduced, no point in having this write-only stuff in the tree. So yank it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq: async queue allocation per priorityVasily Tarasov2007-07-201-12/+44
| | | | | | | | | | | | If we have two processes with different ioprio_class, but the same ioprio_data, their async requests will fall into the same queue. I guess such behavior is not expected, because it's not right to put real-time requests and best-effort requests in the same queue. The attached patch fixes the problem by introducing additional *cfqq fields on cfqd, pointing to per-(class,priority) async queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Slab allocators: Replace explicit zeroing with __GFP_ZEROChristoph Lameter2007-07-171-9/+9
| | | | | | | | | | | | | kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing variant in the past. But with __GFP_ZERO it is possible now to do zeroing while allocating. Use __GFP_ZERO to remove the explicit clearing of memory via memset whereever we can. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: fix async queue behaviourJens Axboe2007-07-101-3/+36
| | | | | | | | | | | With the cfq_queue hash removal, we inadvertently got rid of the async queue sharing. This was not intentional, in fact CFQ purposely shares the async queue per priority level to get good merging for async writes. So put some logic in cfq_get_queue() to track the shared queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* KMEM_CACHE(): simplify slab cache creationChristoph Lameter2007-05-071-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: speedup cic rb lookupJens Axboe2007-04-301-2/+18
| | | | | | | We often lookup the same queue many times in succession, so cache the last looked up queue to avoid browsing the rbtree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: get rid of cfqq hashVasily Tarasov2007-04-301-100/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cfq hash is no more necessary. We always can get cfqq from io context. cfq_get_io_context_noalloc() function is introduced, because we don't want to allocate cic on merging and checking may_queue. In order to identify sync queue we've used hash key = CFQ_KEY_ASYNC. Since hash is eliminated we need to use other criterion: sync flag for queue is added. In all places where we dig in rb_tree we're in current context, so no additional locking is required. Advantages of this patch: no additional memory for hash, no seeking in hash, code is cleaner. But it is necessary now to seek cic in per-ioc rbtree, but it is faster: - most processes work only with few devices - most systems have only few block devices - it is a rb-tree Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Changes by me: - Merge into CFQ devel branch - Get rid of cfq_get_io_context_noalloc() - Fix various bugs with dereferencing cic->cfqq[] with offset other than 0 or 1. - Fix bug in cfqq setup, is_sync condition was reversed. - Fix bug where only bio_sync() is used, we need to check for a READ too Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: tighten queue request overlap conditionJens Axboe2007-04-301-1/+2
| | | | | | | For tagged devices, allow overlap of requests if the idle window isn't enabled on the current active queue. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: improve sync vs async workloadsJens Axboe2007-04-301-13/+18
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: never allow an async queue idlingJens Axboe2007-04-301-1/+6
| | | | | | | We don't enable it by default, don't let it get enabled during runtime. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: get rid of ->dispatch_sliceJens Axboe2007-04-301-5/+1
| | | | | | | We can track it fairly accurately locally, let the slice handling take care of the rest. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: don't pass unused preemption variable aroundJens Axboe2007-04-301-15/+13
| | | | | | We don't use it anymore in the slice expiry handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: get rid of ->cur_rr and ->cfq_listJens Axboe2007-04-301-55/+32
| | | | | | | | | It's only used for preemption now that the IDLE and RT queues also use the rbtree. If we pass an 'add_front' variable to cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion at the front of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: slice offset should take ioprio into accountJens Axboe2007-04-301-1/+2
| | | | | | Use the max_slice-cur_slice as the multipler for the insertion offset. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: style cleanups and commentsJens Axboe2007-04-301-16/+50
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: sort IDLE queues into the rbtreeJens Axboe2007-04-301-36/+31
| | | | | | | Same treatment as the RT conversion, just put the sorted idle branch at the end of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: sort RT queues into the rbtreeJens Axboe2007-04-301-15/+12
| | | | | | | | | Currently CFQ does a linked insert into the current list for RT queues. We can just factor the class into the rb insertion, and then we don't have to treat RT queues in a special way. It's faster, too. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: speed up rbtree handlingJens Axboe2007-04-301-14/+48
| | | | | | | | | | | | | | For cases where the rbtree is mainly used for sorting and min retrieval, a nice speedup of the rbtree code is to maintain a cache of the leftmost node in the tree. Also spotted in the CFS CPU scheduler code. Improved by Alan D. Brunelle <Alan.Brunelle@hp.com> by updating the leftmost hint in cfq_rb_first() if it isn't set, instead of only updating it on insert. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: rework the whole round-robin list conceptJens Axboe2007-04-301-238/+123
| | | | | | | | | | | | | | | | | | | | | Drawing on some inspiration from the CFS CPU scheduler design, overhaul the pending cfq_queue concept list management. Currently CFQ uses a doubly linked list per priority level for sorting and service uses. Kill those lists and maintain an rbtree of cfq_queue's, sorted by when to service them. This unfortunately means that the ionice levels aren't as strong anymore, will work on improving those later. We only scale the slice time now, not the number of times we service. This means that latency is better (for all priority levels), but that the distinction between the highest and lower levels aren't as big. The diffstat speaks for itself. cfq-iosched.c | 363 +++++++++++++++++--------------------------------- 1 file changed, 125 insertions(+), 238 deletions(-) Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: minor updatesJens Axboe2007-04-301-63/+18
| | | | | | | | | | | - Move the queue_new flag clear to when the queue is selected - Only select the non-first queue in cfq_get_best_queue(), if there's a substantial difference between the best and first. - Get rid of ->busy_rr - Only select a close cooperator, if the current queue is known to take a while to "think". Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: development updateJens Axboe2007-04-301-120/+261
| | | | | | | | | | | | | | | | | | | | | | | - Implement logic for detecting cooperating processes, so we choose the best available queue whenever possible. - Improve residual slice time accounting. - Remove dead code: we no longer see async requests coming in on sync queues. That part was removed a long time ago. That means that we can also remove the difference between cfq_cfqq_sync() and cfq_cfqq_class_sync(), they are now indentical. And we can kill the on_dispatch array, just make it a counter. - Allow a process to go into the current list, if it hasn't been serviced in this scheduler tick yet. Possible future improvements including caching the cfqq lookup in cfq_close_cooperator(), so we don't have to look it up twice. cfq_get_best_queue() should just use that last decision instead of doing it again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: improve preemption for cooperating tasksJens Axboe2007-04-301-6/+20
| | | | | | | | | | When testing the syslet async io approach, I discovered that CFQ sometimes didn't perform as well as expected. cfq_should_preempt() needs to better check for cooperating tasks, so fix that by allowing preemption of an equal priority queue if the recently queued request is as good a candidate for IO as the one we are currently waiting for. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: fix alias + front merge bugJens Axboe2007-04-251-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a really rare and obscure bug in CFQ, that causes a crash in cfq_dispatch_insert() due to rq == NULL. One example of the resulting oops is seen here: http://lkml.org/lkml/2007/4/15/41 Neil correctly diagnosed the situation for how this can happen: if two concurrent requests with the exact same sector number (due to direct IO or aliasing between MD and the raw device access), the alias handling will add the request to the sortlist, but next_rq remains NULL. Read the more complete analysis at: http://lkml.org/lkml/2007/4/25/57 This looks like it requires md to trigger, even though it should potentially be possible to due with O_DIRECT (at least if you edit the kernel and doctor some of the unplug calls). The fix is to move the ->next_rq update to when we add a request to the rbtree. Then we remove the possibility for a request to exist in the rbtree code, but not have ->next_rq correctly updated. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: fix sequential write regressionJens Axboe2007-04-211-15/+19
| | | | | | | | | | | | | | | We have a 10-15% performance regression for sequential writes on TCQ/NCQ enabled drives in 2.6.21-rcX after the CFQ update went in. It has been reported by Valerie Clement <valerie.clement@bull.net> and the Intel testing folks. The regression is because of CFQ's now more aggressive queue control, limiting the depth available to the device. This patches fixes that regression by allowing a greater depth when only one queue is busy. It has been tested to not impact sync-vs-async workloads too much - we still do a lot better than 2.6.20. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: improve continue or break logic in cfq_dispatchJens Axboe2007-02-111-8/+8
| | | | | | | This improves performance considerably for sync requests when you have command queuing enabled. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: remove the implicit queue kicking in slice expireJens Axboe2007-02-111-6/+6
| | | | | | | We only really need it for a process going away, so move it to those locations. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: check whether a queue timed out in accountingJens Axboe2007-02-111-14/+18
| | | | | | Makes it more fair for the residual slice count. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: tweak the FIFO checkingJens Axboe2007-02-111-3/+4
| | | | | | | | We currently check the FIFO once per slice. Optimize that a bit and only do it as the first thing for a new slice, so we don't end up doing a single request and then seek to the FIFO requests. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: don't pass in queue for cfq_arm_slice_timer()Jens Axboe2007-02-111-5/+4
| | | | | | | It must always be the active queue, otherwise it's a bug. So just use the active_queue, don't pass it in explicitly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: account for slice over/under timeJens Axboe2007-02-111-20/+12
| | | | | | | | If a slice uses less than it is entitled to (or perhaps more), include that in the decision on how much time to give it the next time it gets serviced. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: defer slice activation to first request being activeJens Axboe2007-02-111-38/+53
| | | | | | | This better matches what time the queue is actually spending doing IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: use last service point as the fairness criteriaJens Axboe2007-02-111-14/+34
| | | | | | | | Right now we use slice_start, which gives async queues an unfair advantage. Chance that to service_last, and base the resorter on that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: document the cfqq flagsJens Axboe2007-02-111-9/+9
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: move on_rr check into cfq_resort_rr_list()Jens Axboe2007-02-111-10/+9
| | | | | | | Move the on_rr check into cfq_resort_rr_list(), every call site needs to check it anyway. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: remove cfq_io_context last_queueJens Axboe2007-02-111-17/+2
| | | | | | | It hasn't been used for a while, kill it off and remove the old if 0 code chunk. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: merging problemJens Axboe2007-01-021-3/+3
| | | | | | | | | | | | | | | | | | | Two issues: - The final return 1 should be a return 0, otherwise comparing cfqq is a noop. - bio_sync() only checks the sync flag, while rq_is_sync() checks both for READ and sync. The latter is what we want. Expand the bio check to include reads, and relax the restriction to allow merging of async io into sync requests. In the future we want to clean up the SYNC logic, right now it means both sync request (such as READ and O_DIRECT WRITE) and unplug-on-issue. Leave that for later. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cfq-iosched: tighten allow merge criteriaJens Axboe2006-12-221-13/+8
| | | | | | | | | | | | The logic in cfq_allow_merge() wasn't clear enough - basically allow merging for the same queues only. Do a fast check for 'rq and bio both sync/async' before doing the cfqq hash lookup. This is verified to work with the fixed elv_try_merge() from commit bb4067e34159648d394943d5e2a011f838bff22f. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cfq-iosched: don't allow sync merges across queuesJens Axboe2006-12-201-0/+33
| | | | | | | | | | | | Currently we allow any merge, even if the io originates from different processes. This can cause really bad starvation and unfairness, if those ios happen to be synchronous (reads or direct writes). So add a allow_merge hook to the io scheduler ops, so an io scheduler can help decide whether a bio/process combination may be merged with an existing request. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] Propagate down request sync flagJens Axboe2006-12-131-6/+12
| | | | | | | | We need to do this, otherwise the io schedulers don't get access to the sync flag. Then they cannot tell the difference between a regular write and an O_DIRECT write, which can cause a performance loss. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] slab: remove kmem_cache_tChristoph Lameter2006-12-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'master' of ↵David Howells2006-12-051-5/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>
| * [BLOCK] Cleanup unused variable passingJens Axboe2006-12-011-5/+4
| | | | | | | | | | | | | | | | - ->init_queue() does not need the elevator passed in - ->put_request() is a hot path and need not have the queue passed in - cfq_update_io_seektime() does not need cfqd passed in Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | WorkStruct: Pass the work_struct pointer instead of context dataDavid Howells2006-11-221-3/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is *not* cleared before jumping to the work function, then the work function *may* access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>
* [PATCH] CFQ: request <-> request merging rr_list fixupJens Axboe2006-10-311-3/+3
| | | | | | | | | | | | | | In very rare circumstances would we be pruning a merged request and at the same time delete the implicated cfqq from the rr_list, and not readd it when the merged request got added. This could cause io stalls until that process issued io again. Fix it up by putting the rr_list add handling into cfq_add_rq_rb(), identical to how pruning is handled in cfq_del_rq_rb(). This fixes a hang reproducible with fsx-linux. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] CFQ: bad locking in changed_ioprio()Jens Axboe2006-10-301-2/+3
| | | | | | | | | When the ioprio code recently got juggled a bit, a bug was introduced. changed_ioprio() is no longer called with interrupts disabled, so using plain spin_lock() on the queue_lock is a bug. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] CFQ: use irq safe locking in cfq_cic_link()Jens Axboe2006-10-301-2/+3
| | | | | | | | | | | | | | If cfq_set_request() is called for a new process AND a non-fs io request (so that __GFP_WAIT may not be set), cfq_cic_link() may use spin_lock_irq() and spin_unlock_irq() with interrupts already disabled. Fix is to always use irq safe locking in cfq_cic_link() Acked-By: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] completions: lockdep annotate on stack completionsPeter Zijlstra2006-10-011-1/+1
| | | | | | | | | | | All on stack DECLARE_COMPLETIONs should be replaced by: DECLARE_COMPLETION_ONSTACK Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>