summaryrefslogtreecommitdiffstats
path: root/block/cfq-iosched.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* block: Add flag for telling the IO schedulers NOT to anticipate more IOJens Axboe2009-04-061-1/+3
| | | | | | | | | | | | | | | By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: Allow RT requests to pre-empt ongoing BE timesliceDivyesh Shah2009-01-301-1/+38
| | | | | | | | | | | | | | | | | | This patch adds the ability to pre-empt an ongoing BE timeslice when a RT request is waiting for the current timeslice to complete. This reduces the wait time to disk for RT requests from an upper bound of 4 (current value of cfq_quantum) to 1 disk request. Applied Jens' suggeested changes to avoid the rb lookup and use !cfq_class_rt() and retested. Latency(secs) for the RT task when doing sequential reads from 10G file. | only RT | RT + BE | RT + BE + this patch small (512 byte) reads | 143 | 163 | 145 large (1Mb) reads | 142 | 158 | 146 Signed-off-by: Divyesh Shah <dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: fix race between exiting queue and exiting taskJens Axboe2008-12-291-1/+9
| | | | | | | | | | | | | | | | | | Original patch from Nikanth Karthikesan <knikanth@suse.de> When a queue exits the queue lock is taken and cfq_exit_queue() would free all the cic's associated with the queue. But when a task exits, cfq_exit_io_context() gets cic one by one and then locks the associated queue to call __cfq_exit_single_io_context. It looks like between getting a cic from the ioc and locking the queue, the queue might have exited on another cpu. Fix this by rechecking the cfq_io_context queue key inside the queue lock again, and not calling into __cfq_exit_single_io_context() if somebody beat us to it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: remove limit of dispatch depth of max 4 times quantumJens Axboe2008-12-291-6/+2
| | | | | | | | | | This basically limits the hardware queue depth to 4*quantum at any point in time, which is 16 with the default settings. As CFQ uses other means to shrink the hardware queue when necessary in the first place, there's really no need for this extra heuristic. Additionally, it ends up hurting performance in some cases. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: get rid of elevator_t typedefJens Axboe2008-12-291-3/+3
| | | | | | Just use struct elevator_queue everywhere instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: use cancel_work_sync() instead of kblockd_flush_work()Cheng Renquan2008-12-291-1/+1
| | | | | | | | | | | | After many improvements on kblockd_flush_work, it is now identical to cancel_work_sync, so a direct call to cancel_work_sync is suggested. The only difference is that cancel_work_sync is a GPL symbol, so no non-GPL modules anymore. Signed-off-by: Cheng Renquan <crquan@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: as/cfq ssd idle check updateJens Axboe2008-10-091-2/+4
| | | | | | | | | We really need to know about the hardware tagging support as well, since if the SSD does not do tagging then we still want to idle. Otherwise have the same dependent sync IO vs flooding async IO problem as on rotational media. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: add queue flag for SSD/non-rotational devicesJens Axboe2008-10-091-0/+6
| | | | | | | | | We don't want to idle in AS/CFQ if the device doesn't have a seek penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational device, low level drivers should set this flag upon discovery of an SSD or similar device type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: fix queue depth detectionAaron Carroll2008-10-091-9/+38
| | | | | | | | | | | | | | CFQ's detection of queueing devices assumes a non-queuing device and detects if the queue depth reaches a certain threshold. Under some workloads (e.g. synchronous reads), CFQ effectively forces a unit queue depth, thus defeating the detection logic. This leads to poor performance on queuing hardware, since the idle window remains enabled. This patch inverts the sense of the logic: assume a queuing-capable device, and detect if the depth does not exceed the threshold. Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: make kblockd_schedule_work() take the queue as parameterJens Axboe2008-10-091-1/+1
| | | | | | Preparatory patch for checking queuing affinity. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: get rid of enable_idle being unused warningJens Axboe2008-07-031-1/+1
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: add message logging through blktraceJens Axboe2008-07-031-10/+55
| | | | | | | Now that blktrace has the ability to carry arbitrary messages in its stream, use that for some CFQ logging. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: properly protect ioc_gone and ioc countJens Axboe2008-07-031-3/+15
| | | | | | | | | | | If we have multiple tasks freeing cfq_io_contexts when cfq-iosched is being unloaded, we could complete() ioc_gone twice. Fix that by protecting ioc_gone complete() and clearing with a spinlock for just that purpose. Doesn't matter from a performance perspective, since it'll only enter that path when ioc_gone != NULL (when cfq-iosched is being rmmod'ed). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: fix RCU problem in cfq_cic_lookup()Jens Axboe2008-05-281-2/+26
| | | | | | | | | | | | | cfq_cic_lookup() needs to properly protect ioc->ioc_data before dereferencing it and also exclude updaters of ioc->ioc_data as well. Also add a number of comments documenting why the existing RCU usage is OK. Thanks a lot to "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> for review and comments! Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: reorder cfq_queue to save space on 64bit buildsRichard Kennedy2008-05-281-4/+4
| | | | | | | | saves 8 bytes of padding & increases objects/slab from 30 to 32 on my AMD64 config Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: make io priorities inherit CPU scheduling class as well as niceJens Axboe2008-05-071-2/+2
| | | | | | | | We currently set all processes to the best-effort scheduling class, regardless of what CPU scheduling class they belong to. Improve that so that we correctly track idle and rt scheduling classes as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: fix RCU race in the cfq io_context destructor handlingJens Axboe2008-05-071-6/+13
| | | | | | | | | | | | | | put_io_context() drops the RCU read lock before calling into cfq_dtor(), however we need to hold off freeing there before grabbing and dereferencing the first object on the list. So extend the rcu_read_lock() scope to cover the calling of cfq_dtor(), and optimize cfq_free_io_context() to use a new variant for call_for_each_cic() that assumes the RCU read lock is already held. Hit in the wild by Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: do not leak ioc_data across iosched switchesFabio Checconi2008-04-101-3/+6
| | | | | | | | | | | | | | | | | | | | | When switching scheduler from cfq, cfq_exit_queue() does not clear ioc->ioc_data, leaving a dangling pointer that can deceive the following lookups when the iosched is switched back to cfq. The pattern that can trigger that is the following: - elevator switch from cfq to something else; - module unloading, with elv_unregister() that calls cfq_free_io_context() on ioc freeing the cic (via the .trim op); - module gets reloaded and the elevator switches back to cfq; - reallocation of a cic at the same address as before (with a valid key). To fix it just assign NULL to ioc_data in __cfq_exit_single_io_context(), that is called from the regular exit path and from the elevator switching code. The only path that frees a cic and is not covered is the error handling one, but cic's freed in this way are never cached in ioc_data. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: fix rcu freeing of cfq io contextsFabio Checconi2008-04-021-30/+27
| | | | | | | | | | SLAB_DESTROY_BY_RCU is not a direct substitute for normal call_rcu() freeing, since it'll page freeing but NOT object freeing. So change cfq to do the freeing on its own. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: add hlist for browsing parallel to the radix treeJens Axboe2008-02-191-26/+12
| | | | | | | | | | | It's cumbersome to browse a radix tree from start to finish, especially since we modify keys when a process exits. So add a hlist for the single purpose of browsing over all known cfq_io_contexts, used for exit, io prio change, etc. This fixes http://bugzilla.kernel.org/show_bug.cgi?id=9948 Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: make checkpatch compliantJens Axboe2008-02-011-37/+46
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: kill some big inlinesJens Axboe2008-01-281-16/+11
| | | | | | Use of inlines were a bit over the top, trim them down a bit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: relax IOPRIO_CLASS_IDLE restrictionsJens Axboe2008-01-281-83/+34
| | | | | | | | | | | | | | | | | Currently you must be root to set idle io prio class on a process. This is due to the fact that the idle class is implemented as a true idle class, meaning that it will not make progress if someone else is requesting disk access. Unfortunately this means that it opens DOS opportunities by locking down file system resources, hence it is root only at the moment. This patch relaxes the idle class a little, by removing the truly idle part (which entals a grace period with associated timer). The modifications make the idle class as close to zero impact as can be done while still guarenteeing progress. This means we can relax the root only criteria as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: cfq: make the io contect sharing locklessJens Axboe2008-01-281-120/+147
| | | | | | | | | | | | | | | | | The io context sharing introduced a per-ioc spinlock, that would protect the cfq io context lookup. That is a regression from the original, since we never needed any locking there because the ioc/cic were process private. The cic lookup is changed from an rbtree construct to a radix tree, which we can then use RCU to make the reader side lockless. That is the performance critical path, modifying the radix tree is only done on process creation (when that process first does IO, actually) and on process exit (if that process has done IO). As it so happens, radix trees are also much faster for this type of lookup where the key is a pointer. It's a very sparse tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* io_context sharing - cfq changesNikanth Karthikesan2008-01-281-2/+18
| | | | | | changes in the cfq for io_context sharing Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* ioprio: move io priority from task_struct to io_contextJens Axboe2008-01-281-18/+16
| | | | | | | This is where it belongs and then it doesn't take up space for a process that doesn't do IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: let elv_register() return voidAdrian Bunk2007-12-181-6/+2
| | | | | | | | | elv_register() always returns 0, and there isn't anything it does where it should return an error (the only error condition is so grave that it's handled with a BUG_ON). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq_idle_class_timer: add paranoid checks for jiffies overflowOleg Nesterov2007-11-071-11/+17
| | | | | | | | | In theory, if the queue was idle long enough, cfq_idle_class_timer may have a false (and very long) timeout because jiffies can wrap into the past wrt ->last_end_request. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq: fix IOPRIO_CLASS_IDLE delaysOleg Nesterov2007-11-071-0/+1
| | | | | | | | | | | | | | | | | After the fresh boot: ionice -c3 -p $$ echo cfq >> /sys/block/XXX/queue/scheduler dd if=/dev/XXX of=/dev/null bs=512 count=1 Now dd hangs in D state and the queue is completely stalled for approximately INITIAL_JIFFIES + CFQ_IDLE_GRACE jiffies. This is because cfq_init_queue() forgets to initialize cfq_data->last_end_request. (I guess this patch is not complete, overflow is still possible) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq: fix IOPRIO_CLASS_IDLE accountingOleg Nesterov2007-11-071-2/+3
| | | | | | | | | | | Spotted by Nick <gentuu@gmail.com>, hopefully can explain the second trace in http://bugzilla.kernel.org/show_bug.cgi?id=9180. If ->async_idle_cfqq != NULL cfq_put_async_queues() puts it IOPRIO_BE_NR times in a loop. Fix this. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq_get_queue: fix possible NULL pointer accessOleg Nesterov2007-10-291-1/+4
| | | | | | | | | | | | cfq_get_queue()->cfq_find_alloc_queue() can fail, check the returned value. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Note that this isn't a bug at the moment, since the regular IO path does not call this path without __GFP_WAIT set. However, it could be a future bug, so I've applied it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq_exit_queue() should cancel cfq_data->unplug_workOleg Nesterov2007-10-291-1/+1
| | | | | | | | | | | | | Spotted by Nick <gentuu@gmail.com>, perhaps explains the first trace in http://bugzilla.kernel.org/show_bug.cgi?id=9180. cfq_exit_queue() should cancel cfqd->unplug_work before freeing cfqd. blk_sync_queue() seems unneeded, removed. Q: why cfq_exit_queue() calls cfq_shutdown_timer_wq() twice? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [BLOCK] Get rid of request_queue_t typedefJens Axboe2007-07-241-19/+20
| | | | | | | | | Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweet of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq: Write-only stuff in CFQ data structuresAlexey Dobriyan2007-07-201-11/+0
| | | | | | | | There are some leftover bits from the task cooperator patch, that was yanked out again. While it will get reintroduced, no point in having this write-only stuff in the tree. So yank it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq: async queue allocation per priorityVasily Tarasov2007-07-201-12/+44
| | | | | | | | | | | | If we have two processes with different ioprio_class, but the same ioprio_data, their async requests will fall into the same queue. I guess such behavior is not expected, because it's not right to put real-time requests and best-effort requests in the same queue. The attached patch fixes the problem by introducing additional *cfqq fields on cfqd, pointing to per-(class,priority) async queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Slab allocators: Replace explicit zeroing with __GFP_ZEROChristoph Lameter2007-07-171-9/+9
| | | | | | | | | | | | | kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing variant in the past. But with __GFP_ZERO it is possible now to do zeroing while allocating. Use __GFP_ZERO to remove the explicit clearing of memory via memset whereever we can. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: fix async queue behaviourJens Axboe2007-07-101-3/+36
| | | | | | | | | | | With the cfq_queue hash removal, we inadvertently got rid of the async queue sharing. This was not intentional, in fact CFQ purposely shares the async queue per priority level to get good merging for async writes. So put some logic in cfq_get_queue() to track the shared queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* KMEM_CACHE(): simplify slab cache creationChristoph Lameter2007-05-071-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: speedup cic rb lookupJens Axboe2007-04-301-2/+18
| | | | | | | We often lookup the same queue many times in succession, so cache the last looked up queue to avoid browsing the rbtree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: get rid of cfqq hashVasily Tarasov2007-04-301-100/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cfq hash is no more necessary. We always can get cfqq from io context. cfq_get_io_context_noalloc() function is introduced, because we don't want to allocate cic on merging and checking may_queue. In order to identify sync queue we've used hash key = CFQ_KEY_ASYNC. Since hash is eliminated we need to use other criterion: sync flag for queue is added. In all places where we dig in rb_tree we're in current context, so no additional locking is required. Advantages of this patch: no additional memory for hash, no seeking in hash, code is cleaner. But it is necessary now to seek cic in per-ioc rbtree, but it is faster: - most processes work only with few devices - most systems have only few block devices - it is a rb-tree Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Changes by me: - Merge into CFQ devel branch - Get rid of cfq_get_io_context_noalloc() - Fix various bugs with dereferencing cic->cfqq[] with offset other than 0 or 1. - Fix bug in cfqq setup, is_sync condition was reversed. - Fix bug where only bio_sync() is used, we need to check for a READ too Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: tighten queue request overlap conditionJens Axboe2007-04-301-1/+2
| | | | | | | For tagged devices, allow overlap of requests if the idle window isn't enabled on the current active queue. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: improve sync vs async workloadsJens Axboe2007-04-301-13/+18
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: never allow an async queue idlingJens Axboe2007-04-301-1/+6
| | | | | | | We don't enable it by default, don't let it get enabled during runtime. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: get rid of ->dispatch_sliceJens Axboe2007-04-301-5/+1
| | | | | | | We can track it fairly accurately locally, let the slice handling take care of the rest. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: don't pass unused preemption variable aroundJens Axboe2007-04-301-15/+13
| | | | | | We don't use it anymore in the slice expiry handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: get rid of ->cur_rr and ->cfq_listJens Axboe2007-04-301-55/+32
| | | | | | | | | It's only used for preemption now that the IDLE and RT queues also use the rbtree. If we pass an 'add_front' variable to cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion at the front of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: slice offset should take ioprio into accountJens Axboe2007-04-301-1/+2
| | | | | | Use the max_slice-cur_slice as the multipler for the insertion offset. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: style cleanups and commentsJens Axboe2007-04-301-16/+50
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: sort IDLE queues into the rbtreeJens Axboe2007-04-301-36/+31
| | | | | | | Same treatment as the RT conversion, just put the sorted idle branch at the end of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: sort RT queues into the rbtreeJens Axboe2007-04-301-15/+12
| | | | | | | | | Currently CFQ does a linked insert into the current list for RT queues. We can just factor the class into the rb insertion, and then we don't have to treat RT queues in a special way. It's faster, too. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>