summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* writeback: add more tracepointsTejun Heo2013-01-143-2/+132
| | | | | | | | | | | | | | | | | | | | | | | Add tracepoints for page dirtying, writeback_single_inode start, inode dirtying and writeback. For the latter two inode events, a pair of events are defined to denote start and end of the operations (the starting one has _start suffix and the one w/o suffix happens after the operation is complete). These inode ops are FS specific and can be non-trivial and having enclosing tracepoints is useful for external tracers. This is part of tracepoint additions to improve visiblity into dirtying / writeback operations for io tracer and userland. v2: writeback_dirty_inode[_start] TPs may be called for files on pseudo FSes w/ unregistered bdi. Check whether bdi->dev is %NULL before dereferencing. v3: buffer dirtying moved to a block TP. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add block_{touch|dirty}_buffer tracepointTejun Heo2013-01-142-0/+55
| | | | | | | | | | | | | | | The former is triggered from touch_buffer() and the latter mark_buffer_dirty(). This is part of tracepoint additions to improve visiblity into dirtying / writeback operations for io tracer and userland. v2: Transformed writeback_dirty_buffer to block_dirty_buffer and made it share TP definition with block_touch_buffer. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* buffer: make touch_buffer() an exported functionTejun Heo2013-01-142-1/+7
| | | | | | | | | | | | | | We want to add a trace point to touch_buffer() but macros and inline functions defined in header files can't have tracing points. Move touch_buffer() to fs/buffer.c and make it a proper function. The new exported function is also declared inline. As most uses of touch_buffer() are inside buffer.c with nilfs2 as the only other user, the effect of this change should be negligible. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add @req to bio_{front|back}_merge tracepointsTejun Heo2013-01-143-13/+38
| | | | | | | | | | | | | | bio_{front|back}_merge tracepoints report a bio merging into an existing request but didn't specify which request the bio is being merged into. Add @req to it. This makes it impossible to share the event template with block_bio_queue - split it out. @req isn't used or exported to userland at this point and there is no userland visible behavior change. Later changes will make use of the extra parameter. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add missing block_bio_complete() tracepointTejun Heo2013-01-147-19/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bio completion didn't kick block_bio_complete TP. Only dm was explicitly triggering the TP on IO completion. This makes block_bio_complete TP useless for tracers which want to know about bios, and all other bio based drivers skip generating blktrace completion events. This patch makes all bio completions via bio_endio() generate block_bio_complete TP. * Explicit trace_block_bio_complete() invocation removed from dm and the trace point is unexported. * @rq dropped from trace_block_bio_complete(). bios may fly around w/o queue associated. Verifying and accessing the assocaited queue belongs to TP probes. * blktrace now gets both request and bio completions. Make it ignore bio completions if request completion path is happening. This makes all bio based drivers generate blktrace completion events properly and makes the block_bio_complete TP actually useful. v2: With this change, block_bio_complete TP could be invoked on sg commands which have bio's with %NULL bi_bdev. Update TP assignment code to check whether bio->bi_bdev is %NULL before dereferencing. Signed-off-by: Tejun Heo <tj@kernel.org> Original-patch-by: Namhyung Kim <namhyung@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'blkcg-cfq-hierarchy' of ↵Jens Axboe2013-01-117-174/+902
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-3.9/core Tejun writes: Hello, Jens. Please consider pulling from the following branch to receive cfq blkcg hierarchy support. The branch is based on top of v3.8-rc2. git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git blkcg-cfq-hierarchy The patchset was reviewd in the following thread. http://thread.gmane.org/gmane.linux.kernel.cgroups/5571
| * cfq-iosched: add hierarchical cfq_group statisticsTejun Heo2013-01-091-0/+105
| | | | | | | | | | | | | | | | | | | | Unfortunately, at this point, there's no way to make the existing statistics hierarchical without creating nasty surprises for the existing users. Just create recursive counterpart of the existing stats. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * cfq-iosched: collect stats from dead cfqgsTejun Heo2013-01-091-1/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support hierarchical stats, it's necessary to remember stats from dead children. Add cfqg->dead_stats and make a dying cfqg transfer its stats to the parent's dead-stats. The transfer happens form ->pd_offline_fn() and it is possible that there are some residual IOs completing afterwards. Currently, we lose these stats. Given that cgroup removal isn't a very high frequency operation and the amount of residual IOs on offline are likely to be nil or small, this shouldn't be a big deal and the complexity needed to handle residual IOs - another callback and rather elaborate synchronization to reach and lock the matching q - doesn't seem justified. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()Tejun Heo2013-01-091-4/+9
| | | | | | | | | | | | | | | | | | | | Separate out cfqg_stats_reset() which takes struct cfqg_stats * from cfq_pd_reset_stats() and move the latter to where other pd methods are defined. cfqg_stats_reset() will be used to implement hierarchical stats. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lockTejun Heo2013-01-091-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | Instead of holding blkcg->lock while walking ->blkg_list and executing prfill(), RCU walk ->blkg_list and hold the blkg's queue lock while executing prfill(). This makes prfill() implementations easier as stats are mostly protected by queue lock. This will be used to implement hierarchical stats. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * block: RCU free request_queueTejun Heo2013-01-092-1/+10
| | | | | | | | | | | | | | | | RCU free request_queue so that blkcg_gq->q can be dereferenced under RCU lock. This will be used to implement hierarchical stats. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()Tejun Heo2013-01-092-0/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge(). The former two collect the [rw]stats designated by the target policy data and offset from the pd's subtree. The latter two add one [rw]stat to another. Note that the recursive sum functions require the queue lock to be held on entry to make blkg online test reliable. This is necessary to properly handle stats of a dying blkg. These will be used to implement hierarchical stats. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * blkcg: export __blkg_prfill_rwstat()Tejun Heo2013-01-091-0/+1
| | | | | | | | | | | | | | | | Hierarchical stats for cfq-iosched will need __blkg_prfill_rwstat(). Export it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
| * blkcg: s/blkg_rwstat_sum()/blkg_rwstat_total()/Tejun Heo2013-01-092-4/+4
| | | | | | | | | | | | | | | | Rename blkg_rwstat_sum() to blkg_rwstat_total(). sum will be used for summing up stats from multiple blkgs. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * blkcg: implement blkcg_policy->on/offline_pd_fn() and blkcg_gq->onlineTejun Heo2013-01-092-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add two blkcg_policy methods, ->online_pd_fn() and ->offline_pd_fn(), which are invoked as the policy_data gets activated and deactivated while holding both blkcg and q locks. Also, add blkcg_gq->online bool, which is set and cleared as the blkcg_gq gets activated and deactivated. This flag also is toggled while holding both blkcg and q locks. These will be used to implement hierarchical stats. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * blkcg: add blkg_policy_data->plidTejun Heo2013-01-092-1/+4
| | | | | | | | | | | | | | | | Add pd->plid so that the policy a pd belongs to can be identified easily. This will be used to implement hierarchical blkg_[rw]stats. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * cfq-iosched: enable full blkcg hierarchy supportTejun Heo2013-01-093-26/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the previous two patches, all cfqg scheduling decisions are based on vfraction and ready for hierarchy support. The only thing which keeps the behavior flat is cfqg_flat_parent() which makes vfraction calculation consider all non-root cfqgs children of the root cfqg. Replace it with cfqg_parent() which returns the real parent. This enables full blkcg hierarchy support for cfq-iosched. For example, consider the following hierarchy. root / \ A:500 B:250 / \ AA:500 AB:1000 For simplicity, let's say all the leaf nodes have active tasks and are on service tree. For each leaf node, vfraction would be AA: (500 / 1500) * (500 / 750) =~ 0.2222 AB: (1000 / 1500) * (500 / 750) =~ 0.4444 B: (250 / 750) =~ 0.3333 and vdisktime will be distributed accordingly. For more detail, please refer to Documentation/block/cfq-iosched.txt. v2: cfq-iosched.txt updated to describe group scheduling as suggested by Vivek. v3: blkio-controller.txt updated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * cfq-iosched: convert cfq_group_slice() to use cfqg->vfractionTejun Heo2013-01-091-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cfq_group_slice() calculates slice by taking a fraction of cfq_target_latency according to the ratio of cfqg->weight against service_tree->total_weight. This currently works only because all cfqgs are treated to be at the same level. To prepare for proper hierarchy support, convert cfq_group_slice() to base the calculation on cfqg->vfraction. As cfqg->vfraction is always a fraction of 1 and represents the fraction allocated to the cfqg with hierarchy considered, the slice can be simply calculated by multiplying cfqg->vfraction to cfq_target_latency (with fixed point shift factored in). As vfraction calculation currently treats all non-root cfqgs as children of the root cfqg, this patch doesn't introduce noticeable behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * cfq-iosched: implement hierarchy-ready cfq_group charge scalingTejun Heo2013-01-091-30/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, cfqg charges are scaled directly according to cfqg->weight. Regardless of the number of active cfqgs or the amount of active weights, a given weight value always scales charge the same way. This works fine as long as all cfqgs are treated equally regardless of their positions in the hierarchy, which is what cfq currently implements. It can't work in hierarchical settings because the interpretation of a given weight value depends on where the weight is located in the hierarchy. This patch reimplements cfqg charge scaling so that it can be used to support hierarchy properly. The scheme is fairly simple and light-weight. * When a cfqg is added to the service tree, v(disktime)weight is calculated. It walks up the tree to root calculating the fraction it has in the hierarchy. At each level, the fraction can be calculated as cfqg->weight / parent->level_weight By compounding these, the global fraction of vdisktime the cfqg has claim to - vfraction - can be determined. * When the cfqg needs to be charged, the charge is scaled inversely proportionally to the vfraction. The new scaling scheme uses the same CFQ_SERVICE_SHIFT for fixed point representation as before; however, the smallest scaling factor is now 1 (ie. 1 << CFQ_SERVICE_SHIFT). This is different from before where 1 was for CFQ_WEIGHT_DEFAULT and higher weight would result in smaller scaling factor. While this shifts the global scale of vdisktime a bit, it doesn't change the relative relationships among cfqgs and the scheduling result isn't different. cfq_group_notify_queue_add uses fixed CFQ_IDLE_DELAY when appending new cfqg to the service tree. The specific value of CFQ_IDLE_DELAY didn't have any relevance to vdisktime before and is unlikely to cause any visible behavior difference now especially as the scale shift isn't that large. As the new scheme now makes proper distinction between cfqg->weight and ->leaf_weight, reverse the weight aliasing for root cfqgs. For root, both weights are now mapped to ->leaf_weight instead of the other way around. Because we're still using cfqg_flat_parent(), this patch shouldn't change the scheduling behavior in any noticeable way. v2: Beefed up comments on vfraction as requested by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * cfq-iosched: implement cfq_group->nr_active and ->children_weightTejun Heo2013-01-091-0/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To prepare for blkcg hierarchy support, add cfqg->nr_active and ->children_weight. cfqg->nr_active counts the number of active cfqgs at the cfqg's level and ->children_weight is sum of weights of those cfqgs. The level covers itself (cfqg->leaf_weight) and immediate children. The two values are updated when a cfqg enters and leaves the group service tree. Unless the hierarchy is very deep, the added overhead should be negligible. Currently, the parent is determined using cfqg_flat_parent() which makes the root cfqg the parent of all other cfqgs. This is to make the transition to hierarchy-aware scheduling gradual. Scheduling logic will be converted to use cfqg->children_weight without actually changing the behavior. When everything is ready, blkcg_weight_parent() will be replaced with proper parent function. This patch doesn't introduce any behavior chagne. v2: s/cfqg->level_weight/cfqg->children_weight/ as per Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * cfq-iosched: add leaf_weightTejun Heo2013-01-093-9/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cfq blkcg is about to grow proper hierarchy handling, where a child blkg's weight would nest inside the parent's. This makes tasks in a blkg to compete against both tasks in the sibling blkgs and the tasks of child blkgs. We're gonna use the existing weight as the group weight which decides the blkg's weight against its siblings. This patch introduces a new weight - leaf_weight - which decides the weight of a blkg against the child blkgs. It's named leaf_weight because another way to look at it is that each internal blkg nodes have a hidden child leaf node which contains all its tasks and leaf_weight is the weight of the leaf node and handled the same as the weight of the child blkgs. This patch only adds leaf_weight fields and exposes it to userland. The new weight isn't actually used anywhere yet. Note that cfq-iosched currently offcially supports only single level hierarchy and root blkgs compete with the first level blkgs - ie. root weight is basically being used as leaf_weight. For root blkgs, the two weights are kept in sync for backward compatibility. v2: cfqd->root_group->leaf_weight initialization was missing from cfq_init_queue() causing divide by zero when !CONFIG_CFQ_GROUP_SCHED. Fix it. Reported by Fengguang. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com>
| * blkcg: make blkcg_gq's hierarchicalTejun Heo2013-01-092-5/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently a child blkg (blkcg_gq) can be created even if its parent doesn't exist. ie. Given a blkg, it's not guaranteed that its ancestors will exist. This makes it difficult to implement proper hierarchy support for blkcg policies. Always create blkgs recursively and make a child blkg hold a reference to its parent. blkg->parent is added so that finding the parent is easy. blkcg_parent() is also added in the process. This change can be visible to userland. e.g. while issuing IO in a nested cgroup didn't affect the ancestors at all, now it will initialize all ancestor blkgs and zero stats for the request_queue will always appear on them. While this is userland visible, this shouldn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * blkcg: cosmetic updates to blkg_create()Tejun Heo2013-01-091-8/+7
| | | | | | | | | | | | | | | | | | | | | | * Rename out_* labels to err_*. * Do ERR_PTR() conversion once in the error return path. This patch is cosmetic and to prepare for the hierarchy support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * blkcg: reorganize blkg_lookup_create() and friendsTejun Heo2013-01-091-23/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reorganize such that * __blkg_lookup() takes bool param @update_hint to determine whether to update hint. * __blkg_lookup_create() no longer performs lookup before trying to create. Renamed to blkg_create(). * blkg_lookup_create() now performs lookup and then invokes blkg_create() if lookup fails. * root_blkg creation in blkcg_activate_policy() updated accordingly. Note that blkcg_activate_policy() no longer updates lookup hint if root_blkg already exists. Except for the last lookup hint bit which is immaterial, this is pure reorganization and doesn't introduce any visible behavior change. This is to prepare for proper hierarchy support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * blkcg: fix minor bug in blkg_alloc()Tejun Heo2013-01-091-1/+1
| | | | | | | | | | | | | | | | | | | | blkg_alloc() was mistakenly checking blkcg_policy_enabled() twice. The latter test should have been on whether pol->pd_init_fn() exists. This doesn't cause actual problems because both blkcg policies implement pol->pd_init_fn(). Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
| * cfq-iosched: Print sync-noidle information in blktrace messagesVivek Goyal2013-01-091-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we attach a character "S" or "A" to the cfqq<pid>, to represent whether queues is sync or async. Add one more character "N" to represent whether it is sync-noidle queue or sync queue. So now three different type of queues will look as follows. cfq1234S --> sync queus cfq1234SN --> sync noidle queue cfq1234A --> Async queue Previously S/A classification was being printed only if group scheduling was enabled. This patch also makes sure that this classification is displayed even if group idling is disabled. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * cfq-iosched: Get rid of unnecessary local variableVivek Goyal2013-01-091-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | Use of local varibale "n" seems to be unnecessary. Remove it. This brings it inline with function __cfq_group_st_add(), which is also doing the similar operation of adding a group to a rb tree. No functionality change here. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * cfq-iosched: Rename few functions related to selecting workloadVivek Goyal2013-01-091-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | choose_service_tree() selects/sets both wl_class and wl_type. Rename it to choose_wl_class_and_type() to make it very clear. cfq_choose_wl() only selects and sets wl_type. It is easy to confuse it with choose_st(). So rename it to cfq_choose_wl_type() to make it clear what does it do. Just renaming. No functionality change. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * cfq-iosched: Rename "service_tree" to "st" at some placesVivek Goyal2013-01-091-41/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | At quite a few places we use the keyword "service_tree". At some places, especially local variables, I have abbreviated it to "st". Also at couple of places moved binary operator "+" from beginning of line to end of previous line, as per Tejun's feedback. v2: Reverted most of the service tree name change based on Jeff Moyer's feedback. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * cfq-iosched: More renaming to better represent wl_class and wl_typeVivek Goyal2013-01-091-31/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some more renaming. Again making the code uniform w.r.t use of wl_class/class to represent IO class (RT, BE, IDLE) and using wl_type/type to represent subclass (SYNC, SYNC-IDLE, ASYNC). At places this patch shortens the string "workload" to "wl". Renamed "saved_workload" to "saved_wl_type". Renamed "saved_serving_class" to "saved_wl_class". For uniformity with "saved_wl_*" variables, renamed "serving_class" to "serving_wl_class" and renamed "serving_type" to "serving_wl_type". Again, just trying to improve upon code uniformity and improve readability. No functional change. v2: - Restored the usage of keyword "service" based on Jeff Moyer's feedback. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * cfq-iosched: Properly name all references to IO classVivek Goyal2013-01-091-33/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently CFQ has three IO classes, RT, BE and IDLE. At many a places we are calling workloads belonging to these classes as "prio". This gets very confusing as one starts to associate it with ioprio. So this patch just does bunch of renaming so that reading code becomes easier. All reference to RT, BE and IDLE workload are done using keyword "class" and all references to subclass, SYNC, SYNC-IDLE, ASYNC are made using keyword "type". This makes me feel much better while I am reading the code. There is no functionality change due to this patch. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
* | block: Remove should_sort judgement when flush blk_plugJianpeng Ma2013-01-112-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 975927b942c932,it add blk_rq_pos to sort rq when flushing. Although this commit was used for the situation which blk_plug handled multi devices on the same time like md device. I think there must be some situations like this but only single device. So remove the should_sort judgement. Because the parameter should_sort is only for this purpose,it can delete should_sort from blk_plug. CC: Shaohua Li <shli@kernel.org> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block,elevator: use new hashtable implementationSasha Levin2013-01-113-21/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch elevator to use the new hashtable implementation. This reduces the amount of generic unrelated code in the elevator. This also removes the dymanic allocation of the hash table. The size of the table is constant so there's no point in paying the price of an extra dereference when accessing it. This patch depends on d9b482c ("hashtable: introduce a small and naive hashtable") which was merged in v3.6. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Linux 3.8-rc3v3.8-rc3Linus Torvalds2013-01-101-1/+1
| |
* | Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds2013-01-097-22/+40
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ARM fixes from Russell King. * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: ARM: 7616/1: cache-l2x0: aurora: Use writel_relaxed instead of writel ARM: 7615/1: cache-l2x0: aurora: Invalidate during clean operation with WT enable ARM: 7614/1: mm: fix wrong branch from Cortex-A9 to PJ4b ARM: 7612/1: imx: Do not select some errata that depends on !ARCH_MULTIPLATFORM ARM: 7611/1: VIC: fix bug in VIC irqdomain code ARM: 7610/1: versatile: bump IRQ numbers ARM: 7609/1: disable errata work-arounds which access secure registers ARM: 7608/1: l2x0: Only set .set_debug on PL310 r3p0 and earlier
| * | ARM: 7616/1: cache-l2x0: aurora: Use writel_relaxed instead of writelGregory CLEMENT2013-01-071-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of writel instead of writel_relaxed lead to deadlock in some situation (SMP on Armada 370 for instance). The use of writel_relaxed as it was done in the rest of this driver fixes this bug. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 7615/1: cache-l2x0: aurora: Invalidate during clean operation with WT ↵Gregory CLEMENT2013-01-071-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enable This patch fixes a bug for Aurora L2 cache controller when the write-through mode is enable. For the clean operation even if we don't have to flush the lines we still need to invalidate them. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 7614/1: mm: fix wrong branch from Cortex-A9 to PJ4bHaojian Zhuang2013-01-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CONFIG_ARCH_MULTIPLATFORM & CONFIG_ARCH_MVEBU are both enabled, __v7_pj4b_setup is added between __v7_ca9mp_setup and __v7_setup. But there's no jump instruction added. If the chip is Cortex A5/A9, it goes through __v7_pj4b_setup also. It results in system hang. Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 7612/1: imx: Do not select some errata that depends on !ARCH_MULTIPLATFORMFabio Estevam2013-01-061-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 62e4d357a (ARM: 7609/1: disable errata work-arounds which access secure registers) ARM_ERRATA_743622/751472 depends on !ARCH_MULTIPLATFORM. Since imx has been converted to multiplatform, the following warning happens: $ make imx_v6_v7_defconfig warning: (SOC_IMX6Q && ARCH_TEGRA_2x_SOC && ARCH_TEGRA_3x_SOC) selects ARM_ERRATA_751472 which has unmet direct dependencies (CPU_V7 && !ARCH_MULTIPLATFORM) warning: (SOC_IMX6Q && ARCH_TEGRA_3x_SOC) selects ARM_ERRATA_743622 which has unmet direct dependencies (CPU_V7 && !ARCH_MULTIPLATFORM) warning: (SOC_IMX6Q && ARCH_TEGRA_3x_SOC) selects ARM_ERRATA_743622 which has unmet direct dependencies (CPU_V7 && !ARCH_MULTIPLATFORM) warning: (SOC_IMX6Q && ARCH_TEGRA_2x_SOC && ARCH_TEGRA_3x_SOC) selects ARM_ERRATA_751472 which has unmet direct dependencies (CPU_V7 && !ARCH_MULTIPLATFORM) Recommended approach is to remove ARM_ERRATA_743622/751472 from being selected by SOC_IMX6Q and apply such workarounds into the bootloader. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 7611/1: VIC: fix bug in VIC irqdomain codeLinus Walleij2013-01-021-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VIC irqdomain code added in commit 07c9249f1fa90cc8189bed44c0bcece664596a72 "ARM: 7554/1: VIC: use irq_domain_add_simple()" Had two bugs: 1) It didn't call irq_create_mapping() once on each valid irq source in the slowpath when registering the controller. 2) It passed a -1 as IRQ offset for the DT case, whereas 0 should be passed as invalid IRQ instead. Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 7610/1: versatile: bump IRQ numbersLinus Walleij2013-01-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Versatile starts to register Linux IRQ numbers from offset 0 which is illegal, since this is NO_IRQ. Bump all hard-coded IRQs by 32 to get rid of the problem. Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 7609/1: disable errata work-arounds which access secure registersRob Herring2013-01-023-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to support secure and non-secure platforms in multi-platform kernels, errata work-arounds that access secure only registers need to be disabled. Make all the errata options that fit in this category depend on !CONFIG_ARCH_MULTIPLATFORM. This will effectively remove the errata options as platforms are converted over to multi-platform. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 7608/1: l2x0: Only set .set_debug on PL310 r3p0 and earlierRob Herring2013-01-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PL310 errata work-arounds using .set_debug function are only needed on r3p0 and earlier, so check the rev and only set .set_debug on older revs. Avoiding debug register accesses fixes aborts on non-secure platforms like highbank. It is assumed that non-secure platforms needing these work-arounds have already implemented .set_debug with secure monitor calls. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* | | Merge tag 'edac_fixes_for_3.8' of ↵Linus Torvalds2013-01-092-18/+9
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC fixes from Borislav Petkov: "Two error path fixes causing a crash and a Kconfig fix for an issue which spilled all EDAC suboptions into the 'Device Drivers' menu." * tag 'edac_fixes_for_3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Cleanup device deregistering path EDAC: Fix EDAC Kconfig menu EDAC: Fix kernel panic on module unloading
| * | | EDAC: Cleanup device deregistering pathLans Zhang2013-01-071-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use device_unregister to replace put_device + device_del for cleanup, and fix the potential use after free. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Signed-off-by: Borislav Petkov <bp@alien8.de>
| * | | EDAC: Fix EDAC Kconfig menuBorislav Petkov2013-01-071-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After f65aad41772f("MIPS: Cavium: Add EDAC support."), when entering the "Device Drivers" toplevel menu in menuconfig, the suboptions behind EDAC appeared merged with the rest of the device drivers types. This was because the menuconfig option EDAC is querying an EDAC_SUPPORT Kconfig bool which was defined after the menu definition. When pushing EDAC_SUPPORT up, before the menu definition, the variable is defined earlier and the above menuconfig artifact doesn't happen. Drop a useless menuconfig comment while at it. Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Borislav Petkov <bp@alien8.de>
| * | | EDAC: Fix kernel panic on module unloadingKonstantin Khlebnikov2013-01-071-2/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes use-after-free and double-free bugs in edac_mc_sysfs_exit(). mci_pdev has single reference and put_device() calls mc_attr_release() which calls kfree(). The following device_del() works with already released memory. An another kfree() in edac_mc_sysfs_exit() releses the same memory again. Great. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: stable@vger.kernel.org # 3.[67] Cc: Denis Kirjanov <kirjanov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg Signed-off-by: Borislav Petkov <bp@alien8.de>
* | | mm: reinstante dropped pmd_trans_splitting() checkLinus Torvalds2013-01-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check for a pmd being in the process of being split was dropped by mistake by commit d10e63f29488 ("mm: numa: Create basic numa page hinting infrastructure"). Put it back. Reported-by: Dave Jones <davej@redhat.com> Debugged-by: Hillf Danton <dhillf@gmail.com> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Kirill Shutemov <kirill@shutemov.name> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | cred: Remove tgcred pointer from struct credMarc Dionne2013-01-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3a50597de863 ("KEYS: Make the session and process keyrings per-thread") removed the definition of the thread_group_cred structure, but left a now unused pointer in struct cred. Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds2013-01-0953-183/+351
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ARM SoC fixes from Olof Johansson: "People are back from the holiday breaks, and it shows. Here are a bunch of fixes for a number of platforms: - A couple of small fixes for Nomadik - A larger set of changes for kirkwood/mvebu - uart driver selection, dt clocks, gpio-poweroff fixups, a few __init annotation fixes and some error handling improvement in their xor dma driver. - i.MX had a couple of minor fixes (and a critical one for flexcan2 clock setup) - MXS has a small board fix and a framebuffer bugfix - A set of fixes for Samsung Exynos, fixing default bootargs and some Exynos5440 clock issues - A set of OMAP changes including PM fixes and a few sparse warning fixups All in all a bit more positive code delta than we'd ideally want to see here, mostly from the OMAP PM changes, but nothing overly crazy." * tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits) ARM: clps711x: Fix bad merge of clockevents setup ARM: highbank: save and restore L2 cache and GIC on suspend ARM: highbank: add a power request clear ARM: highbank: fix secondary boot and hotplug ARM: highbank: fix typos with hignbank in power request functions ARM: dts: fix highbank cpu mpidr values ARM: dts: add device_type prop to cpu nodes on Calxeda platforms ARM: mx5: Fix MX53 flexcan2 clock ARM: OMAP2+: am33xx-hwmod: Fix wrongly terminated am33xx_usbss_mpu_irqs array pinctrl: mvebu: make pdma clock on dove mandatory ARM: Dove: Add pinctrl clock to DT dma: mv_xor: fix error handling for clocks dma: mv_xor: fix error handling of mv_xor_channel_add() arm: mvebu: Add missing ; for cpu node. arm: mvebu: Armada XP MV78230 has only three Ethernet interfaces arm: mvebu: Armada XP MV78230 has two cores, not one clk: mvebu: Remove inappropriate __init tagging ARM: Kirkwood: Use fixed-regulator instead of board gpio call ARM: Kirkwood: Fix missing sdio clock ARM: Kirkwood: Switch TWSI1 of 88f6282 to DT clock providers ...