From 77ea733884eb5520f22c36def1309fe2ab61633e Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Tue, 18 Aug 2015 14:55:24 -0700
Subject: blkcg: move io_service_bytes and io_serviced stats into blkcg_gq

Currently, both cfq-iosched and blk-throttle keep track of
io_service_bytes and io_serviced stats. While keeping track of them
separately may be useful during development, it doesn't make much
sense otherwise. Also, blk-throttle was counting bios as IOs while
cfq-iosched was counting requests, which is more confusing than
informative.

This patch adds ->stat_bytes and ->stat_ios to blkg (blkcg_gq),
removes the counterparts from cfq-iosched and blk-throttle and lets
them print from the common blkg counters. The common counters are
incremented during bio issue in blkcg_bio_issue_check().

The outputs are still filtered by whether the policy has
blkg_policy_data on a given blkg, so cfq's output won't show up if it
has never been used for a given blkg. The only times when the outputs
would differ significantly are when policies are attached on the fly
or elevators are switched back and forth. Those are quite exceptional
operations and I don't think they warrant keeping separate counters.

v3: Update blkio-controller.txt accordingly.

v2: Account IOs during bio issue instead of request completion so
    that bio-based drivers can be handled the same way.

Signed-off-by: Tejun Heo
Cc: Vivek Goyal
Signed-off-by: Jens Axboe
---
Documentation/cgroups/blkio-controller.txt | 24 ++++++------------------
1 file changed, 6 insertions(+), 18 deletions(-)
(limited to 'Documentation')

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 68b6a6a470b0..12686bec37b9 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -201,7 +201,7 @@ Proportional weight policy files
specifies the number of bytes. - blkio.io_serviced - - Number of IOs completed to/from the disk by the group.
These + - Number of IOs (bio) issued to the disk by the group. These are further divided by the type of operation - read or write, sync or async. First two fields specify the major and minor number of the device, third field specifies the operation type and the fourth field @@ -327,18 +327,11 @@ Note: If both BW and IOPS rules are specified for a device, then IO is subjected to both the constraints. - blkio.throttle.io_serviced - - Number of IOs (bio) completed to/from the disk by the group (as - seen by throttling policy). These are further divided by the type - of operation - read or write, sync or async. First two fields specify - the major and minor number of the device, third field specifies the - operation type and the fourth field specifies the number of IOs. - - blkio.io_serviced does accounting as seen by CFQ and counts are in - number of requests (struct request). On the other hand, - blkio.throttle.io_serviced counts number of IO in terms of number - of bios as seen by throttling policy. These bios can later be - merged by elevator and total number of requests completed can be - lesser. + - Number of IOs (bio) issued to the disk by the group. These + are further divided by the type of operation - read or write, sync + or async. First two fields specify the major and minor number of the + device, third field specifies the operation type and the fourth field + specifies the number of IOs. - blkio.throttle.io_service_bytes - Number of bytes transferred to/from the disk by the group. These @@ -347,11 +340,6 @@ Note: If both BW and IOPS rules are specified for a device, then IO is device, third field specifies the operation type and the fourth field specifies the number of bytes. - These numbers should roughly be same as blkio.io_service_bytes as - updated by CFQ. The difference between two is that - blkio.io_service_bytes will not be updated if CFQ is not operating - on request queue. 
- Common files among various policies ----------------------------------- - blkio.reset_stats
--
cgit v1.2.3

From 2ee867dcfa2eaef1063b686da55c35878b2da4a2 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Tue, 18 Aug 2015 14:55:34 -0700
Subject: blkcg: implement interface for the unified hierarchy

blkcg interface grew to be the biggest of all controllers and
unfortunately the most inconsistent too. The interface files are
inconsistent with a number of close duplicates. Some files have
recursive variants while others don't. There's a distinction between
normal and leaf weights which isn't intuitive and there are a lot of
stat knobs which don't make much sense outside of debugging and
expose too many implementation details to userland.

In the unified hierarchy, everything is always hierarchical and
internal nodes can't have tasks, rendering moot the two structural
issues twisting the current interface. The interface has to be
updated in a significant way anyway and this is a good chance to
revamp it as a whole.

This patch implements blkcg interface for the unified hierarchy.

* (from a previous patch) blkcg is identified by "io" instead of
  "blkio" on the unified hierarchy. Given that the whole interface
  is updated anyway, the rename shouldn't carry noticeable conversion
  overhead.

* The original interface, consisting of 27 files, is replaced with
  the following three files.

  io.stat   : per-blkcg stats
  io.weight : per-cgroup and per-cgroup-queue weight settings
  io.max    : per-cgroup-queue bps and iops max limits

Documentation/cgroups/unified-hierarchy.txt updated accordingly.

v2: blkcg_policy->dfl_cftypes wasn't removed on
    blkcg_policy_unregister() corrupting the cftypes list. Fixed.
Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe --- Documentation/cgroups/unified-hierarchy.txt | 61 ++++++++++++++- block/blk-cgroup.c | 53 +++++++++++++ block/blk-throttle.c | 112 ++++++++++++++++++++++++++++ block/cfq-iosched.c | 61 +++++++++++++-- include/linux/blk-cgroup.h | 1 + 5 files changed, 279 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt index 1ee9caf29e57..bd1ce15d5178 100644 --- a/Documentation/cgroups/unified-hierarchy.txt +++ b/Documentation/cgroups/unified-hierarchy.txt @@ -27,7 +27,7 @@ CONTENTS 5-3-1. Format 5-3-2. Control Knobs 5-4. Per-Controller Changes - 5-4-1. blkio + 5-4-1. io 5-4-2. cpuset 5-4-3. memory 6. Planned Changes @@ -203,7 +203,7 @@ other issues. The mapping from nice level to weight isn't obvious or universal, and there are various other knobs which simply aren't available for tasks. -The blkio controller implicitly creates a hidden leaf node for each +The io controller implicitly creates a hidden leaf node for each cgroup to host the tasks. The hidden leaf has its own copies of all the knobs with "leaf_" prefixed. While this allows equivalent control over internal tasks, it's with serious drawbacks. It always adds an @@ -438,9 +438,62 @@ may be specified in any order and not all pairs have to be specified. 5-4. Per-Controller Changes -5-4-1. blkio +5-4-1. io -- blk-throttle becomes properly hierarchical. +- blkio is renamed to io. The interface is overhauled anyway. The + new name is more in line with the other two major controllers, cpu + and memory, and better suited given that it may be used for cgroup + writeback without involving block layer. + +- Everything including stat is always hierarchical making separate + recursive stat files pointless and, as no internal node can have + tasks, leaf weights are meaningless. The operation model is + simplified and the interface is overhauled accordingly. 
+ + io.stat + + The stat file. The reported stats are from the point where + bio's are issued to request_queue. The stats are counted + independent of which policies are enabled. Each line in the + file follows the following format. More fields may later be + added at the end. + + $MAJ:$MIN rbytes=$RBYTES wbytes=$WBYTES rios=$RIOS wrios=$WIOS + + io.weight + + The weight setting, currently only available and effective if + cfq-iosched is in use for the target device. The weight is + between 10 and 1000 and defaults to 500. The first line + always contains the default weight in the following format to + use when per-device setting is missing. + + default $WEIGHT + + Subsequent lines list per-device weights of the following + format. + + $MAJ:$MIN $WEIGHT + + Writing "$WEIGHT" or "default $WEIGHT" changes the default + setting. Writing "$MAJ:$MIN $WEIGHT" sets per-device weight + while "$MAJ:$MIN default" clears it. + + This file is available only on non-root cgroups. + + io.max + + The maximum bandwidth and/or iops setting, only available if + blk-throttle is enabled. The file is of the following format. + + $MAJ:$MIN rbps=$RBPS wbps=$WBPS riops=$RIOPS wiops=$WIOPS + + ${R|W}BPS are read/write bytes per second and ${R|W}IOPS are + read/write IOs per second. "max" indicates no limit. Writing + to the file follows the same format but the individual + settings may be ommitted or specified in any order. + + This file is available only on non-root cgroups. 5-4-2. 
cpuset diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index b5e72d756be1..88bdb73bd5e0 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -854,6 +854,53 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx) } EXPORT_SYMBOL_GPL(blkg_conf_finish); +static int blkcg_print_stat(struct seq_file *sf, void *v) +{ + struct blkcg *blkcg = css_to_blkcg(seq_css(sf)); + struct blkcg_gq *blkg; + + rcu_read_lock(); + + hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) { + const char *dname; + struct blkg_rwstat rwstat; + u64 rbytes, wbytes, rios, wios; + + dname = blkg_dev_name(blkg); + if (!dname) + continue; + + spin_lock_irq(blkg->q->queue_lock); + + rwstat = blkg_rwstat_recursive_sum(blkg, NULL, + offsetof(struct blkcg_gq, stat_bytes)); + rbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]); + wbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]); + + rwstat = blkg_rwstat_recursive_sum(blkg, NULL, + offsetof(struct blkcg_gq, stat_ios)); + rios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]); + wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]); + + spin_unlock_irq(blkg->q->queue_lock); + + if (rbytes || wbytes || rios || wios) + seq_printf(sf, "%s rbytes=%llu wbytes=%llu rios=%llu wios=%llu\n", + dname, rbytes, wbytes, rios, wios); + } + + rcu_read_unlock(); + return 0; +} + +struct cftype blkcg_files[] = { + { + .name = "stat", + .seq_show = blkcg_print_stat, + }, + { } /* terminate */ +}; + struct cftype blkcg_legacy_files[] = { { .name = "reset_stats", @@ -1101,6 +1148,7 @@ struct cgroup_subsys io_cgrp_subsys = { .css_offline = blkcg_css_offline, .css_free = blkcg_css_free, .can_attach = blkcg_can_attach, + .dfl_cftypes = blkcg_files, .legacy_cftypes = blkcg_legacy_files, .legacy_name = "blkio", #ifdef CONFIG_MEMCG @@ -1273,6 +1321,9 @@ int blkcg_policy_register(struct blkcg_policy *pol) mutex_unlock(&blkcg_pol_mutex); /* everything is in place, add intf files for the new policy */ + if (pol->dfl_cftypes) + 
WARN_ON(cgroup_add_dfl_cftypes(&io_cgrp_subsys, + pol->dfl_cftypes)); if (pol->legacy_cftypes) WARN_ON(cgroup_add_legacy_cftypes(&io_cgrp_subsys, pol->legacy_cftypes)); @@ -1312,6 +1363,8 @@ void blkcg_policy_unregister(struct blkcg_policy *pol) goto out_unlock; /* kill the intf files first */ + if (pol->dfl_cftypes) + cgroup_rm_cftypes(pol->dfl_cftypes); if (pol->legacy_cftypes) cgroup_rm_cftypes(pol->legacy_cftypes); diff --git a/block/blk-throttle.c b/block/blk-throttle.c index a8bb2fd8f523..c75a2636dd40 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -1265,6 +1265,117 @@ static struct cftype throtl_legacy_files[] = { { } /* terminate */ }; +static u64 tg_prfill_max(struct seq_file *sf, struct blkg_policy_data *pd, + int off) +{ + struct throtl_grp *tg = pd_to_tg(pd); + const char *dname = blkg_dev_name(pd->blkg); + char bufs[4][21] = { "max", "max", "max", "max" }; + + if (!dname) + return 0; + if (tg->bps[READ] == -1 && tg->bps[WRITE] == -1 && + tg->iops[READ] == -1 && tg->iops[WRITE] == -1) + return 0; + + if (tg->bps[READ] != -1) + snprintf(bufs[0], sizeof(bufs[0]), "%llu", tg->bps[READ]); + if (tg->bps[WRITE] != -1) + snprintf(bufs[1], sizeof(bufs[1]), "%llu", tg->bps[WRITE]); + if (tg->iops[READ] != -1) + snprintf(bufs[2], sizeof(bufs[2]), "%u", tg->iops[READ]); + if (tg->iops[WRITE] != -1) + snprintf(bufs[3], sizeof(bufs[3]), "%u", tg->iops[WRITE]); + + seq_printf(sf, "%s rbps=%s wbps=%s riops=%s wiops=%s\n", + dname, bufs[0], bufs[1], bufs[2], bufs[3]); + return 0; +} + +static int tg_print_max(struct seq_file *sf, void *v) +{ + blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), tg_prfill_max, + &blkcg_policy_throtl, seq_cft(sf)->private, false); + return 0; +} + +static ssize_t tg_set_max(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct blkcg *blkcg = css_to_blkcg(of_css(of)); + struct blkg_conf_ctx ctx; + struct throtl_grp *tg; + u64 v[4]; + int ret; + + ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, 
buf, &ctx); + if (ret) + return ret; + + tg = blkg_to_tg(ctx.blkg); + + v[0] = tg->bps[READ]; + v[1] = tg->bps[WRITE]; + v[2] = tg->iops[READ]; + v[3] = tg->iops[WRITE]; + + while (true) { + char tok[27]; /* wiops=18446744073709551616 */ + char *p; + u64 val = -1; + int len; + + if (sscanf(ctx.body, "%26s%n", tok, &len) != 1) + break; + if (tok[0] == '\0') + break; + ctx.body += len; + + ret = -EINVAL; + p = tok; + strsep(&p, "="); + if (!p || (sscanf(p, "%llu", &val) != 1 && strcmp(p, "max"))) + goto out_finish; + + ret = -ERANGE; + if (!val) + goto out_finish; + + ret = -EINVAL; + if (!strcmp(tok, "rbps")) + v[0] = val; + else if (!strcmp(tok, "wbps")) + v[1] = val; + else if (!strcmp(tok, "riops")) + v[2] = min_t(u64, val, UINT_MAX); + else if (!strcmp(tok, "wiops")) + v[3] = min_t(u64, val, UINT_MAX); + else + goto out_finish; + } + + tg->bps[READ] = v[0]; + tg->bps[WRITE] = v[1]; + tg->iops[READ] = v[2]; + tg->iops[WRITE] = v[3]; + + tg_conf_updated(tg); + ret = 0; +out_finish: + blkg_conf_finish(&ctx); + return ret ?: nbytes; +} + +static struct cftype throtl_files[] = { + { + .name = "max", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = tg_print_max, + .write = tg_set_max, + }, + { } /* terminate */ +}; + static void throtl_shutdown_wq(struct request_queue *q) { struct throtl_data *td = q->td; @@ -1273,6 +1384,7 @@ static void throtl_shutdown_wq(struct request_queue *q) } static struct blkcg_policy blkcg_policy_throtl = { + .dfl_cftypes = throtl_files, .legacy_cftypes = throtl_legacy_files, .pd_alloc_fn = throtl_pd_alloc, diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 7a7230136e43..97da5719c87a 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1740,7 +1740,7 @@ static int cfq_print_leaf_weight(struct seq_file *sf, void *v) static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off, - bool is_leaf_weight) + bool on_dfl, bool is_leaf_weight) { struct blkcg *blkcg = 
css_to_blkcg(of_css(of)); struct blkg_conf_ctx ctx; @@ -1753,9 +1753,17 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of, if (ret) return ret; - ret = -EINVAL; - if (sscanf(ctx.body, "%llu", &v) != 1) + if (sscanf(ctx.body, "%llu", &v) == 1) { + /* require "default" on dfl */ + ret = -ERANGE; + if (!v && on_dfl) + goto out_finish; + } else if (!strcmp(strim(ctx.body), "default")) { + v = 0; + } else { + ret = -EINVAL; goto out_finish; + } cfqg = blkg_to_cfqg(ctx.blkg); cfqgd = blkcg_to_cfqgd(blkcg); @@ -1779,13 +1787,13 @@ out_finish: static ssize_t cfqg_set_weight_device(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { - return __cfqg_set_weight_device(of, buf, nbytes, off, false); + return __cfqg_set_weight_device(of, buf, nbytes, off, false, false); } static ssize_t cfqg_set_leaf_weight_device(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { - return __cfqg_set_weight_device(of, buf, nbytes, off, true); + return __cfqg_set_weight_device(of, buf, nbytes, off, false, true); } static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val, @@ -2103,6 +2111,48 @@ static struct cftype cfq_blkcg_legacy_files[] = { #endif /* CONFIG_DEBUG_BLK_CGROUP */ { } /* terminate */ }; + +static int cfq_print_weight_on_dfl(struct seq_file *sf, void *v) +{ + struct blkcg *blkcg = css_to_blkcg(seq_css(sf)); + struct cfq_group_data *cgd = blkcg_to_cfqgd(blkcg); + + seq_printf(sf, "default %u\n", cgd->weight); + blkcg_print_blkgs(sf, blkcg, cfqg_prfill_weight_device, + &blkcg_policy_cfq, 0, false); + return 0; +} + +static ssize_t cfq_set_weight_on_dfl(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + char *endp; + int ret; + u64 v; + + buf = strim(buf); + + /* "WEIGHT" or "default WEIGHT" sets the default weight */ + v = simple_strtoull(buf, &endp, 0); + if (*endp == '\0' || sscanf(buf, "default %llu", &v) == 1) { + ret = __cfq_set_weight(of_css(of), v, false); + return ret ?: nbytes; + } + + 
/* "MAJ:MIN WEIGHT" */ + return __cfqg_set_weight_device(of, buf, nbytes, off, true, false); +} + +static struct cftype cfq_blkcg_files[] = { + { + .name = "weight", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = cfq_print_weight_on_dfl, + .write = cfq_set_weight_on_dfl, + }, + { } /* terminate */ +}; + #else /* GROUP_IOSCHED */ static struct cfq_group *cfq_lookup_cfqg(struct cfq_data *cfqd, struct blkcg *blkcg) @@ -4659,6 +4709,7 @@ static struct elevator_type iosched_cfq = { #ifdef CONFIG_CFQ_GROUP_IOSCHED static struct blkcg_policy blkcg_policy_cfq = { + .dfl_cftypes = cfq_blkcg_files, .legacy_cftypes = cfq_blkcg_legacy_files, .cpd_alloc_fn = cfq_cpd_alloc, diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index b270aef519c6..9a7c4bd45fff 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -148,6 +148,7 @@ typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd); struct blkcg_policy { int plid; /* cgroup files for the policy */ + struct cftype *dfl_cftypes; struct cftype *legacy_cftypes; /* operations */ -- cgit v1.2.3 From 69d7fde5909b614114343974cfc52cb8ff30b544 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 18 Aug 2015 14:55:36 -0700 Subject: blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy cgroup is trying to make interface consistent across different controllers. For weight based resource control, the knob should have the range [1, 10000] and default to 100. This patch updates cfq-iosched so that the weight range conforms. The internal calculations have enough range and the widening of the weight range shouldn't cause any problem. * blkcg_policy->cpd_bind_fn() is added. If present, this is invoked when blkcg is attached to a hierarchy. * cfq_cpd_init() is updated to use the new default value on the unified hierarchy. * cfq_cpd_bind() callback is implemented to clear per-blkg configs and apply the default config matching the hierarchy type. 
* cfqd->root_group->[leaf_]weight initialization in cfq_init_queue() is moved into !CONFIG_CFQ_GROUP_IOSCHED block. cfq_cpd_bind() is now responsible for initializing the initial weights when blkcg is enabled. Signed-off-by: Tejun Heo Cc: Vivek Goyal Cc: Arianna Avanzini Signed-off-by: Jens Axboe --- Documentation/cgroups/unified-hierarchy.txt | 2 +- block/blk-cgroup.c | 21 +++++++++++ block/cfq-iosched.c | 55 +++++++++++++++++++++-------- include/linux/blk-cgroup.h | 2 ++ 4 files changed, 64 insertions(+), 16 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt index bd1ce15d5178..e0975c2cf03d 100644 --- a/Documentation/cgroups/unified-hierarchy.txt +++ b/Documentation/cgroups/unified-hierarchy.txt @@ -464,7 +464,7 @@ may be specified in any order and not all pairs have to be specified. The weight setting, currently only available and effective if cfq-iosched is in use for the target device. The weight is - between 10 and 1000 and defaults to 500. The first line + between 1 and 10000 and defaults to 100. The first line always contains the default weight in the following format to use when per-device setting is missing. 
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 88bdb73bd5e0..ac8370cb2515 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1143,11 +1143,32 @@ static int blkcg_can_attach(struct cgroup_subsys_state *css, return ret; } +static void blkcg_bind(struct cgroup_subsys_state *root_css) +{ + int i; + + mutex_lock(&blkcg_pol_mutex); + + for (i = 0; i < BLKCG_MAX_POLS; i++) { + struct blkcg_policy *pol = blkcg_policy[i]; + struct blkcg *blkcg; + + if (!pol || !pol->cpd_bind_fn) + continue; + + list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) + if (blkcg->cpd[pol->plid]) + pol->cpd_bind_fn(blkcg->cpd[pol->plid]); + } + mutex_unlock(&blkcg_pol_mutex); +} + struct cgroup_subsys io_cgrp_subsys = { .css_alloc = blkcg_css_alloc, .css_offline = blkcg_css_offline, .css_free = blkcg_css_free, .can_attach = blkcg_can_attach, + .bind = blkcg_bind, .dfl_cftypes = blkcg_files, .legacy_cftypes = blkcg_legacy_files, .legacy_name = "blkio", diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 0fe721eaff28..04de88463a98 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1522,6 +1522,9 @@ static void cfq_init_cfqg_base(struct cfq_group *cfqg) } #ifdef CONFIG_CFQ_GROUP_IOSCHED +static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val, + bool on_dfl, bool reset_dev, bool is_leaf_weight); + static void cfqg_stats_exit(struct cfqg_stats *stats) { blkg_rwstat_exit(&stats->merged); @@ -1578,14 +1581,14 @@ static struct blkcg_policy_data *cfq_cpd_alloc(gfp_t gfp) static void cfq_cpd_init(struct blkcg_policy_data *cpd) { struct cfq_group_data *cgd = cpd_to_cfqgd(cpd); + unsigned int weight = cgroup_on_dfl(blkcg_root.css.cgroup) ? 
+ CGROUP_WEIGHT_DFL : CFQ_WEIGHT_LEGACY_DFL; - if (cpd_to_blkcg(cpd) == &blkcg_root) { - cgd->weight = 2 * CFQ_WEIGHT_LEGACY_DFL; - cgd->leaf_weight = 2 * CFQ_WEIGHT_LEGACY_DFL; - } else { - cgd->weight = CFQ_WEIGHT_LEGACY_DFL; - cgd->leaf_weight = CFQ_WEIGHT_LEGACY_DFL; - } + if (cpd_to_blkcg(cpd) == &blkcg_root) + weight *= 2; + + cgd->weight = weight; + cgd->leaf_weight = weight; } static void cfq_cpd_free(struct blkcg_policy_data *cpd) @@ -1593,6 +1596,19 @@ static void cfq_cpd_free(struct blkcg_policy_data *cpd) kfree(cpd_to_cfqgd(cpd)); } +static void cfq_cpd_bind(struct blkcg_policy_data *cpd) +{ + struct blkcg *blkcg = cpd_to_blkcg(cpd); + bool on_dfl = cgroup_on_dfl(blkcg_root.css.cgroup); + unsigned int weight = on_dfl ? CGROUP_WEIGHT_DFL : CFQ_WEIGHT_LEGACY_DFL; + + if (blkcg == &blkcg_root) + weight *= 2; + + WARN_ON_ONCE(__cfq_set_weight(&blkcg->css, weight, on_dfl, true, false)); + WARN_ON_ONCE(__cfq_set_weight(&blkcg->css, weight, on_dfl, true, true)); +} + static struct blkg_policy_data *cfq_pd_alloc(gfp_t gfp, int node) { struct cfq_group *cfqg; @@ -1742,6 +1758,8 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off, bool on_dfl, bool is_leaf_weight) { + unsigned int min = on_dfl ? CGROUP_WEIGHT_MIN : CFQ_WEIGHT_LEGACY_MIN; + unsigned int max = on_dfl ? 
CGROUP_WEIGHT_MAX : CFQ_WEIGHT_LEGACY_MAX; struct blkcg *blkcg = css_to_blkcg(of_css(of)); struct blkg_conf_ctx ctx; struct cfq_group *cfqg; @@ -1769,7 +1787,7 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of, cfqgd = blkcg_to_cfqgd(blkcg); ret = -ERANGE; - if (!v || (v >= CFQ_WEIGHT_LEGACY_MIN && v <= CFQ_WEIGHT_LEGACY_MAX)) { + if (!v || (v >= min && v <= max)) { if (!is_leaf_weight) { cfqg->dev_weight = v; cfqg->new_weight = v ?: cfqgd->weight; @@ -1797,15 +1815,17 @@ static ssize_t cfqg_set_leaf_weight_device(struct kernfs_open_file *of, } static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val, - bool is_leaf_weight) + bool on_dfl, bool reset_dev, bool is_leaf_weight) { + unsigned int min = on_dfl ? CGROUP_WEIGHT_MIN : CFQ_WEIGHT_LEGACY_MIN; + unsigned int max = on_dfl ? CGROUP_WEIGHT_MAX : CFQ_WEIGHT_LEGACY_MAX; struct blkcg *blkcg = css_to_blkcg(css); struct blkcg_gq *blkg; struct cfq_group_data *cfqgd; int ret = 0; - if (val < CFQ_WEIGHT_LEGACY_MIN || val > CFQ_WEIGHT_LEGACY_MAX) - return -EINVAL; + if (val < min || val > max) + return -ERANGE; spin_lock_irq(&blkcg->lock); cfqgd = blkcg_to_cfqgd(blkcg); @@ -1826,9 +1846,13 @@ static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val, continue; if (!is_leaf_weight) { + if (reset_dev) + cfqg->dev_weight = 0; if (!cfqg->dev_weight) cfqg->new_weight = cfqgd->weight; } else { + if (reset_dev) + cfqg->dev_leaf_weight = 0; if (!cfqg->dev_leaf_weight) cfqg->new_leaf_weight = cfqgd->leaf_weight; } @@ -1842,13 +1866,13 @@ out: static int cfq_set_weight(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { - return __cfq_set_weight(css, val, false); + return __cfq_set_weight(css, val, false, false, false); } static int cfq_set_leaf_weight(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { - return __cfq_set_weight(css, val, true); + return __cfq_set_weight(css, val, false, false, true); } static int cfqg_print_stat(struct seq_file *sf, void *v) @@ 
-2135,7 +2159,7 @@ static ssize_t cfq_set_weight_on_dfl(struct kernfs_open_file *of, /* "WEIGHT" or "default WEIGHT" sets the default weight */ v = simple_strtoull(buf, &endp, 0); if (*endp == '\0' || sscanf(buf, "default %llu", &v) == 1) { - ret = __cfq_set_weight(of_css(of), v, false); + ret = __cfq_set_weight(of_css(of), v, true, false, false); return ret ?: nbytes; } @@ -4512,9 +4536,9 @@ static int cfq_init_queue(struct request_queue *q, struct elevator_type *e) goto out_free; cfq_init_cfqg_base(cfqd->root_group); -#endif cfqd->root_group->weight = 2 * CFQ_WEIGHT_LEGACY_DFL; cfqd->root_group->leaf_weight = 2 * CFQ_WEIGHT_LEGACY_DFL; +#endif /* * Not strictly needed (since RB_ROOT just clears the node and we @@ -4715,6 +4739,7 @@ static struct blkcg_policy blkcg_policy_cfq = { .cpd_alloc_fn = cfq_cpd_alloc, .cpd_init_fn = cfq_cpd_init, .cpd_free_fn = cfq_cpd_free, + .cpd_bind_fn = cfq_cpd_bind, .pd_alloc_fn = cfq_pd_alloc, .pd_init_fn = cfq_pd_init, diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 9a7c4bd45fff..0a5cc7a1109b 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -138,6 +138,7 @@ struct blkcg_gq { typedef struct blkcg_policy_data *(blkcg_pol_alloc_cpd_fn)(gfp_t gfp); typedef void (blkcg_pol_init_cpd_fn)(struct blkcg_policy_data *cpd); typedef void (blkcg_pol_free_cpd_fn)(struct blkcg_policy_data *cpd); +typedef void (blkcg_pol_bind_cpd_fn)(struct blkcg_policy_data *cpd); typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp, int node); typedef void (blkcg_pol_init_pd_fn)(struct blkg_policy_data *pd); typedef void (blkcg_pol_online_pd_fn)(struct blkg_policy_data *pd); @@ -155,6 +156,7 @@ struct blkcg_policy { blkcg_pol_alloc_cpd_fn *cpd_alloc_fn; blkcg_pol_init_cpd_fn *cpd_init_fn; blkcg_pol_free_cpd_fn *cpd_free_fn; + blkcg_pol_bind_cpd_fn *cpd_bind_fn; blkcg_pol_alloc_pd_fn *pd_alloc_fn; blkcg_pol_init_pd_fn *pd_init_fn; -- cgit v1.2.3
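
Illustration (not part of the series): the token grammar that
tg_set_max() accepts for io.max can be tried outside the kernel. The
following is a minimal user-space sketch, assuming the leading
"$MAJ:$MIN" has already been consumed (the kernel does that in
blkg_conf_prep()). The names io_max_limits, io_max_apply, clamp_iops
and IO_MAX_NOLIMIT are hypothetical and exist only for this example.
It is stricter than the kernel's sscanf-based parsing about trailing
garbage in a number, and it is not the kernel implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's -1 ("max", i.e. no limit). */
#define IO_MAX_NOLIMIT UINT64_MAX

/* Hypothetical user-space mirror of the four io.max settings. */
struct io_max_limits {
	uint64_t rbps, wbps, riops, wiops;
};

/* IOPS limits are 32 bit in the patch, so clamp explicit values. */
static uint64_t clamp_iops(uint64_t val)
{
	return (val != IO_MAX_NOLIMIT && val > UINT32_MAX) ? UINT32_MAX : val;
}

/*
 * Apply whitespace-separated "key=value" tokens from body to *lim.
 * Keys may appear in any order; omitted keys keep their current value.
 * "max" means no limit and an explicit zero is rejected.
 * Returns 0 on success, -1 on error; *lim is unchanged on error.
 */
int io_max_apply(const char *body, struct io_max_limits *lim)
{
	struct io_max_limits v;
	char tok[27];	/* "wiops=" plus 20 digits plus NUL */
	int len;

	assert(body != NULL && lim != NULL);
	v = *lim;

	while (sscanf(body, "%26s%n", tok, &len) == 1) {
		uint64_t val = IO_MAX_NOLIMIT;
		char *arg = strchr(tok, '=');

		body += len;
		if (arg == NULL)
			return -1;
		*arg++ = '\0';	/* tok is now the key, arg the value */

		if (strcmp(arg, "max") != 0) {
			char *end;

			if (!isdigit((unsigned char)arg[0]))
				return -1;
			errno = 0;
			val = strtoull(arg, &end, 10);
			if (errno != 0 || *end != '\0' || val == 0)
				return -1;
		}

		if (strcmp(tok, "rbps") == 0)
			v.rbps = val;
		else if (strcmp(tok, "wbps") == 0)
			v.wbps = val;
		else if (strcmp(tok, "riops") == 0)
			v.riops = clamp_iops(val);
		else if (strcmp(tok, "wiops") == 0)
			v.wiops = clamp_iops(val);
		else
			return -1;
	}

	*lim = v;	/* commit only after every token parsed cleanly */
	return 0;
}
```

As in the patch, the new values are collected in a scratch copy and
committed only once the whole line has parsed, so a malformed token
leaves the previous limits in place.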