summaryrefslogtreecommitdiffstats
path: root/net/sched/sch_pie.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* net_sched: sch_pie: implement lockless pie_dump()Eric Dumazet2024-04-191-18/+21
| | | | | | | | | Instead of relying on RTNL, pie_dump() can use READ_ONCE() annotations, paired with WRITE_ONCE() ones in pie_change(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: Add module aliases for cls_,sch_,act_ modulesMichal Koutný2024-02-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No functional change intended, aliases will be used in followup commits. Note for backporters: you may need to add aliases also for modules that are already removed in mainline kernel but still in your version. Patches were generated with the help of Coccinelle scripts like: cat >scripts/coccinelle/misc/tcf_alias.cocci <<EOD virtual patch virtual report @ haskernel @ @@ @ tcf_has_kind depends on report && haskernel @ identifier ops; constant K; @@ static struct tcf_proto_ops ops = { .kind = K, ... }; +char module_alias = K; EOD /usr/bin/spatch -D report --cocci-file scripts/coccinelle/misc/tcf_alias.cocci \ --dir . \ -I ./arch/x86/include -I ./arch/x86/include/generated -I ./include \ -I ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi \ -I ./include/uapi -I ./include/generated/uapi \ --include ./include/linux/compiler-version.h --include ./include/linux/kconfig.h \ --jobs 8 --chunksize 1 2>/dev/null | \ sed 's/char module_alias = "\([^"]*\)";/MODULE_ALIAS_NET_CLS("\1");/' And analogously for: static struct tc_action_ops ops = { .kind = K, static struct Qdisc_ops ops = { .id = K, (Someone familiar would be able to fit those into one .cocci file without sed post processing.) Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240201130943.19536-3-mkoutny@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: sched: add rcu annotations around qdisc->qdisc_sleepingEric Dumazet2023-06-071-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported a race around qdisc->qdisc_sleeping [1] It is time we add proper annotations to reads and writes to/from qdisc->qdisc_sleeping. [1] BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1: qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331 __tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174 tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547 rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x1e3/0x270 net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0: dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115 qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103 tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693 rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x1e3/0x270 net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023 Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@nvidia.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* fix typos in net/sched/* filesTaichi Nishimura2023-03-241-1/+1
| | | | | | | | This patch fixes typos in net/sched/* files. Signed-off-by: Taichi Nishimura <awkrail01@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* treewide: use get_random_bytes() when possibleJason A. Donenfeld2022-10-121-1/+1
| | | | | | | | | | | | | | The prandom_bytes() function has been a deprecated inline wrapper around get_random_bytes() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* net: sched: remove redundant NULL check in change hook functionZhengchao Shao2022-09-011-3/+0
| | | | | | | | | | | | | | Currently, the change function can be called by two ways. The one way is that qdisc_change() will call it. Before calling change function, qdisc_change() ensures tca[TCA_OPTIONS] is not empty. The other way is that .init() will call it. The opt parameter is also checked before calling change function in .init(). Therefore, it's no need to check the input parameter opt in change function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220829071219.208646-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: sched: fix misspellings using misspell-fixer toolMenglong Dong2020-11-111-1/+1
| | | | | | | | | | | | | | | | | | Some typos are found out by misspell-fixer tool: $ misspell-fixer -rnv ./net/sched/ ./net/sched/act_api.c:686 ./net/sched/act_bpf.c:68 ./net/sched/cls_rsvp.h:241 ./net/sched/em_cmp.c:44 ./net/sched/sch_pie.c:408 Fix typos found by misspell-fixer. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/5fa8e9d4.1c69fb81.5d889.5c64@mx.google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: sched: pie: change tc_pie_xstats->probLeslie Monis2020-03-101-1/+1
| | | | | | | | | | | | | Commit 105e808c1da2 ("pie: remove pie_vars->accu_prob_overflows") changes the scale of probability values in PIE from (2^64 - 1) to (2^56 - 1). This affects the precision of tc_pie_xstats->prob in user space. This patch ensures user space is unaffected. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pie: remove pie_vars->accu_prob_overflowsLeslie Monis2020-03-041-15/+6
| | | | | | | | | | | | | | | | | | | | | | | | | The variable pie_vars->accu_prob is used as an accumulator for probability values. Since probabilty values are scaled using the MAX_PROB macro denoting (2^64 - 1), pie_vars->accu_prob is likely to overflow as it is of type u64. The variable pie_vars->accu_prob_overflows counts the number of times the variable pie_vars->accu_prob overflows. The MAX_PROB macro needs to be equal to at least (2^39 - 1) in order to do precise calculations without any underflow. Thus MAX_PROB can be reduced to (2^56 - 1) without affecting the precision in calculations drastically. Doing so will eliminate the need for the variable pie_vars->accu_prob_overflows as the variable pie_vars->accu_prob will never overflow. Removing the variable pie_vars->accu_prob_overflows also reduces the size of the structure pie_vars to exactly 64 bytes. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pie: remove unnecessary type castingLeslie Monis2020-03-041-2/+2
| | | | | | | | | | | | | In function pie_calculate_probability(), the variables alpha and beta are of type u64. The variables qdelay, qdelay_old and params->target are of type psched_time_t (which is also u64). The explicit type casting done when calculating the value for the variable delta is redundant and not required. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pie: use term backlog instead of qlenLeslie Monis2020-03-041-11/+11
| | | | | | | | | | Remove ambiguity by using the term backlog instead of qlen when representing the queue length in bytes. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: export symbols to be reused by FQ-PIEMohit P. Tahiliani2020-01-231-85/+88
| | | | | | | | | | | | | | This patch makes the drop_early(), calculate_probability() and pie_process_dequeue() functions generic enough to be used by both PIE and FQ-PIE (to be added in a future commit). The major change here is in the way the functions take in arguments. This patch exports these functions and makes FQ-PIE dependent on sch_pie. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: fix alignment in struct instancesMohit P. Tahiliani2020-01-231-9/+9
| | | | | | | | | | Make the alignment in the initialization of the struct instances consistent in the file. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: fix commentingMohit P. Tahiliani2020-01-231-5/+5
| | | | | | | | | | | | | Fix punctuation and logical mistakes in the comments. The logical mistake was that "dequeue_rate" is no longer the default way to calculate queuing delay and is not needed. The default way to calculate queue delay was changed in commit cec2975f2b70 ("net: sched: pie: enable timestamp based delay calculation"). Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pie: rearrange structure members and their initializationsMohit P. Tahiliani2020-01-231-1/+1
| | | | | | | | | | | | Rearrange the members of the structure such that closely referenced members appear together and/or fit in the same cacheline. Also, change the order of their initializations to match the order in which they appear in the structure. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: move common code to pie.hMohit P. Tahiliani2020-01-231-85/+1
| | | | | | | | | | | | This patch moves macros, structures and small functions common to PIE and FQ-PIE (to be added in a future commit) from the file net/sched/sch_pie.c to the header file include/net/pie.h. All the moved functions are made inline. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: enable timestamp based delay calculationGautam Ramakrishnan2019-11-201-21/+99
| | | | | | | | | | | | | | RFC 8033 suggests an alternative approach to calculate the queue delay in PIE by using a timestamp on every enqueued packet. This patch adds an implementation of that approach and sets it as the default method to calculate queue delay. The previous method (based on Little's law) to calculate queue delay is set as optional. Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Acked-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 235Thomas Gleixner2019-06-191-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 53 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.904365654@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netlink: make validation more configurable for future strictnessJohannes Berg2019-04-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek2019-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: avoid slow division in drop probability decayLeslie Monis2019-02-281-1/+2
| | | | | | | | | | As per RFC 8033, it is sufficient for the drop probability decay factor to have a value of (1 - 1/64) instead of 98%. This avoids the need to do slow division. Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: fix 64-bit divisionLeslie Monis2019-02-271-1/+1
| | | | | | | | | | Use div_u64() to resolve build failures on 32-bit platforms. Fixes: 3f7ae5f3dc52 ("net: sched: pie: add more cases to auto-tune alpha and beta") Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: fix mistake in reference linkLeslie Monis2019-02-261-1/+1
| | | | | | | Fix the incorrect reference link to RFC 8033 Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: update referencesMohit P. Tahiliani2019-02-251-3/+1
| | | | | | | | | | | | | | RFC 8033 replaces the IETF draft for PIE Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: add derandomization mechanismMohit P. Tahiliani2019-02-251-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | Random dropping of packets to achieve latency control may introduce outlier situations where packets are dropped too close to each other or too far from each other. This can cause the real drop percentage to temporarily deviate from the intended drop probability. In certain scenarios, such as a small number of simultaneous TCP flows, these deviations can cause significant deviations in link utilization and queuing latency. RFC 8033 suggests using a derandomization mechanism to avoid these deviations. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: add more cases to auto-tune alpha and betaMohit P. Tahiliani2019-02-251-33/+32
| | | | | | | | | | | | | | | | | | | | | | | | The current implementation scales the local alpha and beta variables in the calculate_probability function by the same amount for all values of drop probability below 1%. RFC 8033 suggests using additional cases for auto-tuning alpha and beta when the drop probability is less than 1%. In order to add more auto-tuning cases, MAX_PROB must be scaled by u64 instead of u32 to prevent underflow when scaling the local alpha and beta variables in the calculate_probability function. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: change initial value of pie_vars->burst_timeMohit P. Tahiliani2019-02-251-2/+2
| | | | | | | | | | | | | | | RFC 8033 suggests an initial value of 150 milliseconds for the maximum time allowed for a burst of packets. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: change default value of pie_params->tupdateMohit P. Tahiliani2019-02-251-1/+1
| | | | | | | | | | | | | | | RFC 8033 suggests a default value of 15 milliseconds for the update interval. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: change default value of pie_params->targetMohit P. Tahiliani2019-02-251-1/+1
| | | | | | | | | | | | | | | RFC 8033 suggests a default value of 15 milliseconds for the target queue delay. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: change value of QUEUE_THRESHOLDMohit P. Tahiliani2019-02-251-1/+1
| | | | | | | | | | | | | | | RFC 8033 recommends a value of 16384 bytes for the queue threshold. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: pie: fix coding style issuesLeslie Monis2018-10-081-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix 5 warnings and 14 checks issued by checkpatch.pl: CHECK: Logical continuations should be on the previous line + if ((q->vars.qdelay < q->params.target / 2) + && (q->vars.prob < MAX_PROB / 5)) WARNING: line over 80 characters + q->params.tupdate = usecs_to_jiffies(nla_get_u32(tb[TCA_PIE_TUPDATE])); CHECK: Blank lines aren't necessary after an open brace '{' +{ + CHECK: braces {} should be used on all arms of this statement + if (qlen < QUEUE_THRESHOLD) [...] + else { [...] CHECK: Unbalanced braces around else statement + else { CHECK: No space is necessary after a cast + if (delta > (s32) (MAX_PROB / (100 / 2)) && CHECK: Unnecessary parentheses around 'qdelay == 0' + if ((qdelay == 0) && (qdelay_old == 0) && update_prob) CHECK: Unnecessary parentheses around 'qdelay_old == 0' + if ((qdelay == 0) && (qdelay_old == 0) && update_prob) CHECK: Unnecessary parentheses around 'q->vars.prob == 0' + if ((q->vars.qdelay < q->params.target / 2) && + (q->vars.qdelay_old < q->params.target / 2) && + (q->vars.prob == 0) && + (q->vars.avg_dq_rate > 0)) CHECK: Unnecessary parentheses around 'q->vars.avg_dq_rate > 0' + if ((q->vars.qdelay < q->params.target / 2) && + (q->vars.qdelay_old < q->params.target / 2) && + (q->vars.prob == 0) && + (q->vars.avg_dq_rate > 0)) CHECK: Blank lines aren't necessary before a close brace '}' + +} CHECK: Comparison to NULL could be written "!opts" + if (opts == NULL) CHECK: No space is necessary after a cast + ((u32) PSCHED_TICKS2NS(q->params.target)) / WARNING: line over 80 characters + nla_put_u32(skb, TCA_PIE_TUPDATE, jiffies_to_usecs(q->params.tupdate)) || CHECK: Blank lines aren't necessary before a close brace '}' + +} CHECK: No space is necessary after a cast + .delay = ((u32) PSCHED_TICKS2NS(q->vars.qdelay)) / WARNING: Missing a blank line after declarations + struct sk_buff *skb; + skb = qdisc_dequeue_head(sch); WARNING: Missing a blank line after declarations + struct pie_sched_data *q = qdisc_priv(sch); + qdisc_reset_queue(sch); WARNING: Missing a blank line after declarations + struct pie_sched_data *q = qdisc_priv(sch); + q->params.tupdate = 0; Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: sch: add extack for change qdisc opsAlexander Aring2017-12-211-2/+3
| | | | | | | | | | This patch adds extack support for change callback for qdisc ops structtur to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: sch: add extack for init callbackAlexander Aring2017-12-211-1/+2
| | | | | | | | | | This patch adds extack support for init callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: Convert timers to use timer_setup()Kees Cook2017-10-181-4/+6
| | | | | | | | | | | | | | In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Add pointer back to Qdisc. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: pass extended ACK struct to parsing functionsJohannes Berg2017-04-131-1/+1
| | | | | | | | | Pass the new extended ACK reporting struct to all of the generic netlink parsing functions. For now, pass NULL in almost all callers (except for some in the core.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sched: replace __skb_dequeue with __qdisc_dequeue_headFlorian Westphal2016-09-191-1/+1
| | | | | | | | | | | | | After previous patch these functions are identical. Replace __skb_dequeue in qdiscs with __qdisc_dequeue_head. Next patch will then make __qdisc_dequeue_head handle single-linked list instead of strcut sk_buff_head argument. Doesn't change generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* pie: use qdisc_dequeue_head wrapperFlorian Westphal2016-09-191-1/+1
| | | | | | | Doesn't change generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: drop packets after root qdisc lock is releasedEric Dumazet2016-06-251-2/+3
| | | | | | | | | | | | | | | | | | | | | | Qdisc performance suffers when packets are dropped at enqueue() time because drops (kfree_skb()) are done while qdisc lock is held, delaying a dequeue() draining the queue. Nominal throughput can be reduced by 50 % when this happens, at a time we would like the dequeue() to proceed as fast as possible. Even FQ is vulnerable to this problem, while one of FQ goals was to provide some flow isolation. This patch adds a 'struct sk_buff **to_free' parameter to all qdisc->enqueue(), and in qdisc_drop() helper. I measured a performance increase of up to 12 %, but this patch is a prereq so that future batches in enqueue() can fly. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: sch_pie: defer skb freeingEric Dumazet2016-06-151-1/+1
| | | | | | | | | | pie_change() can use rtnl_qdisc_drop() to benefit from deferred freeing. pie_reset() is already using qdisc_reset_queue() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: update hierarchical backlog tooWANG Cong2016-02-291-2/+3
| | | | | | | | | | | | When the bottom qdisc decides to, for example, drop some packet, it calls qdisc_tree_decrease_qlen() to update the queue length for all its ancestors, we need to update the backlog too to keep the stats on root qdisc accurate. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_pie: schedule the timer after all init succeedWANG Cong2014-10-291-1/+1
| | | | | | | Cc: Vijay Subramanian <vijaynsu@cisco.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com>
* net: sched: implement qstat helper routinesJohn Fastabend2014-09-301-1/+1
| | | | | | | | | This adds helpers to manipulate qstats logic and replaces locations that touch the counters directly. This simplifies future patches to push qstats onto per cpu counters. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: Cleanup PIE commentsVijay Subramanian2014-02-141-5/+16
| | | | | | | | | | | Fix incorrect comment reported by Norbert Kiesel. Edit another comment to add more details. Also add references to algorithm (IETF draft and paper) to top of file. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> CC: Mythili Prabhu <mysuryan@cisco.com> CC: Norbert Kiesel <nkiesel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: replace macros net_random and net_srandom with direct calls to prandomAruna-Hewapathirane2014-01-151-1/+1
| | | | | | | | | | | | This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: pkt_sched: PIE AQM schemeVijay Subramanian2014-01-061-0/+555
Proportional Integral controller Enhanced (PIE) is a scheduler to address the bufferbloat problem. >From the IETF draft below: " Bufferbloat is a phenomenon where excess buffers in the network cause high latency and jitter. As more and more interactive applications (e.g. voice over IP, real time video streaming and financial transactions) run in the Internet, high latency and jitter degrade application performance. There is a pressing need to design intelligent queue management schemes that can control latency and jitter; and hence provide desirable quality of service to users. We present here a lightweight design, PIE(Proportional Integral controller Enhanced) that can effectively control the average queueing latency to a target value. Simulation results, theoretical analysis and Linux testbed results have shown that PIE can ensure low latency and achieve high link utilization under various congestion situations. The design does not require per-packet timestamp, so it incurs very small overhead and is simple enough to implement in both hardware and software. " Many thanks to Dave Taht for extensive feedback, reviews, testing and suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and suggestions. Naeem Khademi and Dave Taht independently contributed to ECN support. For more information, please see technical paper about PIE in the IEEE Conference on High Performance Switching and Routing 2013. A copy of the paper can be found at ftp://ftpeng.cisco.com/pie/. Please also refer to the IETF draft submission at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00 All relevant code, documents and test scripts and results can be found at ftp://ftpeng.cisco.com/pie/. For problems with the iproute2/tc or Linux kernel code, please contact Vijay Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu (mysuryan@cisco.com) Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Mythili Prabhu <mysuryan@cisco.com> CC: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: David S. Miller <davem@davemloft.net>