summaryrefslogtreecommitdiffstats
path: root/net/sched (follow)
Commit message (Collapse)AuthorAgeFilesLines
* pkt_sched: Fix qdisc_create on stab error handlingJarek Poplawski2009-09-161-2/+2
| | | | | | | | | If qdisc_get_stab returns error in qdisc_create there is skipped qdisc ops->destroy, which is necessary because it's after ops->init at the moment, so memory leaks are quite probable. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Fix tx queue selection in tc_modify_qdiscJarek Poplawski2009-09-152-7/+16
| | | | | | | | | | | | | | | After the recent mq change there is the new select_queue qdisc class method used in tc_modify_qdisc, but it works OK only for direct child qdiscs of mq qdisc. Grandchildren always get the first tx queue, which would give wrong qdisc_root etc. results (e.g. for sch_htb as child of sch_prio). This patch fixes it by using parent's dev_queue for such grandchildren qdiscs. The select_queue method's return type is changed BTW. With feedback from: Patrick McHardy <kaber@trash.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Fix qdisc_graft WRT ingress qdiscJarek Poplawski2009-09-151-5/+10
| | | | | | | | After the recent mq change using ingress qdisc overwrites dev->qdisc; there is also a wrong old qdisc pointer passed to notify_and_destroy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2009-09-1416-325/+504
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits) netxen: update copyright netxen: fix tx timeout recovery netxen: fix file firmware leak netxen: improve pci memory access netxen: change firmware write size tg3: Fix return ring size breakage netxen: build fix for INET=n cdc-phonet: autoconfigure Phonet address Phonet: back-end for autoconfigured addresses Phonet: fix netlink address dump error handling ipv6: Add IFA_F_DADFAILED flag net: Add DEVTYPE support for Ethernet based devices mv643xx_eth.c: remove unused txq_set_wrr() ucc_geth: Fix hangs after switching from full to half duplex ucc_geth: Rearrange some code to avoid forward declarations phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs drivers/net/phy: introduce missing kfree drivers/net/wan: introduce missing kfree net: force bridge module(s) to be GPL Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded ... Fixed up trivial conflicts: - arch/x86/include/asm/socket.h converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine. - drivers/net/tun.c fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use. Noted in 'next' by Stephen Rothwell.
| * Merge branch 'master' of ↵David S. Miller2009-09-111-1/+1
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
| | * netlink: constify nlmsghdr argumentsPatrick McHardy2009-08-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Consitfy nlmsghdr arguments to a couple of functions as preparation for the next patch, which will constify the netlink message data in all nfnetlink users. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | net_sched: fix estimator lock selection for mq child qdiscsPatrick McHardy2009-09-102-16/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When new child qdiscs are attached to the mq qdisc, they are actually attached as root qdiscs to the device queues. The lock selection for new estimators incorrectly picks the root lock of the existing and to be replaced qdisc, which results in a use-after-free once the old qdisc has been destroyed. Mark mq qdisc instances with a new flag and treat qdiscs attached to mq as children similar to regular root qdiscs. Additionally prevent estimators from being attached to the mq qdisc itself since it only updates its byte and packet counters during dumps. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net_sched: add classful multiqueue dummy schedulerDavid S. Miller2009-09-064-13/+273
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a classful dummy scheduler which can be used as root qdisc for multiqueue devices and exposes each device queue as a child class. This allows to address queues individually and graft them similar to regular classes. Additionally it presents an accumulated view of the statistics of all real root qdiscs in the dummy root. Two new callbacks are added to the qdisc_ops and qdisc_class_ops: - cl_ops->select_queue selects the tx queue number for new child classes. - qdisc_ops->attach() overrides root qdisc device grafting to attach non-shared qdiscs to the queues. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net_sched: move dev_graft_qdisc() to sch_generic.cPatrick McHardy2009-09-062-26/+25
| | | | | | | | | | | | | | | | | | | | | It will be used in a following patch by the multiqueue qdisc. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net_sched: reintroduce dev->qdisc for use by sch_apiPatrick McHardy2009-09-063-44/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the multiqueue integration with the qdisc API suffers from a few problems: - with multiple queues, all root qdiscs use the same handle. This means they can't be exposed to userspace in a backwards compatible fashion. - all API operations always refer to queue number 0. Newly created qdiscs are automatically shared between all queues, its not possible to address individual queues or restore multiqueue behaviour once a shared qdisc has been attached. - Dumps only contain the root qdisc of queue 0, in case of non-shared qdiscs this means the statistics are incomplete. This patch reintroduces dev->qdisc, which points to the (single) root qdisc from userspace's point of view. Currently it either points to the first (non-shared) default qdisc, or a qdisc shared between all queues. The following patches will introduce a classful dummy qdisc, which will be used as root qdisc and contain the per-queue qdiscs as children. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net_sched: remove some unnecessary checks in classful schedulersPatrick McHardy2009-09-067-67/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The class argument to the ->graft(), ->leaf(), ->dump(), ->dump_stats() all originate from either ->get() or ->walk() and are always valid. Remove unnecessary checks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net_sched: make cls_ops->change and cls_ops->delete optionalPatrick McHardy2009-09-067-85/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some schedulers don't support creating, changing or deleting classes. Make the respective callbacks optionally and consistently return -EOPNOTSUPP for unsupported operations, instead of currently either -EOPNOTSUPP, -ENOSYS or no error. In case of sch_prio and sch_multiq, the removed operations additionally checked for an invalid class. This is not necessary since the class argument can only orginate from ->get() or in case of ->change is 0 for creation of new classes, in which case ->change() incorrectly returned -ENOENT. As a side-effect, this patch fixes a possible (root-only) NULL pointer function call in sch_ingress, which didn't implement a so far mandatory ->delete() operation. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net_sched: make cls_ops->tcf_chain() optionalPatrick McHardy2009-09-063-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | Some qdiscs don't support attaching filters. Handle this centrally in cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net_sched: fix class grafting errno codesPatrick McHardy2009-09-052-11/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the parent qdisc doesn't support classes, use EOPNOTSUPP. If the parent class doesn't exist, use ENOENT. Currently EINVAL is returned in both cases. Additionally check whether grafting is supported and remove a now unnecessary graft function from sch_ingress. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of ↵David S. Miller2009-09-025-5/+5
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/yellowfin.c
| * | | netdev: convert pseudo-devices to netdev_tx_tStephen Hemminger2009-09-011-2/+2
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | pkt_sched: Fix resource limiting in pfifo_fastKrishna Kumar2009-08-311-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pfifo_fast_enqueue has this check: if (skb_queue_len(list) < qdisc_dev(qdisc)->tx_queue_len) { which allows each band to enqueue upto tx_queue_len skbs for a total of 3*tx_queue_len skbs. I am not sure if this was the intention of limiting in qdisc. Patch compiled and 32 simultaneous netperf testing ran fine. Also: # tc -s qdisc show dev eth2 qdisc pfifo_fast 0: root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 16835026752 bytes 373116 pkt (dropped 0, overlimits 0 requeues 25) rate 0bit 0pps backlog 0b 0p requeues 25 Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Speed-up pfifo_fast lookup using a private bitmapKrishna Kumar2009-08-291-22/+48
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | Maintain a per-qdisc bitmap for pfifo_fast giving availability of skbs for each band. This allows faster lookup for a skb when there are no high priority skbs. Also, it helps in (rare) cases when there are no skbs on the list, where an immediate lookup is faster than iterating through the three bands. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Avoid enqueuing skb for default qdiscsKrishna Kumar2009-08-071-36/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_queue_xmit enqueue's a skb and calls qdisc_run which dequeue's the skb and xmits it. In most cases, the skb that is enqueue'd is the same one that is dequeue'd (unless the queue gets stopped or multiple cpu's write to the same queue and ends in a race with qdisc_run). For default qdiscs, we can remove the redundant enqueue/dequeue and simply xmit the skb since the default qdisc is work-conserving. The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the default fast queue. The controversial part of the patch is incrementing qlen when a skb is requeued - this is to avoid checks like the second line below: + } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) && >> !q->gso_skb && + !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) { Results of a 2 hour testing for multiple netperf sessions (1, 2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are aggregate Mb/s across iterations tested with this version on System-X boxes with Chelsio 10gbps cards: ---------------------------------- Size | ORG BW NEW BW | ---------------------------------- 128K | 156964 159381 | 256K | 158650 162042 | ---------------------------------- Changes from ver1: 1. Move sch_direct_xmit declaration from sch_generic.h to pkt_sched.h 2. Update qdisc basic statistics for direct xmit path. 3. Set qlen to zero in qdisc_reset. 4. Changed some function names to more meaningful ones. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functionsPatrick McHardy2009-07-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is the result of an automatic spatch transformation to convert all ndo_start_xmit() return values of 0 to NETDEV_TX_OK. Some occurences are missed by the automatic conversion, those will be handled in a seperate patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tc: Fix unitialized kernel memory leakEric Dumazet2009-09-031-0/+2
| |/ |/| | | | | | | | | | | | | Three bytes of uninitialized kernel memory are currently leaked to user Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pkt_sched: Revert tasklet_hrtimer changes.David S. Miller2009-09-022-19/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are full of unresolved problems, mainly that conversions don't work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers tasklets can't be killed from softirq context. And when a qdisc gets reset, that's exactly what we need to do here. We'll work this out in the net-next-2.6 tree and if warranted we'll backport that work to -stable. This reverts the following 3 changesets: a2cb6a4dd470d7a64255a10b843b0d188416b78f ("pkt_sched: Fix bogon in tasklet_hrtimer changes.") 38acce2d7983632100a9ff3fd20295f6e34074a8 ("pkt_sched: Convert CBQ to tasklet_hrtimer.") ee5f9757ea17759e1ce5503bdae2b07e48e32af9 ("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer") Signed-off-by: David S. Miller <davem@davemloft.net>
* | pkt_sched: Fix bogon in tasklet_hrtimer changes.David S. Miller2009-08-252-2/+2
| | | | | | | | | | | | | | | | | | | | | | Reported by Stephen Rothwell, luckily it's harmless: net/sched/sch_api.c: In function 'qdisc_watchdog': net/sched/sch_api.c:460: warning: initialization from incompatible pointer type net/sched/sch_cbq.c: In function 'cbq_undelay': net/sched/sch_cbq.c:595: warning: initialization from incompatible pointer type Signed-off-by: David S. Miller <davem@davemloft.net>
* | pkt_sched: Convert CBQ to tasklet_hrtimer.David S. Miller2009-08-241-10/+13
| | | | | | | | | | | | | | | | This code expects to run in softirq context, and bare hrtimers run in hw IRQ context. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Gleixner <tglx@linutronix.de>
* | pkt_sched: Convert qdisc_watchdog to tasklet_hrtimerDavid S. Miller2009-08-231-4/+4
| | | | | | | | | | | | | | | | None of this stuff should execute in hw IRQ context, therefore use a tasklet_hrtimer so that it runs in softirq context. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Gleixner <tglx@linutronix.de>
* | net: restore gnet_stats_basic to previous definitionEric Dumazet2009-08-185-5/+5
|/ | | | | | | | | | | | | | | | | | | In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc for better SMP performance" the definition of struct gnet_stats_basic changed incompatibly, as copies of this struct are shipped to userland via netlink. Restoring old behavior is not welcome, for performance reason. Fix is to use a private structure for kernel, and teach gnet_stats_copy_basic() to convert from kernel to user land, using legacy structure (struct gnet_stats_basic) Based on a report and initial patch from Michael Spang. Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: correct off-by-one write allocations reportsEric Dumazet2009-06-181-2/+2
| | | | | | | | | | | | | commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. We need to take into account this offset when reporting sk_wmem_alloc to user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Update drops stats in act_policeJarek Poplawski2009-06-181-0/+4
| | | | | | | | | | | | Action police statistics could be misleading because drops are not shown when expected. With feedback from: Jamal Hadi Salim <hadi@cyberus.ca> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2009-06-151-0/+3
|\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c
| * cls_cgroup: Fix oops when user send improperly 'tc filter add' requestMinoru Usui2009-06-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found a bug in cls_cgroup_change() in cls_cgroup.c. cls_cgroup_change() expected tca[TCA_OPTIONS] was set from user space properly, but tc in iproute2-2.6.29-1 (which I used) didn't set it. In the current source code of tc in git, it set tca[TCA_OPTIONS]. git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git If we always use a newest iproute2 in git when we use cls_cgroup, we don't face this oops probably. But I think, kernel shouldn't panic regardless of use program's behaviour. Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2USJarek Poplawski2009-06-152-4/+4
| | | | | | | | | | | | | | | | Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS (like in PSCHED_TICKS_PER_SEC already) to avoid misleading. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: use symbolic values for ndo_start_xmit() return codesPatrick McHardy2009-06-131-1/+1
| | | | | | | | | | | | | | | | | | | | Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively. 0 (NETDEV_TX_OK) is not changed to keep the noise down, except in very few cases where its in direct proximity to one of the other values. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pkt_sched: Use PSCHED_SHIFT in PSCHED time conversionJarek Poplawski2009-06-091-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and PSCHED_NS2US() macros to enable changing this value later. Additionally use PSCHED_SHIFT in sch_hfsc SM_SHIFT and ISM_SHIFT definitions. This part of the patch is based on feedback from Patrick McHardy <kaber@trash.net>. Reported-by: Antonio Almeida <vexwek@gmail.com> Tested-by: Antonio Almeida <vexwek@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: skb->dst accessorsEric Dumazet2009-06-035-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: skb->rtable accessorEric Dumazet2009-06-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb Delete skb->rtable field Setting rtable is not allowed, just set dst instead as rtable is an alias. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2009-06-032-17/+28
|\| | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/forcedeth.c
| * net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel ↵Minoru Usui2009-06-021-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | panic when we use cls_cgroup This patch fixes a bug which unconfigured struct tcf_proto keeps chaining in tc_ctl_tfilter(), and avoids kernel panic in cls_cgroup_classify() when we use cls_cgroup. When we execute 'tc filter add', tcf_proto is allocated, initialized by classifier's init(), and chained. After it's chained, tc_ctl_tfilter() calls classifier's change(). When classifier's change() fails, tc_ctl_tfilter() does not free and keeps tcf_proto. In addition, cls_cgroup is initialized in change() not in init(). It accesses unconfigured struct tcf_proto which is chained before change(), then hits Oops. Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
| * cls_cgroup: read classid atomically in classifierPaul Menage2009-05-271-11/+11
| | | | | | | | | | | | | | | | | | | | | | Avoid reading the unsynchronized value cs->classid multiple times, since it could change concurrently from non-zero to zero; this would result in the classifier returning a positive result with a bogus (zero) classid. Signed-off-by: Paul Menage <menage@google.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: txq_trans_update() helperEric Dumazet2009-05-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would like to get rid of netdev->trans_start = jiffies; that about all net drivers have to use in their start_xmit() function, and use txq->trans_start instead. This can be done generically in core network, as suggested by David. Some devices, (particularly loopback) dont need trans_start update, because they dont have transmit watchdog. We could add a new device flag, or rely on fact that txq->tran_start can be updated is txq->xmit_lock_owner is different than -1. Use a helper function to hide our choice. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_teql: Use net_device internal statsEric Dumazet2009-05-201-12/+5
| | | | | | | | | | | | | | | | | | | | We can slightly reduce size of teqlN structure, not duplicating stats structure in teql_master but using stats field from net_device.stats for tx_errors and from netdev_queue for tx_bytes/tx_packets/tx_dropped values. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2009-05-191-2/+3
|\| | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/scsi/fcoe/fcoe.c
| * sch_teql: should not dereference skb after ndo_start_xmit()Eric Dumazet2009-05-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | It is illegal to dereference a skb after a successful ndo_start_xmit() call. We must store skb length in a local variable instead. Bug was introduced in 2.6.27 by commit 0abf77e55a2459aa9905be4b226e4729d5b4f0cb (net_sched: Add accessor function for packet length for qdiscs) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: tx scalability works : trans_startEric Dumazet2009-05-181-9/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct net_device trans_start field is a hot spot on SMP and high performance devices, particularly multi queues ones, because every transmitter dirties it. Is main use is tx watchdog and bonding alive checks. But as most devices dont use NETIF_F_LLTX, we have to lock a netdev_queue before calling their ndo_start_xmit(). So it makes sense to move trans_start from net_device to netdev_queue. Its update will occur on a already present (and in exclusive state) cache line, for free. We can do this transition smoothly. An old driver continue to update dev->trans_start, while an updated one updates txq->trans_start. Further patches could also put tx_bytes/tx_packets counters in netdev_queue to avoid dirtying dev->stats (vlan device comes to mind) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | cls_cgroup: remove unneeded cgroup_lockLi Zefan2009-05-171-6/+0
|/ | | | | | | | | | | We can remove this lock here, since we are in cgroup write handler and thus the cgrp is guaranteed to be valid, and no lock is needed when writing a u32 variable. Signed-off-by: Li Zefan <lizf@cn.fujitsuc.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-sched: fix bfifo default limitPatrick McHardy2009-05-071-1/+1
| | | | | | | | | When no limit is given, the bfifo uses a default of tx_queue_len * mtu. Packets handled by qdiscs include the link layer header, so this should be taken into account, similar to what other qdiscs do. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Only store high 16 bits of kernel generated filter prioritiesRobert Love2009-05-021-1/+1
| | | | | | | | | | | | The kernel should only be using the high 16 bits of a kernel generated priority. Filter priorities in all other cases only use the upper 16 bits of the u32 'prio' field of 'struct tcf_proto', but when the kernel generates the priority of a filter is saves all 32 bits which can result in incorrect lookup failures when a filter needs to be deleted or modified. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sch_netem: Fix an inconsistency in ingress netem timestamps.Jarek Poplawski2009-04-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Alex Sidorenko reported: "while experimenting with 'netem' we have found some strange behaviour. It seemed that ingress delay as measured by 'ping' command shows up on some hosts but not on others. After some investigation I have found that the problem is that skbuff->tstamp field value depends on whether there are any packet sniffers enabled. That is: - if any ptype_all handler is registered, the tstamp field is as expected - if there are no ptype_all handlers, the tstamp field does not show the delay" This patch prevents unnecessary update of tstamp in dev_queue_xmit_nit() on ingress path (with act_mirred) adding a check, so minimal overhead on the fast path, but only when sniffers etc. are active. Since netem at ingress seems to logically emulate a network before a host, tstamp is zeroed to trigger the update and pretend delays are from the outside. Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Tested-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netsched: Allow meta match on vlan tag on receiveStephen Hemminger2009-04-141-2/+4
| | | | | | | | | When vlan acceleration is used on receive, the vlan tag is maintained outside of the skb data. The existing vlan tag match only works on TX path because it uses vlan_get_tag which tests for VLAN_HW_TX_ACCEL. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/*: use linux/kernel.h swap()Ilpo Järvinen2009-03-211-7/+2
| | | | | | | | | tcp_sack_swap seems unnecessary so I pushed swap to the caller. Also removed comment that seemed then pointless, and added include when not already there. Compile tested. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Change misleading code in class delete.Jarek Poplawski2009-03-164-8/+20
| | | | | | | | | | | | | | | | | | | | While looking for a possible reason of bugzilla report on HTB oops: http://bugzilla.kernel.org/show_bug.cgi?id=12858 I found the code in htb_delete calling htb_destroy_class on zero refcount is very misleading: it can suggest this is a common path, and destroy is called under sch_tree_lock. Actually, this can never happen like this because before deletion cops->get() is done, and after delete a class is still used by tclass_notify. The class destroy is always called from cops->put(), so without sch_tree_lock. This doesn't mean much now (since 2.6.27) because all vulnerable calls were moved from htb_destroy_class to htb_delete, but there was a bug in older kernels. The same change is done for other classful scheds, which, it seems, didn't have similar locking problems here. Reported-by: m0sia <m0sia@m0sia.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>