linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'fec-ptp'	David S. Miller	2014-10-14	3	-17/+267
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Luwei Zhou says: ==================== Enable FEC pps feather Change from v2 to v3: -Using the default channel 0 to be PPS channel not PTP_PIN_SET/GETFUNC interface. -Using the linux definition of NSEC_PER_SEC. Change from v1 to v2: - Fix the potential 32-bit multiplication overflow issue. - Optimize the hareware adjustment code to improve efficiency as Richard suggested - Use ptp PTP_PIN_SET/GETFUNC interface to set PPS channel not device tree and add PTP_PF_PPS enumeration - Modify comments style ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fec: ptp: Enable PPS output based on ptp clock	Luwei Zhou	2014-10-14	3	-1/+205
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	FEC ptp timer has 4 channel compare/trigger function. It can be used to enable pps output. The pulse would be ouput high exactly on N second. The pulse ouput high on compare event mode is used to produce pulse per second. The pulse width would be one cycle based on ptp timer clock source.Since 31-bit ptp hardware timer is used, the timer will wrap more than 2 seconds. We need to reload the compare compare event about every 1 second. Signed-off-by: Luwei Zhou <b45643@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fec: ptp: Use hardware algorithm to adjust PTP counter.	Luwei Zhou	2014-10-14	2	-12/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The FEC IP supports hardware adjustment for ptp timer. Refer to the description of ENET_ATCOR and ENET_ATINC registers in the spec about the hardware adjustment. This patch uses hardware support to adjust the ptp offset and frequency on the slave side. Signed-off-by: Luwei Zhou <b45643@freescale.com> Signed-off-by: Frank Li <Frank.Li@freescale.com> Signed-off-by: Fugang Duan <b38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fec: ptp: Use the 31-bit ptp timer.	Luwei Zhou	2014-10-14	1	-4/+6
\|/ \| \| \| \| \| \| \| \| \| \|	When ptp switches from software adjustment to hardware ajustment, linux ptp can't converge. It is caused by the IP limit. Hardware adjustment logcial have issue when ptp counter runs over 0x80000000(31 bit counter). The internal IP reference manual already remove 32bit free-running count support. This patch replace the 32-bit PTP timer with 31-bit. Signed-off-by: Luwei Zhou <b45643@freescale.com> Signed-off-by: Frank Li <Frank.Li@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: remove aca_lock spinlock from struct ifacaddr6	Li RongQing	2014-10-14	2	-2/+0
\| \| \| \| \| \| \|	no user uses this lock. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	x86: bpf_jit: fix two bugs in eBPF JIT compiler	Alexei Starovoitov	2014-10-14	1	-6/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. JIT compiler using multi-pass approach to converge to final image size, since x86 instructions are variable length. It starts with large gaps between instructions (so some jumps may use imm32 instead of imm8) and iterates until total program size is the same as in previous pass. This algorithm works only if program size is strictly decreasing. Programs that use LD_ABS insn need additional code in prologue, but it was not emitted during 1st pass, so there was a chance that 2nd pass would adjust imm32->imm8 jump offsets to the same number of bytes as increase in prologue, which may cause algorithm to erroneously decide that size converged. Fix it by always emitting largest prologue in the first pass which is detected by oldproglen==0 check. Also change error check condition 'proglen != oldproglen' to fail gracefully. 2. while staring at the code realized that 64-byte buffer may not be enough when 1st insn is large, so increase it to 128 to avoid buffer overflow (theoretical maximum size of prologue+div is 109) and add runtime check. Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT") Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: fix ooo_okay setting vs Small Queues	Eric Dumazet	2014-10-14	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TCP Small Queues (tcp_tsq_handler()) can hold one reference on sk->sk_wmem_alloc, preventing skb->ooo_okay being set. We should relax test done to set skb->ooo_okay to take care of this extra reference. Minimal truesize of skb containing one byte of payload is SKB_TRUESIZE(1) Without this fix, we have more chance locking flows into the wrong transmit queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	skbuff: fix ftrace handling in skb_unshare	Alexander Aring	2014-10-14	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \|	If the skb is not dropped afterwards we should run consume_skb instead kfree_skb. Inside of function skb_unshare we do always a kfree_skb, doesn't depend if skb_copy failed or was successful. This patch switch this behaviour like skb_share_check, if allocation of sk_buff failed we use kfree_skb otherwise consume_skb. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	fm10k: Add skb->xmit_more support	Alexander Duyck	2014-10-14	1	-31/+34
\| \| \| \| \| \| \| \| \| \| \| \|	This change adds support for skb->xmit_more based on the changes that were made to igb to support the feature. The main changes are moving up the check for maybe_stop_tx so that we can check netif_xmit_stopped to determine if we must write the tail because we can add no further buffers. Acked-by: Matthew Vick <matthew.vick@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: fec: Fix sparse warnings with different lock contexts for basic block	Nimrod Andy	2014-10-14	1	-10/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	reproduce: make ARCH=arm C=1 2>fec.txt drivers/net/ethernet/freescale/fec_main.o cat fec.txt sparse warnings: drivers/net/ethernet/freescale/fec_main.c:2916:12: warning: context imbalance in 'fec_set_features' - different lock contexts for basic block Christopher Li suggest to change as below: if (need_lock) { lock(); do_something_real(); unlock(); } else { do_something_real(); } Reported-by: Fabio Estevam <festevam@gmail.com> Suggested-by: Christopher Li <sparse@chrisli.org> Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	MAINTAINERS: Update contact information for Vince Bridgers	Vince Bridgers	2014-10-14	1	-1/+1
\| \| \| \| \|	Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'sctp'	David S. Miller	2014-10-14	6	-89/+77
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Daniel Borkmann says: ==================== Here are some SCTP fixes. [ Note, immediate workaround would be to disable ASCONF (it is sysctl disabled by default). It is actually only used together with chunk authentication. ] ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: sctp: fix remote memory pressure from excessive queueing	Daniel Borkmann	2014-10-14	2	-26/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This scenario is not limited to ASCONF, just taken as one example triggering the issue. When receiving ASCONF probes in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------> [...] ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------> ... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed ASCONFs and have increasing serial numbers, we process such ASCONF chunk(s) marked with !end_of_packet and !singleton, since we have not yet reached the SCTP packet end. SCTP does only do verification on a chunk by chunk basis, as an SCTP packet is nothing more than just a container of a stream of chunks which it eats up one by one. We could run into the case that we receive a packet with a malformed tail, above marked as trailing JUNK. All previous chunks are here goodformed, so the stack will eat up all previous chunks up to this point. In case JUNK does not fit into a chunk header and there are no more other chunks in the input queue, or in case JUNK contains a garbage chunk header, but the encoded chunk length would exceed the skb tail, or we came here from an entirely different scenario and the chunk has pdiscard=1 mark (without having had a flush point), it will happen, that we will excessively queue up the association's output queue (a correct final chunk may then turn it into a response flood when flushing the queue ;)): I ran a simple script with incremental ASCONF serial numbers and could see the server side consuming excessive amount of RAM [before/after: up to 2GB and more]. The issue at heart is that the chunk train basically ends with !end_of_packet and !singleton markers and since commit 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") therefore preventing an output queue flush point in sctp_do_sm() -> sctp_cmd_interpreter() on the input chunk (chunk = event_arg) even though local_cork is set, but its precedence has changed since then. In the normal case, the last chunk with end_of_packet=1 would trigger the queue flush to accommodate possible outgoing bundling. In the input queue, sctp_inq_pop() seems to do the right thing in terms of discarding invalid chunks. So, above JUNK will not enter the state machine and instead be released and exit the sctp_assoc_bh_rcv() chunk processing loop. It's simply the flush point being missing at loop exit. Adding a try-flush approach on the output queue might not work as the underlying infrastructure might be long gone at this point due to the side-effect interpreter run. One possibility, albeit a bit of a kludge, would be to defer invalid chunk freeing into the state machine in order to possibly trigger packet discards and thus indirectly a queue flush on error. It would surely be better to discard chunks as in the current, perhaps better controlled environment, but going back and forth, it's simply architecturally not possible. I tried various trailing JUNK attack cases and it seems to look good now. Joint work with Vlad Yasevich. Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: sctp: fix panic on duplicate ASCONF chunks	Daniel Borkmann	2014-10-14	2	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When receiving a e.g. semi-good formed connection scan in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---------------- ASCONF_a; ASCONF_b -----------------> ... where ASCONF_a equals ASCONF_b chunk (at least both serials need to be equal), we panic an SCTP server! The problem is that good-formed ASCONF chunks that we reply with ASCONF_ACK chunks are cached per serial. Thus, when we receive a same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do not need to process them again on the server side (that was the idea, also proposed in the RFC). Instead, we know it was cached and we just resend the cached chunk instead. So far, so good. Where things get nasty is in SCTP's side effect interpreter, that is, sctp_cmd_interpreter(): While incoming ASCONF_a (chunk = event_arg) is being marked !end_of_packet and !singleton, and we have an association context, we do not flush the outqueue the first time after processing the ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it queued up, although we set local_cork to 1. Commit 2e3216cd54b1 changed the precedence, so that as long as we get bundled, incoming chunks we try possible bundling on outgoing queue as well. Before this commit, we would just flush the output queue. Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we continue to process the same ASCONF_b chunk from the packet. As we have cached the previous ASCONF_ACK, we find it, grab it and do another SCTP_CMD_REPLY command on it. So, effectively, we rip the chunk->list pointers and requeue the same ASCONF_ACK chunk another time. Since we process ASCONF_b, it's correctly marked with end_of_packet and we enforce an uncork, and thus flush, thus crashing the kernel. Fix it by testing if the ASCONF_ACK is currently pending and if that is the case, do not requeue it. When flushing the output queue we may relink the chunk for preparing an outgoing packet, but eventually unlink it when it's copied into the skb right before transmission. Joint work with Vlad Yasevich. Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks	Daniel Borkmann	2014-10-14	3	-63/+60
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for ASCONF chunk") added basic verification of ASCONF chunks, however, it is still possible to remotely crash a server by sending a special crafted ASCONF chunk, even up to pre 2.6.12 kernels: skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950 end:0x440 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: <IRQ> [<ffffffff8144fb1c>] skb_put+0x5c/0x70 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp] [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp] [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp] [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440 [<ffffffff81496ac5>] ip_rcv+0x275/0x350 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750 [<ffffffff81460588>] netif_receive_skb+0x58/0x60 This can be triggered e.g., through a simple scripted nmap connection scan injecting the chunk after the handshake, for example, ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ------------------ ASCONF; UNKNOWN ------------------> ... where ASCONF chunk of length 280 contains 2 parameters ... 1) Add IP address parameter (param length: 16) 2) Add/del IP address parameter (param length: 255) ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the Address Parameter in the ASCONF chunk is even missing, too. This is just an example and similarly-crafted ASCONF chunks could be used just as well. The ASCONF chunk passes through sctp_verify_asconf() as all parameters passed sanity checks, and after walking, we ended up successfully at the chunk end boundary, and thus may invoke sctp_process_asconf(). Parameter walking is done with WORD_ROUND() to take padding into account. In sctp_process_asconf()'s TLV processing, we may fail in sctp_process_asconf_param() e.g., due to removal of the IP address that is also the source address of the packet containing the ASCONF chunk, and thus we need to add all TLVs after the failure to our ASCONF response to remote via helper function sctp_add_asconf_response(), which basically invokes a sctp_addto_chunk() adding the error parameters to the given skb. When walking to the next parameter this time, we proceed with ... length = ntohs(asconf_param->param_hdr.length); asconf_param = (void )asconf_param + length; ... instead of the WORD_ROUND()'ed length, thus resulting here in an off-by-one that leads to reading the follow-up garbage parameter length of 12336, and thus throwing an skb_over_panic for the reply when trying to sctp_addto_chunk() next time, which implicitly calls the skb_put() with that length. Fix it by using sctp_walk_params() [ which is also used in INIT parameter processing ] macro in the verification and* in ASCONF processing: it will make sure we don't spill over, that we walk parameters WORD_ROUND()'ed. Moreover, we're being more defensive and guard against unknown parameter types and missized addresses. Joint work with Vlad Yasevich. Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	phy/micrel: KSZ8031RNL RMII clock reconfiguration bug	Bruno Thomsen	2014-10-14	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bug: Unable to send and receive Ethernet packets with Micrel PHY. Affected devices: KSZ8031RNL (commercial temp) KSZ8031RNLI (industrial temp) Description: PHY device is correctly detected during probe. PHY power-up default is 25MHz crystal clock input and output 50MHz RMII clock to MAC. Reconfiguration of PHY to input 50MHz RMII clock from MAC causes PHY to become unresponsive if clock source is changed after Operation Mode Strap Override (OMSO) register setup. Cause: Long lead times on parts where clock setup match circuit design forces the usage of similar parts with wrong default setup. Solution: Swapped KSZ8031 register setup and added phy_write return code validation. Tested with Freescale i.MX28 Fast Ethernet Controler (fec). Signed-off-by: Bruno Thomsen <bth@kamstrup.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'bcmgenet_systemport'	David S. Miller	2014-10-10	3	-7/+9
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Florian Fainelli says: ==================== net: bcmgenet & systemport fixes This patch series fixes an off-by-one error introduced during a previous change, and the two other fixes fix a wake depth imbalance situation for the Wake-on-LAN interrupt line. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: systemport: avoid unbalanced enable_irq_wake calls	Florian Fainelli	2014-10-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Multiple enable_irq_wake() calls will keep increasing the IRQ wake_depth, which ultimately leads to the following types of situation: 1) enable Wake-on-LAN interrupt w/o password 2) enable Wake-on-LAN interrupt w/ password 3) enable Wake-on-LAN interrupt w/o password 4) disable Wake-on-LAN interrupt After step 4), SYSTEMPORT would always wake-up the system no matter what wake-up device we use, which is not what we want. Fix this by making sure there are no unbalanced enable_irq_wake() calls. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: bcmgenet: avoid unbalanced enable_irq_wake calls	Florian Fainelli	2014-10-10	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Multiple enable_irq_wake() calls will keep increasing the IRQ wake_depth, which ultimately leads to the following types of situation: 1) enable Wake-on-LAN interrupt w/o password 2) enable Wake-on-LAN interrupt w/ password 3) enable Wake-on-LAN interrupt w/o password 4) disable Wake-on-LAN interrupt After step 4), GENET would always wake-up the system no matter what wake-up device we use, which is not what we want. Fix this by making sure there are no unbalanced enable_irq_wake() calls. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: bcmgenet: fix off-by-one in incrementing read pointer	Florian Fainelli	2014-10-10	1	-5/+4
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit b629be5c8399d7c423b92135eb43a86c924d1cbc ("net: bcmgenet: check harder for out of memory conditions") moved the increment of the local read pointer before reading from the hardware descriptor using dmadesc_get_length_status(), which creates an off-by-one situation. Fix this by moving again the read_ptr increment after we have read the hardware descriptor to get both the control block and the read pointer back in sync. Fixes: b629be5c8399 ("net: bcmgenet: check harder for out of memory conditions") Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'net-drivers-pgcnt'	David S. Miller	2014-10-10	5	-23/+30
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eric Dumazet says: ==================== net: fix races accessing page->_count This is illegal to use atomic_set(&page->_count, ...) even if we 'own' the page. Other entities in the kernel need to use get_page_unless_zero() to get a reference to the page before testing page properties, so we could loose a refcount increment. The only case it is valid is when page->_count is 0, we can use this in __netdev_alloc_frag() Note that I never seen crashes caused by these races, the issue was reported by Andres Lagar-Cavilla and Hugh Dickins. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fix races in page->_count manipulation	Eric Dumazet	2014-10-10	1	-7/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is illegal to use atomic_set(&page->_count, ...) even if we 'own' the page. Other entities in the kernel need to use get_page_unless_zero() to get a reference to the page before testing page properties, so we could loose a refcount increment. The only case it is valid is when page->_count is 0 Fixes: 540eb7bf0bbed ("net: Update alloc frag to reduce get/put page usage and recycle pages") Signed-off-by: Eric Dumaze <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlx4: fix race accessing page->_count	Eric Dumazet	2014-10-10	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is illegal to use atomic_set(&page->_count, ...) even if we 'own' the page. Other entities in the kernel need to use get_page_unless_zero() to get a reference to the page before testing page properties, so we could loose a refcount increment. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ixgbe: fix race accessing page->_count	Eric Dumazet	2014-10-10	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is illegal to use atomic_set(&page->_count, 2) even if we 'own' the page. Other entities in the kernel need to use get_page_unless_zero() to get a reference to the page before testing page properties, so we could loose a refcount increment. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	igb: fix race accessing page->_count	Eric Dumazet	2014-10-10	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is illegal to use atomic_set(&page->_count, 2) even if we 'own' the page. Other entities in the kernel need to use get_page_unless_zero() to get a reference to the page before testing page properties, so we could loose a refcount increment. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	fm10k: fix race accessing page->_count	Eric Dumazet	2014-10-10	1	-4/+3
\|/ \| \| \| \| \| \| \| \| \| \|	This is illegal to use atomic_set(&page->_count, 2) even if we 'own' the page. Other entities in the kernel need to use get_page_unless_zero() to get a reference to the page before testing page properties, so we could loose a refcount increment. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/phy: micrel: Add clock support for KSZ8021/KSZ8031	Sascha Hauer	2014-10-10	3	-2/+36
\| \| \| \| \| \| \| \| \| \| \|	The KSZ8021 and KSZ8031 support RMII reference input clocks of 25MHz and 50MHz. Both PHYs differ in the default frequency they expect after reset. If this differs from the actual input clock, then register 0x1f bit 7 must be changed. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	flow-dissector: Fix alignment issue in __skb_flow_get_ports	Alexander Duyck	2014-10-10	1	-13/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch addresses a kernel unaligned access bug seen on a sparc64 system with an igb adapter. Specifically the __skb_flow_get_ports was returning a be32 pointer which was then having the value directly returned. In order to prevent this it is actually easier to simply not populate the ports or address values when an skb is not present. In this case the assumption is that the data isn't needed and rather than slow down the faster aligned accesses by making them have to assume the unaligned path on architectures that don't support efficent unaligned access it makes more sense to simply switch off the bits that were copying the source and destination address/port for the case where we only care about the protocol types and lengths which are normally 16 bit fields anyway. Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: filter: fix the comments	Li RongQing	2014-10-10	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \|	1. sk_run_filter has been renamed, sk_filter() is using SK_RUN_FILTER. 2. Remove wrong comments about storing intermediate value. 3. replace sk_run_filter with __bpf_prog_run for check_load_and_stores's comments Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Documentation: replace __sk_run_filter with __bpf_prog_run	Li RongQing	2014-10-10	1	-2/+2
\| \| \| \| \| \| \| \|	__sk_run_filter has been renamed as __bpf_prog_run, so replace them in comments Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'macvlan'	David S. Miller	2014-10-10	1	-8/+13
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jason Baron says: ==================== macvlan: optimize receive path So after porting this optimization to net-next, I found that the netperf results of TCP_RR regress right at the maximum peak of transactions/sec. That is as I increase the number of threads via the first argument to super_netperf, the number of transactions/sec keep increasing, peak, and then start decreasing. It is right at the peak, that I see a small regression with this patch (see results in patch 2/2). Without the patch, the ksoftirqd threads are the top cpu consumers threads on the system, since the extra 'netif_rx()', is queuing more softirq work, whereas with the patch, the ksoftirqd threads are below all of the 'netserver' threads in terms of their cpu usage. So there appears to be some interaction between how softirqs are serviced at the peak here and this patch. I think the test results are still supportive of this approach, but I wanted to be clear on my findings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	macvlan: optimize the receive path	jbaron@akamai.com	2014-10-10	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The netif_rx() call on the fast path of macvlan_handle_frame() appears to be there to ensure that we properly throttle incoming packets. However, it would appear as though the proper throttling is already in place for all possible ingress paths, and that the call is redundant. If packets are arriving from the physical NIC, we've already throttled them by this point. Otherwise, if they are coming via macvlan_queue_xmit(), it calls either 'dev_forward_skb()', which ends up calling netif_rx_internal(), or else in the broadcast case, we are throttling via macvlan_broadcast_enqueue(). The test results below are from off the box to an lxc instance running macvlan. Once the tranactions/sec stop increasing, the cpu idle time has gone to 0. Results are from a quad core Intel E3-1270 V2@3.50GHz box with bnx2x 10G card. for i in {10,100,200,300,400,500}; do super_netperf $i -H $ip -t TCP_RR; done Average of 5 runs. trans/sec trans/sec (3.17-rc7-net-next) (3.17-rc7-net-next + this patch) ---------- ---------- 208101 211534 (+1.6%) 839493 850162 (+1.3%) 845071 844053 (-.12%) 816330 819623 (+.4%) 778700 789938 (+1.4%) 735984 754408 (+2.5%) Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	macvlan: pass 'bool' type to macvlan_count_rx()	jbaron@akamai.com	2014-10-10	1	-3/+3
\|/ \| \| \| \| \| \|	Pass last argument to macvlan_count_rx() as the correct bool type. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'xgene'	David S. Miller	2014-10-10	12	-75/+566
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Iyappan Subramanian says: ==================== Add 10GbE support to APM X-Gene SoC ethernet driver Adding 10GbE support to APM X-Gene SoC ethernet driver. v4: Address comments from v3 * dtb: resolved merge conflict for the net tree v3: Address comments from v2 * dtb: changed to use all-zeros for the mac address v2: Address comments from v1 * created preparatory patch to review before adding new functionality * dtb: updated to use tabs consistently v1: * Initial version ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Add 10GbE ethtool support	Iyappan Subramanian	2014-10-10	1	-6/+22
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Add 10GbE support	Iyappan Subramanian	2014-10-10	6	-30/+438
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Added 10GbE support - Removed unused macros/variables - Moved mac_init call to the end of hardware init Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: xgene: Preparing for adding 10GbE support	Iyappan Subramanian	2014-10-10	4	-41/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Rearranged code to pave the way for adding 10GbE support - Added mac_ops structure containing function pointers for mac specific functions - Added port_ops structure containing function pointers for port specific functions Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	dtb: Add 10GbE node to APM X-Gene SoC device tree	Iyappan Subramanian	2014-10-10	2	-2/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added 10GbE interface and clock nodes. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Documentation: dts: Update section header for APM X-Gene	Iyappan Subramanian	2014-10-10	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	MAINTAINERS: Update APM X-Gene section	Iyappan Subramanian	2014-10-10	1	-1/+0
\|/ \| \| \| \| \| \| \|	Updated APM X-Gene ethernet driver maintainers list. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: bpf: fix bpf syscall dependence on anon_inodes	Alexei Starovoitov	2014-10-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	minimal configurations where EPOLL, PERF_EVENTS, etc are disabled, but NET is enabled, are failing to build with link error: kernel/built-in.o: In function `bpf_prog_load': syscall.c:(.text+0x3b728): undefined reference to `anon_inode_getfd' fix it by selecting ANON_INODES when NET is enabled Reported-by: Michal Sojka <sojkam1@fel.cvut.cz> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next	David S. Miller	2014-10-10	3	-162/+7
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pablo Neira Ayuso says: ==================== Netfilter fixes for net-next This batch contains two fixes for what you have in your net-next, they are: 1) Remove nf_send_reset6() from header file. This function now resides in the nf_reject_ipv6 module. Reported by Eric Dumazet. 2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix errors reported by Dan Carpenter's static analysis tools. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netfilter: fix wrong arithmetics regarding NFT_REJECT_ICMPX_MAX	Pablo Neira Ayuso	2014-10-07	2	-7/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NFT_REJECT_ICMPX_MAX should be __NFT_REJECT_ICMPX_MAX - 1. nft_reject_icmp_code() and nft_reject_icmpv6_code() are called from the packet path, so BUG_ON in case we try to access an unknown abstracted ICMP code. This should not happen since we already validate this from nft_reject_{inet,bridge}_init(). Fixes: 51b0a5d ("netfilter: nft_reject: introduce icmp code abstraction for inet and bridge") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| *	netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h	Pablo Neira Ayuso	2014-10-07	1	-155/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nf_send_reset6() now resides in net/ipv6/netfilter/nf_reject_ipv6.c Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Eric Dumazet <edumazet@google.com>
* \|	Merge tag 'master-2014-10-08' of ↵	David S. Miller	2014-10-10	10	-42/+69
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-10-09 Please pull this batch of fixes intended for the 3.18 stream! Andrea Merello makes rtl818x_pci use a more reasonable transmission rate for HW generated frames. Fabian Frederick tweaks some kernel-doc bits to avoid warnings. Larry Finger corrects a possible unaligned access in the rtlwifi code. Marek Puzyniak avoids a kernel panic in ath9k_hw_reset. Sujith Manoharan goes for the hat trick -- he fixes a smatch warning in the shared ath code, he fixes a crash in ath9k, and he corrects a sequence number assignment problem in ath9k too. For ease of merging, I pulled the last bits of the wireless tree as well... Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	rtlwifi: Fix possible unaligned array in ether_addr_copy()	Larry Finger	2014-10-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two macros used to copy BSSID information use ether_addr_copy(), thus the arrays must be 2-byte aligned. In one case, the array could become unaligned if the struct containing it were changed. Use the __unaligned(2) attribute to retain the necessary alignment. In addition, the magic number used to specify the size of the array is replaced by ETH_ALEN. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| * \|	ath9k: Fix sequence number assignment	Sujith Manoharan	2014-10-08	4	-23/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, ath9k uses a global counter for all frames that need to be assigned a sequence number. QoS-data frames are handled properly since they have a per-tid counter. But, beacons and other management frames use the same counter even if multiple interfaces or contexts are present. Fix this issue by making the counter per-interface and using it when mac80211 sets IEEE80211_TX_CTL_ASSIGN_SEQ. Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| * \|	net: rfkill: kernel-doc warning fixes	Fabian Frederick	2014-10-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	s/state/blocked Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| * \|	ath9k_htc: avoid kernel panic in ath9k_hw_reset	Marek Puzyniak	2014-10-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	hw pointer of ath_hw is not assigned to proper value in function ath9k_hw_reset what finally causes kernel panic. This can be solved by proper initialization of ath_hw in ath9k_init_priv. Signed-off-by: Marek Puzyniak <marek.puzyniak@tieto.com> Acked-by: Oleksij Rempel <linux@rempel-privat.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| * \|	ath9k: Fix crash in MCC mode	Sujith Manoharan	2014-10-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a channel context is removed, the hw_queue_base is set to -1, this will result in a panic because ath9k_chanctx_stop_queues() can be called on an interface that is not assigned to any context yet - for example, when trying to scan. Fix this issue by setting the hw_queue_base to zero when a channel context is removed. Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>