summaryrefslogtreecommitdiffstats
path: root/net/rxrpc (follow)
Commit message (Collapse)AuthorAgeFilesLines
* net: Pass kern from net_proto_family.create to sk_allocEric W. Biederman2015-05-111-1/+1
| | | | | | | | | In preparation for changing how struct net is refcounted on kernel sockets pass the knowledge that we are creating a kernel socket from sock_create_kern through to sk_alloc. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add a struct net parameter to sock_create_kernEric W. Biederman2015-05-111-2/+2
| | | | | | | | This is long overdue, and is part of cleaning up how we allocate kernel sockets that don't reference count struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* new helper: msg_data_left()Al Viro2015-04-111-10/+9
| | | | | | convert open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge remote-tracking branch 'dh/afs' into for-davemAl Viro2015-04-114-29/+148
|\
| * RxRPC: Handle VERSION Rx protocol packetsDavid Howells2015-04-013-1/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle VERSION Rx protocol packets. We should respond to a VERSION packet with a string indicating the Rx version. This is a maximum of 64 characters and is padded out to 65 chars with NUL bytes. Note that other AFS clients use the version request as a NAT keepalive so we need to handle it rather than returning an abort. The standard formulation seems to be: <project> <version> built <yyyy>-<mm>-<dd> for example: " OpenAFS 1.6.2 built 2013-05-07 " (note the three extra spaces) as obtained with: rxdebug grand.mit.edu -version from the openafs package. Signed-off-by: David Howells <dhowells@redhat.com>
| * RxRPC: Use iov_iter_count() in rxrpc_send_data() instead of the len argumentDavid Howells2015-04-011-13/+11
| | | | | | | | | | | | | | Use iov_iter_count() in rxrpc_send_data() to get the remaining data length instead of using the len argument as the len argument is now redundant. Signed-off-by: David Howells <dhowells@redhat.com>
| * RxRPC: Don't call skb_add_data() if there's no data to copyDavid Howells2015-04-011-19/+19
| | | | | | | | | | | | | | Don't call skb_add_data() in rxrpc_send_data() if there's no data to copy and also skip the calculations associated with it in such a case. Signed-off-by: David Howells <dhowells@redhat.com>
| * RxRPC: Fix the conversion to iov_iterDavid Howells2015-04-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit: commit af2b040e470b470bfc881981db3c796072853eae Author: Al Viro <viro@zeniv.linux.org.uk> Date: Thu Nov 27 21:44:24 2014 -0500 Subject: rxrpc: switch rxrpc_send_data() to iov_iter primitives incorrectly changes a do-while loop into a while loop in rxrpc_send_data(). Unfortunately, at least one pass through the loop is required - even if there is no data - so that the packet the closes the send phase can be sent if MSG_MORE is not set. Signed-off-by: David Howells <dhowells@redhat.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-03-201-1/+1
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/emulex/benet/be_main.c net/core/sysctl_net_core.c net/ipv4/inet_diag.c The be_main.c conflict resolution was really tricky. The conflict hunks generated by GIT were very unhelpful, to say the least. It split functions in half and moved them around, when the real actual conflict only existed solely inside of one function, that being be_map_pci_bars(). So instead, to resolve this, I checked out be_main.c from the top of net-next, then I applied the be_main.c changes from 'net' since the last time I merged. And this worked beautifully. The inet_diag.c and sysctl_net_core.c conflicts were simple overlapping changes, and were easily to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
| * rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()Al Viro2015-03-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [I would really like an ACK on that one from dhowells; it appears to be quite straightforward, but...] MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit in there. It gets passed via flags; in fact, another such check in the same function is done correctly - as flags & MSG_PEEK. It had been that way (effectively disabled) for 8 years, though, so the patch needs beating up - that case had never been tested. If it is correct, it's -stable fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-03-101-2/+2
|\| | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/cadence/macb.c Overlapping changes in macb driver, mostly fixes and cleanups in 'net' overlapping with the integration of at91_ether into macb in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ip: fix error queue empty skb handlingWillem de Bruijn2015-03-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reading from the error queue, msg_name and msg_control are only populated for some errors. A new exception for empty timestamp skbs added a false positive on icmp errors without payload. `traceroute -M udpconn` only displayed gateways that return payload with the icmp error: the embedded network headers are pulled before sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise. Fix this regression by refining when msg_name and msg_control branches are taken. The solutions for the two fields are independent. msg_name only makes sense for errors that configure serr->port and serr->addr_offset. Test the first instead of skb->len. This also fixes another issue. saddr could hold the wrong data, as serr->addr_offset is not initialized in some code paths, pointing to the start of the network header. It is only valid when serr->port is set (non-zero). msg_control support differs between IPv4 and IPv6. IPv4 only honors requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The skb->len test can simply be removed, because skb->dev is also tested and never true for empty skbs. IPv6 honors requests for all errors aside from local errors and timestamps on empty skbs. In both cases, make the policy more explicit by moving this logic to a new function that decides whether to process msg_control and that optionally prepares the necessary fields in skb->cb[]. After this change, the IPv4 and IPv6 paths are more similar. The last case is rxrpc. Here, simply refine to only match timestamps. Fixes: 49ca0d8bfaf3 ("net-timestamp: no-payload option") Reported-by: Jan Niehusmann <jan@gondor.com> Signed-off-by: Willem de Bruijn <willemb@google.com> ---- Changes v1->v2 - fix local origin test inversion in ip6_datagram_support_cmsg - make v4 and v6 code paths more similar by introducing analogous ipv4_datagram_support_cmsg - fix compile bug in rxrpc Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-03-041-4/+5
|\| | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/rocker/rocker.c The rocker commit was two overlapping changes, one to rename the ->vport member to ->pport, and another making the bitmask expression use '1ULL' instead of plain '1'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * rxrpc: don't multiply with HZ twiceFlorian Westphal2015-03-011-1/+1
| | | | | | | | | | | | | | rxrpc_resend_timeout has an initial value of 4 * HZ; use it as-is. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rxrpc: terminate retrans loop when sending of skb failsFlorian Westphal2015-03-011-3/+4
| | | | | | | | | | | | | | | | | | | | | | Typo, 'stop' is never set to true. Seems intent is to not attempt to retransmit more packets after sendmsg returns an error. This change is based on code inspection only. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Remove iocb argument from sendmsg and recvmsgYing Xue2015-03-024-24/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: rxrpc: change call to sock_recv_ts_and_drops() on rxrpc recvmsg to ↵Eyal Birger2015-03-021-1/+1
|/ | | | | | | | | | | | | | | | | | | | sock_recv_timestamp() Commit 3b885787ea4112 ("net: Generalize socket rx gap / receive queue overflow cmsg") allowed receiving packet dropcount information as a socket level option. RXRPC sockets recvmsg function was changed to support this by calling sock_recv_ts_and_drops() instead of sock_recv_timestamp(). However, protocol families wishing to receive dropcount should call sock_queue_rcv_skb() or set the dropcount specifically (as done in packet_rcv()). This was not done for rxrpc and thus this feature never worked on these sockets. Formalizing this by not calling sock_recv_ts_and_drops() in rxrpc as part of an effort to move skb->dropcount into skb->cb[] Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-davem' of ↵David S. Miller2015-02-051-36/+10
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs More iov_iter work from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>
| * rxrpc: make the users of rxrpc_kernel_send_data() set kvec-backed msg_iter ↵Al Viro2015-02-041-3/+0
| | | | | | | | | | | | | | | | | | | | properly Use iov_iter_kvec() there, get rid of set_fs() games - now that rxrpc_send_data() uses iov_iter primitives, it'll handle ITER_KVEC just fine. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * rxrpc: switch rxrpc_send_data() to iov_iter primitivesAl Viro2015-02-041-33/+10
| | | | | | | | | | | | | | | | Convert skb_add_data() to iov_iter; allows to get rid of the explicit messing with iovec in its only caller - skb_add_data() will keep advancing ->msg_iter for us, so there's no need to similate that manually. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | net-timestamp: no-payload optionWillem de Bruijn2015-02-031-0/+5
|/ | | | | | | | | | | | | | | | | | | Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit timestamps, this loops timestamps on top of empty packets. Doing so reduces the pressure on SO_RCVBUF. Payload inspection and cmsg reception (aside from timestamps) are no longer possible. This works together with a follow on patch that allows administrators to only allow tx timestamping if it does not loop payload or metadata. Signed-off-by: Willem de Bruijn <willemb@google.com> ---- Changes (rfc -> v1) - add documentation - remove unnecessary skb->len test (thanks to Richard Cochran) Signed-off-by: David S. Miller <davem@davemloft.net>
* net: introduce helper macro for_each_cmsghdrGu Zheng2014-12-111-1/+1
| | | | | | | | Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating cmsghdr from msghdr, just cleanup. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* put iov_iter into msghdrAl Viro2014-12-091-5/+3
| | | | | | | | Note that the code _using_ ->msg_iter at that point will be very unhappy with anything other than unshifted iovec-backed iov_iter. We still need to convert users to proper primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* net: Add and use skb_copy_datagram_msg() helper.David S. Miller2014-11-051-1/+1
| | | | | | | | | | | | | | | This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'next' of ↵Linus Torvalds2014-10-121-2/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris. Mostly ima, selinux, smack and key handling updates. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits) integrity: do zero padding of the key id KEYS: output last portion of fingerprint in /proc/keys KEYS: strip 'id:' from ca_keyid KEYS: use swapped SKID for performing partial matching KEYS: Restore partial ID matching functionality for asymmetric keys X.509: If available, use the raw subjKeyId to form the key description KEYS: handle error code encoded in pointer selinux: normalize audit log formatting selinux: cleanup error reporting in selinux_nlmsg_perm() KEYS: Check hex2bin()'s return when generating an asymmetric key ID ima: detect violations for mmaped files ima: fix race condition on ima_rdwr_violation_check and process_measurement ima: added ima_policy_flag variable ima: return an error code from ima_add_boot_aggregate() ima: provide 'ima_appraise=log' kernel option ima: move keyring initialization to ima_init() PKCS#7: Handle PKCS#7 messages that contain no X.509 certs PKCS#7: Better handling of unsupported crypto KEYS: Overhaul key identification when searching for asymmetric keys KEYS: Implement binary asymmetric key ID handling ...
| * KEYS: Remove key_type::match in favour of overriding default by match_preparseDavid Howells2014-09-161-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A previous patch added a ->match_preparse() method to the key type. This is allowed to override the function called by the iteration algorithm. Therefore, we can just set a default that simply checks for an exact match of the key description with the original criterion data and allow match_preparse to override it as needed. The key_type::match op is then redundant and can be removed, as can the user_match() function. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-09-231-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/mips/net/bpf_jit.c drivers/net/can/flexcan.c Both the flexcan and MIPS bpf_jit conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | RxRPC: Fix missing __user annotationDavid Howells2014-09-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix a missing __user annotation in a cast of a user space pointer (found by checker). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sock: deduplicate errqueue dequeueWillem de Bruijn2014-09-021-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk->sk_error_queue is dequeued in four locations. All share the exact same logic. Deduplicate. Also collapse the two critical sections for dequeue (at the top of the recv handler) and signal (at the bottom). This moves signal generation for the next packet forward, which should be harmless. It also changes the behavior if the recv handler exits early with an error. Previously, a signal for follow-up packets on the errqueue would then not be scheduled. The new behavior, to always signal, is arguably a bug fix. For rxrpc, the change causes the same function to be called repeatedly for each queued packet (because the recv handler == sk_error_report). It is likely that all packets will fail for the same reason (e.g., memory exhaustion). This code runs without sk_lock held, so it is not safe to trust that sk->sk_err is immutable inbetween releasing q->lock and the subsequent test. Introduce int err just to avoid this potential race. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: remove dead code after sk_data_ready changeEric Dumazet2014-08-231-8/+1
|/ / | | | | | | | | | | | | | | | | As a followup to commit 676d23690fb ("net: Fix use after free by removing length arg from sk_data_ready callbacks"), we can remove some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2014-08-061-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Highlights: 1) Steady transitioning of the BPF instructure to a generic spot so all kernel subsystems can make use of it, from Alexei Starovoitov. 2) SFC driver supports busy polling, from Alexandre Rames. 3) Take advantage of hash table in UDP multicast delivery, from David Held. 4) Lighten locking, in particular by getting rid of the LRU lists, in inet frag handling. From Florian Westphal. 5) Add support for various RFC6458 control messages in SCTP, from Geir Ola Vaagland. 6) Allow to filter bridge forwarding database dumps by device, from Jamal Hadi Salim. 7) virtio-net also now supports busy polling, from Jason Wang. 8) Some low level optimization tweaks in pktgen from Jesper Dangaard Brouer. 9) Add support for ipv6 address generation modes, so that userland can have some input into the process. From Jiri Pirko. 10) Consolidate common TCP connection request code in ipv4 and ipv6, from Octavian Purdila. 11) New ARP packet logger in netfilter, from Pablo Neira Ayuso. 12) Generic resizable RCU hash table, with intial users in netlink and nftables. From Thomas Graf. 13) Maintain a name assignment type so that userspace can see where a network device name came from (enumerated by kernel, assigned explicitly by userspace, etc.) From Tom Gundersen. 14) Automatic flow label generation on transmit in ipv6, from Tom Herbert. 15) New packet timestamping facilities from Willem de Bruijn, meant to assist in measuring latencies going into/out-of the packet scheduler, latency from TCP data transmission to ACK, etc" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits) cxgb4 : Disable recursive mailbox commands when enabling vi net: reduce USB network driver config options. tg3: Modify tg3_tso_bug() to handle multiple TX rings amd-xgbe: Perform phy connect/disconnect at dev open/stop amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask net: sun4i-emac: fix memory leak on bad packet sctp: fix possible seqlock seadlock in sctp_packet_transmit() Revert "net: phy: Set the driver when registering an MDIO bus device" cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine team: Simplify return path of team_newlink bridge: Update outdated comment on promiscuous mode net-timestamp: ACK timestamp for bytestreams net-timestamp: TCP timestamping net-timestamp: SCHED timestamp on entering packet scheduler net-timestamp: add key to disambiguate concurrent datagrams net-timestamp: move timestamp flags out of sk_flags net-timestamp: extend SCM_TIMESTAMPING ancillary data struct cxgb4i : Move stray CPL definitions to cxgb4 driver tcp: reduce spurious retransmits due to transient SACK reneging qlcnic: Initialize dcbnl_ops before register_netdev ...
| * net/rxrpc/ar-key.c: drop negativity check on unsigned valueAndrey Utkin2014-07-211-1/+1
| | | | | | | | | | | | | | Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=80611 Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | KEYS: RxRPC: Use key preparsingDavid Howells2014-07-221-68/+97
|/ | | | | | | | | Make use of key preparsing in the RxRPC protocol so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com>
* af_rxrpc: Fix XDR length check in rxrpc key demarshalling.Nathaniel W Filardo2014-05-161-1/+1
| | | | | | | | | | There may be padding on the ticket contained in the key payload, so just ensure that the claimed token length is large enough, rather than exactly the right size. Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix use after free by removing length arg from sk_data_ready callbacks.David S. Miller2014-04-112-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
* af_rxrpc: Keep rxrpc_call pointers in a hashtableTim Smith2014-03-043-106/+277
| | | | | | | | | | | | Keep track of rxrpc_call structures in a hashtable so they can be found directly from the network parameters which define the call. This allows incoming packets to be routed directly to a call without walking through hierarchy of peer -> transport -> connection -> call and all the spinlocks that that entailed. Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
* af_rxrpc: Request an ACK for every alternate DATA packetDavid Howells2014-02-261-2/+6
| | | | | | | | | | | | | | | | | | | | Set the RxRPC header flag to request an ACK packet for every odd-numbered DATA packet unless it's the last one (which implicitly requests an ACK anyway). This is similar to how librx appears to work. If we don't do this, we'll send out a full window of packets and then just sit there until the other side gets bored and sends an ACK to indicate that it's been idle for a while and has received no new packets. Requesting a lot of ACKs shouldn't be a problem as ACKs should be merged when possible. As AF_RXRPC currently works, it will schedule an ACK to be generated upon receipt of a DATA packet with the ACK-request packet set - and in the time taken to schedule this in a work queue, several other packets are likely to arrive and then all get ACK'd together. Signed-off-by: David Howells <dhowells@redhat.com>
* af_rxrpc: Expose more RxRPC parameters via sysctlsDavid Howells2014-02-264-4/+60
| | | | | | | | | Expose RxRPC parameters via sysctls to control the Rx window size, the Rx MTU maximum size and the number of packets that can be glued into a jumbo packet. More info added to Documentation/networking/rxrpc.txt. Signed-off-by: David Howells <dhowells@redhat.com>
* af_rxrpc: Improve ACK productionDavid Howells2014-02-263-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve ACK production by the following means: (1) Don't send an ACK_REQUESTED ack immediately even if the RXRPC_MORE_PACKETS flag isn't set on a data packet that has also has RXRPC_REQUEST_ACK set. MORE_PACKETS just means that the sender just emptied its Tx data buffer. More data will be forthcoming unless RXRPC_LAST_PACKET is also flagged. It is possible to see runs of DATA packets with MORE_PACKETS unset that aren't waiting for an ACK. It is therefore better to wait a small instant to see if we can combine an ACK for several packets. (2) Don't send an ACK_IDLE ack immediately unless we're responding to the terminal data packet of a call. Whilst sending an ACK_IDLE mid-call serves to let the other side know that we won't be asking it to resend certain Tx buffers and that it can discard them, spamming it with loads of acks just because we've temporarily run out of data just distracts it. (3) Put the ACK_IDLE ack generation timeout up to half a second rather than a single jiffy. Just because we haven't been given more data immediately doesn't mean that more isn't forthcoming. The other side may be busily finding the data to send to us. Signed-off-by: David Howells <dhowells@redhat.com>
* af_rxrpc: Add sysctls for configuring RxRPC parametersDavid Howells2014-02-2611-27/+211
| | | | | | | | | | | Add sysctls for configuring RxRPC protocol handling, specifically controls on delays before ack generation, the delay before resending a packet, the maximum lifetime of a call and the expiration times of calls, connections and transports that haven't been recently used. More info added in Documentation/networking/rxrpc.txt. Signed-off-by: David Howells <dhowells@redhat.com>
* af_rxrpc: Fix UDP MTU calculation from ICMP_FRAG_NEEDEDDavid Howells2014-02-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | AF_RXRPC sends UDP packets with the "Don't Fragment" bit set in an attempt to determine the maximum packet size between the local socket and the peer by invoking the generation of ICMP_FRAG_NEEDED packets. Once a packet is sent with the "Don't Fragment" bit set, it is then inconvenient to break it up as that requires recalculating all the rxrpc serial and sequence numbers and reencrypting all the fragments, so we switch off the "Don't Fragment" service temporarily and send the bounced packet again. Future packets then use the new MTU. That's all fine. The problem lies in rxrpc_UDP_error_report() where the code that deals with ICMP_FRAG_NEEDED packets lives. Packets of this type have a field (ee_info) to indicate the maximum packet size at the reporting node - but sometimes ee_info isn't filled in and is just left as 0 and the code must allow for this. When ee_info is 0, the code should take the MTU size we're currently using and reduce it for the next packet we want to send. However, it takes ee_info (which is known to be 0) and tries to reduce that instead. This was discovered by Coverity. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
* af_rxrpc: Prevent RxRPC peers from ABORT-storming one anotherTim Smith2014-02-071-4/+8
| | | | | | | | | | | | When an ABORT is sent, aborting a connection, the sender quite reasonably forgets about the connection. If another frame is received, another ABORT will be sent. When the receiver gets it, it no longer applies to an extant connection, so an ABORT is sent, and so on... Prevent this by never sending a rejection for an ABORT packet. Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
* af_rxrpc: Remove incorrect checksum calculation from rxrpc_recvmsg()Tim Smith2014-02-071-24/+1
| | | | | | | | | | | | | | The UDP checksum was already verified in rxrpc_data_ready() - which calls skb_checksum_complete() - as the RxRPC packet header contains no checksum of its own. Subsequent calls to skb_copy_and_csum_datagram_iovec() are thus redundant and are, in any case, being passed only a subset of the UDP payload - so the checksum will always fail if that path is taken. So there is no need to check skb->ip_summed in rxrpc_recvmsg(), and no need for the csum_copy_error: exit path. Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
* Merge tag 'rxrpc-20140126' of ↵David S. Miller2014-01-292-1/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== RxRPC fixes Here are some small AF_RXRPC fixes. (1) Fix a place where a spinlock is taken conditionally but is released unconditionally. (2) Fix a double-free that happens when cleaning up on a checksum error. (3) Fix handling of CHECKSUM_PARTIAL whilst delivering messages to userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * af_rxrpc: Handle frames delivered from another VMTim Smith2014-01-261-1/+2
| | | | | | | | | | | | | | | | On input, CHECKSUM_PARTIAL should be treated the same way as CHECKSUM_UNNECESSARY. See include/linux/skbuff.h Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
| * af_rxrpc: Avoid setting up double-free on checksum errorTim Smith2014-01-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_kill_datagram() does not dequeue the skb when MSG_PEEK is unset. This leaves a free'd skb on the queue, resulting a double-free later. Without this, the following oops can occur: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff8154fcf7>] skb_dequeue+0x47/0x70 PGD 0 Oops: 0002 [#1] SMP Modules linked in: af_rxrpc ... CPU: 0 PID: 1191 Comm: listen Not tainted 3.12.0+ #4 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8801183536b0 ti: ffff880035c92000 task.ti: ffff880035c92000 RIP: 0010:[<ffffffff8154fcf7>] skb_dequeue+0x47/0x70 RSP: 0018:ffff880035c93db8 EFLAGS: 00010097 RAX: 0000000000000246 RBX: ffff8800d2754b00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8800d254c084 RBP: ffff880035c93dd0 R08: ffff880035c93cf0 R09: ffff8800d968f270 R10: 0000000000000000 R11: 0000000000000293 R12: ffff8800d254c070 R13: ffff8800d254c084 R14: ffff8800cd861240 R15: ffff880119b39720 FS: 00007f37a969d740(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 00000000d4413000 CR4: 00000000000006f0 Stack: ffff8800d254c000 ffff8800d254c070 ffff8800d254c2c0 ffff880035c93df8 ffffffffa041a5b8 ffff8800cd844c80 ffffffffa04385a0 ffff8800cd844cb0 ffff880035c93e18 ffffffff81546cef ffff8800d45fea00 0000000000000008 Call Trace: [<ffffffffa041a5b8>] rxrpc_release+0x128/0x2e0 [af_rxrpc] [<ffffffff81546cef>] sock_release+0x1f/0x80 [<ffffffff81546d62>] sock_close+0x12/0x20 [<ffffffff811aaba1>] __fput+0xe1/0x230 [<ffffffff811aad3e>] ____fput+0xe/0x10 [<ffffffff810862cc>] task_work_run+0xbc/0xe0 [<ffffffff8106a3be>] do_exit+0x2be/0xa10 [<ffffffff8116dc47>] ? do_munmap+0x297/0x3b0 [<ffffffff8106ab8f>] do_group_exit+0x3f/0xa0 [<ffffffff8106ac04>] SyS_exit_group+0x14/0x20 [<ffffffff8166b069>] system_call_fastpath+0x16/0x1b Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
| * RxRPC: do not unlock unheld spinlock in rxrpc_connect_exclusive()Alexey Khoroshilov2014-01-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | If rx->conn is not NULL, rxrpc_connect_exclusive() does not acquire the transport's client lock, but it still releases it. The patch adds locking of the spinlock to this path. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David Howells <dhowells@redhat.com>
* | rxrpc: out of bound read in debug codeDan Carpenter2014-01-221-7/+14
| | | | | | | | | | | | | | | | | | Smatch complains because we are using an untrusted index into the rxrpc_acks[] array. It's just a read and it's only in the debug code, but it's simple enough to add a check and fix it. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add build-time checks for msg->msg_name sizeSteffen Hurrle2014-01-191-2/+2
|/ | | | | | | | | | | | | | | This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg handler msg_name and msg_namelen logic"). DECLARE_SOCKADDR validates that the structure we use for writing the name information to is not larger than the buffer which is reserved for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR consistently in sendmsg code paths. Signed-off-by: Steffen Hurrle <steffen@hurrle.net> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rework recvmsg handler msg_name and msg_namelen logicHannes Frederic Sowa2013-11-211-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>