summaryrefslogtreecommitdiffstats
path: root/net/ipv4 (follow)
Commit message (Collapse)AuthorAgeFilesLines
* [NET]: change layout of ehash tableEric Dumazet2007-02-085-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is non optimal because of for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep together related chains to speedup lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we dont need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it use __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined ↵Jan Engelhardt2007-02-086-15/+15
| | | | | | | | structure names Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functionsJan Engelhardt2007-02-0822-107/+138
| | | | | | | | | Use the x_tables functions directly to make it better visible which parts are shared between ip_tables and ip6_tables. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: x_tables: fix return values for LOG/ULOGJan Engelhardt2007-02-082-5/+9
| | | | | | Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: NAT: optional source port randomization supportEric Leblond2007-02-088-4/+46
| | | | | | | | This patch adds support to NAT to randomize source ports. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: add IPv6-capable TCPMSS targetPatrick McHardy2007-02-083-234/+0
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Add UDPLITE support in a few missing spotsPatrick McHardy2007-02-081-0/+1
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_nat: remove broken HOOKNAME macroPatrick McHardy2007-02-081-6/+0
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: tcp conntrack: do liberal tracking for picked up connectionsPatrick McHardy2007-02-081-25/+15
| | | | | | | | | | | | Do liberal tracking (only RSTs need to be in-window) for connections picked up without seeing a SYN to deal with window scaling. Also change logging of invalid packets not to log packets accepted by liberal tracking to avoid spamming the logs. Based on suggestion from James Ralston <ralston@pobox.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: unregister_netdevice as voidStephen Hemminger2007-02-082-2/+4
| | | | | | | | | There was no real useful information from the unregister_netdevice() return code, the only error occurred in a situation that was a driver bug. So change it to a void function. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4/IPV6] multicast: Check add_grhead() return valueAlexey Dobriyan2007-02-081-0/+2
| | | | | | | | add_grhead() allocates memory with GFP_ATOMIC and in at least two places skb from it passed to skb_put() without checking. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [XFRM]: Fix missed error setting in xfrm4_policy.cDavid S. Miller2007-02-081-0/+1
| | | | | | | When we can't find the afinfo we should return EAFNOSUPPORT. GCC warned about the uninitialized 'err' for this path as well. Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPSEC]: IPv4 over IPv6 IPsec tunnelMiika Komu2007-02-081-17/+33
| | | | | | | | | This is the patch to support IPv4 over IPv6 IPsec. Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPSEC]: IPv6 over IPv4 IPsec tunnelMiika Komu2007-02-081-11/+46
| | | | | | | | | This is the patch to support IPv6 over IPv4 IPsec Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPSEC]: exporting xfrm_state_afinfoMiika Komu2007-02-081-0/+1
| | | | | | | | | This patch exports xfrm_state_afinfo. Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Don't apply FIN exception to full TSO segments.John Heffner2007-02-081-1/+2
| | | | | Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Check num sacks in SACK fast pathBaruch Even2007-02-081-0/+5
| | | | | | | | | | We clear the unused parts of the SACK cache, This prevents us from mistakenly taking the cache data if the old data in the SACK cache is the same as the data in the SACK block. This assumes that we never receive an empty SACK block with start and end both at zero. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Seperate DSACK from SACK fast pathBaruch Even2007-02-081-35/+31
| | | | | | | | | | | | | | Move DSACK code outside the SACK fast-path checking code. If the DSACK determined that the information was too old we stayed with a partial cache copied. Most likely this matters very little since the next packet will not be DSACK and we will find it in the cache. but it's still not good form and there is little reason to couple the two checks. Since the SACK receive cache doesn't need the data to be in host order we also remove the ntohl in the checking loop. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Advance fast path pointer for first block onlyBaruch Even2007-02-081-10/+24
| | | | | | | | | Only advance the SACK fast-path pointer for the first block, the fast-path assumes that only the first block advances next time so we should not move the cached skb for the next sack blocks. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts.David S. Miller2007-02-085-5/+5
| | | | | | | | | | Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: remove tcp header from tcp_v4_check (take #2)Frederik Deweerdt2007-02-084-9/+9
| | | | | | | | | | | The tcphdr struct passed to tcp_v4_check is not used, the following patch removes it from the parameter list. This adds the netfilter modifications missing in the patch I sent for rc3-mm1. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Don't BUG on undersized allocationsPatrick McHardy2007-02-084-19/+31
| | | | | | | | | | | | | | | Currently netlink users BUG when the allocated skb for an event notification is undersized. While this is certainly a kernel bug, its not critical and crashing the kernel is too drastic, especially when considering that these errors have appeared multiple times in the past and it BUGs even if no listeners are present. This patch replaces BUG by WARN_ON and changes the notification functions to inform potential listeners of undersized allocations using a unique error code (EMSGSIZE). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ctnetlink: fix compile failure with NF_CONNTRACK_MARK=nPatrick McHardy2007-02-031-0/+2
| | | | | | | | | | CC net/netfilter/nf_conntrack_netlink.o net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_conntrack_event': net/netfilter/nf_conntrack_netlink.c:392: error: 'struct nf_conn' has no member named 'mark' make[3]: *** [net/netfilter/nf_conntrack_netlink.o] Error 1 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: SIP conntrack: fix out of bounds memory accessPatrick McHardy2007-01-301-1/+1
| | | | | | | | When checking for an @-sign in skp_epaddr_len, make sure not to run over the packet boundaries. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: SIP conntrack: fix skipping over user info in SIP headersLars Immisch2007-01-301-1/+7
| | | | | | | | | | | | | When trying to skip over the username in the Contact header, stop at the end of the line if no @ is found to avoid mangling following headers. We don't need to worry about continuation lines because we search inside a SIP URI. Fixes Netfilter Bugzilla #532. Signed-off-by: Lars Immisch <lars@ibp.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Fix single-entry /proc/net/fib_trie output.Robert Olsson2007-01-271-6/+7
| | | | | | | | | When main table is just a single leaf this gets printed as belonging to the local table in /proc/net/fib_trie. A fix is below. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_nat_pptp: fix expectation removalPatrick McHardy2007-01-261-2/+2
| | | | | | | | | When removing the expectation for the opposite direction, the PPTP NAT helper initializes the tuple for lookup with the addresses of the opposite direction, which makes the lookup fail. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_nat: fix ICMP translation with statically linked conntrackPatrick McHardy2007-01-261-10/+10
| | | | | | | | | | | | | When nf_nat/nf_conntrack_ipv4 are linked statically, nf_nat is initialized before nf_conntrack_ipv4, which makes the nf_ct_l3proto_find_get(AF_INET) call during nf_nat initialization return the generic l3proto instead of the AF_INET specific one. This breaks ICMP error translation since the generic protocol always initializes the IPs in the tuple to 0. Change the linking order and put nf_conntrack_ipv4 first. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Restore SKB socket owner setting in tcp_transmit_skb().David S. Miller2007-01-262-2/+4
| | | | | | | | | | | | | Revert 931731123a103cfb3f70ac4b7abfc71d94ba1f03 We can't elide the skb_set_owner_w() here because things like certain netfilter targets (such as owner MATCH) need a socket to be set on the SKB for correct operation. Thanks to Jan Engelhardt and other netfilter list members for pointing this out. Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix sorting of SACK blocks.Baruch Even2007-01-251-4/+5
| | | | | | | | | | | | | The sorting of SACK blocks actually munges them rather than sort, causing the TCP stack to ignore some SACK information and breaking the assumption of ordered SACK blocks after sorting. The sort takes the data from a second buffer which isn't moved causing subsequent data moves to occur from the wrong location. The fix is to use a temporary buffer as a normal sort does. Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Fix the fib trie iterator to work with a single entry routing tablesEric W. Biederman2007-01-241-5/+16
| | | | | | | | | | | | | | | | | | | | In a kernel with trie routing enabled I had a simple routing setup with only a single route to the outside world and no default route. "ip route table list main" showed my the route just fine but /proc/net/route was an empty file. What was going on? Thinking it was a bug in something I did and I looked deeper. Eventually I setup a second route and everything looked correct, huh? Finally I realized that the it was just the iterator pair in fib_trie_get_first, fib_trie_get_next just could not handle a routing table with a single entry. So to save myself and others further confusion, here is a simple fix for the fib proc iterator so it works even when there is only a single route in a routing table. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: rare bad TCP checksum with 2.6.19Jarek Poplawski2007-01-241-1/+2
| | | | | | | | | | | | | | The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE" changed to unconditional copying of ip_summed field from collapsed skb. This patch reverts this change. The majority of substantial work including heavy testing and diagnosing by: Michael Tokarev <mjt@tls.msk.ru> Possible reasons pointed by: Herbert Xu and Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: skb is unexpectedly freed.Masayuki Nakagawa2007-01-241-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I encountered a kernel panic with my test program, which is a very simple IPv6 client-server program. The server side sets IPV6_RECVPKTINFO on a listening socket, and the client side just sends a message to the server. Then the kernel panic occurs on the server. (If you need the test program, please let me know. I can provide it.) This problem happens because a skb is forcibly freed in tcp_rcv_state_process(). When a socket in listening state(TCP_LISTEN) receives a syn packet, then tcp_v6_conn_request() will be called from tcp_rcv_state_process(). If the tcp_v6_conn_request() successfully returns, the skb would be discarded by __kfree_skb(). However, in case of a listening socket which was already set IPV6_RECVPKTINFO, an address of the skb will be stored in treq->pktopts and a ref count of the skb will be incremented in tcp_v6_conn_request(). But, even if the skb is still in use, the skb will be freed. Then someone still using the freed skb will cause the kernel panic. I suggest to use kfree_skb() instead of __kfree_skb(). Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ctnetlink: fix leak in ctnetlink_create_conntrack error pathPatrick McHardy2007-01-241-1/+1
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] email change for shemminger@osdl.orgStephen Hemminger2007-01-231-1/+1
| | | | | | | | | Change my email address to reflect OSDL merger. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> [ The irony. Somebody still has his sign-off message hardcoded in a script or his brainstem ;^] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [IPV4] devinet: inetdev_init out label moved after RCU assignmentJarek Poplawski2007-01-091-1/+2
| | | | | | | | | inetdev_init out label moved after RCU assignment (final suggestion by Herbert Xu) Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: style updates for the inet_sock->is_icsk assignment fixPaul Moore2007-01-091-1/+1
| | | | | | | | | A quick patch to change the inet_sock->is_icsk assignment to better fit with existing kernel coding style. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_nat: fix hanging connections when loading the NAT modulePatrick McHardy2007-01-091-1/+1
| | | | | | | | | | | | When loading the NAT module, existing connection tracking entries don't have room for NAT information allocated and packets are dropped, causing hanging connections. They really should be entered into the NAT table as NULL mappings, but the current allocation scheme doesn't allow this. For now simply accept those packets to avoid the hanging connections. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix iov_len calculation in tcp_v4_send_ack().Craig Schlenter2007-01-091-1/+1
| | | | | | | | | | | | This fixes the ftp stalls present in the current kernels. All credit goes to Komuro <komurojun-mbn@nifty.com> for tracking this down. The patch is untested but it looks *cough* obviously correct. Signed-off-by: Craig Schlenter <craig@codefountain.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Fix incorrect "inet_sock->is_icsk" assignment.Paul Moore2007-01-091-1/+1
| | | | | | | | | | | The inet_create() and inet6_create() functions incorrectly set the inet_sock->is_icsk field. Both functions assume that the is_icsk field is large enough to hold at least a INET_PROTOSW_ICSK value when it is actually only a single bit. This patch corrects the assignment by doing a boolean comparison whose result will safely fit into a single bit field. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4/IPV6]: Fix inet{,6} device initialization order.David L Stevens2007-01-041-2/+3
| | | | | | | | | | | It is important that we only assign dev->ip{,6}_ptr only after all portions of the inet{,6} are setup. Otherwise we can receive packets before the multicast spinlocks et al. are initialized. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_nat: fix MASQUERADE crash on device downMartin Josefsson2007-01-041-1/+4
| | | | | | | | | Check the return value of nfct_nat() in device_cmp(), we might very well have non NAT conntrack entries as well (Netfilter bugzilla #528). Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: New connection tracking is not EXPERIMENTAL anymorePatrick McHardy2007-01-041-2/+2
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Fix routing of REJECT target generated packets in output chainPatrick McHardy2007-01-041-2/+5
| | | | | | | | | Packets generated by the REJECT target in the output chain have a local destination address and a foreign source address. Make sure not to use the foreign source address for the output route lookup. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: compat offsets size changeDmitry Mishin2007-01-041-5/+5
| | | | | | | | | Used by compat code offsets of entries should be 'unsigned int' as entries array size has this dimension. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Fix reversed logic in udp_get_port().David S. Miller2006-12-221-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When this code was converted to use sk_for_each() the logic for the "best hash chain length" code was reversed, breaking everything. The original code was of the form: size = 0; do { if (++size >= best_size_so_far) goto next; } while ((sk = sk->next) != NULL); best_size_so_far = size; best = result; next:; and this got converted into: sk_for_each(sk2, node, head) if (++size < best_size_so_far) { best_size_so_far = size; best = result; } Which does something very very different from the original. Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Fix BUG of ip_rt_send_redirect()Li Yewang2006-12-181-1/+2
| | | | | | | Fix the redirect packet of the router if the jiffies wraparound. Signed-off-by: Li Yewang <lyw@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Trivial fix to message in tcp_v4_inbound_md5_hashLeigh Brown2006-12-181-1/+1
| | | | | | | | The message logged in tcp_v4_inbound_md5_hash when the hash was expected but not found was reversed. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix oops caused by tcp_v4_md5_do_delLeigh Brown2006-12-181-0/+1
| | | | | | | | | md5sig_info.alloced4 must be set to zero when freeing keys4, otherwise it will not be alloc'd again when another key is added to the same socket by tcp_v4_md5_do_add. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix oops caused by __tcp_put_md5sig_pool()David S. Miller2006-12-141-2/+3
| | | | | | | It should call tcp_free_md5sig_pool() not __tcp_free_md5sig_pool() so that it does proper refcounting. Signed-off-by: David S. Miller <davem@davemloft.net>