summaryrefslogtreecommitdiffstats
path: root/net (follow)
Commit message (Collapse)AuthorAgeFilesLines
* netfilter: arp_tables: fix infoleak to userspaceVasiliy Kulikov2011-03-151-0/+3
| | | | | | | | | | | | | | | | | Structures ipt_replace, compat_ipt_replace, and xt_get_revision are copied from userspace. Fields of these structs that are zero-terminated strings are not checked. When they are used as argument to a format string containing "%s" in request_module(), some sensitive information is leaked to userspace via argument of spawned modprobe process. The first bug was introduced before the git epoch; the second is introduced by 6b7d31fc (v2.6.15-rc1); the third is introduced by 6b7d31fc (v2.6.15-rc1). To trigger the bug one should have CAP_NET_ADMIN. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: xt_connlimit: remove connlimit_rnd_initedChangli Gao2011-03-151-4/+7
| | | | | | | A potential race condition when generating connlimit_rnd is also fixed. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: xt_connlimit: use hlist insteadChangli Gao2011-03-151-14/+14
| | | | | | | The header of hlist is smaller than list. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: xt_connlimit: use kmalloc() instead of kzalloc()Changli Gao2011-03-151-1/+1
| | | | | | | All the members are initialized after kzalloc(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: xt_connlimit: fix daddr connlimit in SNAT scenarioChangli Gao2011-03-151-11/+9
| | | | | | | | | | | | | | We use the reply tuples when limiting the connections by the destination addresses, however, in SNAT scenario, the final reply tuples won't be ready until SNAT is done in POSTROUING or INPUT chain, and the following nf_conntrack_find_get() in count_tem() will get nothing, so connlimit can't work as expected. In this patch, the original tuples are always used, and an additional member addr is appended to save the address in either end. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()Simon Horman2011-03-151-36/+62
| | | | | | | | Break out the portions of __ip_vs_control_init() and __ip_vs_control_cleanup() where aren't necessary when CONFIG_SYSCTL is undefined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Conditionally define and use ip_vs_lblc{r}_tableSimon Horman2011-03-152-9/+20
| | | | | | | ip_vs_lblc_table and ip_vs_lblcr_table, and code that uses them are unnecessary when CONFIG_SYSCTL is undefined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Minimise ip_vs_leave when CONFIG_SYSCTL is undefinedSimon Horman2011-03-151-2/+7
| | | | | | | | | | | | Much of ip_vs_leave() is unnecessary if CONFIG_SYSCTL is undefined. I tried an approach of breaking the now #ifdef'ed portions out into a separate function. However this appeared to grow the compiled code on x86_64 by about 200 bytes in the case where CONFIG_SYSCTL is defined. So I have gone with the simpler though less elegant #ifdef'ed solution for now. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Conditinally use sysctl_lblc{r}_expirationSimon Horman2011-03-152-9/+28
| | | | | | | In preparation for not including sysctl_lblc{r}_expiration in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add expire_quiescent_template()Simon Horman2011-03-151-2/+11
| | | | | | | In preparation for not including sysctl_expire_quiescent_template in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add sysctl_expire_nodest_conn()Simon Horman2011-03-151-1/+7
| | | | | | | In preparation for not including sysctl_expire_nodest_conn in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add sysctl_sync_ver()Simon Horman2011-03-151-2/+2
| | | | | | | In preparation for not including sysctl_sync_ver in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add {sysctl_sync_threshold,period}()Simon Horman2011-03-153-9/+9
| | | | | | | In preparation for not including sysctl_sync_threshold in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add sysctl_nat_icmp_send()Simon Horman2011-03-151-3/+8
| | | | | | | In preparation for not including sysctl_nat_icmp_send in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add sysctl_snat_reroute()Simon Horman2011-03-151-4/+16
| | | | | | | In preparation for not including sysctl_snat_reroute in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add ip_vs_route_me_harder()Simon Horman2011-03-151-26/+22
| | | | | | Add ip_vs_route_me_harder() to avoid repeating the same code twice. Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: rename estimator functionsJulian Anastasov2011-03-152-8/+8
| | | | | | | | | Rename ip_vs_new_estimator to ip_vs_start_estimator and ip_vs_kill_estimator to ip_vs_stop_estimator to better match their logic. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: optimize rates readingJulian Anastasov2011-03-152-39/+25
| | | | | | | | | | | | | | Move the estimator reading from estimation_timer to user context. ip_vs_read_estimator() will be used to decode the rate values. As the decoded rates are not set by estimation timer there is no need to reset them in ip_vs_zero_stats. There is no need ip_vs_new_estimator() to encode stats to rates, if the destination is in trash both the stats and the rates are inactive. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: properly zero stats and ratesJulian Anastasov2011-03-152-43/+68
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, the new percpu counters are not zeroed and the zero commands do not work as expected, we still show the old sum of percpu values. OTOH, we can not reset the percpu counters from user context without causing the incrementing to use old and bogus values. So, as Eric Dumazet suggested fix that by moving all overhead to stats reading in user context. Do not introduce overhead in timer context (estimator) and incrementing (packet handling in softirqs). The new ustats0 field holds the zero point for all counter values, the rates always use 0 as base value as before. When showing the values to user space just give the difference between counters and the base values. The only drawback is that percpu stats are not zeroed, they are accessible only from /proc and are new interface, so it should not be a compatibility problem as long as the sum stats are correct after zeroing. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: reorganize tot_statsJulian Anastasov2011-03-153-26/+28
| | | | | | | | | | | | | | | | | The global tot_stats contains cpustats field just like the stats for dest and svc, so better use it to simplify the usage in estimation_timer. As tot_stats is registered as estimator we can remove the special ip_vs_read_cpu_stats call for tot_stats. Fix ip_vs_read_cpu_stats to be called under stats lock because it is still used as synchronization between estimation timer and user context (the stats readers). Also, make sure ip_vs_stats_percpu_show reads properly the u64 stats from user context. Signed-off-by: Julian Anastasov <ja@ssi.bg> Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* netfilter:ipvs: use kmemdupShan Wei2011-03-152-7/+5
| | | | | | | | | | | The semantic patch that makes this output is available in scripts/coccinelle/api/memdup.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/ Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: remove _bh from percpu stats readingJulian Anastasov2011-03-151-4/+4
| | | | | | | | | ip_vs_read_cpu_stats is called only from timer, so no need for _bh locks. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: avoid lookup for fwmark 0Julian Anastasov2011-03-151-3/+5
| | | | | | | | | Restore the previous behaviour to lookup for fwmark service only when fwmark is non-null. This saves only CPU. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* netfilter: nf_conntrack: fix sysctl memory leakStephen Hemminger2011-03-141-0/+1
| | | | | | | | | | | Message in log because sysctl table was not empty at netns exit WARNING: at net/sysctl_net.c:84 sysctl_net_exit+0x2a/0x2c() Instrumenting showed that the nf_conntrack_timestamp was the entry that was being created but not cleared. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: x_tables: return -ENOENT for non-existant matches/targetsPatrick McHardy2011-03-141-2/+2
| | | | | | | | | As Stephen correctly points out, we need to return -ENOENT in xt_find_match()/xt_find_target() after the patch "netfilter: x_tables: misuse of try_then_request_module" in order to properly indicate a non-existant module to the caller. Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: x_tables: misuse of try_then_request_moduleStephen Hemminger2011-03-091-7/+15
| | | | | | | | | | | | | | | Since xt_find_match() returns ERR_PTR(xx) on error not NULL, the macro try_then_request_module won't work correctly here. The macro expects its first argument will be zero if condition fails. But ERR_PTR(-ENOENT) is not zero. The correct solution is to propagate the error value back. Found by inspection, and compile tested only. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: ipset: fix the compile warning in ip_set_createShan Wei2011-03-081-1/+1
| | | | | | | | net/netfilter/ipset/ip_set_core.c:615: warning: ‘clash’ may be used uninitialized in this function Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_ct_tcp: fix out of sync scenario while in SYN_RECVPablo Neira Ayuso2011-02-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the out of sync scenarios while in SYN_RECV state. Quoting Jozsef, what it happens if we are out of sync if the following: > > b. conntrack entry is outdated, new SYN received > > - (b1) we ignore it but save the initialization data from it > > - (b2) when the reply SYN/ACK receives and it matches the saved data, > > we pick up the new connection This is what it should happen if we are in SYN_RECV state. Initially, the SYN packet hits b1, thus we save data from it. But the SYN/ACK packet is considered a retransmission given that we're in SYN_RECV state. Therefore, we never hit b2 and we don't get in sync. To fix this, we ignore SYN/ACK if we are in SYN_RECV. If the previous packet was a SYN, then we enter the ignore case that get us in sync. This patch helps a lot to conntrackd in stress scenarios (assumming a client that generates lots of small TCP connections). During the failover, consider that the new primary has injected one outdated flow in SYN_RECV state (this is likely to happen if the conntrack event rate is high because the backup will be a bit delayed from the primary). With the current code, if the client starts a new fresh connection that matches the tuple, the SYN packet will be ignored without updating the state tracking, and the SYN+ACK in reply will blocked as it will not pass checkings III or IV (since all state tracking in the original direction is not initialized because of the SYN packet was ignored and the ignore case that get us in sync is not applied). I posted a couple of patches before this one. Changli Gao spotted a simpler way to fix this problem. This patch implements his idea. Cc: Changli Gao <xiaosuo@gmail.com> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
* ipvs: unify the formula to estimate the overhead of processing connectionsChangli Gao2011-02-254-63/+13
| | | | | | | | | | | lc and wlc use the same formula, but lblc and lblcr use another one. There is no reason for using two different formulas for the lc variants. The formula used by lc is used by all the lc variants in this patch. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Wensong Zhang <wensong@linux-vs.org> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: use enum to instead of magic numbersChangli Gao2011-02-241-14/+27
| | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: use hlist instead of listChangli Gao2011-02-221-23/+29
| | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: make "no destination available" message more informativePatrick Schaaf2011-02-1610-14/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | When IP_VS schedulers do not find a destination, they output a terse "WLC: no destination available" message through kernel syslog, which I can not only make sense of because syslog puts them in a logfile together with keepalived checker results. This patch makes the output a bit more informative, by telling you which virtual service failed to find a destination. Example output: kernel: [1539214.552233] IPVS: wlc: TCP 192.168.8.30:22 - no destination available kernel: [1539299.674418] IPVS: wlc: FWM 22 0x00000016 - no destination available I have tested the code for IPv4 and FWM services, as you can see from the example; I do not have an IPv6 setup to test the third code path with. To avoid code duplication, I put a new function ip_vs_scheduler_err() into ip_vs_sched.c, and use that from the schedulers instead of calling IP_VS_ERR_RL directly. Signed-off-by: Patrick Schaaf <netdev@bof.de> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: remove extra lookups for ICMP packetsJulian Anastasov2011-02-151-25/+3
| | | | | | | | | | | | Remove code that should not be called anymore. Now when ip_vs_out handles replies for local clients at LOCAL_IN hook we do not need to call conn_out_get and handle_response_icmp from ip_vs_in_icmp* because such lookups were already performed for the ICMP packet and no connection was found. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: fix timer in get_curr_sync_buffTinggong Wang2011-02-151-2/+2
| | | | | | | | | | Fix get_curr_sync_buff to keep buffer for 2 seconds as intended, not just for the current jiffie. By this way we will sync more connection structures with single packet. Signed-off-by: Tinggong Wang <wangtinggong@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* netfilter: nfnetlink_log: remove unused parameterFlorian Westphal2011-02-151-2/+1
| | | | | Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: xt_conntrack: warn about use in raw tableJan Engelhardt2011-02-141-0/+5
| | | | | | | nfct happens to run after the raw table only. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
* Revert "netfilter: xt_connlimit: connlimit-above early loop termination"Stefan Berger2011-02-141-10/+3
| | | | | | | | | | | This reverts commit 44bd4de9c2270b22c3c898310102bc6be9ed2978. I have to revert the early loop termination in connlimit since it generates problems when an iptables statement does not use -m state --state NEW before the connlimit match extension. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* bridge: netfilter: fix information leakVasiliy Kulikov2011-02-141-0/+2
| | | | | | | | | | | Struct tmp is copied from userspace. It is not checked whether the "name" field is NULL terminated. This may lead to buffer overflow and passing contents of kernel stack as a module name to try_then_request_module() and, consequently, to modprobe commandline. It would be seen by all userspace processes. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: xt_connlimit: connlimit-above early loop terminationStefan Berger2011-02-111-3/+10
| | | | | | | | | | | | | | | | | | | | | | | The patch below introduces an early termination of the loop that is counting matches. It terminates once the counter has exceeded the threshold provided by the user. There's no point in continuing the loop afterwards and looking at other entries. It plays together with the following code further below: return (connections > info->limit) ^ info->inverse; where connections is the result of the counted connection, which in turn is the matches variable in the loop. So once -> matches = info->limit + 1 alias -> matches > info->limit alias -> matches > threshold we can terminate the loop. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: ipset: add dependency on CONFIG_NETFILTER_NETLINKPatrick McHardy2011-02-101-0/+1
| | | | | | | | | | | | | When SYSCTL and PROC_FS and NETFILTER_NETLINK are not enabled: net/built-in.o: In function `try_to_load_type': ip_set_core.c:(.text+0x3ab49): undefined reference to `nfnl_unlock' ip_set_core.c:(.text+0x3ab4e): undefined reference to `nfnl_lock' ... Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* IPVS: precedence bug in ip_vs_sync_switch_mode()Dan Carpenter2011-02-071-1/+1
| | | | | | | | | '!' has higher precedence than '&'. IP_VS_STATE_MASTER is 0x1 so the original code is equivelent to if (!ipvs->sync_state) ... Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Use correct lock in SCTP moduleSimon Horman2011-02-031-1/+1
| | | | | | | | | Use sctp_app_lock instead of tcp_app_lock in the SCTP protocol module. This appears to be a typo introduced by the netns changes. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
* netfilter: xtables: add device group matchPatrick McHardy2011-02-033-0/+92
| | | | | | | Add a new 'devgroup' match to match on the device group of the incoming and outgoing network device of a packet. Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: ipset: send error message manuallyJozsef Kadlecsik2011-02-021-7/+26
| | | | | | | | | | | | | | When a message carries multiple commands and one of them triggers an error, we have to report to the userspace which one was that. The line number of the command plays this role and there's an attribute reserved in the header part of the message to be filled out with the error line number. In order not to modify the original message received from the userspace, we construct a new, complete netlink error message and modifies the attribute there, then send it. Netlink is notified not to send its ACK/error message. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: ipset: fix linking with CONFIG_IPV6=nPatrick McHardy2011-02-021-6/+9
| | | | | | | | | Add a dummy ip_set_get_ip6_port function that unconditionally returns false for CONFIG_IPV6=n and convert the real function to ipv6_skip_exthdr() to avoid pulling in the ip6_tables module when loading ipset. Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: ipset: add missing break statemtns in ip_set_get_ip_port()Patrick McHardy2011-02-021-0/+2
| | | | | | | | Don't fall through in the switch statement, otherwise IPv4 headers are incorrectly parsed again as IPv6 and the return value will always be 'false'. Signed-off-by: Patrick McHardy <kaber@trash.net>
* IPVS: Remove ip_vs_sync_cleanup from section __exitSimon Horman2011-02-011-1/+1
| | | | | | | | | | | | ip_vs_sync_cleanup() may be called from ip_vs_init() on error and thus needs to be accesible from section __init Reporte-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Tested-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* IPVS: Allow compilation with CONFIG_SYSCTL disabledSimon Horman2011-02-013-25/+29
| | | | | | | | | | | | | | This is a rather naieve approach to allowing PVS to compile with CONFIG_SYSCTL disabled. I am working on a more comprehensive patch which will remove compilation of all sysctl-related IPVS code when CONFIG_SYSCTL is disabled. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Tested-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* IPVS: remove duplicate initialisation or rs_tableSimon Horman2011-02-011-3/+0
| | | | | | | | Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Tested-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* IPVS: use z modifier for sizeof() argumentSimon Horman2011-02-011-1/+1
| | | | | | | | | Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Tested-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Patrick McHardy <kaber@trash.net>