author | David S. Miller <davem@davemloft.net> | 2015-04-15 00:51:19 +0200 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2015-04-15 00:51:19 +0200 |
commit | bae97d84100ae7a8dc3b79233ecd3a8f7c19ea57 (patch) | |
tree | 975f812d346f61d988a8dc5a0989539293700ad9 /include | |
parent | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (diff) | |
parent | netfilter: nf_tables: get rid of the expression example code (diff) | |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
A final pull request, I know it's very late but this time I think it's worth a
bit of rush.
The following patchset contains Netfilter/nf_tables updates for net-next, more
specifically concatenation support and dynamic stateful expression
instantiation.
This also comes with a couple of small patches. One to fix the ebtables.h
userspace header and another to get rid of an obsolete example file in tree
that describes a nf_tables expression.
This time, I decided to paste the original descriptions. This will result in a
rather large commit description, but I think these bytes are worth keeping.
Patrick McHardy says:
====================
netfilter: nf_tables: concatenation support
The following patches add support for concatenations, which allow multi
dimensional exact matches in O(1).
The basic idea is to split the data registers, currently consisting of
4 registers of 16 bytes each, into smaller units, 16 registers of 4
bytes each, and making sure each register store always leaves the
full 32 bit in a well defined state, meaning smaller stores will
zero the remaining bits.
Based on that, we can load multiple adjacent registers with different
values, thereby building a concatenated bigger value, and use that
value for set lookups.
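As a rough illustration of the register layout described above, the following
C sketch models 16 registers of 4 bytes each and a store helper that zeroes
the unused bytes of the last register it touches. The names regs_sketch,
regs_needed and reg_store are hypothetical and are not taken from the kernel
sources; this is only a sketch of the idea, not the actual implementation.

```c
#include <stdint.h>
#include <string.h>

#define REG32_SIZE  4   /* bytes per register, as in the description above */
#define REG32_COUNT 16  /* number of data registers */

/* Hypothetical register file; not the kernel's struct nft_regs. */
struct regs_sketch {
	uint32_t data[REG32_COUNT];
};

/* Number of 32-bit registers needed to hold len bytes. */
static unsigned int regs_needed(unsigned int len)
{
	return (len + REG32_SIZE - 1) / REG32_SIZE;
}

/*
 * Store len bytes starting at register idx. The last register touched is
 * zeroed first, so a store smaller than a multiple of 4 bytes leaves the
 * remaining bits in a well-defined (zero) state.
 *
 * Returns the index of the next free register, or -1 if the store would
 * not fit.
 */
static int reg_store(struct regs_sketch *r, unsigned int idx,
		     const void *src, unsigned int len)
{
	unsigned int n = regs_needed(len);

	if (len == 0 || idx >= REG32_COUNT || n > REG32_COUNT - idx)
		return -1;

	r->data[idx + n - 1] = 0;         /* define the padding bytes */
	memcpy(&r->data[idx], src, len);  /* copy the payload itself */
	return (int)(idx + n);
}
```

Storing a 4-byte address at register 0 and a 2-byte port at register 1 then
yields an 8-byte concatenated key in data[0..1]. Because the padding is zero,
two keys built this way can be compared or hashed as plain 32-bit words.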
Sets are changed to use variable sized extensions for their key and
data values, removing the fixed limit of 16 bytes while saving memory
if less space is needed.
As a side effect, these patches will allow some nice optimizations in
the future, like using jhash2 in nft_hash, removing the masking in
nft_cmp_fast, optimized data comparison using 32 bit word size etc.
These are not done so far however.
The patches are split up as follows:
* the first five patches add length validation to register loads and
stores to make sure we stay within bounds and prepare the validation
functions for the new addressing mode
* the next patches prepare for changing to 32 bit addressing by
introducing a struct nft_regs, which holds the verdict register as
well as the data registers. The verdict members are moved to a new
struct nft_verdict to allow pulling struct nft_data out of the stack.
* the next patches contain preparatory conversions of expressions and
sets to use 32 bit addressing
* the next patch introduces so far unused register conversion helpers
for parsing and dumping register numbers over netlink
* following is the real conversion to 32 bit addressing, consisting of
replacing struct nft_data in struct nft_regs by an array of u32s and
actually translating and validating the new register numbers.
* the final two patches add support for variable sized data items and
variable sized keys / data in set elements
The patches have been verified to work correctly with nft binaries using
both old and new addressing.
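The coexistence of old and new addressing can be sketched as a translation
from the register number sent over netlink to an index into an array of
32-bit words. The constants below mirror the values in the uapi header
changed by this merge (NFT_REG32_00 = 8, NFT_REG_SIZE = 16,
NFT_REG32_SIZE = 4); the SK_ prefix and the function reg_to_index are
hypothetical, and the mapping is an assumption derived from the header
comments (the first four words alias the verdict register), not a copy of
nft_parse_register.

```c
#include <stdint.h>

/* Values mirrored from the uapi header; the SK_ names are made up here. */
enum {
	SK_REG_VERDICT = 0,
	SK_REG_1       = 1,
	SK_REG_4       = 4,
	SK_REG32_00    = 8,
	SK_REG32_15    = 23,
};

#define SK_REG_SIZE   16  /* bytes covered by an old-style register */
#define SK_REG32_SIZE 4   /* bytes covered by a new-style register */

/*
 * Translate a register number into an index into a u32 array of 20 words,
 * where words 0..3 alias the verdict and words 4..19 hold the data.
 * Returns -1 for numbers that name no register.
 */
static int reg_to_index(unsigned int reg)
{
	/* Old addressing: verdict and NFT_REG_1..4, each 16 bytes wide. */
	if (reg <= SK_REG_4)
		return (int)(reg * SK_REG_SIZE / SK_REG32_SIZE);

	/* New addressing: NFT_REG32_00..15, each 4 bytes wide. */
	if (reg >= SK_REG32_00 && reg <= SK_REG32_15)
		return (int)(reg - SK_REG32_00 + SK_REG_SIZE / SK_REG32_SIZE);

	return -1;
}
```

Under these assumptions the old register 1 and the new register
NFT_REG32_00 both resolve to word 4, which is how a binary using either
numbering would end up addressing the same storage.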
====================
Patrick McHardy says:
====================
netfilter: nf_tables: dynamic stateful expression instantiation
The following patches are the grand finale of my nf_tables set work,
using all the building blocks put in place by the previous patches
to support something like iptables hashlimit, but a lot more powerful.
Sets are extended to allow attaching expressions to set elements.
The dynset expression dynamically instantiates these expressions
based on a template when creating new set elements and evaluates
them for all new or updated set members.
In combination with concatenations this effectively creates state
tables for arbitrary combinations of keys, using the existing
expression types to maintain that state. Regular set GC takes care
of purging expired states.
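The idea of instantiating an expression per set element from a template can
be sketched in plain C as follows. Everything here is hypothetical: the
fixed-size open-addressing table, the FNV-1a hash and the names flow_table,
flow_elem and flow_update are stand-ins chosen for brevity, not the kernel's
set backends or the dynset implementation, and timeouts and GC are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_WORDS   2   /* concatenated key length in 32-bit words */
#define TABLE_SLOTS 64  /* fixed capacity; the sketch never removes entries */

/* State of a "counter" expression. */
struct counter_state {
	uint64_t packets;
	uint64_t bytes;
};

struct flow_elem {
	int                  used;
	uint32_t             key[KEY_WORDS];
	struct counter_state expr;  /* per-element copy of the template */
};

struct flow_table {
	struct counter_state tmpl;  /* template copied into new elements */
	struct flow_elem     slots[TABLE_SLOTS];
};

/* FNV-1a over the key bytes; any reasonable hash would do here. */
static uint32_t hash_key(const uint32_t *key)
{
	const unsigned char *p = (const unsigned char *)key;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < KEY_WORDS * sizeof(uint32_t); i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Look up key; if it is missing, create an element whose expression state
 * is copied from the template. Then evaluate the element's expression for
 * this packet. Returns NULL only when the table is full.
 */
static struct flow_elem *flow_update(struct flow_table *t,
				     const uint32_t *key, uint32_t pkt_len)
{
	uint32_t start = hash_key(key) % TABLE_SLOTS;

	for (uint32_t i = 0; i < TABLE_SLOTS; i++) {
		struct flow_elem *e = &t->slots[(start + i) % TABLE_SLOTS];

		if (!e->used) {
			/* New key: instantiate state from the template. */
			e->used = 1;
			memcpy(e->key, key, sizeof(e->key));
			e->expr = t->tmpl;
		} else if (memcmp(e->key, key, sizeof(e->key)) != 0) {
			continue;  /* slot holds another key, keep probing */
		}

		/* Evaluate the stateful expression for this element. */
		e->expr.packets++;
		e->expr.bytes += pkt_len;
		return e;
	}
	return NULL;
}
```

Calling flow_update once per packet with a key built from, say, source
address and TCP flags produces one counter per distinct key combination,
which is the kind of state table the description above refers to.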
We currently support two different stateful expressions, counter
and limit. Using limit as a template we can express the functionality
of hashlimit, but completely unrestricted in the combination of keys.
Using counter we can perform accounting for arbitrary flows.
The following examples from patch 5/5 show some possibilities.
Userspace syntax is still WIP, especially the listing of state
tables will most likely be separated from normal set listings
and use a more structured format:
1. Limit the rate of new SSH connections per host, similar to iptables
hashlimit:
flow ip saddr timeout 60s \
limit 10/second \
accept
2. Account network traffic between each set of /24 networks:
flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
counter
3. Account traffic to each host per user:
flow skuid . ip daddr \
counter
4. Account traffic for each combination of source address and TCP flags:
flow ip saddr . tcp flags \
counter
The resulting set content after a Xmas-scan looks like this:
{
192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
192.168.122.1 . ack : counter packets 74 bytes 3848,
192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}
In the future the "expressions attached to elements" will be extended
to also support user created non-stateful expressions, making it possible to
efficiently select between a set of parameter sets, e.g. a set of log
statements with different prefixes based on the interface, which currently
require one rule each. This will most likely have to wait until the next
kernel version though.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/netfilter_bridge/ebtables.h | 3 |
-rw-r--r-- | include/net/netfilter/nf_tables.h | 103 |
-rw-r--r-- | include/net/netfilter/nft_meta.h | 4 |
-rw-r--r-- | include/uapi/linux/netfilter/nf_tables.h | 40 |
-rw-r--r-- | include/uapi/linux/netfilter_bridge/ebtables.h | 2 |
5 files changed, 116 insertions, 36 deletions
diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 34e7a2b7f867..f1bd3962e6b6 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -12,9 +12,10 @@
 
 #ifndef __LINUX_BRIDGE_EFF_H
 #define __LINUX_BRIDGE_EFF_H
+#include <linux/if.h>
+#include <linux/if_ether.h>
 #include <uapi/linux/netfilter_bridge/ebtables.h>
 
-
 /* return values for match() functions */
 #define EBT_MATCH 0
 #define EBT_NOMATCH 1
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index d6a2f0ed5130..e6bcf55dcf20 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1,6 +1,7 @@
 #ifndef _NET_NF_TABLES_H
 #define _NET_NF_TABLES_H
 
+#include <linux/module.h>
 #include <linux/list.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nfnetlink.h>
@@ -36,29 +37,43 @@ static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
 	pkt->xt.family = ops->pf;
 }
 
+/**
+ *	struct nft_verdict - nf_tables verdict
+ *
+ *	@code: nf_tables/netfilter verdict code
+ *	@chain: destination chain for NFT_JUMP/NFT_GOTO
+ */
+struct nft_verdict {
+	u32				code;
+	struct nft_chain		*chain;
+};
+
 struct nft_data {
 	union {
-		u32				data[4];
-		struct {
-			u32			verdict;
-			struct nft_chain	*chain;
-		};
+		u32			data[4];
+		struct nft_verdict	verdict;
 	};
 } __attribute__((aligned(__alignof__(u64))));
 
-static inline int nft_data_cmp(const struct nft_data *d1,
-			       const struct nft_data *d2,
-			       unsigned int len)
-{
-	return memcmp(d1->data, d2->data, len);
-}
+/**
+ *	struct nft_regs - nf_tables register set
+ *
+ *	@data: data registers
+ *	@verdict: verdict register
+ *
+ *	The first four data registers alias to the verdict register.
+ */
+struct nft_regs {
+	union {
+		u32			data[20];
+		struct nft_verdict	verdict;
+	};
+};
 
-static inline void nft_data_copy(struct nft_data *dst,
-				 const struct nft_data *src)
+static inline void nft_data_copy(u32 *dst, const struct nft_data *src,
+				 unsigned int len)
 {
-	BUILD_BUG_ON(__alignof__(*dst) != __alignof__(u64));
-	*(u64 *)&dst->data[0] = *(u64 *)&src->data[0];
-	*(u64 *)&dst->data[2] = *(u64 *)&src->data[2];
+	memcpy(dst, src, len);
 }
 
 static inline void nft_data_debug(const struct nft_data *data)
@@ -96,7 +111,8 @@ struct nft_data_desc {
 	unsigned int			len;
 };
 
-int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data,
+int nft_data_init(const struct nft_ctx *ctx,
+		  struct nft_data *data, unsigned int size,
 		  struct nft_data_desc *desc, const struct nlattr *nla);
 void nft_data_uninit(const struct nft_data *data, enum nft_data_types type);
 int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data,
@@ -112,12 +128,14 @@ static inline enum nft_registers nft_type_to_reg(enum nft_data_types type)
 	return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1;
 }
 
-int nft_validate_input_register(enum nft_registers reg);
-int nft_validate_output_register(enum nft_registers reg);
-int nft_validate_data_load(const struct nft_ctx *ctx, enum nft_registers reg,
-			   const struct nft_data *data,
-			   enum nft_data_types type);
+unsigned int nft_parse_register(const struct nlattr *attr);
+int nft_dump_register(struct sk_buff *skb, unsigned int attr, unsigned int reg);
 
+int nft_validate_register_load(enum nft_registers reg, unsigned int len);
+int nft_validate_register_store(const struct nft_ctx *ctx,
+				enum nft_registers reg,
+				const struct nft_data *data,
+				enum nft_data_types type, unsigned int len);
 
 /**
  *	struct nft_userdata - user defined data associated with an object
@@ -141,7 +159,10 @@ struct nft_userdata {
  *	@priv: element private data and extensions
  */
 struct nft_set_elem {
-	struct nft_data		key;
+	union {
+		u32		buf[NFT_DATA_VALUE_MAXLEN / sizeof(u32)];
+		struct nft_data	val;
+	} key;
 	void			*priv;
 };
 
@@ -216,15 +237,15 @@ struct nft_expr;
  */
 struct nft_set_ops {
 	bool				(*lookup)(const struct nft_set *set,
-						  const struct nft_data *key,
+						  const u32 *key,
 						  const struct nft_set_ext **ext);
 	bool				(*update)(struct nft_set *set,
-						  const struct nft_data *key,
+						  const u32 *key,
 						  void *(*new)(struct nft_set *,
 							       const struct nft_expr *,
-							       struct nft_data []),
+							       struct nft_regs *),
 						  const struct nft_expr *expr,
-						  struct nft_data data[],
+						  struct nft_regs *regs,
 						  const struct nft_set_ext **ext);
 
 	int				(*insert)(const struct nft_set *set,
@@ -350,6 +371,7 @@ void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
  *	@NFT_SET_EXT_TIMEOUT: element timeout
  *	@NFT_SET_EXT_EXPIRATION: element expiration time
  *	@NFT_SET_EXT_USERDATA: user data associated with the element
+ *	@NFT_SET_EXT_EXPR: expression assiociated with the element
  *	@NFT_SET_EXT_NUM: number of extension types
  */
 enum nft_set_extensions {
@@ -359,6 +381,7 @@ enum nft_set_extensions {
 	NFT_SET_EXT_TIMEOUT,
 	NFT_SET_EXT_EXPIRATION,
 	NFT_SET_EXT_USERDATA,
+	NFT_SET_EXT_EXPR,
 	NFT_SET_EXT_NUM
 };
 
@@ -470,6 +493,11 @@ static inline struct nft_userdata *nft_set_ext_userdata(const struct nft_set_ext
 	return nft_set_ext(ext, NFT_SET_EXT_USERDATA);
 }
 
+static inline struct nft_expr *nft_set_ext_expr(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_EXPR);
+}
+
 static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
 {
 	return nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION) &&
@@ -484,8 +512,7 @@ static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
 
 void *nft_set_elem_init(const struct nft_set *set,
 			const struct nft_set_ext_tmpl *tmpl,
-			const struct nft_data *key,
-			const struct nft_data *data,
+			const u32 *key, const u32 *data,
 			u64 timeout, gfp_t gfp);
 void nft_set_elem_destroy(const struct nft_set *set, void *elem);
 
@@ -556,6 +583,7 @@ static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb,
  *	@policy: netlink attribute policy
  *	@maxattr: highest netlink attribute number
  *	@family: address family for AF-specific types
+ *	@flags: expression type flags
  */
 struct nft_expr_type {
 	const struct nft_expr_ops	*(*select_ops)(const struct nft_ctx *,
@@ -567,8 +595,11 @@ struct nft_expr_type {
 	const struct nla_policy		*policy;
 	unsigned int			maxattr;
 	u8				family;
+	u8				flags;
 };
 
+#define NFT_EXPR_STATEFUL		0x1
+
 /**
  *	struct nft_expr_ops - nf_tables expression operations
  *
@@ -584,7 +615,7 @@ struct nft_expr_type {
 struct nft_expr;
 struct nft_expr_ops {
 	void				(*eval)(const struct nft_expr *expr,
-						struct nft_data data[NFT_REG_MAX + 1],
+						struct nft_regs *regs,
 						const struct nft_pktinfo *pkt);
 	unsigned int			size;
 
@@ -622,6 +653,18 @@ static inline void *nft_expr_priv(const struct nft_expr *expr)
 	return (void *)expr->data;
 }
 
+struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
+			       const struct nlattr *nla);
+void nft_expr_destroy(const struct nft_ctx *ctx, struct nft_expr *expr);
+int nft_expr_dump(struct sk_buff *skb, unsigned int attr,
+		  const struct nft_expr *expr);
+
+static inline void nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
+{
+	__module_get(src->ops->type->owner);
+	memcpy(dst, src, src->ops->size);
+}
+
 /**
  *	struct nft_rule - nf_tables rule
  *
diff --git a/include/net/netfilter/nft_meta.h b/include/net/netfilter/nft_meta.h
index 0ee47c3e2e31..711887a09e91 100644
--- a/include/net/netfilter/nft_meta.h
+++ b/include/net/netfilter/nft_meta.h
@@ -26,11 +26,11 @@ int nft_meta_set_dump(struct sk_buff *skb,
 		      const struct nft_expr *expr);
 
 void nft_meta_get_eval(const struct nft_expr *expr,
-		       struct nft_data data[NFT_REG_MAX + 1],
+		       struct nft_regs *regs,
 		       const struct nft_pktinfo *pkt);
 
 void nft_meta_set_eval(const struct nft_expr *expr,
-		       struct nft_data data[NFT_REG_MAX + 1],
+		       struct nft_regs *regs,
 		       const struct nft_pktinfo *pkt);
 
 #endif
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 05ee1e0804a3..5fa1cd04762e 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -5,16 +5,45 @@
 #define NFT_CHAIN_MAXNAMELEN	32
 #define NFT_USERDATA_MAXLEN	256
 
+/**
+ * enum nft_registers - nf_tables registers
+ *
+ * nf_tables used to have five registers: a verdict register and four data
+ * registers of size 16. The data registers have been changed to 16 registers
+ * of size 4. For compatibility reasons, the NFT_REG_[1-4] registers still
+ * map to areas of size 16, the 4 byte registers are addressed using
+ * NFT_REG32_00 - NFT_REG32_15.
+ */
 enum nft_registers {
 	NFT_REG_VERDICT,
 	NFT_REG_1,
 	NFT_REG_2,
 	NFT_REG_3,
 	NFT_REG_4,
-	__NFT_REG_MAX
+	__NFT_REG_MAX,
+
+	NFT_REG32_00	= 8,
+	MFT_REG32_01,
+	NFT_REG32_02,
+	NFT_REG32_03,
+	NFT_REG32_04,
+	NFT_REG32_05,
+	NFT_REG32_06,
+	NFT_REG32_07,
+	NFT_REG32_08,
+	NFT_REG32_09,
+	NFT_REG32_10,
+	NFT_REG32_11,
+	NFT_REG32_12,
+	NFT_REG32_13,
+	NFT_REG32_14,
+	NFT_REG32_15,
 };
 #define NFT_REG_MAX	(__NFT_REG_MAX - 1)
 
+#define NFT_REG_SIZE	16
+#define NFT_REG32_SIZE	4
+
 /**
  * enum nft_verdicts - nf_tables internal verdicts
  *
@@ -209,6 +238,7 @@ enum nft_rule_compat_attributes {
  * @NFT_SET_INTERVAL: set contains intervals
  * @NFT_SET_MAP: set is used as a dictionary
  * @NFT_SET_TIMEOUT: set uses timeouts
+ * @NFT_SET_EVAL: set contains expressions for evaluation
  */
 enum nft_set_flags {
 	NFT_SET_ANONYMOUS		= 0x1,
@@ -216,6 +246,7 @@ enum nft_set_flags {
 	NFT_SET_INTERVAL		= 0x4,
 	NFT_SET_MAP			= 0x8,
 	NFT_SET_TIMEOUT			= 0x10,
+	NFT_SET_EVAL			= 0x20,
 };
 
 /**
@@ -293,6 +324,7 @@ enum nft_set_elem_flags {
  * @NFTA_SET_ELEM_TIMEOUT: timeout value (NLA_U64)
  * @NFTA_SET_ELEM_EXPIRATION: expiration time (NLA_U64)
  * @NFTA_SET_ELEM_USERDATA: user data (NLA_BINARY)
+ * @NFTA_SET_ELEM_EXPR: expression (NLA_NESTED: nft_expr_attributes)
  */
 enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_UNSPEC,
@@ -302,6 +334,7 @@ enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_TIMEOUT,
 	NFTA_SET_ELEM_EXPIRATION,
 	NFTA_SET_ELEM_USERDATA,
+	NFTA_SET_ELEM_EXPR,
 	__NFTA_SET_ELEM_MAX
 };
 #define NFTA_SET_ELEM_MAX	(__NFTA_SET_ELEM_MAX - 1)
@@ -359,6 +392,9 @@ enum nft_data_attributes {
 };
 #define NFTA_DATA_MAX		(__NFTA_DATA_MAX - 1)
 
+/* Maximum length of a value */
+#define NFT_DATA_VALUE_MAXLEN	64
+
 /**
  * enum nft_verdict_attributes - nf_tables verdict netlink attributes
  *
@@ -531,6 +567,7 @@ enum nft_dynset_ops {
  * @NFTA_DYNSET_SREG_KEY: source register of the key (NLA_U32)
  * @NFTA_DYNSET_SREG_DATA: source register of the data (NLA_U32)
  * @NFTA_DYNSET_TIMEOUT: timeout value for the new element (NLA_U64)
+ * @NFTA_DYNSET_EXPR: expression (NLA_NESTED: nft_expr_attributes)
  */
 enum nft_dynset_attributes {
 	NFTA_DYNSET_UNSPEC,
@@ -540,6 +577,7 @@ enum nft_dynset_attributes {
 	NFTA_DYNSET_SREG_KEY,
 	NFTA_DYNSET_SREG_DATA,
 	NFTA_DYNSET_TIMEOUT,
+	NFTA_DYNSET_EXPR,
 	__NFTA_DYNSET_MAX,
 };
 #define NFTA_DYNSET_MAX		(__NFTA_DYNSET_MAX - 1)
diff --git a/include/uapi/linux/netfilter_bridge/ebtables.h b/include/uapi/linux/netfilter_bridge/ebtables.h
index ba993360dbe9..773dfe8924c7 100644
--- a/include/uapi/linux/netfilter_bridge/ebtables.h
+++ b/include/uapi/linux/netfilter_bridge/ebtables.h
@@ -12,9 +12,7 @@
 
 #ifndef _UAPI__LINUX_BRIDGE_EFF_H
 #define _UAPI__LINUX_BRIDGE_EFF_H
-#include <linux/if.h>
 #include <linux/netfilter_bridge.h>
-#include <linux/if_ether.h>
 
 #define EBT_TABLE_MAXNAMELEN 32
 #define EBT_CHAIN_MAXNAMELEN EBT_TABLE_MAXNAMELEN