diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-02-22 03:24:12 +0100 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-02-22 03:24:12 +0100 |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /net/bridge | |
parent | Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/... (diff) | |
parent | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (diff) | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.xz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'net/bridge')
-rw-r--r-- | net/bridge/br_if.c | 2 | ||||
-rw-r--r-- | net/bridge/br_mdb.c | 66 | ||||
-rw-r--r-- | net/bridge/br_multicast.c | 179 | ||||
-rw-r--r-- | net/bridge/br_netfilter_hooks.c | 2 | ||||
-rw-r--r-- | net/bridge/br_netlink.c | 19 | ||||
-rw-r--r-- | net/bridge/br_netlink_tunnel.c | 3 | ||||
-rw-r--r-- | net/bridge/br_private.h | 12 | ||||
-rw-r--r-- | net/bridge/br_switchdev.c | 10 | ||||
-rw-r--r-- | net/bridge/br_vlan.c | 11 | ||||
-rw-r--r-- | net/bridge/br_vlan_options.c | 27 | ||||
-rw-r--r-- | net/bridge/netfilter/nf_conntrack_bridge.c | 4 |
11 files changed, 280 insertions, 55 deletions
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index ad13b48e3e08..24f01ff113f0 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -269,7 +269,7 @@ static void brport_get_ownership(const struct kobject *kobj, kuid_t *uid, kgid_t net_ns_get_ownership(dev_net(p->dev), uid, gid); } -static struct kobj_type brport_ktype = { +static const struct kobj_type brport_ktype = { #ifdef CONFIG_SYSFS .sysfs_ops = &brport_sysfs_ops, #endif diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index 00e5743647b0..25c48d81a597 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -259,7 +259,7 @@ static int __mdb_fill_info(struct sk_buff *skb, #endif } else { ether_addr_copy(e.addr.u.mac_addr, mp->addr.dst.mac_addr); - e.state = MDB_PG_FLAGS_PERMANENT; + e.state = MDB_PERMANENT; } e.addr.proto = mp->addr.proto; nest_ent = nla_nest_start_noflag(skb, @@ -421,8 +421,6 @@ static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb) rcu_read_lock(); - cb->seq = net->dev_base_seq; - for_each_netdev_rcu(net, dev) { if (netif_is_bridge_master(dev)) { struct net_bridge *br = netdev_priv(dev); @@ -685,51 +683,58 @@ static const struct nla_policy br_mdbe_attrs_pol[MDBE_ATTR_MAX + 1] = { [MDBE_ATTR_RTPROT] = NLA_POLICY_MIN(NLA_U8, RTPROT_STATIC), }; -static bool is_valid_mdb_entry(struct br_mdb_entry *entry, - struct netlink_ext_ack *extack) +static int validate_mdb_entry(const struct nlattr *attr, + struct netlink_ext_ack *extack) { + struct br_mdb_entry *entry = nla_data(attr); + + if (nla_len(attr) != sizeof(struct br_mdb_entry)) { + NL_SET_ERR_MSG_MOD(extack, "Invalid MDBA_SET_ENTRY attribute length"); + return -EINVAL; + } + if (entry->ifindex == 0) { NL_SET_ERR_MSG_MOD(extack, "Zero entry ifindex is not allowed"); - return false; + return -EINVAL; } if (entry->addr.proto == htons(ETH_P_IP)) { if (!ipv4_is_multicast(entry->addr.u.ip4)) { NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address is not multicast"); - return false; + return -EINVAL; } if (ipv4_is_local_multicast(entry->addr.u.ip4)) { NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address is local multicast"); - return false; + return -EINVAL; } #if IS_ENABLED(CONFIG_IPV6) } else if (entry->addr.proto == htons(ETH_P_IPV6)) { if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6)) { NL_SET_ERR_MSG_MOD(extack, "IPv6 entry group address is link-local all nodes"); - return false; + return -EINVAL; } #endif } else if (entry->addr.proto == 0) { /* L2 mdb */ if (!is_multicast_ether_addr(entry->addr.u.mac_addr)) { NL_SET_ERR_MSG_MOD(extack, "L2 entry group is not multicast"); - return false; + return -EINVAL; } } else { NL_SET_ERR_MSG_MOD(extack, "Unknown entry protocol"); - return false; + return -EINVAL; } if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY) { NL_SET_ERR_MSG_MOD(extack, "Unknown entry state"); - return false; + return -EINVAL; } if (entry->vid >= VLAN_VID_MASK) { NL_SET_ERR_MSG_MOD(extack, "Invalid entry VLAN id"); - return false; + return -EINVAL; } - return true; + return 0; } static bool is_valid_mdb_source(struct nlattr *attr, __be16 proto, @@ -849,11 +854,10 @@ static int br_mdb_add_group_sg(const struct br_mdb_config *cfg, } p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL, - MCAST_INCLUDE, cfg->rt_protocol); - if (unlikely(!p)) { - NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (S, G) port group"); + MCAST_INCLUDE, cfg->rt_protocol, extack); + if (unlikely(!p)) return -ENOMEM; - } + rcu_assign_pointer(*pp, p); if (!(flags & MDB_PG_FLAGS_PERMANENT) && !cfg->src_entry) mod_timer(&p->timer, @@ -1075,11 +1079,10 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg, } p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL, - cfg->filter_mode, cfg->rt_protocol); - if (unlikely(!p)) { - NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (*, G) port group"); + cfg->filter_mode, cfg->rt_protocol, + extack); + if (unlikely(!p)) return -ENOMEM; - } err = br_mdb_add_group_srcs(cfg, p, brmctx, extack); if (err) @@ -1101,8 +1104,7 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg, return 0; err_del_port_group: - hlist_del_init(&p->mglist); - kfree(p); + br_multicast_del_port_group(p); return err; } @@ -1297,6 +1299,14 @@ static int br_mdb_config_attrs_init(struct nlattr *set_attrs, return 0; } +static const struct nla_policy mdba_policy[MDBA_SET_ENTRY_MAX + 1] = { + [MDBA_SET_ENTRY_UNSPEC] = { .strict_start_type = MDBA_SET_ENTRY_ATTRS + 1 }, + [MDBA_SET_ENTRY] = NLA_POLICY_VALIDATE_FN(NLA_BINARY, + validate_mdb_entry, + sizeof(struct br_mdb_entry)), + [MDBA_SET_ENTRY_ATTRS] = { .type = NLA_NESTED }, +}; + static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, struct br_mdb_config *cfg, struct netlink_ext_ack *extack) @@ -1307,7 +1317,7 @@ static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, int err; err = nlmsg_parse_deprecated(nlh, sizeof(*bpm), tb, - MDBA_SET_ENTRY_MAX, NULL, extack); + MDBA_SET_ENTRY_MAX, mdba_policy, extack); if (err) return err; @@ -1349,14 +1359,8 @@ static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, NL_SET_ERR_MSG_MOD(extack, "Missing MDBA_SET_ENTRY attribute"); return -EINVAL; } - if (nla_len(tb[MDBA_SET_ENTRY]) != sizeof(struct br_mdb_entry)) { - NL_SET_ERR_MSG_MOD(extack, "Invalid MDBA_SET_ENTRY attribute length"); - return -EINVAL; - } cfg->entry = nla_data(tb[MDBA_SET_ENTRY]); - if (!is_valid_mdb_entry(cfg->entry, extack)) - return -EINVAL; if (cfg->entry->ifindex != cfg->br->dev->ifindex) { struct net_device *pdev; diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index dea1ee1bd095..96d1fc78dd39 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -31,6 +31,7 @@ #include <net/ip6_checksum.h> #include <net/addrconf.h> #endif +#include <trace/events/bridge.h> #include "br_private.h" #include "br_private_mcast_eht.h" @@ -234,6 +235,29 @@ out: return pmctx; } +static struct net_bridge_mcast_port * +br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid) +{ + struct net_bridge_mcast_port *pmctx = NULL; + struct net_bridge_vlan *vlan; + + lockdep_assert_held_once(&port->br->multicast_lock); + + if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) + return NULL; + + /* Take RCU to access the vlan. */ + rcu_read_lock(); + + vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid); + if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx)) + pmctx = &vlan->port_mcast_ctx; + + rcu_read_unlock(); + + return pmctx; +} + /* when snooping we need to check if the contexts should be used * in the following order: * - if pmctx is non-NULL (port), check if it should be used @@ -668,6 +692,101 @@ void br_multicast_del_group_src(struct net_bridge_group_src *src, __br_multicast_del_group_src(src); } +static int +br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx, + struct netlink_ext_ack *extack, + const char *what) +{ + u32 max = READ_ONCE(pmctx->mdb_max_entries); + u32 n = READ_ONCE(pmctx->mdb_n_entries); + + if (max && n >= max) { + NL_SET_ERR_MSG_FMT_MOD(extack, "%s is already in %u groups, and mcast_max_groups=%u", + what, n, max); + return -E2BIG; + } + + WRITE_ONCE(pmctx->mdb_n_entries, n + 1); + return 0; +} + +static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx) +{ + u32 n = READ_ONCE(pmctx->mdb_n_entries); + + WARN_ON_ONCE(n == 0); + WRITE_ONCE(pmctx->mdb_n_entries, n - 1); +} + +static int br_multicast_port_ngroups_inc(struct net_bridge_port *port, + const struct br_ip *group, + struct netlink_ext_ack *extack) +{ + struct net_bridge_mcast_port *pmctx; + int err; + + lockdep_assert_held_once(&port->br->multicast_lock); + + /* Always count on the port context. */ + err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack, + "Port"); + if (err) { + trace_br_mdb_full(port->dev, group); + return err; + } + + /* Only count on the VLAN context if VID is given, and if snooping on + * that VLAN is enabled. + */ + if (!group->vid) + return 0; + + pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid); + if (!pmctx) + return 0; + + err = br_multicast_port_ngroups_inc_one(pmctx, extack, "Port-VLAN"); + if (err) { + trace_br_mdb_full(port->dev, group); + goto dec_one_out; + } + + return 0; + +dec_one_out: + br_multicast_port_ngroups_dec_one(&port->multicast_ctx); + return err; +} + +static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid) +{ + struct net_bridge_mcast_port *pmctx; + + lockdep_assert_held_once(&port->br->multicast_lock); + + if (vid) { + pmctx = br_multicast_port_vid_to_port_ctx(port, vid); + if (pmctx) + br_multicast_port_ngroups_dec_one(pmctx); + } + br_multicast_port_ngroups_dec_one(&port->multicast_ctx); +} + +u32 br_multicast_ngroups_get(const struct net_bridge_mcast_port *pmctx) +{ + return READ_ONCE(pmctx->mdb_n_entries); +} + +void br_multicast_ngroups_set_max(struct net_bridge_mcast_port *pmctx, u32 max) +{ + WRITE_ONCE(pmctx->mdb_max_entries, max); +} + +u32 br_multicast_ngroups_get_max(const struct net_bridge_mcast_port *pmctx) +{ + return READ_ONCE(pmctx->mdb_max_entries); +} + static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc) { struct net_bridge_port_group *pg; @@ -702,6 +821,7 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp, } else { br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE); } + br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid); hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list); queue_work(system_long_wq, &br->mcast_gc_work); @@ -1165,6 +1285,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br, return mp; if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) { + trace_br_mdb_full(br->dev, group); br_mc_disabled_update(br->dev, false, NULL); br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false); return ERR_PTR(-E2BIG); @@ -1284,14 +1405,22 @@ struct net_bridge_port_group *br_multicast_new_port_group( unsigned char flags, const unsigned char *src, u8 filter_mode, - u8 rt_protocol) + u8 rt_protocol, + struct netlink_ext_ack *extack) { struct net_bridge_port_group *p; + int err; - p = kzalloc(sizeof(*p), GFP_ATOMIC); - if (unlikely(!p)) + err = br_multicast_port_ngroups_inc(port, group, extack); + if (err) return NULL; + p = kzalloc(sizeof(*p), GFP_ATOMIC); + if (unlikely(!p)) { + NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group"); + goto dec_out; + } + p->key.addr = *group; p->key.port = port; p->flags = flags; @@ -1305,8 +1434,8 @@ struct net_bridge_port_group *br_multicast_new_port_group( if (!br_multicast_is_star_g(group) && rhashtable_lookup_insert_fast(&port->br->sg_port_tbl, &p->rhnode, br_sg_port_rht_params)) { - kfree(p); - return NULL; + NL_SET_ERR_MSG_MOD(extack, "Couldn't insert new port group"); + goto free_out; } rcu_assign_pointer(p->next, next); @@ -1320,6 +1449,25 @@ struct net_bridge_port_group *br_multicast_new_port_group( eth_broadcast_addr(p->eth_addr); return p; + +free_out: + kfree(p); +dec_out: + br_multicast_port_ngroups_dec(port, group->vid); + return NULL; +} + +void br_multicast_del_port_group(struct net_bridge_port_group *p) +{ + struct net_bridge_port *port = p->key.port; + __u16 vid = p->key.addr.vid; + + hlist_del_init(&p->mglist); + if (!br_multicast_is_star_g(&p->key.addr)) + rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode, + br_sg_port_rht_params); + kfree(p); + br_multicast_port_ngroups_dec(port, vid); } void br_multicast_host_join(const struct net_bridge_mcast *brmctx, @@ -1387,7 +1535,7 @@ __br_multicast_add_group(struct net_bridge_mcast *brmctx, } p = br_multicast_new_port_group(pmctx->port, group, *pp, 0, src, - filter_mode, RTPROT_KERNEL); + filter_mode, RTPROT_KERNEL, NULL); if (unlikely(!p)) { p = ERR_PTR(-ENOMEM); goto out; @@ -1933,6 +2081,25 @@ static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx) br_ip4_multicast_add_router(brmctx, pmctx); br_ip6_multicast_add_router(brmctx, pmctx); } + + if (br_multicast_port_ctx_is_vlan(pmctx)) { + struct net_bridge_port_group *pg; + u32 n = 0; + + /* The mcast_n_groups counter might be wrong. First, + * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries + * are flushed, thus mcast_n_groups after the toggle does not + * reflect the true values. And second, permanent entries added + * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected + * either. Thus we have to refresh the counter. + */ + + hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) { + if (pg->key.addr.vid == pmctx->vlan->vid) + n++; + } + WRITE_ONCE(pmctx->mdb_n_entries, n); + } } void br_multicast_enable_port(struct net_bridge_port *port) diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index 9554abcfd5b4..638a4d5359db 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -214,7 +214,7 @@ static int br_validate_ipv4(struct net *net, struct sk_buff *skb) if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) goto csum_error; - len = ntohs(iph->tot_len); + len = skb_ip_totlen(skb); if (skb->len < len) { __IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS); goto drop; diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 4316cc82ae17..9173e52b89e2 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -202,6 +202,8 @@ static inline size_t br_port_info_size(void) + nla_total_size_64bit(sizeof(u64)) /* IFLA_BRPORT_HOLD_TIMER */ #ifdef CONFIG_BRIDGE_IGMP_SNOOPING + nla_total_size(sizeof(u8)) /* IFLA_BRPORT_MULTICAST_ROUTER */ + + nla_total_size(sizeof(u32)) /* IFLA_BRPORT_MCAST_N_GROUPS */ + + nla_total_size(sizeof(u32)) /* IFLA_BRPORT_MCAST_MAX_GROUPS */ #endif + nla_total_size(sizeof(u16)) /* IFLA_BRPORT_GROUP_FWD_MASK */ + nla_total_size(sizeof(u8)) /* IFLA_BRPORT_MRP_RING_OPEN */ @@ -298,7 +300,11 @@ static int br_port_fill_attrs(struct sk_buff *skb, nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT, p->multicast_eht_hosts_limit) || nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_CNT, - p->multicast_eht_hosts_cnt)) + p->multicast_eht_hosts_cnt) || + nla_put_u32(skb, IFLA_BRPORT_MCAST_N_GROUPS, + br_multicast_ngroups_get(&p->multicast_ctx)) || + nla_put_u32(skb, IFLA_BRPORT_MCAST_MAX_GROUPS, + br_multicast_ngroups_get_max(&p->multicast_ctx))) return -EMSGSIZE; #endif @@ -858,6 +864,8 @@ static int br_afspec(struct net_bridge *br, } static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = { + [IFLA_BRPORT_UNSPEC] = { .strict_start_type = + IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT + 1 }, [IFLA_BRPORT_STATE] = { .type = NLA_U8 }, [IFLA_BRPORT_COST] = { .type = NLA_U32 }, [IFLA_BRPORT_PRIORITY] = { .type = NLA_U16 }, @@ -881,6 +889,8 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = { [IFLA_BRPORT_MAB] = { .type = NLA_U8 }, [IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 }, [IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 }, + [IFLA_BRPORT_MCAST_N_GROUPS] = { .type = NLA_REJECT }, + [IFLA_BRPORT_MCAST_MAX_GROUPS] = { .type = NLA_U32 }, }; /* Change the state of the port and notify spanning tree */ @@ -1015,6 +1025,13 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[], if (err) return err; } + + if (tb[IFLA_BRPORT_MCAST_MAX_GROUPS]) { + u32 max_groups; + + max_groups = nla_get_u32(tb[IFLA_BRPORT_MCAST_MAX_GROUPS]); + br_multicast_ngroups_set_max(&p->multicast_ctx, max_groups); + } #endif if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) { diff --git a/net/bridge/br_netlink_tunnel.c b/net/bridge/br_netlink_tunnel.c index 8914290c75d4..17abf092f7ca 100644 --- a/net/bridge/br_netlink_tunnel.c +++ b/net/bridge/br_netlink_tunnel.c @@ -188,6 +188,9 @@ initvars: } static const struct nla_policy vlan_tunnel_policy[IFLA_BRIDGE_VLAN_TUNNEL_MAX + 1] = { + [IFLA_BRIDGE_VLAN_TUNNEL_UNSPEC] = { + .strict_start_type = IFLA_BRIDGE_VLAN_TUNNEL_FLAGS + 1 + }, [IFLA_BRIDGE_VLAN_TUNNEL_ID] = { .type = NLA_U32 }, [IFLA_BRIDGE_VLAN_TUNNEL_VID] = { .type = NLA_U16 }, [IFLA_BRIDGE_VLAN_TUNNEL_FLAGS] = { .type = NLA_U16 }, diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 15ef7fd508ee..cef5f6ea850c 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -126,6 +126,8 @@ struct net_bridge_mcast_port { struct hlist_node ip6_rlist; #endif /* IS_ENABLED(CONFIG_IPV6) */ unsigned char multicast_router; + u32 mdb_n_entries; + u32 mdb_max_entries; #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */ }; @@ -956,7 +958,9 @@ br_multicast_new_port_group(struct net_bridge_port *port, const struct br_ip *group, struct net_bridge_port_group __rcu *next, unsigned char flags, const unsigned char *src, - u8 filter_mode, u8 rt_protocol); + u8 filter_mode, u8 rt_protocol, + struct netlink_ext_ack *extack); +void br_multicast_del_port_group(struct net_bridge_port_group *p); int br_mdb_hash_init(struct net_bridge *br); void br_mdb_hash_fini(struct net_bridge *br); void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp, @@ -974,6 +978,9 @@ void br_multicast_uninit_stats(struct net_bridge *br); void br_multicast_get_stats(const struct net_bridge *br, const struct net_bridge_port *p, struct br_mcast_stats *dest); +u32 br_multicast_ngroups_get(const struct net_bridge_mcast_port *pmctx); +void br_multicast_ngroups_set_max(struct net_bridge_mcast_port *pmctx, u32 max); +u32 br_multicast_ngroups_get_max(const struct net_bridge_mcast_port *pmctx); void br_mdb_init(void); void br_mdb_uninit(void); void br_multicast_host_join(const struct net_bridge_mcast *brmctx, @@ -1757,7 +1764,8 @@ static inline u16 br_vlan_flags(const struct net_bridge_vlan *v, u16 pvid) #ifdef CONFIG_BRIDGE_VLAN_FILTERING bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr, const struct net_bridge_vlan *range_end); -bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v); +bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v, + const struct net_bridge_port *p); size_t br_vlan_opts_nl_size(void); int br_vlan_process_options(const struct net_bridge *br, const struct net_bridge_port *p, diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c index 7eb6fd5bb917..de18e9c1d7a7 100644 --- a/net/bridge/br_switchdev.c +++ b/net/bridge/br_switchdev.c @@ -104,9 +104,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p, return 0; if (err) { - if (extack && !extack->_msg) - NL_SET_ERR_MSG_MOD(extack, - "bridge flag offload is not supported"); + NL_SET_ERR_MSG_WEAK_MOD(extack, + "bridge flag offload is not supported"); return -EOPNOTSUPP; } @@ -115,9 +114,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p, err = switchdev_port_attr_set(p->dev, &attr, extack); if (err) { - if (extack && !extack->_msg) - NL_SET_ERR_MSG_MOD(extack, - "error setting offload flag on port"); + NL_SET_ERR_MSG_WEAK_MOD(extack, + "error setting offload flag on port"); return err; } diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c index bc75fa1e4666..8a3dbc09ba38 100644 --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -1816,6 +1816,7 @@ out_err: /* v_opts is used to dump the options which must be equal in the whole range */ static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range, const struct net_bridge_vlan *v_opts, + const struct net_bridge_port *p, u16 flags, bool dump_stats) { @@ -1842,7 +1843,7 @@ static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range, goto out_err; if (v_opts) { - if (!br_vlan_opts_fill(skb, v_opts)) + if (!br_vlan_opts_fill(skb, v_opts, p)) goto out_err; if (dump_stats && !br_vlan_stats_fill(skb, v_opts)) @@ -1925,7 +1926,7 @@ void br_vlan_notify(const struct net_bridge *br, goto out_kfree; } - if (!br_vlan_fill_vids(skb, vid, vid_range, v, flags, false)) + if (!br_vlan_fill_vids(skb, vid, vid_range, v, p, flags, false)) goto out_err; nlmsg_end(skb, nlh); @@ -2030,7 +2031,7 @@ static int br_vlan_dump_dev(const struct net_device *dev, if (!br_vlan_fill_vids(skb, range_start->vid, range_end->vid, range_start, - vlan_flags, dump_stats)) { + p, vlan_flags, dump_stats)) { err = -EMSGSIZE; break; } @@ -2056,7 +2057,7 @@ update_end: else if (!dump_global && !br_vlan_fill_vids(skb, range_start->vid, range_end->vid, range_start, - br_vlan_flags(range_start, pvid), + p, br_vlan_flags(range_start, pvid), dump_stats)) err = -EMSGSIZE; } @@ -2131,6 +2132,8 @@ static const struct nla_policy br_vlan_db_policy[BRIDGE_VLANDB_ENTRY_MAX + 1] = [BRIDGE_VLANDB_ENTRY_STATE] = { .type = NLA_U8 }, [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] = { .type = NLA_NESTED }, [BRIDGE_VLANDB_ENTRY_MCAST_ROUTER] = { .type = NLA_U8 }, + [BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS] = { .type = NLA_REJECT }, + [BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS] = { .type = NLA_U32 }, }; static int br_vlan_rtm_process_one(struct net_device *dev, diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c index a2724d03278c..e378c2f3a9e2 100644 --- a/net/bridge/br_vlan_options.c +++ b/net/bridge/br_vlan_options.c @@ -48,7 +48,8 @@ bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr, curr_mc_rtr == range_mc_rtr; } -bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v) +bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v, + const struct net_bridge_port *p) { if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state(v)) || !__vlan_tun_put(skb, v)) @@ -58,6 +59,12 @@ bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v) if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_MCAST_ROUTER, br_vlan_multicast_router(v))) return false; + if (p && !br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx) && + (nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS, + br_multicast_ngroups_get(&v->port_mcast_ctx)) || + nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS, + br_multicast_ngroups_get_max(&v->port_mcast_ctx)))) + return false; #endif return true; @@ -70,6 +77,8 @@ size_t br_vlan_opts_nl_size(void) + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_TINFO_ID */ #ifdef CONFIG_BRIDGE_IGMP_SNOOPING + nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_ENTRY_MCAST_ROUTER */ + + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS */ + + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS */ #endif + 0; } @@ -212,6 +221,22 @@ static int br_vlan_process_one_opts(const struct net_bridge *br, return err; *changed = true; } + if (tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]) { + u32 val; + + if (!p) { + NL_SET_ERR_MSG_MOD(extack, "Can't set mcast_max_groups for non-port vlans"); + return -EINVAL; + } + if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx)) { + NL_SET_ERR_MSG_MOD(extack, "Multicast snooping disabled on this VLAN"); + return -EINVAL; + } + + val = nla_get_u32(tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]); + br_multicast_ngroups_set_max(&v->port_mcast_ctx, val); + *changed = true; + } #endif return 0; diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c index 5c5dd437f1c2..71056ee84773 100644 --- a/net/bridge/netfilter/nf_conntrack_bridge.c +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -212,7 +212,7 @@ static int nf_ct_br_ip_check(const struct sk_buff *skb) iph->version != 4) return -1; - len = ntohs(iph->tot_len); + len = skb_ip_totlen(skb); if (skb->len < nhoff + len || len < (iph->ihl * 4)) return -1; @@ -256,7 +256,7 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb, if (!pskb_may_pull(skb, sizeof(struct iphdr))) return NF_ACCEPT; - len = ntohs(ip_hdr(skb)->tot_len); + len = skb_ip_totlen(skb); if (pskb_trim_rcsum(skb, len)) return NF_ACCEPT; |