summaryrefslogtreecommitdiffstats
path: root/net (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'net-6.11-rc0' of ↵Linus Torvalds2024-07-198-21/+46
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Notably this includes fixes for a s390 build breakage. Current release - new code bugs: - eth: fbnic: fix s390 build - eth: airoha: fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue() Previous releases - regressions: - flow_dissector: use DEBUG_NET_WARN_ON_ONCE - ipv4: fix incorrect TOS in route get reply - dsa: fix chip-wide frame size config in some drivers Previous releases - always broken: - netfilter: nf_set_pipapo: fix initial map fill - eth: gve: fix XDP TX completion handling when counters overflow" * tag 'net-6.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: eth: fbnic: don't build the driver when skb has more than 21 frags net: dsa: b53: Limit chip-wide jumbo frame config to CPU ports net: dsa: mv88e6xxx: Limit chip-wide frame size config to CPU ports net: airoha: Fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue() net: wwan: t7xx: add support for Dell DW5933e ipv4: Fix incorrect TOS in fibmatch route get reply ipv4: Fix incorrect TOS in route get reply net: flow_dissector: use DEBUG_NET_WARN_ON_ONCE driver core: auxiliary bus: Fix documentation of auxiliary_device net: airoha: fix error branch in airoha_dev_xmit and airoha_set_gdm_ports gve: Fix XDP TX completion handling when counters overflow ipvs: properly dereference pe in ip_vs_add_service selftests: netfilter: add test case for recent mismatch bug netfilter: nf_set_pipapo: fix initial map fill netfilter: ctnetlink: use helper function to calculate expect ID eth: fbnic: fix s390 build.
| * Merge tag 'nf-24-07-17' of ↵Paolo Abeni2024-07-185-12/+36
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Call nf_expect_get_id() to delete expectation by ID. By trial and error it is possible to leak the LSB of the expectation address on x86_64. This bug is a leftover when converting the existing code to use nf_expect_get_id(). 2) Incorrect initialization in pipapo set backend leads to packet mismatches. From Florian Westphal. 3) Extend netfilter's selftests to cover for the pipapo set backend, also from Florian. 4) Fix sparse warning in IPVS when adding service, from Chen Hanxiao. netfilter pull request 24-07-17 * tag 'nf-24-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: ipvs: properly dereference pe in ip_vs_add_service selftests: netfilter: add test case for recent mismatch bug netfilter: nf_set_pipapo: fix initial map fill netfilter: ctnetlink: use helper function to calculate expect ID ==================== Link: https://patch.msgid.link/20240717215214.225394-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| | * ipvs: properly dereference pe in ip_vs_add_serviceChen Hanxiao2024-07-171-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use pe directly to resolve sparse warning: net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression Fixes: 39b972231536 ("ipvs: handle connections started by real-servers") Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: nf_set_pipapo: fix initial map fillFlorian Westphal2024-07-173-6/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initial buffer has to be inited to all-ones, but it must restrict it to the size of the first field, not the total field size. After each round in the map search step, the result and the fill map are swapped, so if we have a set where f->bsize of the first element is smaller than m->bsize_max, those one-bits are leaked into future rounds result map. This makes pipapo find an incorrect matching results for sets where first field size is not the largest. Followup patch adds a test case to nft_concat_range.sh selftest script. Thanks to Stefano Brivio for pointing out that we need to zero out the remainder explicitly, only correcting memset() argument isn't enough. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: Yi Chen <yiche@redhat.com> Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: ctnetlink: use helper function to calculate expect IDPablo Neira Ayuso2024-07-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Delete expectation path is missing a call to the nf_expect_get_id() helper function to calculate the expectation ID, otherwise LSB of the expectation object address is leaked to userspace. Fixes: 3c79107631db ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id") Reported-by: zdi-disclosures@trendmicro.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | ipv4: Fix incorrect TOS in fibmatch route get replyIdo Schimmel2024-07-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TOS value that is returned to user space in the route get reply is the one with which the lookup was performed ('fl4->flowi4_tos'). This is fine when the matched route is configured with a TOS as it would not match if its TOS value did not match the one with which the lookup was performed. However, matching on TOS is only performed when the route's TOS is not zero. It is therefore possible to have the kernel incorrectly return a non-zero TOS: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get fibmatch 192.0.2.2 tos 0xfc 192.0.2.0/24 tos 0x1c dev dummy1 proto kernel scope link src 192.0.2.1 Fix by instead returning the DSCP field from the FIB result structure which was populated during the route lookup. Output after the patch: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get fibmatch 192.0.2.2 tos 0xfc 192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 Extend the existing selftests to not only verify that the correct route is returned, but that it is also returned with correct "tos" value (or without it). Fixes: b61798130f1b ("net: ipv4: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | ipv4: Fix incorrect TOS in route get replyIdo Schimmel2024-07-182-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TOS value that is returned to user space in the route get reply is the one with which the lookup was performed ('fl4->flowi4_tos'). This is fine when the matched route is configured with a TOS as it would not match if its TOS value did not match the one with which the lookup was performed. However, matching on TOS is only performed when the route's TOS is not zero. It is therefore possible to have the kernel incorrectly return a non-zero TOS: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get 192.0.2.2 tos 0xfc 192.0.2.2 tos 0x1c dev dummy1 src 192.0.2.1 uid 0 cache Fix by adding a DSCP field to the FIB result structure (inside an existing 4 bytes hole), populating it in the route lookup and using it when filling the route get reply. Output after the patch: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get 192.0.2.2 tos 0xfc 192.0.2.2 dev dummy1 src 192.0.2.1 uid 0 cache Fixes: 1a00fee4ffb2 ("ipv4: Remove rt_key_{src,dst,tos} from struct rtable.") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | net: flow_dissector: use DEBUG_NET_WARN_ON_ONCEPablo Neira Ayuso2024-07-181-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following splat is easy to reproduce upstream as well as in -stable kernels. Florian Westphal provided the following commit: d1dab4f71d37 ("net: add and use __skb_get_hash_symmetric_net") but this complementary fix has been also suggested by Willem de Bruijn and it can be easily backported to -stable kernel which consists in using DEBUG_NET_WARN_ON_ONCE instead to silence the following splat given __skb_get_hash() is used by the nftables tracing infrastructure to to identify packets in traces. [69133.561393] ------------[ cut here ]------------ [69133.561404] WARNING: CPU: 0 PID: 43576 at net/core/flow_dissector.c:1104 __skb_flow_dissect+0x134f/ [...] [69133.561944] CPU: 0 PID: 43576 Comm: socat Not tainted 6.10.0-rc7+ #379 [69133.561959] RIP: 0010:__skb_flow_dissect+0x134f/0x2ad0 [69133.561970] Code: 83 f9 04 0f 84 b3 00 00 00 45 85 c9 0f 84 aa 00 00 00 41 83 f9 02 0f 84 81 fc ff ff 44 0f b7 b4 24 80 00 00 00 e9 8b f9 ff ff <0f> 0b e9 20 f3 ff ff 41 f6 c6 20 0f 84 e4 ef ff ff 48 8d 7b 12 e8 [69133.561979] RSP: 0018:ffffc90000006fc0 EFLAGS: 00010246 [69133.561988] RAX: 0000000000000000 RBX: ffffffff82f33e20 RCX: ffffffff81ab7e19 [69133.561994] RDX: dffffc0000000000 RSI: ffffc90000007388 RDI: ffff888103a1b418 [69133.562001] RBP: ffffc90000007310 R08: 0000000000000000 R09: 0000000000000000 [69133.562007] R10: ffffc90000007388 R11: ffffffff810cface R12: ffff888103a1b400 [69133.562013] R13: 0000000000000000 R14: ffffffff82f33e2a R15: ffffffff82f33e28 [69133.562020] FS: 00007f40f7131740(0000) GS:ffff888390800000(0000) knlGS:0000000000000000 [69133.562027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [69133.562033] CR2: 00007f40f7346ee0 CR3: 000000015d200001 CR4: 00000000001706f0 [69133.562040] Call Trace: [69133.562044] <IRQ> [69133.562049] ? __warn+0x9f/0x1a0 [ 1211.841384] ? __skb_flow_dissect+0x107e/0x2860 [...] [ 1211.841496] ? bpf_flow_dissect+0x160/0x160 [ 1211.841753] __skb_get_hash+0x97/0x280 [ 1211.841765] ? __skb_get_hash_symmetric+0x230/0x230 [ 1211.841776] ? mod_find+0xbf/0xe0 [ 1211.841786] ? get_stack_info_noinstr+0x12/0xe0 [ 1211.841798] ? bpf_ksym_find+0x56/0xe0 [ 1211.841807] ? __rcu_read_unlock+0x2a/0x70 [ 1211.841819] nft_trace_init+0x1b9/0x1c0 [nf_tables] [ 1211.841895] ? nft_trace_notify+0x830/0x830 [nf_tables] [ 1211.841964] ? get_stack_info+0x2b/0x80 [ 1211.841975] ? nft_do_chain_arp+0x80/0x80 [nf_tables] [ 1211.842044] nft_do_chain+0x79c/0x850 [nf_tables] Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case") Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240715141442.43775-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2024-07-191-11/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "Several new features here: - Virtio find vqs API has been reworked (required to fix the scalability issue we have with adminq, which I hope to merge later in the cycle) - vDPA driver for Marvell OCTEON - virtio fs performance improvement - mlx5 migration speedups Fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits) virtio: rename virtio_find_vqs_info() to virtio_find_vqs() virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info() virtio_balloon: convert to use virtio_find_vqs_info() virtiofs: convert to use virtio_find_vqs_info() scsi: virtio_scsi: convert to use virtio_find_vqs_info() virtio_net: convert to use virtio_find_vqs_info() virtio_crypto: convert to use virtio_find_vqs_info() virtio_console: convert to use virtio_find_vqs_info() virtio_blk: convert to use virtio_find_vqs_info() virtio: rename find_vqs_info() op to find_vqs() virtio: remove the original find_vqs() op virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly virtio: convert find_vqs() op implementations to find_vqs_info() virtio_pci: convert vp_*find_vqs() ops to find_vqs_info() virtio: introduce virtio_queue_info struct and find_vqs_info() config op virtio: make virtio_find_single_vq() call virtio_find_vqs() virtio: make virtio_find_vqs() call virtio_find_vqs_ctx() caif_virtio: use virtio_find_single_vq() for single virtqueue finding vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() ...
| * | virtio: rename virtio_find_vqs_info() to virtio_find_vqs()Jiri Pirko2024-07-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since the original virtio_find_vqs() is no longer present, rename virtio_find_vqs_info() back to virtio_find_vqs(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-20-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()Jiri Pirko2024-07-171-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of passing separate names and callbacks arrays to virtio_find_vqs(), have one of virtual_queue_info structs and pass it to virtio_find_vqs_info(). Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-18-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | | Merge tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2024-07-199-59/+266
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "New Features: - Add support for large folios - Implement rpcrdma generic device removal notification - Add client support for attribute delegations - Use a LAYOUTRETURN during reboot recovery to report layoutstats and errors - Improve throughput for random buffered writes - Add NVMe support to pnfs/blocklayout Bugfixes: - Fix rpcrdma_reqs_reset() - Avoid soft lockups when using UDP - Fix an nfs/blocklayout premature PR key unregestration - Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server - Do not extend writes to the entire folio - Pass explicit offset and count values to tracepoints - Fix a race to wake up sleeping SUNRPC sync tasks - Fix gss_status tracepoint output Cleanups: - Add missing MODULE_DESCRIPTION() macros - Add blocklayout / SCSI layout tracepoints - Remove asm-generic headers from xprtrdma verbs.c - Remove unused 'struct mnt_fhstatus' - Other delegation related cleanups - Other folio related cleanups - Other pNFS related cleanups - Other xprtrdma cleanups" * tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) SUNRPC: Fixup gss_status tracepoint error output SUNRPC: Fix a race to wake a sync task nfs: split nfs_read_folio nfs: pass explicit offset/count to trace events nfs: do not extend writes to the entire folio nfs/blocklayout: add support for NVMe nfs: remove nfs_page_length nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type nfs: don't reuse partially completed requests in nfs_lock_and_join_requests nfs: move nfs_wait_on_request to write.c nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests nfs: simplify nfs_folio_find_and_lock_request nfs: remove nfs_folio_private_request nfs: remove dead code for the old swap over NFS implementation NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server nfs: Block on write congestion nfs: Properly initialize server->writeback nfs: Drop pointless check from nfs_commit_release_pages() nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg ...
| * | | SUNRPC: Fix a race to wake a sync taskBenjamin Coddington2024-07-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've observed NFS clients with sync tasks sleeping in __rpc_execute waiting on RPC_TASK_QUEUED that have not responded to a wake-up from rpc_make_runnable(). I suspect this problem usually goes unnoticed, because on a busy client the task will eventually be re-awoken by another task completion or xprt event. However, if the state manager is draining the slot table, a sync task missing a wake-up can result in a hung client. We've been able to prove that the waker in rpc_make_runnable() successfully calls wake_up_bit() (ie- there's no race to tk_runstate), but the wake_up_bit() call fails to wake the waiter. I suspect the waker is missing the load of the bit's wait_queue_head, so waitqueue_active() is false. There are some very helpful comments about this problem above wake_up_bit(), prepare_to_wait(), and waitqueue_active(). Fix this by inserting smp_mb__after_atomic() before the wake_up_bit(), which pairs with prepare_to_wait() calling set_current_state(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | SUNRPC: avoid soft lockup when transmitting UDP to reachable server.NeilBrown2024-07-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to the commit identified below, call_transmit_status() would handle -EPERM and other errors related to an unreachable server by falling through to call_status() which added a 3-second delay and handled the failure as a timeout. Since that commit, call_transmit_status() falls through to handle_bind(). For UDP this moves straight on to handle_connect() and handle_transmit() so we immediately retransmit - and likely get the same error. This results in an indefinite loop in __rpc_execute() which triggers a soft-lockup warning. For the errors that indicate an unreachable server, call_transmit_status() should fall back to call_status() as it did before. This cannot cause the thundering herd that the previous patch was avoiding, as the call_status() will insert a delay. Fixes: ed7dc973bd91 ("SUNRPC: Prevent thundering herd when the socket is not connected") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | xprtrdma: Remove temp allocation of rpcrdma_rep objectsChuck Lever2024-07-083-37/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original code was designed so that most calls to rpcrdma_rep_create() would occur on the NUMA node that the device preferred. There are a few cases where that's not possible, so those reps are marked as temporary. However, we have the device (and its preferred node) already in rpcrdma_rep_create(), so let's use that to guarantee the memory is allocated from the correct node. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | xprtrdma: Clean up synopsis of frwr_mr_unmap()Chuck Lever2024-07-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7a03aeb66c41 ("xprtrdma: Micro-optimize MR DMA-unmapping") removed the last use of the @r_xprt parameter in this function, but neglected to remove the parameter itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | xprtrdma: Handle device removal outside of the CM event handlerChuck Lever2024-07-082-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wait for all disconnects to complete to ensure the transport has divested all of its hardware resources before the underlying RDMA device can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | rpcrdma: Implement generic device removalChuck Lever2024-07-083-4/+197
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e87a911fed07 ("nvme-rdma: use ib_client API to detect device removal") explains the benefits of handling device removal outside of the CM event handler. Sketch in an IB device removal notification mechanism that can be used by both the client and server side RPC-over-RDMA transport implementations. Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | xprtrdma: Fix rpcrdma_reqs_reset()Chuck Lever2024-07-082-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid FastReg operations getting MW_BIND_ERR after a reconnect. rpcrdma_reqs_reset() is called on transport tear-down to get each rpcrdma_req back into a clean state. MRs on req->rl_registered are waiting for a FastReg, are already registered, or are waiting for invalidation. If the transport is being torn down when reqs_reset() is called, the matching LocalInv might never be posted. That leaves these MR registered /and/ on req->rl_free_mrs, where they can be re-used for the next connection. Since xprtrdma does not keep specific track of the MR state, it's not possible to know what state these MRs are in, so the only safe thing to do is release them immediately. Fixes: 5de55ce951a1 ("xprtrdma: Release in-flight MRs on disconnect") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | xprtrdma: removed asm-generic headers from verbs.cTanzir Hasan2024-07-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | asm-generic/barrier.h and asm/bitops.h are already brought into the file through the header linux/sunrpc/svc_rdma.h and the file can still be built with their removal. They have been replaced with the preferred linux/bitops.h and asm/barrier.h to remove the need for the asm-generic header. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tanzir Hasan <tanzirh@google.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | | Merge tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds2024-07-174-69/+128
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Chuck Lever: "This is a light release containing optimizations, code clean-ups, and minor bug fixes. This development cycle focused on work outside of upstream kernel development: - Continuing to build upstream CI for NFSD based on kdevops - Continuing to focus on the quality of NFSD in LTS kernels - Participation in IETF nfsv4 WG discussions about NFSv4 ACLs, directory delegation, and NFSv4.2 COPY offload Notable features for v6.11 that do not come through the NFSD tree include NFS server-side support for the new pNFS NVMe layout type [RFC9561]. Functional testing for pNFS block layouts like this one has been introduced to our kdevops CI harness. Work on improving the resolution of file attribute time stamps in local filesystems is also ongoing tree-wide. As always I am grateful to NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey MAINTAINERS: Add a bugzilla link for NFSD nfsd: new netlink ops to get/set server pool_mode sunrpc: refactor pool_mode setting code nfsd: allow passing in array of thread counts via netlink nfsd: make nfsd_svc take an array of thread counts sunrpc: fix up the special handling of sv_nrpools == 1 SUNRPC: Add a trace point in svc_xprt_deferred_close NFSD: Support write delegations in LAYOUTGET lockd: Use *-y instead of *-objs in Makefile NFSD: Fix nfsdcld warning svcrdma: Handle ADDR_CHANGE CM event properly svcrdma: Refactor the creation of listener CMA ID NFSD: remove unused structs 'nfsd3_voidargs' NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
| * | | gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkeyGaosheng Cui2024-07-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we fail to call crypto_sync_skcipher_setkey, we should free the memory allocation for cipher, replace err_return with err_free_cipher to free the memory of cipher. Fixes: 4891f2d008e4 ("gss_krb5: import functionality to derive keys into the kernel") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | sunrpc: refactor pool_mode setting codeJeff Layton2024-07-081-21/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the pool_mode setting code to be called from internal callers so we can call it from a new netlink op. Add a new svc_pool_map_get function to return the current setting. Change the existing module parameter handling to use the new interfaces under the hood. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | sunrpc: fix up the special handling of sv_nrpools == 1Jeff Layton2024-07-081-19/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only pooled services take a reference to the svc_pool_map. The sunrpc code has always used the sv_nrpools value to detect whether the service is pooled. The problem there is that nfsd is a pooled service, but when it's running in "global" pool_mode, it doesn't take a reference to the pool map because it has a sv_nrpools value of 1. This means that we have two separate codepaths for starting the server, depending on whether it's pooled or not. Fix this by adding a new flag to the svc_serv, that indicates whether the serv is pooled. With this we can have the nfsd service unconditionally take a reference, regardless of pool_mode. Note that this is a behavior change for /sys/module/sunrpc/parameters/pool_mode. Usually this file does not allow you to change the pool-mode while there are nfsd threads running, but if the pool-mode is "global" it's allowed. My assumption is that this is a bug, since it probably should never have worked this way. This patch changes the behavior such that you get back EBUSY even when nfsd is running in global mode. I think this is more reasonable behavior, and given that most people set this today using the module parameter, it's doubtful anyone will notice. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | SUNRPC: Add a trace point in svc_xprt_deferred_closeChuck Lever2024-07-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace point in svc_xprt_close() reports only some local close requests. Try to capture more local close requests. Note that "trace-cmd record -T -e sunrpc:svc_xprt_close" will neatly capture the identity of the caller requesting the close. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | svcrdma: Handle ADDR_CHANGE CM event properlyChuck Lever2024-07-081-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sagi tells me that when a bonded device reports an address change, the consumer must destroy its listener IDs and create new ones. See commit a032e4f6d60d ("nvmet-rdma: fix bonding failover possible NULL deref"). Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | svcrdma: Refactor the creation of listener CMA IDChuck Lever2024-07-081-27/+40
| |/ / | | | | | | | | | | | | | | | | | | | | | In a moment, I will add a second consumer of CMA ID creation in svcrdma. Refactor so this code can be reused. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | Merge tag 'net-next-6.11' of ↵Linus Torvalds2024-07-17248-4246/+9380
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Not much excitement - a handful of large patchsets (devmem among them) did not make it in time. Core & protocols: - Use local_lock in addition to local_bh_disable() to protect per-CPU resources in networking, a step closer for local_bh_disable() not to act as a big lock on PREEMPT_RT - Use flex array for netdevice priv area, ensure its cache alignment - Add a sysctl knob to allow user to specify a default rto_min at socket init time. Bit of a big hammer but multiple companies were independently carrying such patch downstream so clearly it's useful - Support scheduling transmission of packets based on CLOCK_TAI - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off using cpusets - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address - Allow configuration of multipath hash seed, to both allow synchronizing hashing of two routers, and preventing partial accidental sync - Improve TCP compliance with RFC 9293 for simultaneous connect() - Support sending NAT keepalives in IPsec ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it - Support sending supervision HSR frames with MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled - Introduce IPPROTO_SMC for selecting SMC when socket is created - Allow UDP GSO transmit from devices with no checksum offload - openvswitch: add packet sampling via psample, separating the sampled traffic from "upcall" packets sent to user space for forwarding - nf_tables: shrink memory consumption for transaction objects Things we sprinkled into general kernel code: - Power Sequencing subsystem (used by Qualcomm Bluetooth driver for QCA6390) [ Already merged separately - Linus ] - Add IRQ information in sysfs for auxiliary bus - Introduce guard definition for local_lock - Add aligned flavor of __cacheline_group_{begin, end}() markings for grouping fields in structures BPF: - Notify user space (via epoll) when a struct_ops object is getting detached/unregistered - Add new kfuncs for a generic, open-coded bits iterator - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head - Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible WRT BTF from modules - Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs - riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter - Add the capability to offload the netfilter flowtable in XDP layer through kfuncs Driver API: - Allow users to configure IRQ tresholds between which automatic IRQ moderation can choose - Expand Power Sourcing (PoE) status with power, class and failure reason. Support setting power limits - Track additional RSS contexts in the core, make sure configuration changes don't break them - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths - Support updating firmware on SFP modules Tests and tooling: - mptcp: use net/lib.sh to manage netns - TCP-AO and TCP-MD5: replace debug prints used by tests with tracepoints - openvswitch: make test self-contained (don't depend on OvS CLI tools) Drivers: - Ethernet high-speed NICs: - Broadcom (bnxt): - increase the max total outstanding PTP TX packets to 4 - add timestamping statistics support - implement netdev_queue_mgmt_ops - support new RSS context API - Intel (100G, ice, idpf): - implement FEC statistics and dumping signal quality indicators - support E825C products (with 56Gbps PHYs) - nVidia/Mellanox: - support HW-GRO - mlx4/mlx5: support per-queue statistics via netlink - obey the max number of EQs setting in sub-functions - AMD/Solarflare: - support new RSS context API - AMD/Pensando: - ionic: rework fix for doorbell miss to lower overhead and skip it on new HW - Wangxun: - txgbe: support Flow Director perfect filters - Ethernet NICs consumer, embedded and virtual: - Add driver for Tehuti Networks TN40xx chips - Add driver for Meta's internal NIC chips - Add driver for Ethernet MAC on Airoha EN7581 SoCs - Add driver for Renesas Ethernet-TSN devices - Google cloud vNIC: - flow steering support - Microsoft vNIC: - support page sizes other than 4KB on ARM64 - vmware vNIC: - support latency measurement (update to version 9) - VirtIO net: - support for Byte Queue Limits - support configuring thresholds for automatic IRQ moderation - support for AF_XDP Rx zero-copy - Synopsys (stmmac): - support for STM32MP13 SoC - let platforms select the right PCS implementation - TI: - icssg-prueth: add multicast filtering support - icssg-prueth: enable PTP timestamping and PPS - Renesas: - ravb: improve Rx performance 30-400% by using page pool, theaded NAPI and timer-based IRQ coalescing - ravb: add MII support for R-Car V4M - Cadence (macb): - macb: add ARP support to Wake-On-LAN - Cortina: - use phylib for RX and TX pause configuration - Ethernet switches: - nVidia/Mellanox: - support configuration of multipath hash seed - report more accurate max MTU - use page_pool to improve Rx performance - MediaTek: - mt7530: add support for bridge port isolation - Qualcomm: - qca8k: add support for bridge port isolation - Microchip: - lan9371/2: add 100BaseTX PHY support - NXP: - vsc73xx: implement VLAN operations - Ethernet PHYs: - aquantia: enable support for aqr115c - aquantia: add support for PHY LEDs - realtek: add support for rtl8224 2.5Gbps PHY - xpcs: add memory-mapped device support - add BroadR-Reach link mode and support in Broadcom's PHY driver - CAN: - add document for ISO 15765-2 protocol support - mcp251xfd: workaround for erratum DS80000789E, use timestamps to catch when device returns incorrect FIFO status - WiFi: - mac80211/cfg80211: - parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers - improvements for 6 GHz regulatory flexibility - multi-link improvements - support multiple radios per wiphy - remove DEAUTH_NEED_MGD_TX_PREP flag - Intel (iwlwifi): - bump FW API to 91 for BZ/SC devices - report 64-bit radiotap timestamp - enable P2P low latency by default - handle Transmit Power Envelope (TPE) advertised by AP - remove support for older FW for new devices - fast resume (keeping the device configured) - mvm: re-enable Multi-Link Operation (MLO) - aggregation (A-MSDU) optimizations - MediaTek (mt76): - mt7925 Multi-Link Operation (MLO) support - Qualcomm (ath10k): - LED support for various chipsets - Qualcomm (ath12k): - remove unsupported Tx monitor handling - support channel 2 in 6 GHz band - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) - support dynamic VLAN - add panic handler for resetting the firmware state - DebugFS support for datapath statistics - WCN7850: support for Wake on WLAN - Microchip (wilc1000): - read MAC address during probe to make it visible to user space - suspend/resume improvements - TI (wl18xx): - support newer firmware versions - RealTek (rtw89): - preparation for RTL8852BE-VT support - Wake on WLAN support for WiFi 6 chips - 36-bit PCI DMA support - RealTek (rtlwifi): - RTL8192DU support - Broadcom (brcmfmac): - Management Frame Protection support (to enable WPA3) - Bluetooth: - qualcomm: use the power sequencer for QCA6390 - btusb: mediatek: add ISO data transmission functions - hci_bcm4377: add BCM4388 support - btintel: add support for BlazarU core - btintel: add support for Whale Peak2 - btnxpuart: add support for AW693 A1 chipset - btnxpuart: add support for IW615 chipset - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591" * tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits) eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering" tcp: Replace strncpy() with strscpy() wifi: ath12k: fix build vs old compiler tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). eth: fbnic: Write the TCAM tables used for RSS control and Rx to host eth: fbnic: Add L2 address programming eth: fbnic: Add basic Rx handling eth: fbnic: Add basic Tx handling eth: fbnic: Add link detection eth: fbnic: Add initial messaging to notify FW of our presence eth: fbnic: Implement Rx queue alloc/start/stop/free eth: fbnic: Implement Tx queue alloc/start/stop/free eth: fbnic: Allocate a netdevice and napi vectors with queues eth: fbnic: Add FW communication mechanism eth: fbnic: Add message parsing for FW messages eth: fbnic: Add register init to set PCIe/Ethernet device config eth: fbnic: Allocate core device specific structures and devlink interface eth: fbnic: Add scaffolding for Meta's NIC driver PCI: Add Meta Platforms vendor ID net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK ...
| * | | tcp: Replace strncpy() with strscpy()Kees Cook2024-07-161-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the deprecated[1] uses of strncpy() in tcp_ca_get_name_by_key() and tcp_get_default_congestion_control(). The callers use the results as standard C strings (via nla_put_string() and proc handlers respectively), so trailing padding is not needed. Since passing the destination buffer arguments decays it to a pointer, the size can't be trivially determined by the compiler. ca->name is the same length in both cases, so strscpy() won't fail (when ca->name is NUL-terminated). Include the length explicitly instead of using the 2-argument strscpy(). Link: https://github.com/KSPP/linux/issues/90 [1] Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240714041111.it.918-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child().Kuniyuki Iwashima2024-07-161-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller reported KMSAN splat in tcp_create_openreq_child(). [0] The uninit variable is tcp_rsk(req)->ao_keyid. tcp_rsk(req)->ao_keyid is initialised only when tcp_conn_request() finds a valid TCP AO option in SYN. Then, tcp_rsk(req)->used_tcp_ao is set accordingly. Let's not read tcp_rsk(req)->ao_keyid when tcp_rsk(req)->used_tcp_ao is false. [0]: BUG: KMSAN: uninit-value in tcp_create_openreq_child+0x198b/0x1ff0 net/ipv4/tcp_minisocks.c:610 tcp_create_openreq_child+0x198b/0x1ff0 net/ipv4/tcp_minisocks.c:610 tcp_v4_syn_recv_sock+0x18e/0x2170 net/ipv4/tcp_ipv4.c:1754 tcp_check_req+0x1a3e/0x20c0 net/ipv4/tcp_minisocks.c:852 tcp_v4_rcv+0x26a4/0x53a0 net/ipv4/tcp_ipv4.c:2265 ip_protocol_deliver_rcu+0x884/0x1270 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x30f/0x530 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x230/0x4c0 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline] ip_sublist_rcv+0x10f7/0x13e0 net/ipv4/ip_input.c:639 ip_list_rcv+0x952/0x9c0 net/ipv4/ip_input.c:674 __netif_receive_skb_list_ptype net/core/dev.c:5703 [inline] __netif_receive_skb_list_core+0xd92/0x11d0 net/core/dev.c:5751 __netif_receive_skb_list net/core/dev.c:5803 [inline] netif_receive_skb_list_internal+0xd8f/0x1350 net/core/dev.c:5895 gro_normal_list include/net/gro.h:515 [inline] napi_complete_done+0x3f2/0x990 net/core/dev.c:6246 e1000_clean+0x1fa4/0x5e50 drivers/net/ethernet/intel/e1000/e1000_main.c:3808 __napi_poll+0xd9/0x990 net/core/dev.c:6771 napi_poll net/core/dev.c:6840 [inline] net_rx_action+0x90f/0x17e0 net/core/dev.c:6962 handle_softirqs+0x152/0x6b0 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu kernel/softirq.c:637 [inline] irq_exit_rcu+0x5d/0x120 kernel/softirq.c:649 common_interrupt+0x83/0x90 arch/x86/kernel/irq.c:278 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693 __msan_instrument_asm_store+0xd6/0xe0 arch_atomic_inc arch/x86/include/asm/atomic.h:53 [inline] raw_atomic_inc include/linux/atomic/atomic-arch-fallback.h:992 [inline] atomic_inc include/linux/atomic/atomic-instrumented.h:436 [inline] page_ref_inc include/linux/page_ref.h:153 [inline] folio_ref_inc include/linux/page_ref.h:160 [inline] filemap_map_order0_folio mm/filemap.c:3596 [inline] filemap_map_pages+0x11c7/0x2270 mm/filemap.c:3644 do_fault_around mm/memory.c:4879 [inline] do_read_fault mm/memory.c:4912 [inline] do_fault mm/memory.c:5051 [inline] do_pte_missing mm/memory.c:3897 [inline] handle_pte_fault mm/memory.c:5381 [inline] __handle_mm_fault mm/memory.c:5524 [inline] handle_mm_fault+0x3677/0x6f00 mm/memory.c:5689 do_user_addr_fault+0x1373/0x2b20 arch/x86/mm/fault.c:1338 handle_page_fault arch/x86/mm/fault.c:1481 [inline] exc_page_fault+0x54/0xc0 arch/x86/mm/fault.c:1539 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623 Uninit was stored to memory at: tcp_create_openreq_child+0x1984/0x1ff0 net/ipv4/tcp_minisocks.c:611 tcp_v4_syn_recv_sock+0x18e/0x2170 net/ipv4/tcp_ipv4.c:1754 tcp_check_req+0x1a3e/0x20c0 net/ipv4/tcp_minisocks.c:852 tcp_v4_rcv+0x26a4/0x53a0 net/ipv4/tcp_ipv4.c:2265 ip_protocol_deliver_rcu+0x884/0x1270 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x30f/0x530 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x230/0x4c0 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline] ip_sublist_rcv+0x10f7/0x13e0 net/ipv4/ip_input.c:639 ip_list_rcv+0x952/0x9c0 net/ipv4/ip_input.c:674 __netif_receive_skb_list_ptype net/core/dev.c:5703 [inline] __netif_receive_skb_list_core+0xd92/0x11d0 net/core/dev.c:5751 __netif_receive_skb_list net/core/dev.c:5803 [inline] netif_receive_skb_list_internal+0xd8f/0x1350 net/core/dev.c:5895 gro_normal_list include/net/gro.h:515 [inline] napi_complete_done+0x3f2/0x990 net/core/dev.c:6246 e1000_clean+0x1fa4/0x5e50 drivers/net/ethernet/intel/e1000/e1000_main.c:3808 __napi_poll+0xd9/0x990 net/core/dev.c:6771 napi_poll net/core/dev.c:6840 [inline] net_rx_action+0x90f/0x17e0 net/core/dev.c:6962 handle_softirqs+0x152/0x6b0 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu kernel/softirq.c:637 [inline] irq_exit_rcu+0x5d/0x120 kernel/softirq.c:649 common_interrupt+0x83/0x90 arch/x86/kernel/irq.c:278 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693 Uninit was created at: __alloc_pages_noprof+0x82d/0xcb0 mm/page_alloc.c:4706 __alloc_pages_node_noprof include/linux/gfp.h:269 [inline] alloc_pages_node_noprof include/linux/gfp.h:296 [inline] alloc_slab_page mm/slub.c:2265 [inline] allocate_slab mm/slub.c:2428 [inline] new_slab+0x2af/0x14e0 mm/slub.c:2481 ___slab_alloc+0xf73/0x3150 mm/slub.c:3667 __slab_alloc mm/slub.c:3757 [inline] __slab_alloc_node mm/slub.c:3810 [inline] slab_alloc_node mm/slub.c:3990 [inline] kmem_cache_alloc_noprof+0x53a/0x9f0 mm/slub.c:4009 reqsk_alloc_noprof net/ipv4/inet_connection_sock.c:920 [inline] inet_reqsk_alloc+0x63/0x700 net/ipv4/inet_connection_sock.c:951 tcp_conn_request+0x339/0x4860 net/ipv4/tcp_input.c:7177 tcp_v4_conn_request+0x13b/0x190 net/ipv4/tcp_ipv4.c:1719 tcp_rcv_state_process+0x2dd/0x4a10 net/ipv4/tcp_input.c:6711 tcp_v4_do_rcv+0xbee/0x10d0 net/ipv4/tcp_ipv4.c:1932 tcp_v4_rcv+0x3fad/0x53a0 net/ipv4/tcp_ipv4.c:2334 ip_protocol_deliver_rcu+0x884/0x1270 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x30f/0x530 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x230/0x4c0 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline] ip_sublist_rcv+0x10f7/0x13e0 net/ipv4/ip_input.c:639 ip_list_rcv+0x952/0x9c0 net/ipv4/ip_input.c:674 __netif_receive_skb_list_ptype net/core/dev.c:5703 [inline] __netif_receive_skb_list_core+0xd92/0x11d0 net/core/dev.c:5751 __netif_receive_skb_list net/core/dev.c:5803 [inline] netif_receive_skb_list_internal+0xd8f/0x1350 net/core/dev.c:5895 gro_normal_list include/net/gro.h:515 [inline] napi_complete_done+0x3f2/0x990 net/core/dev.c:6246 e1000_clean+0x1fa4/0x5e50 drivers/net/ethernet/intel/e1000/e1000_main.c:3808 __napi_poll+0xd9/0x990 net/core/dev.c:6771 napi_poll net/core/dev.c:6840 [inline] net_rx_action+0x90f/0x17e0 net/core/dev.c:6962 handle_softirqs+0x152/0x6b0 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu kernel/softirq.c:637 [inline] irq_exit_rcu+0x5d/0x120 kernel/softirq.c:649 common_interrupt+0x83/0x90 arch/x86/kernel/irq.c:278 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693 CPU: 0 PID: 239 Comm: modprobe Tainted: G B 6.10.0-rc7-01816-g852e42cc2dd4 #3 1107521f0c7b55c9309062382d0bda9f604dbb6d Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Fixes: 06b22ef29591 ("net/tcp: Wire TCP-AO to request sockets") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20240714161719.6528-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-07-1516-30/+190
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge in late fixes to prepare for the 6.11 net-next PR. Conflicts: 93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null") 4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref") 30d7b6727724 ("net: ethtool: Add new power limit get and set features") https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | net: bridge: mst: Check vlan state for egress decisionElliot Ayrey2024-07-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a port is blocking in the common instance but forwarding in an MST instance, traffic egressing the bridge will be dropped because the state of the common instance is overriding that of the MST instance. Fix this by skipping the port state check in MST mode to allow checking the vlan state via br_allowed_egress(). This is similar to what happens in br_handle_frame_finish() when checking ingress traffic, which was introduced in the change below. Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Signed-off-by: Elliot Ayrey <elliot.ayrey@alliedtelesis.co.nz> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | xdp: fix invalid wait context of page_pool_destroy()Taehee Yoo2024-07-151-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the driver uses a page pool, it creates a page pool with page_pool_create(). The reference count of page pool is 1 as default. A page pool will be destroyed only when a reference count reaches 0. page_pool_destroy() is used to destroy page pool, it decreases a reference count. When a page pool is destroyed, ->disconnect() is called, which is mem_allocator_disconnect(). This function internally acquires mutex_lock(). If the driver uses XDP, it registers a memory model with xdp_rxq_info_reg_mem_model(). The xdp_rxq_info_reg_mem_model() internally increases a page pool reference count if a memory model is a page pool. Now the reference count is 2. To destroy a page pool, the driver should call both page_pool_destroy() and xdp_unreg_mem_model(). The xdp_unreg_mem_model() internally calls page_pool_destroy(). Only page_pool_destroy() decreases a reference count. If a driver calls page_pool_destroy() then xdp_unreg_mem_model(), we will face an invalid wait context warning. Because xdp_unreg_mem_model() calls page_pool_destroy() with rcu_read_lock(). The page_pool_destroy() internally acquires mutex_lock(). Splat looks like: ============================= [ BUG: Invalid wait context ] 6.10.0-rc6+ #4 Tainted: G W ----------------------------- ethtool/1806 is trying to lock: ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150 other info that might help us debug this: context-{5:5} 3 locks held by ethtool/1806: stack backtrace: CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 Call Trace: <TASK> dump_stack_lvl+0x7e/0xc0 __lock_acquire+0x1681/0x4de0 ? _printk+0x64/0xe0 ? __pfx_mark_lock.part.0+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 lock_acquire+0x1b3/0x580 ? mem_allocator_disconnect+0x73/0x150 ? __wake_up_klogd.part.0+0x16/0xc0 ? __pfx_lock_acquire+0x10/0x10 ? dump_stack_lvl+0x91/0xc0 __mutex_lock+0x15c/0x1690 ? mem_allocator_disconnect+0x73/0x150 ? __pfx_prb_read_valid+0x10/0x10 ? mem_allocator_disconnect+0x73/0x150 ? __pfx_llist_add_batch+0x10/0x10 ? console_unlock+0x193/0x1b0 ? lockdep_hardirqs_on+0xbe/0x140 ? __pfx___mutex_lock+0x10/0x10 ? tick_nohz_tick_stopped+0x16/0x90 ? __irq_work_queue_local+0x1e5/0x330 ? irq_work_queue+0x39/0x50 ? __wake_up_klogd.part.0+0x79/0xc0 ? mem_allocator_disconnect+0x73/0x150 mem_allocator_disconnect+0x73/0x150 ? __pfx_mem_allocator_disconnect+0x10/0x10 ? mark_held_locks+0xa5/0xf0 ? rcu_is_watching+0x11/0xb0 page_pool_release+0x36e/0x6d0 page_pool_destroy+0xd7/0x440 xdp_unreg_mem_model+0x1a7/0x2a0 ? __pfx_xdp_unreg_mem_model+0x10/0x10 ? kfree+0x125/0x370 ? bnxt_free_ring.isra.0+0x2eb/0x500 ? bnxt_free_mem+0x5ac/0x2500 xdp_rxq_info_unreg+0x4a/0xd0 bnxt_free_mem+0x1356/0x2500 bnxt_close_nic+0xf0/0x3b0 ? __pfx_bnxt_close_nic+0x10/0x10 ? ethnl_parse_bit+0x2c6/0x6d0 ? __pfx___nla_validate_parse+0x10/0x10 ? __pfx_ethnl_parse_bit+0x10/0x10 bnxt_set_features+0x2a8/0x3e0 __netdev_update_features+0x4dc/0x1370 ? ethnl_parse_bitset+0x4ff/0x750 ? __pfx_ethnl_parse_bitset+0x10/0x10 ? __pfx___netdev_update_features+0x10/0x10 ? mark_held_locks+0xa5/0xf0 ? _raw_spin_unlock_irqrestore+0x42/0x70 ? __pm_runtime_resume+0x7d/0x110 ethnl_set_features+0x32d/0xa20 To fix this problem, it uses rhashtable_lookup_fast() instead of rhashtable_lookup() with rcu_read_lock(). Using xa without rcu_read_lock() here is safe. xa is freed by __xdp_mem_allocator_rcu_free() and this is called by call_rcu() of mem_xa_remove(). The mem_xa_remove() is called by page_pool_destroy() if a reference count reaches 0. The xa is already protected by the reference count mechanism well in the control plane. So removing rcu_read_lock() for page_pool_destroy() is safe. Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240712095116.3801586-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | af_packet: Handle outgoing VLAN packets without hardware offloadingChengen Du2024-07-151-2/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue initially stems from libpcap. The ethertype will be overwritten as the VLAN TPID if the network interface lacks hardware VLAN offloading. In the outbound packet path, if hardware VLAN offloading is unavailable, the VLAN tag is inserted into the payload but then cleared from the sk_buff struct. Consequently, this can lead to a false negative when checking for the presence of a VLAN tag, causing the packet sniffing outcome to lack VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing tool may be unable to parse packets as expected. The TCI-TPID is missing because the prb_fill_vlan_info() function does not modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the payload and not in the sk_buff struct. The skb_vlan_tag_present() function only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header is stripped, preventing the packet capturing tool from determining the correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, which means the packet capturing tool cannot parse the L3 header correctly. Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") Cc: stable@vger.kernel.org Signed-off-by: Chengen Du <chengen.du@canonical.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240713114735.62360-1-chengen.du@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | ipv6: take care of scope when choosing the src addrNicolas Dichtel2024-07-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the source address is selected, the scope must be checked. For example, if a loopback address is assigned to the vrf device, it must not be chosen for packets sent outside. CC: stable@vger.kernel.org Fixes: afbac6010aec ("net: ipv6: Address selection needs to consider L3 domains") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240710081521.3809742-4-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | ipv6: fix source address selection with route leakNicolas Dichtel2024-07-142-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address. Let's add a check against the output interface and call the appropriate function to select the source address. CC: stable@vger.kernel.org Fixes: 0d240e7811c4 ("net: vrf: Implement get_saddr for IPv6") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20240710081521.3809742-3-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | ipv4: fix source address selection with route leakNicolas Dichtel2024-07-141-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address. Let's add a check against the output interface and call the appropriate function to select the source address. CC: stable@vger.kernel.org Fixes: 8cbb512c923d ("net: Add source address lookup op for VRF") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240710081521.3809742-2-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | net: ethtool: pse-pd: Fix possible null-derefKory Maincent2024-07-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a possible null dereference when a PSE supports both c33 and PoDL, but only one of the netlink attributes is specified. The c33 or PoDL PSE capabilities are already validated in the ethnl_set_pse_validate() call. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240705184116.13d8235a@kernel.org/ Fixes: 4d18e3ddf427 ("net: ethtool: pse-pd: Expand pse commands with the PSE PoE interface") Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-2-edd78fc4fe42@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | net: pse-pd: Do not return EOPNOSUPP if config is nullKory Maincent2024-07-141-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a PSE supporting both c33 and PoDL, setting config for one type of PoE leaves the other type's config null. Currently, this case returns EOPNOTSUPP, which is incorrect. Instead, we should do nothing if the configuration is empty. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework") Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-1-edd78fc4fe42@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | Merge tag 'ipsec-2024-07-11' of ↵Jakub Kicinski2024-07-148-17/+82
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-07-11 1) Fix esp_output_tail_tcp() on unsupported ESPINTCP. From Hagar Hemdan. 2) Fix two bugs in the recently introduced SA direction separation. From Antony Antony. 3) Fix unregister netdevice hang on hardware offload. We had to add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. 4) Fix netdev reference count imbalance in xfrm_state_find. From Jianbo Liu. 5) Call xfrm_dev_policy_delete when killingi them on offloaded policies. Jianbo Liu. * tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: call xfrm_dev_policy_delete when kill policy xfrm: fix netdev reference count imbalance xfrm: Export symbol xfrm_dev_state_delete. xfrm: Fix unregister netdevice hang on hardware offload. xfrm: Log input direction mismatch error in one place xfrm: Fix input error path memory access net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCP ==================== Link: https://patch.msgid.link/20240711100025.1949454-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | | * | | xfrm: call xfrm_dev_policy_delete when kill policyJianbo Liu2024-07-082-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfrm_policy_kill() is called at different places to delete xfrm policy. It will call xfrm_pol_put(). But xfrm_dev_policy_delete() is not called to free the policy offloaded to hardware. The three commits cited here are to handle this issue by calling xfrm_dev_policy_delete() outside xfrm_get_policy(). But they didn't cover all the cases. An example, which is not handled for now, is xfrm_policy_insert(). It is called when XFRM_MSG_UPDPOLICY request is received. Old policy is replaced by new one, but the offloaded policy is not deleted, so driver doesn't have the chance to release hardware resources. To resolve this issue for all cases, move xfrm_dev_policy_delete() into xfrm_policy_kill(), so the offloaded policy can be deleted from hardware when it is called, which avoids hardware resources leakage. Fixes: 919e43fad516 ("xfrm: add an interface to offload policy") Fixes: bf06fcf4be0f ("xfrm: add missed call to delete offloaded policies") Fixes: 982c3aca8bac ("xfrm: delete offloaded policy") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | | * | | xfrm: fix netdev reference count imbalanceJianbo Liu2024-07-081-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cited commit, netdev_tracker_alloc() is called for the newly allocated xfrm state, but dev_hold() is missed, which causes netdev reference count imbalance, because netdev_put() is called when the state is freed in xfrm_dev_state_free(). Fix the issue by replacing netdev_tracker_alloc() with netdev_hold(). Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | | * | | xfrm: Export symbol xfrm_dev_state_delete.Steffen Klassert2024-07-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a build failure if xfrm_user is build as a module. Fixes: 07b87f9eea0c ("xfrm: Fix unregister netdevice hang on hardware offload.") Reported-by: Mark Brown <broonie@kernel.org> Tested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | | * | | xfrm: Fix unregister netdevice hang on hardware offload.Steffen Klassert2024-06-251-2/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When offloading xfrm states to hardware, the offloading device is attached to the skbs secpath. If a skb is free is deferred, an unregister netdevice hangs because the netdevice is still refcounted. Fix this by removing the netdevice from the xfrm states when the netdevice is unregistered. To find all xfrm states that need to be cleared we add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | | * | | xfrm: Log input direction mismatch error in one placeAntony Antony2024-06-173-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the offload data path decrypted the packet before checking the direction, leading to error logging and packet dropping. However, dropped packets wouldn't be visible in tcpdump or audit log. With this fix, the offload path, upon noticing SA direction mismatch, will pass the packet to the stack without decrypting it. The L3 layer will then log the error, audit, and drop ESP without decrypting or decapsulating it. This also ensures that the slow path records the error and audit log, making dropped packets visible in tcpdump. Fixes: 304b44f0d5a4 ("xfrm: Add dir validation to "in" data path lookup") Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | | * | | xfrm: Fix input error path memory accessAntony Antony2024-06-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there is a misconfiguration of input state slow path KASAN report error. Fix this error. west login: [ 52.987278] eth1: renamed from veth11 [ 53.078814] eth1: renamed from veth21 [ 53.181355] eth1: renamed from veth31 [ 54.921702] ================================================================== [ 54.922602] BUG: KASAN: wild-memory-access in xfrmi_rcv_cb+0x2d/0x295 [ 54.923393] Read of size 8 at addr 6b6b6b6b00000000 by task ping/512 [ 54.924169] [ 54.924386] CPU: 0 PID: 512 Comm: ping Not tainted 6.9.0-08574-gcd29a4313a1b #25 [ 54.925290] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 54.926401] Call Trace: [ 54.926731] <IRQ> [ 54.927009] dump_stack_lvl+0x2a/0x3b [ 54.927478] kasan_report+0x84/0xa6 [ 54.927930] ? xfrmi_rcv_cb+0x2d/0x295 [ 54.928410] xfrmi_rcv_cb+0x2d/0x295 [ 54.928872] ? xfrm4_rcv_cb+0x3d/0x5e [ 54.929354] xfrm4_rcv_cb+0x46/0x5e [ 54.929804] xfrm_rcv_cb+0x7e/0xa1 [ 54.930240] xfrm_input+0x1b3a/0x1b96 [ 54.930715] ? xfrm_offload+0x41/0x41 [ 54.931182] ? raw_rcv+0x292/0x292 [ 54.931617] ? nf_conntrack_confirm+0xa2/0xa2 [ 54.932158] ? skb_sec_path+0xd/0x3f [ 54.932610] ? xfrmi_input+0x90/0xce [ 54.933066] xfrm4_esp_rcv+0x33/0x54 [ 54.933521] ip_protocol_deliver_rcu+0xd7/0x1b2 [ 54.934089] ip_local_deliver_finish+0x110/0x120 [ 54.934659] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 54.935248] NF_HOOK.constprop.0+0xf8/0x138 [ 54.935767] ? ip_sublist_rcv_finish+0x68/0x68 [ 54.936317] ? secure_tcpv6_ts_off+0x23/0x168 [ 54.936859] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 54.937454] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 54.938135] NF_HOOK.constprop.0+0xf8/0x138 [ 54.938663] ? ip_sublist_rcv_finish+0x68/0x68 [ 54.939220] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 54.939904] ? ip_local_deliver_finish+0x120/0x120 [ 54.940497] __netif_receive_skb_one_core+0xc9/0x107 [ 54.941121] ? __netif_receive_skb_list_core+0x1c2/0x1c2 [ 54.941771] ? blk_mq_start_stopped_hw_queues+0xc7/0xf9 [ 54.942413] ? blk_mq_start_stopped_hw_queue+0x38/0x38 [ 54.943044] ? virtqueue_get_buf_ctx+0x295/0x46b [ 54.943618] process_backlog+0xb3/0x187 [ 54.944102] __napi_poll.constprop.0+0x57/0x1a7 [ 54.944669] net_rx_action+0x1cb/0x380 [ 54.945150] ? __napi_poll.constprop.0+0x1a7/0x1a7 [ 54.945744] ? vring_new_virtqueue+0x17a/0x17a [ 54.946300] ? note_interrupt+0x2cd/0x367 [ 54.946805] handle_softirqs+0x13c/0x2c9 [ 54.947300] do_softirq+0x5f/0x7d [ 54.947727] </IRQ> [ 54.948014] <TASK> [ 54.948300] __local_bh_enable_ip+0x48/0x62 [ 54.948832] __neigh_event_send+0x3fd/0x4ca [ 54.949361] neigh_resolve_output+0x1e/0x210 [ 54.949896] ip_finish_output2+0x4bf/0x4f0 [ 54.950410] ? __ip_finish_output+0x171/0x1b8 [ 54.950956] ip_send_skb+0x25/0x57 [ 54.951390] raw_sendmsg+0xf95/0x10c0 [ 54.951850] ? check_new_pages+0x45/0x71 [ 54.952343] ? raw_hash_sk+0x21b/0x21b [ 54.952815] ? kernel_init_pages+0x42/0x51 [ 54.953337] ? prep_new_page+0x44/0x51 [ 54.953811] ? get_page_from_freelist+0x72b/0x915 [ 54.954390] ? signal_pending_state+0x77/0x77 [ 54.954936] ? preempt_count_sub+0x14/0xb3 [ 54.955450] ? __might_resched+0x8a/0x240 [ 54.955951] ? __might_sleep+0x25/0xa0 [ 54.956424] ? first_zones_zonelist+0x2c/0x43 [ 54.956977] ? __rcu_read_lock+0x2d/0x3a [ 54.957476] ? __pte_offset_map+0x32/0xa4 [ 54.957980] ? __might_resched+0x8a/0x240 [ 54.958483] ? __might_sleep+0x25/0xa0 [ 54.958963] ? inet_send_prepare+0x54/0x54 [ 54.959478] ? sock_sendmsg_nosec+0x42/0x6c [ 54.960000] sock_sendmsg_nosec+0x42/0x6c [ 54.960502] __sys_sendto+0x15d/0x1cc [ 54.960966] ? __x64_sys_getpeername+0x44/0x44 [ 54.961522] ? __handle_mm_fault+0x679/0xae4 [ 54.962068] ? find_vma+0x6b/0x8b [ 54.962497] ? find_vma_intersection+0x8a/0x8a [ 54.963052] ? handle_mm_fault+0x38/0x154 [ 54.963556] ? handle_mm_fault+0xeb/0x154 [ 54.964059] ? preempt_latency_start+0x29/0x34 [ 54.964613] ? preempt_count_sub+0x14/0xb3 [ 54.965141] ? up_read+0x4b/0x5c [ 54.965557] __x64_sys_sendto+0x76/0x82 [ 54.966041] do_syscall_64+0x69/0xd5 [ 54.966497] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 54.967119] RIP: 0033:0x7f2d2fec9a73 [ 54.967572] Code: 8b 15 a9 83 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 71 0b 0d 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24 [ 54.969747] RSP: 002b:00007ffe85756418 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 54.970655] RAX: ffffffffffffffda RBX: 0000558bebad1340 RCX: 00007f2d2fec9a73 [ 54.971511] RDX: 0000000000000040 RSI: 0000558bebad73c0 RDI: 0000000000000003 [ 54.972366] RBP: 0000558bebad73c0 R08: 0000558bebad35c0 R09: 0000000000000010 [ 54.973234] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040 [ 54.974091] R13: 00007ffe85757b00 R14: 0000001d00000001 R15: 0000558bebad4680 [ 54.974951] </TASK> [ 54.975244] ================================================================== [ 54.976133] Disabling lock debugging due to kernel taint [ 54.976784] Oops: stack segment: 0000 [#1] PREEMPT DEBUG_PAGEALLOC KASAN [ 54.977603] CPU: 0 PID: 512 Comm: ping Tainted: G B 6.9.0-08574-gcd29a4313a1b #25 [ 54.978654] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 54.979750] RIP: 0010:xfrmi_rcv_cb+0x2d/0x295 [ 54.980293] Code: 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 89 fb 51 85 f6 75 31 48 89 df e8 d7 e8 ff ff 48 89 c5 48 89 c7 e8 8b a4 4f ff <48> 8b 7d 00 48 89 ee e8 eb f3 ff ff 49 89 c5 b8 01 00 00 00 4d 85 [ 54.982462] RSP: 0018:ffffc90000007990 EFLAGS: 00010282 [ 54.983099] RAX: 0000000000000001 RBX: ffff8881126e9900 RCX: fffffbfff07b77cd [ 54.983948] RDX: fffffbfff07b77cd RSI: fffffbfff07b77cd RDI: ffffffff83dbbe60 [ 54.984794] RBP: 6b6b6b6b00000000 R08: 0000000000000008 R09: 0000000000000001 [ 54.985647] R10: ffffffff83dbbe67 R11: fffffbfff07b77cc R12: 00000000ffffffff [ 54.986512] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000002 [ 54.987365] FS: 00007f2d2fc0dc40(0000) GS:ffffffff82eb2000(0000) knlGS:0000000000000000 [ 54.988329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.989026] CR2: 00007ffe85755ff8 CR3: 0000000109941000 CR4: 0000000000350ef0 [ 54.989897] Call Trace: [ 54.990223] <IRQ> [ 54.990500] ? __die_body+0x1a/0x56 [ 54.990950] ? die+0x30/0x49 [ 54.991326] ? do_trap+0x9b/0x132 [ 54.991751] ? do_error_trap+0x7d/0xaf [ 54.992223] ? exc_stack_segment+0x35/0x45 [ 54.992734] ? asm_exc_stack_segment+0x22/0x30 [ 54.993294] ? xfrmi_rcv_cb+0x2d/0x295 [ 54.993764] ? xfrm4_rcv_cb+0x3d/0x5e [ 54.994228] xfrm4_rcv_cb+0x46/0x5e [ 54.994670] xfrm_rcv_cb+0x7e/0xa1 [ 54.995106] xfrm_input+0x1b3a/0x1b96 [ 54.995572] ? xfrm_offload+0x41/0x41 [ 54.996038] ? raw_rcv+0x292/0x292 [ 54.996472] ? nf_conntrack_confirm+0xa2/0xa2 [ 54.997011] ? skb_sec_path+0xd/0x3f [ 54.997466] ? xfrmi_input+0x90/0xce [ 54.997925] xfrm4_esp_rcv+0x33/0x54 [ 54.998378] ip_protocol_deliver_rcu+0xd7/0x1b2 [ 54.998944] ip_local_deliver_finish+0x110/0x120 [ 54.999520] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 55.000111] NF_HOOK.constprop.0+0xf8/0x138 [ 55.000630] ? ip_sublist_rcv_finish+0x68/0x68 [ 55.001195] ? secure_tcpv6_ts_off+0x23/0x168 [ 55.001743] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 55.002331] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 55.003008] NF_HOOK.constprop.0+0xf8/0x138 [ 55.003527] ? ip_sublist_rcv_finish+0x68/0x68 [ 55.004078] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 55.004755] ? ip_local_deliver_finish+0x120/0x120 [ 55.005351] __netif_receive_skb_one_core+0xc9/0x107 [ 55.005972] ? __netif_receive_skb_list_core+0x1c2/0x1c2 [ 55.006626] ? blk_mq_start_stopped_hw_queues+0xc7/0xf9 [ 55.007266] ? blk_mq_start_stopped_hw_queue+0x38/0x38 [ 55.007899] ? virtqueue_get_buf_ctx+0x295/0x46b [ 55.008476] process_backlog+0xb3/0x187 [ 55.008961] __napi_poll.constprop.0+0x57/0x1a7 [ 55.009540] net_rx_action+0x1cb/0x380 [ 55.010020] ? __napi_poll.constprop.0+0x1a7/0x1a7 [ 55.010610] ? vring_new_virtqueue+0x17a/0x17a [ 55.011173] ? note_interrupt+0x2cd/0x367 [ 55.011675] handle_softirqs+0x13c/0x2c9 [ 55.012169] do_softirq+0x5f/0x7d [ 55.012597] </IRQ> [ 55.012882] <TASK> [ 55.013179] __local_bh_enable_ip+0x48/0x62 [ 55.013704] __neigh_event_send+0x3fd/0x4ca [ 55.014227] neigh_resolve_output+0x1e/0x210 [ 55.014761] ip_finish_output2+0x4bf/0x4f0 [ 55.015278] ? __ip_finish_output+0x171/0x1b8 [ 55.015823] ip_send_skb+0x25/0x57 [ 55.016261] raw_sendmsg+0xf95/0x10c0 [ 55.016729] ? check_new_pages+0x45/0x71 [ 55.017229] ? raw_hash_sk+0x21b/0x21b [ 55.017708] ? kernel_init_pages+0x42/0x51 [ 55.018225] ? prep_new_page+0x44/0x51 [ 55.018704] ? get_page_from_freelist+0x72b/0x915 [ 55.019292] ? signal_pending_state+0x77/0x77 [ 55.019840] ? preempt_count_sub+0x14/0xb3 [ 55.020357] ? __might_resched+0x8a/0x240 [ 55.020860] ? __might_sleep+0x25/0xa0 [ 55.021345] ? first_zones_zonelist+0x2c/0x43 [ 55.021896] ? __rcu_read_lock+0x2d/0x3a [ 55.022396] ? __pte_offset_map+0x32/0xa4 [ 55.022901] ? __might_resched+0x8a/0x240 [ 55.023404] ? __might_sleep+0x25/0xa0 [ 55.023879] ? inet_send_prepare+0x54/0x54 [ 55.024391] ? sock_sendmsg_nosec+0x42/0x6c [ 55.024918] sock_sendmsg_nosec+0x42/0x6c [ 55.025428] __sys_sendto+0x15d/0x1cc [ 55.025892] ? __x64_sys_getpeername+0x44/0x44 [ 55.026441] ? __handle_mm_fault+0x679/0xae4 [ 55.026988] ? find_vma+0x6b/0x8b [ 55.027414] ? find_vma_intersection+0x8a/0x8a [ 55.027966] ? handle_mm_fault+0x38/0x154 [ 55.028470] ? handle_mm_fault+0xeb/0x154 [ 55.028972] ? preempt_latency_start+0x29/0x34 [ 55.029532] ? preempt_count_sub+0x14/0xb3 [ 55.030047] ? up_read+0x4b/0x5c [ 55.030463] __x64_sys_sendto+0x76/0x82 [ 55.030949] do_syscall_64+0x69/0xd5 [ 55.031406] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 55.032028] RIP: 0033:0x7f2d2fec9a73 [ 55.032481] Code: 8b 15 a9 83 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 71 0b 0d 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24 [ 55.034660] RSP: 002b:00007ffe85756418 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 55.035567] RAX: ffffffffffffffda RBX: 0000558bebad1340 RCX: 00007f2d2fec9a73 [ 55.036424] RDX: 0000000000000040 RSI: 0000558bebad73c0 RDI: 0000000000000003 [ 55.037293] RBP: 0000558bebad73c0 R08: 0000558bebad35c0 R09: 0000000000000010 [ 55.038153] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040 [ 55.039012] R13: 00007ffe85757b00 R14: 0000001d00000001 R15: 0000558bebad4680 [ 55.039871] </TASK> [ 55.040167] Modules linked in: [ 55.040585] ---[ end trace 0000000000000000 ]--- [ 55.041164] RIP: 0010:xfrmi_rcv_cb+0x2d/0x295 [ 55.041714] Code: 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 89 fb 51 85 f6 75 31 48 89 df e8 d7 e8 ff ff 48 89 c5 48 89 c7 e8 8b a4 4f ff <48> 8b 7d 00 48 89 ee e8 eb f3 ff ff 49 89 c5 b8 01 00 00 00 4d 85 [ 55.043889] RSP: 0018:ffffc90000007990 EFLAGS: 00010282 [ 55.044528] RAX: 0000000000000001 RBX: ffff8881126e9900 RCX: fffffbfff07b77cd [ 55.045386] RDX: fffffbfff07b77cd RSI: fffffbfff07b77cd RDI: ffffffff83dbbe60 [ 55.046250] RBP: 6b6b6b6b00000000 R08: 0000000000000008 R09: 0000000000000001 [ 55.047104] R10: ffffffff83dbbe67 R11: fffffbfff07b77cc R12: 00000000ffffffff [ 55.047960] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000002 [ 55.048820] FS: 00007f2d2fc0dc40(0000) GS:ffffffff82eb2000(0000) knlGS:0000000000000000 [ 55.049805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.050507] CR2: 00007ffe85755ff8 CR3: 0000000109941000 CR4: 0000000000350ef0 [ 55.051366] Kernel panic - not syncing: Fatal exception in interrupt [ 55.052136] Kernel Offset: disabled [ 55.052577] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 304b44f0d5a4 ("xfrm: Add dir validation to "in" data path lookup") Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | | * | | net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCPHagar Hemdan2024-05-232-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xmit() functions should consume skb or return error codes in error paths. When the configuration "CONFIG_INET_ESPINTCP" is not set, the implementation of the function "esp_output_tail_tcp" violates this rule. The function frees the skb and returns the error code. This change removes the kfree_skb from both functions, for both esp4 and esp6. WARN_ON is added because esp_output_tail_tcp() should never be called if CONFIG_INET_ESPINTCP is not set. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Hagar Hemdan <hagarhem@amazon.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | | | net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECKAsbjørn Sloth Tønnesen2024-07-151-14/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NL_REQ_ATTR_CHECK() is used in fl_set_key_flags() to set extended attributes about the origin of an error, this patch propagates tca[TCA_OPTIONS] through. Before this patch: $ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/tc.yaml \ --do newtfilter --json '{ "chain": 0, "family": 0, "handle": 4, "ifindex": 22, "info": 262152, "kind": "flower", "options": { "flags": 0, "key-enc-flags": 8, "key-eth-type": 2048 }, "parent": 4294967283 }' Netlink error: Invalid argument nl_len = 68 (52) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'Missing flags mask', 'miss-type': 111} After this patch: [same cmd] Netlink error: Invalid argument nl_len = 76 (60) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'Missing flags mask', 'miss-type': 111, 'miss-nest': 56} Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Link: https://patch.msgid.link/20240713021911.1631517-14-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | | flow_dissector: set encapsulation control flags for non-IPAsbjørn Sloth Tønnesen2024-07-152-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure to set encapsulated control flags also for non-IP packets, such that it's possible to allow matching on e.g. TUNNEL_OAM on a geneve packet carrying a non-IP packet. Suggested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Tested-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/20240713021911.1631517-13-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | | flow_dissector: cleanup FLOW_DISSECTOR_KEY_ENC_FLAGSAsbjørn Sloth Tønnesen2024-07-152-18/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that TCA_FLOWER_KEY_ENC_FLAGS is unused, as it's former data is stored behind TCA_FLOWER_KEY_ENC_CONTROL, then remove the last bits of FLOW_DISSECTOR_KEY_ENC_FLAGS. FLOW_DISSECTOR_KEY_ENC_FLAGS is unreleased, and have been in net-next since 2024-06-04. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Tested-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/20240713021911.1631517-12-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>