| Commit message (Collapse) | Author | Files | Lines |
|
Currently the teardown/setup flow for driver probe/remove is quite
a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up().
One key piece that's missing are the calls to pci_alloc_irq_vectors()
and pci_free_irq_vectors(). The pcie reset case is calling
pci_free_irq_vectors() on reset_prepare, but not calling the
corresponding pci_alloc_irq_vectors() on reset_done. This is causing
unexpected/unwanted interrupt behavior due to the adminq interrupt
being accidentally put into legacy interrupt mode. Also, the
pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being
called directly in probe/remove respectively.
Fix this inconsistency by making the following changes:
1. Always call pdsc_dev_init() in pdsc_setup(), which calls
pci_alloc_irq_vectors() and get rid of the now unused
pds_dev_reinit().
2. Always free/clear the pdsc->intr_info in pdsc_teardown()
since this structure will get re-alloced in pdsc_setup().
3. Move the calls of pci_free_irq_vectors() to pdsc_teardown()
since pci_alloc_irq_vectors() will always be called in
pdsc_setup()->pdsc_dev_init() for both the probe/remove and
reset flows.
4. Make sure to only create the debugfs "identity" entry when it
doesn't already exist, which it will in the reset case because
it's already been created in the initial call to pdsc_dev_init().
Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During reset the BARs might be accessed when they are
unmapped. This can cause unexpected issues, so fix it by
clearing the cached BAR values so they are not accessed
until they are re-mapped.
Also, make sure any places that can access the BARs
when they are NULL are prevented.
Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-6-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are multiple paths that can result in using the pdsc's
adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(),
i.e. pdsc_work_thread()->pdsc_process_adminq()
[2] pdsc_adminq_post()
When the device goes through reset via PCIe reset and/or
a fw_down/fw_up cycle due to bad PCIe state or bad device
state the adminq is destroyed and recreated.
A NULL pointer dereference can happen if [1] or [2] happens
after the adminq is already destroyed.
In order to fix this, add some further state checks and
implement reference counting for adminq uses. Reference
counting was used because multiple threads can attempt to
access the adminq at the same time via [1] or [2]. Additionally,
multiple clients (i.e. pds-vfio-pci) can be using [2]
at the same time.
The adminq_refcnt is initialized to 1 when the adminq has been
allocated and is ready to use. Users/clients of the adminq
(i.e. [1] and [2]) will increment the refcnt when they are using
the adminq. When the driver goes into a fw_down cycle it will
set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt
to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent
any further adminq_refcnt increments. Waiting for the
adminq_refcnt to hit 1 allows for any current users of the adminq
to finish before the driver frees the adminq. Once the
adminq_refcnt hits 1 the driver clears the refcnt to signify that
the adminq is deleted and cannot be used. On the fw_up cycle the
driver will once again initialize the adminq_refcnt to 1 allowing
the adminq to be used again.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The initial design for the adminq interrupt was done based
on client drivers having their own adminq and adminq
interrupt. So, each client driver's adminq isr would use
their specific adminqcq for the private data struct. For the
time being the design has changed to only use a single
adminq for all clients. So, instead use the struct pdsc for
the private data to simplify things a bit.
This also has the benefit of not dereferencing the adminqcq
to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit
is set and the adminqcq has actually been cleared/freed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a small window where pdsc_work_thread()
calls pdsc_process_adminq() and pdsc_process_adminq()
passes the PDSC_S_STOPPING_DRIVER check and starts
to process adminq/notifyq work and then the driver
starts a fw_down cycle. This could cause some
undefined behavior if the notifyqcq/adminqcq are
free'd while pdsc_process_adminq() is running. Use
cancel_work_sync() on the adminqcq's work struct
to make sure any pending work items are cancelled
and any in progress work items are completed.
Also, make sure to not call cancel_work_sync() if
the work item has not be initialized. Without this,
traces will happen in cases where a reset fails and
teardown is called again or if reset fails and the
driver is removed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The PCIe reset handlers can run at the same time as the
health thread. This can cause the health thread to
stomp on the PCIe reset. Fix this by preventing the
health thread from running while a PCIe reset is happening.
As part of this use timer_shutdown_sync() during reset and
remove to make sure the timer doesn't ever get rearmed.
Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-2-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported a lockdep splat [1].
Blamed commit hinted about the possible lockdep
violation, and code used unix_state_lock_nested()
in an attempt to silence lockdep.
It is not sufficient, because unix_state_lock_nested()
is already used from unix_state_double_lock().
We need to use a separate subclass.
This patch adds a distinct enumeration to make things
more explicit.
Also use swap() in unix_state_double_lock() as a clean up.
v2: add a missing inline keyword to unix_state_lock_nested()
[1]
WARNING: possible circular locking dependency detected
6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Not tainted
syz-executor.1/2542 is trying to acquire lock:
ffff88808b5df9e8 (rlock-AF_UNIX){+.+.}-{2:2}, at: skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
but task is already holding lock:
ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&u->lock/1){+.+.}-{2:2}:
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
_raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
sk_diag_dump_icons net/unix/diag.c:87 [inline]
sk_diag_fill+0x6ea/0xfe0 net/unix/diag.c:157
sk_diag_dump net/unix/diag.c:196 [inline]
unix_diag_dump+0x3e9/0x630 net/unix/diag.c:220
netlink_dump+0x5c1/0xcd0 net/netlink/af_netlink.c:2264
__netlink_dump_start+0x5d7/0x780 net/netlink/af_netlink.c:2370
netlink_dump_start include/linux/netlink.h:338 [inline]
unix_diag_handler_dump+0x1c3/0x8f0 net/unix/diag.c:319
sock_diag_rcv_msg+0xe3/0x400
netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2543
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7e6/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa37/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
sock_write_iter+0x39a/0x520 net/socket.c:1160
call_write_iter include/linux/fs.h:2085 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xa74/0xca0 fs/read_write.c:590
ksys_write+0x1a0/0x2c0 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
-> #0 (rlock-AF_UNIX){+.+.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0x592/0x890 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
__do_sys_sendmmsg net/socket.c:2753 [inline]
__se_sys_sendmmsg net/socket.c:2750 [inline]
__x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&u->lock/1);
lock(rlock-AF_UNIX);
lock(&u->lock/1);
lock(rlock-AF_UNIX);
*** DEADLOCK ***
1 lock held by syz-executor.1/2542:
#0: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089
stack backtrace:
CPU: 1 PID: 2542 Comm: syz-executor.1 Not tainted 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
check_noncircular+0x366/0x490 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0x592/0x890 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
__do_sys_sendmmsg net/socket.c:2753 [inline]
__se_sys_sendmmsg net/socket.c:2750 [inline]
__x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f26d887cda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f26d95a60c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f26d89abf80 RCX: 00007f26d887cda9
RDX: 000000000000003e RSI: 00000000200bd000 RDI: 0000000000000004
RBP: 00007f26d88c947a R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000008c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f26d89abf80 R15: 00007ffcfe081a68
Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240130184235.1620738-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the test is racy and seems to not pass anymore.
In order to rectify it, aim on TCP_TW_RST.
Doesn't seem way too good with this sleep() part, but it seems as
a reasonable compromise for the test. There is a plan in-line comment on
how-to improve it, going to do it on the top, at this moment I want it
to run on netdev/patchwork selftests dashboard.
It also slightly changes tcp_ao-lib in order to get SO_ERROR propagated
to test_client_verify() return value.
Fixes: c6df7b2361d7 ("selftests/net: Add TCP-AO RST test")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-3-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As the names of (struct test_key) members didn't reflect whether the key
was used for TX or RX, the verification for the counters was done
incorrectly for asymmetrical selftests.
Rename these with _tx appendix and fix checks in verify_counters().
While at it, as the checks are now correct, introduce skip_counters_checks,
which is intended for tests where it's expected that a key that was set
with setsockopt(sk, IPPROTO_TCP, TCP_AO_INFO, ...) might had no chance
of getting used on the wire.
Fixes the following failures, exposed by the previous commit:
> not ok 51 server: Check current != rnext keys set before connect(): Counter pkt_good was expected to increase 0 => 0 for key 132:5
> not ok 52 server: Check current != rnext keys set before connect(): Counter pkt_good was not expected to increase 0 => 21 for key 137:10
>
> not ok 63 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was expected to increase 0 => 0 for key 132:5
> not ok 64 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was not expected to increase 0 => 40 for key 137:10
Cc: Mohammad Nassiri <mnassiri@ciena.com>
Fixes: 3c3ead555648 ("selftests/net: Add TCP-AO key-management test")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-2-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The end_server() function only operates in the server thread
and always takes an accept socket instead of a listen socket as
its input argument. To align with this, invert the boolean values
used when calling verify_counters() within the end_server() function.
As a result of this typo, the test didn't correctly check for
the non-symmetrical scenario, where i.e. peer-A uses a key <100:200>
to send data, but peer-B uses another key <105:205> to send its data.
So, in simple words, different keys for TX and RX.
Fixes: 3c3ead555648 ("selftests/net: Add TCP-AO key-management test")
Signed-off-by: Mohammad Nassiri <mnassiri@ciena.com>
Link: https://lore.kernel.org/all/934627c5-eebb-4626-be23-cfb134c01d1a@arista.com/
[amended 'Fixes' tag, added the issue description and carried-over to lkml]
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-1-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Not all mv88e6xxx device support C45 read/write operations. Those
which do not return -EOPNOTSUPP. However, when phylib scans the bus,
it considers this fatal, and the probe of the MDIO bus fails, which in
term causes the mv88e6xxx probe as a whole to fail.
When there is no device on the bus for a given address, the pull up
resistor on the data line results in the read returning 0xffff. The
phylib core code understands this when scanning for devices on the
bus. C45 allows multiple devices to be supported at one address, so
phylib will perform a few reads at each address, so although thought
not the most efficient solution, it is a way to avoid fatal
errors. Make use of this as a minimal fix for stable to fix the
probing problems.
Follow up patches will rework how C45 operates to make it similar to
C22 which considers -ENODEV as a none-fatal, and swap mv88e6xxx to
using this.
Cc: stable@vger.kernel.org
Fixes: 743a19e38d02 ("net: dsa: mv88e6xxx: Separate C22 and C45 transactions")
Reported-by: Tim Menninger <tmenninger@purestorage.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240129224948.1531452-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When inetdev_valid_mtu fails, cork->opt should be freed if it is
allocated in ip_setup_cork. Otherwise there could be a memleak.
Fixes: 501a90c94510 ("inet: protect against too small mtu values.")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240129091017.2938835-1-alexious@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Bhupesh's email responds indicating they've changed employers and with
no new contact information. Let's drop the line from MAINTAINERS to
avoid getting the same response over and over.
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20240129-remove-dwmac-qcom-ethqos-reviewer-v1-1-2645eab61451@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
l2_tos_ttl_inherit.sh verifies the inheritance of tos and ttl
for GRETAP, VXLAN and GENEVE.
Before testing it checks if the required module is available
and if not skips the tests accordingly.
Currently only GRETAP and VXLAN are tested because the GENEVE
module is missing.
Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Signed-off-by: Matthias May <matthias.may@westermo.com>
Link: https://lore.kernel.org/r/20240130101157.196006-1-matthias.may@westermo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When port function state change is requested, and when the driver
does not support it, it refers to the hw address attribute instead
of state attribute. Seems like a copy paste error.
Fix it by referring to the port function state attribute.
Fixes: c0bea69d1ca7 ("devlink: Validate port function request")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240129191059.129030-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The original idea of the delay_time check was to not apply multicast
snooping too early when an MLD querier appears. And to instead wait at
least for MLD reports to arrive before switching from flooding to group
based, MLD snooped forwarding, to avoid temporary packet loss.
However in a batman-adv mesh network it was noticed that after 248 days of
uptime 32bit MIPS based devices would start to signal that they had
stopped applying multicast snooping due to missing queriers - even though
they were the elected querier and still sending MLD queries themselves.
While time_is_before_jiffies() generally is safe against jiffies
wrap-arounds, like the code comments in jiffies.h explain, it won't
be able to track a difference larger than ULONG_MAX/2. With a 32bit
large jiffies and one jiffies tick every 10ms (CONFIG_HZ=100) on these MIPS
devices running OpenWrt this would result in a difference larger than
ULONG_MAX/2 after 248 (= 2^32/100/60/60/24/2) days and
time_is_before_jiffies() would then start to return false instead of
true. Leading to multicast snooping not being applied to multicast
packets anymore.
Fix this issue by using a proper timer_list object which won't have this
ULONG_MAX/2 difference limitation.
Fixes: b00589af3b04 ("bridge: disable snooping if there is no querier")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240127175033.9640-1-linus.luessing@c0d3.blue
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
One of the test cases in the test_bridge_backup_port.sh selftest relies
on a matchall classifier to drop unrelated traffic so that the Tx drop
counter on the VXLAN device will only be incremented as a result of
traffic generated by the test.
However, the configuration option for the matchall classifier is
missing from the configuration file which might explain the failures we
see in the netdev CI [1].
Fix by adding CONFIG_NET_CLS_MATCHALL to the configuration file.
[1]
# Backup nexthop ID - invalid IDs
# -------------------------------
[...]
# TEST: Forwarding out of vx0 [ OK ]
# TEST: No forwarding using backup nexthop ID [ OK ]
# TEST: Tx drop increased [FAIL]
# TEST: IPv6 address family nexthop as backup nexthop [ OK ]
# TEST: No forwarding out of swp1 [ OK ]
# TEST: Forwarding out of vx0 [ OK ]
# TEST: No forwarding using backup nexthop ID [ OK ]
# TEST: Tx drop increased [FAIL]
[...]
Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240129123703.1857843-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported an interesting trace [1] caused by a stale sk->sk_wq
pointer in a closed llc socket.
In commit ff7b11aa481f ("net: socket: set sock->sk to NULL after
calling proto_ops::release()") Eric Biggers hinted that some protocols
are missing a sock_orphan(), we need to perform a full audit.
In net-next, I plan to clear sock->sk from sock_orphan() and
amend Eric patch to add a warning.
[1]
BUG: KASAN: slab-use-after-free in list_empty include/linux/list.h:373 [inline]
BUG: KASAN: slab-use-after-free in waitqueue_active include/linux/wait.h:127 [inline]
BUG: KASAN: slab-use-after-free in sock_def_write_space_wfree net/core/sock.c:3384 [inline]
BUG: KASAN: slab-use-after-free in sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468
Read of size 8 at addr ffff88802f4fc880 by task ksoftirqd/1/27
CPU: 1 PID: 27 Comm: ksoftirqd/1 Not tainted 6.8.0-rc1-syzkaller-00049-g6098d87eaf31 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc4/0x620 mm/kasan/report.c:488
kasan_report+0xda/0x110 mm/kasan/report.c:601
list_empty include/linux/list.h:373 [inline]
waitqueue_active include/linux/wait.h:127 [inline]
sock_def_write_space_wfree net/core/sock.c:3384 [inline]
sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468
skb_release_head_state+0xa3/0x2b0 net/core/skbuff.c:1080
skb_release_all net/core/skbuff.c:1092 [inline]
napi_consume_skb+0x119/0x2b0 net/core/skbuff.c:1404
e1000_unmap_and_free_tx_resource+0x144/0x200 drivers/net/ethernet/intel/e1000/e1000_main.c:1970
e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3860 [inline]
e1000_clean+0x4a1/0x26e0 drivers/net/ethernet/intel/e1000/e1000_main.c:3801
__napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6576
napi_poll net/core/dev.c:6645 [inline]
net_rx_action+0x956/0xe90 net/core/dev.c:6778
__do_softirq+0x21a/0x8de kernel/softirq.c:553
run_ksoftirqd kernel/softirq.c:921 [inline]
run_ksoftirqd+0x31/0x60 kernel/softirq.c:913
smpboot_thread_fn+0x660/0xa10 kernel/smpboot.c:164
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Allocated by task 5167:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:314 [inline]
__kasan_slab_alloc+0x81/0x90 mm/kasan/common.c:340
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_lru+0x142/0x6f0 mm/slub.c:3879
alloc_inode_sb include/linux/fs.h:3019 [inline]
sock_alloc_inode+0x25/0x1c0 net/socket.c:308
alloc_inode+0x5d/0x220 fs/inode.c:260
new_inode_pseudo+0x16/0x80 fs/inode.c:1005
sock_alloc+0x40/0x270 net/socket.c:634
__sock_create+0xbc/0x800 net/socket.c:1535
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x14c/0x260 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__x64_sys_socket+0x72/0xb0 net/socket.c:1718
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Freed by task 0:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3f/0x60 mm/kasan/generic.c:640
poison_slab_object mm/kasan/common.c:241 [inline]
__kasan_slab_free+0x121/0x1b0 mm/kasan/common.c:257
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2121 [inline]
slab_free mm/slub.c:4299 [inline]
kmem_cache_free+0x129/0x350 mm/slub.c:4363
i_callback+0x43/0x70 fs/inode.c:249
rcu_do_batch kernel/rcu/tree.c:2158 [inline]
rcu_core+0x819/0x1680 kernel/rcu/tree.c:2433
__do_softirq+0x21a/0x8de kernel/softirq.c:553
Last potentially related work creation:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
__kasan_record_aux_stack+0xba/0x100 mm/kasan/generic.c:586
__call_rcu_common.constprop.0+0x9a/0x7b0 kernel/rcu/tree.c:2683
destroy_inode+0x129/0x1b0 fs/inode.c:315
iput_final fs/inode.c:1739 [inline]
iput.part.0+0x560/0x7b0 fs/inode.c:1765
iput+0x5c/0x80 fs/inode.c:1755
dentry_unlink_inode+0x292/0x430 fs/dcache.c:400
__dentry_kill+0x1ca/0x5f0 fs/dcache.c:603
dput.part.0+0x4ac/0x9a0 fs/dcache.c:845
dput+0x1f/0x30 fs/dcache.c:835
__fput+0x3b9/0xb70 fs/file_table.c:384
task_work_run+0x14d/0x240 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xa8a/0x2ad0 kernel/exit.c:871
do_group_exit+0xd4/0x2a0 kernel/exit.c:1020
__do_sys_exit_group kernel/exit.c:1031 [inline]
__se_sys_exit_group kernel/exit.c:1029 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1029
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
The buggy address belongs to the object at ffff88802f4fc800
which belongs to the cache sock_inode_cache of size 1408
The buggy address is located 128 bytes inside of
freed 1408-byte region [ffff88802f4fc800, ffff88802f4fcd80)
The buggy address belongs to the physical page:
page:ffffea0000bd3e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2f4f8
head:ffffea0000bd3e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888013b06b40 0000000000000000 0000000000000001
raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Reclaimable, gfp_mask 0xd20d0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE), pid 4956, tgid 4956 (sshd), ts 31423924727, free_ts 0
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1533
prep_new_page mm/page_alloc.c:1540 [inline]
get_page_from_freelist+0xa28/0x3780 mm/page_alloc.c:3311
__alloc_pages+0x22f/0x2440 mm/page_alloc.c:4567
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page mm/slub.c:2190 [inline]
allocate_slab mm/slub.c:2354 [inline]
new_slab+0xcc/0x3a0 mm/slub.c:2407
___slab_alloc+0x4af/0x19a0 mm/slub.c:3540
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3625
__slab_alloc_node mm/slub.c:3678 [inline]
slab_alloc_node mm/slub.c:3850 [inline]
kmem_cache_alloc_lru+0x379/0x6f0 mm/slub.c:3879
alloc_inode_sb include/linux/fs.h:3019 [inline]
sock_alloc_inode+0x25/0x1c0 net/socket.c:308
alloc_inode+0x5d/0x220 fs/inode.c:260
new_inode_pseudo+0x16/0x80 fs/inode.c:1005
sock_alloc+0x40/0x270 net/socket.c:634
__sock_create+0xbc/0x800 net/socket.c:1535
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x14c/0x260 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__x64_sys_socket+0x72/0xb0 net/socket.c:1718
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
page_owner free stack trace missing
Memory state around the buggy address:
ffff88802f4fc780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88802f4fc800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88802f4fc880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88802f4fc900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88802f4fc980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 43815482370c ("net: sock_def_readable() and friends RCU conversion")
Reported-and-tested-by: syzbot+32b89eaa102b372ff76d@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240126165532.3396702-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
TSO and TBS cannot coexist. For now we set i.MX Ethernet QOS controller to
use the first TX queue with TSO and the rest for TBS.
TX queues with TBS can support etf qdisc hw offload.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
With the dma conf being reallocated on each call to stmmac_open(), any
information in there is lost, unless we specifically handle it.
The STMMAC_TBS_EN bit is set when adding an etf qdisc, and the etf qdisc
therefore would stop working when link was set down and then back up.
Fixes: ba39b344e924 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open")
Cc: stable@vger.kernel.org
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
On a parisc64 kernel I sometimes notice this kernel warning:
Kernel unaligned access to 0x40ff8814 at ndisc_send_skb+0xc0/0x4d8
The address 0x40ff8814 points to the in6addr_linklocal_allrouters
variable and the warning simply means that some ipv6 function tries to
read a 64-bit word directly from the not-64-bit aligned
in6addr_linklocal_allrouters variable.
Unaligned accesses are non-critical as the architecture or exception
handlers usually will fix it up at runtime. Nevertheless it may trigger
a performance penality for some architectures. For details read the
"unaligned-memory-access" kernel documentation.
The patch below ensures that the ipv6 loopback and router addresses will
always be naturally aligned. This prevents the unaligned accesses for
all architectures.
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 034dfc5df99eb ("ipv6: export in6addr_loopback to modules")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/ZbNuFM1bFqoH-UoY@p100
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Modern OSes use iptables implementation with nf_tables as a backend,
e.g.:
$ iptables -V
iptables v1.8.8 (nf_tables)
Pablo points out that we need CONFIG_NFT_COMPAT to make that work,
otherwise we see a lot of:
Warning: Extension DNAT revision 0 not supported, missing kernel module?
with DNAT being just an example here, other modules we need
include udp, TTL, length etc.
Link: https://lore.kernel.org/r/20240126201308.2903602-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When working with GPIO, its direction must be set either when the GPIO is
requested by gpiod_get*() or later on by one of the gpiod_direction_*()
functions. Neither of this is done here which results in undefined
behavior on some systems.
As the reset GPIO is used right after it is requested here, it makes sense
to configure it as GPIOD_OUT_HIGH right away. With that, the following
gpiod_set_value_cansleep(1) becomes redundant and can be safely
removed.
Fixes: a653f2f538f9 ("net: dsa: qca8k: introduce reset via gpio feature")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/1706266175-3408-1-git-send-email-michal.vokac@ysoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All error handling paths, except this one, go to 'out' where
release_swfw_sync() is called.
This call balances the acquire_swfw_sync() call done at the beginning of
the function.
Branch to the error handling path in order to correctly release some
resources in case of error.
Fixes: ae14a1d8e104 ("ixgbe: Fix IOSF SB access issues")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The e1000e driver supports hardware with a variety of different clock
speeds, and thus a variety of different increment values used for
programming its PTP hardware clock.
The values currently programmed in e1000e_ptp_init are incorrect. In
particular, only two maximum adjustments are used: 24000000 - 1, and
600000000 - 1. These were originally intended to be used with the 96 MHz
clock and the 25 MHz clock.
Both of these values are actually slightly too high. For the 96 MHz clock,
the actual maximum value that can safely be programmed is 23,999,938. For
the 25 MHz clock, the maximum value is 599,999,904.
Worse, several devices use a 24 MHz clock or a 38.4 MHz clock. These parts
are incorrectly assigned one of either the 24million or 600million values.
For the 24 MHz clock, this is not a significant issue: its current
increment value can support an adjustment up to 7billion in the positive
direction. However, the 38.4 KHz clock uses an increment value which can
only support up to 230,769,157 before it starts overflowing.
To understand where these values come from, consider that frequency
adjustments have the form of:
new_incval = base_incval + (base_incval * adjustment) / (unit of adjustment)
The maximum adjustment is reported in terms of parts per billion:
new_incval = base_incval + (base_incval * adjustment) / 1 billion
The largest possible adjustment is thus given by the following:
max_incval = base_incval + (base_incval * max_adj) / 1 billion
Re-arranging to solve for max_adj:
max_adj = (max_incval - base_incval) * 1 billion / base_incval
We also need to ensure that negative adjustments cannot underflow. This can
be achieved simply by ensuring max_adj is always less than 1 billion.
Introduce new macros in e1000.h codifying the maximum adjustment in PPB for
each frequency given its associated increment values. Also clarify where
these values come from by commenting about the above equations.
Replace the switch statement in e1000e_ptp_init with one which mirrors the
increment value switch statement from e1000e_get_base_timinica. For each
device, assign the appropriate maximum adjustment based on its frequency.
Some parts can have one of two frequency modes as determined by
E1000_TSYNCRXCTL_SYSCFI.
Since the new flow directly matches the assignments in
e1000e_get_base_timinca, and uses well defined macro names, it is much
easier to verify that the resulting maximum adjustments are correct. It
also avoids difficult to parse construction such as the "hw->mac.type <
e1000_phc_lpt", and the use of fallthrough which was especially confusing
when combined with a conditional block.
Note that I believe the current increment value configuration used for
24MHz clocks is sub-par, as it leaves at least 3 extra bits available in
the INCVALUE register. However, fixing that requires more careful review of
the clock rate and associated values.
Reported-by: Trey Harrison <harrisondigitalmedia@gmail.com>
Fixes: 68fe1d5da548 ("e1000e: Add Support for 38.4MHZ frequency")
Fixes: d89777bf0e42 ("e1000e: add support for IEEE-1588 PTP")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
TCP rx zerocopy intent is to map pages initially allocated
from NIC drivers, not pages owned by a fs.
This patch adds to can_map_frag() these additional checks:
- Page must not be a compound one.
- page->mapping must be NULL.
This fixes the panic reported by ZhangPeng.
syzbot was able to loopback packets built with sendfile(),
mapping pages owned by an ext4 file to TCP rx zerocopy.
r3 = socket$inet_tcp(0x2, 0x1, 0x0)
mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
r4 = socket$inet_tcp(0x2, 0x1, 0x0)
bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
0x181e42, 0x0)
fallocate(r5, 0x0, 0x0, 0x85b8)
sendfile(r4, r5, 0x0, 0x8ba0)
getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
&(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
0x181e42, 0x0)
Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Link: https://lore.kernel.org/netdev/5106a58e-04da-372a-b836-9d3d0bd2507b@huawei.com/T/
Reported-and-bisected-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rx_data_reassembly skb is stored during NCI data exchange for processing
fragmented packets. It is dropped only when the last fragment is processed
or when an NTF packet with NCI_OP_RF_DEACTIVATE_NTF opcode is received.
However, the NCI device may be deallocated before that which leads to skb
leak.
As by design the rx_data_reassembly skb is bound to the NCI device and
nothing prevents the device to be freed before the skb is processed in
some way and cleaned, free it on the NCI device cleanup.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Cc: stable@vger.kernel.org
Reported-by: syzbot+6b7c68d9c21e4ee4251b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000f43987060043da7b@google.com/
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzkaller reported [1] hitting a warning after failing to allocate
resources for skb in hsr_init_skb(). Since a WARN_ONCE() call will
not help much in this case, it might be prudent to switch to
netdev_warn_once(). At the very least it will suppress syzkaller
reports such as [1].
Just in case, use netdev_warn_once() in send_prp_supervision_frame()
for similar reasons.
[1]
HSR: Could not send supervision frame
WARNING: CPU: 1 PID: 85 at net/hsr/hsr_device.c:294 send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294
RIP: 0010:send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294
...
Call Trace:
<IRQ>
hsr_announce+0x114/0x370 net/hsr/hsr_device.c:382
call_timer_fn+0x193/0x590 kernel/time/timer.c:1700
expire_timers kernel/time/timer.c:1751 [inline]
__run_timers+0x764/0xb20 kernel/time/timer.c:2022
run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035
__do_softirq+0x21a/0x8de kernel/softirq.c:553
invoke_softirq kernel/softirq.c:427 [inline]
__irq_exit_rcu kernel/softirq.c:632 [inline]
irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644
sysvec_apic_timer_interrupt+0x95/0xb0 arch/x86/kernel/apic/apic.c:1076
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:649
...
This issue is also found in older kernels (at least up to 5.10).
Cc: stable@vger.kernel.org
Reported-by: syzbot+3ae0a3f42c84074b7c8e@syzkaller.appspotmail.com
Fixes: 121c33b07b31 ("net: hsr: introduce common code for skb initialization")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case the interface between the MAC and the PHY is SGMII, then the bit
GIGA_MODE on the MAC side needs to be set regardless of the speed at
which it is running.
Fixes: d28d6d2e37d1 ("net: lan966x: add port module support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 46cf789b68b2 ("connector: Move maintainence under networking
drivers umbrella.") moved the connector maintenance but did not include
the connector header files.
It seems that it has always been implied that these headers were
maintained along with the rest of the connector code, both before and
after the cited commit. Make this explicit.
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The stacktrace was:
[ 86.305548] BUG: kernel NULL pointer dereference, address: 0000000000000092
[ 86.306815] #PF: supervisor read access in kernel mode
[ 86.307717] #PF: error_code(0x0000) - not-present page
[ 86.308624] PGD 0 P4D 0
[ 86.309091] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 86.309883] CPU: 2 PID: 3139 Comm: pimd Tainted: G U 6.8.0-6wind-knet #1
[ 86.311027] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[ 86.312728] RIP: 0010:ip_mr_forward (/build/work/knet/net/ipv4/ipmr.c:1985)
[ 86.313399] Code: f9 1f 0f 87 85 03 00 00 48 8d 04 5b 48 8d 04 83 49 8d 44 c5 00 48 8b 40 70 48 39 c2 0f 84 d9 00 00 00 49 8b 46 58 48 83 e0 fe <80> b8 92 00 00 00 00 0f 84 55 ff ff ff 49 83 47 38 01 45 85 e4 0f
[ 86.316565] RSP: 0018:ffffad21c0583ae0 EFLAGS: 00010246
[ 86.317497] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 86.318596] RDX: ffff9559cb46c000 RSI: 0000000000000000 RDI: 0000000000000000
[ 86.319627] RBP: ffffad21c0583b30 R08: 0000000000000000 R09: 0000000000000000
[ 86.320650] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 86.321672] R13: ffff9559c093a000 R14: ffff9559cc00b800 R15: ffff9559c09c1d80
[ 86.322873] FS: 00007f85db661980(0000) GS:ffff955a79d00000(0000) knlGS:0000000000000000
[ 86.324291] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 86.325314] CR2: 0000000000000092 CR3: 000000002f13a000 CR4: 0000000000350ef0
[ 86.326589] Call Trace:
[ 86.327036] <TASK>
[ 86.327434] ? show_regs (/build/work/knet/arch/x86/kernel/dumpstack.c:479)
[ 86.328049] ? __die (/build/work/knet/arch/x86/kernel/dumpstack.c:421 /build/work/knet/arch/x86/kernel/dumpstack.c:434)
[ 86.328508] ? page_fault_oops (/build/work/knet/arch/x86/mm/fault.c:707)
[ 86.329107] ? do_user_addr_fault (/build/work/knet/arch/x86/mm/fault.c:1264)
[ 86.329756] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[ 86.330350] ? __irq_work_queue_local (/build/work/knet/kernel/irq_work.c:111 (discriminator 1))
[ 86.331013] ? exc_page_fault (/build/work/knet/./arch/x86/include/asm/paravirt.h:693 /build/work/knet/arch/x86/mm/fault.c:1515 /build/work/knet/arch/x86/mm/fault.c:1563)
[ 86.331702] ? asm_exc_page_fault (/build/work/knet/./arch/x86/include/asm/idtentry.h:570)
[ 86.332468] ? ip_mr_forward (/build/work/knet/net/ipv4/ipmr.c:1985)
[ 86.333183] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[ 86.333920] ipmr_mfc_add (/build/work/knet/./include/linux/rcupdate.h:782 /build/work/knet/net/ipv4/ipmr.c:1009 /build/work/knet/net/ipv4/ipmr.c:1273)
[ 86.334583] ? __pfx_ipmr_hash_cmp (/build/work/knet/net/ipv4/ipmr.c:363)
[ 86.335357] ip_mroute_setsockopt (/build/work/knet/net/ipv4/ipmr.c:1470)
[ 86.336135] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[ 86.336854] ? ip_mroute_setsockopt (/build/work/knet/net/ipv4/ipmr.c:1470)
[ 86.337679] do_ip_setsockopt (/build/work/knet/net/ipv4/ip_sockglue.c:944)
[ 86.338408] ? __pfx_unix_stream_read_actor (/build/work/knet/net/unix/af_unix.c:2862)
[ 86.339232] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[ 86.339809] ? aa_sk_perm (/build/work/knet/security/apparmor/include/cred.h:153 /build/work/knet/security/apparmor/net.c:181)
[ 86.340342] ip_setsockopt (/build/work/knet/net/ipv4/ip_sockglue.c:1415)
[ 86.340859] raw_setsockopt (/build/work/knet/net/ipv4/raw.c:836)
[ 86.341408] ? security_socket_setsockopt (/build/work/knet/security/security.c:4561 (discriminator 13))
[ 86.342116] sock_common_setsockopt (/build/work/knet/net/core/sock.c:3716)
[ 86.342747] do_sock_setsockopt (/build/work/knet/net/socket.c:2313)
[ 86.343363] __sys_setsockopt (/build/work/knet/./include/linux/file.h:32 /build/work/knet/net/socket.c:2336)
[ 86.344020] __x64_sys_setsockopt (/build/work/knet/net/socket.c:2340)
[ 86.344766] do_syscall_64 (/build/work/knet/arch/x86/entry/common.c:52 /build/work/knet/arch/x86/entry/common.c:83)
[ 86.345433] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[ 86.346161] ? syscall_exit_work (/build/work/knet/./include/linux/audit.h:357 /build/work/knet/kernel/entry/common.c:160)
[ 86.346938] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[ 86.347657] ? syscall_exit_to_user_mode (/build/work/knet/kernel/entry/common.c:215)
[ 86.348538] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
[ 86.349262] ? do_syscall_64 (/build/work/knet/./arch/x86/include/asm/cpufeature.h:171 /build/work/knet/arch/x86/entry/common.c:98)
[ 86.349971] entry_SYSCALL_64_after_hwframe (/build/work/knet/arch/x86/entry/entry_64.S:129)
The original packet in ipmr_cache_report() may be queued and then forwarded
with ip_mr_forward(). This last function has the assumption that the skb
dst is set.
After the below commit, the skb dst is dropped by ipv4_pktinfo_prepare(),
which causes the oops.
Fixes: bb7403655b3c ("ipmr: support IP_PKTINFO on cache report IGMP msg")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240125141847.1931933-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Setup PMCR port register for actual speed and duplex on internally
connected PHYs of the MT7988 built-in switch. This fixes links with
speeds other than 1000M.
Fixes: 110c18bfed41 ("net: dsa: mt7530: introduce driver for MT7988 built-in switch")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/a5b04dfa8256d8302f402545a51ac4c626fdba25.1706071272.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot found __ip6_tnl_rcv() could access unitiliazed data [1].
Call pskb_inet_may_pull() to fix this, and initialize ipv6h
variable after this call as it can change skb->head.
[1]
BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
BUG: KMSAN: uninit-value in IP6_ECN_decapsulate+0x7df/0x1e50 include/net/inet_ecn.h:321
__INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
IP6_ECN_decapsulate+0x7df/0x1e50 include/net/inet_ecn.h:321
ip6ip6_dscp_ecn_decapsulate+0x178/0x1b0 net/ipv6/ip6_tunnel.c:727
__ip6_tnl_rcv+0xd4e/0x1590 net/ipv6/ip6_tunnel.c:845
ip6_tnl_rcv+0xce/0x100 net/ipv6/ip6_tunnel.c:888
gre_rcv+0x143f/0x1870
ip6_protocol_deliver_rcu+0xda6/0x2a60 net/ipv6/ip6_input.c:438
ip6_input_finish net/ipv6/ip6_input.c:483 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492
ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586
dst_input include/net/dst.h:461 [inline]
ip6_rcv_finish+0x5db/0x870 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:314 [inline]
ipv6_rcv+0xda/0x390 net/ipv6/ip6_input.c:310
__netif_receive_skb_one_core net/core/dev.c:5532 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5646
netif_receive_skb_internal net/core/dev.c:5732 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5791
tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2084 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0x786/0x1200 fs/read_write.c:590
ksys_write+0x20f/0x4c0 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:652
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x6d/0x140 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at:
slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768
slab_alloc_node mm/slub.c:3478 [inline]
kmem_cache_alloc_node+0x5e9/0xb10 mm/slub.c:3523
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
__alloc_skb+0x318/0x740 net/core/skbuff.c:651
alloc_skb include/linux/skbuff.h:1286 [inline]
alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6334
sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2787
tun_alloc_skb drivers/net/tun.c:1531 [inline]
tun_get_user+0x1e8a/0x66d0 drivers/net/tun.c:1846
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2084 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0x786/0x1200 fs/read_write.c:590
ksys_write+0x20f/0x4c0 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:652
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x6d/0x140 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
CPU: 0 PID: 5034 Comm: syz-executor331 Not tainted 6.7.0-syzkaller-00562-g9f8413c4a66f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240125170557.2663942-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The gro.sh test-case relay on the gro_flush_timeout to ensure
that all the segments belonging to any given batch are properly
aggregated.
The other end, the sender is a user-space program transmitting
each packet with a separate write syscall. A busy host and/or
stracing the sender program can make the relevant segments reach
the GRO engine after the flush timeout triggers.
Give the GRO flush timeout more slack, to avoid sporadic self-tests
failures.
Fixes: 9af771d2ec04 ("selftests/net: allow GRO coalesce test on veth")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/bffec2beab3a5672dd13ecabe4fad81d2155b367.1706206101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
the udpgro_fraglist self-test uses the BPF classifiers, but the
current net self-test configuration does not include it, causing
CI failures:
# selftests: net: udpgro_frglist.sh
# ipv6
# tcp - over veth touching data
# -l 4 -6 -D 2001:db8::1 -t rx -4 -t
# Error: TC classifier not found.
# We have an error talking to the kernel
# Error: TC classifier not found.
# We have an error talking to the kernel
Add the missing knob.
Fixes: edae34a3ed92 ("selftests net: add UDP GRO fraglist + bpf self-tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/7c3643763b331e9a400e1874fe089193c99a1c3f.1706170897.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
commit 056bce63c469 ("bnxt_en: Make PTP TX timestamp HWRM query silent")
changed a netdev_err() to netdev_WARN_ONCE().
netdev_WARN_ONCE() is it generates a kernel WARNING, which is bad, for
the following reasons:
* You do not a kernel warning if the firmware queries are late
* In busy networks, timestamp query failures fairly regularly
* A WARNING message doesn't bring much value, since the code path
is clear.
(This was discussed in-depth in [1])
Transform the netdev_WARN_ONCE() into a netdev_warn_once(), and print a
more well-behaved message, instead of a full WARN().
bnxt_en 0000:67:00.0 eth0: TS query for TX timer failed rc = fffffff5
[1] Link: https://lore.kernel.org/all/ZbDj%2FFI4EJezcfd1@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Fixes: 056bce63c469 ("bnxt_en: Make PTP TX timestamp HWRM query silent")
Link: https://lore.kernel.org/r/20240125134104.2045573-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The logic to determine if SMC-D link group matches is incorrect. The
correct logic should be that it only returns true when the GID is the
same, and the SMC-D device is the same and the extended GID is the same
(in the case of virtual ISM).
It can be fixed by adding brackets around the conditional (or ternary)
operator expression. But for better readability and maintainability, it
has been changed to an if-else statement.
Reported-by: Matthew Rosato <mjrosato@linux.ibm.com>
Closes: https://lore.kernel.org/r/13579588-eb9d-4626-a063-c0b77ed80f11@linux.ibm.com
Fixes: b40584d14570 ("net/smc: compatible with 128-bits extended GID of virtual ISM device")
Link: https://lore.kernel.org/r/13579588-eb9d-4626-a063-c0b77ed80f11@linux.ibm.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240125123916.77928-1-guwen@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The big_tcp test-case requires a few kernel knobs currently
not specified in the net selftests config, causing the
following failure:
# selftests: net: big_tcp.sh
# Error: Failed to load TC action module.
# We have an error talking to the kernel
...
# Testing for BIG TCP:
# CLI GSO | GW GRO | GW GSO | SER GRO
# ./big_tcp.sh: line 107: test: !=: unary operator expected
...
# on on on on : [FAIL_on_link1]
Add the missing configs
Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/all/21630ecea872fea13f071342ac64ef52a991a9b5.1706282943.git.pabeni@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sync initialization and calibration routines with MediaTek's reference
driver. Improves compliance and resolves link stability issues with
CH340 IoT devices connected to MT798x built-in PHYs.
Fixes: 98c485eaf509 ("net: phy: add driver for MediaTek SoC built-in GE PHYs")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/f2195279c234c0f618946424b8236026126bc595.1706071311.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set DMA coherent mask to 32-bit which makes PPE offloading engine start
working on BPi-R4 which got 4 GiB of RAM.
Fixes: 2d75891ebc09 ("net: ethernet: mtk_eth_soc: support 36-bit DMA addressing on MT7988")
Suggested-by: Elad Yifee <eladwf@users.github.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/97e90925368b405f0974b9b15f1b7377c4a329ad.1706113251.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The nfp driver will merge the tp source port and tp destination port
into one dword which the offset must be zero to do hardware offload.
However, the mangle action for the tp source port and tp destination
port is separated for tc ct action. Modify the mangle action for the
FLOW_ACT_MANGLE_HDR_TYPE_TCP and FLOW_ACT_MANGLE_HDR_TYPE_UDP to
satisfy the nfp driver offload check for the tp port.
The mangle action provides a 4B value for source, and a 4B value for
the destination, but only 2B of each contains the useful information.
For offload the 2B of each is combined into a single 4B word. Since the
incoming mask for the source is '0xFFFF<mask>' the shift-left will
throw away the 0xFFFF part. When this gets combined together in the
offload it will clear the destination field. Fix this by setting the
lower bits back to 0xFFFF, effectively doing a rotate-left operation on
the mask.
Fixes: 5cee92c6f57a ("nfp: flower: support hw offload for ct nat action")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20240124151909.31603-3-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The nfp offload flow pay will not allocate a mask id when the out port
is openvswitch internal port. This is because these flows are used to
configure the pre_tun table and are never actually send to the firmware
as an add-flow message. When a tc rule which action contains ct and
the post ct entry's out port is openvswitch internal port, the merge
offload flow pay with the wrong mask id of 0 will be send to the
firmware. Actually, the nfp can not support hardware offload for this
situation, so return EOPNOTSUPP.
Fixes: bd0fe7f96a3c ("nfp: flower-ct: add zone table entry when handling pre/post_ct flows")
CC: stable@vger.kernel.org # 5.14+
Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20240124151909.31603-2-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For a skb frag with a newly allocated copy page, the true size is
incorrectly set to packet buffer size. It should be set to PAGE_SIZE
instead.
Fixes: 82fd151d38d9 ("gve: Reduce alloc and copy costs in the GQ rx path")
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://lore.kernel.org/r/20240124161025.1819836-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The busywait timeout value is a millisecond, not a second. So the
current setting 2 is too small. On slow/busy host (or VMs) the
current timeout can expire even on "correct" execution, causing random
failures. Let's copy the WAIT_TIMEOUT from forwarding/lib.sh and set
BUSYWAIT_TIMEOUT here.
Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240124061344.1864484-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|