| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use of percpu_counter structure to track count of orphaned
sockets is causing problems on modern hosts with 256 cpus
or more.
Stefan Bach reported a serious spinlock contention in real workloads,
that I was able to reproduce with a netfilter rule dropping
incoming FIN packets.
53.56% server [kernel.kallsyms] [k] queued_spin_lock_slowpath
|
---queued_spin_lock_slowpath
|
--53.51%--_raw_spin_lock_irqsave
|
--53.51%--__percpu_counter_sum
tcp_check_oom
|
|--39.03%--__tcp_close
| tcp_close
| inet_release
| inet6_release
| sock_close
| __fput
| ____fput
| task_work_run
| exit_to_usermode_loop
| do_syscall_64
| entry_SYSCALL_64_after_hwframe
| __GI___libc_close
|
--14.48%--tcp_out_of_resources
tcp_write_timeout
tcp_retransmit_timer
tcp_write_timer_handler
tcp_write_timer
call_timer_fn
expire_timers
__run_timers
run_timer_softirq
__softirqentry_text_start
As explained in commit cf86a086a180 ("net/dst: use a smaller percpu_counter
batch for dst entries accounting"), default batch size is too big
for the default value of tcp_max_orphans (262144).
But even if we reduce batch sizes, there would still be cases
where the estimated count of orphans is beyond the limit,
and where tcp_too_many_orphans() has to call the expensive
percpu_counter_sum_positive().
One solution is to use plain per-cpu counters, and have
a timer to periodically refresh this cache.
Updating this cache every 100ms seems about right, tcp pressure
state is not radically changing over shorter periods.
percpu_counter was nice 15 years ago while hosts had less
than 16 cpus, not anymore by current standards.
v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m,
reported by kernel test robot <lkp@intel.com>
Remove unused socket argument from tcp_too_many_orphans()
Fixes: dd24c00191d5 ("net: Use a percpu_counter for orphan_count")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stefan Bach <sfb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
Move ports related config to dedicated struct to keep things organized.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
QCA original code report port instability and sa that SGMII also require
to set internal delay. Generalize the rgmii delay function and apply the
advised value if they are not defined in DT.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
QCA8328 switch is the bigger brother of the qca8327. Same regs different
chip. Change the function to set the correct pin layout and introduce a
new match_data to differentiate the 2 switch as they have the same ID
and their internal PHY have the same ID.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some qca8327 switch require to force the ignore of power on sel
strapping. Some switch require to set the led open drain mode in regs
instead of using strapping. While most of the device implements this
using the correct way using pin strapping, there are still some broken
device that require to be set using sw regs.
Introduce a new binding and support these special configuration.
As led open drain require to ignore pin strapping to work, the probe
fails with EINVAL error with incorrect configuration.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support enabling PLL on the SGMII CPU port. Some device require this
special configuration or no traffic is transmitted and the switch
doesn't work at all. A dedicated binding is added to the CPU node
port to apply the correct reg on mac config.
Fail to correctly configure sgmii with qca8327 switch and warn if pll is
used on qca8337 with a revision greater than 1.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Future proof commit. This switch have 2 CPU ports and one valid
configuration is first CPU port set to sgmii and second CPU port set to
rgmii-id. The current implementation detects delay only for CPU port
zero set to rgmii and doesn't count any delay set in a secondary CPU
port. Drop the current delay scan function and move it to the sgmii
parser function to generalize and implicitly add support for secondary
CPU port set to rgmii-id. Introduce new logic where delay is enabled
also with internal delay binding declared and rgmii set as PHY mode.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently CPU port is always hardcoded to port 0. This switch have 2 CPU
ports. The original intention of this driver seems to be use the
mac06_exchange bit to swap MAC0 with MAC6 in the strange configuration
where device have connected only the CPU port 6. To skip the
introduction of a new binding, rework the driver to address the
secondary CPU port as primary and drop any reference of hardcoded port.
With configuration of mac06 exchange, just skip the definition of port0
and define the CPU port as a secondary. The driver will autoconfigure
the switch to use that as the primary CPU port.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for this in the qca8k driver. Also add support for SGMII
rx/tx clock falling edge. This is only present for pad0, pad5 and
pad6 have these bit reserved from Documentation. Add a comment that this
is hardcoded to PAD0 as qca8327/28/34/37 have an unique sgmii line and
setting falling in port0 applies to both configuration with sgmii used
for port0 or port6.
Co-developed-by: Matthew Hagan <mnhagan88@gmail.com>
Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add missing mac power sel support needed for ipq8064/5 SoC that require
1.8v for the internal regulator port instead of the default 1.5v.
If other device needs this, consider adding a dedicated binding to
support this.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
The variable err is being initialized with a value that is never read, it
is being updated immediately afterwards. The assignment is redundant and
can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Adds support to flush or invalidate CPT CTX entries as part of FLR
and also provides a mailbox to flush CPT CTX entries in case of
graceful exit.
This patch also adds support for AF -> CPT PF uplink mailbox messages
and adds a new mbox message to submit a CPT instruction from AF.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Perform CPT LF teardown in non FLR path as well via cpt_lf_free()
Currently CPT LF teardown and reset sequence is only
done when FLR is handled with CPT LF still attached.
This patch also fixes cpt_lf_alloc() to set EXEC_LDWB in
CPT_AF_LFX_CTL2 when being completely overwritten as that is
the default value and is better for performance.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
| |
This patch enables and registers interrupt handler for CPT HW
interrupts.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On i386, when builtin (not a loadable module), the winbond-840 driver
inspects boot_cpu_data to see what CPU family it is running on, and
then acts on that data. The "family" struct member (x86) does not exist
when running on UML, so prevent that test and do the default action.
Prevents this build error on UML + i386:
../drivers/net/ethernet/dec/tulip/winbond-840.c: In function ‘init_registers’:
../drivers/net/ethernet/dec/tulip/winbond-840.c:882:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’
if (boot_cpu_data.x86 <= 4) {
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20211014050606.7288-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a UML build, the igc_ptp driver uses CONFIG_X86_TSC for timestamp
conversion. The function that is used is not available on UML builds,
so have the function use the default system_counterval_t timestamp
instead for UML builds.
Prevents this build error on UML:
../drivers/net/ethernet/intel/igc/igc_ptp.c: In function ‘igc_device_tstamp_to_system’:
../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: implicit declaration of function ‘convert_art_ns_to_tsc’ [-Werror=implicit-function-declaration]
return convert_art_ns_to_tsc(tstamp);
../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: incompatible types when returning type ‘int’ but ‘struct system_counterval_t’ was expected
return convert_art_ns_to_tsc(tstamp);
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/r/20211014050516.6846-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On i386, when builtin (not a loadable module), the fealnx driver
inspects boot_cpu_data to see what CPU family it is running on, and
then acts on that data. The "family" struct member (x86) does not exist
when running on UML, so prevent that test and do the default action.
Prevents this build error on UML + i386:
../drivers/net/ethernet/fealnx.c: In function ‘netdev_open’:
../drivers/net/ethernet/fealnx.c:861:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20211014050500.5620-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adding comment to avoid the misusing of netvsc_xdp_xmit().
Otherwise the value of skb->queue_mapping could be 0 and
then the return value of skb_get_rx_queue() could be MAX_U16
cause by overflow.
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1634174786-1810351-1-git-send-email-jiasheng@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
| |
The variable pp is not in use - delete it.
Signed-off-by: Yuval Shaia <yshaia@marvell.com>
Link: https://lore.kernel.org/r/20211013064921.26346-1-yshaia@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PHY driver dp83867 has rich supports for OF-platform to fine-tune the PHY
chip during phy configuration. However, for non-OF platform, certain PHY
tunable parameters such as IO impedance and RX & TX internal delays are
critical and should be initialized to its default during PHY driver probe.
Tested-by: Clement <clement@intel.com>
Signed-off-by: Lay, Kuan Loon <kuan.loon.lay@intel.com>
Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20211013065941.2124858-1-boon.leong.ong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the PTP_PEROUT_DUTY_CYCLE flag is set, then check if the
request_on value in ptp_perout_request matches the pre-defined
values or a toggle option.
Return a failure if the value is not supported.
Preserve the old behaviors if the PTP_PEROUT_DUTY_CYCLE flag is not
set.
Tested with an oscilloscope on EVB-LAN7430:
e.g., to output PPS 1sec period 500mS on (high) to GPIO 2.
./testptp -L 2,2
./testptp -p 1000000000 -w 500000000
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Link: https://lore.kernel.org/r/1634046593-64312-1-git-send-email-yuiko.oshino@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems reasonable to fine-tune only some of the skew values when using
one of the rgmii-*id PHY modes, and even when all skew values are
specified, using the correct ID PHY mode makes sense for documentation
purposes. Such a configuration also appears in the binding docs in
Documentation/devicetree/bindings/net/micrel-ksz90x1.txt, so the driver
should not warn about it.
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://lore.kernel.org/r/20211012103402.21438-1-matthias.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| | |
tools/testing/selftests/net/ioam6.sh
7b1700e009cc ("selftests: net: modify IOAM tests for undef bits")
bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Quite calm.
The noisy DSA driver (embedded switches) changes, and adjustment to
IPv6 IOAM behavior add to diffstat's bottom line but are not scary.
Current release - regressions:
- af_unix: rename UNIX-DGRAM to UNIX to maintain backwards
compatibility
- procfs: revert "add seq_puts() statement for dev_mcast", minor
format change broke user space
Current release - new code bugs:
- dsa: fix bridge_num not getting cleared after ports leaving the
bridge, resource leak
- dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware
bridges using VID 0, prevent packet drops if pvid is removed
- dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware, prevent HW
getting confused about station to VLAN mapping
Previous releases - regressions:
- virtio-net: fix for skb_over_panic inside big mode
- phy: do not shutdown PHYs in READY state
- dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's, fix link
LED staying lit after ifdown
- mptcp: fix possible infinite wait on recvmsg(MSG_WAITALL)
- mqprio: Correct stats in mqprio_dump_class_stats()
- ice: fix deadlock for Tx timestamp tracking flush
- stmmac: fix feature detection on old hardware
Previous releases - always broken:
- sctp: account stream padding length for reconf chunk
- icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe()
- isdn: cpai: check ctr->cnr to avoid array index out of bound
- isdn: mISDN: fix sleeping function called from invalid context
- nfc: nci: fix potential UAF of rf_conn_info object
- dsa: microchip: prevent ksz_mib_read_work from kicking back in
after it's canceled in .remove and crashing
- dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged
ports
- dsa: sja1105, ocelot: break circular dependency between switch and
tag drivers
- dsa: felix: improve timestamping in presence of packe loss
- mlxsw: thermal: fix out-of-bounds memory accesses
Misc:
- ipv6: ioam: move the check for undefined bits to improve
interoperability"
* tag 'net-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe
MAINTAINERS: Update the devicetree documentation path of imx fec driver
sctp: account stream padding length for reconf chunk
mlxsw: thermal: Fix out-of-bounds memory accesses
ethernet: s2io: fix setting mac address during resume
NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
nfc: fix error handling of nfc_proto_register()
Revert "net: procfs: add seq_puts() statement for dev_mcast"
net: encx24j600: check error in devm_regmap_init_encx24j600
net: korina: select CRC32
net: arc: select CRC32
net: dsa: felix: break at first CPU port during init and teardown
net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports
net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver
net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header
net: mscc: ocelot: deny TX timestamping of non-PTP packets
net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, mlxsw allows cooling states to be set above the maximum
cooling state supported by the driver:
# cat /sys/class/thermal/thermal_zone2/cdev0/type
mlxsw_fan
# cat /sys/class/thermal/thermal_zone2/cdev0/max_state
10
# echo 18 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
# echo $?
0
This results in out-of-bounds memory accesses when thermal state
transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the
transition table is accessed with a too large index (state) [1].
According to the thermal maintainer, it is the responsibility of the
driver to reject such operations [2].
Therefore, return an error when the state to be set exceeds the maximum
cooling state supported by the driver.
To avoid dead code, as suggested by the thermal maintainer [3],
partially revert commit a421ce088ac8 ("mlxsw: core: Extend cooling
device with cooling levels") that tried to interpret these invalid
cooling states (above the maximum) in a special way. The cooling levels
array is not removed in order to prevent the fans going below 20% PWM,
which would cause them to get stuck at 0% PWM.
[1]
BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290
Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.15.0-rc3-custom-45935-gce1adf704b14 #122
Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2FO"/"SA000874", BIOS 4.6.5 03/08/2016
Workqueue: events_freezable_power_ thermal_zone_device_check
Call Trace:
dump_stack_lvl+0x8b/0xb3
print_address_description.constprop.0+0x1f/0x140
kasan_report.cold+0x7f/0x11b
thermal_cooling_device_stats_update+0x271/0x290
__thermal_cdev_update+0x15e/0x4e0
thermal_cdev_update+0x9f/0xe0
step_wise_throttle+0x770/0xee0
thermal_zone_device_update+0x3f6/0xdf0
process_one_work+0xa42/0x1770
worker_thread+0x62f/0x13e0
kthread+0x3ee/0x4e0
ret_from_fork+0x1f/0x30
Allocated by task 1:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
thermal_cooling_device_setup_sysfs+0x153/0x2c0
__thermal_cooling_device_register.part.0+0x25b/0x9c0
thermal_cooling_device_register+0xb3/0x100
mlxsw_thermal_init+0x5c5/0x7e0
__mlxsw_core_bus_device_register+0xcb3/0x19c0
mlxsw_core_bus_device_register+0x56/0xb0
mlxsw_pci_probe+0x54f/0x710
local_pci_probe+0xc6/0x170
pci_device_probe+0x2b2/0x4d0
really_probe+0x293/0xd10
__driver_probe_device+0x2af/0x440
driver_probe_device+0x51/0x1e0
__driver_attach+0x21b/0x530
bus_for_each_dev+0x14c/0x1d0
bus_add_driver+0x3ac/0x650
driver_register+0x241/0x3d0
mlxsw_sp_module_init+0xa2/0x174
do_one_initcall+0xee/0x5f0
kernel_init_freeable+0x45a/0x4de
kernel_init+0x1f/0x210
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff8881052f7800
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 1016 bytes inside of
1024-byte region [ffff8881052f7800, ffff8881052f7c00)
The buggy address belongs to the page:
page:0000000052355272 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1052f0
head:0000000052355272 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffffea0005034800 0000000300000003 ffff888100041dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[2] https://lore.kernel.org/linux-pm/9aca37cb-1629-5c67-1895-1fdc45c0244e@linaro.org/
[3] https://lore.kernel.org/linux-pm/af9857f2-578e-de3a-e62b-6baff7e69fd4@linaro.org/
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone")
Fixes: a421ce088ac8 ("mlxsw: core: Extend cooling device with cooling levels")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20211012174955.472928-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
After recent cleanups, gcc started warning about a suspicious
memcpy() call during the s2io_io_resume() function:
In function '__dev_addr_set',
inlined from 'eth_hw_addr_set' at include/linux/etherdevice.h:318:2,
inlined from 's2io_set_mac_addr' at drivers/net/ethernet/neterion/s2io.c:5205:2,
inlined from 's2io_io_resume' at drivers/net/ethernet/neterion/s2io.c:8569:7:
arch/x86/include/asm/string_32.h:182:25: error: '__builtin_memcpy' accessing 6 bytes at offsets 0 and 2 overlaps 4 bytes at offset 2 [-Werror=restrict]
182 | #define memcpy(t, f, n) __builtin_memcpy(t, f, n)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/netdevice.h:4648:9: note: in expansion of macro 'memcpy'
4648 | memcpy(dev->dev_addr, addr, len);
| ^~~~~~
What apparently happened is that an old cleanup changed the calling
conventions for s2io_set_mac_addr() from taking an ethernet address
as a character array to taking a struct sockaddr, but one of the
callers was not changed at the same time.
Change it to instead call the low-level do_s2io_prog_unicast() function
that still takes the old argument type.
Fixes: 2fd376884558 ("S2io: Added support set_mac_address driver entry point")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211013143613.2049096-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
devm_regmap_init may return error which caused by like out of memory,
this will results in null pointer dereference later when reading
or writing register:
general protection fault in encx24j600_spi_probe
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 0 PID: 286 Comm: spi-encx24j600- Not tainted 5.15.0-rc2-00142-g9978db750e31-dirty #11 9c53a778c1306b1b02359f3c2bbedc0222cba652
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:regcache_cache_bypass drivers/base/regmap/regcache.c:540
Code: 54 41 89 f4 55 53 48 89 fb 48 83 ec 08 e8 26 94 a8 fe 48 8d bb a0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 03 00 00 4c 8d ab b0 00 00 00 48 8b ab a0 00
RSP: 0018:ffffc900010476b8 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: 0000000000000000
RDX: 0000000000000012 RSI: ffff888002de0000 RDI: 0000000000000094
RBP: ffff888013c9a000 R08: 0000000000000000 R09: fffffbfff3f9cc6a
R10: ffffc900010476e8 R11: fffffbfff3f9cc69 R12: 0000000000000001
R13: 000000000000000a R14: ffff888013c9af54 R15: ffff888013c9ad08
FS: 00007ffa984ab580(0000) GS:ffff88801fe00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055a6384136c8 CR3: 000000003bbe6003 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
encx24j600_spi_probe drivers/net/ethernet/microchip/encx24j600.c:459
spi_probe drivers/spi/spi.c:397
really_probe drivers/base/dd.c:517
__driver_probe_device drivers/base/dd.c:751
driver_probe_device drivers/base/dd.c:782
__device_attach_driver drivers/base/dd.c:899
bus_for_each_drv drivers/base/bus.c:427
__device_attach drivers/base/dd.c:971
bus_probe_device drivers/base/bus.c:487
device_add drivers/base/core.c:3364
__spi_add_device drivers/spi/spi.c:599
spi_add_device drivers/spi/spi.c:641
spi_new_device drivers/spi/spi.c:717
new_device_store+0x18c/0x1f1 [spi_stub 4e02719357f1ff33f5a43d00630982840568e85e]
dev_attr_store drivers/base/core.c:2074
sysfs_kf_write fs/sysfs/file.c:139
kernfs_fop_write_iter fs/kernfs/file.c:300
new_sync_write fs/read_write.c:508 (discriminator 4)
vfs_write fs/read_write.c:594
ksys_write fs/read_write.c:648
do_syscall_64 arch/x86/entry/common.c:50
entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:113
Add error check in devm_regmap_init_encx24j600 to avoid this situation.
Fixes: 04fbfce7a222 ("net: Microchip encx24j600 driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Link: https://lore.kernel.org/r/20211012125901.3623144-1-sunnanyong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-10-12
* tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors
net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
net/mlx5e: Switchdev representors are not vlan challenged
net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path
net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode
net/mlx5: Fix cleanup of bridge delayed work
====================
Link: https://lore.kernel.org/r/20211012205323.20123-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit 846d6da1fcdb ("net/mlx5e: Fix division by 0 in
mlx5e_select_queue") makes mlx5e_build_nic_params assign a non-zero
initial value to priv->num_tc_x_num_ch, so that mlx5e_select_queue
doesn't fail with division by 0 if called before the first activation of
channels. However, the initialization flow of representors doesn't call
mlx5e_build_nic_params, so this bug can still happen with representors.
This commit fixes the bug by adding the missing assignment to
mlx5e_build_rep_params.
Fixes: 846d6da1fcdb ("net/mlx5e: Fix division by 0 in mlx5e_select_queue")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Due to current HW arch limitations, RX-FCS (scattering FCS frame field
to software) and RX-port-timestamp (improved timestamp accuracy on the
receive side) can't work together.
RX-port-timestamp is not controlled by the user and it is enabled by
default when supported by the HW/FW.
This patch sets RX-port-timestamp opposite to RX-FCS configuration.
Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Before this patch, mlx5 representors advertised the
NETIF_F_VLAN_CHALLENGED bit, this could lead to missing features when
using reps with vxlan/bridge and maybe other virtual interfaces,
when such interfaces inherit this bit and block vlan usage in their
topology.
Example:
$ip link add dev bridge type bridge
# add representor interface to the bridge
$ip link set dev pf0hpf master
$ip link add link bridge name vlan10 type vlan id 10 protocol 802.1q
Error: 8021q: VLANs not supported on device.
Reps are perfectly capable of handling vlan traffic, although they don't
implement vlan_{add,kill}_vid ndos, hence, remove
NETIF_F_VLAN_CHALLENGED advertisement.
Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Reported-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Prior to this patch in case mlx5_core_destroy_cq() failed it returns
without completing all destroy operations and that leads to memory leak.
Instead, complete the destroy flow before return error.
Also move mlx5_debug_cq_remove() to the beginning of mlx5_core_destroy_cq()
to be symmetrical with mlx5_core_create_cq().
kmemleak complains on:
unreferenced object 0xc000000038625100 (size 64):
comm "ethtool", pid 28301, jiffies 4298062946 (age 785.380s)
hex dump (first 32 bytes):
60 01 48 94 00 00 00 c0 b8 05 34 c3 00 00 00 c0 `.H.......4.....
02 00 00 00 00 00 00 00 00 db 7d c1 00 00 00 c0 ..........}.....
backtrace:
[<000000009e8643cb>] add_res_tree+0xd0/0x270 [mlx5_core]
[<00000000e7cb8e6c>] mlx5_debug_cq_add+0x5c/0xc0 [mlx5_core]
[<000000002a12918f>] mlx5_core_create_cq+0x1d0/0x2d0 [mlx5_core]
[<00000000cef0a696>] mlx5e_create_cq+0x210/0x3f0 [mlx5_core]
[<000000009c642c26>] mlx5e_open_cq+0xb4/0x130 [mlx5_core]
[<0000000058dfa578>] mlx5e_ptp_open+0x7f4/0xe10 [mlx5_core]
[<0000000081839561>] mlx5e_open_channels+0x9cc/0x13e0 [mlx5_core]
[<0000000009cf05d4>] mlx5e_switch_priv_channels+0xa4/0x230
[mlx5_core]
[<0000000042bbedd8>] mlx5e_safe_switch_params+0x14c/0x300
[mlx5_core]
[<0000000004bc9db8>] set_pflag_tx_port_ts+0x9c/0x160 [mlx5_core]
[<00000000a0553443>] mlx5e_set_priv_flags+0xd0/0x1b0 [mlx5_core]
[<00000000a8f3d84b>] ethnl_set_privflags+0x234/0x2d0
[<00000000fd27f27c>] genl_family_rcv_msg_doit+0x108/0x1d0
[<00000000f495e2bb>] genl_family_rcv_msg+0xe4/0x1f0
[<00000000646c5c2c>] genl_rcv_msg+0x78/0x120
[<00000000d53e384e>] netlink_rcv_skb+0x74/0x1a0
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Valentine Fatiev <valentinef@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Do not allow configurations of MQPRIO channel mode that do not
fully define and utilize the channels txqs.
Fixes: ec60c4581bd9 ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently, bridge cleanup is calling to cancel_delayed_work(). When this
function is finished, there is a chance that the delayed work is still
running. Also, the delayed work is queueing itself.
As a result, we might execute the delayed work after the bridge cleanup
have finished and hit a null-ptr oops[1].
Fix it by using cancel_delayed_work_sync(), which is waiting until the
work is done and will cancel the queue work.
[1]
[ 8202.143043 ] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 8202.144438 ] #PF: supervisor write access in kernel mode
[ 8202.145476 ] #PF: error_code(0x0002) - not-present page
[ 8202.146520 ] PGD 0 P4D 0
[ 8202.147126 ] Oops: 0002 [#1] SMP NOPTI
[ 8202.147899 ] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.14.0-rc6_for_upstream_min_debug_2021_08_25_16_06 #1
[ 8202.149741 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 8202.151908 ] RIP: 0010:_raw_spin_lock+0xc/0x20
[ 8202.156234 ] RSP: 0018:ffff88846f885ea0 EFLAGS: 00010046
[ 8202.157289 ] RAX: 0000000000000000 RBX: ffff88846f880000 RCX: 0000000000000000
[ 8202.158731 ] RDX: 0000000000000001 RSI: ffff8881004000c8 RDI: 0000000000000000
[ 8202.160177 ] RBP: ffff8881fe684978 R08: ffff888100140000 R09: ffffffff824455b8
[ 8202.161569 ] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 8202.163004 ] R13: 0000000000000012 R14: 0000000000000200 R15: ffff88812992d000
[ 8202.164018 ] FS: 0000000000000000(0000) GS:ffff88846f880000(0000) knlGS:0000000000000000
[ 8202.164960 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8202.165634 ] CR2: 0000000000000000 CR3: 0000000108cac004 CR4: 0000000000370ea0
[ 8202.166450 ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8202.167807 ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8202.168852 ] Call Trace:
[ 8202.169421 ] <IRQ>
[ 8202.169792 ] __queue_work+0xf2/0x3d0
[ 8202.170481 ] ? queue_work_node+0x40/0x40
[ 8202.171270 ] call_timer_fn+0x2b/0x100
[ 8202.171932 ] __run_timers.part.0+0x152/0x220
[ 8202.172717 ] ? __hrtimer_run_queues+0x171/0x290
[ 8202.173526 ] ? kvm_clock_get_cycles+0xd/0x10
[ 8202.174232 ] ? ktime_get+0x35/0x90
[ 8202.174943 ] run_timer_softirq+0x26/0x50
[ 8202.175745 ] __do_softirq+0xc7/0x271
[ 8202.176373 ] irq_exit_rcu+0x93/0xb0
[ 8202.176983 ] sysvec_apic_timer_interrupt+0x72/0x90
[ 8202.177755 ] </IRQ>
[ 8202.178245 ] asm_sysvec_apic_timer_interrupt+0x12/0x20
Fixes: c636a0f0f3f0 ("net/mlx5: Bridge, dynamic entry ageing")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/ethernet/korina.o: in function `korina_multicast_list':
korina.c:(.text+0x1af): undefined reference to `crc32_le'
Fixes: ef11291bcd5f9 ("Add support the Korina (IDT RC32434) Ethernet MAC")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Florian fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211012152509.21771-1-vegard.nossum@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/ethernet/arc/emac_main.o: in function `arc_emac_set_rx_mode':
emac_main.c:(.text+0xb11): undefined reference to `crc32_le'
The crc32_le() call comes through the ether_crc_le() call in
arc_emac_set_rx_mode().
[v2: moved the select to ARC_EMAC_CORE; the Makefile is a bit confusing,
but the error comes from emac_main.o, which is part of the arc_emac module,
which in turn is enabled by CONFIG_ARC_EMAC_CORE. Note that arc_emac is
different from emac_arc...]
Fixes: 775dd682e2b0ec ("arc_emac: implement promiscuous mode and multicast filtering")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Link: https://lore.kernel.org/r/20211012093446.1575-1-vegard.nossum@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The NXP LS1028A switch has two Ethernet ports towards the CPU, but only
one of them is capable of acting as an NPI port at a time (inject and
extract packets using DSA tags).
However, using the alternative ocelot-8021q tagging protocol, it should
be possible to use both CPU ports symmetrically, but for that we need to
mark both ports in the device tree as DSA masters.
In the process of doing that, it can be seen that traffic to/from the
network stack gets broken, and this is because the Felix driver iterates
through all DSA CPU ports and configures them as NPI ports. But since
there can only be a single NPI port, we effectively end up in a
situation where DSA thinks the default CPU port is the first one, but
the hardware port configured to be an NPI is the last one.
I would like to treat this as a bug, because if the updated device trees
are going to start circulating, it would be really good for existing
kernels to support them, too.
Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
At present, when a PTP packet which requires TX timestamping gets
dropped under congestion by the switch, things go downhill very fast.
The driver keeps a clone of that skb in a queue of packets awaiting TX
timestamp interrupts, but interrupts will never be raised for the
dropped packets.
Moreover, matching timestamped packets to timestamps is done by a 2-bit
timestamp ID, and this can wrap around and we can match on the wrong skb.
Since with the default NPI-based tagging protocol, we get no notification
about packet drops, the best we can do is eventually recover from the
drop of a PTP frame: its skb will be dead memory until another skb which
was assigned the same timestamp ID happens to find it.
However, with the ocelot-8021q tagger which injects packets using the
manual register interface, it appears that we can check for more
information, such as:
- whether the input queue has reached the high watermark or not
- whether the injection group's FIFO can accept additional data or not
so we know that a PTP frame is likely to get dropped before actually
sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it
can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet).
But when we do that, we can also remove the skb from the timestamping
queue, because there surely won't be any timestamp that matches it.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Michael reported that when using the "ocelot-8021q" tagging protocol,
the switch driver module must be manually loaded before the tagging
protocol can be loaded/is available.
This appears to be the same problem described here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
where due to the fact that DSA tagging protocols make use of symbols
exported by the switch drivers, circular dependencies appear and this
breaks module autoloading.
The ocelot_8021q driver needs the ocelot_can_inject() and
ocelot_port_inject_frame() functions from the switch library. Previously
the wrong approach was taken to solve that dependency: shims were
provided for the case where the ocelot switch library was compiled out,
but that turns out to be insufficient, because the dependency when the
switch lib _is_ compiled is problematic too.
We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
static inline functions, because these access I/O functions like
__ocelot_write_ix() which is called by ocelot_write_rix(). Making those
static inline basically means exposing the whole guts of the ocelot
switch library, not ideal...
We already have one tagging protocol driver which calls into the switch
driver during xmit but not using any exported symbol: sja1105_defer_xmit.
We can do the same thing here: create a kthread worker and one work item
per skb, and let the switch driver itself do the register accesses to
send the skb, and then consume it.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
As explained here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocol drivers cannot depend on symbols exported by switch
drivers, because this creates a circular dependency that breaks module
autoloading.
The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
exported by the common ocelot switch lib. This function looks at
OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
DSA tag, for PTP timestamping (the command: one-step/two-step, and the
TX timestamp identifier).
None of that requires deep insight into the driver, it is quite
stateless, as it only depends upon the skb->cb. So let's make it a
static inline function and put it in include/linux/dsa/ocelot.h, a
file that despite its name is used by the ocelot switch driver for
populating the injection header too - since commit 40d3f295b5fe ("net:
mscc: ocelot: use common tag parsing code with DSA").
With that function declared as static inline, its body is expanded
inside each call site, so the dependency is broken and the DSA tagger
can be built without the switch library, upon which the felix driver
depends.
Fixes: 39e5308b3250 ("net: mscc: ocelot: support PTP Sync one-step timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
the skb PTP header
The sad reality is that when a PTP frame with a TX timestamping request
is transmitted, it isn't guaranteed that it will make it all the way to
the wire (due to congestion inside the switch), and that a timestamp
will be taken by the hardware and placed in the timestamp FIFO where an
IRQ will be raised for it.
The implication is that if enough PTP frames are silently dropped by the
hardware such that the timestamp ID has rolled over, it is possible to
match a timestamp to an old skb.
Furthermore, nobody will match on the real skb corresponding to this
timestamp, since we stupidly matched on a previous one that was stale in
the queue, and stopped there.
So PTP timestamping will be broken and there will be no way to recover.
It looks like the hardware parses the sequenceID from the PTP header,
and also provides that metadata for each timestamp. The driver currently
ignores this, but it shouldn't.
As an extra resiliency measure, do the following:
- check whether the PTP sequenceID also matches between the skb and the
timestamp, treat the skb as stale otherwise and free it
- if we see a stale skb, don't stop there and try to match an skb one
more time, chances are there's one more skb in the queue with the same
timestamp ID, otherwise we wouldn't have ever found the stale one (it
is by timestamp ID that we matched it).
While this does not prevent PTP packet drops, it at least prevents
the catastrophic consequences of incorrect timestamp matching.
Since we already call ptp_classify_raw in the TX path, save the result
in the skb->cb of the clone, and just use that result in the interrupt
code path.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It appears that Ocelot switches cannot timestamp non-PTP frames,
I tested this using the isochron program at:
https://github.com/vladimiroltean/tsn-scripts
with the result that the driver increments the ocelot_port->ts_id
counter as expected, puts it in the REW_OP, but the hardware seems to
not timestamp these packets at all, since no IRQ is emitted.
Therefore check whether we are sending PTP frames, and refuse to
populate REW_OP otherwise.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When skb_match is NULL, it means we received a PTP IRQ for a timestamp
ID that the kernel has no idea about, since there is no skb in the
timestamping queue with that timestamp ID.
This is a grave error and not something to just "continue" over.
So print a big warning in case this happens.
Also, move the check above ocelot_get_hwtimestamp(), there is no point
in reading the full 64-bit current PTP time if we're not going to do
anything with it anyway for this skb.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
PTP packets with 2-step TX timestamp requests are matched to packets
based on the egress port number and a 6-bit timestamp identifier.
All PTP timestamps are held in a common FIFO that is 128 entry deep.
This patch ensures that back-to-back timestamping requests cannot exceed
the hardware FIFO capacity. If that happens, simply send the packets
without requesting a TX timestamp to be taken (in the case of felix,
since the DSA API has a void return code in ds->ops->port_txtstamp) or
drop them (in the case of ocelot).
I've moved the ts_id_lock from a per-port basis to a per-switch basis,
because we need separate accounting for both numbers of PTP frames in
flight. And since we need locking to inc/dec the per-switch counter,
that also offers protection for the per-port counter and hence there is
no reason to have a per-port counter anymore.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
At present, there is a problem when user space bombards a port with PTP
event frames which have TX timestamping requests (or when a tc-taprio
offload is installed on a port, which delays the TX timestamps by a
significant amount of time). The driver will happily roll over the 2-bit
timestamp ID and this will cause incorrect matches between an skb and
the TX timestamp collected from the FIFO.
The Ocelot switches have a 6-bit PTP timestamp identifier, and the value
63 is reserved, so that leaves identifiers 0-62 to be used.
The timestamp identifiers are selected by the REW_OP packet field, and
are actually shared between CPU-injected frames and frames which match a
VCAP IS2 rule that modifies the REW_OP. The hardware supports
partitioning between the two uses of the REW_OP field through the
PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP
IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2.
The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping,
and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in
ocelot_init_timestamp() which makes all timestamp identifiers available
to CPU injection.
Therefore, we can make use of all 63 timestamp identifiers, which should
allow more timestampable packets to be in flight on each port. This is
only part of the solution, more issues will be addressed in future changes.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
driver
It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.
The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.
Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.
Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.
Fixes: 994d2cbb08ca ("net: dsa: tag_sja1105: be dsa_loop-safe")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The problem is that DSA tagging protocols really must not depend on the
switch driver, because this creates a circular dependency at insmod
time, and the switch driver will effectively not load when the tagging
protocol driver is missing.
The code was structured in the way it was for a reason, though. The DSA
driver-facing API for PTP timestamping relies on the assumption that
two-step TX timestamps are provided by the hardware in an out-of-band
manner, typically by raising an interrupt and making that timestamp
available inside some sort of FIFO which is to be accessed over
SPI/MDIO/etc.
So the API puts .port_txtstamp into dsa_switch_ops, because it is
expected that the switch driver needs to save some state (like put the
skb into a queue until its TX timestamp arrives).
On SJA1110, TX timestamps are provided by the switch as Ethernet
packets, so this makes them be received and processed by the tagging
protocol driver. This in itself is great, because the timestamps are
full 64-bit and do not require reconstruction, and since Ethernet is the
fastest I/O method available to/from the switch, PTP timestamps arrive
very quickly, no matter how bottlenecked the SPI connection is, because
SPI interaction is not needed at all.
DSA's code structure and strict isolation between the tagging protocol
driver and the switch driver break the natural code organization.
When the tagging protocol driver receives a packet which is classified
as a metadata packet containing timestamps, it passes those timestamps
one by one to the switch driver, which then proceeds to compare them
based on the recorded timestamp ID that was generated in .port_txtstamp.
The communication between the tagging protocol and the switch driver is
done through a method exported by the switch driver, sja1110_process_meta_tstamp.
To satisfy build requirements, we force a dependency to build the
tagging protocol driver as a module when the switch driver is a module.
However, as explained in the first paragraph, that causes the circular
dependency.
To solve this, move the skb queue from struct sja1105_private :: struct
sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
The latter is a data structure for which hacks have already been put
into place to be able to create persistent storage per switch that is
accessible from the tagging protocol driver (see sja1105_setup_ports).
With the skb queue directly accessible from the tagging protocol driver,
we can now move sja1110_process_meta_tstamp into the tagging driver
itself, and avoid exporting a symbol.
Fixes: 566b18c8b752 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In commit 74fc4f828769 ("net: Fix offloading indirect devices dependency
on qdisc order creation"), it adds a process to trigger the callback to
setup the bo callback when the driver regists a callback.
In our current implement, we are not ready to run the callback when nfp
call the function flow_indr_dev_register, then there will be error
message as:
kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 0 PID: 14119 Comm: kworker/0:0 Tainted: G
kernel: Workqueue: events work_for_cpu_fn
kernel: RIP: 0010:nfp_flower_indr_setup_tc_cb+0x258/0x410
kernel: RSP: 0018:ffffbc1e02c57bf8 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: ffff9c761fabc000 RCX: 0000000000000001
kernel: RDX: 0000000000000001 RSI: fffffffffffffff0 RDI: ffffffffc0be9ef1
kernel: RBP: ffffbc1e02c57c58 R08: ffffffffc08f33aa R09: ffff9c6db7478800
kernel: R10: 0000009c003f6e00 R11: ffffbc1e02800000 R12: ffffbc1e000d9000
kernel: R13: ffffbc1e000db428 R14: ffff9c6db7478800 R15: ffff9c761e884e80
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: fffffffffffffff0 CR3: 00000009e260a004 CR4: 00000000007706f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: ? flow_indr_dev_register+0xab/0x210
kernel: ? __cond_resched+0x15/0x30
kernel: ? kmem_cache_alloc_trace+0x44/0x4b0
kernel: ? nfp_flower_setup_tc+0x1d0/0x1d0 [nfp]
kernel: flow_indr_dev_register+0x158/0x210
kernel: ? tcf_block_unbind+0xe0/0xe0
kernel: nfp_flower_init+0x40b/0x650 [nfp]
kernel: nfp_net_pci_probe+0x25f/0x960 [nfp]
kernel: ? nfp_rtsym_read_le+0x76/0x130 [nfp]
kernel: nfp_pci_probe+0x6a9/0x820 [nfp]
kernel: local_pci_probe+0x45/0x80
So we need to call flow_indr_dev_register in app start process instead of
init stage.
Fixes: 74fc4f828769 ("net: Fix offloading indirect devices dependency on qdisc order creation")
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20211012124850.13025-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush")
added a lock around the Tx timestamp tracker flow which is used to
cleanup any left over SKBs and prepare for device removal.
This lock is problematic because it is being held around a call to
ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
write command to firmware. This could lead to a deadlock if the mutex
actually sleeps, and causes the following warning on a kernel with
preemption debugging enabled:
[ 715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573
[ 715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod
[ 715.435652] INFO: lockdep is turned off.
[ 715.439591] Preemption disabled at:
[ 715.439594] [<0000000000000000>] 0x0
[ 715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G W OE 5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c
[ 715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[ 715.468483] Call Trace:
[ 715.470940] dump_stack_lvl+0x6a/0x9a
[ 715.474613] ___might_sleep.cold+0x224/0x26a
[ 715.478895] __mutex_lock+0xb3/0x1440
[ 715.482569] ? stack_depot_save+0x378/0x500
[ 715.486763] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.494979] ? kfree+0xc1/0x520
[ 715.498128] ? mutex_lock_io_nested+0x12a0/0x12a0
[ 715.502837] ? kasan_set_free_info+0x20/0x30
[ 715.507110] ? __kasan_slab_free+0x10b/0x140
[ 715.511385] ? slab_free_freelist_hook+0xc7/0x220
[ 715.516092] ? kfree+0xc1/0x520
[ 715.519235] ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.527359] ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.535133] ? pci_device_remove+0xab/0x1d0
[ 715.539318] ? __device_release_driver+0x35b/0x690
[ 715.544110] ? driver_detach+0x214/0x2f0
[ 715.548035] ? bus_remove_driver+0x11d/0x2f0
[ 715.552309] ? pci_unregister_driver+0x26/0x250
[ 715.556840] ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.564799] ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[ 715.570554] ? do_syscall_64+0x3b/0x90
[ 715.574303] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 715.579529] ? start_flush_work+0x542/0x8f0
[ 715.583719] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.591923] ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.599960] ? wait_for_completion_io+0x250/0x250
[ 715.604662] ? lock_acquire+0x196/0x200
[ 715.608504] ? do_raw_spin_trylock+0xa5/0x160
[ 715.612864] ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.620813] ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.628497] ? __debug_check_no_obj_freed+0x1e8/0x3c0
[ 715.633550] ? trace_hardirqs_on+0x1c/0x130
[ 715.637748] ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.646220] ? do_raw_spin_trylock+0xa5/0x160
[ 715.650581] ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.658797] ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.667013] ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.675403] ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.683440] ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.691037] ? _raw_spin_unlock_irqrestore+0x46/0x73
[ 715.696005] pci_device_remove+0xab/0x1d0
[ 715.700018] __device_release_driver+0x35b/0x690
[ 715.704637] driver_detach+0x214/0x2f0
[ 715.708389] bus_remove_driver+0x11d/0x2f0
[ 715.712489] pci_unregister_driver+0x26/0x250
[ 715.716857] ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[ 715.724637] __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[ 715.730210] ? free_module+0x6d0/0x6d0
[ 715.733963] ? task_work_run+0xe1/0x170
[ 715.737803] ? exit_to_user_mode_loop+0x17f/0x1d0
[ 715.742509] ? rcu_read_lock_sched_held+0x12/0x80
[ 715.747215] ? trace_hardirqs_on+0x1c/0x130
[ 715.751401] do_syscall_64+0x3b/0x90
[ 715.754981] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 715.760033] RIP: 0033:0x7f4dfe59000b
[ 715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48
[ 715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b
[ 715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918
[ 715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940
[ 715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0
Notice that this is the only case where we use the lock in this way. In
the cleanup kthread and work kthread the lock is only taken around the
bit accesses. This was done intentionally to avoid this kind of issue.
The way the lock is used, we only protect ordering of bit sets vs bit
clears. The Tx writers in the hot path don't need to be protected
against the entire kthread loop. The Tx queues threads only need to
ensure that they do not re-use an index that is currently in use. The
cleanup loop does not need to block all new set bits, since it will
re-queue itself if new timestamps are present.
Fix the tracker flow so that it uses the same flow as the standard
cleanup thread. In addition, ensure the in_use bitmap actually gets
cleared properly.
This fixes the warning and also avoids the potential deadlock that might
have occurred otherwise.
Fixes: 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When the ksz module is installed and removed using rmmod, kernel crashes
with null pointer dereferrence error. During rmmod, ksz_switch_remove
function tries to cancel the mib_read_workqueue using
cancel_delayed_work_sync routine and unregister switch from dsa.
During dsa_unregister_switch it calls ksz_mac_link_down, which in turn
reschedules the workqueue since mib_interval is non-zero.
Due to which queue executed after mib_interval and it tries to access
dp->slave. But the slave is unregistered in the ksz_switch_remove
function. Hence kernel crashes.
To avoid this crash, before canceling the workqueue, resetted the
mib_interval to 0.
v1 -> v2:
-Removed the if condition in ksz_mib_read_work
Fixes: 469b390e1ba3 ("net: dsa: microchip: use delayed_work instead of timer + work")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|