summaryrefslogtreecommitdiffstats
path: root/drivers (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2019-02-2424-208/+250
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Hopefully the last pull request for this release. Fingers crossed: 1) Only refcount ESP stats on full sockets, from Martin Willi. 2) Missing barriers in AF_UNIX, from Al Viro. 3) RCU protection fixes in ipv6 route code, from Paolo Abeni. 4) Avoid false positives in untrusted GSO validation, from Willem de Bruijn. 5) Forwarded mesh packets in mac80211 need more tailroom allocated, from Felix Fietkau. 6) Use operstate consistently for linkup in team driver, from George Wilkie. 7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF and PF code paths. 8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni. 9) nfp eBPF code gen fixes from Jiong Wang. 10) bnxt_en firmware timeout fix from Michael Chan. 11) Use after free in udp/udpv6 error handlers, from Paolo Abeni. 12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) net: phy: realtek: Dummy IRQ calls for RTL8366RB tcp: repaired skbs must init their tso_segs net/x25: fix a race in x25_bind() net: dsa: Remove documentation for port_fdb_prepare Revert "bridge: do not add port to router list when receives query with source 0.0.0.0" selftests: fib_tests: sleep after changing carrier. again. net: set static variable an initial value in atl2_probe() net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G bpf, doc: add bpf list as secondary entry to maintainers file udp: fix possible user after free in error handler udpv6: fix possible user after free in error handler fou6: fix proto error handler argument type udpv6: add the required annotation to mib type mdio_bus: Fix use-after-free on device_register fails net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 bnxt_en: Wait longer for the firmware message response to complete. bnxt_en: Fix typo in firmware message timeout logic. nfp: bpf: fix ALU32 high bits clearance bug nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K Documentation: networking: switchdev: Update port parent ID section ...
| * net: phy: realtek: Dummy IRQ calls for RTL8366RBLinus Walleij2019-02-241-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression introduced by commit 0d2e778e38e0ddffab4bb2b0e9ed2ad5165c4bf7 "net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt". This assumes that a PHY cannot trigger interrupt unless it has .config_intr() or .ack_interrupt() implemented. A later patch makes the code assume both need to be implemented for interrupts to be present. But this PHY (which is inside a DSA) will happily fire interrupts without either callback. Implement dummy callbacks for .config_intr() and .ack_interrupt() in the phy header to fix this. Tested on the RTL8366RB on D-Link DIR-685. Fixes: 0d2e778e38e0 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt") Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: set static variable an initial value in atl2_probe()Mao Wenan2019-02-231-3/+1
| | | | | | | | | | | | | | | | | | cards_found is a static variable, but when it enters atl2_probe(), cards_found is set to zero, the value is not consistent with last probe, so next behavior is not our expect. Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10GMaxime Chevallier2019-02-231-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | Some Marvell Alaska PHYs support 2.5G, 5G and 10G BaseT links. Their default behaviour is to advertise all of these modes, but at the moment, only 10GBaseT is supported. To prevent link partners from establishing link at that speed, clear these modes upon configuring aneg parameters. Fixes: 20b2af32ff3f ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reported-by: Russell King <linux@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-02-231-11/+6
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2019-02-23 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a bug in BPF's LPM deletion logic to match correct prefix length, from Alban. 2) Fix AF_XDP teardown by not destroying umem prematurely as it is still needed till all outstanding skbs are freed, from Björn. 3) Fix unkillable BPF_PROG_TEST_RUN under preempt kernel by checking signal_pending() outside need_resched() condition which is never triggered there, from Stanislav. 4) Fix two nfp JIT bugs, one in code emission for K-based xor, and another one to explicitly clear upper bits in alu32, from Jiong. 5) Add bpf list address to maintainers file, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * nfp: bpf: fix ALU32 high bits clearance bugJiong Wang2019-02-231-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFP BPF JIT compiler is doing a couple of small optimizations when jitting ALU imm instructions, some of these optimizations could save code-gen, for example: A & -1 = A A | 0 = A A ^ 0 = A However, for ALU32, high 32-bit of the 64-bit register should still be cleared according to ISA semantics. Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator") Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_KJiong Wang2019-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The intended optimization should be A ^ 0 = A, not A ^ -1 = A. Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator") Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | mdio_bus: Fix use-after-free on device_register failsYueHaibing2019-02-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KASAN has found use-after-free in fixed_mdio_bus_init, commit 0c692d07842a ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure") call put_device() while device_register() fails,give up the last reference to the device and allow mdiobus_release to be executed ,kfreeing the bus. However in most drives, mdiobus_free be called to free the bus while mdiobus_register fails. use-after-free occurs when access bus again, this patch revert it to let mdiobus_free free the bus. KASAN report details as below: BUG: KASAN: use-after-free in mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482 Read of size 4 at addr ffff8881dc824d78 by task syz-executor.0/3524 CPU: 1 PID: 3524 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xfa/0x1ce lib/dump_stack.c:113 print_address_description+0x65/0x270 mm/kasan/report.c:187 kasan_report+0x149/0x18d mm/kasan/report.c:317 mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482 fixed_mdio_bus_init+0x283/0x1000 [fixed_phy] ? 0xffffffffc0e40000 ? 0xffffffffc0e40000 ? 0xffffffffc0e40000 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f6215c19c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00007f6215c19c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6215c1a6bc R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004 Allocated by task 3524: set_track mm/kasan/common.c:85 [inline] __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496 kmalloc include/linux/slab.h:545 [inline] kzalloc include/linux/slab.h:740 [inline] mdiobus_alloc_size+0x54/0x1b0 drivers/net/phy/mdio_bus.c:143 fixed_mdio_bus_init+0x163/0x1000 [fixed_phy] do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 3524: set_track mm/kasan/common.c:85 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458 slab_free_hook mm/slub.c:1409 [inline] slab_free_freelist_hook mm/slub.c:1436 [inline] slab_free mm/slub.c:2986 [inline] kfree+0xe1/0x270 mm/slub.c:3938 device_release+0x78/0x200 drivers/base/core.c:919 kobject_cleanup lib/kobject.c:662 [inline] kobject_release lib/kobject.c:691 [inline] kref_put include/linux/kref.h:67 [inline] kobject_put+0x146/0x240 lib/kobject.c:708 put_device+0x1c/0x30 drivers/base/core.c:2060 __mdiobus_register+0x483/0x560 drivers/net/phy/mdio_bus.c:382 fixed_mdio_bus_init+0x26b/0x1000 [fixed_phy] do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881dc824c80 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 248 bytes inside of 2048-byte region [ffff8881dc824c80, ffff8881dc825480) The buggy address belongs to the page: page:ffffea0007720800 count:1 mapcount:0 mapping:ffff8881f6c02800 index:0x0 compound_mapcount: 0 flags: 0x2fffc0000010200(slab|head) raw: 02fffc0000010200 0000000000000000 0000000500000001 ffff8881f6c02800 raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881dc824c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881dc824c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8881dc824d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881dc824d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881dc824e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 0c692d07842a ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bnxt_en: Wait longer for the firmware message response to complete.Michael Chan2019-02-232-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code waits up to 20 usec for the firmware response to complete once we've seen the valid response header in the buffer. It turns out that in some scenarios, this wait time is not long enough. Extend it to 150 usec and use usleep_range() instead of udelay(). Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bnxt_en: Fix typo in firmware message timeout logic.Michael Chan2019-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logic that polls for the firmware message response uses a shorter sleep interval for the first few passes. But there was a typo so it was using the wrong counter (larger counter) for these short sleep passes. The result is a slightly shorter timeout period for these firmware messages than intended. Fix it by using the proper counter. Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge tag 'mac80211-for-davem-2019-02-22' of ↵David S. Miller2019-02-221-1/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Three more fixes: * mac80211 mesh code wasn't allocating SKB tailroom properly in some cases * tx_sk_pacing_shift should be 7 for better performance * mac80211_hwsim wasn't propagating genlmsg_reply() errors ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | mac80211_hwsim: propagate genlmsg_reply return codeLi RongQing2019-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | genlmsg_reply can fail, so propagate its return code Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | net: thunderx: remove link change polling code and info from nicpfVadim Lomovtsev2019-02-221-102/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since link change polling routine was moved to nicvf side, we don't need anymore polling function at nicpf side along with link status info for all enabled Vfs as at VF side this info is already tracked. This commit is to remove unnecessary code & fields from nicpf structure. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: thunderx: move link state polling function to VFVadim Lomovtsev2019-02-223-19/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the link change polling task to VF side in order to prevent races between VF and PF while sending link change message(s). This commit is to implement link change request to be initiated by VF. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: thunderx: add mutex to protect mailbox from concurrent calls for same VFVadim Lomovtsev2019-02-222-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases it could happen that nicvf_send_msg_to_pf() could be called concurrently for the same NIC VF, and thus re-writing mailbox contents and breaking messaging sequence with PF by re-writing NICVF data. This commit is to implement mutex for NICVF to protect mailbox registers and NICVF messaging control data from concurrent access. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: thunderx: rework xcast message structure to make it fit into 64 bitVadim Lomovtsev2019-02-223-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To communicate to PF each of ThunderX NIC VF uses mailbox which is pair of 64 bit registers available to both VFn and PF. This commit is to change the xcast message structure in order to fit it into 64 bit. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_taskVadim Lomovtsev2019-02-221-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rx_set_mode invokes number of messages to be send to PF for receive mode configuration. In case if there any issues we need to stop sending messages and release allocated memory. This commit is to implement check of nicvf_msg_send_to_pf() result. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: thunderx: make CFG_DONE message to run through generic send-ack sequenceVadim Lomovtsev2019-02-222-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the end of NIC VF initialization VF sends CFG_DONE message to PF without using nicvf_msg_send_to_pf routine. This potentially could re-write data in mailbox. This commit is to implement common way of sending CFG_DONE message by the same way with other configuration messages by using nicvf_send_msg_to_pf() routine. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to ↵Vadim Lomovtsev2019-02-222-15/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | private for each of them. Having one work queue for receive mode configuration ndo_set_rx_mode() call for all VFs results in making each of them wait till the set_rx_mode() call completes for another VF if any of close, set receive mode and change flags calls being already invoked. Potentially this could cause device state change before appropriate call of receive mode configuration completes, so the call itself became meaningless, corrupt data or break configuration sequence. We don't need any delays in NIC VF configuration sequence so having delayed work call with 0 delay has no sense. This commit is to implement one work queue for each NIC VF for set_rx_mode task and to let them work independently and replacing delayed_work with work_struct. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: thunderx: correct typo in macro nameVadim Lomovtsev2019-02-222-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct STREERING to STEERING at macro name for BGX steering register. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | team: use operstate consistently for linkupGeorge Wilkie2019-02-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a port is added to a team, its initial state is derived from netif_carrier_ok rather than netif_oper_up. If it is carrier up but operationally down at the time of being added, the port state.linkup will be set prematurely. port state.linkup should be set consistently using netif_oper_up rather than netif_carrier_ok. Fixes: f1d22a1e0595 ("team: account for oper state") Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | r8152: Fix an error on RTL8153-BD MAC Address Passthrough supportDavid Chen2019-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RTL8153-BD is used in Dell DA300 type-C dongle. Added RTL8153-BD support to activate MAC address pass through on DA300. Apply correction on previously submitted patch in net.git tree. Signed-off-by: David Chen <david.chen7@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipvlan: disallow userns cap_net_admin to change global mode/flagsDaniel Borkmann2019-02-221-0/+4
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running Docker with userns isolation e.g. --userns-remap="default" and spawning up some containers with CAP_NET_ADMIN under this realm, I noticed that link changes on ipvlan slave device inside that container can affect all devices from this ipvlan group which are in other net namespaces where the container should have no permission to make changes to, such as the init netns, for example. This effectively allows to undo ipvlan private mode and switch globally to bridge mode where slaves can communicate directly without going through hostns, or it allows to switch between global operation mode (l2/l3/l3s) for everyone bound to the given ipvlan master device. libnetwork plugin here is creating an ipvlan master and ipvlan slave in hostns and a slave each that is moved into the container's netns upon creation event. * In hostns: # ip -d a [...] 8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet 10.41.0.1/32 scope link cilium_host valid_lft forever preferred_lft forever [...] * Spawn container & change ipvlan mode setting inside of it: # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf 9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c # docker exec -ti client ip -d a [...] 10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0 valid_lft forever preferred_lft forever # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2 # docker exec -ti client ip -d a [...] 10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0 valid_lft forever preferred_lft forever * In hostns (mode switched to l2): # ip -d a [...] 8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet 10.41.0.1/32 scope link cilium_host valid_lft forever preferred_lft forever [...] Same l3 -> l2 switch would also happen by creating another slave inside the container's network namespace when specifying the existing cilium0 link to derive the actual (bond0) master: # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2 # docker exec -ti client ip -d a [...] 2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0 valid_lft forever preferred_lft forever * In hostns: # ip -d a [...] 8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet 10.41.0.1/32 scope link cilium_host valid_lft forever preferred_lft forever [...] One way to mitigate it is to check CAP_NET_ADMIN permissions of the ipvlan master device's ns, and only then allow to change mode or flags for all devices bound to it. Above two cases are then disallowed after the patch. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: fix PACKET_ORIGDEV regressionMichal Soltys2019-02-211-21/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a subtle PACKET_ORIGDEV regression which was a side effect of fixes introduced by: 6a9e461f6fe4 bonding: pass link-local packets to bonding master also. ... to: b89f04c61efe bonding: deliver link-local packets with skb->dev set to link that packets arrived on While 6a9e461f6fe4 restored pre-b89f04c61efe presence of link-local packets on bonding masters (which is required e.g. by linux bridges participating in spanning tree or needed for lab-like setups created with group_fwd_mask) it also caused the originating device information to be lost due to cloning. Maciej Żenczykowski proposed another solution that doesn't require packet cloning and retains original device information - instead of returning RX_HANDLER_PASS for all link-local packets it's now limited only to packets from inactive slaves. At the same time, packets passed to bonding masters retain correct information about the originating device and PACKET_ORIGDEV can be used to determine it. This elegantly solves all issues so far: - link-local packets that were removed from bonding masters - LLDP daemons being forced to explicitly bind to slave interfaces - PACKET_ORIGDEV having no effect on bond interfaces Fixes: 6a9e461f6fe4 (bonding: pass link-local packets to bonding master also.) Reported-by: Vincent Bernat <vincent@bernat.ch> Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: vrf: remove MTU limits for vrf deviceHangbin Liu2019-02-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similiar to commit e94cd8113ce63 ("net: remove MTU limits for dummy and ifb device"), MTU is irrelevant for VRF device. We init it as 64K while limit it to [68, 1500] may make users feel confused. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OKJan Sokolowski2019-02-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An issue has been found while testing zero-copy XDP that causes a reset to be triggered. As it takes some time to turn the carrier on after setting zc, and we already start trying to transmit some packets, watchdog considers this as an erroneous state and triggers a reset. Don't do any work if netif carrier is not OK. Fixes: 8221c5eba8c13 (ixgbe: add AF_XDP zero-copy Tx support) Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: fix XDP_REDIRECT/XDP xmit ring cleanup raceBjörn Töpel2019-02-212-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the driver clears the XDP xmit ring due to re-configuration or teardown, in-progress ndo_xdp_xmit must be taken into consideration. The ndo_xdp_xmit function is typically called from a NAPI context that the driver does not control. Therefore, we must be careful not to clear the XDP ring, while the call is on-going. This patch adds a synchronize_rcu() to wait for napi(s) (preempt-disable regions and softirqs), prior clearing the queue. Further, the __I40E_CONFIG_BUSY flag is checked in the ndo_xdp_xmit implementation to avoid touching the XDP xmit queue during re-configuration. Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT") Fixes: 123cecd427b6 ("i40e: added queue pair disable/enable functions") Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: fix potential RX buffer starvation for AF_XDPMagnus Karlsson2019-02-212-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the RX rings are created they are also populated with buffers so that packets can be received. Usually these are kernel buffers, but for AF_XDP in zero-copy mode, these are user-space buffers and in this case the application might not have sent down any buffers to the driver at this point. And if no buffers are allocated at ring creation time, no packets can be received and no interrupts will be generated so the NAPI poll function that allocates buffers to the rings will never get executed. To rectify this, we kick the NAPI context of any queue with an attached AF_XDP zero-copy socket in two places in the code. Once after an XDP program has loaded and once after the umem is registered. This take care of both cases: XDP program gets loaded first then AF_XDP socket is created, and the reverse, AF_XDP socket is created first, then XDP program is loaded. Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: fix potential RX buffer starvation for AF_XDPMagnus Karlsson2019-02-212-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the RX rings are created they are also populated with buffers so that packets can be received. Usually these are kernel buffers, but for AF_XDP in zero-copy mode, these are user-space buffers and in this case the application might not have sent down any buffers to the driver at this point. And if no buffers are allocated at ring creation time, no packets can be received and no interrupts will be generated so the NAPI poll function that allocates buffers to the rings will never get executed. To rectify this, we kick the NAPI context of any queue with an attached AF_XDP zero-copy socket in two places in the code. Once after an XDP program has loaded and once after the umem is registered. This take care of both cases: XDP program gets loaded first then AF_XDP socket is created, and the reverse, AF_XDP socket is created first, then XDP program is loaded. Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: fix older devices that do not support IXGBE_MRQC_L3L4TXSWENJeff Kirsher2019-02-211-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The enabling L3/L4 filtering for transmit switched packets for all devices caused unforeseen issue on older devices when trying to send UDP traffic in an ordered sequence. This bit was originally intended for X550 devices, which supported this feature, so limit the scope of this bit to only X550 devices. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
| * | net: marvell: mvneta: fix DMA debug warningRussell King2019-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Booting 4.20 on SolidRun Clearfog issues this warning with DMA API debug enabled: WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes] Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291 Hardware name: Marvell Armada 380/385 (Device Tree) [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14) [<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4) [<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124) [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48) [<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc) [<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74) [<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58) [<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424) [<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540) [<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144) [<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0) [<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98) [<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98) ... This appears to be caused by mvneta_rx_hwbm() calling dma_sync_single_range_for_cpu() with the wrong struct device pointer, as the buffer manager device pointer is used to map and unmap the buffer. Fix this. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge tag 'scsi-fixes' of ↵Linus Torvalds2019-02-234-3/+14
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four small fixes: three in drivers and one in the core. The core fix is also minor in scope since the bug it fixes is only known to affect systems using SCSI reservations. Of the driver bugs, the libsas one is the most major because it can lead to multiple disks on the same expander not being exposed" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: core: reset host byte in DID_NEXUS_FAILURE case scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
| * | | scsi: core: reset host byte in DID_NEXUS_FAILURE caseMartin Wilck2019-02-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to 4.12, __scsi_error_from_host_byte() would reset the host byte to DID_OK for various cases including DID_NEXUS_FAILURE. Commit 2a842acab109 ("block: introduce new block status code type") replaced this function with scsi_result_to_blk_status() and removed the host-byte resetting code for the DID_NEXUS_FAILURE case. As the line set_host_byte(cmd, DID_OK) was preserved for the other cases, I suppose this was an editing mistake. The fact that the host byte remains set after 4.13 is causing problems with the sg_persist tool, which now returns success rather then exit status 24 when a RESERVATION CONFLICT error is encountered. Fixes: 2a842acab109 "block: introduce new block status code type" Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attachedJohn Garry2019-02-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sysfs phy_identifier attribute for a sas_end_device comes from the rphy phy_identifier value. Currently this is not being set for rphys with an end device attached, so we see incorrect symlinks from systemd disk/by-path: root@localhost:~# ls -l /dev/disk/by-path/ total 0 lrwxrwxrwx 1 root root 9 Feb 13 12:26 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy0-lun-0 -> ../../sdb lrwxrwxrwx 1 root root 10 Feb 13 12:26 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy0-lun-0-part1 -> ../../sdb1 lrwxrwxrwx 1 root root 10 Feb 13 12:26 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy0-lun-0-part2 -> ../../sdb2 lrwxrwxrwx 1 root root 10 Feb 13 12:26 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy0-lun-0-part3 -> ../../sdc3 Indeed, each sas_end_device phy_identifier value is 0: root@localhost:/# more sys/class/sas_device/end_device-0\:0\:2/phy_identifier 0 root@localhost:/# more sys/class/sas_device/end_device-0\:0\:10/phy_identifier 0 This patch fixes the discovery code to set the phy_identifier. With this, we now get proper symlinks: root@localhost:~# ls -l /dev/disk/by-path/ total 0 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy10-lun-0 -> ../../sdg lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy11-lun-0 -> ../../sdh lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy2-lun-0 -> ../../sda lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy2-lun-0-part1 -> ../../sda1 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy3-lun-0 -> ../../sdb lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy3-lun-0-part1 -> ../../sdb1 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy3-lun-0-part2 -> ../../sdb2 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy4-lun-0 -> ../../sdc lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy4-lun-0-part1 -> ../../sdc1 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy4-lun-0-part2 -> ../../sdc2 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy4-lun-0-part3 -> ../../sdc3 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy5-lun-0 -> ../../sdd lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy7-lun-0 -> ../../sde lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy7-lun-0-part1 -> ../../sde1 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy7-lun-0-part2 -> ../../sde2 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy7-lun-0-part3 -> ../../sde3 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy8-lun-0 -> ../../sdf lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy8-lun-0-part1 -> ../../sdf1 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy8-lun-0-part2 -> ../../sdf2 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy8-lun-0-part3 -> ../../sdf3 Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") Reported-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocationMasato Suzuki2019-02-161-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function sd_zbc_do_report_zones() issues a REPORT ZONES command with a buffer size calculated based on the number of zones requested by the caller. This value should however not exceed the capabilities of the hardware maximum command size, that is, should not exceed the max_hw_sectors limit of the device. This problem leads to failures of report zones commands when re-validating disks with some SAS HBAs. Fix this by limiting a report zone command buffer size to the minimum of the device max_hw_sectors and calculated value based on the requested number of zones. This does not change the semantic of the report_zones file operation as report zones can always return less zone reports than requested. Short reports are handled using a loop execution of the report_zones file operation in the function blk_report_zones(). [Damien] Before patch 'e76239a3748c ("block: add a report_zones method")', report zones buffer allocation was limited to max_sectors when allocated in blk_report_zones(). This however does not consider the actual format of the device reply which is interface dependent. Limiting the allocation based on the size of the expected reply format rather than the size of the array of generic sturct blkzone passed by blk_report_zones() makes more sense. Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by: Masato Suzuki <masato.suzuki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_taskAnoob Soman2019-02-161-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a target sends Check Condition, whilst initiator is busy xmiting re-queued data, could lead to race between iscsi_complete_task() and iscsi_xmit_task() and eventually crashing with the following kernel backtrace. [3326150.987523] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 [3326150.987549] ALERT: IP: [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi] [3326150.987571] WARN: PGD 569c8067 PUD 569c9067 PMD 0 [3326150.987582] WARN: Oops: 0002 [#1] SMP [3326150.987593] WARN: Modules linked in: tun nfsv3 nfs fscache dm_round_robin [3326150.987762] WARN: CPU: 2 PID: 8399 Comm: kworker/u32:1 Tainted: G O 4.4.0+2 #1 [3326150.987774] WARN: Hardware name: Dell Inc. PowerEdge R720/0W7JN5, BIOS 2.5.4 01/22/2016 [3326150.987790] WARN: Workqueue: iscsi_q_13 iscsi_xmitworker [libiscsi] [3326150.987799] WARN: task: ffff8801d50f3800 ti: ffff8801f5458000 task.ti: ffff8801f5458000 [3326150.987810] WARN: RIP: e030:[<ffffffffa05ce70d>] [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi] [3326150.987825] WARN: RSP: e02b:ffff8801f545bdb0 EFLAGS: 00010246 [3326150.987831] WARN: RAX: 00000000ffffffc3 RBX: ffff880282d2ab20 RCX: ffff88026b6ac480 [3326150.987842] WARN: RDX: 0000000000000000 RSI: 00000000fffffe01 RDI: ffff880282d2ab20 [3326150.987852] WARN: RBP: ffff8801f545bdc8 R08: 0000000000000000 R09: 0000000000000008 [3326150.987862] WARN: R10: 0000000000000000 R11: 000000000000fe88 R12: 0000000000000000 [3326150.987872] WARN: R13: ffff880282d2abe8 R14: ffff880282d2abd8 R15: ffff880282d2ac08 [3326150.987890] WARN: FS: 00007f5a866b4840(0000) GS:ffff88028a640000(0000) knlGS:0000000000000000 [3326150.987900] WARN: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [3326150.987907] WARN: CR2: 0000000000000078 CR3: 0000000070244000 CR4: 0000000000042660 [3326150.987918] WARN: Stack: [3326150.987924] WARN: ffff880282d2ad58 ffff880282d2ab20 ffff880282d2abe8 ffff8801f545be18 [3326150.987938] WARN: ffffffffa05cea90 ffff880282d2abf8 ffff88026b59cc80 ffff88026b59cc00 [3326150.987951] WARN: ffff88022acf32c0 ffff880289491800 ffff880255a80800 0000000000000400 [3326150.987964] WARN: Call Trace: [3326150.987975] WARN: [<ffffffffa05cea90>] iscsi_xmitworker+0x2f0/0x360 [libiscsi] [3326150.987988] WARN: [<ffffffff8108862c>] process_one_work+0x1fc/0x3b0 [3326150.987997] WARN: [<ffffffff81088f95>] worker_thread+0x2a5/0x470 [3326150.988006] WARN: [<ffffffff8159cad8>] ? __schedule+0x648/0x870 [3326150.988015] WARN: [<ffffffff81088cf0>] ? rescuer_thread+0x300/0x300 [3326150.988023] WARN: [<ffffffff8108ddf5>] kthread+0xd5/0xe0 [3326150.988031] WARN: [<ffffffff8108dd20>] ? kthread_stop+0x110/0x110 [3326150.988040] WARN: [<ffffffff815a0bcf>] ret_from_fork+0x3f/0x70 [3326150.988048] WARN: [<ffffffff8108dd20>] ? kthread_stop+0x110/0x110 [3326150.988127] ALERT: RIP [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi] [3326150.988138] WARN: RSP <ffff8801f545bdb0> [3326150.988144] WARN: CR2: 0000000000000078 [3326151.020366] WARN: ---[ end trace 1c60974d4678d81b ]--- Commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix list corruption regression") introduced "taskqueuelock" to fix list corruption during the race, but this wasn't enough. Re-setting of conn->task to NULL, could race with iscsi_xmit_task(). iscsi_complete_task() { .... if (conn->task == task) conn->task = NULL; } conn->task in iscsi_xmit_task() could be NULL and so will be task. __iscsi_get_task(task) will crash (NullPtr de-ref), trying to access refcount. iscsi_xmit_task() { struct iscsi_task *task = conn->task; __iscsi_get_task(task); } This commit will take extra conn->session->back_lock in iscsi_xmit_task() to ensure iscsi_xmit_task() waits for iscsi_complete_task(), if iscsi_complete_task() wins the race. If iscsi_xmit_task() wins the race, iscsi_xmit_task() increments task->refcount (__iscsi_get_task) ensuring iscsi_complete_task() will not iscsi_free_task(). Signed-off-by: Anoob Soman <anoob.soman@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | | Merge tag 'pm-5.0' of ↵Linus Torvalds2019-02-232-2/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a regression in the PM-runtime framework introduced by the recent switch-over of it to using hrtimers and a use-after-free introduced by one of the recent changes in the scmi-cpufreq driver. Specifics: - Use hrtimer_try_to_cancel() instead of hrtimer_cancel() in the PM-runtime framework to avoid a possible timer-related deadlock introduced recently (Vincent Guittot). - Reorder the scmi-cpufreq driver code to avoid accessing memory that has just been freed (Yangtao Li)" * tag 'pm-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM-runtime: Fix deadlock when canceling hrtimer cpufreq: scmi: Fix use-after-free in scmi_cpufreq_exit()
| * \ \ \ Merge branch 'pm-cpufreq-fixes'Rafael J. Wysocki2019-02-221-1/+1
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | * pm-cpufreq-fixes: cpufreq: scmi: Fix use-after-free in scmi_cpufreq_exit()
| | * | | | cpufreq: scmi: Fix use-after-free in scmi_cpufreq_exit()Yangtao Li2019-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This issue was detected with the help of Coccinelle. So change the order of function calls to fix it. Fixes: 1690d8bb91e37 (cpufreq: scpi/scmi: Fix freeing of dynamic OPPs) Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Cc: 4.20+ <stable@vger.kernel.org> # 4.20+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | PM-runtime: Fix deadlock when canceling hrtimerVincent Guittot2019-02-211-1/+1
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When rpm_resume() desactivates the autosuspend timer, it should only try to cancel hrtimer but not wait for the handler to finish, because both rpm_resume() and pm_suspend_timer_fn() take the power.lock. A deadlock is possible as follows: CPU0 CPU1 rpm_resume() spin_lock_irqsave pm_suspend_timer_fn() spin_lock_irqsave pm_runtime_deactivate_timer() hrtimer_cancel() It is sufficient to call hrtimer_try_to_cancel() from pm_runtime_deactivate_timer(), because dev->power.timer_expires reset to 0 by it, so use that function instead of hrtimer_cancel(). Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers") Reported-by: Sunzhaosheng Sun(Zhaosheng) <sunzhaosheng@hisilicon.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | Merge tag 'drm-fixes-2019-02-22' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2019-02-2211-18/+47
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "This contains a single i915 tiled display fix, and a set of amdgpu/radeon fixes. i915: - tiled display fix amdgpu/radeon: - runtime PM fix - bulk moves disable (fix is too large for 5.0) - a set of display fixes that are all cc'ed stable so we didn't want to leave them until -next" * tag 'drm-fixes-2019-02-22' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: disable bulk moves for now drm/amd/display: set clocks to 0 on suspend on dce80 drm/amd/display: fix optimize_bandwidth func pointer for dce80 drm/amd/display: Fix negative cursor pos programming drm/i915/fbdev: Actually configure untiled displays drm/amd/display: Raise dispclk value for dce11 drm/amd/display: Fix MST reboot/poweroff sequence drm/amdgpu: Update sdma golden setting for vega20 drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
| * \ \ \ \ Merge branch 'drm-fixes-5.0' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie2019-02-2210-13/+40
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into drm-fixes A bit bigger than normal for this week due to fixes for some long standing display issues that are bound for stable. These changes would be going to stable anyway, so I figured it was better via 5.0 than 5.1. - Several display fixes - Fix PX systems due to core changes in runtime pm - Disable bulk moves. They are fixed in 5.1, but fix is too invasive for 5.0 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190220225715.3240-1-alexander.deucher@amd.com
| | * | | | | drm/amdgpu: disable bulk moves for nowChristian König2019-02-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changes to fix those are two invasive for backporting. Just disable the feature in 4.20 and 5.0. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Cc: <stable@vger.kernel.org> [4.20+] Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | * | | | | drm/amd/display: set clocks to 0 on suspend on dce80Bhawanpreet Lakha2019-02-201-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [Why] When a dce80 asic was suspended, the clocks were not set to 0. Upon resume, the new clock was compared to the existing clock, they were found to be the same, and so the clock was not set. This resulted in a blackscreen. [How] In atomic commit, check to see if there are any active pipes. If no, set clocks to 0 Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| | * | | | | drm/amd/display: fix optimize_bandwidth func pointer for dce80Bhawanpreet Lakha2019-02-202-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [Why] optimize_bandwidth was using dce100_prepare_bandwidth this is incorrect [How] change it to dce100_optimize_bandwidth Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| | * | | | | drm/amd/display: Fix negative cursor pos programmingNicholas Kazlauskas2019-02-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [Why] If the cursor pos passed from DM is less than the plane_state->dst_rect top left corner then the unsigned cursor pos wraps around to a large positive number since cursor pos is a u32. There was an attempt to guard against this in hubp1_cursor_set_position by checking the src_x_offset and src_y_offset and offseting the cursor hotspot within hubp1_cursor_set_position. However, the cursor position itself is still being programmed incorrectly as a large value. This manifests itself visually as the cursor disappearing or containing strange artifacts near the middle of the screen on raven. [How] Don't subtract the destination rect top left corner from the pos but add it to the hotspot instead. This happens before the pos gets passed into hubp1_cursor_set_position. This achieves the same result but avoids the subtraction wrap around. With this fix the original cursor programming logic can be used again. Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Acked-by: Murton Liu <Murton.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | * | | | | drm/amd/display: Raise dispclk value for dce11Roman Li2019-02-191-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [Why] The visual corruption due to low display clock value. Observed on Carrizo 4K@60Hz. [How] There was earlier patch for dce_update_clocks: Adding +15% workaround also to to dce11_update_clocks Signed-off-by: Roman Li <Roman.Li@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | * | | | | drm/amd/display: Fix MST reboot/poweroff sequenceLeo (Hanghong) Ma2019-02-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [Why] drm_dp_mst_topology_mgr_suspend() is added into the new reboot sequence, which disables the UP request at the beginning. Therefore sideband messages are blocked. [How] Finish MST sideband message transaction before UP request is suppressed. Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com> Reviewed-by: Roman Li <Roman.Li@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| | * | | | | drm/amdgpu: Update sdma golden setting for vega20shaoyunl2019-02-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to hardware engineer, WRITE_BURST_LENGTH [9:8] in register SDMA0_CHICKEN_BITS need to change to 3 for better performance Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | * | | | | drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtimeAlex Deucher2019-02-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on a similar patch from Rafael for radeon. When using ATPX to control dGPU power, the state is not retained across suspend and resume cycles by default. This can probably be loosened for Hybrid Graphics (_PR3) laptops where I think the state is properly retained. Fixes: c62ec4610c40 ("PM / core: Fix direct_complete handling for devices with no callbacks") Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org