| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
UDP tunnel sockets are always opened unbound to a specific device. This
patch allow the socket to be bound on a custom device, which
incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many drivers load the device's firmware image during the initialization
flow either from the flash or from the disk. Currently this option is not
controlled by the user and the driver decides from where to load the
firmware image.
'fw_load_policy' gives the ability to control this option which allows the
user to choose between different loading policies supported by the driver.
This parameter can be useful while testing and/or debugging the device. For
example, testing a firmware bug fix.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
The userspace may need to control the carrier state.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the flowi* structures are used and memsetted by server functions
in critical path. Currently flowi_common has a couple of holes that
we can eliminate reordering the struct fields. As a side effect,
both flowi4 and flowi6 shrink by 8 bytes.
Before:
pahole -EC flowi_common
struct flowi_common {
// ...
/* size: 40, cachelines: 1, members: 10 */
/* sum members: 32, holes: 1, sum holes: 4 */
/* padding: 4 */
/* last cacheline: 40 bytes */
};
pahole -EC flowi6
struct flowi6 {
// ...
/* size: 88, cachelines: 2, members: 6 */
/* padding: 4 */
/* last cacheline: 24 bytes */
};
pahole -EC flowi4
struct flowi4 {
// ...
/* size: 56, cachelines: 1, members: 4 */
/* padding: 4 */
/* last cacheline: 56 bytes */
};
After:
struct flowi_common {
// ...
/* size: 32, cachelines: 1, members: 10 */
/* last cacheline: 32 bytes */
};
struct flowi6 {
// ...
/* size: 80, cachelines: 2, members: 6 */
/* padding: 4 */
/* last cacheline: 16 bytes */
};
struct flowi4 {
// ...
/* size: 48, cachelines: 1, members: 4 */
/* padding: 4 */
/* last cacheline: 48 bytes */
};
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the doorbelling entities are outside of the core module.
L2 queues, Roce queues, iscsi and fcoe all need to register.
Make the APIs available for these drivers.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Add the database used to register doorbelling entities, and APIs for adding
and deleting entries, and logic for traversing the database and doorbelling
once on behalf of all entities.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
Most linux hosts never setup TCP MD5 keys. We can avoid a
cache line miss (accessing tp->md5ig_info) on RX and TX
using a jump label.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case GRO is not as efficient as it should be or disabled,
we might have a user thread trapped in __release_sock() while
softirq handler flood packets up to the point we have to drop.
This patch balances work done from user thread and softirq,
to give more chances to __release_sock() to complete its work
before new packets are added the the backlog.
This also helps if we receive many ACK packets, since GRO
does not aggregate them.
This patch brings ~60% throughput increase on a receiver
without GRO, but the spectacular gain is really on
1000x release_sock() latency reduction I have measured.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jean-Louis Dupond reported poor iscsi TCP receive performance
that we tracked to backlog drops.
Apparently we fail to send window updates reflecting the
fact that we are under stress.
Note that we might lack a proper window increase when
backlog is fully processed, since __release_sock() clears
sk->sk_backlog.len _after_ all skbs have been processed.
This should not matter in practice. If we had a significant
load through socket backlog, we are in a dangerous
situation.
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Jean-Louis Dupond<jean-louis@dupond.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tell the compiler that most TCP flows are using SACK these days.
There is no need to add the unlikely() clause in tcp_is_reno(),
the compiler is able to infer it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trace events are already present for the receive entry points, to indicate
how the reception entered the stack.
This patch adds the corresponding exit trace events that will bound the
reception such that all events occurring between the entry and the exit
can be considered as part of the reception context. This greatly helps
for dependency and root cause analyses.
Without this, it is not possible with tracepoint instrumentation to
determine whether a sched_wakeup event following a netif_receive_skb
event is the result of the packet reception or a simple coincidence after
further processing by the thread. It is possible using other mechanisms
like kretprobes, but considering the "entry" points are already present,
it would be good to add the matching exit events.
In addition to linking packets with wakeups, the entry/exit event pair
can also be used to perform network stack latency analyses.
Signed-off-by: Geneviève Bastien <gbastien@versatic.net>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: David S. Miller <davem@davemloft.net>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (tracing side)
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
There are no such structs flow_dissector_key_flow_vlan or
flow_dissector_key_flow_tags, the actual structs used are struct
flow_dissector_key_vlan and struct flow_dissector_key_tags. So correct the
comments against FLOW_DISSECTOR_KEY_VLAN, FLOW_DISSECTOR_KEY_FLOW_LABEL and
FLOW_DISSECTOR_KEY_CVLAN to refer to those.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Daniel Borkmann says:
====================
bpf-next 2018-11-30
The following pull-request contains BPF updates for your *net-next* tree.
(Getting out bit earlier this time to pull in a dependency from bpf.)
The main changes are:
1) Add libbpf ABI versioning and document API naming conventions
as well as ABI versioning process, from Andrey.
2) Add a new sk_msg_pop_data() helper for sk_msg based BPF
programs that is used in conjunction with sk_msg_push_data()
for adding / removing meta data to the msg data, from John.
3) Optimize convert_bpf_ld_abs() for 0 offset and fix various
lib and testsuite build failures on 32 bit, from David.
4) Make BPF prog dump for !JIT identical to how we dump subprogs
when JIT is in use, from Yonghong.
5) Rename btf_get_from_id() to make it more conform with libbpf
API naming conventions, from Martin.
6) Add a missing BPF kselftest config item, from Naresh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This adds a BPF SK_MSG program helper so that we can pop data from a
msg. We use this to pop metadata from a previous push data call.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 838e96904ff3 ("bpf: Introduce bpf_func_info")
added bpf func info support. The userspace is able
to get better ksym's for bpf programs with jit, and
is able to print out func prototypes.
For a program containing func-to-func calls, the existing
implementation returns user specified number of function
calls and BTF types if jit is enabled. If the jit is not
enabled, it only returns the type for the main function.
This is undesirable. Interpreter may still be used
and we should keep feature identical regardless of
whether jit is enabled or not.
This patch fixed this discrepancy.
Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | | |
Trivial conflict in net/core/filter.c, a locally computed
'sdif' is now an argument to the function.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull networking fixes from David Miller:
1) ARM64 JIT fixes for subprog handling from Daniel Borkmann.
2) Various sparc64 JIT bug fixes (fused branch convergance, frame
pointer usage detection logic, PSEODU call argument handling).
3) Fix to use BH locking in nf_conncount, from Taehee Yoo.
4) Fix race of TX skb freeing in ipheth driver, from Bernd Eckstein.
5) Handle return value of TX NAPI completion properly in lan743x
driver, from Bryan Whitehead.
6) MAC filter deletion in i40e driver clears wrong state bit, from
Lihong Yang.
7) Fix use after free in rionet driver, from Pan Bian.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
s390/qeth: fix length check in SNMP processing
net: hisilicon: remove unexpected free_netdev
rapidio/rionet: do not free skb before reading its length
i40e: fix kerneldoc for xsk methods
ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
i40e: Fix deletion of MAC filters
igb: fix uninitialized variables
netfilter: nf_tables: deactivate expressions in rule replecement routine
lan743x: Enable driver to work with LAN7431
tipc: fix lockdep warning during node delete
lan743x: fix return value for lan743x_tx_napi_poll
net: via: via-velocity: fix spelling mistake "alignement" -> "alignment"
qed: fix spelling mistake "attnetion" -> "attention"
net: thunderx: fix NULL pointer dereference in nic_remove
sctp: increase sk_wmem_alloc when head->truesize is increased
firestream: fix spelling mistake: "Inititing" -> "Initializing"
net: phy: add workaround for issue where PHY driver doesn't bind to the device
usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
sparc: Adjust bpf JIT prologue for PSEUDO calls.
bpf, doc: add entries of who looks over which jits
...
|
| | |\ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Disable BH while holding list spinlock in nf_conncount, from
Taehee Yoo.
2) List corruption in nf_conncount, also from Taehee.
3) Fix race that results in leaving around an empty list node in
nf_conncount, from Taehee Yoo.
4) Proper chain handling for inactive chains from the commit path,
from Florian Westphal. This includes a selftest for this.
5) Do duplicate rule handles when replacing rules, also from Florian.
6) Remove net_exit path in xt_RATEEST that results in splat, from Taehee.
7) Possible use-after-free in nft_compat when releasing extensions.
From Florian.
8) Memory leak in xt_hashlimit, from Taehee.
9) Call ip_vs_dst_notifier after ipv6_dev_notf, from Xin Long.
10) Fix cttimeout with udplite and gre, from Florian.
11) Preserve oif for IPv6 link-local generated traffic from mangle
table, from Alin Nastac.
12) Missing error handling in masquerade notifiers, from Taehee Yoo.
13) Use mutex to protect registration/unregistration of masquerade
extensions in order to prevent a race, from Taehee.
14) Incorrect condition check in tree_nodes_free(), also from Taehee.
15) Fix chain counter leak in rule replacement path, from Taehee.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
register_{netdevice/inetaddr/inet6addr}_notifier may return an error
value, this patch adds the code to handle these error paths.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
syzbot was able to trigger the WARN in cttimeout_default_get() by
passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use
same timeout values.
Furthermore, also fetch GRE timeouts. GRE is a bit more complicated,
as it still can be a module and its netns_proto_gre struct layout isn't
visible outside of the gre module. Can't move timeouts around, it
appears conntrack sysctl unregister assumes net_generic() returns
nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.
A followup nf-next patch could make gre tracker be built-in as well
if needed, its not that large.
Last, make the WARN() mention the missing protocol value in case
anything else is missing.
Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Make fetching of the BPF call address from ppc64 JIT generic. ppc64
was using a slightly different variant rather than through the insns'
imm field encoding as the target address would not fit into that space.
Therefore, the target subprog number was encoded into the insns' offset
and fetched through fp->aux->func[off]->bpf_func instead. Given there
are other JITs with this issue and the mechanism of fetching the address
is JIT-generic, move it into the core as a helper instead. On the JIT
side, we get information on whether the retrieved address is a fixed
one, that is, not changing through JIT passes, or a dynamic one. For
the former, JITs can optimize their imm emission because this doesn't
change jump offsets throughout JIT process.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Tested-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull dma-mapping fixes from Christoph Hellwig:
"Two dma-direct / swiotlb regressions fixes:
- zero is a valid physical address on some arm boards, we can't use
it as the error value
- don't try to cache flush the error return value (no matter what it
is)"
* tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: Skip cache maintenance on map error
dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
With the overflow buffer removed, we no longer have a unique address
which is guaranteed not to be a valid DMA target to use as an error
token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
an unlikely DMA target, but unfortunately there are already SWIOTLB
users with DMA-able memory at physical address 0 which now gets falsely
treated as a mapping failure and leads to all manner of misbehaviour.
The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
other commonly-used error value of all-bits-set, since the last single
byte of memory is by far the least-likely-valid DMA target.
Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pull XArray updates from Matthew Wilcox:
"We found some bugs in the DAX conversion to XArray (and one bug which
predated the XArray conversion). There were a couple of bugs in some
of the higher-level functions, which aren't actually being called in
today's kernel, but surfaced as a result of converting existing radix
tree & IDR users over to the XArray.
Some of the other changes to how the higher-level APIs work were also
motivated by converting various users; again, they're not in use in
today's kernel, so changing them has a low probability of introducing
a bug.
Dan can still trigger a bug in the DAX code with hot-offline/online,
and we're working on tracking that down"
* tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax:
XArray tests: Add missing locking
dax: Avoid losing wakeup in dax_lock_mapping_entry
dax: Fix huge page faults
dax: Fix dax_unlock_mapping_entry for PMD pages
dax: Reinstate RCU protection of inode
dax: Make sure the unlocking entry isn't locked
dax: Remove optimisation from dax_lock_mapping_entry
XArray tests: Correct some 64-bit assumptions
XArray: Correct xa_store_range
XArray: Fix Documentation
XArray: Handle NULL pointers differently for allocation
XArray: Unify xa_store and __xa_store
XArray: Add xa_store_bh() and xa_store_irq()
XArray: Turn xa_erase into an exported function
XArray: Unify xa_cmpxchg and __xa_cmpxchg
XArray: Regularise xa_reserve
nilfs2: Use xa_erase_irq
XArray: Export __xa_foo to non-GPL modules
XArray: Fix xa_for_each with a single element at 0
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Minor fixes.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
These convenience wrappers disable interrupts while taking the spinlock.
A number of drivers would otherwise have to open-code these functions.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Make xa_erase() take the spinlock and then call __xa_erase(), but make
it out of line since it's such a common function.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
xa_cmpxchg() was one of the largest functions in the xarray
implementation. By turning it into a wrapper and having the callers
take the lock (like several other functions), we save 160 bytes on a
tinyconfig build and reduce the duplication in xarray.c.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The xa_reserve() function was a little unusual in that it attempted to
be callable for all kinds of locking scenarios. Make it look like the
other APIs with __xa_reserve, xa_reserve_bh and xa_reserve_irq variants.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
| |\ \ \ \ \ \
| | |_|_|/ / /
| |/| | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- revert of the high-resolution scrolling feature, as it breaks certain
hardware due to incompatibilities between Logitech and Microsoft
worlds. Peter Hutterer is working on a fixed implementation. Until
that is finished, revert by Benjamin Tissoires.
- revert of incorrect strncpy->strlcpy conversion in uhid, from David
Herrmann
- fix for buggy sendfile() implementation on uhid device node, from
Eric Biggers
- a few assorted device-ID specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
Revert "Input: Add the `REL_WHEEL_HI_RES` event code"
Revert "HID: input: Create a utility class for counting scroll events"
Revert "HID: logitech: Add function to enable HID++ 1.0 "scrolling acceleration""
Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice"
Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice"
Revert "HID: logitech: fix a used uninitialized GCC warning"
Revert "HID: input: simplify/fix high-res scroll event handling"
HID: Add quirk for Primax PIXART OEM mice
HID: i2c-hid: Disable runtime PM for LG touchscreen
HID: multitouch: Add pointstick support for Cirque Touchpad
HID: steam: remove input device when a hid client is running.
Revert "HID: uhid: use strlcpy() instead of strncpy()"
HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
HID: input: Ignore battery reported by Symbol DS4308
HID: Add quirk for Microsoft PIXART OEM mouse
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This reverts commit aaf9978c3c0291ef3beaa97610bc9c3084656a85.
Quoting Peter:
There is a HID feature report called "Resolution Multiplier"
Described in the "Enhanced Wheel Support in Windows" doc and
the "USB HID Usage Tables" page 30.
http://download.microsoft.com/download/b/d/1/bd1f7ef4-7d72-419e-bc5c-9f79ad7bb66e/wheel.docx
https://www.usb.org/sites/default/files/documents/hut1_12v2.pdf
This was new for Windows Vista, so we're only a decade behind here. I only
accidentally found this a few days ago while debugging a stuck button on a
Microsoft mouse.
The docs above describe it like this: a wheel control by default sends
value 1 per notch. If the resolution multiplier is active, the wheel is
expected to send a value of $multiplier per notch (e.g. MS Sculpt mouse) or
just send events more often, i.e. for less physical motion (e.g. MS Comfort
mouse).
For the latter, you need the right HW of course. The Sculpt mouse has
tactile wheel clicks, so nothing really changes. The Comfort mouse has
continuous motion with no tactile clicks. Similar to the free-wheeling
Logitech mice but without any inertia.
Note that the doc also says that Vista and onwards *always* enable this
feature where available.
An example HID definition looks like this:
Usage Page Generic Desktop (0x01)
Usage Resolution Multiplier (0x48)
Logical Minimum 0
Logical Maximum 1
Physical Minimum 1
Physical Maximum 16
Report Size 2 # in bits
Report Count 1
Feature (Data, Var, Abs)
So the actual bits have values 0 or 1 and that reflects real values 1 or 16.
We've only seen single-bits so far, so there's low-res and hi-res, but
nothing in between.
The multiplier is available for HID usages "Wheel" and "AC Pan" (horiz wheel).
Microsoft suggests that
> Vendors should ship their devices with smooth scrolling disabled and allow
> Windows to enable it. This ensures that the device works like a regular HID
> device on legacy operating systems that do not support smooth scrolling.
(see the wheel doc linked above)
The mice that we tested so far do reset on unplug.
Device Support looks to be all (?) Microsoft mice but nothing else
Not supported:
- Logitech G500s, G303
- Roccat Kone XTD
- all the cheap Lenovo, HP, Dell, Logitech USB mice that come with a
workstation that I could find don't have it.
- Etekcity something something
- Razer Imperator
Supported:
- Microsoft Comfort Optical Mouse 3000 - yes, physical: 1:4
- Microsoft Sculpt Ergonomic Mouse - yes, physical: 1:12
- Microsoft Surface mouse - yes, physical: 1:4
So again, I think this is really just available on Microsoft mice, but
probably all decent MS mice released over the last decade.
Looking at the hardware itself:
- no noticeable notches in the weel
- low-res: 18 events per 360deg rotation (click angle 20 deg)
- high-res: 72 events per 360deg → matches multiplier of 4
- I can feel the notches during wheel turns
- low-res: 24 events per 360 deg rotation (click angle 15 deg)
- horiz wheel is tilt-based, continuous output value 1
- high-res: 24 events per 360deg with value 12 → matches multiplier of 12
- horiz wheel output rate doubles/triples?, values is 3
- It's a touch strip, not a wheel so no notches
- high-res: events have value 4 instead of 1
a bit strange given that it doesn't actually have notches.
Ok, why is this an issue for the current API? First, because the logitech
multiplier used in Harry's patches looks suspiciously like the Resolution
Multiplier so I think we should assume it's the same thing. Nestor, can you
shed some light on that?
- `REL_WHEEL` is defined as the number of notches, emulated where needed.
- `REL_WHEEL_HI_RES` is the movement of the user's finger in microns.
- `WM_MOUSEWHEEL` (Windows) is is a multiple of 120, defined as "the threshold
for action to be taken and one such action"
https://docs.microsoft.com/en-us/windows/desktop/inputdev/wm-mousewheel
If the multiplier is set to M, this means we need an accumulated value of M
until we can claim there was a wheel click. So after enabling the multiplier
and setting it to the maximum (like Windows):
- M units are 15deg rotation → 1 unit is 2620/M micron (see below). This is
the `REL_WHEEL_HI_RES` value.
- wheel diameter 20mm: 15 deg rotation is 2.62mm, 2620 micron (pi * 20mm /
(360deg/15deg))
- For every M units accumulated, send one `REL_WHEEL` event
The problem here is that we've now hardcoded 20mm/15 deg into the kernel and
we have no way of getting the size of the wheel or the click angle into the
kernel.
In userspace we now have to undo the kernel's calculation. If our click angle
is e.g. 20 degree we have to undo the (lossy) calculation from the kernel and
calculate the correct angle instead. This also means the 15 is a hardcoded
option forever and cannot be changed.
In hid-logitech-hidpp.c, the microns per unit is hardcoded per device.
Harry, did you measure those by hand? We'd need to update the kernel for
every device and there are 10 years worth of devices from MS alone.
The multiplier default is 8 which is in the right ballpark, so I'm pretty
sure this is the same as the Resolution Multiplier, just in HID++ lingo. And
given that the 120 magic factor is what Windows uses in the end, I can't
imagine Logitech rolling their own thing here. Nestor?
And we're already fairly inaccurate with the microns anyway. The MX Anywhere
2S has a click angle of 20 degrees (18 stops) and a 17mm wheel, so a wheel
notch is approximately 2.67mm, one event at multiplier 8 (1/8 of a notch)
would be 334 micron. That's only 80% of the fallback value of 406 in the
kernel. Multiplier 6 gives us 445micron (10% off). I'm assuming multiplier 7
doesn't exist because it's not a factor of 120.
Summary:
Best option may be to simply do what Windows is doing, all the HW manufacturers
have to use that approach after all. Switch `REL_WHEEL_HI_RES` to report in
fractions of 120, with 120 being one notch and divide that by the multiplier
for the actual events. So e.g. the Logitech multiplier 8 would send value 15
for each event in hi-res mode. This can be converted in userspace to
whatever userspace needs (combined with a hwdb there that tells you wheel
size/click angle/...).
Conflicts:
include/uapi/linux/input-event-codes.h -> I kept the new
reserved event in the code, so I had to adapt the revert
slightly
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This reverts commit 1ff2e1a44e02d4bdbb9be67c7d9acc240a67141f.
It turns out the current API is not that compatible with
some Microsoft mice, so better start again from scratch.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Like the previous patch, the goal is to ease to convert nsids from one
netns to another netns.
A new attribute (NETNSA_CURRENT_NSID) is added to the kernel answer when
NETNSA_TARGET_NSID is provided, thus the user can easily convert nsids.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Like it was done for link and address, add the ability to perform get/dump
in another netns by specifying a target nsid attribute.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Use the new boolopt API to add an option which disables learning from
link-local packets. The default is kept as before and learning is
enabled. This is a simple map from a boolopt bit to a bridge private
flag that is tested before learning.
v2: pass NULL for extack via sysfs
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
We have been adding many new bridge options, a big number of which are
boolean but still take up netlink attribute ids and waste space in the skb.
Recently we discussed learning from link-local packets[1] and decided
yet another new boolean option will be needed, thus introducing this API
to save some bridge nl space.
The API supports changing the value of multiple boolean options at once
via the br_boolopt_multi struct which has an optmask (which options to
set, bit per opt) and optval (options' new values). Future boolean
options will only be added to the br_boolopt_id enum and then will have
to be handled in br_boolopt_toggle/get. The API will automatically
add the ability to change and export them via netlink, sysfs can use the
single boolopt function versions to do the same. The behaviour with
failing/succeeding is the same as with normal netlink option changing.
If an option requires mapping to internal kernel flag or needs special
configuration to be enabled then it should be handled in
br_boolopt_toggle. It should also be able to retrieve an option's current
state via br_boolopt_get.
v2: WARN_ON() on unsupported option as that shouldn't be possible and
also will help catch people who add new options without handling
them for both set and get. Pass down extack so if an option desires
it could set it on error and be more user-friendly.
[1] https://www.spinics.net/lists/netdev/msg532698.html
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |_|_|_|_|/
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Add types and macros for packed ring.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-11-26
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Extend BTF to support function call types and improve the BPF
symbol handling with this info for kallsyms and bpftool program
dump to make debugging easier, from Martin and Yonghong.
2) Optimize LPM lookups by making longest_prefix_match() handle
multiple bytes at a time, from Eric.
3) Adds support for loading and attaching flow dissector BPF progs
from bpftool, from Stanislav.
4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.
5) Enable verifier to support narrow context loads with offset > 0
to adapt to LLVM code generation (currently only offset of 0 was
supported). Add test cases as well, from Andrey.
6) Simplify passing device functions for offloaded BPF progs by
adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
Also convert nfp and netdevsim to make use of them, from Quentin.
7) Add support for sock_ops based BPF programs to send events to
the perf ring-buffer through perf_event_output helper, from
Sowmini and Daniel.
8) Add read / write support for skb->tstamp from tc BPF and cg BPF
programs to allow for supporting rate-limiting in EDT qdiscs
like fq from BPF side, from Vlad.
9) Extend libbpf API to support map in map types and add test cases
for it as well to BPF kselftests, from Nikita.
10) Account the maximum packet offset accessed by a BPF program in
the verifier and use it for optimizing nfp JIT, from Jiong.
11) Fix error handling regarding kprobe_events in BPF sample loader,
from Daniel T.
12) Add support for queue and stack map type in bpftool, from David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This could be used to rate limit egress traffic in concert with a qdisc
which supports Earliest Departure Time, such as FQ.
Write access from cg skb progs only with CAP_SYS_ADMIN, since the value
will be used by downstream qdiscs. It might make sense to relax this.
Changes v1 -> v2:
- allow access from cg skb, write only with CAP_SYS_ADMIN
Signed-off-by: Vlad Dumitrescu <vladum@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Kernel test robot (lkp@intel.com) reports a compilation error at
https://www.spinics.net/lists/netdev/msg534913.html
introduced by commit 838e96904ff3 ("bpf: Introduce bpf_func_info").
If CONFIG_BPF is defined and CONFIG_BPF_SYSCALL is not defined,
the following error will appear:
kernel/bpf/core.c:414: undefined reference to `btf_type_by_id'
kernel/bpf/core.c:415: undefined reference to `btf_name_by_offset'
When CONFIG_BPF_SYSCALL is not defined,
let us define stub inline functions for btf_type_by_id()
and btf_name_by_offset() in include/linux/btf.h.
This way, the compilation failure can be avoided.
Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info")
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This patch added interface to load a program with the following
additional information:
. prog_btf_fd
. func_info, func_info_rec_size and func_info_cnt
where func_info will provide function range and type_id
corresponding to each function.
The func_info_rec_size is introduced in the UAPI to specify
struct bpf_func_info size passed from user space. This
intends to make bpf_func_info structure growable in the future.
If the kernel gets a different bpf_func_info size from userspace,
it will try to handle user request with part of bpf_func_info
it can understand. In this patch, kernel can understand
struct bpf_func_info {
__u32 insn_offset;
__u32 type_id;
};
If user passed a bpf func_info record size of 16 bytes, the
kernel can still handle part of records with the above definition.
If verifier agrees with function range provided by the user,
the bpf_prog ksym for each function will use the func name
provided in the type_id, which is supposed to provide better
encoding as it is not limited by 16 bytes program name
limitation and this is better for bpf program which contains
multiple subprograms.
The bpf_prog_info interface is also extended to
return btf_id, func_info, func_info_rec_size and func_info_cnt
to userspace, so userspace can print out the function prototype
for each xlated function. The insn_offset in the returned
func_info corresponds to the insn offset for xlated functions.
With other jit related fields in bpf_prog_info, userspace can also
print out function prototypes for each jited function.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
to support the function debug info.
BTF_KIND_FUNC_PROTO must not have a name (i.e. !t->name_off)
and it is followed by >= 0 'struct bpf_param' objects to
describe the function arguments.
The BTF_KIND_FUNC must have a valid name and it must
refer back to a BTF_KIND_FUNC_PROTO.
The above is the conclusion after the discussion between
Edward Cree, Alexei, Daniel, Yonghong and Martin.
By combining BTF_KIND_FUNC and BTF_LIND_FUNC_PROTO,
a complete function signature can be obtained. It will be
used in the later patches to learn the function signature of
a running bpf program.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
BPF_F_QUERY_EFFECTIVE is in the middle of the flags valid
for BPF_MAP_CREATE. Move it to its own section to reduce confusion.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Add a new flag BPF_F_ZERO_SEED, which forces a hash map
to initialize the seed to zero. This is useful when doing
performance analysis both on individual BPF programs, as
well as the kernel's hash table implementation.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Currently BPF verifier allows narrow loads for a context field only with
offset zero. E.g. if there is a __u32 field then only the following
loads are permitted:
* off=0, size=1 (narrow);
* off=0, size=2 (narrow);
* off=0, size=4 (full).
On the other hand LLVM can generate a load with offset different than
zero that make sense from program logic point of view, but verifier
doesn't accept it.
E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:
#define DST_IP4 0xC0A801FEU /* 192.168.1.254 */
...
if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&
where ctx is struct bpf_sock_addr.
Some versions of LLVM can produce the following byte code for it:
8: 71 12 07 00 00 00 00 00 r2 = *(u8 *)(r1 + 7)
9: 67 02 00 00 18 00 00 00 r2 <<= 24
10: 18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00 r3 = 4261412864 ll
12: 5d 32 07 00 00 00 00 00 if r2 != r3 goto +7 <LBB0_6>
where `*(u8 *)(r1 + 7)` means narrow load for ctx->user_ip4 with size=1
and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
rejected by verifier.
Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok()
what means any is_valid_access implementation, that uses the function,
works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or
sock_addr_is_valid_access() for bpf_sock_addr.
The patch makes such loads supported. Offset can be in [0; size_default)
but has to be multiple of load size. E.g. for __u32 field the following
loads are supported now:
* off=0, size=1 (narrow);
* off=1, size=1 (narrow);
* off=2, size=1 (narrow);
* off=3, size=1 (narrow);
* off=0, size=2 (narrow);
* off=2, size=2 (narrow);
* off=0, size=4 (full).
Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The kernel functions to prepare verifier and translate for offloaded
program retrieve "offload" from "prog", and "netdev" from "offload".
Then both "prog" and "netdev" are passed to the callbacks.
Simplify this by letting the drivers retrieve the net device themselves
from the offload object attached to prog - if they need it at all. There
is currently no need to pass the netdev as an argument to those
functions.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Function bpf_prog_offload_verifier_prep(), called from the kernel BPF
verifier to run a driver-specific callback for preparing for the
verification step for offloaded programs, takes a pointer to a struct
bpf_verifier_env object. However, no driver callback needs the whole
structure at this time: the two drivers supporting this, nfp and
netdevsim, only need a pointer to the struct bpf_prog instance held by
env.
Update the callback accordingly, on kernel side and in these two
drivers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to program destruction to the struct and remove the
subcommand that was used to call them through the NDO.
Remove function __bpf_offload_ndo(), which is no longer used.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to code translation to the struct and remove the
subcommand that was used to call them through the NDO.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
In a way similar to the change previously brought to the verify_insn
hook and to the finalize callback, switch to the newly added ops in
struct bpf_prog_offload for calling the functions used to prepare driver
verifiers.
Since the dev_ops pointer in struct bpf_prog_offload is no longer used
by any callback, we can now remove it from struct bpf_prog_offload.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|