diff options
author | Kumar Kartikeya Dwivedi <memxor@gmail.com> | 2022-11-14 20:15:25 +0100 |
---|---|---|
committer | Alexei Starovoitov <ast@kernel.org> | 2022-11-15 06:52:45 +0100 |
commit | f0c5941ff5b255413d31425bb327c2aec3625673 (patch) | |
tree | 8fb73e169efa4876bb8f8bc0dceb51f1156ffa1a /kernel/bpf/syscall.c | |
parent | bpf: Fix copy_map_value, zero_map_value (diff) | |
download | linux-f0c5941ff5b255413d31425bb327c2aec3625673.tar.xz linux-f0c5941ff5b255413d31425bb327c2aec3625673.zip |
bpf: Support bpf_list_head in map values
Add the support on the map side to parse, recognize, verify, and build
metadata table for a new special field of the type struct bpf_list_head.
To parameterize the bpf_list_head for a certain value type and the
list_node member it will accept in that value type, we use BTF
declaration tags.
The definition of bpf_list_head in a map value will be done as follows:
struct foo {
struct bpf_list_node node;
int data;
};
struct map_value {
struct bpf_list_head head __contains(foo, node);
};
Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.
The 'contains' annotation is a BTF declaration tag composed of four
parts, "contains:name:node" where the name is then used to look up the
type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during
the lookup. The node defines name of the member in this type that has
the type struct bpf_list_node, which is actually used for linking into
the linked list. For now, 'kind' part is hardcoded as struct.
This allows building intrusive linked lists in BPF, using container_of
to obtain pointer to entry, while being completely type safe from the
perspective of the verifier. The verifier knows exactly the type of the
nodes, and knows that list helpers return that type at some fixed offset
where the bpf_list_node member used for this list exists. The verifier
also uses this information to disallow adding types that are not
accepted by a certain list.
For now, no elements can be added to such lists. Support for that is
coming in future patches, hence draining and freeing items is done with
a TODO that will be resolved in a future patch.
Note that the bpf_list_head_free function moves the list out to a local
variable under the lock and releases it, doing the actual draining of
the list items outside the lock. While this helps with not holding the
lock for too long pessimizing other concurrent list operations, it is
also necessary for deadlock prevention: unless every function called in
the critical section would be notrace, a fentry/fexit program could
attach and call bpf_map_update_elem again on the map, leading to the
same lock being acquired if the key matches and lead to a deadlock.
While this requires some special effort on part of the BPF programmer to
trigger and is highly unlikely to occur in practice, it is always better
if we can avoid such a condition.
While notrace would prevent this, doing the draining outside the lock
has advantages of its own, hence it is used to also fix the deadlock
related problem.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'kernel/bpf/syscall.c')
-rw-r--r-- | kernel/bpf/syscall.c | 22 |
1 files changed, 21 insertions, 1 deletions
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 85532d301124..fdbae52f463f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -536,6 +536,9 @@ void btf_record_free(struct btf_record *rec) module_put(rec->fields[i].kptr.module); btf_put(rec->fields[i].kptr.btf); break; + case BPF_LIST_HEAD: + /* Nothing to release for bpf_list_head */ + break; default: WARN_ON_ONCE(1); continue; @@ -578,6 +581,9 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) goto free; } break; + case BPF_LIST_HEAD: + /* Nothing to acquire for bpf_list_head */ + break; default: ret = -EFAULT; WARN_ON_ONCE(1); @@ -637,6 +643,11 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) case BPF_KPTR_REF: field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0)); break; + case BPF_LIST_HEAD: + if (WARN_ON_ONCE(rec->spin_lock_off < 0)) + continue; + bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off); + break; default: WARN_ON_ONCE(1); continue; @@ -965,7 +976,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, if (!value_type || value_size != map->value_size) return -EINVAL; - map->record = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR, + map->record = btf_parse_fields(btf, value_type, + BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { int i; @@ -1012,6 +1024,14 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, goto free_map_tab; } break; + case BPF_LIST_HEAD: + if (map->map_type != BPF_MAP_TYPE_HASH && + map->map_type != BPF_MAP_TYPE_LRU_HASH && + map->map_type != BPF_MAP_TYPE_ARRAY) { + ret = -EOPNOTSUPP; + goto free_map_tab; + } + break; default: /* Fail if map_type checks are missing for a field type */ ret = -EOPNOTSUPP; |