libbpf: perfbuf: Add API to get the ring buffer

Add support for writing a custom event reader, by exposing the ring buffer. With the new API perf_buffer__buffer() you will get access to the raw mmaped()'ed per-cpu underlying memory of the ring buffer. This region contains both the perf buffer data and header (struct perf_event_mmap_page), which manages the ring buffer state (head/tail positions, when accessing the head/tail position it's important to take into consideration SMP). With this type of low level access one can implement different types of consumers here are few simple examples where this API helps with: 1. perf_event_read_simple is allocating using malloc, perhaps you want to handle the wrap-around in some other way. 2. Since perf buf is per-cpu then the order of the events is not guarnteed, for example: Given 3 events where each event has a timestamp t0 < t1 < t2, and the events are spread on more than 1 CPU, then we can end up with the following state in the ring buf: CPU[0] => [t0, t2] CPU[1] => [t1] When you consume the events from CPU[0], you could know there is a t1 missing, (assuming there are no drops, and your event data contains a sequential index). So now one can simply do the following, for CPU[0], you can store the address of t0 and t2 in an array (without moving the tail, so there data is not perished) then move on the CPU[1] and set the address of t1 in the same array. So you end up with something like: void **arr[] = [&t0, &t1, &t2], now you can consume it orderely and move the tails as you process in order. 3. Assuming there are multiple CPUs and we want to start draining the messages from them, then we can "pick" with which one to start with according to the remaining free space in the ring buffer. Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220715181122.149224-1-arilou@gmail.com
author: Jon Doron <jond@wiz.io> 2022-07-15 20:11:22 +0200
committer: Andrii Nakryiko <andrii@kernel.org> 2022-07-15 21:53:22 +0200
commit: 9ff5efdeb0897547cffc275e88d2a55462d9b08e (patch)
tree: 88ef6c3780e7fbcc5258cb0b4b8585e1bdffe0a9 /tools/lib
parent: Merge branch 'Use lightweigt version of bpftool' (diff)
download: linux-9ff5efdeb0897547cffc275e88d2a55462d9b08e.tar.xz
linux-9ff5efdeb0897547cffc275e88d2a55462d9b08e.zip
3 files changed, 33 insertions, 0 deletions
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 68da1aca406c..77ae83308199 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -11734,6 +11734,22 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
 	return cpu_buf->fd;
 }
 
+int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf, size_t *buf_size)
+{
+	struct perf_cpu_buf *cpu_buf;
+
+	if (buf_idx >= pb->cpu_cnt)
+		return libbpf_err(-EINVAL);
+
+	cpu_buf = pb->cpu_bufs[buf_idx];
+	if (!cpu_buf)
+		return libbpf_err(-ENOENT);
+
+	*buf = cpu_buf->base;
+	*buf_size = pb->mmap_size;
+	return 0;
+}
+
 /*
  * Consume data from perf ring buffer corresponding to slot *buf_idx* in
  * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e4d5353f757b..c51d6e6b3066 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1053,6 +1053,22 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
 LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
 LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
 LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
+/**
+ * @brief **perf_buffer__buffer()** returns the per-cpu raw mmap()'ed underlying
+ * memory region of the ring buffer.
+ * This ring buffer can be used to implement a custom events consumer.
+ * The ring buffer starts with the *struct perf_event_mmap_page*, which
+ * holds the ring buffer managment fields, when accessing the header
+ * structure it's important to be SMP aware.
+ * You can refer to *perf_event_read_simple* for a simple example.
+ * @param pb the perf buffer structure
+ * @param buf_idx the buffer index to retreive
+ * @param buf (out) gets the base pointer of the mmap()'ed memory
+ * @param buf_size (out) gets the size of the mmap()'ed region
+ * @return 0 on success, negative error code for failure
+ */
+LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
+				   size_t *buf_size);
 
 struct bpf_prog_linfo;
 struct bpf_prog_info;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 94b589ecfeaa..4c4c40b8f935 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -362,4 +362,5 @@ LIBBPF_1.0.0 {
 		libbpf_bpf_link_type_str;
 		libbpf_bpf_map_type_str;
 		libbpf_bpf_prog_type_str;
+		perf_buffer__buffer;
 };
author	Jon Doron <jond@wiz.io>	2022-07-15 20:11:22 +0200
committer	Andrii Nakryiko <andrii@kernel.org>	2022-07-15 21:53:22 +0200
commit	9ff5efdeb0897547cffc275e88d2a55462d9b08e (patch)
tree	88ef6c3780e7fbcc5258cb0b4b8585e1bdffe0a9 /tools/lib
parent	Merge branch 'Use lightweigt version of bpftool' (diff)
download	linux-9ff5efdeb0897547cffc275e88d2a55462d9b08e.tar.xz linux-9ff5efdeb0897547cffc275e88d2a55462d9b08e.zip