Proposal for perf event arrays #4122
base: main
Conversation
docs/PerfEventArray.md (Outdated)
```
*
* The *flags* are used to indicate the index in *map* for which
* the value must be put, masked with **BPF_F_INDEX_MASK**.
* Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU**
```
Curious what happens if `BPF_F_CURRENT_CPU` is not passed.
Linux requires you to pass `BPF_F_CURRENT_CPU` or to specify the current CPU index manually; writing to any other CPU is not currently supported and returns an error. I've updated the notes in a couple of places to match this.
Restricting writes to the current CPU will make it easier to reduce or eliminate locking (for better perf), so I think we can start with that restriction.
https://docs.ebpf.io/linux/helper-function/bpf_perf_event_output/
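For reference, a minimal bpf-side sketch of that restriction, using Linux-style headers; the map name, record layout, and attach point are illustrative, not part of the proposal:

```c
// Minimal sketch, assuming Linux-style headers; the map, record layout,
// and attach point are hypothetical.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct
{
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");

struct event
{
    __u32 pid; // hypothetical record layout
};

SEC("tracepoint") // illustrative attach point
int emit_event(void* ctx)
{
    struct event e = {.pid = 42};

    // BPF_F_CURRENT_CPU writes to the ring of the CPU the program runs on;
    // per the discussion above, any other CPU index returns an error.
    return bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e));
}
```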
docs/PerfEventArray.md (Outdated)
```
* helper.
*
* On user space, a program willing to read the values needs to
* call **perf_event_open**\ () on the perf event (either for
```
Where is `perf_event_open` defined?
That was from the Linux libbpf notes, and I've rewritten that section. On Linux, perf events exist independently of bpf, whereas on Windows the perf event array exists only as a bpf map, so `perf_event_open` isn't needed.
```
The plan is to implement perf buffers using the existing per-cpu and ring buffer maps support in ebpf-for-windows.

To match linux behaviour, by default the callback will only be called inside calls to `perf_buffer__poll()`.
```
Consider moving this into the details in line 38, section 1, and/or clearly mentioning what the default behavior is on line 39.
Added notes to clarify poll behavior.
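For context, a minimal sketch of the default consumer loop this implies, assuming the libbpf-style perf buffer API; the callback, page count, and timeout are illustrative rather than the proposal's final interface:

```c
// Minimal sketch, assuming the libbpf perf buffer API; names and sizes
// are illustrative.
#include <bpf/libbpf.h>
#include <stdio.h>

// Invoked only from inside perf_buffer__poll() under the default behaviour.
static void
handle_event(void* ctx, int cpu, void* data, __u32 size)
{
    printf("cpu %d: %u bytes\n", cpu, size);
}

int
consume(int map_fd)
{
    // 8 pages of ring buffer per CPU (illustrative).
    struct perf_buffer* pb = perf_buffer__new(map_fd, 8, handle_event, NULL, NULL, NULL);
    if (!pb)
        return -1;

    // Callbacks fire only during poll, matching the default described above.
    while (perf_buffer__poll(pb, 100 /* timeout_ms */) >= 0)
        ;

    perf_buffer__free(pb);
    return 0;
}
```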
docs/PerfEventArray.md (Outdated)
```
1. Implement a new map type `BPF_MAP_TYPE_PERF_EVENT_ARRAY`.
   1. Support linux-compatible default behaviour (but supports only a subset of the perf event array features)
   2. Initially only support bpf programs as producers and user space as consumer
```
To confirm - there's nothing in this proposal that prohibits or makes it difficult to expand km/um support in the future, is there?
Correct. It's a subset of the functionality, but I don't see any problems with extending this to support other Linux perf features.
docs/PerfEventArray.md (Outdated)
```
2. Ring buffer maps support reserve and submit to separately allocate and then fill in the record
3. For specific program types with a payload, perf event arrays can copy payload from the bpf context by
   putting the length to copy in the `BPF_F_CTXLEN_MASK` field of the flags
   - `perf_event_output` takes the bpf context as an argument and the helper implementation copies the payload
```
Please let me know if I missed something, but I didn't see any section which outlines how this is implemented.
A couple of things come to mind:
- Can you clarify what the 'payload' is here? Is it basically a memcpy of the ctx structure itself, or a memcpy of some specific portion of the ctx? (For example, the XDP ctx structure is more than just the packet payload. Is the 'payload' used here the ctx structure, or the underlying packet?)
- Is this supposed to be implemented in the extension, or is this handled by the ebpf framework?
- If this is handled by the ebpf framework, can you confirm that the framework has enough information to extract out this buffer?
Extensions pass an `ebpf_context_descriptor_t` structure to the ebpf platform, which in turn holds the offsets into the context of the data start/end pointers. If the 'payload' in this context is always extracted from this data, then I think we would have enough information to implement this in the platform itself?
If this isn't the case, then we'd need implementations in the extensions themselves. We could consider a default implementation where the ebpf platform handles this based on the `ebpf_context_descriptor_t`, and allow an extension to register a flag indicating that it needs to implement the additional payload piece of this helper.
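A hedged sketch of what that platform-side extraction could look like under this assumption; the offset semantics (negative meaning "not present") are an assumption for illustration, not the actual ebpf-for-windows implementation:

```c
// Hedged sketch only: locating the payload from the data start/end offsets
// that the extension's ebpf_context_descriptor_t provides. Negative offsets
// are assumed to mean the program type has no data pointer.
#include <stdint.h>

static int
get_ctx_payload(
    int data_offset, // offset of the data-start pointer within the context
    int end_offset,  // offset of the data-end pointer within the context
    const void* ctx,
    const uint8_t** begin,
    const uint8_t** end)
{
    if (data_offset < 0 || end_offset < 0)
        return -1; // no data pointer; the ctx-len copy can't be supported

    // The context stores pointers to the payload at these offsets.
    *begin = *(const uint8_t* const*)((const uint8_t*)ctx + data_offset);
    *end = *(const uint8_t* const*)((const uint8_t*)ctx + end_offset);
    return 0;
}
```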
It copies whatever is at the context data pointer, so we can implement it in the core using the context pointer passed by the program plus the data pointer offset in `ebpf_context_descriptor_t` from the extension.
I added some notes clarifying this.
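For illustration, the bpf-side call that triggers this copy on Linux looks roughly like the sketch below; XDP and the sizes are assumptions for the example, and the copy length sits in the flag bits covered by `BPF_F_CTXLEN_MASK`:

```c
// Illustrative sketch: append packet bytes from the context's data pointer
// after a fixed-size record. The map, record, and lengths are hypothetical.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct
{
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");

struct metadata
{
    __u32 pkt_len; // hypothetical fixed-size record
};

SEC("xdp") // assumes a program type whose context exposes a data pointer
int capture(struct xdp_md* ctx)
{
    struct metadata meta = {.pkt_len = ctx->data_end - ctx->data};
    __u64 copy_len = 64; // bytes to append; must not exceed the available data

    // The length is shifted into the BPF_F_CTXLEN_MASK bits of the flags;
    // the helper then copies that many bytes from the context data pointer.
    __u64 flags = BPF_F_CURRENT_CPU | (copy_len << 32);
    bpf_perf_event_output(ctx, &events, flags, &meta, sizeof(meta));
    return XDP_PASS;
}
```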
docs/PerfEventArray.md (Outdated)
```
- We are just looking at transferring variable sized records between kernel and user space (not other linux perf features)
1. Perf event arrays are per-cpu, whereas ring buffers are a single shared buffer
2. Ring buffer maps support reserve and submit to separately allocate and then fill in the record
3. For specific program types with a payload, perf event arrays can copy payload from the bpf context by
```
Which specific program types that we currently support on Windows does this apply to?
Added a note on this: the initial plan is to support `perf_event_output` for any program type, and to support the CTXLEN field in the flags for any program type whose context includes a data pointer (rather than supporting only specific program types).
docs/PerfEventArray.md (Outdated)
```
# Proposal

The proposed behaviour matches linux (except the auto callback feature when set), but currently only supports user-space consumers and bpf-program producers with a subset of the features.
```
What features are not part of the proposal?
Linux perf is a whole kernel subsystem for performance monitoring and tracing which includes a bpf interface, whereas the current proposal covers just the kernel-to-user-space ring buffer functionality of the perf event array map.
Attaching bpf programs to perf events, perf counters, sending events from user space to the kernel, and non-bpf perf events (e.g. hardware events, process tracing events) are the main features not currently planned. It should be possible to add them in the future, though, by adding the additional features and API functions.
## Description

Adds proposal for perf event array maps.

## Testing

Do any existing tests cover this change? Are new tests needed?

## Documentation

Is there any documentation impact for this change?

## Installation

Is there any installer impact for this change?