Proposal for perf event arrays #4122
base: main
Conversation
docs/PerfEventArray.md (Outdated)
```
*
* The *flags* are used to indicate the index in *map* for which
* the value must be put, masked with **BPF_F_INDEX_MASK**.
* Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU**
```
Curious what happens if `BPF_F_CURRENT_CPU` is not passed.
Linux requires you to pass `BPF_F_CURRENT_CPU` or to specify the current CPU index manually; writing to any other CPU is not currently supported and returns an error. I've updated the notes in a couple of places to match this.
Restricting writes to the current CPU will make it easier to reduce or eliminate locking (for better perf), so I think we can start with that restriction.
https://docs.ebpf.io/linux/helper-function/bpf_perf_event_output/
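For reference, a minimal bpf-side sketch of that restriction, using Linux-style headers; the map name, record layout, and attach point are illustrative, not part of the proposal:

```c
// Minimal sketch, assuming Linux-style headers; the map, record layout,
// and attach point are hypothetical.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct
{
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");

struct event
{
    __u32 pid; // hypothetical record layout
};

SEC("tracepoint") // illustrative attach point
int emit_event(void* ctx)
{
    struct event e = {.pid = 42};

    // BPF_F_CURRENT_CPU writes to the ring of the CPU the program runs on;
    // per the discussion above, any other CPU index returns an error.
    return bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e));
}
```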
docs/PerfEventArray.md (Outdated)
```
* helper.
*
* On user space, a program willing to read the values needs to
* call **perf_event_open**\ () on the perf event (either for
```
Where is `perf_event_open` defined?
That was from the Linux libbpf notes, and I've rewritten that section. On Linux, perf events exist independently of bpf, whereas on Windows the perf event array exists only as a bpf map, so `perf_event_open` isn't needed.
```
The plan is to implement perf buffers using the existing per-cpu and ring buffer maps support in ebpf-for-windows.

To match linux behaviour, by default the callback will only be called inside calls to `perf_buffer__poll()`.
```
Consider moving this into the details in line 38, section 1, and/or clearly mentioning what the default behavior is on line 39.
Added notes to clarify poll behavior.
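For context, a minimal sketch of the default consumer loop this implies, assuming the libbpf-style perf buffer API; the callback, page count, and timeout are illustrative rather than the proposal's final interface:

```c
// Minimal sketch, assuming the libbpf perf buffer API; names and sizes
// are illustrative.
#include <bpf/libbpf.h>
#include <stdio.h>

// Invoked only from inside perf_buffer__poll() under the default behaviour.
static void
handle_event(void* ctx, int cpu, void* data, __u32 size)
{
    printf("cpu %d: %u bytes\n", cpu, size);
}

int
consume(int map_fd)
{
    // 8 pages of ring buffer per CPU (illustrative).
    struct perf_buffer* pb = perf_buffer__new(map_fd, 8, handle_event, NULL, NULL, NULL);
    if (!pb)
        return -1;

    // Callbacks fire only during poll, matching the default described above.
    while (perf_buffer__poll(pb, 100 /* timeout_ms */) >= 0)
        ;

    perf_buffer__free(pb);
    return 0;
}
```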
docs/PerfEventArray.md (Outdated)
```
1. Implement a new map type `BPF_MAP_TYPE_PERF_EVENT_ARRAY`.
   1. Support linux-compatible default behaviour (but supports only a subset of the perf event array features)
   2. Initially only support bpf programs as producers and user space as consumer
```
To confirm - there's nothing in this proposal that prohibits or makes it difficult to expand km/um support in the future, is there?
Correct. It's a subset of the functionality, but I don't see any problems with extending this to support other Linux perf features.
docs/PerfEventArray.md (Outdated)
```
2. Ring buffer maps support reserve and submit to separately allocate and then fill in the record
3. For specific program types with a payload, perf event arrays can copy payload from the bpf context by
   putting the length to copy in the `BPF_F_CTXLEN_MASK` field of the flags
   - `perf_event_output` takes the bpf context as an argument and the helper implementation copies the payload
```
Please let me know if I missed something, but I didn't see any section which outlines how this is implemented.
A couple of things come to mind:
- Can you clarify what the 'payload' is here? Is it basically a memcpy of the ctx structure itself, or a memcpy of some specific portion of the ctx? (For example, the XDP ctx structure is more than just the packet payload. Is the 'payload' used here the ctx structure, or the underlying packet?)
- Is this supposed to be implemented in the extension, or is this handled by the ebpf framework?
- If this is handled by the ebpf framework, can you confirm that the framework has enough information to extract out this buffer?
Extensions pass an `ebpf_context_descriptor_t` structure to the ebpf platform, which in turn holds the offsets into the context of the data start/end pointers. If the 'payload' in this context is always extracted from this data, then I think we would have enough information to implement this in the platform itself?
If this isn't the case, then we'd need implementations in the extensions themselves. We could consider a default implementation where the ebpf platform handles this based on the `ebpf_context_descriptor_t`, and allow an extension to register a flag indicating that it needs to implement the additional payload piece of this helper.
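A hedged sketch of what that platform-side extraction could look like under this assumption; the offset semantics (negative meaning "not present") are an assumption for illustration, not the actual ebpf-for-windows implementation:

```c
// Hedged sketch only: locating the payload from the data start/end offsets
// that the extension's ebpf_context_descriptor_t provides. Negative offsets
// are assumed to mean the program type has no data pointer.
#include <stdint.h>

static int
get_ctx_payload(
    int data_offset, // offset of the data-start pointer within the context
    int end_offset,  // offset of the data-end pointer within the context
    const void* ctx,
    const uint8_t** begin,
    const uint8_t** end)
{
    if (data_offset < 0 || end_offset < 0)
        return -1; // no data pointer; the ctx-len copy can't be supported

    // The context stores pointers to the payload at these offsets.
    *begin = *(const uint8_t* const*)((const uint8_t*)ctx + data_offset);
    *end = *(const uint8_t* const*)((const uint8_t*)ctx + end_offset);
    return 0;
}
```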
It copies whatever is at the context data pointer, so we can implement it in the core using the context pointer passed by the program plus the data pointer offset in `ebpf_context_descriptor_t` from the extension.
I added some notes clarifying this.
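For illustration, the bpf-side call that triggers this copy on Linux looks roughly like the sketch below; XDP and the sizes are assumptions for the example, and the copy length sits in the flag bits covered by `BPF_F_CTXLEN_MASK`:

```c
// Illustrative sketch: append packet bytes from the context's data pointer
// after a fixed-size record. The map, record, and lengths are hypothetical.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct
{
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");

struct metadata
{
    __u32 pkt_len; // hypothetical fixed-size record
};

SEC("xdp") // assumes a program type whose context exposes a data pointer
int capture(struct xdp_md* ctx)
{
    struct metadata meta = {.pkt_len = ctx->data_end - ctx->data};
    __u64 copy_len = 64; // bytes to append; must not exceed the available data

    // The length is shifted into the BPF_F_CTXLEN_MASK bits of the flags;
    // the helper then copies that many bytes from the context data pointer.
    __u64 flags = BPF_F_CURRENT_CPU | (copy_len << 32);
    bpf_perf_event_output(ctx, &events, flags, &meta, sizeof(meta));
    return XDP_PASS;
}
```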
docs/PerfEventArray.md (Outdated)
```
- We are just looking at transferring variable sized records between kernel and user space (not other linux perf features)
1. Perf event arrays are per-cpu, whereas ring buffers are a single shared buffer
2. Ring buffer maps support reserve and submit to separately allocate and then fill in the record
3. For specific program types with a payload, perf event arrays can copy payload from the bpf context by
```
Which specific program types that we currently support on Windows does this apply to?
Added a note on this: the initial plan is to support `perf_event_output` for any program type, and to support the CTXLEN field in the flags for any program type whose context includes a data pointer (rather than supporting only specific program types).
docs/PerfEventArray.md (Outdated)
```
# Proposal

The proposed behaviour matches linux (except the auto callback feature when set), but currently only supports user-space consumers and bpf-program producers with a subset of the features.
```
What features are not part of the proposal?
Linux perf is a whole kernel subsystem for performance monitoring and tracing which includes a bpf interface, whereas the current proposal covers just the kernel-to-user-space ring buffer functionality of the perf event array map.
Attaching bpf programs to perf events, perf counters, sending events from user space to the kernel, and non-bpf perf events (e.g. hardware events, process tracing events) are the main features not currently planned. It should be possible to add them in the future, though, by adding the additional features and API functions.
## Description

Adds proposal for perf event array maps.

## Testing

Do any existing tests cover this change? Are new tests needed?

## Documentation

Is there any documentation impact for this change?

## Installation

Is there any installer impact for this change?