Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ourlogs): Allow log ingestion behind a flag #4448

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

k-fish
Copy link
Member

@k-fish k-fish commented Jan 15, 2025

This adds log ingestion (currently only working for the OTel log format) behind a feature flag 'organizations:ourlogs-ingestion'.

This PR aims to be the minimum possible to support local and test-org ingestion before we move to dogfooding.

Screenshot 2025-01-15 at 1 26 49 PM

Other notes:

  • We need to add two DataCategorys because we need to track quantity (for current discarded breadcrumb client outcome tracking) and also bytes for total log bytes ingested, which is one of the quota recommendations.
  • Eventually we will convert Breadcrumbs into logs as well, very similar to span extraction for spans on the event. How exactly that will work is still being discussed with product and sdk folks.
  • The name ourlogs is an internal name to disambiguate between 'our log product' logs (corny, I know) and internally created logs. User facing strings will be set to 'Log' to avoid exposing implementation details.
  • Depends on flag(ourlogs): Add product feature flag (frontend/backend) sentry#81930 for the ingest feature flag.

This adds log ingestion (currently only working for the OTel log format) behind a feature flag 'organizations:ourlogs-ingestion'.

This PR aims to be the minimum possible to support local and test-org ingestion before we move to dogfooding.

Other notes:
- We need to add two DataCategories because we need to track quantity (for current discarded breadcrumb ClientOutcome tracking) and also bytes for total log bytes ingested, which is one of the quota recommendations.
- Eventually we will convert Breadcrumbs into logs as well, very similar to span extraction for spans on the event. How exactly that will work is still being discussed with product and sdk folks.
- The name 'ourlogs' is an internal name to disambiguate between 'our log product' logs and internally created logs. User facing strings will be set to 'Log' to avoid exposing implementation details.
@k-fish k-fish requested a review from a team as a code owner January 15, 2025 18:58
@k-fish k-fish force-pushed the feat/ourlogs/ingest-log-items branch from dc3eba0 to 3f6f6f9 Compare January 15, 2025 19:21
@jjbayer jjbayer self-assigned this Jan 16, 2025
relay-base-schema/src/data_category.rs Outdated Show resolved Hide resolved
relay-base-schema/src/data_category.rs Outdated Show resolved Hide resolved
relay-event-schema/src/protocol/ourlog.rs Outdated Show resolved Hide resolved
pub struct OurLog {
/// Time when the event occurred.
#[metastructure(required = true, trim = false)]
pub timestamp_nanos: Annotated<u64>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other data types accept both unix timestamps and formatted date strings, although they do not have nanosecond precision:

/// Timestamp when the span was ended.
#[metastructure(required = true, trim = false)]
pub timestamp: Annotated<Timestamp>,

Are nanos required for logs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTel defines them as nanos, the consumers will consume nanos, and they are stored as nanos, we may need breadcrumbs to switch from floats to nanos but otherwise I think we are leaning towards keeping it the same format throughout instead of having slightly different intermediate formats.

relay-event-schema/src/protocol/ourlog.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor/ourlog.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor/ourlog.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor.rs Show resolved Hide resolved
relay-server/src/services/store.rs Outdated Show resolved Hide resolved
relay-server/src/utils/rate_limits.rs Outdated Show resolved Hide resolved
@k-fish k-fish changed the base branch from master to feat/ourlogs/add-data-categories January 16, 2025 17:32
@k-fish k-fish changed the base branch from feat/ourlogs/add-data-categories to master January 17, 2025 03:06
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! It would be nice to have at least one integration test, see for example

def test_span_ingestion(
.

relay-server/src/services/processor/ourlog.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor/ourlog.rs Outdated Show resolved Hide resolved
Comment on lines +969 to +987
// We need to track the count and bytes separately for possible rate limits and quotas on both counts and bytes.
self.outcome_aggregator.send(TrackOutcome {
category: DataCategory::LogItem,
event_id: None,
outcome: Outcome::Accepted,
quantity: 1,
remote_addr: None,
scoping,
timestamp: received_at,
});
self.outcome_aggregator.send(TrackOutcome {
category: DataCategory::LogByte,
event_id: None,
outcome: Outcome::Accepted,
quantity: payload_len as u32,
remote_addr: None,
scoping,
timestamp: received_at,
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of producing "accepted" outcomes in Relay. Ideally we would produce them as close to storage as possible, e.g. after storing the data in Snuba, so that we do not overcount items when data gets lost between Relay and storage. For spans, we moved the "accepted" outcomes to Relay because there was simply no consumer in sentry that could do it, and there was no precedence for creating outcomes in Snuba itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has that changed now? Is there prior art to doing it in a snuba consumer etc.? I was actually thinking about that while adding them here because of the possibility of rejecting the item later on.

relay-server/src/services/store.rs Outdated Show resolved Hide resolved
relay-ourlogs/src/ourlog.rs Outdated Show resolved Hide resolved
relay-server/src/services/store.rs Outdated Show resolved Hide resolved
relay-event-schema/src/protocol/ourlog.rs Outdated Show resolved Hide resolved
relay-event-schema/src/protocol/ourlog.rs Show resolved Hide resolved
k-fish added a commit that referenced this pull request Jan 17, 2025
### Summary
This is pulling out data categories from #4448 as their own PR.

---------

Co-authored-by: Joris Bayer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants