Add a command to get response body #856

Open · wants to merge 2 commits into main

Conversation

@OrKoN (Contributor) commented Jan 13, 2025

Closes #747


Preview | Diff

@OrKoN force-pushed the orkon/get-response-body branch 2 times, most recently from 1ea00c4 to 9129bae (January 13, 2025 14:37)
@jgraham (Member) commented Jan 13, 2025

CC @juliandescottes who was also going to look at this. Very briefly, some high level things I think we should try to look at:

  • What's the lifecycle? How long should bodies be stored? I think having the lifecycle be implementation-defined is bad because we'll inevitably get interop problems where one browser keeps bodies for longer than another.
  • How does this work with request interception? We'd eventually like to be able to rewrite bodies as part of the interception API. As with network events, it would be good to have a consistent model here rather than two unrelated sets of commands.

A question is whether requiring an interception is acceptable. If it is, one could add a returnBody: "none" / "string" / "handle" parameter to network.continueRequest or network.continueResponse. If you provide that, you get an extra network.bodyReady event: in the case of a string it occurs when the full body is known; in the case of a handle it is immediate (or maybe it becomes a property on some existing event). For strings you get read-once semantics, i.e. the implementation is expected to cache the body until it's read or the page is navigated, but to expire it after a read.
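As a rough illustration only (nothing below is in this PR; the returnBody field, the network.bodyReady event, and its parameters are hypothetical names, while network.Request and network.BytesValue are existing spec types), the shape could be something like:

network.ContinueResponseParameters = {
  request: network.Request,
  ; ...existing fields...
  ? returnBody: "none" / "string" / "handle" .default "none",
}

network.BodyReadyParameters = {
  request: network.Request,
  ; present only when returnBody was "string"; read-once semantics
  ? body: network.BytesValue,
}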

@OrKoN (Contributor, Author) commented Jan 13, 2025

requiring an interception is acceptable

Puppeteer allows getting bodies without interception so I do not think requiring an interception would be acceptable.

What's the lifecycle? How long should bodies be stored?

Chrome allows configuring limits (https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-enable), so we could have something similar too. Clearing on a new-document navigation would probably make sense, but I will need to do some testing.
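For reference, the CDP call looks roughly like this (the buffer-size parameters are marked experimental in the CDP documentation; the values here are arbitrary examples):

{
  "method": "Network.enable",
  "params": {
    "maxTotalBufferSize": 10000000,
    "maxResourceBufferSize": 5000000,
    "maxPostDataSize": 65536
  }
}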

@OrKoN (Contributor, Author) commented Jan 13, 2025

How does this work with request interception?

I think we will be able to use the same command, but if the request is paused at the responseStarted phase we have different steps to fetch the body.

@OrKoN (Contributor, Author) commented Jan 13, 2025

I think having the lifecycle be implementation-defined is bad because we'll inevitably get interop problems where one browser keeps bodies for longer than another.

I think a good starting point would be an implementation-defined lifecycle, resolving the arising interop issues later as long as they do not require structural changes, as there could be many edge cases and differences on the network stack that might not be easily unifiable (unless we just unconditionally store bodies even if they are never requested, which is not very efficient). The following questions need to be considered for the lifecycle:

  • is the worker/worklet that served the response still alive to provide blob data?
  • is the process hosting the response data still alive?
  • was the response body evicted for other reasons (memory limits)?

@OrKoN requested a review from sadym-chromium (January 13, 2025 16:33)
@jgraham (Member) commented Jan 14, 2025

there could be many edge cases and differences on the network stack that might not be easily unifiable

These are almost all resolvable, depending on the model. For example, instead of having a "getResponseBody" command, we could adopt a model like network request interception, where you can subscribe to get the body for some/all responses, and instead have a "responseBodyReady" event for matching requests. Indeed, you could probably reuse the existing infrastructure rather directly and make it another network interception phase (although we'd need to work out how to specify which kind of body you want in the case where we support both strings and stream handles). Puppeteer in that case would need to subscribe to all bodies and manage the lifecycle itself, which isn't ideal in the short term, but you could probably move to a more efficient client side API that made storing the bodies opt-in.

Anyway, I'm not specifically advocating that solution, just saying that there clearly are options that avoid making the lifecycle totally implementation defined, and I think we should explore those because the other option is that we offer an unreliable user experience and/or end up needing to do significant platform work to align on the most permissive implementation.

@OrKoN (Contributor, Author) commented Jan 14, 2025

Puppeteer in that case would need to subscribe to all bodies and manage the lifecycle itself, which isn't ideal in the short term, but you could probably move to a more efficient client side API that made storing the bodies opt-in.

I do not think opting in to all bodies would work for Puppeteer or Playwright. Interception has overhead, and in many cases, like HAR generation (firefox-devtools/bidi-har-export#22), interception is not on. Although an ability to opt in to all bodies would work for HAR, I think an ability to lazily fetch the body after the fact without interception is an important feature.

@juliandescottes (Contributor) commented

A question is whether requiring an interception is acceptable.

As @OrKoN mentioned, for generating HAR files during performance tests, requiring interception would probably impact the network performance (unless we could define some interception rules that are automatically applied without having to send a command to resume the request), so it sounds difficult to combine the two approaches. An interception-like feature which effectively blocks requests can't be the only way to retrieve response bodies.

If we make this a command, as in this PR, then consumers can decide to get only the responses they are interested in, but they need to request them before the content becomes unavailable (which brings questions about the lifecycle). For HAR generation in perf tests, it also means the client (e.g. browsertime) keeps track of all network events collected and, at the end, sends commands to get all response bodies. But that seems to already be what they are doing for Chrome on browsertime's side, so it's probably fine as a pattern.

Alternatively we could make it a separate event. Then there is no issue with the lifecycle, but the only granularity for users is whether or not they want to receive response bodies. That might be slightly more consistent with an interception API. We could imagine 2 events: network.responseBodyStreamOpened / network.responseBodyStreamClosed, where the first one also corresponds to a new interception phase. The second event would not map to an interception phase but it would contain the response body. Again the issue with this is the granularity of the response bodies you will receive... Unless we have a way to restrict it to specific patterns? (which I know looks like interception, but it seems wrong to me to tie it to interception when we know we can't block all requests just to get the body)
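A very rough sketch of what those two hypothetical events could carry (the names and fields are illustrative only; network.BytesValue and the network event BaseParameters are existing spec types):

network.ResponseBodyStreamOpenedParameters = {
  ; the usual network event BaseParameters;
  ; emitted at a new, non-blocking interception-like phase
}

network.ResponseBodyStreamClosedParameters = {
  ; the usual network event BaseParameters, plus:
  body: network.BytesValue,
}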

@jgraham (Member) commented Jan 14, 2025

Yes, sorry, in the previous comment I didn't mean to imply that you'd have to explicitly continue the request, just that there would be a way to enable an extra non-blocking lifecycle event containing the response body for certain URLs.

Functionally this is identical to network.addIntercept, but it indeed might be confusing to reuse the command given that the semantics would be different.

I agree that perf monitoring is a case where adding additional blocking is unacceptable.

So basically the design I was imagining is similar to @juliandescottes' final paragraph.

@jgraham (Member) commented Jan 14, 2025

I think an ability to lazily fetch the body after the fact without interception is an important feature.

Then I think you have to define the lifecycle over which bodies are expected to be available. "Don't expose GC behaviour" is a fundamental design principle for the web platform, and whilst I often think that we can have slightly different constraints in automation, this is a case where I think "don't expose the memory management strategy of the implementation" is a principle we should work from, for basically the same reasons.

I also think that forcing the implementation to cache responses for a long or unbounded amount of time (e.g. until navigation) is very problematic; it seems likely that this will show up in tests as unexpected memory growth that doesn't replicate in non-automation scenarios.

@OrKoN (Contributor, Author) commented Jan 14, 2025

I am not sure I fully understand the proposal, but it sounds like it would be similar to just calling the command to get the body at the responseStarted phase (with its response being the responseBodyStreamClosed event)? I think that, from the lifecycle perspective, if the request is not blocked there is still no guarantee that the body would not be cleaned up rather than saved by the implementation. Or do you propose that the client should know ahead of time what the URLs of the requests it needs bodies for look like? Should that also include the max body size to be emitted by events? I think it still does not really help with the lazy body fetching situation, e.g., what if you want to inspect the body of the slowest request?

@jgraham (Member) commented Jan 14, 2025

The proposal would be something like a network.enableResponseBodies command with parameters like

{
  ? contexts: [+browsingContext.BrowsingContext],
  ? urlPatterns: [*network.UrlPattern],
  ? type: "string" .default "string", ; Extensibility point for allowing a handle later on
  ? maxSize: js-int .default -1, ; Allow opting out of large bodies
}

If a specific response matches a response body filter added in this way then there would be an additional event network.responseBody with a string containing the response data (possibly base64 encoded), once it's ready. Alternatively, we could just add the body in network.responseCompleted in this case.

For interception I think we'd just add a parameter to the existing network.addIntercept command that would make a handle to a stream containing the body available in network.responseStarted; this would unfortunately be a slightly different mechanism, but we already have an event that's emitted at the right time, and if you're going to modify the stream then the other concerns around overheads disappear.

what if you want to inspect the body of the slowest request?

In this design you have to get everything and the client gets to decide which responses to keep.

In your design you can't reliably do what you're asking for, because it depends on whether the implementation decided to hold on to the body until you requested it. In practice this means that everyone has to agree on what the lifecycle should be via the inefficient mechanism of getting bug reports from users until the behaviours are sufficiently similar in enough cases that people don't notice the differences any more.

In a design where we don't transmit the body until requested, I think you still want something like network.enableResponseBodies, but instead of adding an extra event, it means that the implementation has to cache those bodies until someone sends a network.getResponseBody() command (assuming we don't want the data held on both the client and browser sides) with a response id (which we'd need to add), or we reach some "natural" endpoint for the cache (e.g. navigation), or the client sends a network.clearResponseBodyCache command.
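A minimal sketch of those two hypothetical commands, assuming for illustration that responses are addressed by the existing request id rather than a new response id (all names here are illustrative, not part of this PR):

network.GetResponseBodyParameters = {
  request: network.Request,
}

network.ClearResponseBodyCacheParameters = {
  ? contexts: [+browsingContext.BrowsingContext],
}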

There's an additional question here about how you handle the case where multiple network.enableResponseBodies filters match the same request/response: would explicitly clearing the cache clear it for all rules, or would it be better to explicitly specify which part of the cache should be cleared using a token, similar to the way event [un]subscriptions now work? Given the experience with events, probably the latter, but then getResponseBody should also know which filter the command corresponds to, so that each "subscriber" (i.e. matching filter) has a chance to retrieve the data.

@OrKoN (Contributor, Author) commented Jan 14, 2025

In a design where we don't transmit the body until requested, I think you still want something like network.enableResponseBodies, but instead of adding an extra event, it means that the implementation has to cache those bodies until someone sends a network.getResponseBody() command (assuming we don't want the data held on both the client and browser sides) with a response id (which we'd need to add), or we reach some "natural" endpoint for the cache (e.g. navigation), or the client sends a network.clearResponseBodyCache command.

I think navigation is not a sufficient condition for clearing the cache, because loading one site can cause multiple navigations in multiple navigables, so we would still need to define it (and it is mostly implementation-defined). From the experience of dealing with bug reports in Puppeteer, users who inspect response bodies generally want the response body to be indefinitely available (unless they are concerned about memory usage), and that conflicts with browser engine implementations, where we cannot arbitrarily move data to a cache storage and keep it indefinitely. Basically, we cannot guarantee that the data is there without incurring the overhead of always reading the data out of the process and backing it up elsewhere (e.g., in a different process).

@jgraham (Member) commented Jan 15, 2025

Just to summarize where I think we are, we've considered four possible areas of design space:

  1. Clients have to request a body at a point during the lifecycle when it is guaranteed to still be available (i.e. after the request has been initiated, but before the response is complete)
  2. Clients subscribe upfront to response bodies for requests matching certain URLs in certain contexts and are sent them in an event. Keeping responses available is a client side concern.
  3. Clients subscribe upfront for the browser to retain response bodies for requests matching certain URLs in certain contexts. If they later decide to actually use the body, they send a separate command to retrieve it, and there is presumably a way to clear the browser-side cache.
  4. Clients send a command to retrieve a response body at any point after the response starts. Whether or not it's available is entirely at the discretion of the browser.

Of these options, 1 imposes unacceptable overhead since it requires one roundtrip per request that might be intercepted. 4 is closest to the current model in CDP (and hence Puppeteer), and is good enough for devtools where there are no interoperability requirements. But for it to work cross-browser we would need to converge on similar models for how long bodies should be retained, and assuming that will happen by convergent evolution and reverse engineering is missing the point of standardisation, and seems likely to incur significant engineering costs later if users run into differences. 2 adds a lot of protocol overhead in transferring (possibly large) bodies that may not be used, plus it requires clients to implement the lifecycle management themselves. 3 reduces the protocol traffic, but requires browsers to store some bodies that may not be used (and likely requires additional IPC to do so), rather than giving them the option to throw away bodies that are difficult to persist.

@jgraham (Member) commented Jan 15, 2025

In terms of use cases, 1. is fine for any request interception use cases. 2. is fine for HAR file generation. However the flexibility of existing clients suggests use cases for which control similar to 1 is required for overhead similar to 2, but no one has precisely set out what those use cases are (ideally with links to real examples, rather than hypothetical possibilities, which are of course easy to construct).

@juliandescottes (Contributor) commented

Thanks for summarizing the options. Is it correct to say we are leaning towards 3 or 4? 1 is a no go because of the overhead, and 2 would lead to a lot of potentially big events sent over the wire.

It seems like approach 4 can also be emulated with approach 3? To me option 3 sounds like option 4, but with a command to opt-in + a filtering capability. Clients could just enableResponseBodies for all contexts, all patterns, and then we would effectively be in the same situation as 4? Browsers might still technically need to stop storing responses if too many are stored, or am I missing something?

But it feels like 3 could be a good compromise. I am not sure if libraries like Puppeteer will expose all the arguments of enableResponseBodies to end users; it might be easier to enable it transparently. But for more low-level use cases it would allow optimizing and reducing the browser overhead.

One question for design 3 or 4 is whether we still want to send out a notification when the response body is available, or if we are fine with just letting consumers call the command and potentially having to wait a bit to get the response (I know that, at least for Firefox's current implementation, there will be some gap between the moment we send responseCompleted and the moment we can retrieve the response body).

@OrKoN (Contributor, Author) commented Jan 16, 2025

I think the bodies should not be transmitted via events, as it is difficult to control what is sent to the connection, and when, without a lot of configuration. I also do not think filtering by URLs has a solid use case, while it makes the feature harder to use (if you do not know which URLs you will need bodies for, you would need to do two runs: first to collect URLs, then to fetch their bodies). As for lifecycle hardening, what we could do is configure caching per user context with an on/off flag and a max total storage per navigable, and specify that bodies have to be removed when the navigable is destroyed. For any other guarantees about body availability we would require architectural changes and additional internal consultations. So I would be in favor of option 4 plus configuration per user context, with a size limit per navigable and the response being retained as long as the navigable is there and the body fits within the limit (with older bodies evicted first). I would not be opposed to adding URLs later, but given that the set of URLs could be changed or removed at any time, it makes the caching somewhat complicated.
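A sketch of what such an opt-in configuration could look like (the command and field names are hypothetical; browser.UserContext and js-uint are existing spec types):

network.EnableResponseBodyCacheParameters = {
  enabled: bool,
  ? userContexts: [+browser.UserContext],
  ? maxTotalBodySizePerNavigable: js-uint,
}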

For the interception use case, I think there is no problem guaranteeing body availability without any additional configuration, and I think the body also needs to be requested by the client via a command while the response is paused.

@jgraham (Member) commented Jan 17, 2025

In my mind the key difference between 3 and 4 is to what extent the browser gets to arbitrarily decide the lifetime of the response bodies. In a literal interpretation of 4 (or the PR as currently written), an implementation that always returned "no such response body" would be conforming, but rather useless. The core feature of 3 is that the browser holds on to the responses for a defined lifetime, although practically we'd probably want some tools to allow the client to reduce the lifetime when it's aware that it isn't going to need the responses.

what we could do is configure caching per user context with an on/off flag and a max total storage per navigable, and specify that bodies have to be removed when the navigable is destroyed.

But it would still be conforming to act as if all bodies had been expired from the cache? Assuming it is, a useful implementation would require some information from outside the specification to determine the de facto requirements for interoperability. That's the part that I don't think is acceptable about "lifetimes are entirely up to the browser".

I do not think filtering by URLs has a solid use case, while it makes the feature harder to use

Just like request interception, "*" would be a valid filter, matching everything, and could be the default. So I don't think this would make things any harder from a user point of view. The value proposition is that often when writing tests you know facts like "I want to inspect the bodies of third party API responses if a test fails, but I'm never going to inspect the responses to image requests". Therefore you can just filter down to the bodies that the test might later try to access. That's unlike devtools use cases where you don't have any upfront knowledge that can be used.

That said, I agree it's not a critical feature. But it seems clear that giving the client tools to avoid leaking memory on the browser side is important.

@juliandescottes (Contributor) commented

@jgraham

The core feature of 3 is that the browser holds on to the responses for a defined lifetime

That's what I still don't understand. How is the lifetime defined for approach 3? It seems identical to approach 4 in that regard. You can only filter out upfront some contexts and URL patterns you don't plan to ever request, but then there's nothing that tells the browser for how long it should hold onto the "accepted" bodies?

Or if the implicit idea is that we are holding onto bodies forever until cleared by the user, then again we can just emulate 4 with 3, so I think technically we'll end up with the same issues of having to limit arbitrarily the amount of responses we store.

@OrKoN (Contributor, Author) commented Jan 17, 2025

So we obviously do not want to make the command useless by having no implementation actually store anything, but we also do not want to force implementations to support use cases like the user closing the navigable or user context while requests are being loaded and still expecting the body to be saved somewhere for later inspection. Maybe we could phrase it as something like "the bodies can be removed from the cache for a good reason like process destruction". We can specify the cleanup to happen when the navigable is destroyed, but it could actually happen earlier if processes are destroyed during navigations. If we want to carry bodies over across processes, we would need some additional investigation into the topic. Unfortunately, I think there are several implementation-specific concerns here.

Just like request interception, "*" would be a valid filter, matching everything, and could be the default. So I don't think this would make things any harder from a user point of view. The value proposition is that often when writing tests you know facts like "I want to inspect the bodies of third party API responses if a test fails, but I'm never going to inspect the responses to image requests". Therefore you can just filter down to the bodies that the test might later try to access. That's unlike devtools use cases where you don't have any upfront knowledge that can be used.

I think there are testing use cases (e.g. performance audits) where you do not have upfront knowledge. But yes, I agree that in some use cases you could minimize the usage of the cache by defining URLs upfront. Also, URL patterns do not allow filtering by content type, which might be even more useful than URL filtering. But I also think the initial version could be an on/off switch, with more detailed configuration specified later for other use cases.

@jgraham (Member) commented Jan 17, 2025

That's what I still don't understand. How is the lifetime defined for approach 3?

Well that's to be decided. But an obvious starting point would be "response bodies are maintained for the lifetime of the navigable".

I think technically we'll end up with the same issues of having to limit arbitrarily the amount of responses we store

With the above approach you might indeed eventually run out of memory or similar. Infra has specific text about this kind of limit.

Conceptually I think there's an important difference between a limit that is externally imposed by hardware etc. and one that's just "implementation defined" and allows any behaviour to be considered conforming. In particular, in the former case one should be able to point to the specific limitation that caused the problem. "Sorry, I couldn't cache that 2 GB document because there was only 1 GB of free storage" is very different to "Sorry, I couldn't cache that 2 kB document even though there was plenty of storage available".

In the spirit of Infra, I think it would be reasonable for the spec to use some of the following techniques to limit response body cache size, or to make implementation of shared semantics easier:

  • Defined minimum/maximum cache size in the absence of hardware constraints
  • Pre-defined, or user-controlled, limits on what is cached
  • Upfront limits on the kinds of response bodies that can be retrieved. If caching non-https resources causes a problem we can say that you can't retrieve those.
  • Other upfront limits on which response bodies can be retrieved based on other properties. For example if responses for requests created in workers are difficult to cache then we can explicitly say they're not supported.

It's generally fairly easy to loosen those kinds of constraints in the future if it becomes clear that we're not meeting all the necessary use cases.

The problem is when, instead of having spec-defined answers, it's all implementation-defined. The end result of that is users depending on the behaviour of a specific implementation and finding that their tests etc. do not in fact work cross-browser.

@jgraham (Member) commented Jan 17, 2025

Maybe we could phrase it as something like "the bodies can be removed from the cache for a good reason like process destruction".

Yes, I agree that the cache should not outlive the browsing context group (which I think corresponds to a process), rather than the navigable (which might change processes).

@OrKoN force-pushed the orkon/get-response-body branch 2 times, most recently from cb602a9 to 76c3cb3 (January 17, 2025 12:21)
@OrKoN (Contributor, Author) commented Jan 17, 2025

so I have added the following constraints for the lifecycle (not meant as the final spec text):

Responses must be removed under the following conditions:

- when a browsing context group switch or destruction happens, responses
  associated with the previous browsing context group are removed.

- when a navigation commits a new document, resources associated with the
  previous document are removed.

- when a worker scope is removed, the responses provided by the worker are
  removed.

- when a user-configured (command to be defined) per-navigable size limit is
  exceeded, the oldest response body is evicted until there is space for
  the new body. Evicted resources should result in a distinct error
  code (a sketch of such an error response follows below).

If there is an agreement for these, I could think about specifying this in some way. We can exclude worker served responses if we want to for now.
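If eviction gets its own error code as suggested in the last constraint above, the failure could surface through the usual command error shape, e.g. (the error code name here is purely illustrative):

{
  "type": "error",
  "id": 42,
  "error": "unavailable network data",
  "message": "The response body was evicted from the cache"
}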

@OrKoN force-pushed the orkon/get-response-body branch from 76c3cb3 to a644037 (January 17, 2025 12:25)
@juliandescottes (Contributor) commented

We can exclude worker served responses if we want to for now.

That might be better. Workers can be short-lived, so it might make sense to tie responses to the owning document, but depending on the worker, we might not have an owning document.

Successfully merging this pull request may close these issues.

Support getting response body for network responses