Add a command to get response body #856
Conversation
(Branch force-pushed from 1ea00c4 to 9129bae.)
CC @juliandescottes who was also going to look at this. Very briefly, some high-level things I think we should try to look at:
A question is whether requiring an interception is acceptable. If it is, one could add a …
Puppeteer allows getting bodies without interception, so I do not think requiring an interception would be acceptable.
Chrome allows configuring limits (https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-enable), so we could have something similar too. Probably clearing on a new-document navigation would make sense, but I will need to do some testing.
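For reference, this is roughly how those CDP limits look from a client today; a minimal sketch, assuming a Chrome instance with remote debugging on port 9222 (the target id in the URL is a placeholder):

```ts
// Sketch: configuring CDP's network body buffering via Network.enable.
// Parameter names come from the CDP Network domain linked above; the
// WebSocket URL and values are illustrative only.
const ws = new WebSocket("ws://localhost:9222/devtools/page/<targetId>");

ws.addEventListener("open", () => {
  ws.send(JSON.stringify({
    id: 1,
    method: "Network.enable",
    params: {
      maxTotalBufferSize: 10_000_000,   // total buffer for preserved payloads
      maxResourceBufferSize: 5_000_000, // per-resource cap
    },
  }));
});
```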
I think we will be able to use the same command, but if the request is paused at the responseStarted phase we have different steps to fetch the body.
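To illustrate, a minimal sketch of how "the same command" could serve both cases; the command name `network.getResponseBody` and the response shape are hypothetical here, not this PR's final text:

```ts
// Hypothetical: one command used both for completed responses and for requests
// paused at the responseStarted interception phase. The implementation would
// read the body from the paused response in the latter case, and from whatever
// cache it keeps in the former.
interface GetResponseBodyParams {
  request: string; // BiDi request id
}

async function getResponseBody(
  send: (method: string, params: GetResponseBodyParams) => Promise<{ body: string }>,
  requestId: string,
): Promise<string> {
  const result = await send("network.getResponseBody", { request: requestId });
  return result.body;
}
```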
I think it would be a good starting point to start with an implementation-defined lifecycle and resolve the arising interop issues later, as long as they do not require structural changes, since there could be many edge cases and differences in the network stack that might not be easily unifiable (unless we just unconditionally store response bodies even if never requested, which is not very efficient). The following questions need to be considered for the lifecycle:
These are almost all resolvable, depending on the model. For example, instead of having a "getResponseBody" command, we could adopt a model like network request interception where you subscribe to get the body for some/all responses, and have a "responseBodyReady" event for matching requests. Indeed, you could probably rather directly reuse the existing infrastructure and make it another network interception phase (although we'd need to work out how to specify which kind of body you want in the case where we support both strings and stream handles). Puppeteer in that case would need to subscribe to all bodies and manage the lifecycle itself, which isn't ideal in the short term, but you could probably move to a more efficient client-side API that made storing the bodies opt-in.

Anyway, I'm not specifically advocating that solution, just saying that there clearly are options that avoid making the lifecycle totally implementation-defined, and I think we should explore those, because the other option is that we offer an unreliable user experience and/or end up needing to do significant platform work to align on the most permissive implementation.
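As a rough sketch of that event-based model (all names here, including `network.responseBodyReady`, are illustrative rather than spec text), a client subscribing to all bodies would manage the lifecycle itself:

```ts
// Hypothetical event carrying a response body for a subscribed request.
type ResponseBodyReady = {
  method: "network.responseBodyReady";
  params: { request: string; body: { type: "string"; value: string } };
};

// The client, not the browser, decides how long stored bodies live.
const bodies = new Map<string, string>();

function onResponseBodyReady(event: ResponseBodyReady): void {
  bodies.set(event.params.request, event.params.body.value);
}
```

A Puppeteer-style client could then expose body access lazily while evicting entries under its own policy.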
I do not think opting in to all bodies would work for Puppeteer or Playwright. Interception has overhead, and in many cases, like HAR generation (firefox-devtools/bidi-har-export#22), interception is not on. Although I think an ability to opt in to all bodies would work for HAR, I think an ability to lazily fetch the body after the fact, without interception, is an important feature.
As @OrKoN mentioned, for generating HAR files during performance tests, requiring interception would probably impact network performance (unless we could define some interception rules that are automatically applied without having to send a command to resume the request), so it sounds difficult to combine the two approaches. An interception-like feature which effectively blocks requests can't be the only way to retrieve response bodies.

If we make this a command, as in this PR, then consumers can decide to get only the responses they are interested in, but they need to request them before the content becomes unavailable (which brings questions about the lifecycle). For the example of HAR generation for perf tests, it also means the client (e.g. browsertime) keeps track of all network events collected, and at the end sends commands to get all response bodies. But that seems to already be what they are doing for Chrome on browsertime's side, so it's probably fine as a pattern.

Alternatively, we could make it a separate event. Then there is no issue with the lifecycle, but the only granularity for users is whether or not they want to receive response bodies. That might be slightly more consistent with an interception API. We could imagine two events:
Yes, sorry, in the previous comment I didn't mean to imply that you'd have to explicitly continue the request, just that there would be a way to enable an extra non-blocking lifecycle event containing the response body for certain URLs. Functionally this is identical to …

I agree that perf monitoring is a case where adding additional blocking is unacceptable. So basically the design I was imagining is similar to @juliandescottes' final paragraph.
Then I think you have to define the lifecycle over which bodies are expected to be available. "Don't expose GC behaviour" is a fundamental design principle for the web platform, and whilst I often think that we can have slightly different constraints in automation, this is a case where I think "don't expose the memory management strategy of the implementation" is a principle we should work from, for basically the same reasons.

I also think that forcing the implementation to cache responses for a long or unbounded amount of time (e.g. until navigation) is very problematic; it seems likely that this will show up in tests as unexpected memory growth that doesn't replicate in non-automation scenarios.
I am not sure I fully understand the proposal, but it sounds like it would be similar to just calling the command to get the body at the responseStarted phase (with its response being the responseBodyStreamClosed event)? I think that, from the lifecycle perspective, if the request is not blocked there is still no guarantee that the body would not be cleaned up rather than saved by the implementation. Or do you propose that the client should know ahead of time what the URLs of the requests it needs bodies for look like? Should that also include the max body size to be emitted by events? I think it still does not really help with the lazy body-fetching situation, e.g., what if you want to inspect the body of the slowest request?
The proposal would be something like a …
If a specific response matches a response body filter added in this way then there would be an additional event … For interception, I think we'd just add a parameter to the existing …
In this design you have to get everything and the client gets to decide which responses to keep. In your design you can't reliably do what you're asking for, because it depends on whether the implementation decided to hold on to the body until you requested it. In practice this means that everyone has to agree on what the lifecycle should be via the inefficient mechanism of getting bug reports from users until the behaviours are sufficiently similar in enough cases that people don't notice the differences any more.

In a design where we don't transmit the body until requested, I think you still want something like …

There's an additional question here about how you handle the case where multiple …
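For concreteness, one possible shape of that design; every name here (`addResponseBodyFilter`, `releaseResponseBody`, and so on) is invented for illustration and stands in for the identifiers elided above:

```ts
// Hypothetical command surface: the client declares upfront which bodies the
// browser must retain, and can release them early to shorten the lifetime.
interface ResponseBodyFilter {
  urlPatterns: string[]; // "*" could be a valid filter matching everything
  contexts?: string[];   // optionally scope to specific browsing contexts
}

interface NetworkBodyCommands {
  addResponseBodyFilter(filter: ResponseBodyFilter): Promise<{ filter: string }>;
  removeResponseBodyFilter(filterId: string): Promise<void>;
  getResponseBody(requestId: string): Promise<{ body: string }>;
  // Explicit release lets the client drop a retained body as soon as it knows
  // the body is no longer needed, rather than waiting for the defined lifetime.
  releaseResponseBody(requestId: string): Promise<void>;
}
```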
I think navigation is not a sufficient condition for clearing the cache, because loading one site can cause multiple navigations in multiple navigables, so we would still need to define it (and it is mostly implementation-defined). From experience dealing with bug reports in Puppeteer, users who inspect response bodies generally want the response body to be indefinitely available (unless they are concerned about memory usage), and that conflicts with browser engine implementations, where we cannot arbitrarily move data to a cache storage and keep it indefinitely. Basically, we cannot guarantee that the data is there without incurring the overhead of always reading the data out of the process and backing it up elsewhere (e.g., in a different process).
Just to summarize where I think we are, we've considered four possible areas of design space:
Of these options, 1 imposes unacceptable overhead, since it requires one roundtrip per request that might be intercepted. 4 is closest to the current model in CDP (and hence Puppeteer), and is good enough for devtools, where there are no interoperability requirements. But for it to work cross-browser we would need to converge on similar models for how long bodies should be retained, and assuming that will happen by convergent evolution and reverse engineering misses the point of standardisation, and seems likely to incur significant engineering costs later if users run into differences. 2 adds a lot of protocol overhead in transferring (possibly large) bodies that may not be used, plus it requires clients to implement the lifecycle management themselves. 3 reduces the protocol traffic, but requires browsers to store some bodies that may not be used (and likely requires additional IPC to do so), rather than giving them the option to throw away bodies that are difficult to persist.
In terms of use cases, 1 is fine for any request interception use cases, and 2 is fine for HAR file generation. However, the flexibility of existing clients suggests use cases for which control similar to 1 is required with overhead similar to 2, but no one has precisely set out what those use cases are (ideally with links to real examples, rather than hypothetical possibilities, which are of course easy to construct).
Thanks for summarizing the options. Is it correct to say we are leaning towards 3 or 4? 1 is a no-go because of the overhead, and 2 would lead to a lot of potentially big events sent over the wire.

It seems like approach 4 can also be emulated with approach 3? To me, option 3 sounds like option 4, but with a command to opt in plus a filtering capability. Clients could just …

But it feels like 3 could be a good compromise. I am not sure if libraries like Puppeteer will expose all the arguments of …

One question for design 3 or 4 is whether we still want to send out a notification when the response body is available, or if we are fine with just letting consumers call the command and potentially having to wait a bit to get the response (I know that at least for Firefox's current implementation there will be some gap between the moment we send responseCompleted and the moment we can retrieve the response body).
I think the bodies should not be transmitted via events, as it is difficult to control what is sent to the connection, and when, without many kinds of configuration; and I do not think filtering by URLs has a solid use case, while it makes the feature harder to use (if you do not know which URLs you need bodies for, you would need two runs: first to collect the URLs, then to fetch their bodies).

As for lifecycle hardening, I think what we could do is configure caching per user context with an on/off flag and a max total storage per navigable, and specify that bodies have to be removed when the navigable is destroyed. I think any other guarantees about body availability would require architectural changes and additional internal consultations on our side. So I would be in favor of option 4 plus configuration per user context, with a size limit per navigable and responses being retained as long as the navigable exists and they fit within the limit (with older bodies evicted first). I would not be opposed to adding URLs later, but given that the set of URLs could be changed/removed at any time, it makes the caching somewhat complicated.

For the interception use case, I think there is no problem guaranteeing body availability without any additional configuration, and I think the body also needs to be requested by the client via a command while the response is paused.
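A sketch of what that configuration might look like on the wire; the command name `network.setResponseBodyCaching` and its fields are made up here to illustrate the shape, not proposed spec text:

```ts
// Hypothetical per-user-context configuration: an on/off flag plus a total
// size budget per navigable, with older bodies evicted first.
declare function send(method: string, params: object): Promise<unknown>;

await send("network.setResponseBodyCaching", {
  userContexts: ["default"],
  enabled: true,
  maxTotalBodySize: 50_000_000, // bytes per navigable; oldest evicted first
});
```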
In my mind the key difference between 3 and 4 is to what extent the browser gets to arbitrarily decide the lifetime of the response bodies. In a literal interpretation of 4 (or the PR as currently written), an implementation that always returned "no such response body" would be conforming, but rather useless. The core feature of 3 is that the browser holds on to the responses for a defined lifetime, although practically we'd probably want some tools to allow the client to reduce the lifetime when it's aware that it isn't going to need the responses.
But it would still be conforming to act as if all bodies had been expired from the cache? Assuming it is, a useful implementation would require some information from outside the specification to determine the de facto requirements for interoperability. That's the part that I don't think is acceptable about "lifetimes are entirely up to the browser".
Just like request interception, "*" would be a valid filter, matching everything, and could be the default. So I don't think this would make things any harder from a user's point of view. The value proposition is that, when writing tests, you often know facts like "I want to inspect the bodies of third-party API responses if a test fails, but I'm never going to inspect the responses to image requests". Therefore you can just filter down to the bodies that the test might later try to access. That's unlike devtools use cases, where you don't have any upfront knowledge that can be used. That said, I agree it's not a critical feature. But it seems clear that giving the client tools to avoid leaking memory on the browser side is important.
That's what I still don't understand. How is the lifetime defined for approach 3? It seems identical to approach 4 in that regard. You can only filter out upfront some contexts and URL patterns you don't plan to ever request, but then there's nothing which tells the browser how long it should hold onto the "accepted" bodies? Or, if the implicit idea is that we hold onto bodies forever until cleared by the user, then again we can just emulate 4 with 3, so I think technically we'll end up with the same issue of having to arbitrarily limit the amount of responses we store.
So we obviously do not want to make the command useless by having no implementation actually store anything, but we also do not want to force implementations to support use cases like …
I think there are testing use cases (performance audits) where you do not have upfront knowledge. But yeah, I agree that in some use cases you could minimize cache usage by defining URLs upfront. Also, URLs do not allow filtering by content type, which might be even more useful than URL filtering. But I also think the initial version could be an on/off switch, with more detailed configuration being specified later for other use cases.
Well, that's to be decided. But an obvious starting point would be "response bodies are maintained for the lifetime of the navigable".
With the above approach you might indeed eventually run out of memory or similar. Infra has specific text about this kind of limit. Conceptually, I think there's an important difference between a limit that is externally imposed by hardware etc. and one that's just "implementation-defined" and allows any behaviour to be considered conforming. In particular, in the former case one should be able to point to the specific limitation that caused the problem. "Sorry, I couldn't cache that 2GB document because there was only 1GB of free storage" is very different from "Sorry, I couldn't cache that 2kB document even though there was plenty of storage available". In the spirit of Infra, I think it would be reasonable for the spec to use some of the following techniques to limit response body cache size or to make implementing shared semantics easier:
It's generally fairly easy to loosen those kinds of constraints in the future if it becomes clear that we're not meeting all the necessary use cases. The problem is when, instead of having spec-defined answers, it's all implementation-defined. The end result of that is users depending on the behaviour of a specific implementation and finding that their tests etc. do not in fact work cross-browser.
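To make the difference concrete, here is a minimal sketch (not spec text) of the kind of spec-defined limit being discussed: a per-navigable cache with a fixed byte budget that evicts the oldest entries first and is dropped when the navigable is destroyed:

```ts
// One cache instance per navigable; destroying the navigable discards it.
class ResponseBodyCache {
  // Map preserves insertion order, so the first entry is always the oldest.
  private entries = new Map<string, Uint8Array>();
  private totalSize = 0;

  constructor(private readonly maxTotalSize: number) {}

  store(requestId: string, body: Uint8Array): void {
    const existing = this.entries.get(requestId);
    if (existing !== undefined) {
      this.totalSize -= existing.byteLength;
      this.entries.delete(requestId);
    }
    // A body larger than the whole budget can never be cached; with a
    // spec-defined limit the client can tell exactly why it was rejected.
    if (body.byteLength > this.maxTotalSize) return;
    this.entries.set(requestId, body);
    this.totalSize += body.byteLength;
    // Evict oldest-first until we are back within the budget.
    for (const [id, cached] of this.entries) {
      if (this.totalSize <= this.maxTotalSize) break;
      this.entries.delete(id);
      this.totalSize -= cached.byteLength;
    }
  }

  get(requestId: string): Uint8Array | undefined {
    return this.entries.get(requestId);
  }
}
```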
Yes, I agree that the cache should be tied to the browsing context group (which I think corresponds to a process) rather than the navigable (which might change processes), and should not outlive it.
(Branch force-pushed from cb602a9 to 76c3cb3.)
… so I have added the following constraints for the lifecycle (not meant as the final spec text):
If there is agreement on these, I could think about specifying this in some way. We can exclude worker-served responses for now if we want to.
(Branch force-pushed from 76c3cb3 to a644037.)
That might be better. Workers can be short-lived, so it might be better to tie responses to the owning document; but depending on the worker, we might not have an owning document.
Closes #747
Preview | Diff