Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search param removes values after ampersand, introduced in 2.18.2 #3106

Open
ezwc opened this issue Dec 12, 2024 · 6 comments
Open

Search param removes values after ampersand, introduced in 2.18.2 #3106

ezwc opened this issue Dec 12, 2024 · 6 comments
Labels
bug something broken P2 considered for next cycle regression this used to work

Comments

@ezwc
Copy link

ezwc commented Dec 12, 2024

I pass to one of my pages a search string, like:
?param1=something&param2=something2

Accessing it uding:
def layout(**kwargs):

In 2.18.1, this works for values that included ampersand. For example:
?param1=something&bla&param2=something2
would result in
kwargs[param1]='something&bla'

With 2.18.2 I get just:
kwargs[param1]='something'
with anything after the ampersand removed.

I would guess this is related to #2991 .

To be clear, I specifically downgraded dash to 2.18.1 and the issue went away.

@gvwilson gvwilson added regression this used to work bug something broken P2 considered for next cycle labels Dec 12, 2024
@C0deBeez
Copy link
Contributor

C0deBeez commented Dec 16, 2024

Thank you for reporting this issue.

Root Cause

The behavior you’re experiencing stems from a change in the query string parsing logic introduced in Dash 2.18.2. Specifically, the _parse_query_string function uses unquote(search) in combination with parse_qs. While this combination adheres to general query string parsing rules, it introduces problems when handling unencoded reserved characters like & within values.

For example:

  • In ?param1=something&bla&param2=something2, parse_qs treats &bla as a new key (bla) instead of part of the value for param1.
  • As per the RFC 3986 standard, & is a reserved character and must be percent-encoded (%26) if it’s meant to be part of a value. So, the correctly encoded query string should be:
    ?param1=something%26bla&param2=something2

Observed Issue with unquote

Additionally, even a properly encoded query string like ?param1=something%26bla&param2=something2 results in the value bla being dropped in Dash 2.18.2. This happens because unquote(search) decodes %26 back to &, and parse_qs again interprets & as a delimiter rather than part of the value.

Proposed Solution

To address this issue, while supporting duplicate keys and empty values, _parse_query_string can be updated to use parse_qsl instead of parse_qs. The parse_qsl method offers better control, especially when handling encoded characters within query strings.

Updated _parse_query_string Implementation

from urllib.parse import parse_qsl

def _parse_query_string(search):
  if search and len(search) > 0 and search[0] == "?":
      search = search[1:]
  else:
      return {}

  parsed_qs = {}
  for k, v in parse_qsl(search, keep_blank_values=True):
      if k in parsed_qs:
          if isinstance(parsed_qs[k], list):
              parsed_qs[k].append(v)
          else:
              parsed_qs[k] = [parsed_qs[k], v]
      else:
          parsed_qs[k] = v
  return parsed_qs



query_strings = [
    "?param1=something&bla&param2=something2",
    "?param1=something%26bla&param2=something2",
    "?param1=value1&param1=value2&param2=value3",
    "?param1=&param2=something2",
    "",
]

for qs in query_strings:
    print(f"Query: {qs}")
    print(f"Parsed: {_parse_query_string(qs)}")

Output:


Query: ?param1=something&bla&param2=something2
Parsed: {'param1': 'something', 'bla': '', 'param2': 'something2'}

Query: ?param1=something%26bla&param2=something2
Parsed: {'param1': 'something&bla', 'param2': 'something2'}

Query: ?param1=value1&param1=value2&param2=value3
Parsed: {'param1': ['value1', 'value2'], 'param2': 'value3'}

Query: ?param1=&param2=something2
Parsed: {'param1': '', 'param2': 'something2'}

Query: 
Parsed: {}

Recommendations

  1. Encourage Proper URI Encoding
    According to the URI specification, reserved characters like & and = must be percent-encoded when used as part of values. For example:
    ?param1=something%26bla&param2=something2
    This ensures predictable parsing behavior.

  2. Discussions
    @gvwilson @T4rk1n Given the above, should Dash enforce URI encoding compliance, or should we handle these edge cases within the parsing logic instead?
    If we decide to enforce URI encoding compliance, parse_qsl seems to be a better solution as it provides better handling for encoded characters, duplicate keys, and empty values. On the other hand, if we choose not to enforce URI compliance, we may need to consider removing the search = unquote(search) line from _parse_query_string to prevent issues like misinterpreting %26 (decoded as &) as a delimiter rather than part of a value.

Looking forward to your thoughts on how best to proceed.

@gvwilson gvwilson changed the title [BUG] Search param removes values after ampersand, introduced in 2.18.2 Search param removes values after ampersand, introduced in 2.18.2 Jan 3, 2025
@gvwilson
Copy link
Contributor

gvwilson commented Jan 3, 2025

@T4rk1n which of these options should we pursue (or do you prefer another entirely)?

@T4rk1n
Copy link
Contributor

T4rk1n commented Jan 6, 2025

I'd like to follow the rfc standard and expect the values to be properly encoded.
But it should parse it in the values and not do what is described here:

This happens because unquote(search) decodes %26 back to &, and parse_qs again interprets & as a delimiter rather than part of the value.

@gvwilson
Copy link
Contributor

gvwilson commented Jan 8, 2025

thanks @T4rk1n - @ezwc if you'd like to create a PR we will prioritize review.

@C0deBeez
Copy link
Contributor

Hi @ezwc,

Thank you for raising this issue and for providing detailed examples.

Why This Issue Requires RFC Compliance

The core problem arises from the interpretation of & in the query string. According to RFC 3986, & is defined as the delimiter for separating key-value pairs in query strings. If you want & to be part of a value, it must be percent-encoded as %26. This ensures unambiguous parsing.

For example:

?param1=something&bla&param2=something2

Under RFC-compliant parsing, this would result in:

{
    "param1": "something",
    "bla": '',
    "param2": "something2"
}

If your intended behavior is to include & within a value, it should be encoded as %26:

?param1=something%26bla&param2=something2

This will correctly parse as:

{
    "param1": "something&bla",
    "param2": "something2"
}

Why RFC Compliance Is Necessary

  1. Consistency: Adhering to RFC 3986 ensures Dash behaves consistently with other tools, libraries, and systems.
  2. Compatibility: Non-compliance may lead to unexpected behavior in different environments.
  3. Avoiding Ambiguity: Without encoding &, there is no clear distinction between a delimiter and a value.

Proposed Fix

Based on @T4rk1n's suggestion, I propose the following implementation for _parse_query_string, which fully complies with the RFC:

from urllib.parse import parse_qs

def _parse_query_string(search):
    if not search or not search.startswith("?"):
        return {}

    query_string = search[1:]

    parsed_qs = parse_qs(query_string, keep_blank_values=True)

    return {
        k: v[0] if len(v) == 1 else v
        for k, v in parsed_qs.items()
    }

Test Cases

Here are some test cases with the proposed solution:

query_strings = [ 
    "?param1=something&bla&param2=something2",
    "?param1=something%26bla&param2=something2",
    "?param1=value1&param1=value2&param2=value3",
    "?param1=&param2=something2",
    "",
]

for qs in query_strings:
    print(f"Query: {qs}")
    print(f"Parsed: {_parse_query_string(qs)}")

Output:

Query: ?param1=something&bla&param2=something2
Parsed: {'param1': 'something', 'bla': '', 'param2': 'something2'}

Query: ?param1=something%26bla&param2=something2
Parsed: {'param1': 'something&bla', 'param2': 'something2'}

Query: ?param1=value1&param1=value2&param2=value3
Parsed: {'param1': ['value1', 'value2'], 'param2': 'value3'}

Query: ?param1=&param2=something2
Parsed: {'param1': '', 'param2': 'something2'}

Query: 
Parsed: {}

@gvwilson @T4rk1n If there are no objections, I am happy to submit a PR with this fix to resolve the issue.

Let me know if you have any concerns or additional feedback. 😊

@gvwilson
Copy link
Contributor

@C0deBeez thanks very much Ethan - if you'd like to put together a PR I will try to expedite review. Cheers - @gvwilson

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken P2 considered for next cycle regression this used to work
Projects
None yet
Development

No branches or pull requests

4 participants