Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hostname waypoints #1224

Merged
merged 3 commits into from
Aug 1, 2024

Conversation

howardjohn
Copy link
Member

@howardjohn howardjohn commented Jul 23, 2024

This PR adds support for sending traffic to waypoints identified by hostnames. Currently, this code is not exercised by the control plane, so it should never be hit. However, we need this to be present 1 release before we turn it on for istiod for compatibility purposes. This gives us that capability.

This is preferred long term as the implementation is simpler in both the control plane and ztunnel, and hostname is already the unique key for a service, so its a better lookup key. Additionally, it makes dual stack usage more clear (i.e. we don't have to pick an arbitrary IP to put, we can just reference the hostname).

Related: istio/istio#52318
Istiod change: istio/istio#52294

@howardjohn howardjohn added the release-notes-none Indicates a PR that does not require release notes. label Jul 23, 2024
@istio-testing
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 23, 2024
Based on user feedback.

Before/after

dns outage
```
2024-07-29T17:10:59.111431Z     warn    xds::client:xds{id=14}  XDS client connection error: gRPC connection error:status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Temporary failure in name resolution, retrying in 15s
2024-07-29T17:22:14.958433Z     warn    xds::client:xds{id=3}   XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Temporary failure in name resolution (hint: is the DNS server reachable?), retrying in 80ms
```

wrong dns name
```
2024-07-29T17:22:47.910253Z     warn    xds::client:xds{id=10}  XDS client connection error: gRPC connection error:status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Name or service not known, retrying in 10.24s
2024-07-29T17:22:47.910253Z     warn    xds::client:xds{id=10}  XDS client connection error: gRPC connection error connecting to https://istiodx.istio-system.svc:15012: status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Name or service not known, retrying in 10.24s
```

Bad auth (ztunnel)
```
2024-07-29T17:25:29.137815Z     warn    xds::client:xds{id=11}  XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unauthenticated, message: "authentication failure", retrying in 15s
2024-07-29T17:35:00.273104Z     warn    xds::client:xds{id=9}   XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unauthenticated, message: "authentication failure" (hint: check the control plane logs for more information), retrying in 5.12s
```
@howardjohn howardjohn force-pushed the waypoint/support-hostname branch from 19d5dc5 to f851be1 Compare July 29, 2024 22:40
@howardjohn howardjohn marked this pull request as ready for review July 29, 2024 22:41
@howardjohn howardjohn requested a review from a team as a code owner July 29, 2024 22:41
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jul 29, 2024
@howardjohn
Copy link
Member Author

/retest

1 similar comment
@howardjohn
Copy link
Member Author

/retest

Copy link
Contributor

@ilrudie ilrudie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code seems fine. Just a couple of nits from your friendly neighborhood unwrap nag.

What is the intended timeline for rolling this out? 1.23 shipping in pretty much imminent so guessing 1.24 ships with this in ztunnel in preparation for 1.25 in istiod?

What is the plan for knowing this works as intended if actual usability is that far out? It seems like timing wise this could complicate maintaining ztunnel in the short term and might make more sense to merge closer to shipping 1.24 so that any fixes for ztunnel that come up as folks get their hands on 1.23 in the wild won't need to contend with these changes as we try to backport them.

src/state.rs Show resolved Hide resolved
src/state.rs Outdated Show resolved Hide resolved
src/state.rs Outdated Show resolved Hide resolved
@howardjohn
Copy link
Member Author

The code seems fine. Just a couple of nits from your friendly neighborhood unwrap nag.

What is the intended timeline for rolling this out? 1.23 shipping in pretty much imminent so guessing 1.24 ships with this in ztunnel in preparation for 1.25 in istiod?

What is the plan for knowing this works as intended if actual usability is that far out? It seems like timing wise this could complicate maintaining ztunnel in the short term and might make more sense to merge closer to shipping 1.24 so that any fixes for ztunnel that come up as folks get their hands on 1.23 in the wild won't need to contend with these changes as we try to backport them.

I think the key to this is just testing... which was lacking in the initial PR -- I have built it out a lot more now. WDYT?

FWIW I have also been manually testing with Istiod PR, just hard to turn that into automation

@howardjohn howardjohn force-pushed the waypoint/support-hostname branch from 47ff2a3 to 42c73bb Compare August 1, 2024 20:40
@ilrudie ilrudie added the cherrypick/release-1.23 Set this label on a PR to auto-merge it to the release-1.23 branch label Aug 1, 2024
Copy link
Contributor

@ilrudie ilrudie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I do think we should reconsider carrying the diff if we don't get the cherrypick accepted and we won't have the istiod change in 1.24

@istio-testing istio-testing merged commit 189df4c into istio:master Aug 1, 2024
3 checks passed
@istio-testing
Copy link
Contributor

In response to a cherrypick label: new pull request created: #1246

@thedebugger
Copy link

@ilrudie are you saying we should hold off on this change in 1.23 since related istiod changes aren't certain in 1.24?

@ilrudie
Copy link
Contributor

ilrudie commented Aug 2, 2024

@thedebugger, The opposite, the istiod changes are ready, or nearly so, I think we should cherry-pick this to 1.23 so that we can merge the istiod changes soon and begin full testing.

howardjohn added a commit to howardjohn/ztunnel that referenced this pull request Aug 7, 2024
Regression from istio#1224

Before that PR (and now the behavior returns), we would use the IP
family of the IP stuck in the workload.waypoint or service.waypoint
field. This was always IPv4 due to istiod's implementation.

With the PR, we started to use the original IP family of the request.
Since waypoints had IPv6 addresses, we would use them. Due to another
bug in Istiod (fixed in istio/istio#52555), the
waypoints didn't actually listen on IPv6, though, so would drop the
requests.

With this PR, we will continue to send to the IP actually specified. In
the near future we will move to hostname references, where we can
correctly pick the best IP family based on the request.
istio-testing pushed a commit that referenced this pull request Aug 7, 2024
Regression from #1224

Before that PR (and now the behavior returns), we would use the IP
family of the IP stuck in the workload.waypoint or service.waypoint
field. This was always IPv4 due to istiod's implementation.

With the PR, we started to use the original IP family of the request.
Since waypoints had IPv6 addresses, we would use them. Due to another
bug in Istiod (fixed in istio/istio#52555), the
waypoints didn't actually listen on IPv6, though, so would drop the
requests.

With this PR, we will continue to send to the IP actually specified. In
the near future we will move to hostname references, where we can
correctly pick the best IP family based on the request.
istio-testing pushed a commit to istio-testing/ztunnel that referenced this pull request Aug 7, 2024
Regression from istio#1224

Before that PR (and now the behavior returns), we would use the IP
family of the IP stuck in the workload.waypoint or service.waypoint
field. This was always IPv4 due to istiod's implementation.

With the PR, we started to use the original IP family of the request.
Since waypoints had IPv6 addresses, we would use them. Due to another
bug in Istiod (fixed in istio/istio#52555), the
waypoints didn't actually listen on IPv6, though, so would drop the
requests.

With this PR, we will continue to send to the IP actually specified. In
the near future we will move to hostname references, where we can
correctly pick the best IP family based on the request.
istio-testing added a commit that referenced this pull request Aug 7, 2024
Regression from #1224

Before that PR (and now the behavior returns), we would use the IP
family of the IP stuck in the workload.waypoint or service.waypoint
field. This was always IPv4 due to istiod's implementation.

With the PR, we started to use the original IP family of the request.
Since waypoints had IPv6 addresses, we would use them. Due to another
bug in Istiod (fixed in istio/istio#52555), the
waypoints didn't actually listen on IPv6, though, so would drop the
requests.

With this PR, we will continue to send to the IP actually specified. In
the near future we will move to hostname references, where we can
correctly pick the best IP family based on the request.

Co-authored-by: John Howard <[email protected]>
antonioberben pushed a commit to antonioberben/ztunnel that referenced this pull request Oct 1, 2024
* Improve XDS error diagnostics

Based on user feedback.

Before/after

dns outage
```
2024-07-29T17:10:59.111431Z     warn    xds::client:xds{id=14}  XDS client connection error: gRPC connection error:status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Temporary failure in name resolution, retrying in 15s
2024-07-29T17:22:14.958433Z     warn    xds::client:xds{id=3}   XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Temporary failure in name resolution (hint: is the DNS server reachable?), retrying in 80ms
```

wrong dns name
```
2024-07-29T17:22:47.910253Z     warn    xds::client:xds{id=10}  XDS client connection error: gRPC connection error:status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Name or service not known, retrying in 10.24s
2024-07-29T17:22:47.910253Z     warn    xds::client:xds{id=10}  XDS client connection error: gRPC connection error connecting to https://istiodx.istio-system.svc:15012: status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Name or service not known, retrying in 10.24s
```

Bad auth (ztunnel)
```
2024-07-29T17:25:29.137815Z     warn    xds::client:xds{id=11}  XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unauthenticated, message: "authentication failure", retrying in 15s
2024-07-29T17:35:00.273104Z     warn    xds::client:xds{id=9}   XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unauthenticated, message: "authentication failure" (hint: check the control plane logs for more information), retrying in 5.12s
```

* Support hostname waypoints

* add test and drop unwrap
antonioberben pushed a commit to antonioberben/ztunnel that referenced this pull request Oct 1, 2024
Regression from istio#1224

Before that PR (and now the behavior returns), we would use the IP
family of the IP stuck in the workload.waypoint or service.waypoint
field. This was always IPv4 due to istiod's implementation.

With the PR, we started to use the original IP family of the request.
Since waypoints had IPv6 addresses, we would use them. Due to another
bug in Istiod (fixed in istio/istio#52555), the
waypoints didn't actually listen on IPv6, though, so would drop the
requests.

With this PR, we will continue to send to the IP actually specified. In
the near future we will move to hostname references, where we can
correctly pick the best IP family based on the request.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherrypick/release-1.23 Set this label on a PR to auto-merge it to the release-1.23 branch release-notes-none Indicates a PR that does not require release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants