
Some tests timing out on helix #5247

Open
radical opened this issue Aug 9, 2024 · 7 comments
Labels
area-meta blocking-clean-ci Blocking a green CI testing ☑️ tracking Tracking issue for some TODOs
Comments

@radical
Member

radical commented Aug 9, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=804618&view=results
Build error leg or test failing: Aspire.Hosting.Elasticsearch.Tests.WorkItemExecution
Pull request: #5243

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "Aborting test run: test run timeout of [0-9]+ milliseconds exceeded",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
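Because `ErrorPattern` uses `[0-9]+` for the timeout value, this entry matches a test-run timeout of any length, not just one specific configured value. Below is a minimal Python sketch of that check. It assumes the known-issue matcher runs an unanchored regex search over the log text. That assumption is not stated in this issue, although this particular pattern uses only syntax that behaves the same in Python and .NET regex. The sample log lines are hypothetical.

```python
import re

# ErrorPattern copied verbatim from the known-issue JSON above.
ERROR_PATTERN = r"Aborting test run: test run timeout of [0-9]+ milliseconds exceeded"

def matches_known_issue(log_text: str) -> bool:
    """Return True if the pattern occurs anywhere in the log text.

    Assumption: the matcher searches anywhere in the log (like re.search),
    rather than requiring a match at the start of a line.
    """
    return re.search(ERROR_PATTERN, log_text) is not None

# Hypothetical log lines, for illustration only.
samples = [
    "Aborting test run: test run timeout of 600000 milliseconds exceeded.",
    "Aborting test run: test run timeout of 1200000 milliseconds exceeded.",
    "Test run finished: 42 passed, 0 failed.",
]

for line in samples:
    print(matches_known_issue(line), line)
```

Both timeout lines match regardless of the number of milliseconds, and the unrelated line does not.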

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=804618
Error message validated: [Aborting test run: test run timeout of [0-9]+ milliseconds exceeded]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 9/11/2024 7:28:44 PM UTC

Report

| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 919972 | dotnet/aspire | Aspire.Playground.Tests.ProjectSpecificTests.Aspire.Playground.Tests.ProjectSpecificTests.WithDockerfileTest | #7073 |
| 919479 | dotnet/aspire | Aspire.Hosting.Redis.Tests.RedisFunctionalTests.Aspire.Hosting.Redis.Tests.RedisFunctionalTests.VerifyDatabasesAreNotDuplicatedForPersistentRedisInsightContainer | |
| 919273 | dotnet/aspire | Aspire.Hosting.PostgreSQL.Tests.PostgresFunctionalTests.Aspire.Hosting.PostgreSQL.Tests.PostgresFunctionalTests.WithDataShouldPersistStateBetweenUsages | #7127 |
| 919200 | dotnet/aspire | Aspire.Hosting.Azure.Tests.AzureCosmosDBEmulatorFunctionalTests.Aspire.Hosting.Azure.Tests.AzureCosmosDBEmulatorFunctionalTests.VerifyWaitForOnCosmosDBEmulatorBlocksDependentResources(usePreview: False) | #7117 |
| 919185 | dotnet/aspire | Aspire.Hosting.Redis.Tests.RedisFunctionalTests.Aspire.Hosting.Redis.Tests.RedisFunctionalTests.VerifyDatabasesAreNotDuplicatedForPersistentRedisInsightContainer | #7092 |
| 919096 | dotnet/aspire | Aspire.Hosting.Azure.Tests.AzureStorageEmulatorFunctionalTests.Aspire.Hosting.Azure.Tests.AzureStorageEmulatorFunctionalTests.VerifyAzureStorageEmulatorResource | |
| 918629 | dotnet/aspire | Aspire.Hosting.MongoDB.Tests.MongoDbFunctionalTests.Aspire.Hosting.MongoDB.Tests.MongoDbFunctionalTests.WithDataShouldPersistStateBetweenUsages | #7026 |
| 918658 | dotnet/aspire | Aspire.EndToEnd.Tests.IntegrationServicesTests.Aspire.EndToEnd.Tests.IntegrationServicesTests.VerifyComponentWorks(resourceName: postgres) | |
| 918471 | dotnet/aspire | Aspire.Hosting.PostgreSQL.Tests.PostgresFunctionalTests.Aspire.Hosting.PostgreSQL.Tests.PostgresFunctionalTests.WithDataShouldPersistStateBetweenUsages | #7105 |
| 912694 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7014 |
| 913927 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7068 |
| 913500 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7048 |
| 912693 | dotnet/aspire | Aspire.Hosting.Azure.Tests.WorkItemExecution | #7037 |
| 912688 | dotnet/aspire | Aspire.Hosting.Azure.Tests.AzureCosmosDBEmulatorFunctionalTests.Aspire.Hosting.Azure.Tests.AzureCosmosDBEmulatorFunctionalTests.VerifyWaitForOnCosmosDBEmulatorBlocksDependentResources | #7048 |
| 912591 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7040 |
| 912433 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7014 |
| 912185 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7048 |
| 911374 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #6998 |
| 911155 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7014 |
| 909712 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7037 |
| 909682 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | |
| 909382 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7034 |
| 908616 | dotnet/aspire | Aspire.Hosting.Azure.Tests.WorkItemExecution | #7005 |
| 908294 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7005 |
| 908258 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #7032 |
| 906449 | dotnet/aspire | Aspire.Playground.Tests.WorkItemExecution | #6737 |
| 906358 | dotnet/aspire | Aspire.EndToEnd.Tests.IntegrationServicesTests.Aspire.EndToEnd.Tests.IntegrationServicesTests.VerifyComponentWorks(resourceName: postgres) | #6946 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 10 | 27 |
@radical radical added the blocking-clean-ci Blocking a green CI label Aug 9, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-engineering-systems infrastructure helix infra engineering repo stuff label Aug 9, 2024
@radical radical added testing ☑️ and removed area-engineering-systems infrastructure helix infra engineering repo stuff labels Aug 9, 2024
@radical
Member Author

radical commented Aug 9, 2024

cc @eerhardt @sebastienros

One of the tests timing out is Aspire.Hosting.Elasticsearch.Tests - log.

Why is this timing out after 10 minutes on helix when all the tests combined didn't take that long on the build machine?

  • One reason could be that the helix agents are under-powered compared to the build machines. Also, these tests start up a new app for each xunit test in ElasticsearchFunctionalTests.

@radical
Member Author

radical commented Aug 9, 2024

I chose a broader error message to match against, so that this issue now catches any test that times out on helix.

@mitchdenny
Member

I am still hitting this today on this PR: #5223

@radical
Member Author

radical commented Aug 12, 2024

> I am still hitting this today on this PR: #5223

I'll bump the timeouts. We are hitting new timeouts because we moved more tests to helix last week.

@radical
Member Author

radical commented Aug 12, 2024

Aspire.Hosting.Elasticsearch.Tests: I think this is timing out because each functional test starts a new app, and the Elasticsearch container takes a few minutes to start up, so the combined run time of all the tests exceeds 10 minutes.

@mitchdenny
Member

This just happened for mongo in the playground tests. I was briefly able to repro a timeout locally as well, but when I stopped the test and retried, it worked fine (multiple times). When it was timing out, I noticed that mongo express didn't have an external port allocated. That makes me think Docker failed to forward the port into the container, which would explain why the test waited forever trying to connect to the container via the endpoint.
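If the host port is never forwarded, a wait on the endpoint with no deadline hangs until the whole work item hits its timeout, which produces only the generic "Aborting test run" message. The following Python sketch shows a bounded readiness probe that fails fast instead. It is an illustration only: `wait_for_port` and its parameters are hypothetical names, not Aspire APIs, and this is not how the Aspire C# tests are implemented.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout_s: float, interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the deadline passes.

    Returns True as soon as a connection is accepted, or False once
    timeout_s seconds have elapsed without a successful connection.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Cap each connection attempt so it cannot run far past the deadline.
            with socket.create_connection((host, port), timeout=min(remaining, interval_s)):
                return True
        except OSError:
            # Refused or timed out: nothing is listening on / forwarded to this port yet.
            time.sleep(min(interval_s, max(0.0, deadline - time.monotonic())))
```

A caller would raise an error naming the unreachable host and port when this returns False. That failure points directly at a missing port forward rather than surfacing as a run-level timeout.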

@joperezr joperezr added the area-engineering-systems infrastructure helix infra engineering repo stuff label Dec 19, 2024
@joperezr
Member

@radical Is there any additional logging that we can add to help us narrow down what is going on here? It would be good if we can make this issue somehow actionable.

@joperezr joperezr added this to the Backlog milestone Dec 19, 2024
@joperezr joperezr removed area-engineering-systems infrastructure helix infra engineering repo stuff untriaged New issue has not been triaged labels Dec 19, 2024
@joperezr joperezr added tracking Tracking issue for some TODOs area-meta labels Jan 15, 2025

3 participants