Updating cloud bursting scripts and repo using CycleCloud 8.4 and Slurm 23.11.9-1 #283

Open
wants to merge 6 commits into
base: master

vinil-v commented Oct 7, 2024

Updating the cloud bursting setup scripts and repo for creating a CycleCloud Slurm hybrid HPC setup.

  • cyclecloud 8.4
  • cyclecloud-slurm 3.0.9
  • slurm 23.11.9-1

vinil-v commented Oct 9, 2024

@aditigaur4 - could you please review the PR?

aditigaur4 self-requested a review October 10, 2024 16:24
tbugfinder commented Oct 11, 2024

It would be beneficial to explain how best to add additional scripts, or to provide an example. Because this plugs into an existing environment, cluster-init is a prerequisite.
Custom builds of the Slurm binaries might also be applicable, so a how-to in the docs would be useful.

xpillons commented

@vinil-v can you add an architecture diagram, and define which communication ports should be open between the on-prem environment and the cloud environment? I guess that the home directories are still on-prem? How is users' identity preserved between both environments?
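
To make the port question concrete, here is a minimal sketch of a reachability check that could be run from either side of the hybrid link. The port list is an assumption, not something stated in this PR: it uses Slurm's stock defaults (6817 for slurmctld, 6818 for slurmd, 6819 for slurmdbd) and 2049 for NFS, and a real site may configure different values. The names `ASSUMED_PORTS`, `port_open`, and `check_ports` are hypothetical.

```python
import socket
from typing import Dict

# ASSUMPTION: stock Slurm default ports plus NFS; the PR does not list ports.
ASSUMED_PORTS: Dict[str, int] = {
    "slurmctld": 6817,
    "slurmd": 6818,
    "slurmdbd": 6819,
    "nfs": 2049,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host: str, ports: Dict[str, int] = ASSUMED_PORTS) -> Dict[str, bool]:
    """Map each service name to whether its TCP port is reachable on host."""
    return {name: port_open(host, port) for name, port in ports.items()}

# Example (hypothetical host name):
#   check_ports("onprem-scheduler.example.com")
```

This only tests TCP reachability; it says nothing about whether the service behind the port is healthy or correctly configured.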

Updating the architecture diagram and NFS server info
vinil-v commented Oct 16, 2024

> @vinil-v can you add an architecture diagram, and define which communication ports should be open between the on-prem environment and the cloud environment? I guess that the home directories are still on-prem? How is users' identity preserved between both environments?

@xpillons - updated as per your recommendation.

aditigaur4 (Collaborator) left a comment

Vinil, thank you for the PR. The issue is that the PR hard-codes the CycleCloud version, the autoscale package version, and the Slurm version in the script. We cannot support it in this form. It also adds a whole new Slurm template; the better way would be to add that to the default templates directory so that it gets maintenance updates.

We need to do this in a generic way, not locked to a specific Slurm/CycleCloud version, because things will quickly go out of date.
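
One way to read "generic" here is that the setup script takes its versions as inputs instead of embedding them. The following is a minimal sketch under that assumption, not the PR's actual code: the environment variable names `CYCLECLOUD_SLURM_VERSION` and `SLURM_VERSION`, the `Versions` class, and `resolve_versions` are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Versions:
    """Versions the setup would install; none are hard-coded here."""
    cyclecloud_slurm: str
    slurm: str


# ASSUMPTION: hypothetical variable names, chosen only for illustration.
REQUIRED_VARS = ("CYCLECLOUD_SLURM_VERSION", "SLURM_VERSION")


def resolve_versions(env: Optional[Mapping[str, str]] = None) -> Versions:
    """Read versions from the environment and fail loudly if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return Versions(
        cyclecloud_slurm=env["CYCLECLOUD_SLURM_VERSION"],
        slurm=env["SLURM_VERSION"],
    )
```

Failing when a value is missing, rather than falling back to a baked-in default, keeps a stale version from being installed silently.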

vinil-v commented Oct 22, 2024

> Vinil, thank you for the PR. The issue is that the PR hard-codes the CycleCloud version, the autoscale package version, and the Slurm version in the script. We cannot support it in this form. It also adds a whole new Slurm template; the better way would be to add that to the default templates directory so that it gets maintenance updates.
>
> We need to do this in a generic way, not locked to a specific Slurm/CycleCloud version, because things will quickly go out of date.

@aditigaur4 Hello Aditi, thank you for reviewing the PR. For this configuration, I utilized the most recent packages from cyclecloud-slurm.

Regarding the Slurm and project versions: Slurm 23.11.9-1 is the newest version in the cyclecloud-slurm 3.0.9 project. The main issue is that this setup requires an external scheduler. I aligned the Slurm version with the cyclecloud-slurm version, but I have also introduced a variable to manage the project/autoscale and Slurm versions, which can be updated with new releases. Could you provide guidance on how to make this setup more adaptable, especially regarding the integration of an external scheduler for headless operation?
About the template: it was derived from slurm.txt in cyclecloud-slurm 3.0.9. To achieve headless functionality, I removed the scheduler components without altering anything else.
To generalize, we should specify the Slurm version alongside the supported project version, possibly by creating a compatibility matrix for this configuration. This assumes that we are integrating an external scheduler with our autoscale packages.
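
The compatibility matrix idea could be sketched roughly as follows. This is an illustration, not part of the PR: the only pairing taken from this thread is cyclecloud-slurm 3.0.9 with Slurm 23.11.9-1, and the names `COMPATIBILITY` and `is_supported` are hypothetical. Any further rows would need to come from the project's release notes.

```python
from typing import Dict, Tuple

# Project (cyclecloud-slurm) version -> Slurm versions it is known to work with.
# Only the 3.0.9 / 23.11.9-1 pairing comes from this thread; extend as needed.
COMPATIBILITY: Dict[str, Tuple[str, ...]] = {
    "3.0.9": ("23.11.9-1",),
}


def is_supported(project_version: str, slurm_version: str) -> bool:
    """Return True if the pair appears in the matrix; unknown projects are unsupported."""
    return slurm_version in COMPATIBILITY.get(project_version, ())


# A setup script could refuse to continue on an unlisted combination:
if not is_supported("3.0.9", "23.11.9-1"):
    raise SystemExit("unsupported cyclecloud-slurm / Slurm combination")
```

Keeping the matrix as data separate from the install logic means a new release only needs a new row, not a script change.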


4 participants