Create ray cluster with ssa #778
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #778      +/-   ##
==========================================
+ Coverage   90.43%   92.40%   +1.97%
==========================================
  Files          23       23
  Lines        1359     1396      +37
==========================================
+ Hits         1229     1290      +61
+ Misses        130      106      -24
Can we add user documentation for this PR?
https://github.com/project-codeflare/codeflare-sdk/blob/main/docs/sphinx/user-docs/ray-cluster-interaction.rst
Began testing this again on both Kind and OpenShift.
It seems that AppWrapper is not working with the new changes when creating a cluster via apply().
TypeError: Discoverer.get() takes 1 positional argument but 2 were given
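For context, this is the message Python raises when a method that accepts only keyword arguments is called with a positional one. The sketch below reproduces the same wording with a stand-in class. `FakeDiscoverer` and its `get(**kwargs)` signature are assumptions made for illustration, not the Kubernetes client's code, and this may not be the cause of the failure reported here.

```python
class FakeDiscoverer:
    """Hypothetical stand-in: get() accepts keyword arguments only."""

    def get(self, **kwargs):
        return kwargs


def try_positional_lookup():
    """Call get() positionally and return the resulting error text."""
    try:
        FakeDiscoverer().get("AppWrapper")  # positional argument -> TypeError
    except TypeError as exc:
        return str(exc)
    return None


def keyword_lookup():
    """The same lookup with a keyword argument succeeds."""
    return FakeDiscoverer().get(kind="AppWrapper")
```

If the real call site has this shape, passing the lookup criteria as keywords would avoid the error.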
I have encapsulated the exception and was not able to reproduce the exact issue here.
Tested this again and AppWrapper is working for the initial apply, but I ran into a couple of issues when trying to update the AppWrapper and apply my changes.
I kept running into this error regarding the field manager, even with force=True:
A conflict occurred with the RayCluster resource.
Only one RayCluster with the same name is allowed. Please delete or rename the existing RayCluster before creating a new one with the desired name.
Response: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Apply failed with 1 conflict: conflict with \\"codeflare-operator\\" using workload.codeflare.dev/v1beta2: .spec.components","reason":"Conflict","details":{"causes":[{"reason":"FieldManagerConflict","message":"conflict with \\"codeflare-operator\\" using workload.codeflare.dev/v1beta2","field":".spec.components"}]},"code":409}\n'
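The response body above shows that the 409 is a field-manager conflict on `.spec.components`, which the generic "same name" message hides. A minimal sketch of pulling that detail out of a Status body follows; the helper name `field_manager_conflicts` is hypothetical and not part of the SDK, and the sample is rebuilt from the fields quoted above.

```python
import json


def field_manager_conflicts(body):
    """Return (field, message) pairs for each FieldManagerConflict cause.

    `body` is the raw JSON body (bytes or str) of a Kubernetes Status response.
    """
    status = json.loads(body)
    if status.get("code") != 409 or status.get("reason") != "Conflict":
        return []
    causes = (status.get("details") or {}).get("causes") or []
    return [
        (cause.get("field", ""), cause.get("message", ""))
        for cause in causes
        if cause.get("reason") == "FieldManagerConflict"
    ]


# Sample shaped like the response quoted above, rebuilt with json.dumps.
sample = json.dumps({
    "kind": "Status",
    "apiVersion": "v1",
    "status": "Failure",
    "reason": "Conflict",
    "details": {
        "causes": [{
            "reason": "FieldManagerConflict",
            "message": 'conflict with "codeflare-operator" using '
                       "workload.codeflare.dev/v1beta2",
            "field": ".spec.components",
        }]
    },
    "code": 409,
}).encode()

conflicts = field_manager_conflicts(sample)
```

Surfacing the conflicting field and manager in the raised error would make this case easier to tell apart from a genuine name clash.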
Nice catch, I reproduced it. There are actually two issues. For AppWrappers, I was not propagating force=True. Trying to edit an appwrapper manually with
AppWrappers enforce that spec.components[*].template is immutable after creation, because updates may invalidate the PodSet info that they've extracted and given to Kueue. I'm missing the bigger context of what you are trying to do, but you will not be able to mutate a RayCluster wrapped in an AppWrapper in place.
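Given that constraint, a client could check up front whether an update touches any component template and, if so, fall back to delete-and-recreate instead of applying in place. The sketch below is one possible pre-check on plain dictionaries; both function names are hypothetical, and treating an added or removed component as a change is an assumption of this sketch rather than documented AppWrapper behaviour.

```python
def changed_component_templates(existing, desired):
    """Return indices of spec.components entries whose template differs."""
    old = existing.get("spec", {}).get("components", [])
    new = desired.get("spec", {}).get("components", [])
    changed = [
        i for i, (a, b) in enumerate(zip(old, new))
        if a.get("template") != b.get("template")
    ]
    # Assumption of this sketch: added or removed components count as changes.
    changed.extend(range(min(len(old), len(new)), max(len(old), len(new))))
    return changed


def needs_recreate(existing, desired):
    """True when an in-place apply would modify an immutable template."""
    return bool(changed_component_templates(existing, desired))
```

A caller would pass the AppWrapper currently on the cluster as `existing` and the newly generated one as `desired`.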
Hi @dgrove-oss,
Overall lgtm, just one nit
@KPostOffice and @Bobbins228 do you think we can merge it?
- Adds RayCluster.apply() implementation
- Adds e2e tests for apply
- Adds unit tests for apply
- Exclude unit tests code from coverage
- Add coverage to cluster.py
- Adding coverage for the case of an openshift cluster
- Refactoring apply
- Adding constant, adding error checking, adding documentation
- Rename _apply_resources into _apply_ray_cluster
Issue link
What changes have been made
Adding apply function.
Verification steps
Checks