[FLINK-37021][state/forst] Implement fast cp/restore for ForSt StateBackend #25924

Closed


AlexYinHan
Contributor

What is the purpose of the change

This PR implements fast snapshot/restore for ForSt. Specifically, it adds different strategies for ForStStateDataTransfer so that ForSt can reuse checkpoint files wherever possible and thus reduce the cost of copying files.

Brief change log

  • Enhance the FileMappingManager so it can track the ownership of files
  • Implement different strategies for ForStStateDataTransfer
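
To illustrate the general idea behind the two items above, here is a minimal sketch, not the code in this PR. All names in it (`TransferSketch`, `OwnershipTracker`, `TransferStrategy`, `CopyStrategy`, `ReuseStrategy`, `Owner`) are hypothetical and are not taken from the Flink code base. It assumes that a "reuse" strategy hands ownership of an existing file to the checkpoint instead of copying it, while a "copy" strategy duplicates the file and leaves the original with the DB.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: strategy-based file transfer with ownership tracking. */
public class TransferSketch {

    /** Which component is responsible for eventually deleting a file (assumed model). */
    enum Owner { DB, CHECKPOINT }

    /** Illustrative stand-in for ownership tracking; not the real FileMappingManager. */
    static class OwnershipTracker {
        private final Map<String, Owner> owners = new HashMap<>();

        void register(String file, Owner owner) {
            owners.put(file, owner);
        }

        Owner ownerOf(String file) {
            return owners.get(file);
        }
    }

    /** One way of making a DB file part of a checkpoint. */
    interface TransferStrategy {
        /** Returns the number of bytes that had to be physically copied. */
        long transfer(String file, long sizeBytes, OwnershipTracker tracker);
    }

    /** Copies the file: the original stays with the DB, the copy belongs to the checkpoint. */
    static class CopyStrategy implements TransferStrategy {
        @Override
        public long transfer(String file, long sizeBytes, OwnershipTracker tracker) {
            tracker.register(file + ".copy", Owner.CHECKPOINT);
            return sizeBytes;
        }
    }

    /** Reuses the file: nothing is copied, ownership moves to the checkpoint. */
    static class ReuseStrategy implements TransferStrategy {
        @Override
        public long transfer(String file, long sizeBytes, OwnershipTracker tracker) {
            tracker.register(file, Owner.CHECKPOINT);
            return 0L;
        }
    }

    /** Applies one strategy to every file and sums up the bytes copied. */
    static long snapshot(Map<String, Long> files, TransferStrategy strategy, OwnershipTracker tracker) {
        long copied = 0L;
        for (Map.Entry<String, Long> e : files.entrySet()) {
            copied += strategy.transfer(e.getKey(), e.getValue(), tracker);
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new HashMap<>();
        files.put("a.sst", 100L);
        files.put("b.sst", 200L);

        OwnershipTracker tracker = new OwnershipTracker();
        for (String f : files.keySet()) {
            tracker.register(f, Owner.DB); // initially every file belongs to the DB
        }

        System.out.println("copy:  " + snapshot(files, new CopyStrategy(), tracker) + " bytes copied");
        System.out.println("reuse: " + snapshot(files, new ReuseStrategy(), tracker) + " bytes copied");
    }
}
```

In this toy model, the point of tracking ownership is that a reused file must not be deleted by the DB once a checkpoint refers to it. How the PR actually decides which strategy applies, and how ownership is recorded, is defined by the changed classes themselves.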

Verifying this change

This change added tests and can be verified as follows:

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (not applicable)

@AlexYinHan AlexYinHan marked this pull request as draft January 8, 2025 12:36
@flinkbot
Collaborator

flinkbot commented Jan 8, 2025

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@AlexYinHan AlexYinHan marked this pull request as ready for review January 9, 2025 11:43
@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch 3 times, most recently from e25d688 to d8b1853 Compare January 14, 2025 03:28
Contributor

@Zakelly Zakelly left a comment


Thanks for the PR! I haven't finished the review yet, but I've left a couple of comments in advance:

@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch 2 times, most recently from eaea7d9 to 0585bec Compare January 16, 2025 04:10
@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch from 0585bec to 42c4040 Compare January 16, 2025 06:42
@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch 3 times, most recently from e64f689 to bd2aec3 Compare January 17, 2025 11:05
@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch 2 times, most recently from 958d085 to b324519 Compare January 18, 2025 01:51
@AlexYinHan
Contributor Author

@flinkbot run azure

@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch 2 times, most recently from a69c797 to a9905bd Compare January 19, 2025 08:53
@Zakelly
Contributor

Zakelly commented Jan 19, 2025

Also, it seems the DataTransferStrategy should be chosen per transfer, i.e. each transfer should have its own strategy, right? Currently there is only one strategy per state backend.

We can fix this later.

@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch from a9905bd to 5f8ee5a Compare January 20, 2025 03:00
Contributor

@Zakelly Zakelly left a comment


LGTM

@Zakelly
Contributor

Zakelly commented Jan 20, 2025

CI is not triggering...

@Zakelly
Contributor

Zakelly commented Jan 20, 2025

@flinkbot run azure

@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch from 5f8ee5a to b364d6b Compare January 20, 2025 04:40
@AlexYinHan AlexYinHan force-pushed the yh/fast_cp_on_fast_rescale branch from b364d6b to 88520eb Compare January 20, 2025 05:29
@Zakelly
Contributor

Zakelly commented Jan 20, 2025

@flinkbot run azure
