Core, Spark: Minimize executor memory pressure in broadcast of data to delete files #11957

Draft: wants to merge 1 commit into main

amogh-jahagirdar
Contributor

Leaving this in draft for now. To save executor memory as part of the broadcast of data files to file-scoped deletes, we can remove the referenced manifest location from the delete files, because it is only needed on the driver at commit time.

Added a builder to FileMetadata that nulls out the referenced manifest location.

The trade-off is additional driver memory: the driver keeps a mapping from each delete file to its referenced manifest location. At commit time, this mapping is used to rebuild the delete file with its manifest location, so that the commit optimization that relies on the referenced manifest location can still be leveraged.
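
To illustrate the idea, here is a minimal sketch of the strip-then-restore flow. It does not use Iceberg's actual classes: `DeleteMeta`, `stripForBroadcast`, and `restoreForCommit` are hypothetical names, and a delete file is reduced to just a path and a manifest location. The real change works through a builder on FileMetadata and may differ in detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ManifestLocationStripping {

  // Hypothetical stand-in for delete file metadata; NOT Iceberg's DeleteFile.
  static final class DeleteMeta {
    final String path;
    final String manifestLocation; // null once stripped for broadcast

    DeleteMeta(String path, String manifestLocation) {
      this.path = path;
      this.manifestLocation = manifestLocation;
    }

    // Returns a copy with the manifest location replaced (null removes it).
    DeleteMeta withManifestLocation(String location) {
      return new DeleteMeta(path, location);
    }
  }

  // Driver side, before broadcast: remember each delete's manifest location
  // in a driver-only map and return copies that no longer carry it.
  static List<DeleteMeta> stripForBroadcast(
      List<DeleteMeta> deletes, Map<String, String> manifestByDeletePath) {
    List<DeleteMeta> stripped = new ArrayList<>();
    for (DeleteMeta delete : deletes) {
      if (delete.manifestLocation != null) {
        manifestByDeletePath.put(delete.path, delete.manifestLocation);
      }
      stripped.add(delete.withManifestLocation(null));
    }
    return stripped;
  }

  // Driver side, at commit time: rebuild the delete with its manifest location
  // (stays null if the location was never recorded).
  static DeleteMeta restoreForCommit(
      DeleteMeta delete, Map<String, String> manifestByDeletePath) {
    return delete.withManifestLocation(manifestByDeletePath.get(delete.path));
  }

  public static void main(String[] args) {
    List<DeleteMeta> deletes = new ArrayList<>();
    deletes.add(new DeleteMeta("deletes/dv-1.puffin", "metadata/manifest-1.avro"));

    Map<String, String> manifestByDeletePath = new HashMap<>();
    List<DeleteMeta> forExecutors = stripForBroadcast(deletes, manifestByDeletePath);
    DeleteMeta forCommit = restoreForCommit(forExecutors.get(0), manifestByDeletePath);

    System.out.println("broadcast copy: " + forExecutors.get(0).manifestLocation);
    System.out.println("commit copy: " + forCommit.manifestLocation);
  }
}
```

The executors only ever see the stripped copies, so the per-delete manifest location string is not part of the broadcast payload; the cost is the extra map held on the driver until commit.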
