Core, Spark: Minimize executor memory pressure in broadcast of data to delete files #11957
Leaving this in draft for now. To save executor memory when broadcasting the mapping of data files to file-scoped deletes, we can remove the referenced manifest location from each delete file, because it is only needed on the driver at commit time.
Added a builder to FileMetadata that nulls out the referenced manifest location.
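To illustrate the idea, here is a minimal sketch of a builder that copies a delete file's metadata and drops the driver-only field before broadcast. `DeleteMeta`, `copy`, and `withoutReferencedManifestLocation` are hypothetical names for this example, not Iceberg's FileMetadata or DeleteFile API.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for delete file metadata; not Iceberg's DeleteFile API.
final class DeleteMeta {
  final String location;
  final long fileSizeInBytes;
  final String referencedManifestLocation; // may be null

  DeleteMeta(String location, long fileSizeInBytes, String referencedManifestLocation) {
    this.location = Objects.requireNonNull(location, "location");
    this.fileSizeInBytes = fileSizeInBytes;
    this.referencedManifestLocation = referencedManifestLocation;
  }

  static Builder builder() {
    return new Builder();
  }

  // Hypothetical builder: copies an existing delete file and can drop the driver-only field.
  static final class Builder {
    private String location;
    private long fileSizeInBytes;
    private String referencedManifestLocation;

    Builder copy(DeleteMeta other) {
      this.location = other.location;
      this.fileSizeInBytes = other.fileSizeInBytes;
      this.referencedManifestLocation = other.referencedManifestLocation;
      return this;
    }

    // Null out the field that only the driver needs at commit time.
    Builder withoutReferencedManifestLocation() {
      this.referencedManifestLocation = null;
      return this;
    }

    DeleteMeta build() {
      return new DeleteMeta(location, fileSizeInBytes, referencedManifestLocation);
    }
  }
}
```

Every executor then receives the slimmer copy, while the original object (with its manifest location) is left untouched on the driver.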
The trade-off is additional driver memory: the driver keeps a mapping from each delete file to its referenced manifest location. At commit time, this mapping is used to rebuild each delete file with its manifest location, so the commit optimization that relies on the referenced manifest location can still be used.
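The driver-side half of the trade-off could look roughly like the sketch below, assuming delete files are keyed by their file location. `DeleteEntry`, `DriverManifestRegistry`, `stripForBroadcast`, and `restoreForCommit` are hypothetical names for illustration only; the actual change may key or store the mapping differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified delete file metadata; not Iceberg's DeleteFile interface.
final class DeleteEntry {
  final String location;
  final String referencedManifestLocation; // null once stripped for broadcast

  DeleteEntry(String location, String referencedManifestLocation) {
    this.location = location;
    this.referencedManifestLocation = referencedManifestLocation;
  }
}

// Hypothetical driver-side registry: keeps manifest locations out of the broadcast
// and re-attaches them when the commit is built.
final class DriverManifestRegistry {
  // delete file location -> referenced manifest location (held only in driver memory)
  private final Map<String, String> manifestByDelete = new HashMap<>();

  // Driver, before broadcast: remember the manifest location and return a slim copy.
  DeleteEntry stripForBroadcast(DeleteEntry delete) {
    if (delete.referencedManifestLocation != null) {
      manifestByDelete.put(delete.location, delete.referencedManifestLocation);
    }
    return new DeleteEntry(delete.location, null);
  }

  // Driver, at commit time: rebuild the delete file with its manifest location,
  // or return it unchanged if no location was recorded.
  DeleteEntry restoreForCommit(DeleteEntry slim) {
    String manifest = manifestByDelete.get(slim.location);
    return manifest == null ? slim : new DeleteEntry(slim.location, manifest);
  }

  int size() {
    return manifestByDelete.size();
  }
}
```

The design choice here is that one map lives on the driver instead of one copy of the manifest location per delete file in every executor's broadcast.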