feat(python, rust): add statistics_enabled to ColumnProperties #3126
Signed-off-by: Max Piskunov <max.piskunov@plus.ai>
775dff4 to c934459
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3126      +/-   ##
==========================================
- Coverage   72.28%   72.24%   -0.04%
==========================================
  Files         134      134
  Lines       42973    42988      +15
  Branches    42973    42988      +15
==========================================
- Hits        31062    31056       -6
- Misses       9923     9940      +17
- Partials     1988     1992       +4
☔ View full report in Codecov by Sentry.
@maxitg looks good, could you remove the max_statistics_size as well?
@ion-elgreco I’ve added a second commit to remove it. I initially avoided it to prevent a breaking change.
Signed-off-by: Max Piskunov <max.piskunov@plus.ai>
ca754fc to 92f5cac
Thanks for the improvement! This is a situation where I'm definitely okay with breaking changes, since the code wasn't working anyway. We're also on the precipice of a breaking set of changes 😄
Description
Add statistics_enabled in ColumnProperties.
Remove max_statistics_size from ColumnProperties.
Related Issue(s)
Documentation
The only way currently available to avoid writing statistics to the transaction log for a column is max_statistics_size. However, that option has been deprecated in arrow-rs and, furthermore, appears to be ignored: apache/arrow-rs#2033. Being able to skip statistics is necessary for large data columns, as otherwise the transaction log explodes after a few tens of thousands of transactions and becomes non-scalable.
Examples
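As a rough illustration of the motivation described above (not code from this PR), the sketch below compares the size of a simplified Delta add action with and without statistics for a large column. It assumes the Delta protocol layout in which each add action carries a stats JSON string with numRecords, minValues, maxValues and nullCount, and it assumes min/max values are stored untruncated. The column names (id, payload), the 10,000-character value, the 30,000-commit figure and the helpers add_action and action_size are all hypothetical.

```python
import json


def add_action(path, stats):
    """Build a simplified Delta `add` action (hypothetical helper).

    Per the Delta protocol, `stats` is embedded as a JSON-encoded string.
    """
    return {
        "add": {
            "path": path,
            "size": 1024,  # placeholder file size in bytes
            "modificationTime": 0,
            "dataChange": True,
            "stats": json.dumps(stats),
        }
    }


def action_size(action):
    """Size in bytes of the action as one line of a commit file."""
    return len(json.dumps(action).encode("utf-8"))


# Hypothetical large value standing in for a "large data column".
payload = "x" * 10_000

# Statistics collected for every column, including the large one.
stats_all = {
    "numRecords": 1,
    "minValues": {"id": 1, "payload": payload},
    "maxValues": {"id": 1, "payload": payload},
    "nullCount": {"id": 0, "payload": 0},
}

# Statistics skipped for `payload` (assumed effect of disabling them).
stats_skip = {
    "numRecords": 1,
    "minValues": {"id": 1},
    "maxValues": {"id": 1},
    "nullCount": {"id": 0},
}

size_with = action_size(add_action("part-0.parquet", stats_all))
size_without = action_size(add_action("part-0.parquet", stats_skip))

commits = 30_000  # "a few tens of thousands of transactions"
print(f"per-commit bytes with stats:    {size_with}")
print(f"per-commit bytes without stats: {size_without}")
print(f"log growth avoided over {commits} commits: "
      f"{(size_with - size_without) * commits / 1e6:.0f} MB")
```

In this toy model the large value is written twice per commit (once as the minimum, once as the maximum), so the saving grows linearly with both the value size and the number of commits.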