Jan 1, 2025: This week(s) in DataFusion #13970
Comments
Congrats @jonahgao for becoming a committer!
This is a pretty cool API from @westonpace making it easier to implement remote (async) catalogs:
2025: The year of 1000 systems built on DataFusion: https://www.influxdata.com/blog/datafusion-2025-influxdb/
Summary of Chicago meetup
Blog on DuckDB vs DataFusion: https://performancede.substack.com/p/duckdb-vs-datafusion
Some really great work by @zhuqi-lucas to add the h2o.ai benchmark to the repo: Also thank you @2010YOUY01 for the assist!
A cool PR from @chenkovsky adding metadata support:
We (@Omega359) have all sqlogictests running cleanly on main! This is 10,000
Introduction
This ticket is a weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please leave comments on this ticket about things that I may have missed or you think should get wider attention by the community. Follow on to #13760
Loosely inspired by https://this-week-in-rust.org/
Reminder, find new content (and please post some!) to
Community Highlights
Performance
DataFusion's core value proposition is great performance without having to re-implement it yourself
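To make the LIKE prefix-pruning item in this section concrete, here is a minimal sketch of the general idea: a `LIKE 'foo%'` predicate can only match values in the range `["foo", "fop")`, so a file or row group whose min/max statistics fall entirely outside that range can be skipped. This is not the code from the PR. The function names (`like_prefix`, `prefix_upper_bound`, `may_contain_match`) are hypothetical, the sketch compares raw bytes, and it ignores LIKE escape characters.

```rust
/// Literal prefix of a LIKE pattern, up to the first wildcard (`%` or `_`).
/// Assumption: the pattern uses no escape characters.
fn like_prefix(pattern: &str) -> &str {
    let end = pattern
        .find(|c: char| c == '%' || c == '_')
        .unwrap_or(pattern.len());
    &pattern[..end]
}

/// Smallest byte string that sorts after every string starting with `prefix`,
/// or `None` if no such bound exists (empty prefix or all bytes are 0xFF).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut upper = prefix.to_vec();
    while let Some(last) = upper.pop() {
        if last < u8::MAX {
            upper.push(last + 1);
            return Some(upper);
        }
    }
    None
}

/// Returns false only when the container's [min, max] statistics prove that
/// no value can match `pattern`; true means "must be scanned".
fn may_contain_match(min: &str, max: &str, pattern: &str) -> bool {
    let prefix = like_prefix(pattern).as_bytes();
    if prefix.is_empty() {
        // Pattern starts with a wildcard: statistics cannot rule anything out.
        return true;
    }
    // Every match m satisfies prefix <= m, so max < prefix rules out a match.
    if max.as_bytes() < prefix {
        return false;
    }
    // Every match m satisfies m < upper, so min >= upper rules out a match.
    if let Some(upper) = prefix_upper_bound(prefix) {
        if min.as_bytes() >= upper.as_slice() {
            return false;
        }
    }
    true
}
```

For example, under these assumptions a container with statistics min = "apple", max = "banana" can be skipped for `LIKE 'foo%'`, while one with min = "fa", max = "fz" must still be scanned.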
LIKE predicates: Implement predicate pruning for like expressions (prefix matching) #12978 (prefix match)
Materialized Views
@suremarc is cranking along with a materialized view implementation 🚀. See the https://github.com/datafusion-contrib/datafusion-materialized-views repo and PRs like
Features
SHOW FUNCTIONS #13799
generate_series: Support 1 or 3 arg in generate_series() UDTF #13856
initcap work with unicode as well as ascii: Support unicode character for initcap function #13752
Substrait!
The substrait implementation is getting some significant upgrades thanks to @Blizzara @robtandy and @vbarua. For example:
BTW it would be great if someone could fix the docs:
Sort improvements
execution_mode with emission_type and boundedness #13823
Quality
sqlite test suite
Also, @Omega359 just landed a major project to start running the sqlite test suite #13936, which is huge
Documentation
user_doc macro, port trim to use new macro #13952
Bugs / testing
DataFusion is in the "we are finding all the corner case bugs now" phase of its life and people
Huge shout out to everyone who helped test the 44.0.0 release, especially @andygrove @timsaucer and @shehabgamin from sail #13855
ScalarValue::to_array_of_size for DenseUnion #13797
array_distinct #13810
null_buffer length check to StringArrayBuilder/LargeStringArrayBuilder #13758
Easier to use remote / async catalogs
AsyncCatalogProvider helpers for asynchronous catalogs #13800
Releases
44.0.0 #13334 🎉
0.53.0 / sqlparser_derive 0.3.0 datafusion-sqlparser-rs#1517
54.0.0 (December 2024) arrow-rs#6342 #13722
Looking to get more involved? Please help review code! 🎣
DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.
We have docs about reviews. TLDR is: look for test coverage, if the change is understandable and well documented, and if the code can be improved. When you think the PR looks good to merge, try @ mentioning one of the committers.
Help wanted
Please feel free to leave your own comments on this ticket if you are looking for help
Community
Upcoming meetups:
DISCUSSION: January 2025 DataFusion Meetup in Amsterdam / CIDR 2025 #12988