Jan 1, 2025: This week(s) in DataFusion #13970

Closed
3 of 11 tasks
alamb opened this issue Jan 1, 2025 · 9 comments
Contributor

alamb commented Jan 1, 2025

Introduction

This ticket is a weekly-ish summary of interesting things happening in DataFusion. Note that this is not a complete list (it is what I remember / can find). Please leave comments on this ticket about things that I may have missed or that you think should get wider attention from the community. Follow-on to #13760.

Loosely inspired by https://this-week-in-rust.org/

Reminder, find new content (and please post some!) to

Community Highlights

Performance

DataFusion's core value proposition is great performance without having to re-implement it yourself

Materialized Views

@suremarc is cranking along with a materialized view implementation 🚀. See the https://github.com/datafusion-contrib/datafusion-materialized-views repo and PRs like

Features

Substrait!

The substrait implementation is getting some significant upgrades thanks to @Blizzara @robtandy and @vbarua. For example:

BTW it would be great if someone could fix the docs:

Sort improvements

Quality

sqlite test suite

Also, @Omega359 just landed a major project to start running the sqlite test suite #13936 which is huge

Documentation

Bugs / testing

DataFusion is in the "we are finding all the corner case bugs now" phase of its life and people

Huge shout out to everyone who helped test the 44.0.0 release, especially @andygrove @timsaucer and @shehabgamin from sail #13855

Easier to use remote / async catalogs

Releases

#13722

Looking to get more involved? Please help review code! 🎣

DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.

We have docs about reviews. The TL;DR: check for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR looks good to merge, try @ mentioning one of the committers.

Help wanted

  • I would love to see the community offer additional help testing and triaging bugs, helping to make DataFusion a more stable foundation for building systems

Please feel free to leave your own comments on this ticket if you are looking for help

Community

Upcoming meetups:

@alamb alamb pinned this issue Jan 1, 2025
@alamb alamb self-assigned this Jan 1, 2025
Contributor

Omega359 commented Jan 1, 2025

congrats @jonahgao for becoming a committer!

Contributor Author

alamb commented Jan 7, 2025

This is a pretty cool API from @westonpace making it easier to implement remote (async) catalogs:

Contributor Author

alamb commented Jan 8, 2025

2025: The year of 1000 systems built on Datafusion: https://www.influxdata.com/blog/datafusion-2025-influxdb/

Contributor Author

alamb commented Jan 9, 2025

Contributor Author

alamb commented Jan 9, 2025

Blog on DuckDB vs Datafusion: https://performancede.substack.com/p/duckdb-vs-datafusion

Contributor Author

alamb commented Jan 12, 2025

Some really great work by @zhuqi-lucas to add the h2o.ai benchmark to the repo:

Also thank you @2010YOUY01 for the assist!

Contributor Author

alamb commented Jan 12, 2025

A cool PR from @chenkovsky adding metadata support:

Contributor Author

alamb commented Jan 16, 2025

We (@Omega359) have all sqlogictests running cleanly on main! This is 10,000 .slt files and hundreds of thousands of tests (it takes ~2 hours to run with a release build 🤯)

Contributor Author

alamb commented Jan 18, 2025

@alamb alamb closed this as completed Jan 18, 2025
@alamb alamb unpinned this issue Jan 18, 2025