-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal Varianсe Sampling booster #4266
Conversation
…added spinx version upper bound
…added spinx version upper bound
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for the contribution! I'll do a more complete review in the next day or two.
For now, I left two very minor suggestions for the R side.
Could you also please add some tests on this feature? As a start, you could try updating the Dask tests to cover this code:
boosting_types = ['gbdt', 'dart', 'goss', 'rf'] |
If you get stuck, please let me know and I'd be happy to help.
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
@kruda are you still interested in pursuing this pull request? |
@shiyu1994 what does it mean that this PR is listed in #4677 with your name next to it? Does it mean you're planning a separate implementation of MVS from your own branch? And if that's it, should this PR be closed? It has been almost 4 months since it received a new commit. |
Hi @jameslamb, sorry if that confuses you. No. We are just trying to merge this PR. If @kruda is not responding, can we merge this first (there are only minor issues to be fixed). And we can open some subsequent PRs to complete it. What do you think about this? I'll remove my name from the list to avoid ambiguity. Thanks you. |
I suppose it'll be better to continue the work (apply fixes) directly in this PR. If @kruda didn't unchecked enabled by default possibility to push into this branch by maintainers, then it'll possible. |
Oh sure. It would be the best if we can directly push to this branch. Thanks for your suggestion. BTW, @StrikerRUS @jameslamb, I think both of you are more experienced then me in managing open-source projects. So if any of my behaviors makes you confused or seems no good to the project, please feel free to correct me. Thanks! |
@shiyu1994 Thank you for your kindness and openness! I believe everything is fine, please don't worry. |
I agree with @StrikerRUS . If someone opens a pull request into this project from their fork (even It's ok for you to push changes directly here @shiyu1994 . Then @StrikerRUS , I, and others can review them when you're ready. |
I think it is better to merge this PR into a branch of the LightGBM repo first, then we continue to develop on that branch, and merge it to the master branch when it is ready. |
Just created a new |
the merge is blocked by conflicts 😅 , it seems most of them are document related. @shiyu1994 @jameslamb can you help to reslove them? |
…t#5068) * Update test_windows.ps1 * Update .appveyor.yml * Update test_windows.ps1 * Update .appveyor.yml
I'm picking this up. This PR is almost ready. |
@shiyu1994 then we still merge it to mvs_dev branch first, then you can develop based on it? |
We should definitely do that. @kruda could delete their fork or this branch at any time. Let's move merge this code to |
Done with that. Will work on |
@kruda Thank you so much for your work! |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Implementation of minimal variance sampling for stochastic gradient boosting. Contributes to #2644.
Note for reviers
I've tested it on examples for binary and multiclass classification from lightgbm repository, tried to train on Higgs dataset using slightly modified config from https://github.com/guolinke/boosting_tree_benchmarks.git. Trained several times on https://www.kaggle.com/c/DontGetKicked .
On tested datasets this MVS implementation shows better score and overfitts, than simple bagging strategy with same sampling rate.
Update:
Mean relative error change table:
Example on adult dataset:
Code to reproduce:
https://github.com/kruda/lightgbm_mvs_experiments