[python-package] support sub-classing scikit-learn estimators #6783
assert dask_spec.args[:-1] == sklearn_spec.args
assert dask_spec.defaults[:-1] == sklearn_spec.defaults
assert dask_spec.args[-1] == "client"
assert dask_spec.kwonlyargs == [*sklearn_spec.kwonlyargs, "client"]
Made these changes based on this test failure:
> assert dask_spec.kwonlyargs == sklearn_spec.kwonlyargs
E AssertionError: assert ['client'] == []
E
E Left contains one more item: 'client'
E Use -v to get more diff
But also... if the changes I'm proposing in dask.py are accepted, we wouldn't even need this test any more, in my opinion. It was only here to ensure that the 2 lists of keyword args (one in LGBMModel and one in the Dask estimators) were consistent.
I'd like to discuss removing this test as part of the other review conversation on this PR.
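For illustration, here is a minimal sketch of why `'client'` shows up as an extra item in `kwonlyargs`. The two classes are hypothetical stand-ins (they are not LightGBM's estimators), and the parameter names are only examples:

```python
import inspect


class SklearnLikeModel:
    """Hypothetical stand-in for an estimator with keyword-only constructor args."""

    def __init__(self, *, num_leaves=31, learning_rate=0.1):
        self.num_leaves = num_leaves
        self.learning_rate = learning_rate


class DaskLikeModel(SklearnLikeModel):
    """Hypothetical stand-in for a Dask estimator that adds a 'client' argument."""

    def __init__(self, *, num_leaves=31, learning_rate=0.1, client=None):
        super().__init__(num_leaves=num_leaves, learning_rate=learning_rate)
        self.client = client


sklearn_spec = inspect.getfullargspec(SklearnLikeModel.__init__)
dask_spec = inspect.getfullargspec(DaskLikeModel.__init__)

# With keyword-only constructors, only 'self' remains in .args ...
assert sklearn_spec.args == ["self"]
assert dask_spec.args == ["self"]
# ... and the extra 'client' argument appears at the end of .kwonlyargs.
assert dask_spec.kwonlyargs == [*sklearn_spec.kwonlyargs, "client"]
```

Once every constructor argument is keyword-only, comparing `args` and `defaults` says little, which is part of why the test may no longer be worth keeping.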
I recently saw a Stack Overflow post ("Why can't I wrap LGBM?") expressing the same concerns from #4426 ... it's difficult to sub-class lightgbm's scikit-learn estimators.
It doesn't have to be! Look how minimal the code is for XGBRFRegressor: https://github.com/dmlc/xgboost/blob/45009413ce9f0d2bdfcd0c9ea8af1e71e3c0a191/python-package/xgboost/sklearn.py#L1869
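As a rough sketch of the kind of pattern being referred to, the sub-class below overrides a single default and forwards everything else to its parent through `**kwargs`, while `get_params()` walks the class hierarchy to find parameters declared on parents. All class names here are hypothetical, and this is an assumption about the general approach, not the actual code in xgboost or in this PR:

```python
import inspect


class BaseModel:
    """Hypothetical parent estimator; names are illustrative only."""

    def __init__(self, *, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def get_params(self):
        # Collect keyword-only constructor argument names from every class in
        # the MRO, so parameters declared only on a parent are still reported.
        names = []
        for cls in type(self).__mro__:
            if cls is object:
                continue
            init = cls.__dict__.get("__init__")
            if init is None:
                continue
            for param in inspect.signature(init).parameters.values():
                is_kwonly = param.kind is inspect.Parameter.KEYWORD_ONLY
                if is_kwonly and param.name not in names:
                    names.append(param.name)
        return {name: getattr(self, name) for name in names}


class RandomForestLikeModel(BaseModel):
    """Sub-class that overrides one default and forwards everything else."""

    def __init__(self, *, learning_rate=1.0, **kwargs):
        super().__init__(learning_rate=learning_rate, **kwargs)


model = RandomForestLikeModel(n_estimators=10)
params = model.get_params()
```

The sub-class does not have to repeat the parent's full argument list, so it stays short even when the parent has many parameters.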
This PR proposes borrowing some patterns I learned while working on xgboost's scikit-learn estimators to make it easier to sub-class lightgbm estimators. This also has the nice side effect of simplifying the lightgbm.dask code 😁
Notes for Reviewers
Why make the breaking change of requiring keyword args?
As part of this PR, I'm proposing immediately switching the constructors for scikit-learn estimators here (including those in lightgbm.dask) to only supporting keyword arguments.
Why I'm proposing this instead of a deprecation cycle:
scikit-learn itself does this (HistGradientBoostingClassifier example)
I posted a related answer to that Stack Overflow question:
https://stackoverflow.com/a/79344862/3986677
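To make the effect of the breaking change concrete, here is a small sketch of a keyword-only constructor. The class is hypothetical and only mimics the shape of an estimator constructor:

```python
class KeywordOnlyEstimator:
    """Hypothetical estimator; the bare '*' makes every argument keyword-only."""

    def __init__(self, *, boosting_type="gbdt", num_leaves=31):
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves


# Keyword arguments work as before.
est = KeywordOnlyEstimator(num_leaves=15)

# Positional arguments now raise TypeError instead of being accepted.
try:
    KeywordOnlyEstimator("gbdt", 15)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Code that already passes constructor arguments by keyword is unaffected. Only positional calls would need to be updated.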