[python-package] support sub-classing scikit-learn estimators #6783

Open: wants to merge 9 commits into base: master

Conversation

jameslamb
Collaborator

@jameslamb jameslamb commented Jan 10, 2025

I recently saw a Stack Overflow post ("Why can't I wrap LGBM?") raising the same concern as #4426: it's difficult to sub-class lightgbm's scikit-learn estimators.

It doesn't have to be! Look how minimal the code is for XGBRFRegressor:

https://github.com/dmlc/xgboost/blob/45009413ce9f0d2bdfcd0c9ea8af1e71e3c0a191/python-package/xgboost/sklearn.py#L1869

This PR proposes borrowing some patterns I learned while working on xgboost's scikit-learn estimators to make it easier to sub-class lightgbm estimators. This also has the nice side effect of simplifying the lightgbm.dask code 😁
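To make the general idea concrete, here is a rough sketch of one way a sub-class-friendly estimator could be laid out. This is not the code in this PR or in xgboost: `BaseModelSketch`, `RandomForestSketch`, and `_other_params` are hypothetical names used only for illustration. The assumption is that each class declares its own keyword-only parameters, forwards everything else through `**kwargs`, and that `get_params()` walks the class hierarchy to collect them.

```python
import inspect


class BaseModelSketch:
    """Hypothetical stand-in for a scikit-learn-style estimator (not lightgbm code)."""

    def __init__(self, *, n_estimators=100, learning_rate=0.1, **kwargs):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        # Anything not declared explicitly is kept as an extra parameter.
        self._other_params = dict(kwargs)

    def get_params(self):
        # Collect keyword-only parameter names from every __init__ in the MRO,
        # so parameters declared by a parent are visible on a sub-class too.
        params = {}
        for klass in type(self).__mro__:
            if klass is object:
                continue
            init = klass.__dict__.get("__init__")
            if init is None:
                continue
            for name, param in inspect.signature(init).parameters.items():
                if param.kind is inspect.Parameter.KEYWORD_ONLY:
                    params[name] = getattr(self, name)
        params.update(self._other_params)
        return params


class RandomForestSketch(BaseModelSketch):
    """A sub-class only declares what it adds and forwards the rest."""

    def __init__(self, *, subsample=0.8, **kwargs):
        super().__init__(**kwargs)
        self.subsample = subsample


model = RandomForestSketch(n_estimators=10, subsample=0.5, extra=1)
params = model.get_params()
```

With this layout the sub-class does not need to repeat the parent's full argument list, which is the property that makes something like `XGBRFRegressor` so short.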

Notes for Reviewers

Why make the breaking change of requiring keyword args?

As part of this PR, I'm proposing immediately switching the constructors for scikit-learn estimators here (including those in lightgbm.dask) to support only keyword arguments.

Why I'm proposing this instead of a deprecation cycle:

import lightgbm as lgb
lgb.LGBMClassifier("gbdt")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: LGBMClassifier.__init__() takes 1 positional argument but 2 were given
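The behavior above can be tried without lightgbm installed. The following is a minimal sketch using a stand-in class: `KeywordOnlySketch` and `try_positional` are hypothetical, and the parameter names are borrowed from lightgbm purely for illustration. The bare `*` in the signature is what makes every parameter after it keyword-only.

```python
class KeywordOnlySketch:
    """Hypothetical stand-in for an estimator with a keyword-only constructor."""

    def __init__(self, *, boosting_type="gbdt", num_leaves=31):
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves


def try_positional():
    """Return the TypeError message raised by a positional call, or None."""
    try:
        KeywordOnlySketch("gbdt")
    except TypeError as err:
        return str(err)
    return None


message = try_positional()
ok = KeywordOnlySketch(boosting_type="dart")
```

Passing the same value by name works as before, so only code that relied on argument order is affected.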

I posted a related answer to that Stack Overflow question

https://stackoverflow.com/a/79344862/3986677

@jameslamb jameslamb changed the title from "WIP: [python-package] support sub-classing scikit-learn estimators" to "[python-package] support sub-classing scikit-learn estimators" Jan 11, 2025
@jameslamb jameslamb marked this pull request as ready for review January 11, 2025 05:06
assert dask_spec.args[:-1] == sklearn_spec.args
assert dask_spec.defaults[:-1] == sklearn_spec.defaults
assert dask_spec.args[-1] == "client"
assert dask_spec.kwonlyargs == [*sklearn_spec.kwonlyargs, "client"]
Collaborator Author


Made these changes based on this test failure:

>       assert dask_spec.kwonlyargs == sklearn_spec.kwonlyargs
E       AssertionError: assert ['client'] == []
E         
E         Left contains one more item: 'client'
E         Use -v to get more diff

(build link)

But also... if the changes I'm proposing in dask.py are accepted, we wouldn't even need to have this test any more, in my opinion. It was just here to ensure the 2 lists of keyword args (one in LGBMModel and one in the Dask estimators) were consistent.

I'd like to discuss removing this test as part of the other review conversation on this PR.
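For anyone following along, here is a small sketch of how `inspect.getfullargspec()` reports keyword-only constructors. It does not reproduce the exact failure above. `SklearnSketch` and `DaskSketch` are hypothetical stand-ins, and the only assumption carried over from the test is that the Dask variant adds a trailing `client` argument.

```python
import inspect


class SklearnSketch:
    """Hypothetical stand-in for a scikit-learn estimator constructor."""

    def __init__(self, *, boosting_type="gbdt", num_leaves=31):
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves


class DaskSketch:
    """Hypothetical stand-in for the Dask variant, which adds `client`."""

    def __init__(self, *, boosting_type="gbdt", num_leaves=31, client=None):
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves
        self.client = client


sklearn_spec = inspect.getfullargspec(SklearnSketch.__init__)
dask_spec = inspect.getfullargspec(DaskSketch.__init__)

# With keyword-only constructors, `args` holds only `self` and `defaults`
# is None; the parameter names move to `kwonlyargs` / `kwonlydefaults`.
same_positional = dask_spec.args == sklearn_spec.args
extra_kwonly = [a for a in dask_spec.kwonlyargs if a not in sklearn_spec.kwonlyargs]
```

This is why the comparison has to be made on `kwonlyargs` rather than on `args` and `defaults` once the constructors are keyword-only.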
