[python-package] Add `feature_names_in_` attribute for scikit-learn estimators (fixes #6279) #6310
There is quite a lot of repetition in these tests. Generally I'm supportive of repetition in tests when it makes it easier to diagnose issues or to selectively include / exclude tests... but for the cases of `.feature_names_in_` and `.get_feature_names_out()`, I think the tests should be combined.

So can you please reduce these 4 tests down to 2? One for `numpy` input without feature names, one for `pandas` input with feature names? Ending with assertions like this:
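A minimal sketch of the kind of end-of-test assertions meant here, assuming the `pandas` input's column names round-trip into `feature_names_in_`; the data setup, variable names, and the commented `get_feature_names_out()` check are illustrative, not taken from the PR:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_digits

import lightgbm as lgb

X, y = load_digits(n_class=2, return_X_y=True)
feature_names = [f"feat_{i}" for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=feature_names)

model = lgb.LGBMClassifier(n_estimators=2, num_leaves=7).fit(df, y)

# assertions of the kind the review asks the combined tests to end with
np.testing.assert_array_equal(model.feature_names_in_, feature_names)
# if the PR also exposes get_feature_names_out(), the same check would apply:
# np.testing.assert_array_equal(model.get_feature_names_out(), feature_names)
```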
Instead of using a code comment, could you please test for this directly? That'd ensure that if `load_digits()` behavior around feature names ever changes, this test will fail and alert us, instead of silently passing or maybe failing in some other hard-to-understand way.
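For example, if the code comment was documenting that `load_digits()` hands back a plain NumPy array with no feature names attached, a direct check could look like this (a hedged sketch; exactly which property that comment described is an assumption):

```python
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(n_class=2, return_X_y=True)

# assert the assumption directly rather than stating it in a comment:
# the input is a plain ndarray, so no feature names travel with it
assert isinstance(X, np.ndarray)
assert not hasattr(X, "columns")
```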
Can you please extend these tests to cover all 4 estimators (`LGBMModel`, `LGBMClassifier`, `LGBMRegressor`, `LGBMRanker`)? I know that those last 3 inherit from `LGBMModel`, but if someone were to change how this attribute is set for, say, `LGBMClassifier` only, in a way that breaks this behavior, we'd want a failing test to alert us to that.

Follow the same pattern used in the existing test right above these, `test_check_is_fitted()`, using the same data for all of the estimators.
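A rough sketch of covering all four estimator classes with the same data, in the spirit of `test_check_is_fitted()`; the data setup and the final assertion are assumptions, not the PR's actual test body:

```python
from sklearn.datasets import load_digits

import lightgbm as lgb

X, y = load_digits(n_class=2, return_X_y=True)

models = (
    lgb.LGBMModel(n_estimators=2, num_leaves=7, objective="binary"),
    lgb.LGBMClassifier(n_estimators=2, num_leaves=7),
    lgb.LGBMRegressor(n_estimators=2, num_leaves=7),
    lgb.LGBMRanker(n_estimators=2, num_leaves=7),
)
for model in models:
    if isinstance(model, lgb.LGBMRanker):
        # rankers need query groups; one group covering every row is enough here
        model.fit(X, y, group=[X.shape[0]])
    else:
        model.fit(X, y)
    # exact expected names depend on the PR; at minimum the count should match
    assert len(model.feature_names_in_) == X.shape[1]
```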
Instead of this for-loop approach, could you please change these tests to parameterize over classes, like this?

LightGBM/tests/python_package_test/test_sklearn.py, Line 1279 in 4401401

That'd reduce the risk of mistakes like this one (where only the `LGBMModel` instance, `est`, is being tested).
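A hedged sketch of the parametrized form being suggested (the test name, data setup, and assertion are illustrative, not copied from the referenced file):

```python
import pytest
from sklearn.datasets import load_digits

import lightgbm as lgb


@pytest.mark.parametrize(
    "estimator_class",
    [lgb.LGBMModel, lgb.LGBMClassifier, lgb.LGBMRegressor, lgb.LGBMRanker],
)
def test_feature_names_in_(estimator_class):
    X, y = load_digits(n_class=2, return_X_y=True)
    params = {"n_estimators": 2, "num_leaves": 7}
    if estimator_class is lgb.LGBMModel:
        # a bare LGBMModel needs an explicit objective
        params["objective"] = "binary"
    model = estimator_class(**params)
    if estimator_class is lgb.LGBMRanker:
        model.fit(X, y, group=[X.shape[0]])
    else:
        model.fit(X, y)
    assert len(model.feature_names_in_) == X.shape[1]
```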
Since the thing being tested in this PR isn't really dependent on the content of the learned model, could you please use `n_estimators=2` and `num_leaves=7` in all the tests? That'd make the tests slightly faster and cheaper without reducing their effectiveness in detecting issues.
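For instance (a sketch of the requested settings, not the exact diff):

```python
import lightgbm as lgb

# small, fast settings; the learned model's content doesn't matter for these tests
model = lgb.LGBMClassifier(n_estimators=2, num_leaves=7)
```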
For simplicity, please just treat all samples in `X` as part of a single query group. LightGBM supports that, and it won't materially change the effectiveness of these tests.
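A minimal sketch of that single-group setup for `LGBMRanker` (the data here is illustrative):

```python
from sklearn.datasets import load_digits

import lightgbm as lgb

X, y = load_digits(n_class=2, return_X_y=True)

# one query group containing every row; group sizes must sum to len(X)
ranker = lgb.LGBMRanker(n_estimators=2, num_leaves=7)
ranker.fit(X, y, group=[X.shape[0]])
```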