BUG: Fixed TypeError for Series.isin() when large series and values contains NA (#60678) #60736
+36
−0
`doc/source/whatsnew/v3.0.0.rst` file if fixing a bug or adding a new feature.

Issue
`Series.isin()` raises `TypeError: boolean value of NA is ambiguous` when the Series is large enough (>1_000_000) and `values` contains `NA`.
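A minimal reproduction sketch of the report above. The helper name `try_isin`, the `boolean` dtype, and the exact row count are illustrative assumptions; whether the large case raises depends on whether the installed pandas already includes this fix.

```python
import numpy as np
import pandas as pd


def try_isin(n_rows, values):
    """Hypothetical helper: report whether Series.isin succeeds or raises TypeError."""
    # Build an alternating True/False Series of the requested length (no NA in the data).
    ser = pd.Series(np.resize(np.array([True, False]), n_rows), dtype="boolean")
    try:
        ser.isin(values)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


# A small Series takes the regular code path.
print(try_isin(10, [True, pd.NA]))
# A Series with more than 1_000_000 rows takes the path described in this PR;
# without the fix it is expected to report the TypeError.
print(try_isin(1_000_001, [True, pd.NA]))
```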
Reason
`Series.isin()` internally uses `np.isin()` for a large Series with a small number of values to improve performance, but it does not handle the case where `values` has `dtype=object` and contains `NA`, and passes it to `np.isin` unchanged.
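The reason an `NA` inside `values` is hazardous for an equality-based lookup can be shown with scalars alone. This is only an illustration of `pd.NA` semantics, not the code path inside `np.isin`; the variable names are chosen for the example.

```python
import pandas as pd

# Comparing anything with pd.NA propagates NA instead of returning True or False.
comparison = pd.NA == 1
print(comparison is pd.NA)

# Coercing that result to a plain boolean is what fails.
try:
    bool(comparison)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

Any routine that needs a definite True or False from each `==` comparison will hit this error as soon as one operand is `pd.NA`.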
pandas/pandas/core/algorithms.py
Lines 543 to 568 in a4e8149
`np.isin()` internally calls `in1d()`, which compares elements against `NA` using `==`. This raises `TypeError` because the boolean value of `pd.NA` is ambiguous; refer to the docs.

Fix Implemented
Explicitly check whether `values` contains `NA` when the Series is large and the number of values is small (<= 26), and avoid using `np.isin` in `algorithms.py` in that case.

Testing
All existing test cases in `test_isin.py` pass, with tests added for large Series with `dtype` `boolean`, `Int64`, and `Float64`, as follows:

- `dtype==boolean` and `values` contains `pd.NA`
- `dtype==boolean` and `values` contains mixed data with `pd.NA`
- `dtype==boolean` and `values` is empty
- `dtype==Int64` and `values` contains `pd.NA`
- `dtype==Float64` and `values` contains `pd.NA`
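The cases listed above could be exercised roughly as follows. This is a sketch under stated assumptions, not the tests added in this PR: `CASES`, `build_series`, `check_case`, and `LARGE` are hypothetical names, the data patterns are invented for illustration, and "mixed data" is assumed to mean a regular value alongside `pd.NA`.

```python
import numpy as np
import pandas as pd

LARGE = 1_000_001  # assumption: just above the size mentioned in the issue

# (dtype, repeating data pattern, values passed to isin); all illustrative.
CASES = [
    ("boolean", [True, False], [pd.NA]),
    ("boolean", [True, False], [True, pd.NA]),  # assumed meaning of "mixed data"
    ("boolean", [True, False], []),
    ("Int64", [1, 2], [pd.NA]),
    ("Float64", [1.0, 2.0], [pd.NA]),
]


def build_series(dtype, pattern, size):
    """Repeat `pattern` up to `size` elements and cast to a nullable dtype."""
    return pd.Series(np.resize(np.array(pattern), size), dtype=dtype)


def check_case(dtype, pattern, values, size):
    """Compare Series.isin against a plain-Python membership check.

    The Series holds no NA, so an NA in `values` should match nothing.
    """
    ser = build_series(dtype, pattern, size)
    non_na_values = [v for v in values if v is not pd.NA]
    data = np.resize(np.array(pattern), size).tolist()
    expected = [x in non_na_values for x in data]
    return ser.isin(values).tolist() == expected
```

Calling `check_case(dtype, pattern, values, LARGE)` for each entry in `CASES` targets the large-Series path; on a pandas build without this fix, the cases whose `values` contain `pd.NA` are expected to raise `TypeError` instead of returning.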