Why is `in` membership check searching in pd.Series keys rather than values

What a crazy gotcha I was hit with.

Before I thought of the recommended pandas way, s.isin, I used x in s. (Do not do this, it’s much slower than s.isin.)

I did not realize then that it was searching the keys of the series and not the values.
Only later did I realize that if I want to use map, lambda, and in, I have to call s.values to extract the series values first.

  1. Why is map, lambda, in (I don’t know which one, or which combination of them, is causing this behaviour) searching within the series keys?

I thought the right-hand side of an in operator just has to be an iterable. Running for i in series: print(i) to see what the series iterates over does print the values (1, 2, 3), so shouldn’t the membership check be happening among the values (1, 2, 3) and not the index (0, 1, 2)?

  2. How does series.isin work differently from map, lambda, in, so that it checks membership within the series values correctly?

import pandas as pd
s = pd.Series([1,2,3])
s
to_search = [1,2,3]
s.isin(to_search)
list(map(lambda x:x in s, to_search))   # WHY IS IT SEARCHING KEYS!
list(map(lambda x:x in s.values, to_search))

As for isin, that’s just how the method is defined. From the documentation:

Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

The rest is all down to in. Given two Python objects a and b, a in b is just a convenient way of writing b.__contains__(a), as long as b's class defines that method. This means that b.__contains__ defines the behavior of a in b.

Turning back to your example, 3 in s is short for s.__contains__(3).

>>> s.__contains__(3)
False

Why does this method behave like this? In the source code (or by running help(s.__contains__)) we can see the following:

def __contains__(self, key) -> bool_t:
    """True if the key is in the info axis"""
    return key in self._info_axis

So, this s._info_axis guy holds the information.

>>> s._info_axis
RangeIndex(start=0, stop=3, step=1)

The output above answers your question.
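
To make the difference concrete, here is a small sketch that reuses s from the question. The probe values 0 and 3 are my own choice (not from the original post), picked because 0 is only a label and 3 is only a value.

```python
import pandas as pd

s = pd.Series([1, 2, 3])   # labels 0, 1, 2 map to values 1, 2, 3
probes = [0, 3]            # assumption: 0 is only a label, 3 is only a value

# `in` on the Series itself checks the index labels
in_labels = [x in s for x in probes]

# `in` on the underlying NumPy array checks the values
in_values = [x in s.values for x in probes]

# isin tests each element of the calling Series against the passed values,
# so the probes go on the left-hand side here
isin_values = pd.Series(probes).isin(s.tolist()).tolist()

print(in_labels)    # [True, False]
print(in_values)    # [False, True]
print(isin_values)  # [False, True]
```

Note the direction: s.isin(to_search) asks "which elements of s are in to_search", while the map/lambda version asks "which elements of to_search are in s".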

This should be a non-issue by now, but for the sake of completeness, let’s make one more observation.

There are two different threads here:

  1. in uses obj.__contains__.
  2. A for loop uses obj.__iter__.

They’re two completely different things that are often in alignment, but not here:

>>> print(*s.__iter__())
1 2 3
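
The same split can be reproduced without pandas. The two classes below are hypothetical toys, not anything from the pandas source: the first defines __contains__ over labels and __iter__ over values, the second defines only __iter__, in which case Python falls back to iterating for in.

```python
class LabelledValues:
    """Toy stand-in for a Series: membership checks labels, iteration yields values."""

    def __init__(self, values):
        self._data = dict(enumerate(values))  # label -> value

    def __contains__(self, key):
        return key in self._data              # looks at the labels (dict keys)

    def __iter__(self):
        return iter(self._data.values())      # yields the values


class ValuesOnly:
    """No __contains__ here, so `in` falls back to iterating over the object."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)


lv = LabelledValues([1, 2, 3])
print(list(lv))   # [1, 2, 3]
print(3 in lv)    # False: 3 is a value, not a label
print(0 in lv)    # True: 0 is a label

vo = ValuesOnly([1, 2, 3])
print(3 in vo)    # True: found by iterating
```

So "the right-hand side just has to be an iterable" only holds for classes that leave __contains__ undefined.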

Thanks, is there a way to see the class and method inheritance chain?

I was wondering which __iter__ it is. There is an __iter__ in pandas\pandas\core\generic.py that iterates over the info axis, which is basically the index.

    def __iter__(self):
        """
        Iterate over info axis.

        Returns
        -------
        iterator
            Info axis as iterator.
        """
        return iter(self._info_axis)

Later, I looked at s.__iter__ and saw <bound method IndexOpsMixin.__iter__, and then I realized it’s another __iter__, defined in the mixin.

Is there an easier way to see the inheritance than printing obj.__dunder__?
Also, whenever I’m inside a method, I have to scroll up endlessly to find which class contains it, and I risk jumping into another class. Is there any way to jump out of, or collapse, the method at the current cursor position to find its class? (I’m using VS Code.)

Yeah, things like this are the reason why I presented the alternative of calling help, because it will show you true information and get this out of the way for you, even if it isn’t as insightful.

Given a class, you can find its methods by calling dir:

>>> dir(int)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']

So that covers listing the methods of a class. To find the ancestor classes we can use inspect.getmro in the following manner:

>>> import inspect
>>> print(*inspect.getmro(type(s)), sep="\n")
<class 'pandas.core.series.Series'>
<class 'pandas.core.base.IndexOpsMixin'>
<class 'pandas.core.generic.NDFrame'>
<class 'pandas.core.base.PandasObject'>
<class 'pandas.core.accessor.DirNamesMixin'>
<class 'pandas.core.base.SelectionMixin'>
<class 'pandas.core.indexing.IndexingMixin'>
<class 'object'>

Alternatively we can just inspect type(s).__mro__, but I think this isn’t as universally applicable (I may be wrong):

>>> type(s).__mro__
(<class 'pandas.core.series.Series'>, <class 'pandas.core.base.IndexOpsMixin'>, <class 'pandas.core.generic.NDFrame'>, <class 'pandas.core.base.PandasObject'>, <class 'pandas.core.accessor.DirNamesMixin'>, <class 'pandas.core.base.SelectionMixin'>, <class 'pandas.core.indexing.IndexingMixin'>, <class 'object'>)
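
To answer "which __iter__ is it" directly, one option is to walk the MRO and stop at the first class whose own namespace defines the name. This is only a sketch: defining_class is a helper name I made up, and Base, Mixin and Child are hypothetical classes standing in for the pandas ones.

```python
import inspect


def defining_class(cls, name):
    """Return the first class in cls's MRO that defines `name` itself, or None."""
    for klass in inspect.getmro(cls):
        if name in vars(klass):   # vars() sees only what this class defines directly
            return klass
    return None


# Hypothetical classes mirroring the situation above:
# two ancestors both define __iter__, and the mixin comes first in the MRO.
class Base:
    def __iter__(self):
        return iter(())


class Mixin:
    def __iter__(self):
        return iter((1, 2, 3))


class Child(Mixin, Base):
    pass


print(defining_class(Child, "__iter__").__name__)   # Mixin
print(Child.__iter__.__qualname__)                  # Mixin.__iter__
```

With pandas imported, defining_class(type(s), "__iter__") should point at the same IndexOpsMixin that the bound-method repr showed.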

As for the VS Code question, I don’t know. Maybe try asking at Super User. Please report back when you know the answer.
