Why can we use the Series.str accessor on a Series of lists?

In 293-8
laptops["gpu"].head().str.split()
Output of the above code:

0       [Intel, Iris, Plus, Graphics, 640]
1              [Intel, HD, Graphics, 6000]
2               [Intel, HD, Graphics, 620]
3                  [AMD, Radeon, Pro, 455]
4       [Intel, Iris, Plus, Graphics, 650]

The Series.str accessor is meant for a Series of strings, but here it is applied to a Series of lists. Why does laptops["gpu"].head().str.split().str[0] still work as expected and produce the following output?

0       Intel
1       Intel
2       Intel
3         AMD
4       Intel

I expected to need a Series.list accessor, so the code would look like laptops["gpu"].head().str.split().list[0].
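A minimal reproduction, assuming a hand-typed Series (the name gpu is hypothetical) that stands in for laptops["gpu"].head(), since the laptops dataset isn't included here:

```python
import pandas as pd

# Hypothetical stand-in for laptops["gpu"].head(); values copied from the output above
gpu = pd.Series([
    "Intel Iris Plus Graphics 640",
    "Intel HD Graphics 6000",
    "Intel HD Graphics 620",
    "AMD Radeon Pro 455",
    "Intel Iris Plus Graphics 650",
])

# .str.split() turns each string into a Python list of words
parts = gpu.str.split()
print(type(parts[0]).__name__)  # list

# .str[0] then picks the first element of each list
first = parts.str[0]
print(first.tolist())  # ['Intel', 'Intel', 'Intel', 'AMD', 'Intel']
```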


I’m not entirely sure, but it’s probably an artifact of pandas not having a dedicated string type in the past.

Excerpt from the pandas documentation:

There are two ways to store text data in pandas:

  1. object-dtype NumPy array.
  2. StringDtype extension type.

We recommend using StringDtype to store text data.

Prior to pandas 1.0, object dtype was the only option. This was unfortunate for many reasons:

  1. You can accidentally store a mixture of strings and non-strings in an object dtype array. It’s better to have a dedicated dtype.
  2. object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text but still object-dtype columns.
  3. When reading code, the contents of an object dtype array is less clear than 'string'.
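To make the object-vs-string distinction concrete, here is a small sketch; the Series contents are made up for illustration:

```python
import pandas as pd

# object dtype: pandas does not check what the elements are, so text and numbers mix freely
mixed = pd.Series(["Intel", 640, 3.5], dtype="object")
print([type(x).__name__ for x in mixed])  # ['str', 'int', 'float']

# "string" dtype (StringDtype): a dedicated dtype that makes the intent explicit
text = pd.Series(["Intel", "AMD"], dtype="string")
print(mixed.dtype == object)                   # True
print(isinstance(text.dtype, pd.StringDtype))  # True
```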

And also this:

Before v.0.25.0, the .str-accessor did only the most rudimentary type checks. Starting with v.0.25.0, the type of the Series is inferred and the allowed types (i.e. strings) are enforced more rigorously.

Generally speaking, the .str accessor is intended to work only on strings. With very few exceptions, other uses are not supported, and may be disabled at a later point.
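A sketch of that stricter check, using hypothetical numbers and lists Series. My understanding, which is an assumption worth verifying against your pandas version, is that the accessor infers the element type (as pandas.api.types.infer_dtype does) and accepts the "mixed" result, which is what a Series of lists is inferred as. That would explain why lists get through while integers are rejected:

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Integers: accessing .str raises AttributeError
numbers = pd.Series([640, 6000, 620])
try:
    numbers.str.upper()
    rejected = False
except AttributeError:
    rejected = True
print("integers rejected:", rejected)  # integers rejected: True

# Lists: inferred as "mixed" (assumed to be an allowed type, see above), so .str works
lists = pd.Series([["Intel", "HD"], ["AMD", "Radeon"]])
print(infer_dtype(lists))     # mixed
print(lists.str[0].tolist())  # ['Intel', 'AMD']
```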

When we call .str[0], I believe it’s shorthand for .str.get(0).
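A quick way to check the shorthand on a couple of made-up GPU names. In the pandas versions I know, an integer key inside .str[...] is handled like .str.get and a slice key like .str.slice; treat that routing as an observation rather than documented API:

```python
import pandas as pd

parts = pd.Series(["Intel HD Graphics 620", "AMD Radeon Pro 455"]).str.split()

# .str[0] and .str.get(0) return the same values
same = parts.str[0].equals(parts.str.get(0))
print(same)  # True

# A slice inside [] slices each list instead of picking one element
print(parts.str[:2].tolist())  # [['Intel', 'HD'], ['AMD', 'Radeon']]
```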

If you read the documentation for Series.str.get, you’ll see that it explicitly says it works on lists:

Series.str.get(i)

Extract element from each component at specified position.

Extract element from lists, tuples, or strings in each element in the Series/Index.
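Checking that docstring against an object Series holding a string, a list, and a tuple (hypothetical data):

```python
import pandas as pd

# One object-dtype Series mixing a string, a list, and a tuple
s = pd.Series(["Intel", ["AMD", "Radeon"], ("Intel", "HD")], dtype=object)

# Position 0 of each element: a character, a list item, a tuple item
print(s.str.get(0).tolist())  # ['I', 'AMD', 'Intel']

# Positions that don't exist give missing values instead of raising IndexError
print(s.str.get(5).isna().all())  # True
```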
