Output of the above code:
0 [Intel, Iris, Plus, Graphics, 640]
1 [Intel, HD, Graphics, 6000]
2 [Intel, HD, Graphics, 620]
3 [AMD, Radeon, Pro, 455]
4 [Intel, Iris, Plus, Graphics, 650]
The Series.str accessor is defined for a series of strings, but in this case it’s a series of lists. Why does
laptops["gpu"].head().str.split().str still run as expected and produce output like the one shown?
I was expecting a
Series.list accessor, and our code would look like
I’m not too sure but it’s probably an artifact from pandas not having a dedicated string type in the past.
Excerpt from the documentation:
There are two ways to store text data in pandas:
- object-dtype NumPy array.
- StringDtype extension type.
We recommend using StringDtype to store text data.
Prior to pandas 1.0, object dtype was the only option. This was unfortunate for many reasons:
- You can accidentally store a mixture of strings and non-strings in an object dtype array. It’s better to have a dedicated dtype.
- object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text but still object-dtype columns.
- When reading code, the contents of an object dtype array is less clear than 'string'.
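To make the point about object dtype concrete, here is a small sketch of my own (not from the documentation). The sample strings are made up, and dtype=object is forced so the result does not depend on which default string dtype your pandas version uses. It suggests why pandas cannot easily tell the two cases apart: a column of strings and a column of lists can both be stored as object dtype.

```python
import pandas as pd

# Hypothetical sample data; dtype=object is set explicitly on purpose.
text = pd.Series(["Intel HD Graphics 620", "AMD Radeon Pro 455"], dtype=object)

# Splitting on whitespace turns every element into a Python list.
tokens = text.str.split()

print(text.dtype)            # dtype of the column of strings
print(tokens.dtype)          # dtype of the column of lists
print(type(tokens.iloc[0]))  # type of a single element after the split
```

Both columns report the same dtype here, so the dtype alone does not say whether the elements are strings or lists.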
And also this:
Before v.0.25.0, the .str-accessor did only the most rudimentary type checks. Starting with v.0.25.0, the type of the Series is inferred and the allowed types (i.e. strings) are enforced more rigorously.
Generally speaking, the .str accessor is intended to work only on strings. With very few exceptions, other uses are not supported, and may be disabled at a later point.
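As a rough illustration of that type inference, here is a sketch under a few assumptions: the sample values are made up, and I am using pandas.api.types.infer_dtype only to peek at how pandas classifies the contents, not claiming it is exactly what the accessor does internally. A numeric Series is rejected by .str, while a Series of lists is not classified as plain strings and yet is still accepted.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical sample data for illustration only.
strings = pd.Series(["a b", "c d"], dtype=object)
lists = strings.str.split()        # each element is a list
numbers = pd.Series([1, 2, 3])

print(infer_dtype(strings))  # how pandas classifies a column of strings
print(infer_dtype(lists))    # how pandas classifies a column of lists

# The accessor is refused outright on a numeric Series.
try:
    numbers.str
    numbers_rejected = False
except AttributeError as exc:
    numbers_rejected = True
    print(exc)

# On the Series of lists the accessor is still available.
lengths = lists.str.len()
print(lengths.tolist())
```

So the check is stricter than it used to be, but list elements still pass through it, which fits the "very few exceptions" wording in the quote.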
When we call str, I believe it’s a shorthand for
If you read the documentation for str.get, you’ll see that it specifies explicitly that it works for lists:
Extract element from each component at specified position.
Extract element from lists, tuples, or strings in each element in the Series/Index.
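Here is a minimal sketch of that behaviour. I rebuilt the input strings by joining the tokens from the output above with spaces, so treat them as an assumption about the original laptops["gpu"] column, and the variable names are mine. It compares str.get with indexing on the accessor for a series of lists.

```python
import pandas as pd

# Assumed reconstruction of the first five values of laptops["gpu"].
gpu = pd.Series([
    "Intel Iris Plus Graphics 640",
    "Intel HD Graphics 6000",
    "Intel HD Graphics 620",
    "AMD Radeon Pro 455",
    "Intel Iris Plus Graphics 650",
])

tokens = gpu.str.split()          # series of lists
first_get = tokens.str.get(0)     # first element of each list via str.get
first_idx = tokens.str[0]         # same idea, using indexing on the accessor

print(first_get.tolist())
print(first_idx.tolist())
```

Both calls pull the manufacturer out of each list, so for this data the indexing form and str.get give the same values.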