What's the difference between str.split()[0] and str.split().str[0]?

Hi,

I was wondering what’s the difference between the two?

  1. df[col].str.split().str[0] and df[col].str.split()[0]
  2. df[col][df[col] == "condition"] and df[df[col] == "condition"][col]

Thanks

Hi @muktisetyaji9b320cc7,

Welcome to the Community!

Let’s imagine that we have the following dataframe:

df = pd.DataFrame({'Phrase': ['Santa Claus', 'Christmas Tree', 'Merry Christmas'], 
                   'Number': [1, 2, 3]})
print(df)

Output:

            Phrase  Number
0      Santa Claus       1
1   Christmas Tree       2
2  Merry Christmas       3

Let’s apply df[col].str.split().str[0] and df[col].str.split()[0] from your first question, but before doing so, we’ll also look at df[col].str.split() on its own:

print(df['Phrase'].str.split())
print('\n')
print(df['Phrase'].str.split().str[0])
print('\n')
print(df['Phrase'].str.split()[0])

Output:

0        [Santa, Claus]
1     [Christmas, Tree]
2    [Merry, Christmas]
Name: Phrase, dtype: object


0        Santa
1    Christmas
2        Merry
Name: Phrase, dtype: object


['Santa', 'Claus']

We see that:
df[col].str.split() splits each string in the column on whitespace and returns a Series in which every cell holds a list of strings,
df[col].str.split().str[0] takes the first item of that list for each cell and returns a Series of strings,
df[col].str.split()[0] returns the single value of that Series with index label 0 (here, the first row), i.e., the first list of strings.
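To make the difference between the last two points more visible, here is a minimal sketch. It assumes a hypothetical Series `s` with a non-default integer index (not part of the example above), so that the index label 0 and the first position no longer coincide:

```python
import pandas as pd

# Hypothetical Series with a non-default integer index, for illustration only
s = pd.Series(['Santa Claus', 'Christmas Tree', 'Merry Christmas'],
              index=[10, 20, 0])

words = s.str.split()

# .str[0] works element-wise: the first word of every row
first_words = words.str[0]
print(first_words.tolist())

# [0] on the Series looks up the index label 0, which is the last row here
label_zero = words[0]
print(label_zero)

# .iloc[0] selects by position, i.e., the first row regardless of its label
first_row = words.iloc[0]
print(first_row)
```

With the default index 0, 1, 2 from the dataframe above, the label 0 and the first position are the same row, which is why [0] returned ['Santa', 'Claus'] there.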

As for your second question, there is no difference between those pieces of code when they are used to read values:

print(df['Phrase'][df['Phrase'] == 'Christmas Tree'])
print('\n')
print(df[df['Phrase'] == 'Christmas Tree']['Phrase'])

Output:

1    Christmas Tree
Name: Phrase, dtype: object


1    Christmas Tree
Name: Phrase, dtype: object

Hi @Elena_Kosourova,

Sorry for the really late reply. By the way, thanks for the answer.

For the second question, though, that was my point: when we print the dataframe, there’s no difference at all. But when assigning a new value, only one of the two lines works. Here’s the snippet.

Code that is not working

df = pd.DataFrame({'Phrase': ['Santa Claus', 'Christmas Tree', 'Merry Christmas'], 'Number': [1, 2, 3]})
df[df['Phrase'] == 'Christmas Tree']['Phrase'] = "The Tree"
print(df)

Output:

            Phrase  Number
0      Santa Claus       1
1   Christmas Tree       2
2  Merry Christmas       3

Code that is working

df = pd.DataFrame({'Phrase': ['Santa Claus', 'Christmas Tree', 'Merry Christmas'], 'Number': [1, 2, 3]})
df['Phrase'][df['Phrase'] == 'Christmas Tree'] = "The Tree"
print(df)

Output:

            Phrase  Number
0      Santa Claus       1
1         The Tree       2
2  Merry Christmas       3

Thanks

Hi @muktisetyaji9b320cc7,

Yes, indeed, I’ve checked it now, and you’re right. When I run df[df['Phrase'] == 'Christmas Tree']['Phrase'] = "The Tree", the dataframe isn’t modified, and a SettingWithCopyWarning is raised. The reason is chained indexing: df[df['Phrase'] == 'Christmas Tree'] returns a new dataframe (a copy), so the assignment changes that temporary copy rather than df. This warning is not so innocent: even though the rest of the code keeps running (the program doesn’t stop, as it would with a SyntaxError, for example), the code can behave unexpectedly in such cases and lead to wrong results. You can read more about it here. Great catch, by the way, thank you!
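For the assignment itself, a common way to avoid chained indexing is a single .loc call that takes the row condition and the column name together. This is only a sketch on the same example dataframe; the variable name `mask` is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Phrase': ['Santa Claus', 'Christmas Tree', 'Merry Christmas'],
                   'Number': [1, 2, 3]})

# Boolean mask marking the rows to change
mask = df['Phrase'] == 'Christmas Tree'

# Rows and column are selected in one .loc call, so the assignment
# acts on df itself rather than on a temporary copy
df.loc[mask, 'Phrase'] = 'The Tree'
print(df)
```

Note that df['Phrase'][df['Phrase'] == 'Christmas Tree'] = "The Tree" is also a chained assignment. It happened to modify df in your run, but that behavior is not guaranteed across pandas versions, so the .loc form is the safer choice.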