Hi,
I was wondering what the difference is between the two expressions in each of these pairs:
- df[col].str.split().str[0] and df[col].str.split()[0]
- df[col][df[col] == "condition"] and df[df[col] == "condition"][col]
Thanks
Welcome to the Community!
Let’s imagine that we have the following dataframe:
import pandas as pd

df = pd.DataFrame({'Phrase': ['Santa Claus', 'Christmas Tree', 'Merry Christmas'],
                   'Number': [1, 2, 3]})
print(df)
Output:
            Phrase  Number
0      Santa Claus       1
1   Christmas Tree       2
2  Merry Christmas       3
Let's apply df[col].str.split().str[0] and df[col].str.split()[0] from your first question. Before doing so, we'll also apply df[col].str.split() on its own:
print(df['Phrase'].str.split())
print('\n')
print(df['Phrase'].str.split().str[0])
print('\n')
print(df['Phrase'].str.split()[0])
Output:
0        [Santa, Claus]
1     [Christmas, Tree]
2    [Merry, Christmas]
Name: Phrase, dtype: object


0        Santa
1    Christmas
2        Merry
Name: Phrase, dtype: object


['Santa', 'Claus']
We see that:
- df[col].str.split() splits each string in the column on whitespace and returns a list of strings for each cell,
- df[col].str.split().str[0] takes the first item of each cell's list and returns it as a string, so the result is still a Series,
- df[col].str.split()[0] returns the value at index label 0 of the resulting Series, which here is the first list of strings.
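To make the distinction more visible, here is a small sketch that uses the same data but with a non-default integer index (the index values 10, 20, 30 and the variable names are my own, chosen only for illustration). It shows that .str[0] works element-wise on every row, while plain [ ] on the Series looks up an index label, so .iloc[0] is the safer way to ask for "the first row":

```python
import pandas as pd

# Hypothetical variant of the example dataframe with a non-default integer index.
df = pd.DataFrame(
    {"Phrase": ["Santa Claus", "Christmas Tree", "Merry Christmas"],
     "Number": [1, 2, 3]},
    index=[10, 20, 30],
)

words = df["Phrase"].str.split()        # Series of lists, one list per row

first_words = words.str[0]              # element-wise: first item of each list
first_row_by_position = words.iloc[0]   # first row, whatever the index labels are
first_row_by_label = words[10]          # plain [] on an integer index looks up the label 10

print(first_words.tolist())
print(first_row_by_position)
print(first_row_by_label)
```

With this index, words[0] would raise a KeyError, because there is no row labelled 0.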
As for your second question, both pieces of code return the same result when you only select values:
print(df['Phrase'][df['Phrase'] == 'Christmas Tree'])
print('\n')
print(df[df['Phrase'] == 'Christmas Tree']['Phrase'])
Output:
1    Christmas Tree
Name: Phrase, dtype: object


1    Christmas Tree
Name: Phrase, dtype: object
Hi @Elena_Kosourova,
Sorry for the really late reply. Btw, thanks for the answer.
For the second question though, that was my point. When we print the selection, there's no difference at all. But when assigning a new value, only one of the two lines of code works. Here's the snippet.
Code that is not working
df = pd.DataFrame({'Phrase': ['Santa Claus', 'Christmas Tree', 'Merry Christmas'], 'Number': [1, 2, 3]})
df[df['Phrase'] == 'Christmas Tree']['Phrase'] = "The Tree"
print(df)
Output:
            Phrase  Number
0      Santa Claus       1
1   Christmas Tree       2
2  Merry Christmas       3
Code that is working
df = pd.DataFrame({'Phrase': ['Santa Claus', 'Christmas Tree', 'Merry Christmas'], 'Number': [1, 2, 3]})
df['Phrase'][df['Phrase'] == 'Christmas Tree'] = "The Tree"
print(df)
Output:
            Phrase  Number
0      Santa Claus       1
1         The Tree       2
2  Merry Christmas       3
Thanks
Yes, indeed, I checked it now, and you're right. When I run df[df['Phrase'] == 'Christmas Tree']['Phrase'] = "The Tree", the dataframe isn't modified, and a SettingWithCopyWarning is raised. This warning is not as harmless as it looks: even though the rest of the code keeps running (the program doesn't stop, as it would with a SyntaxError, for example), the code can behave unexpectedly in such cases and lead to wrong results. You can read more about it here. Great catch, by the way, thank you!
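For completeness, here is a minimal sketch of how this kind of assignment is usually written so that it doesn't depend on chained indexing. Selecting rows with a boolean mask returns a copy, so in the non-working version the new value lands in a temporary object. Passing the row mask and the column name to .loc in a single step assigns into the original dataframe instead. The variable name mask is my own:

```python
import pandas as pd

df = pd.DataFrame({'Phrase': ['Santa Claus', 'Christmas Tree', 'Merry Christmas'],
                   'Number': [1, 2, 3]})

# Boolean mask marking the rows to update.
mask = df['Phrase'] == 'Christmas Tree'

# One indexing step: rows chosen by the mask, column chosen by name.
df.loc[mask, 'Phrase'] = "The Tree"

print(df)
```

As far as I know, the df['Phrase'][mask] = ... form is also chained assignment, and with Copy-on-Write enabled in newer pandas versions it is expected to stop updating df as well, so .loc is the form I would rely on.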