Base on this dataframe:
Data_usage.csv (2.9 KB)
is a (47, 34)
What I’m trying to achieve is:
Sum all the X for each Company, only able to get this:
n = len(data[data.iloc[0]])
ValueError: cannot index with vector containing NA / NaN values
I drop the NaN
value like this:
import numpy as np
data1 = data.replace(np.nan, '', regex=True)
data1.head()
My question is:
My main question is how to count all the x for each company like Facebook, I tried with a specific column: single_row = data.loc["Facebook"].value_counts()
single_row = 24
what I’m looking for is to have the number of x for each company, and the % to be saved in the new column.
Since the values are Str I don’t know how to do it?
1 Like
Hi infinetw
In the original DataFrame do you have only NaN
and x
for data values?
@infinetw Please provide sample data frame in text format instead of image so we can reproduce same quickly on our side.
Yes, because it Has Blank spaces and X
When I pasted from Jupyter lost the format and is hard to read I upload the csv to the question.
In that case, you can replace blank spaces with NaN
.
You can use following code to find out the count of missing values in each row.
df.isnull().sum(axis=1)
Otherwise, following can be used to find the count of non-null values
df.notnull().sum(axis=1)
Hope this helps.
Thanks.
Thanks for your answer, but how I count all the X values for each company?
Your dataset has either NaN
or x
for values.
Above code will give you the count of Null and Not null values for each row. So, the count of not null values is the count of x
in the row.
You can use Company
column as the index of the DataFrame, you’ll get the count of x
for that company.
data = [['Facebook', np.NaN, 'x', np.NaN], ['Instagram', np.NaN, np.NaN, 'x'],
['Tinder', 'x', 'x', 'x'], ['Grindr', np.NaN, 'x', 'x'],
['Uber', 'x', np.NaN,'x']]
cols = ['Company', '% of data', 'email','Age']
df = pd.DataFrame(data, columns=cols)
df.set_index('Company',inplace=True)
df.notnull().sum(axis=1)
Company
Facebook 1
Instagram 1
Tinder 3
Grindr 2
Uber 2
dtype: int64
Hope this helps.
Thanks.
You can use lambda
here
df["New_col"] = df.apply(lambda x: sum(x == 'x'), axis=1)
It will apply lambda function row wise and compare each cell of row that is it x
or not?.
Great!! how I used the lambda result to divide the number of x with the number of fields (34) to get the %
You can do
df["New_col"] = df.apply(lambda x: sum(x == 'x'), axis=1) / 34
PS: This should work i have not tested.