Sum all string in a Pandas Dataframe

Base on this dataframe:

Data_usage.csv (2.9 KB)

is a (47, 34)

What I’m trying to achieve is:
Sum all the X for each Company, only able to get this:

n = len(data[data.iloc[0]])
ValueError: cannot index with vector containing NA / NaN values

I drop the NaN value like this:

import numpy as np
data1 = data.replace(np.nan, '', regex=True)

My question is:
My main question is how to count all the x for each company like Facebook, I tried with a specific column: single_row = data.loc["Facebook"].value_counts() single_row = 24

what I’m looking for is to have the number of x for each company, and the % to be saved in the new column.
Since the values are Str I don’t know how to do it?

1 Like

Hi infinetw

In the original DataFrame do you have only NaN and x for data values?

@infinetw Please provide sample data frame in text format instead of image so we can reproduce same quickly on our side.

Yes, because it Has Blank spaces and X

When I pasted from Jupyter lost the format and is hard to read I upload the csv to the question.

In that case, you can replace blank spaces with NaN.
You can use following code to find out the count of missing values in each row.

Otherwise, following can be used to find the count of non-null values

Hope this helps.

Thanks for your answer, but how I count all the X values for each company?

Your dataset has either NaN or x for values.
Above code will give you the count of Null and Not null values for each row. So, the count of not null values is the count of x in the row.
You can use Company column as the index of the DataFrame, you’ll get the count of x for that company.

data = [['Facebook', np.NaN, 'x', np.NaN], ['Instagram', np.NaN, np.NaN, 'x'],
        ['Tinder', 'x', 'x', 'x'], ['Grindr', np.NaN, 'x', 'x'],
        ['Uber', 'x', np.NaN,'x']]
cols = ['Company', '% of data', 'email','Age']
df = pd.DataFrame(data, columns=cols)
Facebook     1
Instagram    1
Tinder       3
Grindr       2
Uber         2
dtype: int64

Hope this helps.

You can use lambda here

df["New_col"] = df.apply(lambda x: sum(x == 'x'), axis=1)

It will apply lambda function row wise and compare each cell of row that is it x or not?.

Great!! how I used the lambda result to divide the number of x with the number of fields (34) to get the %

You can do

df["New_col"] = df.apply(lambda x: sum(x == 'x'), axis=1) / 34

PS: This should work i have not tested.