Use pandas groupby to find mean with boolean


How can I find the means of a group using groupby, which also includes a boolean. For example, on my excel sheet I want to find the mean of my groups (located in column 1) for values in columns 2,3 and 4, only if columns 5 and 6 both contain the string “nontoxic”. I uploaded screen shot of the example.
I know how to write the code to find the mean of each group for eachcategory (see below), but not how to include the boolean.

Thanks for any help!

df.groupby([“Fields”])[“Dend CSA”,“Ax Count”,“Ax Circ”].mean()

Hey, Kjusts.

If you provide a copiable version of the data, so that people can quickly get it into a dataframe and start coding, you’re more likely to improve the speed with which you’ll get an answer.

I cannot attach excel sheet because I am a new user, so it is pasted below. Real version is much larger, but this is enough information to figure out correct code. Thanks!

Fields Dend CSA Ax Count Ax Circ Axonal Health Dendritic Health
C - 3 695.575 626 0.936 nontoxic nontoxic
C - 3 268.057 560 0.952 nontoxic nontoxic
C - 4 2055.575 815 0.936 nontoxic nontoxic
C - 4 265.381 1119 0.972 nontoxic nontoxic
C - 5 397.05 687 0.941 nontoxic nontoxic
C - 5 1040.011 518 0.939 nontoxic nontoxic
C - 5 1033.697 708 0.95 nontoxic nontoxic
C - 5 879.787 537 0.937 nontoxic nontoxic
C - 6 631.682 586 0.92 nontoxic nontoxic
C - 6 1188.017 732 0.93 nontoxic nontoxic
C - 6 366.724 678 0.958 nontoxic nontoxic
C - 7 862.852 1513 0.958 nontoxic nontoxic
C - 7 1199.273 647 0.966 nontoxic nontoxic
C - 7 75.652 584 0.962 nontoxic nontoxic
C - 8 627.774 885 0.947 nontoxic nontoxic
C - 8 631.788 1153 0.964 nontoxic nontoxic

A sample is not only fine, but ideal.

Can you please provide the data in a native Python object like a list of lists or a dictionary? The text data you provided still requires a bit of work.

In a perfect world you’d share an actual dataframe, but that’s not possible. So a dictionary will work fine. This way whoever decides to help just has to create a dataframe from the dictionary, which can be done very quickly, relative to wrangling the text blob you provided.


@kjusts, you may want to read about query method:

I figured out a way to do it by making a new df (below), which only keeps rows that contain string “nontoxic” for Column Axonal Health and Dendritic Health. Then I take the mean of the new df using groupby (below). I still wish there was a shorter way since I do not want to make a new df each time, as for my real data I will be calculating various stats with high number of columns that will contain booleans to determine which rows to select. I wrote a code below with data saved as list of list and then covered to df if anyone wants to give it a wing. Thanks.

df_nontoxic = df[df[‘Axonal Health’]==“nontoxic”) & (df[‘Dendritic Health’] == ‘nontoxic’)]

df_nontoxic.groupby([“Fields”])[“Dend CSA”,“Ax Count”,“Ax Circ”].mean()

Data = [[“C-3”, “C-3”, “C-4”, “C-4”, “C-5”, “C-5”, “C-5”, “C-6”, “C-6”, “C-6”], [695.575, 268.057, 2055.575, 265.381, 1040.011, 1033.697, 879.787, 631.682, 1188.017, 366.724],[ 626, 560, 815, 1119, 518, 708, 537, 586, 732, 678], [0.936, 0.952, 0.936, 0.972, 0.939, 0.95, 0.937, 0.92, 0.93, 0.958], [“nontoxic”, “nontoxic”, “nontoxic”, “nontoxic”, ‘nontoxic’, “nontoxic”, “nontoxic”, “toxic”, “toxic”, “toxic”], [“toxic”, “toxic”, “nontoxic”, “nontoxic”, “nontoxic”, “nontoxic”, “nontoxic”, “nontoxic”, “nontoxic”, “nontoxic”]]
df = pd.DataFrame(Data)
df = df.transpose()
df.columns = [“Fields”,“Dend CSA”, “Ax Count”, “Ax Circ”, “Axonal Health”, “Dendritic Health”]