'NA' as value for ONE ENTRY in new column when using MUTATE in R

I created a new column (class_cost_z) to find zscores of costs associated with groups of people. both the class column and the cost column had no ‘NA’ values, however, when I mutate to a new column based on these two columns I am getting an ‘NA’ value for one row. I don’t know why?!

The code I was using is as follows:

df <- 
  df%>% 
  group_by(class) %>% 
  mutate(class_cost_z = (cost - mean(cost, na.rm=TRUE)/(sd(cost, na.rm=TRUE))))

I checked the data using this code which is helpful to determine NA:

APC2[is.na(df$class_cost_z),]

I did also try removing the ‘na.rm=TRUE’ just in case this was causing any issue but I get the same result - always the same row having an ‘NA’ result in the ‘class_cost_z’ column. Any help in resolving this would be appreciated! :slight_smile:

1 Like

Hi @samjee410,

It’s hard to say really. Since you’re grouping by class, which class has the ‘NA’ value? Are the values in the cost column all numeric?

And if it’s possible, can you share the full code?

1 Like

Hi, thanks for your reply.

The class with the NA value is what is being referred to as ‘RN’, and this is the class being used for many other enties.

The Cost is all numeric as per below, I also checked the row with the NA resulting value and there is nothing unusual about it.

class(data2$Cost)
[1] “numeric”

The code that is giving me a table of the tallies of NA, NULLS and BLANKS is as follows (see also screenshot for output in R studio viewer):

na_NULL_blank <- sapply(data2, function(x) sum(is.na(x) | x == "" | x== "NULL"))
View(na_NULL_blank)

na_null_blank output

I don’t know if this is the problem, though it is correctly tallying other columns in the data (ie. they are showing as ‘0’).

It could be a red herring, but after the grouping by class, is it possible to just get the “RN” group and calculate the cost zscore for only that group? If possible, does it return “NA”?

2 Likes

Sorry for late reply, I’ve decided to not use the code and run simple checks for individual na and blanks.

subset(df, is.na(D1))
subset(df, is.na(D1)|trimws(D1) == "")
1 Like

I see.

No worries then; I’m glad you’ve found an alternate solution.