Unexpected Missing Values

Hello DataQuest Community,

I need help cleaning “Unexpected Missing Values”. For instance, I have string fields that contain numbers in some rows, and I want to insert NaN wherever there is a number. The code below handles one field; I want to know how to do the same for more than one field that contains numbers:

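# Detecting numbers: replace any value that parses as an integer with NaN
# (cnt is used as a row label, so this assumes the default 0..n-1 index)
for cnt, row in enumerate(df['OWN_OCCUPIED']):
    try:
        int(row)
        df.loc[cnt, 'OWN_OCCUPIED'] = np.nan
    except ValueError:
        pass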

You can use vectorized methods throughout rather than dealing with the data row by row.

  1. Use pd.to_numeric with errors="coerce" to convert non-numeric string values to NaN.
  2. Create a boolean mask, mask = series.notna(), to find the values that survived the coercion (these are the numbers in string form that int() could convert).
  3. df.loc[mask, 'OWN_OCCUPIED'] = np.nan
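
The three steps above can be sketched as follows. This is only an illustration under assumptions: the sample values in df are made up, and pd.to_numeric is assumed to be the coercion function meant in step 1.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: Y/N answers expected, but a number slipped in.
df = pd.DataFrame({"OWN_OCCUPIED": ["Y", "N", "12", "Y", np.nan]})

# Step 1: non-numeric strings become NaN, numeric strings become numbers.
coerced = pd.to_numeric(df["OWN_OCCUPIED"], errors="coerce")

# Step 2: True where the value survived the coercion, i.e. it was a number.
mask = coerced.notna()

# Step 3: overwrite those entries with NaN in the original column.
df.loc[mask, "OWN_OCCUPIED"] = np.nan

print(df)
```

One difference from the int() loop: pd.to_numeric also treats strings such as "1.5" as numbers, so it may flag slightly more values than int() would.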

Thank you very much for your comment hanqi,

By the way, the sample code I posted works, but only for one column at a time. How do I apply the masking step to multiple columns instead of just “OWN_OCCUPIED”? I have three other columns which should contain names but have numbers erroneously entered.

Great question. I see you are pushing your limits by thinking in more than one dimension now.
Soon you will be dealing with considerations across multiple dataframes.
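
One possible way to extend the single-column idea to several columns is sketched below. It is not the only approach, and it rests on assumptions: the data is made up, and OWNER_NAME and SQ_FT are hypothetical column names (only OWN_OCCUPIED comes from this thread).

```python
import numpy as np
import pandas as pd

# Hypothetical data; OWNER_NAME and SQ_FT are invented for illustration.
df = pd.DataFrame({
    "OWN_OCCUPIED": ["Y", "12", "N"],
    "OWNER_NAME": ["Ann", "Bob", "3"],
    "SQ_FT": ["1000", "850", "700"],  # legitimately numeric, left alone
})

# Only the columns that should never contain numbers.
cols = ["OWN_OCCUPIED", "OWNER_NAME"]

# Coerce each selected column; notna() marks the cells that parsed as numbers.
mask = df[cols].apply(pd.to_numeric, errors="coerce").notna()

# DataFrame.mask replaces cells where the condition is True with NaN.
df[cols] = df[cols].mask(mask)

print(df)
```

The mask here is a boolean dataframe with the same shape as df[cols], so one call handles every listed column at once.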

Keep this way of thinking for every operation you come across, because it applies to many other data munging operations too (e.g. type conversion, get_dummies).

Many of these dataframe methods have numpy equivalents. Go for numpy if you can sacrifice readable labels for speed.
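
As a rough illustration of a numpy equivalent, np.where can play the role of DataFrame.mask once the data is in plain arrays. The sketch below uses made-up data, OWNER_NAME is a hypothetical column name, and the mask is still built with pandas; no timing is claimed here.

```python
import numpy as np
import pandas as pd

# Hypothetical data; OWNER_NAME is invented for illustration.
df = pd.DataFrame({
    "OWN_OCCUPIED": ["Y", "12", "N"],
    "OWNER_NAME": ["Ann", "Bob", "3"],
})

# Boolean array: True where the cell parses as a number.
is_number = df.apply(pd.to_numeric, errors="coerce").notna().to_numpy()

# Plain object array of the values; the column labels are gone from here on.
values = df.to_numpy(dtype=object)

# np.where picks NaN where is_number is True, otherwise the original value.
cleaned = np.where(is_number, np.nan, values)

print(cleaned)
```

Note that cleaned is a bare array, so the column names have to be tracked separately if they are needed afterwards.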