Working-with-missing-data Next Steps

Please take a look.

Also, if anyone has completed this challenge in full, please share your code:

  • Drop the rows that had suspect values for injured and killed totals.
  • Clean the values in the vehicle_1 through vehicle_5 columns by analyzing the different values and merging duplicates and near-duplicates.
  • Analyze whether collisions are more likely in certain locations, at certain times, or for certain vehicle types.
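For the first bullet, here is a minimal sketch of one way to do it, not a confirmed solution. It assumes "suspect" means the total is missing or does not equal the sum of its parts, and it assumes column names such as pedestrians_killed, cyclist_killed and motorist_killed. The toy DataFrame and the helper drop_suspect_totals are made up for illustration.

```python
import pandas as pd

# Toy stand-in for mvc; the column names are assumptions, the values are invented.
toy = pd.DataFrame({
    "pedestrians_killed": [0, 1, 0, 0],
    "cyclist_killed":     [0, 0, 0, 1],
    "motorist_killed":    [0, 0, 2, 0],
    "total_killed":       [0, 1, 1, None],
})

def drop_suspect_totals(df, part_cols, total_col):
    """Keep only rows whose total is present and equals the sum of its parts."""
    parts_sum = df[part_cols].sum(axis=1)
    ok = df[total_col].notna() & (df[total_col] == parts_sum)
    return df[ok]

killed_cols = ["pedestrians_killed", "cyclist_killed", "motorist_killed"]
clean = drop_suspect_totals(toy, killed_cols, "total_killed")
print(clean)
```

The same helper could be called a second time with the injured columns and total_injured.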

Also, if anyone is experienced with handling a MultiIndex, please leave a note.

My approach and observations:
I divided the day into 8 three-hour bins and analysed which borough, during which time bin, had the highest number of deaths.

The result I got is:

QUEENS	q1	5.0

Here q1 is 12 am to 3 am, since I divided the 24 hours into 8 bins.
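One caveat about the binning: pd.cut with bins=8 derives its edges from the smallest and largest value in the data, so the bins are only three hours wide when the data spans the whole day. A sketch of a more explicit alternative, assuming the times are 'H:MM' strings (the sample times below are invented):

```python
import pandas as pd

# Toy times in the same 'H:MM' string form assumed for mvc['time'].
times = pd.Series(["0:05", "2:59", "3:00", "11:45", "23:59"])

# Take the hour as an integer instead of turning '9:45' into the float 9.45.
hours = times.str.split(":").str[0].astype(int)

edges = list(range(0, 25, 3))            # 0, 3, 6, ..., 24
labels = [f"q{i}" for i in range(1, 9)]  # q1 .. q8

# right=False makes each bin [start, end), so 3:00 falls in q2, not q1.
timebin = pd.cut(hours, bins=edges, labels=labels, right=False)
print(timebin.tolist())  # ['q1', 'q1', 'q2', 'q4', 'q8']
```

With fixed edges, q1 always means 12 am to 3 am regardless of which times happen to appear in the data.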

Here is my attempt at analyzing whether collisions are more likely in certain locations or at certain times:

import matplotlib.pyplot as plt
import numpy as np  # needed for np.sum in the pivot tables below
import seaborn as sns
import pandas as pd

# mvc is the collisions DataFrame already loaded earlier in the lesson
sup_data = pd.read_csv('supplemental_data.csv')

location_cols = ['location', 'on_street', 'off_street', 'borough']
null_before = mvc[location_cols].isnull().sum()

# Fill missing location values with the matching values from sup_data
for col in location_cols:
    mask1 = mvc[col].isnull()
    mvc[col] = mvc[col].mask(mask1, sup_data[col])

null_after = mvc[location_cols].isnull().sum()


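# Number of collisions with a known location, per borough
x = mvc.groupby('borough').count()['location'].sort_values(ascending=False)

# Turn '9:45' into 9.45 so the hour is the integer part, then cut into 8 equal-width bins
y = mvc['time'].str.replace(':', '.').astype('float')
bins = pd.cut(y, bins=8, labels=['q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8'])
bins.unique()

mvc['timebin'] = bins
mvc['timebin']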

# Total deaths per borough and time bin, and per borough alone
pivot_xx = pd.pivot_table(mvc, index=['borough', 'timebin'], values='total_killed', aggfunc=np.sum)
pivot_yy = pd.pivot_table(mvc, index=['borough'], values='total_killed', aggfunc=np.sum)



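# Highest death count for any single borough/time-bin combination
max_death = pivot_xx['total_killed'].max()
higest_death = pivot_xx[pivot_xx['total_killed'] == max_death]

# Index labels of the maxima: a (borough, timebin) tuple and a borough name
maxDeathBoroughTime = pivot_xx.idxmax().values
maxDeathBorough = pivot_yy.idxmax().values[0]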


Output of pivot_yy:

borough        total_killed
BRONX          5.0
BROOKLYN       14.0
MANHATTAN      12.0
QUEENS         18.0
STATEN ISLAND  0.0

Output of higest_death (a DataFrame):

borough  timebin  total_killed
QUEENS   q1       5.0
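On the MultiIndex question, here is a small sketch of the selection tools that seem most relevant to a borough/timebin result like this one. The toy numbers and variable names are made up for illustration. It uses groupby rather than pivot_table, but the resulting index has the same two levels.

```python
import pandas as pd

# Toy data shaped like the borough/timebin table; the numbers are invented.
toy = pd.DataFrame({
    "borough": ["QUEENS", "QUEENS", "BRONX", "BRONX"],
    "timebin": ["q1", "q2", "q1", "q2"],
    "total_killed": [5.0, 1.0, 2.0, 0.0],
})
by_borough_time = toy.groupby(["borough", "timebin"])["total_killed"].sum()

# idxmax on a MultiIndex Series returns a (borough, timebin) tuple.
top_borough, top_bin = by_borough_time.idxmax()

# .loc with a tuple selects one cell; a single outer label selects a sub-Series.
top_value = by_borough_time.loc[(top_borough, top_bin)]
queens_only = by_borough_time.loc["QUEENS"]

# xs selects on an inner level; unstack turns that level into columns.
q1_all_boroughs = by_borough_time.xs("q1", level="timebin")
wide = by_borough_time.unstack("timebin")

print(top_borough, top_bin, top_value)
```

Unstacking into a wide table (boroughs as rows, time bins as columns) is often the easiest form to read or to pass to a heatmap.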