Working with Missing Data: Next Steps

Missing Data Next Step

Please take a look.

Also, if someone has completed this whole challenge, please share the code:

• Drop the rows that had suspect values for `injured` and `killed` totals.
• Clean the values in the `vehicle_1` through `vehicle_5` columns by analyzing the different values and merging duplicates and near-duplicates.
• Analyze whether collisions are more likely in certain locations, at certain times, or for certain vehicle types.
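For the first two bullets, here is a minimal sketch on a tiny made-up frame, not the lesson's official solution. It assumes the lesson's column names (`pedestrians_injured`, `cyclist_injured`, `motorist_injured`, `total_injured`, and the matching `*_killed` columns) and treats a row as suspect when its total is missing or differs from the sum of its three component columns. `suspect` and `VEHICLE_MAP` are hypothetical names, and the mapping is only an illustration of merging near-duplicates.

```python
import pandas as pd

# Made-up sample rows; column names follow the lesson's dataset (assumed)
mvc = pd.DataFrame({
    'pedestrians_injured': [1, 0, 2, 0],
    'cyclist_injured': [0, 1, 0, 0],
    'motorist_injured': [0, 0, 1, 3],
    'total_injured': [1, 1, 5, 3],  # row 2 is suspect: 2 + 0 + 1 != 5
    'pedestrians_killed': [0, 0, 0, 1],
    'cyclist_killed': [0, 0, 0, 0],
    'motorist_killed': [0, 0, 0, 0],
    'total_killed': [0, 0, 0, 1],
    'vehicle_1': ['Sedan', 'SEDAN ', 'Taxi', 'Station Wagon/Sport Utility Vehicle'],
    'vehicle_2': ['TAXI', None, 'sport utility / station wagon', 'Bike'],
})


def suspect(df, kind):
    """True where total_<kind> is missing or differs from the sum of its three parts."""
    parts = df[[f'pedestrians_{kind}', f'cyclist_{kind}', f'motorist_{kind}']].sum(axis=1)
    return df[f'total_{kind}'] != parts


# 1) Drop rows whose injured or killed totals look suspect
bad = suspect(mvc, 'injured') | suspect(mvc, 'killed')
mvc = mvc[~bad].copy()

# 2) Normalize vehicle names (strip spaces, lowercase), then merge near-duplicates
VEHICLE_MAP = {  # hypothetical mapping: build the real one from value_counts()
    'station wagon/sport utility vehicle': 'suv',
    'sport utility / station wagon': 'suv',
    'bike': 'bicycle',
}
vehicle_cols = [c for c in mvc.columns if c.startswith('vehicle_')]
for col in vehicle_cols:
    mvc[col] = mvc[col].str.strip().str.lower().replace(VEHICLE_MAP)

# Counts across all vehicle columns (NaN dropped), to spot what still needs merging
vehicle_counts = mvc[vehicle_cols].melt(value_name='vehicle')['vehicle'].value_counts()
print(vehicle_counts)
```

On the real data, the usual loop is: print `vehicle_counts`, add the near-duplicates you spot to the mapping, and rerun until the list looks clean.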

Also, if someone is experienced with handling a MultiIndex, please leave a note.
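A few common ways to work with a `(borough, timebin)` MultiIndex like the one `pivot_table` produces in this post. The frame below is made up (only the names `pivot_xx`, `borough`, `timebin`, and `total_killed` come from the post), and the index is kept sorted so partial lookups such as `.loc['QUEENS']` do not run into lexsort warnings.

```python
import pandas as pd

# Made-up frame shaped like pivot_xx: sorted (borough, timebin) MultiIndex
pivot_xx = pd.DataFrame(
    {'total_killed': [2.0, 0.0, 5.0, 1.0]},
    index=pd.MultiIndex.from_tuples(
        [('BROOKLYN', 'q1'), ('BROOKLYN', 'q2'), ('QUEENS', 'q1'), ('QUEENS', 'q2')],
        names=['borough', 'timebin'],
    ),
)

# One cell: pass the full (borough, timebin) tuple
queens_q1 = pivot_xx.loc[('QUEENS', 'q1'), 'total_killed']

# All time bins for one borough (the outer level is dropped)
queens = pivot_xx.loc['QUEENS']

# One time bin across all boroughs: cross-section on the inner level
q1 = pivot_xx.xs('q1', level='timebin')

# Reshape to a borough x timebin grid, which is easier to read or plot as a heatmap
grid = pivot_xx['total_killed'].unstack('timebin')

# Label of the maximum, returned as a (borough, timebin) tuple
top = pivot_xx['total_killed'].idxmax()

# Back to ordinary columns when the MultiIndex gets in the way
flat = pivot_xx.reset_index()
```

Note that `idxmax` returns only the first label when several cells tie for the maximum; the boolean filter used in the post keeps all tied rows.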

My approach and observation:
I divided the time of day into 8 bins and analyzed which borough, during which time bin (one-eighth of the day), had the most deaths.

The result I got:

```
QUEENS  q1  5.0
```

Here q1 covers 12 am to 3 am, since I divided the 24 hours into 8 bins of about 3 hours each.
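`pd.cut(y, bins=8)` splits the *observed* range of `y` into 8 equal-width pieces, so its edges only line up with 3-hour blocks when the data happens to span 00:00 to 23:59. A sketch that pins the edges explicitly instead, assuming the `time` column holds `"H:MM"` or `"HH:MM"` strings with no missing values; `time_bins` is a hypothetical helper name.

```python
import pandas as pd


def time_bins(times: pd.Series) -> pd.Series:
    """Map 'H:MM' / 'HH:MM' strings to eight fixed 3-hour bins labeled q1..q8."""
    parts = times.str.split(':', expand=True).astype(int)
    hours = parts[0] + parts[1] / 60          # '13:45' -> 13.75
    edges = list(range(0, 25, 3))             # 0, 3, 6, ..., 24
    labels = [f'q{i}' for i in range(1, 9)]
    # right=False gives half-open bins [start, end), so q1 is 00:00 up to 02:59
    return pd.cut(hours, bins=edges, labels=labels, right=False)


times = pd.Series(['0:05', '02:59', '03:00', '13:45', '23:59'])
print(time_bins(times).tolist())  # ['q1', 'q1', 'q2', 'q5', 'q8']
```

With this, `mvc['timebin'] = time_bins(mvc['time'])` would replace the `str.replace` and `pd.cut(y, bins=8, ...)` lines, and the label meanings no longer depend on the earliest and latest times in the data.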

I also tried analyzing whether collisions are more likely in certain locations or at certain times:

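```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Main collisions dataset from the lesson (filename assumed)
mvc = pd.read_csv('nypd_mvc_2018.csv')
sup_data = pd.read_csv('supplemental_data.csv')

# Fill missing location values from the supplemental data (rows align by index)
location_cols = ['location', 'on_street', 'off_street', 'borough']
null_before = mvc[location_cols].isnull().sum()

for col in location_cols:
    mask1 = mvc[col].isnull()
    mvc[col] = mvc[col].mask(mask1, sup_data[col])

null_after = mvc[location_cols].isnull().sum()

# Number of collisions with a known location in each borough
x = mvc.groupby('borough').count()['location'].sort_values(ascending=False)

# Turn "HH:MM" into floats like 23.59, then split the range into 8 equal-width bins
y = mvc['time'].str.replace(':', '.').astype('float')
bins = pd.cut(y, bins=8, labels=['q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8'])
bins.unique()  # inspect in a notebook

mvc['timebin'] = bins
mvc['timebin']  # inspect in a notebook

# Total deaths per (borough, time bin) and per borough
pivot_xx = pd.pivot_table(mvc, index=['borough', 'timebin'], values='total_killed',
                          aggfunc='sum', observed=True)
pivot_yy = pd.pivot_table(mvc, index=['borough'], values='total_killed', aggfunc='sum')

# Row(s) with the highest death count
max_death = pivot_xx['total_killed'].max()
highest_death = pivot_xx[pivot_xx['total_killed'] == max_death]

maxDeathBoroughTime = pivot_xx['total_killed'].idxmax()  # (borough, timebin) tuple
maxDeathBorough = pivot_yy['total_killed'].idxmax()
```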
Total deaths per borough (`pivot_yy`):

```
               total_killed
borough
BRONX                   5.0
BROOKLYN               14.0
MANHATTAN              12.0
QUEENS                 18.0
STATEN ISLAND           0.0
```

Borough and time bin with the most deaths (`pivot_xx` filtered to the maximum):

```
                 total_killed
borough timebin
QUEENS  q1                5.0
```
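Summed deaths partly reflect how many collisions happen in each borough and time bin, so a busier borough can top the list even if each individual crash is no more dangerous. A sketch that compares collision counts, each borough's share of collisions by time bin, and deaths per collision, assuming `mvc` has the `borough`, `timebin`, and `total_killed` columns built above; the sample rows here are made up.

```python
import pandas as pd

# Made-up rows with the same columns as mvc after the binning step
mvc = pd.DataFrame({
    'borough': ['QUEENS', 'QUEENS', 'QUEENS', 'BRONX', 'BRONX', 'BRONX'],
    'timebin': ['q1', 'q1', 'q5', 'q5', 'q5', 'q1'],
    'total_killed': [1, 0, 0, 0, 1, 0],
})

# Number of collisions in each borough/time-bin cell
counts = pd.crosstab(mvc['borough'], mvc['timebin'])

# Share of each borough's collisions that fall in each time bin (each row sums to 1)
share = pd.crosstab(mvc['borough'], mvc['timebin'], normalize='index')

# Deaths per collision: mean of total_killed in each cell
rate = mvc.pivot_table(index='borough', columns='timebin',
                       values='total_killed', aggfunc='mean')

print(counts, share, rate, sep='\n\n')
```

Since the totals above are small (18 deaths in Queens, 5 of them in q1), per-cell rates will be noisy, so the collision counts and shares are probably the more reliable signal for "more likely". The same `crosstab` call with `vehicle_1` in place of `timebin` covers the vehicle-type part of the question.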