Prison Break: Analyzing Helicopter Prison Escape Data

Hello Everybody

I hope you are all doing great. I worked on the Prison Break dataset and used pandas to make the work easier. I would now appreciate feedback to help me improve it.

UPDATED:

Helicopter Escapees.ipynb (59.6 KB)


Hi @OlutokiJohn

The project looks great, and thanks for sharing it with the DQ community. I love that you went beyond the guidelines and worked through this project using pandas. The visualization process is well presented alongside the code. Keep up the good work, mate.

There is more than one way to achieve this. Since you have already created the Year column, which shows the year in which each attempt occurred, you can create two dataframes from the main one. To create the first dataframe:

  • apply a Boolean mask on the Succeeded column and select the rows where the value is Yes, which indicates a successful escape. Have a look:
# keep only the rows for successful escapes
df_succeeded = prison_break_data[prison_break_data['Succeeded'] == 'Yes']
  • then use the value_counts() method on the Year column to find out how many successful attempts were recorded in each year. Have a look at the code below:
from tabulate import tabulate
# count successful attempts per year and print the five busiest years as a table
print(tabulate(df_succeeded['Year'].value_counts(dropna=False).to_frame().head(5),
               headers=['\033[31mYear\033[0m', '\033[31mNumber of attempts(succeeded)\033[0m'], tablefmt='fancy_grid'))

You will notice that I have included tabulate (which you have to import with from tabulate import tabulate). It simply turns the output of value_counts() into a more readable table. I have also included some special formatting (the \033[31m ... \033[0m escape codes, which color the headers red) to improve readability further; don’t worry if it looks unfamiliar. You can ignore both and just apply value_counts(); the counts will be the same.
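For anyone who wants to try the filtering and counting steps without tabulate, here is a minimal sketch. The small dataframe is made-up sample data standing in for prison_break_data; only the column names Year and Succeeded come from the notebook.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real prison_break_data.
prison_break_data = pd.DataFrame({
    "Year": [1986, 1986, 2001, 2001, 2001, 2009],
    "Succeeded": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# Boolean mask: keep only the rows where the escape succeeded.
df_succeeded = prison_break_data[prison_break_data["Succeeded"] == "Yes"]

# Count successful attempts per year (the most frequent year comes first).
succeeded_per_year = df_succeeded["Year"].value_counts(dropna=False)
print(succeeded_per_year)
```

Swapping "Yes" for "No" in the mask gives the failed attempts in the same way.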

To create the second dataframe, follow the same procedure, except that your Boolean mask on the Succeeded column should now select No. Have a look:

# keep only the rows for failed escapes
df_failed = prison_break_data[prison_break_data['Succeeded'] == 'No']

# count failed attempts per year and print the five busiest years as a table
print(tabulate(df_failed['Year'].value_counts(dropna=False).to_frame().head(5),
               headers=['\033[31mYear\033[0m', '\033[31mNumber of attempts(failed)\033[0m'], tablefmt='fancy_grid'))

If you run the code above together, you will get the result displayed below:

I only printed the top five years (by number of attempts) in each case; remove .head(5) to include all years.
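As one of the other ways to get the same counts, both outcomes can be tabulated per year in a single step. This is only a sketch of an alternative that the notebook does not use: it relies on pd.crosstab, and the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real prison_break_data.
prison_break_data = pd.DataFrame({
    "Year": [1986, 1986, 2001, 2001, 2001, 2009],
    "Succeeded": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# One row per year, one column per outcome ("No", "Yes"); cells hold counts.
attempts_by_year = pd.crosstab(prison_break_data["Year"],
                               prison_break_data["Succeeded"])
print(attempts_by_year)
```

The two-dataframe approach is easier to follow when you only need one outcome; the cross-tabulation is handy when you want to compare failed and succeeded attempts side by side.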

I would also like to add the following suggestions:

  • Data visualization and data cleaning are two different processes, so I don’t think it’s a good idea to combine them under one subheading the way you did. Consider handling them separately, each under its own subheading.
  • Always make sure each subheading tells the reader what you are about to do. For example, under a subheading like ‘Visualizing results’ a reader will expect charts, tables, and graphs, and under ‘Data cleaning’ a reader will expect to see how you clean the data. In your case, you have a subheading ‘visualizing and data cleaning’, but the section under it contains neither visualization nor cleaning.
  • Although your graphs in cell[13] are self-explanatory, you should add some supporting statements or observations. Remember that not every reader will understand all your outputs.
  • Check your arguments and the explanations you give; they are not always consistent. For example, your last conclusion contradicts your graph. On the graph, France’s count is above 14, but the conclusion states that it is 14.
  • Check the styling as well; it is not consistently managed. For example, your two conclusions use different styles: one statement is bold and the other is bold and italic. I think the best approach is to use the same style for explanations, observations, and statements.

Further Exploration

Since you worked on this project using pandas, there is a lot of exploration you can do with just a few lines of code. For example, applying value_counts() to the Succeeded column shows how many escape attempts succeeded and how many failed. Similarly, from the Date column you can generate a new column containing only the month and analyse the months in which most escape attempts occurred. If you have time, or if you are thinking of expanding this project, these two ideas are a good place to start.

Otherwise, congratulations on the good work, and I wish you the best in your upcoming projects.
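Both ideas can be sketched roughly as follows. The rows and the date format here are invented for illustration (the real Date column may need different parsing), and Month is a hypothetical name for the new column.

```python
import pandas as pd

# Hypothetical sample rows; the real Date format in the dataset may differ.
prison_break_data = pd.DataFrame({
    "Date": ["1986-05-26", "2001-01-19", "2001-05-28", "2009-07-23", "2013-05-07"],
    "Succeeded": ["Yes", "No", "Yes", "Yes", "No"],
})

# How many attempts succeeded and how many failed.
outcome_counts = prison_break_data["Succeeded"].value_counts()

# Parse the dates (unparseable values become NaT), then extract the month name.
dates = pd.to_datetime(prison_break_data["Date"], errors="coerce")
prison_break_data["Month"] = dates.dt.month_name()

# Months ranked by number of escape attempts.
attempts_per_month = prison_break_data["Month"].value_counts()

print(outcome_counts)
print(attempts_per_month)
```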


Hello @brayanopiyo18, thank you very much. I did do some visualization, though: the charts where I answered the questions.

I’ve done everything you suggested. Kindly go through it; I’d love to learn more.


Hi @OlutokiJohn, this is great. I appreciate the effort you have put into incorporating the suggestions I raised. Your subheadings now match the work that follows them. I have also noticed the additional exploration, including the months with the most escape attempts, which is displayed in a well-formatted table.
You have successfully separated the subheadings, and your explanations and observations are now consistent in styling. Your arguments are also now consistent with the graphs shown. This is great, mate; keep it up.
One more thing, though: I think you could summarize all your findings under a subheading called Conclusion.

Otherwise, for me, all is now well.

Happy coding! :clinking_glasses: :clinking_glasses:


Thank you very much @brayanopiyo18, I really appreciate it.
