(Help needed) Helicopter prison escapes analysis - Guided project

This is my first attempt at the guided project that studies prison escapes by helicopter, covering the years from 1972 to the date of this post (07-2022).

I would appreciate any feedback, whether it is about the code used or the overall look and feel of the notebook and its markdown cells.

Question: how can I link a notebook so that it opens directly in the browser, without the need to download the file?

helicopter_prison_escapes_Analysis.ipynb (76.2 KB)

Click here to view the jupyter notebook file in a new tab

Hi @zackthemonk!
Thanks for sharing your first project here, and welcome to the community! With regard to your bold question, the links that open the notebook in your browser are added automatically by the forum shortly after you upload an ‘.ipynb’ file.

I’m unfamiliar with the helper package that you import at the beginning of your project. @Elena_Kosourova, is that something new that DQ rolled out for early projects? Could either of you point me towards the documentation?

To me, it looks like you have got a handle on the basics of data manipulation using that helper package. It is always helpful to include a few more comments about what parts of the data you are accessing or manipulating in each set of cells. For example, I think that In [72]'s purpose is to drop the longer expository paragraph from the rest of the data generated by data_from_url(), but to reach that conclusion, I had to work through the code myself.
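As a rough sketch of the kind of comment I mean, assuming In [72] trims the last column (the long description) from each row. The sample rows are placeholders I made up, not values from your notebook:

```python
# Placeholder rows standing in for the output of data_from_url():
# each row ends with a long free-text description of the escape.
data = [
    ["Date A", "Prison A", "Country X", "Yes", "Escapee A", "Long description ..."],
    ["Date B", "Prison B", "Country Y", "No", "Escapee B", "Long description ..."],
]

# Drop the last column (the description) from every row,
# keeping only the short fields used in the rest of the analysis.
data = [row[:-1] for row in data]

for row in data:
    print(row)
```

A two-line comment like this tells the reader what is being removed and why, before they read the code.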

Depending on where you are on your data science journey, you will find that there are a number of other tools that can help you present your notebook more crisply. For example, if your data variable is a DataFrame, you could call the method data.head(3) to do the same job as your for loop in In [71] (it also formats the output nicely).
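Here is a minimal sketch of that idea. The rows and column names are placeholders I invented for illustration; with a plain list of lists (which is what data_from_url() returns), slicing gives the same rows:

```python
import pandas as pd

# Placeholder rows and column names, for illustration only.
df = pd.DataFrame(
    [["1971", "Country X"], ["1973", "Country Y"],
     ["1978", "Country X"], ["1981", "Country Z"]],
    columns=["Year", "Country"],
)

# head(3) returns the first three rows as a new DataFrame.
first_three = df.head(3)
print(first_three)

# For a plain list of lists, slicing selects the same rows.
rows = df.values.tolist()
print(rows[:3])
```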

In terms of data presentation, consider the difference between the two histograms of the same data I have pasted below (image source).

In the left histogram, the bins are small enough that we start to visualize noise in the data. In the right one, the bins are larger and convey a clearer picture. How to choose an appropriate bin size is another discussion entirely; the article these histograms come from covers that topic.

Now that you have the data read into your notebook, take some time to think about other interesting ideas you might like to explore. One that comes to mind: what does this data look like "in context"?
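To make the effect of bin width concrete, here is a small sketch that counts the same values with two different bin widths. The years are made up for illustration and the bin_counts helper is my own, not part of the project helper:

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up years, for illustration only.
years = [1971, 1973, 1978, 1981, 1983, 1985, 1986, 1986, 1987, 1988]

narrow = bin_counts(years, 1)   # one bin per year: many bins, mostly counts of 1
wide = bin_counts(years, 10)    # one bin per decade: fewer, fuller bins

print(narrow)
print(wide)
```

With one-year bins nearly every bar has height 1, which is mostly noise; with ten-year bins the trend over time is easier to see.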

  • How many helicopter escapes per 1,000 prisoners in each country?
  • Are helicopter escapes more common than other escapes (e.g. no vehicle, automobile, etc.)?
  • Are helicopter escapes more common in urban or rural areas?
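The first question, for instance, comes down to a simple rate calculation. Below is a sketch of it; the prisoner counts would have to come from another data source, and every figure here is a placeholder rather than a real statistic. The function name escapes_per_1000 is my own:

```python
def escapes_per_1000(escapes, prisoners):
    """Return escapes per 1,000 prisoners for each country with a known, non-zero prison population."""
    return {
        country: 1000 * escapes[country] / prisoners[country]
        for country in escapes
        if prisoners.get(country)
    }

# Placeholder figures, NOT real statistics.
escapes = {"Country X": 15, "Country Y": 8, "Country Z": 2}
prisoners = {"Country X": 60_000, "Country Y": 2_000_000}

rates = escapes_per_1000(escapes, prisoners)
for country, rate in rates.items():
    print(f"{country}: {rate:.3f} escapes per 1,000 prisoners")
```

Countries without a population figure (Country Z here) are skipped rather than guessed.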

Then you could start hypothesizing why you see certain patterns in the data. For example:

  • “I believe Parisian prisons are more likely to have helicopter escapes because most buildings must be shorter than the Eiffel Tower.”
  • “I believe helicopter escapes will be more common in urban settings because helicopters are more common in the city.”

Hope these questions help get you thinking about where to take your project next. As you work through more lessons and guided projects, these sorts of questions will naturally come to your mind!

Hi @chefpaul92,

It’s great to have you back in our Community! :grinning:

The helper isn’t a package. It’s a small collection of functions used only in the Prison Break project. This is a new project in the Data Science Path, and it is the one learners start with. Here is what’s inside the helper file:

import pandas as pd
import re
import matplotlib.pyplot as plt
from IPython.display import display, HTML

def data_from_url(url):
    df = pd.read_html(url)[1]
    lol = df.to_numpy().tolist()
    return lol

def fetch_year(date_string):
    return int(re.findall(r"\d{4}", date_string)[0])

def barplot(list_of_2_element_list):
    d = {ya[0]:ya[1] for ya in list_of_2_element_list}
    axes = plt.axes()

    spines = axes.spines
    ax = plt.barh(*zip(*d.items()), height=.5)
    plt.yticks(list(d.keys()), list(d.keys()))
    plt.xticks(range(4), range(4))
    rectangles = ax.patches
    for rectangle in rectangles:
        x_value = rectangle.get_width()
        y_value = rectangle.get_y() + rectangle.get_height() / 2
        space = 5
        ha = 'left'
        label = "{}".format(x_value)
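        if x_value > 0:
            # Label the bar with its value; the annotate call is reconstructed
            # from the arguments that survived in the post.
            plt.annotate(
                label,
                (x_value, y_value),
                xytext=(space, 0),
                textcoords="offset points",
                ha=ha,
            )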


def unique_countries(countries):
    s = pd.Series(countries)
    return list(s.unique())

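def display_no_index(df):
    # Body missing from the post; assumed to render the frame as HTML without its index.
    display(HTML(df.to_html(index=False)))

def print_pretty_table(countries_frequency):
    # Note: this reads the module-level df defined below, not the countries_frequency argument.
    countries = df.Country.value_counts().index
    occurrences = df.Country.value_counts().values
    d = {"Country": countries, "Number of Occurrences": occurrences}
    # Final line missing from the post; assumed to display the resulting table.
    display_no_index(pd.DataFrame(d))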

df = pd.read_html("https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes")[1]
df = df[["Date", "Prison name", "Country", "Succeeded", "Escapee(s)"]]
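
To give an idea of how one of these helpers behaves, here is a short sketch of fetch_year on its own. The function is repeated so the snippet runs by itself, and the sample date string is an assumption about how the Date column is formatted:

```python
import re

def fetch_year(date_string):
    # Return the first run of four digits in the string as an integer.
    return int(re.findall(r"\d{4}", date_string)[0])

year = fetch_year("August 19, 1971")
print(year)
```

Note that the function raises an IndexError if the string contains no four-digit run.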