Prison Break further questions: Counting the number of escapees

I am trying to answer the further questions on the Prison Break Guided Project. I have come to the question:
How does the number of escapees affect the success?

  • I started by saving the relevant data (escapees and succeeded) in a dataframe.
  • I deleted the rows with missing data in the ‘Escapee(s)’ column.
  • I then changed the Yes/No in ‘Succeeded’ to 1/0, and converted from an object to an int.
  • I now need to look at the escapees column and find a way to count the number of escapees in each row.
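For anyone following along, the first three steps might look roughly like the sketch below. The sample rows and their 'Succeeded' values are made up for illustration, and the column names are assumed from the description above, so adjust them to match the actual project data.

```python
import pandas as pd

# Made-up sample rows; the real data comes from the guided project.
df = pd.DataFrame({
    "Escapee(s)": [
        "Joel David Kaplan Carlos Antonio Contreras Castro",
        None,
        "Ashraf Sekkaki plus three other criminals",
    ],
    "Succeeded": ["Yes", "No", "Yes"],
})

# Keep only the two relevant columns, as an independent copy.
escapes = df[["Escapee(s)", "Succeeded"]].copy()

# Drop the rows with no escapee information.
escapes = escapes.dropna(subset=["Escapee(s)"])

# Map Yes/No to 1/0 and store the result as integers.
escapes["Succeeded"] = escapes["Succeeded"].map({"Yes": 1, "No": 0}).astype(int)
```

Taking an explicit `.copy()` of the selected columns keeps later assignments from touching a slice of the original dataframe.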

I am unsure how to do this, because the format differs from row to row and it is not clear where one name ends and the next begins.

Except for the first row, it seems that every name is glued to the next one:

Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson

In the example above, the names would be:

  • Garrett Brock Trapnell
  • Martin Joseph McNally
  • James Kenneth Johnson

You can possibly handle the first row separately and, if you know about regular expressions, you can handle the remaining ones by splitting on an appropriate pattern.
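To give an idea of what the regular-expression route could look like, here is a minimal sketch. It splits wherever a lowercase letter is directly followed by an uppercase one, except right after "Mc". The function name `split_glued` and the "Mc" guard are my own choices for illustration, and rows where names are separated only by spaces would still need separate handling. Splitting on an empty match like this requires Python 3.7 or later.

```python
import re

# Zero-width split point: a lowercase letter followed by an uppercase letter,
# unless the two characters just before the split point are "Mc".
BOUNDARY = re.compile(r"(?<=[a-z])(?<!Mc)(?=[A-Z])")

def split_glued(s):
    """Split a string of glued-together names into a list of names."""
    return BOUNDARY.split(s)

names = split_glued(
    "Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson"
)
print(names)
```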

Can you please include a link to the guided project in your original post?

Thank you for your reply! I do not know about regular expressions, but I will look into this.

Most of the rows have names glued onto the next one, but there are a few exceptions that I assume I will have to handle separately.

  1. Joel David Kaplan Carlos Antonio Contreras Castro
  2. Four members of the Manuel Rodriguez Patriotic…
  3. Ashraf Sekkaki plus three other criminals

Guided project:

Regular expressions aren’t necessary here; they would just help. With the tools you have, you can iterate over the string of names and, every time a lowercase character is followed by an uppercase one, start a new name in a list.

Eventually, you’ll probably have to deal with exceptions like “McDonald’s”, where it’s just a single name.
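A rough sketch of that loop, with the "Mc" case treated as an exception, could look like this. The name `split_names_loop` is hypothetical, and the rule only covers the patterns mentioned so far, so other exceptions in the data may need their own checks.

```python
def split_names_loop(s):
    """Split glued names without regular expressions."""
    names = []
    current = ""
    for i, ch in enumerate(s):
        prev = s[i - 1] if i > 0 else ""
        # True when the two characters before this one are "Mc" (as in McNally).
        after_mc = s[max(i - 2, 0):i] == "Mc"
        # An uppercase letter right after a lowercase one starts a new name.
        if ch.isupper() and prev.islower() and not after_mc:
            names.append(current)
            current = ""
        current += ch
    if current:
        names.append(current)
    return names

print(split_names_loop("Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson"))
print(split_names_loop("McDonald's"))
```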

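I created this function, which splits the names (FordDavid → Ford David) without affecting ‘McDonald’s’.
def split_names(s):
    # Walk backwards so that inserting a separator does not shift the indices still to be visited.
    # Note: at i = 0 the negative indices wrap around to the end of the string.
    for i in range(len(s)-1)[::-1]:
        # An uppercase letter right after a lowercase one starts a new name,
        # unless the two preceding characters are 'Mc'.
        if (s[i].isupper() and s[i-1].islower()) and not (s[i-2] == 'M' and s[i-1] == 'c'):
            s = s[:i] + '//' + s[i:]
        # An uppercase letter right after a closing parenthesis also starts a new name.
        if s[i].isupper() and s[i-1] == ')':
            s = s[:i] + '//' + s[i:]
    return s.split('//')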

input: split_names('Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson')
output: ['',
'Garrett Brock Trapnell',
'Martin Joseph McNally',
'James Kenneth Johnson']

I then iterated through my data and applied the function to each row.

There are four rows where the names are separated by spaces:

  • 0: Joel David Kaplan Carlos Antonio Contreras Castro (2 escapees)
  • 1: JB O’Hagan Seamus TwomeyKevin Mallon (3 escapees)
  • 12: Mahoney Danny Francis MitchellRandy Lackey (3 escapees)
  • 23: Orlando Cartagena Jose Rodriguez Victor Diaz Hector Diaz Jose Tapia (5 escapees)

This is the code I wrote to deal with these exceptions:
df.Names[0] = ['', 'Joel David Kaplan', 'Carlos Antonio Contreras Castro']
df.Names[1] = ['', "JB O'Hagan", 'Seamus Twomey', 'Kevin Mallon']
df.Names[12] = ['', 'Mahoney Danny', 'Francis Mitchell', 'Randy Lackey']
df.Names[23] = ['', 'Orlando Cartagena', 'Jose Rodriguez', 'Victor Diaz', 'Hector Diaz', 'Jose Tapia']

The function split_names() returns an empty string ('') as the first element of each list, so I added '' here for consistency.

This code works but I get the warning:
A value is trying to be set on a copy of a slice from a DataFrame

Is there a better way of dealing with these exceptions?

Could you write your code in between two pairs of three backticks so that. . .

    code goes here

. . . is rendered as. . .

code goes here

. . .?

Instead of modifying the dataframe, I’d focus on modifying just the output.
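As a sketch of what I mean: keep the hand-written results in a dictionary keyed by row index and consult it when producing the counts, so the dataframe itself is never assigned to. The names `OVERRIDES` and `names_in_row` are made up for this example, the splitter is a simple regex stand-in for your own function, and row 40 is a hypothetical index for a row with no special handling.

```python
import re

# Zero-width split point: lowercase followed by uppercase, except right after "Mc".
BOUNDARY = re.compile(r"(?<=[a-z])(?<!Mc)(?=[A-Z])")

# Hand-written results for the rows listed earlier in the thread, keyed by row index.
OVERRIDES = {
    0: ["Joel David Kaplan", "Carlos Antonio Contreras Castro"],
    1: ["JB O'Hagan", "Seamus Twomey", "Kevin Mallon"],
    12: ["Mahoney Danny", "Francis Mitchell", "Randy Lackey"],
    23: ["Orlando Cartagena", "Jose Rodriguez", "Victor Diaz",
         "Hector Diaz", "Jose Tapia"],
}

def names_in_row(index, raw):
    """Return the manual answer for known exceptions, else split automatically."""
    if index in OVERRIDES:
        return OVERRIDES[index]
    return BOUNDARY.split(raw)

# A few rows as (index -> raw string); row 40 is a hypothetical ordinary row.
rows = {
    0: "Joel David Kaplan Carlos Antonio Contreras Castro",
    12: "Mahoney Danny Francis MitchellRandy Lackey",
    40: "Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson",
}

counts = {index: len(names_in_row(index, raw)) for index, raw in rows.items()}
print(counts)  # {0: 2, 12: 3, 40: 3}
```

Because nothing is written back into the dataframe cell by cell, the warning does not come up, and the counts can be added as a new column in a single assignment afterwards.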

To understand what’s going on with that warning, check out SettingwithCopyWarning: How to Fix This Warning in Pandas.