Trouble with Pandas

I have a dataframe that I’ve taken in from a csv. This dataframe has a column that has two years in it: 2018, and 2019. I’d like to turn this into two different dataframes with their own year so that only one dataframe has 2018 data and one has 2019 data.

Column 6 contains the year.

all2018 = []
all2019 = []
for row in rawdata:
    if row[6] == 2018:
        all2018.append(row)
    else:
        all2019.append(row) 

I was pretty confident that this code would work, but I seem to be running into a cascading group of errors. The first error that I get is “IndexError: string index out of range” and the last error that I get is “attrib() got an unexpected keyword argument ‘convert’.” I’m not sure which error I should be trying to fix.

Also, I thought that dataframes took care of some of these datatype/structure issues. Would I be better off by just making this a list of lists?

The ultimate goal is to create two different dataframes so that I can compare sales from 2018 to sales from 2019 and then create a ratio for the change in sales and find the largest differences.

Thanks.

Hi,
1. First confirm which column of the raw data you are reading the year from.
2. Check the datatype of that series. It is often a string.
3. If it is a string, compare with == '2018' instead.
Good luck

I’m not sure what you mean by your first question. I need the code to loop through every row, which is done through the for loop and I need it to specifically look at the 6th column as the determinant for where the data should go.

I’ve checked and that column appears to be of type ‘int’.

Hi @charlesd

Assuming you are trying to split the data with a for loop, please try the following code:

for index, row in df.iterrows():
    if condition:
        # do this
    else:
        # do that

for row in df:
    print(row)
# Iterating over the dataframe directly yields the column labels, not the rows. Add a print statement to both loops to see the difference.
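
To make the difference between the two loops concrete, here is a minimal sketch. The sample data and the column names "store" and "year" are made up for illustration. It shows that a plain loop over a dataframe walks the column labels, which is probably why row[6] in the original code ended up indexing into a short string.

```python
import pandas as pd

# Hypothetical sample data; the column names are invented for this example.
df = pd.DataFrame({"store": ["A", "B", "C"], "year": [2018, 2019, 2018]})

# Iterating over the dataframe itself yields the column labels (strings).
labels = [item for item in df]

# iterrows() yields (index, row) pairs, where each row is a Series.
years_seen = [row["year"] for index, row in df.iterrows()]

print(labels)
print(years_seen)
```

With a real csv, the same check (printing what the loop variable actually holds) is usually the quickest way to see which of the two behaviors you are getting.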

A cleaner way to split the dataframe is to create two dataframes based on the value in a column:

df_new1 = DataFrame[DataFrame[Column_Name] == Criteria_Value1]
df_new2 = DataFrame[DataFrame[Column_Name] == Criteria_Value2]
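
As a rough sketch of how this could look for the 2018/2019 case, assuming made-up column names "product", "year" and "sales" (the real csv will have its own), the split and the sales ratio mentioned in the original question might be written like this:

```python
import pandas as pd

# Hypothetical data standing in for the csv; all names and numbers are invented.
df = pd.DataFrame({
    "product": ["widget", "gadget", "widget", "gadget"],
    "year": [2018, 2018, 2019, 2019],
    "sales": [100, 200, 150, 180],
})

# One dataframe per year, selected with a boolean condition on the year column.
df_2018 = df[df["year"] == 2018]
df_2019 = df[df["year"] == 2019]

# Index both years by product so the division lines up matching products.
sales_2018 = df_2018.set_index("product")["sales"]
sales_2019 = df_2019.set_index("product")["sales"]

# Ratio of 2019 sales to 2018 sales, largest increases first.
ratio = sales_2019 / sales_2018
largest_changes = ratio.sort_values(ascending=False)
print(largest_changes)
```

If only the position of the year column is known, df.columns[6] returns its label, which can then be used in place of "year". Indexing by product assumes each product appears once per year; duplicated products would need to be aggregated first.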

Hey, that worked! Thank you so much!

I have a few questions:

  1. Why do I need to use index, row in the for loop?

  2. When drawing data from my dataframe, why do I need to refer to the dataframe twice? Ex: DataFrame[DataFrame[Column_Name] == Criteria_Value1]

It seems like I need to learn a different syntax for dealing with dataframes? Perhaps I’ll have to buy a pandas book or find a good reference website or something.

The best reference website for pandas is the pandas documentation.

  1. In the first case, iterrows behaves much like the enumerate() function: it yields (index, row) pairs. Having both the index and the row is useful in many cases, for example when the logic depends on the previous row and you need to look that row up by its index.
  2. The dataframe appears twice because two steps are written on one line: the inner expression builds a boolean mask, and the outer one uses that mask to select rows. You could write it in two steps like this:
boolean_mask = DataFrame[Column_Name] == Criteria_Value1
df_new1 = DataFrame[boolean_mask]
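
To see what the mask actually contains, here is a small sketch with invented data (the column names "year" and "sales" are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data; names and values are made up for this example.
df = pd.DataFrame({"year": [2018, 2019, 2018], "sales": [10, 20, 30]})

# Comparing a column to a value gives a boolean Series, one entry per row.
boolean_mask = df["year"] == 2018
print(boolean_mask.tolist())

# Indexing the dataframe with the mask keeps only the rows marked True.
df_2018 = df[boolean_mask]
print(df_2018)
```

The first reference to the dataframe builds the True/False column, and the second uses it as a filter.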