
I am stuck. Need help with adding column

Hi Everyone,
The problem can be seen in the attached notebook. When I add a column to an existing DataFrame, every value in it comes out as NaN. I added the column for the year 2010 successfully, but the column for 2011 shows the problem.

Please help. Thank you.

arpsci.ipynb (16.2 KB)


Hi @faraz_llb, I think I might know why you’re getting these results but I’m not 100% sure. Can you please provide the data file as well? Doing so would allow others to recreate the problem and troubleshoot it more efficiently.

My best guess is that you’re getting this result because of this line of code:

arpsci.drop(arpsci.index[69:138], inplace=True)

which might effectively delete the contents of year_2011, if it is a pointer to (a view of) these rows. I think you can overcome this by appending .copy() to the line of code where you define year_2011. Specifically, change it to:

year_2011 = arpsci.loc[arpsci["year"] == 2011, "value"].copy()

You could also try printing the contents of year_2011 after you delete the rows in order to test my hypothesis.
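Here is a minimal sketch of that check on toy data. The frame below is a hypothetical stand-in with 3 rows per year, not the attached file; only the column names "year" and "value" come from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the real data: 3 rows per year.
arpsci = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011, 2011, 2011],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Pull out the 2011 values before touching the original frame.
year_2011 = arpsci.loc[arpsci["year"] == 2011, "value"]

# Drop the 2011 rows from the original frame, as the notebook does.
arpsci.drop(arpsci.index[3:6], inplace=True)

# Are the values still there, and which index labels do they carry?
print(year_2011)
print(year_2011.index)
```

If the values are still printed after the drop, the deletion is not what empties the column, and the index labels shown are the next thing to look at.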

Let me know what you find out…I’m curious what is causing this!

Hi Mike,

As per your guidance, I tried it, but it did not work. I believe it has something to do with the index numbering: the index of year_2011 starts at 69. I have been searching the internet for a way to drop the index numbering from year_2011, but so far I have not been successful.

The file requested is attached to this message.

Thanks.

arpsci2010.csv (5.4 KB)

That was going to be my second guess: an indexing issue. Also, I believe a better way to accomplish what you’re looking to do is the pandas pivot() method, but it’s been a while since I used it…

Give me a moment to play around with the data and see what I can come up with!


One quick way is to do:

year_2011 = arpsci.loc[arpsci["year"] == 2011, "value"].reset_index(drop=True)

which does solve your problem!
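Why this works, as far as I can tell: when you assign a Series as a new column, pandas lines the values up by index label, not by position. A rough sketch on toy data (hypothetical values, 3 rows per year instead of 69; the column names "2011_bad" and "2011_ok" are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the real data: 3 rows per year.
arpsci = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011, 2011, 2011],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

year_2011 = arpsci.loc[arpsci["year"] == 2011, "value"]  # index labels 3, 4, 5
arpsci.drop(arpsci.index[3:6], inplace=True)             # remaining labels 0, 1, 2

# Labels 3, 4, 5 do not exist in the frame any more, so every row gets NaN.
arpsci["2011_bad"] = year_2011

# reset_index(drop=True) relabels the Series 0, 1, 2, which matches the frame.
arpsci["2011_ok"] = year_2011.reset_index(drop=True)

print(arpsci)
```

In the real data the 2011 labels start at 69 while the remaining rows are labelled from 0, which would produce the same all-NaN column.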

oh yes, finally. Thank you Mike.


Excellent!

And just for completeness’ sake, here is how you could do it in one line of code using pivot():

arp = arpsci.pivot(index="level_1", columns="year", values="value")

Adding these lines will make printing the df look a little cleaner:

arp.columns.name = None
arp.index.name = None
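To show what the pivot does without the attached file, here is a small self-contained sketch. The data and the name long_df are hypothetical; I am assuming "level_1" holds one label per row within each year, as it appears to in the notebook.

```python
import pandas as pd

# Hypothetical long-format data: one row per (level_1, year) pair.
long_df = pd.DataFrame({
    "level_1": ["a", "b", "c", "a", "b", "c"],
    "year": [2010, 2010, 2010, 2011, 2011, 2011],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# One row per level_1 label, one column per year.
arp = long_df.pivot(index="level_1", columns="year", values="value")

# Remove the axis names so the printed frame has no extra header labels.
arp.columns.name = None
arp.index.name = None

print(arp)
```

Note that pivot() raises an error if the same (index, column) pair appears more than once, so this only works when each label occurs once per year.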