Question about DataFrame.plot.bar function

I’m trying to plot a bar chart that compares the first ten rows and the last ten rows of a column (unemployment rate).
The DataFrame is named recent_grads.

My code is:

first = recent_grads[:10]["Unemployment_rate"]
last = recent_grads[-10:]["Unemployment_rate"]
df = pd.DataFrame(first, last)
df.plot.bar(x = first, y = last, legend = False)

What I expected to happen: I expected it to return a bar chart.

What actually happened: Python keeps telling me that the last 10 values, the ones in the last variable, are not in the index.

KeyError: '[0.047584 0.04010498 0.10711573 0.07754113 0.08174221 0.04632028\n 0.06511219 0.1490482 0.05362065 0.10494572] not in index'

Please help. I tried Google and StackOverflow, but nothing made sense.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.bar.html

Look at what the examples are putting into their x and y inputs.
As a start, you have to care about the type, the value, and sometimes the id() (when learning copying/iterators/OOP) of every Python variable you use.

A KeyError happens when something is used to index another object, usually a collection of items stored in a key-value structure. You don’t have to know what the collection is, as long as you can see what was used as the key and why it is wrong. Here the error message shows a list of numbers separated by spaces and \n. Does it look familiar? Try to guess what it is and where it comes from.
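To make the KeyError concrete, here is a minimal sketch using a made-up DataFrame (not recent_grads). The numbers used as keys are placeholders, and the exact wording of the message depends on the pandas version.

```python
import pandas as pd

# Made-up DataFrame for illustration, taken from the pandas docs example.
df = pd.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20]})

# Indexing with a column name works: "val" is one of the keys.
print(df["val"].tolist())

# Indexing with a list of numbers fails: none of them is a column name.
try:
    df[[0.047584, 0.040105]]
except KeyError as err:
    message = str(err)
    print("KeyError:", message)
```

The thing printed in the message is whatever was used as the key.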
Besides thinking about how the developers of Matplotlib may have implemented something, you can step back from the low-level details and think about API design: what do the Matplotlib/Pandas developers intend for you to provide to make a function (here, bar plotting) work?
Maybe the way you are thinking works, but are there easier API designs that lighten the burden on users, so they have to provide less information and type less? (This is the hint to the answer to your question.)

Hi Hanqi,
I followed your instruction to read the documentation and fixed my error. I’m still unsure about the KeyError message and API design. Sorry.

I understand I need to give the function plot.bar a DataFrame plus specified columns for the x and y axes.

In the example given in the document:

df = pd.DataFrame({'lab':['A', 'B', 'C'], 'val':[10, 30, 20]})
ax = df.plot.bar(x='lab', y='val', rot=0)

The DataFrame consists of 2 series: lab and val.

Isn’t that the same as what I did? My DataFrame consists of 2 series: first and last.

first and last are variables. "first" and "last" are strings. They represent different things. The x and y arguments expect column names from the DataFrame, not the data itself. Pandas takes what you give it and passes it on to matplotlib, doing some checking along the way. That check raised the KeyError because it couldn’t find the column names you asked for in the DataFrame you provided.
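Here is one possible way to write it, as a sketch rather than the only fix. The recent_grads below is a stand-in filled with random numbers, since the real data isn’t shown in this thread. Note also that pd.DataFrame(first, last) passes last as the index argument, so it does not create two columns; building the DataFrame from a dict gives the columns string names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Stand-in for recent_grads (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
recent_grads = pd.DataFrame({"Unemployment_rate": rng.uniform(0.0, 0.15, size=30)})

# reset_index(drop=True) lines both slices up on 0..9 so they sit side by side.
first = recent_grads[:10]["Unemployment_rate"].reset_index(drop=True)
last = recent_grads[-10:]["Unemployment_rate"].reset_index(drop=True)

# The dict keys become the column names "first" and "last".
df = pd.DataFrame({"first": first, "last": last})

# With no x given, the index is used for the x axis;
# with no y given, every numeric column is drawn.
ax = df.plot.bar(rot=0)
```

Passing y=["first", "last"] would also work here, because those strings are now column names in df.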

This case is hard to google because every application can generate a different KeyError message depending on what wrong inputs were given. When you do get a standardized error message (like those the pandas developers wrote for the validate parameter of pd.merge), googling it can lead you to the place in the source code that raises it. Reading the surrounding code and doing some tracing will quickly show you what causes the error.
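For comparison, here is a small sketch of the pd.merge case, with made-up frames. The validate check raises a MergeError whose text is written by the pandas developers, so it is the same for everyone who hits it.

```python
import pandas as pd
from pandas.errors import MergeError

# Made-up frames: "key" is duplicated on the left side.
left = pd.DataFrame({"key": [1, 1, 2], "a": [10, 11, 12]})
right = pd.DataFrame({"key": [1, 2], "b": [20, 21]})

try:
    # validate="one_to_one" asks pandas to check that keys are unique on both sides.
    pd.merge(left, right, on="key", validate="one_to_one")
except MergeError as err:
    message = str(err)
    print("MergeError:", message)
```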
