World happiness seaborn heatmap correlation matrix creation problem

Instead of drawing separate scatterplots to find the strength of the correlation between the happiness score and variables such as freedom, generosity, family, etc., I am trying to construct a correlation matrix as a heatmap using the code below, but it is not working. Please advise on what is wrong.

I am trying to edit code I found here
https://datatofish.com/correlation-matrix-pandas/

I also tried following along from here

It uses the World Happiness datasets from the link below:

https://app.dataquest.io/m/370/working-with-missing-data/4/assigning-the-corrected-data-back-to-the-main-dataframe

import seaborn as sn
import numpy as np
%matplotlib inline

    df = pd.Dataframe(happiness2015['Freedom','Generosity','Happiness Score','Trust (Government Corruption)'])

    df.pivot(index='Happiness Score',columns='Freedom','Generosity','Trust (Government Corruption)')

corrMatrix = df.corr()
sn.heatmap(corrMatrix,annot=True)
plt.show()

I also tried just doing the pivot on its own, but I did not get far, since it returned an error:

    df = pd.Dataframe(happiness2015['Freedom','Generosity','Happiness Score','Trust (Government Corruption)'])

    df.pivot(index='Happiness Score',columns='Freedom','Generosity','Trust (Government Corruption)')

As I understand it, the index parameter sets the column used to make the new frame's index, which I want to be the happiness score, but this returns an error:

File "", line 8
df.pivot(index='Happiness Score',columns='Freedom','Generosity','Trust (Government Corruption)')
^
SyntaxError: positional argument follows keyword argument
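This SyntaxError comes from Python's own call syntax rather than from pandas: once a keyword argument such as columns='Freedom' appears in a call, every later argument must also be a keyword, so 'Generosity' and 'Trust (Government Corruption)' are rejected before pivot() ever runs. A minimal sketch that reproduces the rule with the built-in compile(); `f` is a hypothetical function name and nothing is actually called:

```python
# Reproduce the parser rule behind the error: a bare (positional) argument
# may not come after a keyword argument in a function call.
# `f` is a hypothetical name; the strings are only compiled, never executed.
bad_call = "f(index='Happiness Score', columns='Freedom', 'Generosity')"
good_call = "f(index='Happiness Score', columns=['Freedom', 'Generosity'])"

def compiles(source):
    """Return None if `source` compiles, otherwise the SyntaxError message."""
    try:
        compile(source, "<example>", "eval")
        return None
    except SyntaxError as err:
        return err.msg

print(compiles(bad_call))   # the SyntaxError message
print(compiles(good_call))  # None: wrapping the names in a list is valid syntax
```

Note that the list version only fixes the syntax; whether pivot() then accepts those arguments is a separate question.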

Hi @jamesberentsen
You’re making a mistake here

When you pass more than one column, you need to put them in a list, so that line should look like this:
df.pivot(index=['Happiness Score',columns='Freedom','Generosity','Trust (Government Corruption)'])

I also found an error here:

sn.heatmap(corrMatrix,annot=True)

which should actually be sns.heatmap(corrMatrix, annot=True)

The rest of the code looks good, and it should print a cool graph

Good luck!


Hi alegiraldo666

Thanks for your response.

I get a syntax error when I run that though.

Also, I do not think there is an error with
sn.heatmap(corrMatrix,annot=True)
since seaborn was imported as sn, not sns.

I seem to have isolated the error to this line, with 'Dataframe' being the problem, but even with it corrected it does not work:

AttributeError: module 'pandas' has no attribute 'Dataframe'

Even just trying to output the correlation table does not work.
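The AttributeError is a spelling issue: pandas names are case-sensitive, and the constructor is DataFrame with a capital F. A small check, using made-up toy values rather than the happiness data:

```python
import pandas as pd

# pandas attribute names are case-sensitive: the class is DataFrame (capital F).
print(hasattr(pd, "Dataframe"))  # False, so pd.Dataframe(...) raises AttributeError
print(hasattr(pd, "DataFrame"))  # True

# Hypothetical toy data, only to show the correctly spelled constructor.
df = pd.DataFrame({"Freedom": [0.6, 0.5], "Generosity": [0.3, 0.4]})
print(type(df).__name__)  # DataFrame
```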

  1. Align your code properly
  2. It seems you want to index a DataFrame with these columns?

Use these commands to select a specific subset of your data.

  • df[col] | Returns column with label col as Series
  • df[[col1, col2]] | Returns columns as a new DataFrame
  • s.iloc[0] | Selection by position
  • s.loc['index_one'] | Selection by index
  • df.iloc[0,:] | First row
  • df.iloc[0,0] | First element of first column

When you select multiple columns from a DataFrame, a DataFrame is returned, so there is no need to use pd.DataFrame again:

df = happiness2015[['Freedom','Generosity','Happiness Score','Trust (Government Corruption)']]
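The difference between single and double brackets can be seen on a small frame. This sketch uses a hypothetical stand-in for happiness2015 with made-up values; only the column names match the thread:

```python
import pandas as pd

# Hypothetical stand-in for happiness2015 (made-up values).
happiness2015 = pd.DataFrame({
    "Happiness Score": [7.5, 6.1, 4.8],
    "Freedom": [0.66, 0.52, 0.35],
    "Generosity": [0.30, 0.21, 0.18],
})

one_col = happiness2015["Freedom"]                    # single brackets -> Series
two_cols = happiness2015[["Freedom", "Generosity"]]  # inner list -> DataFrame
print(type(one_col).__name__, type(two_cols).__name__)

# Without the inner list, pandas looks up ONE column whose label is the tuple
# ("Freedom", "Generosity"); no such column exists, so a KeyError is raised.
try:
    happiness2015["Freedom", "Generosity"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True
print(tuple_lookup_failed)  # True
```

Since two_cols is already a DataFrame, wrapping it in pd.DataFrame(...) adds nothing.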

Learn More About Data Selection


https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html

Check the documentation: df.pivot() takes only three parameters (index, columns and values).
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html
Why are you pivoting the DataFrame?
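For context, pivot() is meant for reshaping "long" data (one row per observation) into a wide table; it is not needed for a correlation matrix, because DataFrame.corr() already compares every pair of columns in the frame. A minimal sketch on hypothetical country/year data with made-up values:

```python
import pandas as pd

# Hypothetical long-format data: one row per (country, year) pair.
long_df = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2015, 2016, 2015, 2016],
    "Happiness Score": [7.5, 7.4, 6.1, 6.2],
})

# pivot() reshapes long -> wide: each unique `index` value becomes a row,
# each unique `columns` value becomes a column, filled from `values`.
wide = long_df.pivot(index="Country", columns="Year", values="Happiness Score")
print(wide)
```

Each (index, columns) pair must be unique for pivot() to work, which is one reason it does not fit a frame where the happiness score itself is used as the index.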


Hi info.victoromondi,

I see what you mean now, thanks. I was getting an indentation error because the code was not indented properly; I can get the correlation table now.

I need the index to be 'Happiness Score', though.

Not quite. I want 'Happiness Score' to be the index and the other columns
('Freedom', 'Generosity', etc.) to be the columns along the top, so that I can get a correlation
table showing the Pearson correlation coefficient between the happiness score and each column.
So it looks like part one of the error was (a) leaving out the inner square brackets: I only used one pair instead of two. I had read that single brackets return a Series and thought that would be okay, but it was not.

df[[col1, col2]] | Returns columns as a new DataFrame

(b) It seemed like the values parameter was optional, so I thought
it would take all the columns apart from 'Happiness Score', as that is the index.

values Column(s) to use for populating new frame’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.
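What the quoted documentation means in practice: when values is omitted, every column other than index and columns is spread out, and the result gets two-level column labels. A sketch on hypothetical country/year data with made-up values:

```python
import pandas as pd

# Hypothetical long-format data with two measured variables.
long_df = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2015, 2016, 2015, 2016],
    "Freedom": [0.66, 0.65, 0.52, 0.50],
    "Generosity": [0.30, 0.29, 0.21, 0.22],
})

# With values omitted, all remaining columns are used, and the result has
# hierarchical (two-level) column labels of the form (variable, Year).
wide = long_df.pivot(index="Country", columns="Year")
print(wide.columns.nlevels)  # 2
print(wide.columns.tolist())
```

So omitting values does not turn the remaining columns into the new columns as they are; columns must still name a column whose values become the new column labels.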

Check the documentation, only 3 parameters (index, columns and values) are passed in df.pivot()

I thought I only passed in two, the second being 'columns'.

I am pivoting because that was how it was done, as one of the steps to get a heatmap correlation table, in the YouTube video linked above. I think that is how I set the index to 'Happiness Score'.

In the photo above you can see that 'Happiness Score' is one of the columns, which does not make sense for what I am trying to achieve. I want to see the correlation between 'Happiness Score' and the variables of interest ('Generosity', 'Health', etc.). Since 'Happiness Score' is also a row here, and it does not make sense to correlate it with itself, I need to move it from the columns to the index.

I am still getting an error, though, so the problem is in the pivoting;
the error is the positional argument one, which I have tried to make sense of but cannot.

Since pivoting did not work, I tried
df.set_index('Happiness Score')

However, the heatmap does not look correct: the index is not the happiness scores, and 'Happiness Score' is still one of the columns.


import seaborn as sn
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

df=combined[['Freedom','Generosity','Happiness Score','Health (Life Expectancy)']]
df.set_index('Happiness Score')

#df.pivot(index='Happiness Score',columns='Freedom','Generosity','Trust (Government Corruption)')

corrMatrix = df.corr()

corrMatrix
sn.heatmap(corrMatrix,annot=True)
plt.show()

Hi again, I just realized that I made a mistake in the code.

df.pivot(index='Happiness Score' ,columns=['Freedom','Generosity','Trust (Government Corruption)'])

I think it should work now.


Hi there,

I'm afraid it does not work.
I get this error:

ValueError: Length mismatch: Expected 470 rows, received array of length 3


[Screenshot: Screen Shot 2020-08-15 at 22.27.46]

All the Series have the same number of elements.

Edit:
I found part of the answer here.

The problem was that I did not have inplace=True:
df.set_index('Happiness Score', inplace=True)
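The reason inplace=True matters: by default set_index() returns a new DataFrame and leaves the original untouched, so a bare df.set_index(...) line has no lasting effect. A sketch on made-up toy values:

```python
import pandas as pd

# Hypothetical toy data (made-up values) standing in for the happiness columns.
df = pd.DataFrame({
    "Happiness Score": [7.5, 6.1, 4.8],
    "Freedom": [0.66, 0.52, 0.35],
})

# set_index() without inplace=True returns a NEW DataFrame and leaves df alone.
returned = df.set_index("Happiness Score")
original_cols = df.columns.tolist()
print(original_cols)              # ['Happiness Score', 'Freedom']
print(returned.columns.tolist())  # ['Freedom']

# Either reassign the result...
df = df.set_index("Happiness Score")
# ...or call df.set_index("Happiness Score", inplace=True) instead (not both).
print(df.index.name)              # Happiness Score
```

Reassigning the result is generally the more common style; both leave 'Happiness Score' in the index and no longer among the columns.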


import seaborn as sn
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

df=combined[['Happiness Score','Freedom','Generosity','Trust (Government Corruption)']]
df.set_index('Happiness Score', inplace=True)

corrMatrix = df.corr()

corrMatrix
# Set the figure size BEFORE drawing; calling plt.figure() after the heatmap
# only opens a new, empty figure and leaves the heatmap at the default size.
plt.figure(figsize=(20,20))
sn.heatmap(corrMatrix,annot=True,cmap='coolwarm')
plt.show()

There is still a problem with the heatmap, though: it is
not showing the correlation between the happiness score and the other columns,
such as Generosity, Trust and Freedom.
Instead, the first column, Freedom, is correlated against itself and then against Trust and Generosity, which is not the output I expected.
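A likely explanation: once 'Happiness Score' is moved into the index it is no longer a column, and DataFrame.corr() only correlates columns, so the happiness score drops out of the matrix entirely and only Freedom, Generosity and Trust are compared with each other. One possible approach (a sketch, not something posted in the thread) is to keep 'Happiness Score' as an ordinary column, compute the full matrix, and then keep only the happiness column. The values below are made up; with the real data, `combined` would take the place of the hypothetical `toy` frame:

```python
import pandas as pd

# Hypothetical toy data (made-up values) with the same column names as the thread.
toy = pd.DataFrame({
    "Happiness Score": [7.5, 6.9, 6.1, 5.4, 4.8],
    "Freedom": [0.66, 0.60, 0.52, 0.41, 0.35],
    "Generosity": [0.30, 0.35, 0.21, 0.25, 0.18],
    "Trust (Government Corruption)": [0.42, 0.30, 0.15, 0.10, 0.08],
})

# Keep 'Happiness Score' as a column (no set_index) so corr() includes it.
corr_matrix = toy.corr()  # Pearson correlation by default

# One column: the correlation of every variable with the happiness score,
# dropping the trivial self-correlation of 1.0 and sorting strongest first.
happy_corr = (
    corr_matrix[["Happiness Score"]]
    .drop(index="Happiness Score")
    .sort_values("Happiness Score", ascending=False)
)
print(happy_corr)
```

The single-column result can then be passed to the same heatmap call used above, e.g. sn.heatmap(happy_corr, annot=True, cmap='coolwarm'), giving one row per variable instead of a square matrix.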