Clustering basics questions

Screen Link: Learn data science with Python and R projects

My Code:

from sklearn.metrics.pairwise import euclidean_distances

print(euclidean_distances(votes.iloc[0,3:].values.reshape(1, -1), votes.iloc[1,3:].values.reshape(1, -1)))

distance = euclidean_distances(votes.iloc[0,3:].values.reshape(1,-1), votes.iloc[2,3:].values.reshape(1,-1))

Why do we need to use reshape? Also, the question asks for the Euclidean distance between the first and third rows. Why does the code calculate the distance using [2,3:]?


The reshape was most likely needed because the Dataquest platform used a different sklearn version when this content was created. The version they use now apparently doesn’t need the array to be reshaped.


euclidean_distances(votes.iloc[0,3:].values, votes.iloc[2,3:].values)


euclidean_distances(votes.iloc[0,3:], votes.iloc[2,3:])

Either of the two calls above seems to run fine for this exercise.
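For context on what reshape(1, -1) does: euclidean_distances is documented to take inputs of shape (n_samples, n_features), and a single row pulled out with iloc is one-dimensional. Whether a 1-D input is accepted seems to depend on the installed scikit-learn version, so reshaping is the safer habit. Here is a minimal sketch of the shape change; the small `votes` table is a made-up stand-in for the exercise data, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's `votes` table:
# three descriptive columns followed by numeric vote columns.
votes = pd.DataFrame({
    "name": ["A", "B", "C"],
    "party": ["D", "R", "D"],
    "state": ["X", "Y", "Z"],
    "vote1": [1.0, 0.0, 1.0],
    "vote2": [0.0, 0.0, 1.0],
    "vote3": [1.0, 1.0, 0.5],
})

# One row, vote columns only: a 1-D array with one entry per vote column.
row = votes.iloc[0, 3:].values.astype(float)
print(row.shape)

# reshape(1, -1) means "1 row, as many columns as needed":
# the same numbers, now a 2-D array holding a single sample.
row_2d = row.reshape(1, -1)
print(row_2d.shape)
```

The values do not change, only the shape: the row goes from a flat vector to a one-row matrix.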

As for the second question, that is what your code is doing. Row indices in iloc start at 0, so [0,3:] selects the first row and [2,3:] selects the third row. The function calculates the distance between those two rows. 0 and 2 are the row indices, and 3: selects the columns from the fourth column onward.
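To check the indexing for yourself, here is a small sketch that compares the result of euclidean_distances for rows 0 and 2 with a manual calculation. The `votes` table below is invented for illustration and only mimics the layout of the exercise data (three descriptive columns, then vote columns).

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

# Hypothetical stand-in for the exercise's `votes` table.
votes = pd.DataFrame({
    "name": ["A", "B", "C"],
    "party": ["D", "R", "D"],
    "state": ["X", "Y", "Z"],
    "vote1": [1.0, 0.0, 1.0],
    "vote2": [0.0, 0.0, 1.0],
    "vote3": [1.0, 1.0, 0.5],
})

# iloc is zero-based: index 0 is the first row, index 2 is the third row.
first = votes.iloc[0, 3:].values.astype(float).reshape(1, -1)
third = votes.iloc[2, 3:].values.astype(float).reshape(1, -1)

# The result is a 2-D array with one entry: the distance between the two rows.
distance = euclidean_distances(first, third)
print(distance)

# Manual check: square root of the summed squared differences.
manual = np.sqrt(((first - third) ** 2).sum())
print(manual)
```

Both printed numbers should agree up to floating-point rounding, which confirms that [2,3:] refers to the third row.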