Hello,
I had a general question on the “Bar Plots, Histograms, and Distributions Mission” on the 5th page.
URL: Learn data science with Python and R projects
The correct code is:
registered_freq = bike_sharing['registered'].value_counts(bins=10).sort_index()
I was wondering why we use sort_index() instead of using value_counts(bins=10, ascending=True).
With bins set, the result is indexed by intervals (one bin per row), so the two give different output. sort_index() orders the rows by bin, while value_counts(ascending=True) orders the rows by count, so the bins only come out in order if the counts happen to line up that way.
For example
import numpy as np
import pandas as pd
index = pd.Index([3, 1, 2, 3, 4, np.nan])
# Sorted by count (ascending), so the bins themselves end up out of order
index.value_counts(bins=4, ascending=True)
(0.996, 1.75] 1
(1.75, 2.5] 1
(3.25, 4.0] 1
(2.5, 3.25] 2
dtype: int64
# Sorted by bin (the index), regardless of the counts
index.value_counts(bins=4).sort_index()
(0.996, 1.75] 1
(1.75, 2.5] 1
(2.5, 3.25] 2
(3.25, 4.0] 1
dtype: int64
value_counts() sorts by frequency, that is, the number of times each unique value (or bin) occurs. Setting ascending=True only means the frequencies are displayed in ascending order; value_counts() uses ascending=False (descending order) by default.
sort_index() sorts by the index, that is, the unique values (or bins) themselves.
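If it helps, here is a small sketch that checks both orderings side by side. It reuses the toy data from the example above rather than the bike_sharing dataset, and the variable names (values, by_count, by_bin) are just made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data from the example above (not the bike_sharing dataset)
values = pd.Series([3, 1, 2, 3, 4, np.nan])

# Ordered by count, smallest first; the bins follow the counts
by_count = values.value_counts(bins=4, ascending=True)

# Ordered by bin, lowest interval first; the counts follow the bins
by_bin = values.value_counts(bins=4).sort_index()

# Counts are ascending here, but the bins are not in order
print(by_count.is_monotonic_increasing)
print(by_count.index.is_monotonic_increasing)

# Bins are in order here, which is what a histogram-style table needs
print(by_bin.index.is_monotonic_increasing)
```

For a frequency table that should read like a histogram, the bins need to be in order, which is presumably why the mission's solution uses sort_index().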