I didn't get the purpose of that task

Hi There,

I didn’t understand the main task on this page (no. 7).
I have done the value counts, but I don’t know why I have to group by or select the first 5%:
(You might want to select the top 20, or you might want to select those that have over a certain percentage of the total values (e.g. > 5%).)

Could you give me a hint?

hey @wALEED.tawheed

It would be great if you could attach a link to the mission screen you are having difficulty with, so that we can help you precisely.

As a best guess, the task is asking you to continue the analysis with only the top 20 results. Say your value_counts() call gave you 55 distinct values for the field in question. By default, value_counts() sorts its result in descending order of count, like this:

Val1 - 10
Val2 - 9
Val3 - 7

Val55 - 1

The task is asking you to select either the top 20 results, i.e. Val1 to Val20, or all the values that each make up more than 5% of the total count. You can get the proportions by passing normalize=True to value_counts(); they come back as fractions between 0 and 1, so 5% corresponds to 0.05.
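The two ways of narrowing down the values described above can be sketched as follows. This is only an illustration: the `brands` data is made up, and it keeps the top 2 values instead of the top 20 so the output stays short.

```python
import pandas as pd

# Hypothetical data standing in for the column being analysed.
brands = pd.Series(
    ["vw"] * 12 + ["bmw"] * 8 + ["opel"] * 4 + ["fiat"] * 1,
    name="brand",
)

# value_counts() sorts by count, largest first.
counts = brands.value_counts()

# Option 1: keep only the N most frequent values (20 in the task, 2 here).
top_n = counts.head(2)

# Option 2: keep the values that make up more than 5% of all rows.
shares = brands.value_counts(normalize=True)  # fractions that sum to 1
common = shares[shares > 0.05]

print(top_n)
print(common)
```

Either result is a Series whose index holds the values (here, brand names) and whose data holds the counts or shares.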

Hey @Rucha
Thanks for your reply.
This is the step I am talking about: https://app.dataquest.io/m/294/guided-project%3A-exploring-ebay-car-sales-data/7/exploring-price-by-brand

I still don’t know the main goal of this step.

I updated my code. Could you check whether I am doing it right?
Basics.ipynb (167.0 KB)

hey @wALEED.tawheed

This step has now given you the top 20 brands. They are the top brands because they have been advertised more than the other brands.

The series.index attribute (note: no parentheses, it is not a method) should be chained onto the code above instead of being applied separately. That way you extract only the brand names and leave out the counts.

According to the instructions, this is more or less optional. You may still do it, though, for presentation, for further analysis, or for both!
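As one possible reading of "further analysis", the brand names taken from the index can drive a per-brand calculation such as the average price. The sketch below is an assumption about what that could look like, not the mission's official solution: the `autos` DataFrame, its `brand` and `price` columns, and the values in it are all made up.

```python
import pandas as pd

# Made-up listings; the column names `brand` and `price` are assumptions.
autos = pd.DataFrame({
    "brand": ["vw", "vw", "vw", "bmw", "bmw", "opel"],
    "price": [5000, 7000, 6000, 9000, 11000, 3000],
})

# Names of the most advertised brands (top 2 here, top 20 in the task).
top_brands = autos["brand"].value_counts().head(2).index

# Average price for each of those brands.
mean_price = {}
for brand in top_brands:
    rows = autos[autos["brand"] == brand]
    mean_price[brand] = rows["price"].mean()

print(mean_price)
```

Restricting the loop to the most common brands keeps the comparison to brands with enough listings to make an average meaningful.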

Hope that helps.

Hi @Rucha

Hmm, I got confused. I didn’t get the idea of a chained method. I looked it up, but I still didn’t get it :confused:

So why do we use series.index?
When I tried it, this was the result:

Int64Index([32819, 27171,   791, 48233,  5354,  5308,  7791, 36652, 30759,
            49240,
            ...
            43668, 40918, 37840, 38299, 47337, 12682, 35923, 34723, 14715,
            36818],
           dtype='int64', length=46676)

Is it useful?

Thanks again for your support.

hey @wALEED.tawheed

Try this code in a code cell and observe the results.

for each in laptops.manufacturer.value_counts(normalize=True).index:
    print(each)

This is called method chaining. We first called the value_counts() method and then, on the same line, accessed the index attribute of its result.
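To make the difference concrete, here is a small sketch comparing the chained form with the two-step form. It also shows what happens when .index is taken from the column itself rather than from the value_counts() result, which is a guess at why the earlier attempt printed row numbers instead of names. The tiny `laptops` DataFrame is a hypothetical stand-in for the real one.

```python
import pandas as pd

# Hypothetical stand-in for the `laptops` DataFrame used above.
laptops = pd.DataFrame(
    {"manufacturer": ["Dell", "HP", "Dell", "Apple", "Dell", "HP"]}
)

# Two separate steps...
counts = laptops["manufacturer"].value_counts()
names_two_steps = counts.index

# ...or the same thing chained on one line.
names_chained = laptops["manufacturer"].value_counts().index

# Taking .index from the column itself (no value_counts) gives row labels.
row_labels = laptops["manufacturer"].index

print(list(names_chained))
print(list(row_labels))
```

The chained and two-step versions give the same names; chaining just avoids the intermediate variable.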