I am confused here

I understood that the len() function returns the number of data points in a data set, but here it returns only the number of rows. Where is the mistake?
Also, the note says that the number of rows is 7198, not 7197. Isn’t that obvious, or am I misunderstanding what the note means?
Thanks

Hi @Maho

The len() function here simply counts the number of rows. It doesn’t treat the header row as special, because we converted the CSV file into a list. The header is just another element of that list, so it is counted as a row (or as a data point, as you put it). If you call len() on the whole list, it returns the total number of rows, including the header.

We have to manually remove the header row, store it in a variable, and keep the rest of the table as the data points. Once you have done that, len() of the new table gives you the number of data rows.
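As a rough sketch of the point above, here is the same idea on a tiny in-memory CSV. The column names and the three data rows are made up for illustration; only the name `apps_data` comes from this thread.

```python
import csv
import io

# Hypothetical stand-in for the real CSV file: 1 header row + 3 data rows.
csv_text = "id,name,price\n1,AppA,0.0\n2,AppB,1.99\n3,AppC,4.99\n"

# Reading a CSV this way gives a plain list of lists.
apps_data = list(csv.reader(io.StringIO(csv_text)))

print(len(apps_data))  # 4: the header is counted like any other row

header = apps_data[0]  # keep the header separately
rows = apps_data[1:]   # everything after the header
print(len(rows))       # 3: data rows only
```

The difference between the two lengths is always exactly one, which is where 7198 versus 7197 comes from.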

I hope this helps.

I understand this, but my question is about the len() function itself. I understood that len() returns the number of data points in a data set, but here it returns only the number of rows. Where is the mistake?
Thanks

header = apps_data[0]
new_apps_data = apps_data[1:]
len(new_apps_data)

Now you will get 7197.

I hope I understood your question correctly.

That is not what I am asking. My question is: I understood that the len() function returns the number of data points in a data set, but here it returns only the number of rows. Where is the mistake?
Thanks a lot

Hi @Maho,

In this case, your apps_data is not a real table but a list of lists. The so-called “rows” here are not real table rows; they are the small lists that make up the list of lists. Hence, when you use the len() function, you are just counting those small lists (you can call them “rows”), each of which represents one data point in this case.

From your explanation, I understand that for a simple list such as [5, 6, 3], the len() function returns the number of elements in the list, but for a list of lists it returns the number of rows. Am I right?

Yes, correct. In a list of lists, like this

my_list = [[2, 3, 6, 8], [1, 9], [4, 6, 1, 10], [2, 7, 5, 20, 8, 1]]

it will produce the number of small lists (which you call “rows” here).
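To make the difference concrete, here is a small sketch that counts both the inner lists and the individual values. The variable names `n_rows` and `n_values` are just illustrative.

```python
my_list = [[2, 3, 6, 8], [1, 9], [4, 6, 1, 10], [2, 7, 5, 20, 8, 1]]

# len() only looks at the outer list: how many inner lists it holds.
n_rows = len(my_list)

# To count every individual value, add up the lengths of the inner lists.
n_values = sum(len(row) for row in my_list)

print(n_rows)    # 4
print(n_values)  # 16
```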

More generally, len will usually return the length of the first dimension.

from numpy.random import random
arr = random([2,3,4])
len(arr)
Out[11]: 2

This is a 3-dimensional array of numbers with shape 2x3x4.
For an array with 3 or more dimensions, the words “row” and “column” no longer describe it adequately.

len does not, in general, return the number of data points. That description only fits flat containers such as list, tuple, and set. Can you guess what it returns for a dict? You probably realize that “data points” is hard to define: dictionaries contain keys as well as values, and pandas DataFrames have values, column names, and index labels. The more complex the object, the vaguer the notion of a “data point” becomes.
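For reference, here is what len returns for a few built-in containers. The variable names and values are made up for illustration.

```python
point_list = [5, 6, 3]
point_tuple = (5, 6, 3)
point_set = {5, 6, 3, 3}       # the duplicate 3 is stored only once
point_dict = {"a": 1, "b": 2}
text = "hello"

print(len(point_list))   # 3 elements
print(len(point_tuple))  # 3 elements
print(len(point_set))    # 3 unique elements
print(len(point_dict))   # 2 key-value pairs
print(len(text))         # 5 characters
```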

Usually, the dimension that len counts is the same one that a for loop iterates over.
For example, if I take the 2x3x4 random array above and loop through it:

In [19]: for i in arr:
    ...:     print(i)
    ...:     print(i.shape)
    ...:
[[0.67937696 0.04112136 0.00963583 0.53491795]
 [0.10775869 0.57667792 0.78254441 0.38309252]
 [0.37719515 0.84977202 0.66155156 0.01233598]]
(3, 4)
[[0.5921208  0.43623514 0.84039817 0.23873288]
 [0.1893176  0.24225012 0.91538839 0.48808803]
 [0.95874432 0.85262728 0.48629505 0.70217657]]
(3, 4)

It loops through the first dimension, which has 2 items, and each item is a 3x4 array (values and shapes printed above).
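The same relationship holds for plain nested lists, so you can check it without NumPy. This is only a sketch: `nested` is a made-up 2x3x4 list of zeros standing in for the random array.

```python
# A 2x3x4 nested list filled with zeros, standing in for the NumPy array.
nested = [[[0 for _ in range(4)] for _ in range(3)] for _ in range(2)]

print(len(nested))        # 2: size of the first dimension
print(len(nested[0]))     # 3: size of the second dimension
print(len(nested[0][0]))  # 4: size of the third dimension

# A for loop walks over the same dimension that len() counts.
iterations = 0
for block in nested:      # each block is a 3x4 list of lists
    iterations += 1

print(iterations == len(nested))  # True
```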

However, other objects such as DataFrames are less intuitive, because len and a for loop operate on different things.

import pandas as pd
df = pd.DataFrame({'A':[1,2,3],'B':[2,3,4]})

df

Out[16]:
   A  B
0  1  2
1  2  3
2  3  4

Here is a 3 row 2 column dataframe.
len(df) gives 3, but looping goes through the columns instead:

In [34]: for i in df:
    ...:     print(i)
    ...:     print(df[i])
    ...:
    ...:
A
0    1
1    2
2    3
Name: A, dtype: int64
B
0    2
1    3
2    4
Name: B, dtype: int64

This is because people more commonly work with columns: the values in a column all mean the same thing and are usually manipulated in the same way.

How len(obj) and for x in obj behave for any object is defined by that object’s __len__ and __iter__ methods: len(obj) calls the __len__ method defined on the object’s class, and many other Python built-ins follow the same pattern. If you write your own classes, you can define or override these methods to get the behavior you want. (People build on this extensibility to create open-source tools that make working with pandas objects more convenient by adding new attributes to them: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-subclassing-pandas)

class SubclassedDataFrame(pd.DataFrame):
    def __len__(self):
        return len(self.columns)
        # return 1
        # return 4   # IndexError: list index out of range

sdf = SubclassedDataFrame({'A':[1,2,3],'B':[2,3,4]})
sdf
len(sdf)

Output:
   A  B
0  1  2
1  2  3

2

For example, here I create a custom DataFrame whose len() counts columns instead of returning 3. Interestingly, I found that the HTML display of the DataFrame in Jupyter depends on len too, which is why only 2 rows are shown (in memory it still stores 3 rows, so nothing is lost; the display is just confusing). Hardcoding return 4 breaks the HTML display with an IndexError.
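The split between what len counts and what a for loop visits can also be sketched without pandas. `TinyTable` below is a hypothetical toy class written for this post, not a pandas API; it only imitates the DataFrame behavior described earlier (len counts rows, iteration yields column names).

```python
class TinyTable:
    """Hypothetical column-oriented table; not part of pandas."""

    def __init__(self, columns):
        # columns: dict mapping a column name to a list of values
        self.columns = columns

    def __len__(self):
        # Count rows: the length of the first column (0 if there are none).
        first_column = next(iter(self.columns.values()), [])
        return len(first_column)

    def __iter__(self):
        # Iterate over column names, not over rows.
        return iter(self.columns)


table = TinyTable({"A": [1, 2, 3], "B": [2, 3, 4]})

print(len(table))   # 3: number of rows
print(list(table))  # ['A', 'B']: looping visits the column names
```

Because the two methods are independent, nothing forces them to agree; each class decides what is most useful.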

Here is a demo of how Python can be seen as a language defined by its dunder methods, where you edit those __x__ methods to make objects behave as you want: https://www.youtube.com/watch?v=cKPlPJyQrt4&ab_channel=PyData.

Here’s another example: a lying list that overrides __len__ to report a length 10 greater than the real one:

class Liar(list):
    def __len__(self):
        return super().__len__() + 10
    
lying_list = Liar([1, 2])
len(lying_list)   # returns 12 instead of 2
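A short sketch of what the override does and does not change. The `Liar` class is repeated so the snippet runs on its own; the extra checks are additions for illustration.

```python
class Liar(list):
    def __len__(self):
        # Report 10 more items than are actually stored.
        return super().__len__() + 10


lying_list = Liar([1, 2])

print(len(lying_list))             # 12: the overridden answer
print(list.__len__(lying_list))    # 2: the real number of stored items
print(sum(1 for _ in lying_list))  # 2: iteration ignores the override
print(bool(Liar()))                # True: an empty Liar claims length 10, so it is truthy
```

The last line shows a side effect worth knowing about: truthiness falls back to __len__ when a class defines no __bool__, so changing __len__ can change how the object behaves in an if statement.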

You may be confused by all these terms: class, subclass, super(), def __init__, self. DQ will teach some of them as you go along, or you can consult other sources right away, such as https://realpython.com/python-super/. I hope that laying out what’s possible gives you new ways of thinking about objects.