On the 5th slide of the Exploring eBay Car Sales project, there was a small step to explore the distribution of values in the date_crawled, ad_created, and last_seen columns (all string columns) as percentages.
I am curious how Python sorted this index when we have not changed the type. As far as I can tell, these dates are still strings.
Does anyone have any insight on this?
Are you assuming strings have to be turned into numbers before they can be sorted? They don't; see https://docs.python.org/3/reference/datamodel.html
In Python, any two objects can be compared for equality, and any type can support ordering as long as it defines the rich comparison methods. When you write your own classes, you implement __lt__, __gt__, __le__, __eq__, etc. (or define __eq__ plus one ordering method and use functools.total_ordering) to say which attributes are compared, and how, so Python can decide which of two objects is larger. Depending on the application, you may tighten or loosen the conditions for equality.
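Here is a minimal sketch of that idea. The Car class and its price-only ordering are made up for illustration; they are not from the project.

```python
from functools import total_ordering


@total_ordering
class Car:
    """Hypothetical class: two cars are compared by price only."""

    def __init__(self, name, price):
        self.name = name
        self.price = price

    def __eq__(self, other):
        if not isinstance(other, Car):
            return NotImplemented
        return self.price == other.price

    def __lt__(self, other):
        if not isinstance(other, Car):
            return NotImplemented
        return self.price < other.price


cheap = Car("old hatchback", 1500)
pricey = Car("sports coupe", 40000)

print(cheap < pricey)    # uses __lt__ directly
print(cheap >= pricey)   # filled in by total_ordering from __lt__ and __eq__
print([car.name for car in sorted([pricey, cheap])])  # sorted() relies on __lt__
```

Once __lt__ exists, sorted() and list.sort() work on these objects without any conversion to numbers.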
Strings are objects too. They are sequences and are compared character by character from left to right until a tie is broken, so a longer string is not automatically bigger: a shorter string wins if it has the larger character at the first position where the two differ. Each character is compared by its Unicode code point, which is the number ord() returns for it. This is what the docs https://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types call ‘lexicographical order’ (or, more intuitively, dictionary order, as you see in any real-life dictionary). It is also why date strings can sort correctly without conversion: if they are written year first with zero-padded fields, dictionary order and chronological order are the same. Just for the perfectionist who craves symmetry, ord('2') is 50 and chr(50) turns it back into '2'.
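A short sketch of the rules above. The sample timestamps are made up, and I am assuming a year-first, zero-padded layout; check what your columns actually contain.

```python
# Characters are compared by their Unicode code points.
print(ord('2'), ord('A'), ord('a'))   # 50 65 97
print(chr(50))                        # 2

# The first differing position decides, not the length.
print('b' > 'abc')        # True: 'b' beats 'a' at position 0
print('Zebra' < 'apple')  # True: uppercase code points come before lowercase

# Year-first, zero-padded date strings sort chronologically as plain text.
iso_dates = ['2016-04-03 12:00:00', '2016-03-05 11:00:00', '2016-03-26 17:47:46']
print(sorted(iso_dates))

# Day-first strings do not: 5 April lands before 26 March.
day_first = ['26/03/2016', '05/04/2016']
print(sorted(day_first))
```

So the sort works on strings either way; whether the result is also chronological depends on the format.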
That’s why you must be careful with types. A number printed on the console may not actually be an int or float but a string. If one side of a comparison is clearly text, you know the other side, which looks like a number, must be a string too; if it weren’t, you would get TypeError: '>' not supported between instances of 'str' and 'int'. Confusion is more likely when both operands are numbers stored as strings with different lengths and you read them as integers: '222' > '1111' is True as strings but False as numbers.
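To see the trap for yourself, here is a small sketch with made-up values:

```python
values = ['222', '1111', '30']

print('222' > '1111')            # True: '2' beats '1' at position 0
print(int('222') > int('1111'))  # False: 222 is less than 1111

print(sorted(values))            # ['1111', '222', '30'] (dictionary order)
print(sorted(values, key=int))   # ['30', '222', '1111'] (numeric order)

# Mixing a real string with a real int is an error, not a silent surprise.
try:
    'abc' > 5
except TypeError as exc:
    message = str(exc)
    print(message)
```

Passing key=int leaves the items as strings and only changes how they are compared, which is handy when you do not want to convert the column itself.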
To add to my answer: besides Python, SQL can also compare more than just numbers. Look at its BETWEEN operator, for one, which can compare text, numbers, dates, and probably many more. (Most SQL variants have many more data types than Python.)
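If you want to try BETWEEN on text without leaving Python, here is a rough sketch using the built-in sqlite3 module. The ads table and its rows are made up, and this only shows SQLite's behaviour with TEXT columns; other databases have dedicated date types and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (name TEXT, date_crawled TEXT)")
conn.executemany(
    "INSERT INTO ads VALUES (?, ?)",
    [
        ("a", "2016-02-27"),
        ("b", "2016-03-05"),
        ("c", "2016-03-26"),
        ("d", "2016-04-03"),
    ],
)

# BETWEEN compares the TEXT values in dictionary order, both bounds inclusive.
rows = conn.execute(
    "SELECT name FROM ads "
    "WHERE date_crawled BETWEEN ? AND ? "
    "ORDER BY date_crawled",
    ("2016-03-01", "2016-03-31"),
).fetchall()

print(rows)
conn.close()
```

Only the two March rows come back, for the same reason the strings sort correctly in Python: the year-first format makes text order match date order.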