OOP, Class, and __init__

i am lost in the concepts of this lecture, and have several questions. it’d be great if you could answer them, and possibly provide further sources to understand them!

  1. which methods always take the object itself as their first argument? some classes or functions don’t need (self) to operate; how do you figure out which ones?

  2. is __init__ there so that the class can be called like a function to get the initial instance? if so, why don’t we simply define a function, rather than defining a class and then a function inside the class?

  3. at this page:
    Learn data science with Python and R projects
    why don’t we have to define ‘length’? is it because it’s a built-in function?

thank you!

  1. When you call a method as instance.method(args), Python translates it into Class.method(instance, args), so the instance is passed in as the first argument. Every instance has its own set of attributes (data and methods), and Python needs to know which instance called the method so that it uses the correct data. This design also enables method chaining such as instance.instancemethod1().instancemethod2(), provided instancemethod1 returns self. When you code a decision tree from scratch and grow it recursively, you have to return self to attach each lower-level structure back to the larger structure above it. The same applies when recursively building any tree in data structures and algorithms exercises (there are many types; some, like BSTs, are introduced in Dataquest). Slightly off on a tangent, but helpful once you reach an intermediate level in pandas: https://towardsdatascience.com/whats-the-difference-between-pd-merge-and-df-merge-ab387bc20a2e
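To make the call translation and the chaining point concrete, here is a minimal sketch. The Counter class and its add method are made up for illustration and are not from the lecture.

```python
class Counter:
    """Hypothetical example class, not from the lecture."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self  # returning self is what makes chaining possible


a = Counter()
b = Counter()

a.add(5)           # the usual call syntax
Counter.add(b, 7)  # the equivalent explicit form: the instance is passed as self

# Each add() returns the same instance, so the calls can be chained.
chained = Counter().add(1).add(2).add(3)

print(a.total, b.total, chained.total)  # 5 7 6
```

Both call forms run the same function; the dotted form just fills in the first argument for you.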

Class methods do not need self but take cls instead. The name of the first parameter doesn’t actually matter, since self and cls are conventions rather than keywords, so the name alone is not a clear distinction; the @classmethod and @staticmethod decorators are what mark a method as one kind or the other. Static methods take neither self nor cls, which should guide your further research. Which kind to use is the designer’s choice: the same method name could be defined as an instance method, a class method or a static method (though not more than one of these at once in the same class), so once you know what class behaviour you want, you will be able to figure it out. Understanding what behaviour you want is the part that takes time.
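A small sketch contrasting the three kinds of method. The Temperature class and its method names are hypothetical and chosen only for illustration.

```python
class Temperature:
    """Hypothetical class used only to contrast the three kinds of method."""

    unit = "C"  # class attribute shared by all instances

    def __init__(self, degrees):
        self.degrees = degrees

    def describe(self):
        # Instance method: receives the instance as its first argument.
        return f"{self.degrees} {self.unit}"

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Class method: receives the class; often used as an alternative constructor.
        return cls((fahrenheit - 32) * 5 / 9)

    @staticmethod
    def is_freezing(degrees):
        # Static method: receives neither the instance nor the class.
        return degrees <= 0


t = Temperature.from_fahrenheit(212)
print(t.describe())                 # 100.0 C
print(Temperature.is_freezing(-3))  # True
```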

  2. A class is not a function. A class is a template that states what attributes (data and methods) every instance should have, so you don’t have to rewrite them each time you create a new instance. __init__ is usually used to initialize what an instance looks like, typically by taking the data you pass in when you create the instance and assigning it to the instance’s state with self.data = “what you passed as argument in here”. Different instance creation calls can pass in different arguments to create differently configured instances. Such arguments could be data to be processed by the methods, but could also be flags or counters that control the behaviour of the methods, if the methods are written so that their control flow depends on those flags.
    Besides just assigning data, __init__ can also call the class’s own class or instance methods to set up certain state. The general idea of __init__ is to get the instance var into a known state the moment you run var = myclass(). You can front-load and do a lot by putting everything in __init__, at the cost of flexibility, or you can do nothing there, or have no __init__ at all, and assign data to the instance on the go with var.newatt = 10. Binding methods at runtime is much rarer, though; I have not seen it done in years.
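As a rough sketch of these ideas (data and a flag stored in __init__, a helper method called during setup, and an attribute attached afterwards), assuming a made-up Dataset class:

```python
class Dataset:
    """Hypothetical class for illustration only."""

    def __init__(self, rows, verbose=False):
        self.rows = rows        # data passed in at creation time
        self.verbose = verbose  # flag that changes how summary() behaves
        self.length = 0
        self._set_length()      # __init__ may call other methods to finish setup

    def _set_length(self):
        self.length = len(self.rows)

    def summary(self):
        if self.verbose:
            return f"{self.length} rows: {self.rows}"
        return f"{self.length} rows"


quiet = Dataset([1, 2, 3])
loud = Dataset([1, 2, 3], verbose=True)
print(quiet.summary())  # 3 rows
print(loud.summary())   # 3 rows: [1, 2, 3]

quiet.newatt = 10       # attributes can also be attached after creation
print(quiet.newatt)     # 10
```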

We don’t use plain functions because it’s much harder to keep track of which instance has which state. In functional programming, functions have no side effects, and what comes out depends only on what goes in. In a class, by contrast, a method that adds its input integer to self.counter and returns self.counter will return ever larger integers for the same input. To simulate that with a function, the function has to leave side effects: you pass it a mutable object such as a list or dict from the global scope so it can modify the contents, and the structure still exists globally after the function returns. Even though you can keep state this way, how would you manage 10 instances? With 10 lists under different variable names? The benefit of a class is that all the data management is consolidated into one instance variable that holds the state. That is how, in machine learning, a preprocessing object can extract statistics during the training step, save them in e.g. self.mean and self.variance, and later apply them to the test set using the same preprocessing instance.
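Here is a minimal sketch of that contrast. The names add_to_counter and Accumulator are invented for this example.

```python
def add_to_counter(state, n):
    # Function version: the state lives outside and must be passed in every time.
    state["counter"] += n
    return state["counter"]


class Accumulator:
    """Hypothetical class: each instance carries its own counter."""

    def __init__(self):
        self.counter = 0

    def add(self, n):
        self.counter += n
        return self.counter


# With a function, the caller has to create and track the mutable state.
state = {"counter": 0}
first = add_to_counter(state, 2)
second = add_to_counter(state, 2)
print(first, second)  # 2 4

# With a class, every instance manages its own state independently.
acc_a = Accumulator()
acc_b = Accumulator()
print(acc_a.add(2), acc_a.add(2))  # 2 4
print(acc_b.add(2))                # 2
```

The same input (2) gives growing results in both versions, but only the class version scales cleanly to many independent counters.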

A class also lets method definitions take fewer arguments than the equivalent function would need. If you are writing functions, you may need to define def func(arg1, arg2, arg3) and call it with three arguments too. As a method, it can simply be def func(self, arg1), with the other two values read directly from self.arg2 and self.arg3. The method does not need that data passed in as arguments because it is already stored in the instance.
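A short sketch of the same computation written both ways. The names scale_func and Scaler are hypothetical.

```python
def scale_func(value, factor, offset):
    # Function version: every piece of data arrives as an argument.
    return value * factor + offset


class Scaler:
    """Hypothetical class: factor and offset are stored once on the instance."""

    def __init__(self, factor, offset):
        self.factor = factor
        self.offset = offset

    def scale(self, value):
        # Method version: only the new value is passed in.
        return value * self.factor + self.offset


print(scale_func(3, 2, 1))  # 7

s = Scaler(2, 1)
print(s.scale(3))  # 7
print(s.scale(10))  # 21
```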

Classes also let you define many dunder methods (e.g. __add__, __eq__, __hash__, __getitem__, __str__; see “Operator and Function Overloading in Custom Python Classes” on Real Python). These let you override Python’s default behaviour and control how your code interacts with those instances, and how those instances interact with each other and with existing built-in objects. For how the same behaviour can be expressed more simply by implementing dunders, see James Powell: So you want to be a Python expert? | PyData Seattle 2017 - YouTube
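A minimal sketch of a few dunder methods in action, using a made-up Vector class:

```python
class Vector:
    """Hypothetical 2D vector used to illustrate dunder methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for: self + other
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Called for: self == other
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __str__(self):
        # Called by print() and str()
        return f"Vector({self.x}, {self.y})"


total = Vector(1, 2) + Vector(3, 4)
print(total)                         # Vector(4, 6)
print(Vector(1, 2) == Vector(1, 2))  # True
print(Vector(1, 2) == 5)             # False
```

Returning NotImplemented (not raising NotImplementedError) tells Python to try the other operand; for == it finally falls back to an identity check, which is why the last comparison is False instead of an error.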

Classes also allow inheritance, which functions do not. Inheritance documents the conceptual hierarchy of objects and conveniently gives child objects access to the parent’s attributes (data and methods). People usually add new methods in child classes or change how a parent method is implemented. Python’s ICPO attribute lookup (instance, class, parents, object) uses the child’s version of an attribute first https://lerner.co.il/2019/09/10/legb-meet-icpo-pythons-search-strategy-for-attributes/ , and falls back to the parent only if the attribute can’t be found in the child. Note that the child does not have to use every parent method, because every child is different. When the parent is never meant to be instantiated, all of its methods can be implemented as raise NotImplementedError: the parent class then acts purely as a template that names the common methods, and the children that inherit from it define the implementations of some or all of them. An example is the Layer parent class in neural networks, which defines def forward, def backward, def params and def gradients but does nothing in them (all just raise NotImplementedError), leaving the child class Sigmoid(Layer) to implement the contents of def forward and def backward, or class Linear(Layer) to implement all four parent methods (e.g. from the book Data Science from Scratch : Joel Grus : 9781492041139).
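A rough sketch of that template pattern, following the description above and not the book’s exact code. The method bodies and the list-based inputs are assumptions made for illustration.

```python
import math


class Layer:
    """Template parent: names the common methods but implements none of them."""

    def forward(self, inputs):
        raise NotImplementedError

    def backward(self, gradient):
        raise NotImplementedError

    def params(self):
        raise NotImplementedError

    def gradients(self):
        raise NotImplementedError


class Sigmoid(Layer):
    """Child that implements only forward and backward."""

    def forward(self, inputs):
        # Save the outputs so backward() can reuse them.
        self.outputs = [1 / (1 + math.exp(-x)) for x in inputs]
        return self.outputs

    def backward(self, gradient):
        # Derivative of the sigmoid is s * (1 - s).
        return [g * s * (1 - s) for g, s in zip(gradient, self.outputs)]


layer = Sigmoid()
print(layer.forward([0.0]))   # [0.5]
print(layer.backward([1.0]))  # [0.25]

try:
    layer.params()  # not overridden, so the parent version runs
except NotImplementedError:
    print("params is not implemented for Sigmoid")
```

Attribute lookup finds forward and backward on Sigmoid first, and only reaches Layer for params, which is why that call raises.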

Some more pure python examples of raising NotImplementedError:
image
Here Python has no idea how to compare a float and a complex number. __eq__ has to be implemented to tell Python what you mean when you write mycustomobject == another_built_in_or_custom_object

  3. self.length = 0 is defined in the code, so I don’t understand what you mean by “don’t have to define length”.

Thank you, summerale, for the YouTube link. I was very confused about objects, instances and methods. This solved the issue.

There are a few things I can’t seem to iron out. If anyone is willing to help clarify, that would be great!

  1. Can someone walk through, step by step, how things are being added and counted in the dictionary? I understand that 0 is added to self.count three times by the for loop over range(3), but I am confused by the two self.count lines, one with = 0 and one with += 1, and no else. The first time through, adding 0 would give 0:1, so when it next reaches the add method does it become 2 because it sees 1 and adds one? Or does it go through once to become 0:1 since it meets neither of the self.count conditions? Basically, how does this process kick off, i.e. how is the 0:1 counted first, given that the add method just returns the key-value pair and there is no append to the self.count dict? I know that may not be the most cogent question, but I am just trying to visualize this step by step. Thanks!

class FreqTable():

    def __init__(self):
        # Maps each element to the number of times it was added
        self.count = {}

    def add(self, element):
        # Check if this is the first time
        if element not in self.count:
            self.count[element] = 0
        # Runs on every call, whether or not the key was just created
        self.count[element] += 1
        return self.count[element]

    def get_count(self, element):
        # Check if the element was ever added
        if element not in self.count:
            return 0
        return self.count[element]

Solution testing

freq_table = FreqTable()
for _ in range(3):
    freq_table.add(0)
print(freq_table.get_count(0))
print(freq_table.get_count(1))
print(freq_table.count)
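One way to visualize the steps is to print the dictionary after every call. The sketch below repeats the add method from the class above and adds a loop variable and print purely for tracing; those additions are mine, not part of the lesson’s solution.

```python
class FreqTable:
    def __init__(self):
        self.count = {}

    def add(self, element):
        if element not in self.count:
            # First time only: this assignment creates the key.
            # Dicts need no append; assigning to a new key inserts it.
            self.count[element] = 0
        # Not inside the if, so it runs on every call.
        self.count[element] += 1
        return self.count[element]


freq_table = FreqTable()
trace = []
for step in range(3):
    returned = freq_table.add(0)
    trace.append((step, returned, dict(freq_table.count)))
    print(step, returned, freq_table.count)
# 0 1 {0: 1}
# 1 2 {0: 2}
# 2 3 {0: 3}
```

On the first call the key 0 is missing, so it is created with value 0 and then immediately incremented to 1. On the second and third calls the if is skipped and only the += 1 runs, giving 2 and then 3. No else is needed because the increment should happen in both cases.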