Coding Self Attention Transformer Networks in PyTorch for Question Classification

Hi everyone,
I just published a blog series on self-attention Transformers. The series walks through coding a self-attention Transformer in PyTorch and then using that model for question classification. The classification problem has two categories, each with a different number of classes, and the posts explain two different ways to solve it. Please have a look at the series here:

Part - 1: