How to change the parameters of a BERT model for better performance on the test set?

Hi community !

Screen Link:

My Code:

.
.
.
# imports consolidated here for completeness (some were in the elided code)
import time

import numpy as np
import pandas as pd
import torch

from transformers import BertForSequenceClassification, AdamW, BertConfig
from transformers import BertTokenizer

print("Loading BertTokenizer...")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")


model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,               # binary task: real disaster vs. not
    output_attentions=False,
    output_hidden_states=False,
)

device = torch.device("cuda")   # defined in the elided code; used for .to(device) below
model.cuda()

optimizer=AdamW(model.parameters(),
                lr=1.5e-5,
                eps=1e-8,
               )

from transformers import get_linear_schedule_with_warmup

epochs = 4

# total number of optimizer steps; the scheduler anneals the LR over these
total_steps = len(train_dataloader) * epochs

# no warmup, then linear decay of the learning rate to zero
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=total_steps)


import random 

# fix all RNG seeds for reproducibility
seed_val = 42

random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
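
# The helpers used below (format_time, flat_accuracy) are defined in the
# code elided above; a typical flat_accuracy, assuming the usual
# fine-tuning helper, would look like this:
def flat_accuracy(preds, labels):
    # fraction of rows where the larger of the two logits matches the label
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)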

##################################################################
#                        TRAINING                                #
##################################################################     


training_stats = []
for epoch_i in range(0,epochs):

  print("****Epoch {:} /{:} ******".format(epoch_i+1,epochs))
  print("Training...")

  t0=time.time()
  total_loss=0

  model.train()


  for step,batch in enumerate(train_dataloader):
    if step%100==0 and not step==0:
      elapsed=format_time(time.time()-t0)
      print(" Batch {:>5,} of {:>5,}. Elapsed: {:}".format(step,len(train_dataloader),elapsed))

    b_input_ids=batch[0].to(device)
    b_input_mask=batch[1].to(device)
    b_labels=batch[2].to(device)
   

    model.zero_grad()

    # older transformers versions return a (loss, logits) tuple when labels
    # are passed; with transformers >= 4, pass return_dict=False or read
    # outputs.loss / outputs.logits instead
    loss, logits = model(b_input_ids,
                         token_type_ids=None,
                         attention_mask=b_input_mask,
                         labels=b_labels)

  
    total_loss +=loss.item()
    loss.backward()

    # clip_grad_norm is deprecated; use the in-place variant
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    scheduler.step()

  avg_train_loss=total_loss/len(train_dataloader)

  
  print("")
  print(" Average training loss :{0:.2f}".format(avg_train_loss))
  print("Training epoch took {:}".format(format_time(time.time()-t0)))
  training_time = format_time(time.time() - t0)



##################################################################
#                           VALIDATION                           #
##################################################################

  print("")
  print("Runing Validation ...")

  t0=time.time()
  model.eval()

  total_eval_loss,eval_accuracy=0,0
  nb_eval_steps,nb_eval_examples=0,0

  for batch in validation_dataloader:

    batch=tuple(t.to(device) for t in batch)

    b_input_ids,b_input_mask,b_labels=batch

    with torch.no_grad():
      # forward pass only; returns (loss, logits) in older transformers
      loss, logits = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels)

      total_eval_loss += loss.item()
    # move logits and labels to the CPU for numpy post-processing
    logits = logits.detach().cpu().numpy()
    label_ids = b_labels.to('cpu').numpy()


    # accuracy for this batch
    tmp_eval_accuracy = flat_accuracy(logits, label_ids)
    eval_accuracy += tmp_eval_accuracy
    nb_eval_steps += 1
    
  print("  Accuracy: {0:.2f}".format(eval_accuracy/nb_eval_steps)) 
  print("  Validation took:{:}".format(format_time(time.time()- t0)))
  avg_val_loss = total_eval_loss / len(validation_dataloader)
  print(" Average validation loss :{0:.2f}".format(avg_val_loss))

  avg_val_accuracy = eval_accuracy / len(validation_dataloader)
  validation_time = format_time(time.time() - t0)
  


  training_stats.append(
      {
          'epoch': epoch_i + 1,
          'Training Loss': avg_train_loss,
          'Valid. Loss': avg_val_loss,
          'Valid. Accur.': avg_val_accuracy,
          'Training Time': training_time,
          'Validation Time': validation_time
      }
  )

  
print("")   
print("Training completed!")



# `predictions` holds the raw test-set logits gathered in the (elided) prediction loop
predictions=predictions[:,1]
predictions[predictions>0]=0
predictions[predictions<0]=1
predictions=predictions.astype(np.int64)

sample_submission=pd.read_csv('sample_submission.csv',sep=',',index_col=0)
sample_submission["target"]=predictions
sample_submission.head()

sample_submission.to_csv("submission.csv", index=True)

What I expected to happen:
I'm working on an NLP task from a Kaggle competition; the goal is to predict whether a tweet describes a real disaster or not. I'm using BertForSequenceClassification.

My training set size is 10,000, which I split into (a sketch of the split follows this list):

  • 8,000 as the training set
  • 2,000 as the validation set

The other settings are:

  • Learning rate: 2e-5
  • Epochs: 4
  • Batch size: 32
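
For reference, here is a minimal sketch of how such a split and the train_dataloader used above might be built, assuming the tweets and targets live in a pandas DataFrame train_df (hypothetical name) loaded from the competition's train.csv, and reusing the tokenizer from the code above:

import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import TensorDataset, DataLoader, RandomSampler

# hypothetical: train_df has "text" and "target" columns
train_texts, val_texts, train_labels, val_labels = train_test_split(
    train_df["text"].values,
    train_df["target"].values,
    test_size=2000,                        # 2,000 of 10,000 for validation
    random_state=42,
    stratify=train_df["target"].values,    # keep class balance in both sets
)

# tokenize to fixed-length ids and attention masks; the tensor order here
# matches the b_input_ids / b_input_mask / b_labels unpacking above
encoded = tokenizer(list(train_texts), padding="max_length",
                    truncation=True, max_length=64, return_tensors="pt")
train_dataset = TensorDataset(encoded["input_ids"],
                              encoded["attention_mask"],
                              torch.tensor(train_labels))
train_dataloader = DataLoader(train_dataset,
                              sampler=RandomSampler(train_dataset),
                              batch_size=32)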

What actually happened:

The performance on the test set is poor (0.47 when submitting on Kaggle). I have tried many changes to the learning rate and the number of epochs, but the problem persists.



Any help would be appreciated

Hey @y.kemiche,

One thing you can try is training for more epochs, something like 50 or 100, provided the model is not overfitting.

Another option is to get more training data; if this is all you have, try splitting it into a training set of 9,200 or 9,500 and a validation set of 800 or 500.

Thanks for your answer. I tried both solutions and got the same results.

Hi, @y.kemiche

I am quite confused by this block of prediction post-processing:

predictions=predictions[:,1]
predictions[predictions>0]=0
predictions[predictions<0]=1
predictions=predictions.astype(np.int64)

If I understand correctly what you are trying to do, it should look like this:

predictions=predictions[:,1]
predictions[predictions<0.5]=0
predictions[predictions>=0.5]=1
predictions=predictions.astype(np.int64)
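
Note that a 0.5 threshold only makes sense for probabilities. Since BertForSequenceClassification returns raw logits, you would apply a softmax first; a minimal sketch, assuming predictions is the (n, 2) array of logits:

import numpy as np
from scipy.special import softmax

# turn each row of logits into probabilities, keep P(label == 1)
probs = softmax(predictions, axis=1)[:, 1]
labels = (probs >= 0.5).astype(np.int64)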

Hi @moriturus7,

That's what I get when I print the first 10 predictions (code below): the first column shows the output for label "0" and the second column shows the output for label "1". I'm assuming that when the first column is positive the label is "0", and when the second column is positive the label is "1".

>>predictions[0:10]

   array([[ 0.20133917, -0.41839105],
          [ 1.1800584 , -0.93878055],
          [ 0.26010755, -0.40417665],
          [ 2.7373803 , -1.8675895 ],
          [-0.41964293,  0.14992158],
          [ 1.7840809 , -1.1146419 ],
          [-3.4311104 ,  2.7755234 ],
          [ 3.0697374 , -2.0666115 ],
          [ 0.30287465, -0.27323273],
          [-3.7533224 ,  3.2343194 ]], dtype=float32)
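
For outputs like these, the rule "take whichever column is larger" amounts to an argmax over the two logits; a minimal sketch of that conversion:

import numpy as np

# for each row, take the index (0 or 1) of the larger logit
predicted_labels = np.argmax(predictions, axis=1).astype(np.int64)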

These are very strange results for a classifier.

Also, for a binary classification task you should have only one label: 1 - Yes, 0 - No is a single label.

@moriturus7 This is the raw output (logits) of the BERT model.