NBME - Score Clinical Patient Notes

An NLP competition built around a NER (Named Entity Recognition) task.

My Result

Ranked 117th out of 1501 - bronze medal

My baseline code

Useful resources

Focal Loss

Focal loss was originally proposed for object detection, where it addresses class imbalance by down-weighting examples the model already classifies well. The same idea carries over to NER tasks, which suffer from the same imbalance.

PyTorch code for focal loss

Shorter version of focal loss
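
To make the formula concrete, here is a minimal plain-Python sketch of binary focal loss for a single token logit. It is not the linked PyTorch code: the function names are hypothetical, and `gamma=2.0` and `alpha=0.25` are the common defaults from the original focal loss paper, not values tuned for this competition.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_focal_loss(logit: float, target: int,
                      gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one binary prediction: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = sigmoid(logit)
    # p_t is the probability the model assigns to the true class.
    p_t = p if target == 1 else 1.0 - p
    # alpha_t balances the positive and negative classes.
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(4.0, 1)   # confident and correct
hard = binary_focal_loss(-4.0, 1)  # confident and wrong
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the usual binary cross-entropy, so `gamma` is the knob that controls how strongly easy examples are discounted.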

Focal loss can also be combined with label smoothing.

Focal Label Smoothing

PyTorch code for Label Smoothing - simple implementation (not optimised for NBME competition dataset)
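
As a rough illustration of how the two ideas can be combined, the sketch below smooths a binary target and then applies the focal weight to the resulting cross-entropy. This is one possible combination, assumed here for illustration; the function names and the `epsilon=0.1` default are hypothetical and are not taken from the linked code.

```python
import math


def smooth_binary_target(target: float, epsilon: float = 0.1) -> float:
    """Move a hard 0/1 target toward 0.5 by epsilon / 2."""
    return target * (1.0 - epsilon) + 0.5 * epsilon


def focal_bce_with_smoothing(prob: float, target: float,
                             gamma: float = 2.0, epsilon: float = 0.1) -> float:
    """Binary cross-entropy on a smoothed target, scaled by the focal weight."""
    prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # keep log() finite
    soft = smooth_binary_target(target, epsilon)
    bce = -(soft * math.log(prob) + (1.0 - soft) * math.log(1.0 - prob))
    # The focal weight is computed from the hard target (an assumption of this sketch).
    p_t = prob if target >= 0.5 else 1.0 - prob
    return (1.0 - p_t) ** gamma * bce
```

Setting `epsilon=0` and `gamma=0` recovers plain binary cross-entropy, which makes it easy to check each ingredient separately.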

Because the dataset provided in this competition contains a large amount of unlabeled data and has imbalanced classes, training the model with focal loss and pseudo labeling worked well.

Training MLM model

Code

Sample notebook for training MLM model with unlabeled data

Training the DeBERTa v3 large model on the unlabeled data as a masked language model was one of the key points for getting a high score in this competition. After this additional pre-training on the unlabeled text, the fine-tuned models had a better grasp of the distribution of words in the patient notes.
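
The masking step at the heart of masked-language-model training can be sketched in plain Python. The 15% selection rate and the 80/10/10 split are the standard BERT recipe, assumed here for illustration rather than taken from the notebook; the function name `mask_tokens`, the toy sentence, and the toy vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mlm_probability=0.15, seed=0):
    """Return (inputs, labels) for one MLM training example.

    labels[i] holds the original token where a prediction is required
    and None everywhere else.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mlm_probability:
            continue  # position not selected, no loss computed here
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


note = "patient reports chest pain for two days".split()
toy_vocab = ["heart", "pain", "cough", "fever"]
masked_inputs, mlm_labels = mask_tokens(note, toy_vocab)
```

In practice this operates on token IDs rather than strings, and Hugging Face's `DataCollatorForLanguageModeling` (with its `mlm_probability` argument) handles it during training.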

Links

Huggingface example

Notebook for CommonLit competition (1)

Notebook for CommonLit competition (2)

Meta Pseudo Labels

Code

Sample notebook for meta pseudo labeling

Related papers:
- Meta Pseudo Labels
- Fine-Tuning Pre-trained Language Model with Weak Supervision
- Can Students outperform Teacher models

The MPL (Meta Pseudo Labeling) notebook above is built on the hypothesis that the student model can outperform the teacher model when additional data is involved.
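
The teacher-student loop can be illustrated on a toy problem. The sketch below is not the notebook's code: it uses a one-feature logistic regression in plain Python, hard pseudo labels, and a simplified feedback signal (the student's labeled loss before minus after its update) as a stand-in for the gradient-based feedback in the Meta Pseudo Labels paper. All names and data are hypothetical.

```python
import math


def predict(params, x):
    """Probability of class 1 under a logistic model with params [w, b]."""
    z = params[0] * x + params[1]
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def bce(params, batch):
    """Mean binary cross-entropy over (x, y) pairs."""
    total = 0.0
    for x, y in batch:
        p = min(max(predict(params, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(batch)


def bce_grad(params, batch):
    """Gradient of the mean binary cross-entropy with respect to [w, b]."""
    gw = sum((predict(params, x) - y) * x for x, y in batch) / len(batch)
    gb = sum(predict(params, x) - y for x, y in batch) / len(batch)
    return [gw, gb]


def mpl_step(teacher, student, labeled, unlabeled, lr=0.1):
    # 1. The teacher produces hard pseudo labels for the unlabeled inputs.
    pseudo = [(x, 1.0 if predict(teacher, x) >= 0.5 else 0.0) for x in unlabeled]
    # 2. The student trains on the pseudo-labeled data only.
    loss_before = bce(student, labeled)
    student = [p - lr * g for p, g in zip(student, bce_grad(student, pseudo))]
    loss_after = bce(student, labeled)
    # 3. Feedback is positive when the pseudo labels helped the student on real labels.
    feedback = loss_before - loss_after
    # 4. The teacher combines its supervised gradient with the feedback-weighted
    #    gradient on its own pseudo labels.
    grad = [gs + feedback * gm
            for gs, gm in zip(bce_grad(teacher, labeled), bce_grad(teacher, pseudo))]
    teacher = [p - lr * g for p, g in zip(teacher, grad)]
    return teacher, student


labeled = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
unlabeled = [-3.0, -0.5, 0.5, 3.0]
teacher, student = [0.0, 0.0], [0.0, 0.0]
for _ in range(200):
    teacher, student = mpl_step(teacher, student, labeled, unlabeled)
```

Note that the student never sees the labeled targets directly in its own update; it only learns through the teacher's pseudo labels, while the labeled data is used to score how useful those pseudo labels were.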

AWP

Adversarial Weight Perturbation (AWP) was used in the first-place solution of the Feedback Prize competition, and it has been shown to be effective in the NBME competition as well.

Simple code for AWP
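
For intuition, one AWP update can be sketched on a tiny logistic model in plain Python: nudge the weights in the direction that increases the loss, keep the nudge within a small relative bound, take the gradient at the perturbed weights, then apply it to the original weights. This is an illustrative variant rather than the linked code; the function names and the `adv_lr` and `adv_eps` defaults are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def grad_bce(weights, batch):
    """Gradient of the mean binary cross-entropy for a linear logistic model."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(batch)
    return grad


def norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def perturb(weights, grad, adv_lr=1.0, adv_eps=0.01):
    """Move the weights along the gradient (loss ascent), clipped per weight."""
    g_norm, w_norm = norm(grad), norm(weights)
    perturbed = []
    for w, g in zip(weights, grad):
        step = adv_lr * g / (g_norm + 1e-12) * w_norm
        bound = adv_eps * abs(w)  # stay within adv_eps * |w| of the original weight
        perturbed.append(min(max(w + step, w - bound), w + bound))
    return perturbed


def awp_step(weights, batch, lr=0.1, adv_lr=1.0, adv_eps=0.01):
    """One update: gradient taken at perturbed weights, applied to the originals."""
    backup = list(weights)
    adversarial = perturb(weights, grad_bce(weights, batch), adv_lr, adv_eps)
    adv_grad = grad_bce(adversarial, batch)
    return [w - lr * g for w, g in zip(backup, adv_grad)]


toy_batch = [([1.0, 2.0], 1.0), ([-1.0, 0.5], 0.0)]
new_weights = awp_step([0.5, -0.5], toy_batch)
```

With `adv_lr=0` the perturbation vanishes and the step is ordinary gradient descent, which is a convenient sanity check when wiring AWP into a training loop.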