forked from chandar-lab/NeoBERT
Data/Masking Variations #12
Open
Labels
enhancement: New feature or request
Description
Thread for ideas on data and masking. We already know that a fixed MLM mask percentage, with tokens masked uniformly at random for the whole run, isn't the best way to pretrain.
- Mix in poor-quality data via some schedule/curriculum. A masked model should see what trash looks like.
- An "intelligent masker", somewhat like the generator in ELECTRA, except that it chooses where to mask rather than producing fake words. Draws on ERNIE's work in v1-v3. It should not completely replace random masking, and it should probably follow some phases/schedule/curriculum as well. It (potentially) does not need to be trained alongside the model; it could instead leverage gliner-bi-edge-v2.0 or similar.
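To make the two ideas above concrete, here is a rough sketch, not a proposal for the actual implementation. It assumes a simple linear schedule shared by three knobs (mask rate, share of the mask budget spent on tagger-proposed spans, and probability of drawing a low-quality sample), and it assumes the span tagger is an external model that returns half-open `(start, end)` token ranges. All function names, the `"clean"`/`"low_quality"` source labels, and the numeric values are hypothetical placeholders.

```python
import random


def linear_schedule(step, total_steps, start, end):
    """Linearly interpolate from start to end over total_steps (clamped)."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac


def sample_source(step, total_steps, rng, start_p=0.0, end_p=0.1):
    """Pick a data source; low-quality data is mixed in on a schedule.

    start_p / end_p are placeholder probabilities, not tuned values.
    """
    p_low = linear_schedule(step, total_steps, start_p, end_p)
    return "low_quality" if rng.random() < p_low else "clean"


def pick_mask_positions(num_tokens, spans, mask_rate, guided_frac, rng):
    """Pick token indices to mask, mixing guided spans with random masking.

    spans: (start, end) half-open token ranges from an external tagger
           (assumed; e.g. entity spans from a GLiNER-style model).
    guided_frac: share of the mask budget spent on whole spans; the rest
                 is filled with uniformly random positions.
    """
    budget = max(1, round(num_tokens * mask_rate))
    guided_budget = int(budget * guided_frac)
    chosen = set()
    # Guided part: take whole spans in random order while they still fit.
    for start, end in rng.sample(spans, len(spans)):
        span = set(range(start, end)) - chosen
        if len(chosen) + len(span) <= guided_budget:
            chosen |= span
    # Random part: top up to the full budget with uniform positions.
    remaining = [i for i in range(num_tokens) if i not in chosen]
    top_up = min(budget - len(chosen), len(remaining))
    chosen |= set(rng.sample(remaining, top_up))
    return sorted(chosen)


# Example: halfway through a run, with placeholder schedule endpoints.
rng = random.Random(0)
step, total = 5_000, 10_000
mask_rate = linear_schedule(step, total, 0.15, 0.30)
guided_frac = linear_schedule(step, total, 0.0, 0.5)
source = sample_source(step, total, rng)
positions = pick_mask_positions(
    40, [(3, 6), (10, 12), (20, 25)], mask_rate, guided_frac, rng
)
```

Because the guided share is capped by `guided_frac`, random masking is never fully replaced, and swapping the linear schedule for phases (e.g. step functions) only requires changing `linear_schedule`.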