Data/Masking Variations #12

Thread for ideas on data/masking. We already know that a fixed MLM mask percentage, with random tokens masked **the whole time**, isn't the best way to pretrain.
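As a starting point for discussion, here is a minimal sketch of a non-fixed mask percentage: the rate is linearly interpolated over training instead of held constant. The function names, the 30% to 15% range, and the linear shape are all assumptions for illustration, not something settled in this thread.

```python
import random
from typing import List


def mask_rate(step: int, total_steps: int,
              start: float = 0.30, end: float = 0.15) -> float:
    """Linearly interpolate the mask rate from `start` to `end`.

    The default endpoints are placeholder values, not tuned numbers.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay within the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def random_mask_positions(num_tokens: int, rate: float,
                          rng: random.Random) -> List[int]:
    """Pick token indices uniformly at random at the given rate."""
    if num_tokens <= 0:
        return []
    # Mask at least one token, never more than the sequence length.
    k = min(num_tokens, max(1, round(num_tokens * rate)))
    return sorted(rng.sample(range(num_tokens), k))


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 5_000, 10_000):
        rate = mask_rate(step, total_steps=10_000)
        print(step, round(rate, 3), random_mask_positions(16, rate, rng))
```

Any other shape (cosine, step-wise phases) would slot into `mask_rate` without touching the sampling code.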

- [ ] Mix in poor-quality data via some schedule/curriculum. A masked model should see what trash looks like.
- [ ] An "intelligent masker", sort of like the generator in ELECTRA, except that instead of producing fake words it decides where to mask. Draws on ERNIE's work in v1-v3. It should not completely replace random masking, and should probably follow some phases/schedule/curriculum as well. It (potentially) does not need to be trained alongside the model, and could instead leverage [gliner-bi-edge-v2.0](https://hf.co/knowledgator/gliner-bi-edge-v2.0) or similar.
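The two ideas above could be sketched roughly as follows. This is only an illustration under several assumptions: entity spans are taken as precomputed `(start, end)` token ranges (for example from an off-the-shelf tagger run offline; no tagger API is called here), the ramp schedules are arbitrary, and every name (`ramp`, `pick_source`, `choose_mask_positions`) is hypothetical. Span masking is mixed with random masking rather than replacing it.

```python
import random
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token range [start, end), assumed in bounds


def ramp(step: int, total_steps: int, max_value: float) -> float:
    """Linear ramp from 0 to `max_value`; the shape is an arbitrary choice."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_value * progress


def pick_source(step: int, total_steps: int, rng: random.Random,
                max_low_quality: float = 0.10) -> str:
    """Choose which data pool the next example comes from.

    The low-quality share grows with training up to `max_low_quality`
    (a placeholder value).
    """
    low_quality_share = ramp(step, total_steps, max_low_quality)
    return "low_quality" if rng.random() < low_quality_share else "clean"


def choose_mask_positions(num_tokens: int, spans: Sequence[Span], rate: float,
                          span_prob: float, rng: random.Random) -> List[int]:
    """Mask whole spans with probability `span_prob`, then top up randomly."""
    budget = min(num_tokens, max(1, round(num_tokens * rate)))
    chosen = set()
    candidates = list(spans)
    rng.shuffle(candidates)
    for start, end in candidates:
        if len(chosen) >= budget:
            break
        # Only take a span if it fits entirely in the remaining budget.
        fits = (end - start) <= budget - len(chosen)
        if fits and rng.random() < span_prob:
            chosen.update(range(start, end))
    # Fill the rest of the budget with ordinary random masking.
    remaining = [i for i in range(num_tokens) if i not in chosen]
    need = min(budget - len(chosen), len(remaining))
    chosen.update(rng.sample(remaining, need))
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    spans = [(2, 5), (9, 11)]  # hypothetical entity spans for a 20-token input
    for step in (0, 5_000, 10_000):
        p = ramp(step, 10_000, max_value=0.5)
        print(step, pick_source(step, 10_000, rng),
              choose_mask_positions(20, spans, 0.25, p, rng))
```

With `span_prob` at 0 this reduces to plain random masking, so the same code path covers an initial random-only phase.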
