
Data/Masking Variations #12

@pszemraj


Thread for ideas on data and masking. We already know that a fixed MLM masking percentage, with random tokens masked for the entire run, isn't the best way to pretrain.
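One simple alternative to a fixed mask percentage is to schedule the masking rate over training. The sketch below assumes a linear decay from a higher starting rate to a lower final rate. The 0.30 and 0.15 endpoints are illustrative placeholders, not values from this thread, and `mask_rate_at` and `random_mask` are hypothetical helper names. For brevity it always substitutes the mask token and skips BERT's 80/10/10 replacement split.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple


def mask_rate_at(step: int, total_steps: int, start: float = 0.30, end: float = 0.15) -> float:
    """Linearly interpolate the MLM mask rate from `start` to `end` over training.

    The endpoints are illustrative placeholders, not tuned values.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return start + (end - start) * progress


def random_mask(
    token_ids: Sequence[int],
    rate: float,
    mask_id: int,
    rng: random.Random,
    special_ids: FrozenSet[int] = frozenset(),
) -> Tuple[List[int], List[int]]:
    """Replace a `rate` fraction of non-special tokens with `mask_id`.

    Returns (masked_ids, labels); labels hold -100 at unmasked positions, the
    default ignore_index of PyTorch's cross-entropy loss.
    """
    candidates = [i for i, t in enumerate(token_ids) if t not in special_ids]
    n_mask = round(len(candidates) * rate)
    chosen = set(rng.sample(candidates, n_mask))
    masked = [mask_id if i in chosen else t for i, t in enumerate(token_ids)]
    labels = [t if i in chosen else -100 for i, t in enumerate(token_ids)]
    return masked, labels


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = list(range(100, 140))  # 40 dummy token ids
    for step in (0, 5_000, 10_000):
        rate = mask_rate_at(step, total_steps=10_000)
        _, labels = random_mask(tokens, rate, mask_id=4, rng=rng)
        n_masked = sum(1 for label in labels if label != -100)
        print(f"step={step:>6} rate={rate:.3f} masked={n_masked}/{len(tokens)}")
```

Whether the rate should decay, rise, or follow phases is the open question; the linear shape here is only the easiest one to swap out.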

  • Mixing in poor-quality data via some schedule/curriculum: the masked model should see what trash looks like.
  • An "intelligent masker" that decides where to mask, somewhat like the generator in ELECTRA but without producing fake replacement tokens. This draws on ERNIE's work in v1-v3. It should not completely replace random masking, and it should probably follow its own phases/schedule/curriculum as well. It (potentially) does not need to be trained alongside the model and could instead leverage gliner-bi-edge-v2.0 or similar.
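For the first bullet, a minimal sketch of mixing low-quality documents in on a schedule. It assumes two pre-split pools of clean and noisy documents and a hypothetical curriculum in which the noisy share ramps linearly from 0 to a cap over the first half of training, then holds. The 0.1 cap, the ramp length, and the names `noisy_fraction_at` and `mixed_stream` are placeholders, not choices made in this thread.

```python
import random
from typing import Iterator, Sequence, Tuple


def noisy_fraction_at(step: int, total_steps: int, max_frac: float = 0.1, warmup_frac: float = 0.5) -> float:
    """Share of low-quality documents at `step`: ramps linearly from 0 to `max_frac`
    over the first `warmup_frac` of training, then holds. Placeholder curriculum."""
    ramp_steps = max(1, int(total_steps * warmup_frac))
    return max_frac * min(step / ramp_steps, 1.0)


def mixed_stream(
    clean_docs: Sequence[str],
    noisy_docs: Sequence[str],
    total_steps: int,
    rng: random.Random,
    **schedule_kwargs: float,
) -> Iterator[Tuple[int, str]]:
    """Yield (step, document), drawing from the noisy pool with the scheduled probability."""
    for step in range(total_steps):
        p_noisy = noisy_fraction_at(step, total_steps, **schedule_kwargs)
        pool = noisy_docs if rng.random() < p_noisy else clean_docs
        yield step, rng.choice(pool)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [f"clean-{i}" for i in range(50)]
    noisy = [f"noisy-{i}" for i in range(50)]
    total = 10_000
    counts = {"first_half": 0, "second_half": 0}
    for step, doc in mixed_stream(clean, noisy, total, rng):
        if doc.startswith("noisy"):
            counts["first_half" if step < total // 2 else "second_half"] += 1
    # In expectation about 250 noisy docs in the first half and about 500 in the second.
    print(counts)
```

Keeping the schedule in its own function makes it easy to try a decaying or phased curve instead of the ramp.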
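For the second bullet, a rough sketch of combining informed and random masking under a curriculum. It assumes candidate spans were already produced offline by some external tagger (the issue mentions gliner-bi-edge-v2.0; that model's real output format is not reproduced here, and the `(start, end)` token-range format is a hypothetical stand-in). Whole spans are masked first, up to a scheduled share of the mask budget, and the rest of the budget is filled with random positions, so random masking is never fully replaced. `informed_share_at` and `hybrid_mask` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # hypothetical format: [start, end) token indices


def informed_share_at(step: int, total_steps: int, max_share: float = 0.5) -> float:
    """Fraction of the mask budget reserved for informed spans; ramps 0 -> max_share.
    Capping below 1.0 keeps random masking in the mix."""
    return max_share * min(step / max(1, total_steps), 1.0)


def hybrid_mask(
    n_tokens: int,
    spans: Sequence[Span],
    budget: int,
    informed_share: float,
    rng: random.Random,
) -> List[int]:
    """Choose up to `budget` positions to mask: whole informed spans first (at most
    informed_share * budget positions), then uniformly random positions for the rest."""
    chosen = set()
    informed_budget = int(budget * informed_share)
    for start, end in rng.sample(list(spans), len(spans)):  # visit spans in random order
        if len(chosen) + (end - start) > informed_budget:
            continue  # this span would overshoot the informed budget
        chosen.update(range(start, end))
    remaining = [i for i in range(n_tokens) if i not in chosen]
    n_random = max(0, budget - len(chosen))
    chosen.update(rng.sample(remaining, min(n_random, len(remaining))))
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    spans = [(2, 5), (10, 12), (20, 24)]  # e.g. entity spans from an external tagger
    for step in (0, 5_000, 10_000):
        share = informed_share_at(step, total_steps=10_000)
        print(step, share, hybrid_mask(40, spans, budget=8, informed_share=share, rng=rng))
```

Masking whole spans rather than single tokens is the part that loosely mirrors ERNIE-style masking; how the spans are scored and which share schedule works best are left open.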

Labels: enhancement (New feature or request)
