Support Multiple Labels for Each Correction #30

@QueenieZqq

Why Multiple Labels

In the current API design, a correction (ProofreadCorrection) is defined by the index range it occupies in the original text, and only one label (CorrectionType) can be associated with each correction.

dictionary ProofreadResult {
  DOMString correctedInput;
  sequence<ProofreadCorrection> corrections;
};

dictionary ProofreadCorrection {
  unsigned long long startIndex;
  unsigned long long endIndex;
  DOMString correction;
  CorrectionType type; // exists if proofreader.includeCorrectionTypes === true
  DOMString explanation; // exists if proofreader.includeCorrectionExplanations === true
};

enum CorrectionType { "spelling", "punctuation", "capitalization", "preposition", "missing-words", "grammar" };

However, in practice, we notice that a single range-defined ProofreadCorrection could contain multiple types of correction. For example:

const original_text = "`thatd` a good amt of time!!! !" // `thatd` is the text to be corrected
const proofread_text = "`That's` a good amount of time!" // `That's` is the corrected text

With the current API shape, we can provide only a single label for the correction from "thatd" to "That's", even though three types of corrections are actually made: "capitalization", "spelling" and "punctuation".
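To make the limitation concrete, here is a minimal sketch of what a single correction object might look like under the current shape. The index values, the assumption that endIndex is exclusive, and the choice of "spelling" as the one reported label are all illustrative and not taken from any implementation.

```javascript
// Hypothetical data: indices and the chosen label are assumptions for illustration.
const originalText = "thatd a good amt of time!!! !";

// Under the current shape, `type` holds exactly one CorrectionType.
const singleLabelCorrection = {
  startIndex: 0,
  endIndex: 5, // assumed to be exclusive
  correction: "That's",
  type: "spelling",
};

// Labels that actually describe the change from "thatd" to "That's".
const applicableTypes = ["capitalization", "spelling", "punctuation"];

// Labels that a single-valued `type` cannot express.
const droppedTypes = applicableTypes.filter(
  (label) => label !== singleLabelCorrection.type
);

console.log(
  originalText.slice(
    singleLabelCorrection.startIndex,
    singleLabelCorrection.endIndex
  )
); // "thatd"
console.log(droppedTypes); // [ 'capitalization', 'punctuation' ]
```

Whichever label is picked, the other two are lost to the caller.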

Similarly, we could also have:

const original_text = "Saudi state news agency spa says saudi armed forces shot down an explosive-laden drone launched against `saudi trritory` from al hudaydah governorate, on yemens ed sea coast." // `saudi trritory` is the text to be corrected
const proofread_text = "Saudi state news agency SPA says Saudi armed forces shot down an explosive-laden drone launched against `Saudi territory` from Al Hudaydah governorate, on Yemen's Red Sea coast." // `Saudi territory` is the corrected text

Here, the correction from "saudi trritory" to "Saudi territory" involves two types of corrections: "capitalization" and "spelling".

Therefore, we propose a change to the API design.

API Shape Proposal

The only update needed is the shape of proofreading correction output:

dictionary ProofreadCorrection {
  unsigned long long startIndex;
  unsigned long long endIndex;
  DOMString correction;
  sequence<CorrectionType> type; // exists if proofreader.includeCorrectionTypes === true
  DOMString explanation; // exists if proofreader.includeCorrectionExplanations === true
};

Instead of only supporting a single correction type label, the new API will allow a sequence of CorrectionTypes to be associated with each correction.

When there's only one label, the sequence will contain a single element. The new API shape remains generic enough to allow different backend implementations.
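As a rough sketch of how a caller might consume the proposed shape, the snippet below tallies labels across corrections for the first example. The corrections array is hand-written sample data, not output from a real proofreader; the indices assume endIndex is exclusive, and countByType is a hypothetical helper rather than part of the API.

```javascript
// Hypothetical corrections for "thatd a good amt of time!!! !" under the proposed shape.
// Indices assume endIndex is exclusive; the labels are illustrative.
const corrections = [
  {
    startIndex: 0,
    endIndex: 5,
    correction: "That's",
    type: ["capitalization", "spelling", "punctuation"],
  },
  { startIndex: 13, endIndex: 16, correction: "amount", type: ["spelling"] },
  { startIndex: 24, endIndex: 29, correction: "!", type: ["punctuation"] },
];

// Count how many corrections carry each label.
// A correction with several labels contributes to each of them.
function countByType(items) {
  const counts = new Map();
  for (const item of items) {
    for (const label of item.type) {
      counts.set(label, (counts.get(label) ?? 0) + 1);
    }
  }
  return counts;
}

const counts = countByType(corrections);
console.log(Object.fromEntries(counts));
// { capitalization: 1, spelling: 2, punctuation: 2 }
```

Because single-label corrections are still sequences of size 1, the same loop handles both cases without a special branch.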
