Translator: Improve translation evaluation in Co-op Translator using cosine similarity #37

skytin1004 · 2024-10-10T01:43:29Z

Describe the feature you'd like

Currently, Co-op Translator identifies translation issues by comparing the number of line breaks in the original and translated content. While this helps flag significant differences, it is not always accurate, especially for longer documents where the OpenAI model may deliberately add line breaks to improve readability.
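To make the limitation concrete, here is a minimal sketch of a line-break check of the kind described above. It is not the project's actual implementation; the function names and the `tolerance` parameter are assumptions made for illustration.

```python
def count_line_breaks(text: str) -> int:
    """Count line breaks, treating CRLF and a lone CR as a single break."""
    return text.replace("\r\n", "\n").replace("\r", "\n").count("\n")


def line_breaks_differ(original: str, translated: str, tolerance: int = 0) -> bool:
    """Return True when the line-break counts differ by more than `tolerance`.

    `tolerance` is a hypothetical knob, not something the issue specifies.
    """
    diff = abs(count_line_breaks(original) - count_line_breaks(translated))
    return diff > tolerance


original = "# Title\n\nFirst paragraph.\nSecond paragraph.\n"
# Same content, but the translation contains one extra blank line.
translated = "# Titre\n\nPremier paragraphe.\n\nDeuxième paragraphe.\n"

# A strict comparison flags the document even though nothing is missing.
print(line_breaks_differ(original, translated))
# Allowing one extra break hides this case, but the check still ignores meaning.
print(line_breaks_differ(original, translated, tolerance=1))
```

The example shows why a purely structural check can flag a faithful translation: the only difference between the two strings is formatting.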

I would like to introduce a more sophisticated evaluation method based on cosine similarity. By converting both the original and translated documents into vectors with techniques such as TF-IDF or Doc2Vec, we could measure the semantic similarity between the two. If the cosine similarity score is above a chosen threshold (e.g., 0.7), we can treat the translation as likely accurate. If it falls below, the document would be flagged for further review.
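As a rough sketch of the scoring step only, the snippet below computes cosine similarity between two vectors and applies the example threshold of 0.7. How the vectors are produced (TF-IDF, Doc2Vec, or something else) is deliberately left out, and it is assumed that both documents have already been mapped into the same vector space. The function names and the sample vectors are made up for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_u * norm_v)


def needs_review(u: Sequence[float], v: Sequence[float], threshold: float = 0.7) -> bool:
    """Flag the translation when the score falls below the threshold."""
    return cosine_similarity(u, v) < threshold


# Hypothetical document vectors, standing in for real embeddings.
original_vec = [0.2, 0.8, 0.1]
close_vec = [0.25, 0.75, 0.05]   # points in nearly the same direction
distant_vec = [0.9, 0.05, 0.4]   # points in a very different direction

print(needs_review(original_vec, close_vec))    # not flagged
print(needs_review(original_vec, distant_vec))  # flagged
```

Because cosine similarity depends only on direction, not magnitude, a longer or shorter translation does not lower the score by itself.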

Problem this feature solves

This feature would provide a more reliable way to assess the quality of translations by comparing the meaning rather than the formatting. It would help in cases where line breaks are not a definitive measure of translation accuracy, ensuring that meaningful translations are not mistakenly flagged as errors due to formatting differences.

Alternatives considered

We initially considered using Azure OpenAI to verify translation quality by sending both the original and translated documents for comparison. However, this approach was discarded because it would be too time-consuming and costly.

Additional context

Embedding techniques such as TF-IDF or Doc2Vec could be integrated into the translation process to generate vector representations of the documents. By calculating cosine similarity between the original and translated content, we can evaluate how closely the translated document retains the original meaning. A similarity score could be displayed along with the translation result, helping reviewers focus on documents with lower scores.
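One way the score could be surfaced to reviewers is sketched below: given per-document scores that have already been computed, list the documents lowest score first and mark those under the threshold. The file names, the function names, and the report layout are all assumptions for illustration, not part of Co-op Translator.

```python
from typing import NamedTuple


class ScoredTranslation(NamedTuple):
    path: str
    score: float


def review_queue(scores: dict[str, float], threshold: float = 0.7) -> list[ScoredTranslation]:
    """Return documents scoring below `threshold`, lowest score first."""
    flagged = [ScoredTranslation(p, s) for p, s in scores.items() if s < threshold]
    return sorted(flagged, key=lambda item: item.score)


def format_report(scores: dict[str, float], threshold: float = 0.7) -> str:
    """Render one line per document, sorted so low scores appear at the top."""
    lines = []
    for path, score in sorted(scores.items(), key=lambda kv: kv[1]):
        status = "REVIEW" if score < threshold else "ok"
        lines.append(f"{score:.2f}  {status:<6}  {path}")
    return "\n".join(lines)


# Hypothetical scores for three translated files.
scores = {
    "translations/fr/README.md": 0.91,
    "translations/ja/README.md": 0.42,
    "translations/es/README.md": 0.66,
}

print(format_report(scores))
```

Sorting by ascending score puts the documents most in need of attention at the top of the report.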

Are you willing to submit a pull request to implement this feature?

I am willing to submit a pull request

Code of Conduct

I agree to follow this project's Code of Conduct

skytin1004 added the enhancement (New feature or request) label Oct 10, 2024

skytin1004 changed the title ~~Improving translation evaluation in Co-op Translator using cosine similarity~~ Improve translation evaluation in Co-op Translator using cosine similarity Oct 10, 2024

skytin1004 added the translator (Related to any changes in the translation-related source files) label Oct 11, 2024

github-actions bot changed the title ~~Improve translation evaluation in Co-op Translator using cosine similarity~~ Translator: Improve translation evaluation in Co-op Translator using cosine similarity Oct 11, 2024
