Translator: Improve translation evaluation in Co-op Translator using cosine similarity #37
Open
1 of 2 tasks
Labels
enhancement
New feature or request
translator
Related to any changes in the translation-related source files
Describe the feature you'd like
Currently, Co-op Translator identifies translation issues by comparing the number of line breaks in the original and translated content. While this flags large discrepancies, it is not always accurate, especially for longer documents, where the OpenAI model may intentionally add line breaks to improve readability.
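For reference, the line-break comparison described above could look roughly like the sketch below. The function name `line_break_mismatch` and the `tolerance` parameter are hypothetical and are not taken from the Co-op Translator source; the example only illustrates why a faithful translation can still be flagged.

```python
def line_break_mismatch(original: str, translated: str, tolerance: int = 0) -> bool:
    """Return True when the line-break counts differ by more than `tolerance`.

    Hypothetical helper for illustration; not the project's actual check.
    """
    diff = abs(original.count("\n") - translated.count("\n"))
    return diff > tolerance


# The translation keeps the meaning but wraps one sentence onto two lines,
# so a purely formatting-based check reports a mismatch.
original = "# Title\n\nFirst paragraph.\n"
translated = "# Titre\n\nPremier\nparagraphe.\n"
flagged = line_break_mismatch(original, translated)
```

Raising `tolerance` reduces false positives but also hides genuinely truncated translations, which is the limitation this issue is about.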
I would like to introduce a more robust evaluation method based on cosine similarity. By converting both the original and translated documents into vectors with a technique such as TF-IDF or Doc2Vec, we could measure how similar the two documents are. If the cosine similarity score is above a certain threshold (e.g., 0.7), we can treat the translation as likely accurate. If it falls below the threshold, the document could be flagged for further review.
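A minimal sketch of the proposed check, using only the Python standard library, might look like this. All function names are hypothetical, and the smoothed IDF formula is one common choice rather than a requirement. One important assumption: TF-IDF compares word overlap, so this sketch only gives meaningful scores when both texts are in the same language (for example, if the translation is first back-translated into the source language, a step not shown here). Comparing documents across languages directly would need vectors that live in a shared space.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for count in counts for term in count)
    # Smoothed IDF, so terms shared by every document keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in count.items()} for count in counts]


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def needs_review(original: str, translated: str, threshold: float = 0.7) -> bool:
    """Flag the pair when its similarity score falls below `threshold`."""
    vec_original, vec_translated = tfidf_vectors([original, translated])
    return cosine_similarity(vec_original, vec_translated) < threshold
```

The 0.7 default mirrors the example threshold above; a suitable value would need to be tuned on real translation pairs.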
Problem this feature solves
This feature would provide a more reliable way to assess the quality of translations by comparing the meaning rather than the formatting. It would help in cases where line breaks are not a definitive measure of translation accuracy, ensuring that meaningful translations are not mistakenly flagged as errors due to formatting differences.
Alternatives considered
We initially considered using Azure OpenAI to verify translation quality by sending both the original and translated documents for comparison. However, this approach was discarded because it would be too time-consuming and costly.
Additional context
Vectorization techniques such as TF-IDF or Doc2Vec could be integrated into the translation process to generate vector representations of the documents. By calculating the cosine similarity between the original and translated content, we can estimate how closely the translated document retains the original meaning. The similarity score could be displayed alongside the translation result, helping reviewers focus on documents with lower scores.
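As a rough illustration of how the scores might be surfaced to reviewers, the sketch below takes precomputed similarity scores and lists the lowest-scoring documents first. The function name `review_report`, the file names, and the scores are all hypothetical and are not real results.

```python
def review_report(scores: dict[str, float], threshold: float = 0.7) -> list[str]:
    """Format one line per document, lowest similarity first.

    Documents scoring below `threshold` are marked for review.
    """
    rows = []
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        status = "REVIEW" if score < threshold else "ok"
        rows.append(f"{score:.2f}  {status:<6}  {name}")
    return rows


# Made-up scores for illustration only.
example_scores = {"README.md": 0.91, "docs/setup.md": 0.55, "docs/faq.md": 0.78}
report = review_report(example_scores)
```

Sorting in ascending order puts the documents most likely to need attention at the top of the list.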