Skip to content

Added i18n component and related scripts #1082

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 51 commits into
base: master
Choose a base branch
from

Conversation

yihao03
Copy link
Member

@yihao03 yihao03 commented Mar 28, 2025

run yarn trans to start translation

yihao03 and others added 27 commits March 12, 2025 22:12
…toring segment handling to use arrays for better management
… merging logic and remove unnecessary debug logs
…ded try catch when parsing xml to json to report failed parsing possibly attributed to unsound xml structure.
- Created a new XML file for references (97references97.xml) containing a comprehensive list of references used in the SICP JS project.
- Added a new XML file for the index preface (98indexpreface98.xml) to provide context and formatting for the index section.
- Introduced a new XML file for the making section (99making99.xml) detailing the background, interactive features, and development history of the SICP JS project.
- Updated subsection2.xml to close the previously open SUBSECTION tag and include a comment for clarity.
@yihao03
Copy link
Member Author

yihao03 commented Apr 12, 2025

breaking changes are made: xml repositories are divided into folders, currently consisting of en and cn folders to store translated content. The same applies to the json folder after running "yarn json". Frontend needs to be changed accordingly to fetch json files from the corresponding url

@yihao03 yihao03 self-assigned this Apr 13, 2025
@coder114514 coder114514 force-pushed the master branch 2 times, most recently from f4d52e0 to 851b85c Compare April 14, 2025 09:32
throw new Error('Undefined language');
}

if (!troubleshoot) assistant = await createAssistant(langCode, language, ai as any);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re casting your OpenAI client to any when calling createAssistant. Remove the as any cast and unify your imports so you only pull in OpenAI from the main package entrypoint. For example:

// i18n/initializers/initialize.ts
import OpenAI from 'openai';

// instead of importing from internal paths:
// import OpenAI from 'openai/index.mjs';

Then simply call:

assistant = await createAssistant(langCode, language, ai);

By using a single import OpenAI from 'openai' everywhere, TypeScript will recognize the same class, eliminating the need for any.

@@ -0,0 +1,45 @@
import fs from "fs";
import OpenAI from "openai/index.mjs";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the previous comment about dropping the ai as any cast, you should also update the import in your initialize.ts to:

import OpenAI from 'openai';

Comment on lines +26 to +28
const fileStreams = [path.join(__dirname, "../ai_files", langCode, "dictionary.txt")].map(
path => fs.createReadStream(path)
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should attach an error handler to each stream so file I/O issues (like missing or unreadable files) are caught immediately:

const fileStreams = [path.join(__dirname, '../ai_files', langCode, 'dictionary.txt')].map(
    filePath => {
      const stream = fs.createReadStream(filePath);

      stream.on('error', err => {
          throw new Error(`Failed to read dictionary file at ${filePath}: ${err.message}`);  
      });
      return stream;
  });

This ensures your process fails fast with a clear error instead of silently hanging or emitting later failures.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openai, sax, and dotenv are not declared in your package.json dependencies, and you also need the Node.js types for TypeScript. You can add them like:

{
  "dependencies": {
    "sicp": "^1.1.4",
    "openai": "^x.y.z",
    "sax": "^x.y.z",
    "dotenv": "^x.y.z"
  },
  "devDependencies": {
    "@types/node": "^x.y.z"
  }
}

That way, running yarn install will automatically pull in those packages and the Node types.

Comment on lines +410 to +414
const logPath = path.join(logDir, `json-summary-${timestamp}.log`);
fs.writeFileSync(logPath, summaryLog);
console.log(
`Summary log saved to logs/translation-summary-${timestamp}.log`
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a mismatch between the log filename you write (json-summary-...) and the one you log to the console (translation-summary-...).

Comment on lines +15 to +18
const getDirectories = async source =>
(await readdir(source, { withFileTypes: true }))
.filter(dirent => dirent.isDirectory())
.map(dirent => dirent.name);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter source is implicitly typed as any. You should add an explicit type annotation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several magic literals in this file, for example:

  • Batch size fallback: max_trans_num = Number(process.env.MAX_TRANSLATION_NO) || 5
  • API list limit: await ai.beta.assistants.list({ limit: 100 })
  • Filename prefix in logs: translation-summary-
  • Timestamp hack: .toISOString().replace(/[:.]/g, "-")

There are more magic values throughout consider extracting them into named constants or a configuration object for better maintainability.

Comment on lines +184 to +195
try {
const cnStats = await fs.promises.stat(cnFilePath);
if (!cnStats.isFile()) {
return true;
}

const enStats = await fs.promises.stat(enFilePath);
return enStats.mtime > cnStats.mtime;
} catch (error) {
return true;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching all errors and returning true can mask real file‑system issues. You should be logging or re‑throwing unexpected errors so you don’t skip translations silently.

// Add detailed errors captured during translation process
const fileErrors = getFileErrors();
if (Object.keys(fileErrors).length > 0) {
failureCount = Object.keys(fileErrors).length;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failureCount variable is declared globally and reused here, so resetting it in the summary section overwrites the failures accumulated during the batch loop rather than keeping them separate. This makes the initial accumulation effectively redundant.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file handles a lot of logic and, for example, embeds several magic literals directly:

  • MAXLEN = Number(process.env.MAX_LEN) || 3000
  • max_trans_num = Number(process.env.MAX_TRANSLATION_NO) || 5
  • { limit: 100 } in the API calls
  • Log prefixes like "translation-summary-", "json-summary-", "emergency-log-"
  • The ignoredTags array

Embedding these values makes configuration harder and ties defaults to code changes.

Proposed changes:

  1. Extract config constants

    • Move environment fallbacks, numeric limits, prefixes, and tag arrays into config.ts.
    • This lets you validate required vars and override settings (via .env, CI, etc.) without modifying application code.
  2. Modularize by responsibility

    • config.ts: all constants and environment setup
    • logger.ts: error‑logging and summary helpers
    • parser.ts: SAX parser instantiation and XML escaping utilities
    • translator.ts: core translation flow and CLI handling
    • io.ts: file system utilities (directory traversal, stream management)

Breaking the file into focused modules and centralizing configuration will improve readability, testability, and ease of maintenance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants