@pasotee pasotee commented Oct 20, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved parsing of multiline translations in PO files, including correct handling of LF and CRLF line breaks.
    • Enhanced interpretation of escape sequences inside quoted strings so escaped characters are correctly rendered.
  • Tests

    • Updated test fixtures and expectations to reflect additional parsed translations and multiline handling.

coderabbitai bot commented Oct 20, 2025

Walkthrough

PO parsing was changed in two ways: processHeader now splits header meta lines on raw newline characters, and common escape sequences inside quoted strings (e.g., \n, \r, \t, \", \\) are interpreted as their actual characters. Tests and example PO resource entries were updated to cover multiline translations (LF and CRLF).
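
A minimal Kotlin sketch of the escape handling described above. The helper name decodePoString is hypothetical and this is not the actual PoParser code; it assumes the input is only the text between the double quotes, and keeping a trailing lone backslash is a choice made for this sketch.

```kotlin
// Hypothetical helper (not the real PoParser): decodes the body of a quoted
// PO string, i.e. the characters between the double quotes.
fun decodePoString(raw: String): String {
    val out = StringBuilder()
    var escaped = false
    for (ch in raw) {
        if (escaped) {
            when (ch) {
                'n' -> out.append('\n')
                'r' -> out.append('\r')
                't' -> out.append('\t')
                '"' -> out.append('"')
                '\\' -> out.append('\\')
                // Unknown escape: keep the backslash and the character.
                else -> out.append('\\').append(ch)
            }
            escaped = false
        } else if (ch == '\\') {
            escaped = true
        } else {
            out.append(ch)
        }
    }
    // Assumption of this sketch: a trailing lone backslash is kept as-is.
    if (escaped) out.append('\\')
    return out.toString()
}

fun main() {
    // "\\n" in Kotlin source is backslash + n, as the text appears in a .po file.
    println(decodePoString("This\\nis") == "This\nis")
    println(decodePoString("This\\r\\nis") == "This\r\nis")
    println(decodePoString("\\q") == "\\q")
}
```

Both the LF and the CRLF case go through the same branch table; a CRLF break is simply the \r escape followed by the \n escape.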

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Parser logic: backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt | processHeader now splits header text on raw \n characters. Character handling for escaped sequences was changed: when inside quotes and currentEscaped is true, common escapes (n, r, t, ", \) are converted to their corresponding characters; unknown escapes preserve the backslash plus character. |
| Unit tests: backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoParserTest.kt, backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoFileProcessorTest.kt | Added assertions in PoParserTest validating multiline msgstr values (LF and CRLF). Updated PoFileProcessorTest expected translation count from 9 to 11. |
| Test resources: backend/data/src/test/resources/import/po/example.po | Added two PO entries demonstrating multiline translations: one using LF (\n) and one using CRLF (\r\n) line breaks in msgstr. |

Sequence Diagram(s)

sequenceDiagram
  participant FS as FileSystem
  participant Parser as PoParser
  participant Test as TestRunner

  FS->>Parser: load example.po (header + entries)
  note right of Parser #DDEEFF: Header processing
  Parser->>Parser: processHeader(split by raw "\n")
  note right of Parser #F6F2DD: Entry parsing
  Parser->>Parser: parse msgid/msgstr chars
  alt escaped char inside quotes
    Parser->>Parser: translate '\n','\r','\t','"','\\' → actual char
  else unknown escape
    Parser->>Parser: keep backslash + char
  end
  Parser->>Test: produce translations (11)
  Test->>FS: compare against updated example.po expectations

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibbled backslashes late at night,

Turned \n to breath and \" to light.
Two new lines hopped into view,
CRLF and LF—both true.
Hooray, the parser sings anew!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Added PO handling for new lines + tests" directly and accurately reflects the main changes in the pull request. The changeset focuses on improving how the PO file parser handles newline characters (both LF and CRLF) in multiline translations, with corresponding parser logic updates in PoParser.kt and multiple test additions and updates to validate this behavior. The title is specific enough to convey the primary intent without being overly verbose, and it clearly indicates what functionality was added and that tests were included. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32c24b6 and bc273ef.

📒 Files selected for processing (1)
  • backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt (2)

54-68: Header split on LF aligns with new decoding; add safety and locale fixes.

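Good change. Three follow-ups:

  • Guard against malformed header lines (no colon): indexOf returns -1 there, and the following substring call throws StringIndexOutOfBoundsException.
  • Use Locale.ROOT for case-insensitive keys; avoid default-locale surprises (e.g., Turkish i).
  • Minor: use lineSequence()/forEach instead of map-for-side-effects. Note that lineSequence() also splits on \r\n and \r, not only \n.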

Apply:

-      it.msgstr.split("\n").map { metaLine ->
-        val trimmed = metaLine.trim()
-        if (trimmed.isBlank()) {
-          return@map
-        }
-        val colonPosition = trimmed.indexOf(":")
-        val key = trimmed.substring(0 until colonPosition)
-        val value = trimmed.substring(colonPosition + 1).trim()
-        when (key.lowercase(Locale.getDefault())) {
+      it.msgstr.lineSequence().forEach { metaLine ->
+        val trimmed = metaLine.trim()
+        if (trimmed.isEmpty()) return@forEach
+        val colonPosition = trimmed.indexOf(':')
+        if (colonPosition <= 0) return@forEach
+        val key = trimmed.substring(0, colonPosition).lowercase(Locale.ROOT)
+        val value = trimmed.substring(colonPosition + 1).trim()
+        when (key) {
           "project-id-version" -> result.projectIdVersion = value
           "language" -> result.language = value
           "plural-forms" -> result.pluralForms = value
           else -> result.other[key] = value
         }
       }

Please add a small test asserting header meta (e.g., language/plural-forms) still parses when a line lacks a colon or has trailing spaces.
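
A self-contained sketch of what such a check could look like. PoHeaderMeta and parseHeader are hypothetical stand-ins written for illustration, since the real result type and processHeader signature are not shown in this review; the loop body follows the suggested diff above.

```kotlin
import java.util.Locale

// Hypothetical stand-in for the parser's header result type.
data class PoHeaderMeta(
    var projectIdVersion: String? = null,
    var language: String? = null,
    var pluralForms: String? = null,
    val other: MutableMap<String, String> = mutableMapOf(),
)

// Hypothetical function mirroring the suggested loop; takes the decoded header msgstr.
fun parseHeader(msgstr: String): PoHeaderMeta {
    val result = PoHeaderMeta()
    msgstr.lineSequence().forEach { metaLine ->
        val trimmed = metaLine.trim()
        if (trimmed.isEmpty()) return@forEach
        val colonPosition = trimmed.indexOf(':')
        // No colon (-1) or empty key (0): skip instead of throwing.
        if (colonPosition <= 0) return@forEach
        val key = trimmed.substring(0, colonPosition).lowercase(Locale.ROOT)
        val value = trimmed.substring(colonPosition + 1).trim()
        when (key) {
            "project-id-version" -> result.projectIdVersion = value
            "language" -> result.language = value
            "plural-forms" -> result.pluralForms = value
            else -> result.other[key] = value
        }
    }
    return result
}

fun main() {
    val meta = parseHeader(
        "Language: cs  \nthis line has no colon\nPlural-Forms: nplurals=2; plural=(n != 1);\n"
    )
    println(meta.language)
    println(meta.pluralForms)
    // With a Turkish locale, uppercase I lowercases to dotless ı, so the key no longer matches.
    println("PROJECT-ID-VERSION".lowercase(Locale.forLanguageTag("tr")) == "project-id-version")
}
```

The last line illustrates why Locale.ROOT matters here: a key written in upper case would fall through to the else branch on a machine whose default locale is Turkish.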


129-137: Escape decoding inside quotes looks right; consider completing the table.

Current set covers n/r/t/"/\. Add common C escapes b (backspace), f (form feed), v (vertical tab). Unknowns remain literal (good).

       val specialEscape: Char? = if (quoted) when (this) {
         'n'  -> '\n'
         'r'  -> '\r'
         't'  -> '\t'
+        'b'  -> '\b'
+        'f'  -> '\u000C'
+        'v'  -> '\u000B'
         '"'  -> '"'
         '\\' -> '\\'
         else -> null
       } else null

Add one assertion that unknown escapes (e.g., "\q") are preserved as two characters and that "\b" maps to backspace when enabled.

Also applies to: 138-143
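
The requested assertions could be sketched as below. This assumes a hypothetical lookup function decodeEscape and a map-based table instead of the when expression in PoParser; it only illustrates the intended behavior for the extended set.

```kotlin
// Hypothetical escape table including the proposed b, f and v entries.
val poEscapes: Map<Char, Char> = mapOf(
    'n' to '\n',
    'r' to '\r',
    't' to '\t',
    'b' to '\b',
    'f' to '\u000C', // form feed
    'v' to '\u000B', // vertical tab
    '"' to '"',
    '\\' to '\\',
)

// Returns the decoded character for a known escape, or backslash + character otherwise.
fun decodeEscape(c: Char): String = poEscapes[c]?.toString() ?: "\\$c"

fun main() {
    println(decodeEscape('b') == "\b")   // known escape maps to backspace
    println(decodeEscape('q') == "\\q")  // unknown escape stays as two characters
    println(decodeEscape('q').length)
}
```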

backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoParserTest.kt (1)

30-31: Avoid index fragility; assert by msgid.

Order can shift; look up entries by msgid for robustness.

-    assertThat(result.translations[10].msgstr.toString()).isEqualTo("This\nis\na\nmultiline\nstring")
-    assertThat(result.translations[11].msgstr.toString()).isEqualTo("This\r\nis\r\na\r\nmultiline\r\nstring")
+    val lf = result.translations.first { it.msgid.toString() == "Multiline message with \\n" }
+    assertThat(lf.msgstr.toString()).isEqualTo("This\nis\na\nmultiline\nstring")
+    val crlf = result.translations.first { it.msgid.toString() == "Multiline message with \\n\\r" }
+    assertThat(crlf.msgstr.toString()).isEqualTo("This\r\nis\r\na\r\nmultiline\r\nstring")
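
To illustrate why a lookup by msgid is less fragile than an index, here is a small sketch with a hypothetical Entry type and byMsgid helper; the real parser's entry class is not shown in this review, and its msgid and msgstr are not plain strings there.

```kotlin
// Hypothetical minimal model of a parsed PO entry.
data class Entry(val msgid: String, val msgstr: String)

// Fails with a descriptive message instead of returning the wrong entry
// when the fixture is reordered.
fun List<Entry>.byMsgid(id: String): Entry =
    firstOrNull { it.msgid == id } ?: error("No entry with msgid '$id'")

fun main() {
    val translations = listOf(
        Entry("Hello", "Ahoj"),
        Entry("Multiline message with \\n", "This\nis\na\nmultiline\nstring"),
    )
    val lf = translations.byMsgid("Multiline message with \\n")
    println(lf.msgstr == "This\nis\na\nmultiline\nstring")
}
```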

backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoFileProcessorTest.kt (1)

32-32: Consider asserting presence, not only count.

Exact size can fluctuate as the fixture evolves. Optionally assert keys exist to make intent explicit.

 assertThat(mockUtil.fileProcessorContext.translations).hasSize(11)
+assertThat(mockUtil.fileProcessorContext.translations.keys)
+  .contains("Multiline message with \\n", "Multiline message with \\n\\r")

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99cbd12 and 32c24b6.

📒 Files selected for processing (4)
  • backend/data/src/main/kotlin/io/tolgee/formats/po/in/PoParser.kt (2 hunks)
  • backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoFileProcessorTest.kt (1 hunks)
  • backend/data/src/test/kotlin/io/tolgee/unit/formats/po/in/PoParserTest.kt (1 hunks)
  • backend/data/src/test/resources/import/po/example.po (1 hunks)
🔇 Additional comments (1)
backend/data/src/test/resources/import/po/example.po (1)

53-57: Test data additions LGTM.

The LF/CRLF cases are well chosen and align with parser changes.

@JanCizmar JanCizmar requested a review from Anty0 October 20, 2025 16:32