Bug — Latin-1 / WinAnsi text loses the Euro sign (and other CP-1252 glyphs), rendered as ? #48

@Nizoka

Affects: pdfnative ≤ 1.2.0
Severity: Medium. Visible data corruption in generated PDFs: any document
containing € (or another character in the CP-1252 0x80–0x9F range) shows ?
instead of the intended glyph.

Symptom

When text containing the Euro sign € (U+20AC) is rendered with a standard
PDF base font (Helvetica / Times / Courier) using the built-in single-byte text
encoding, the character is written to the content stream as ? instead of the
correct character code. The same happens for other characters that live in the
Windows-1252 (CP-1252) 0x80–0x9F band but are absent from Latin-1 /
StandardEncoding, e.g. ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ™ š › œ ž Ÿ.

The input string is correct UTF-8 at the API boundary; the loss happens inside
pdfnative when the Unicode code point is mapped to a single-byte font
encoding and no matching glyph is found, so the encoder falls back to ?.
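
To make the suspected failure mode concrete, here is a minimal sketch of an encoder that behaves this way. It is not pdfnative's actual code; `encodeLatin1Naive` and `hexOf` are hypothetical names used only for illustration, and the assumption is simply that anything above U+00FF is replaced with ?.

```typescript
// Hypothetical sketch, NOT pdfnative's real encoder.
// Treats every code point up to U+00FF as its own byte (Latin-1)
// and silently substitutes '?' (0x3F) for everything else.
function encodeLatin1Naive(text: string): Uint8Array {
  const out: number[] = [];
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    out.push(cp <= 0xff ? cp : 0x3f);
  }
  return Uint8Array.from(out);
}

// Formats bytes as space-separated two-digit hex for inspection.
function hexOf(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => (b < 16 ? '0' : '') + b.toString(16)).join(' ');
}

const lossyHex = hexOf(encodeLatin1Naive('12.00 €'));
console.log(lossyHex); // 31 32 2e 30 30 20 3f
```

The trailing 3f is the ? seen in the rendered PDF; a WinAnsi-aware encoder would emit 80 in that position.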

Reproduction (minimal)

import { buildPDFBytes } from 'pdfnative';
import { writeFileSync } from 'node:fs';

const bytes = buildPDFBytes(
  {
    title: 'My first invoice',
    headers: ['Item', 'Price'],
    rows: [{ cells: ['Widget A', '12.00 €'], type: '', pointed: false }],
    footerText: '',
  },
  { compress: true },
);

writeFileSync('out.pdf', bytes);
// Open out.pdf: the price cell shows "12.00 ?" instead of "12.00 €".

Reproduction via PDFNative Cloud (downstream)

This was first observed through the Cloud API. The request body is valid UTF-8,
yet the € is lost in the rendered PDF:

$body = @{
  document = @{
    title   = "My first invoice"
    headers = @("Item", "Price")
    rows    = @(, @("Widget A", "12.00 €"))
  }
} | ConvertTo-Json -Depth 6

Invoke-RestMethod `
  -Uri http://localhost:8787/v1/generate `
  -Method POST `
  -Headers @{ Authorization = "Bearer pk_test_YOUR_KEY" } `
  -ContentType "application/json; charset=utf-8" `
  -Body ([System.Text.Encoding]::UTF8.GetBytes($body)) `
  -OutFile table.pdf
# table.pdf shows "12.00 ?"

The Cloud layer (pdfnative-cloud) passes the strings through verbatim to
buildPDFBytes / buildDocumentPDFBytes with no font or encoding handling of
its own, which confirms the mapping happens inside pdfnative.

Root cause (suspected)

The single-byte text encoder maps Unicode → font code points using Latin-1 /
StandardEncoding, where:

  • 0x80–0x9F are control positions (no €), and
  • € (U+20AC) has no Latin-1 code point at all.

PDF's standard fonts are actually meant to be used with WinAnsiEncoding
(CP-1252), in which € is at byte 0x80. If the encoder targets Latin-1 (or a
bare StandardEncoding) and substitutes unknown code points with ?, every
CP-1252-only glyph is lost.

Suggested fix

  1. Use /Encoding /WinAnsiEncoding for the standard Type1 base fonts and
    map Unicode → WinAnsi (so U+20AC → 0x80, U+2019 → 0x92, etc.) before
    writing the single-byte string.
  2. When a code point is outside WinAnsi, prefer embedding a Unicode TrueType
    font with Identity-H (composite font + ToUnicode CMap) instead of
    emitting ?, so non-CP-1252 scripts keep working.
  3. As a minimum, replace the silent ? fallback with the correct WinAnsi byte
    whenever one exists; only fall back when truly no glyph is available.
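
As a rough illustration of items 1 and 3, the Unicode → WinAnsi step could look something like the sketch below. The names `WIN_ANSI_HIGH`, `UNICODE_TO_WIN_ANSI` and `encodeWinAnsi` are hypothetical and do not exist in pdfnative; the table covers only the CP-1252 0x80–0x9F band and assumes ASCII and U+00A0–U+00FF keep their Latin-1 byte values.

```typescript
// Unicode code point -> byte for the characters CP-1252 places in 0x80–0x9F.
// 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in CP-1252 and are omitted.
const WIN_ANSI_HIGH: ReadonlyArray<[number, number]> = [
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84],
  [0x2026, 0x85], [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88],
  [0x2030, 0x89], [0x0160, 0x8a], [0x2039, 0x8b], [0x0152, 0x8c],
  [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92], [0x201c, 0x93],
  [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b],
  [0x0153, 0x9c], [0x017e, 0x9e], [0x0178, 0x9f],
];
const UNICODE_TO_WIN_ANSI = new Map<number, number>(WIN_ANSI_HIGH);

// Hypothetical helper: encodes text for a font using /WinAnsiEncoding.
// Characters with no WinAnsi byte are reported in `unmapped` (and written
// as '?') so the caller can decide to switch to an embedded Unicode font.
function encodeWinAnsi(text: string): { bytes: Uint8Array; unmapped: string[] } {
  const bytes: number[] = [];
  const unmapped: string[] = [];
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    const high = UNICODE_TO_WIN_ANSI.get(cp);
    if (high !== undefined) {
      bytes.push(high); // CP-1252-only glyph such as € or ’
    } else if (cp <= 0x7f || (cp >= 0xa0 && cp <= 0xff)) {
      bytes.push(cp); // ASCII and the Latin-1 upper half pass through unchanged
    } else {
      unmapped.push(ch);
      bytes.push(0x3f); // last-resort '?'
    }
  }
  return { bytes: Uint8Array.from(bytes), unmapped };
}

const euro = encodeWinAnsi('12.00 €');
console.log(euro.bytes[euro.bytes.length - 1].toString(16)); // 80
console.log(encodeWinAnsi('日本').unmapped.length); // 2
```

Returning the unmapped characters rather than dropping them silently is one way to let the caller choose between the ? fallback and the Identity-H path in item 2.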

Acceptance: the minimal reproduction above must render 12.00 € (and the
CP-1252 punctuation set) correctly, and a ToUnicode map should make the text
selectable/searchable as €.
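
For the ToUnicode part of the acceptance criterion, a sketch of how the CMap text for a single-byte font might be generated is shown below. `buildToUnicodeCMap` is a hypothetical helper, not an existing pdfnative function; it assumes BMP code points only and at most 100 entries (the per-block limit for bfchar sections), and it leaves out wrapping the result in a PDF stream object.

```typescript
// Hypothetical helper: builds the text of a ToUnicode CMap for a simple
// (single-byte) font. Each pair is [font byte code, Unicode code point].
// Assumes BMP code points and no more than 100 pairs.
function buildToUnicodeCMap(pairs: ReadonlyArray<[number, number]>): string {
  // Upper-case hex, left-padded with zeros to the requested width.
  const hex = (n: number, width: number): string => {
    let s = n.toString(16).toUpperCase();
    while (s.length < width) s = '0' + s;
    return s;
  };
  const entries = pairs.map(([code, cp]) => `<${hex(code, 2)}> <${hex(cp, 4)}>`);
  return [
    '/CIDInit /ProcSet findresource begin',
    '12 dict begin',
    'begincmap',
    '/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def',
    '/CMapName /Adobe-Identity-UCS def',
    '/CMapType 2 def',
    '1 begincodespacerange',
    '<00> <FF>',
    'endcodespacerange',
    `${entries.length} beginbfchar`,
    ...entries,
    'endbfchar',
    'endcmap',
    'CMapName currentdict /CMap defineresource pop',
    'end',
    'end',
  ].join('\n');
}

// Maps byte 0x80 to € and byte 0x92 to ’.
const cmap = buildToUnicodeCMap([[0x80, 0x20ac], [0x92, 0x2019]]);
console.log(cmap);
```

With an entry such as <80> <20AC> in place, a viewer that honours ToUnicode should copy and search the byte 0x80 as €.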

References

  • ISO 32000-1 §9.6.6.4 / Annex D — WinAnsiEncoding and the standard Latin
    character set (Euro at 0x80).
  • Unicode: € = U+20AC; CP-1252 byte 0x80.
  • Downstream call site (no encoding logic, passes text through):
    pdfnative-cloudapps/api/src/lib/generation.ts (generateTablePdf /
    generateDocumentPdf).
