Bug — Latin-1 / WinAnsi text loses the Euro sign € (and other CP-1252 glyphs), rendered as ?
Affects: pdfnative ≤ 1.2.0
Severity: Medium — visible data corruption in generated PDFs; any document
containing € (and other characters in the CP-1252 0x80–0x9F range) shows ?
instead of the intended glyph.
Symptom
When text containing the Euro sign € (U+20AC) is rendered with a standard
PDF base font (Helvetica / Times / Courier) using the built-in single-byte text
encoding, the character is written to the content stream as ? (0x3F) instead of
the correct single-byte character code. The same happens for the other
characters that Windows-1252 (CP-1252) places in the 0x80–0x9F band and that
Latin-1 cannot encode, e.g. ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ™ š › œ ž Ÿ.
The input string is correct UTF-8 at the API boundary; the loss happens inside
pdfnative when the Unicode code point is mapped to a single-byte font
encoding and no matching character code is found, so the encoder falls back to ?.
Reproduction (minimal)
    import { buildPDFBytes } from 'pdfnative';
    import { writeFileSync } from 'node:fs';

    const bytes = buildPDFBytes(
      {
        title: 'My first invoice',
        headers: ['Item', 'Price'],
        // The price cell carries the Euro sign (U+20AC).
        rows: [{ cells: ['Widget A', '12.00 €'], type: '', pointed: false }],
        footerText: '',
      },
      { compress: true },
    );
    writeFileSync('out.pdf', bytes);
    // Open out.pdf: the price cell shows "12.00 ?" instead of "12.00 €".
Reproduction via PDFNative Cloud (downstream)
This was first observed through the Cloud API. The request body is valid UTF-8,
yet the € is lost in the rendered PDF:
    $body = @{
        document = @{
            title   = "My first invoice"
            headers = @("Item", "Price")
            rows    = @(, @("Widget A", "12.00 €"))
        }
    } | ConvertTo-Json -Depth 6
    Invoke-RestMethod `
        -Uri http://localhost:8787/v1/generate `
        -Method POST `
        -Headers @{ Authorization = "Bearer pk_test_YOUR_KEY" } `
        -ContentType "application/json; charset=utf-8" `
        -Body ([System.Text.Encoding]::UTF8.GetBytes($body)) `
        -OutFile table.pdf
    # table.pdf shows "12.00 ?"
The Cloud layer (pdfnative-cloud) passes the strings through verbatim to
buildPDFBytes / buildDocumentPDFBytes with no font or encoding handling of
its own, which confirms the mapping happens inside pdfnative.
Root cause (suspected)
The single-byte text encoder appears to map Unicode → character codes using
Latin-1 (or a bare StandardEncoding), where:
- 0x80–0x9F hold no printable characters (they are C1 control positions in
Latin-1), and
- € (U+20AC) has no code at all.
The standard Latin base fonts are conventionally used with WinAnsiEncoding
(the PDF counterpart of CP-1252), in which € is at byte 0x80. If the encoder
targets Latin-1 (or a bare StandardEncoding) and substitutes unknown code
points with ?, every CP-1252-only glyph is lost.
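To make the suspected failure concrete, the sketch below shows what a Latin-1-only encoder with a ? fallback does to the reproduction string. `encodeLatin1Lossy` is a hypothetical stand-in written for this report; it is not taken from pdfnative's source and only illustrates the behaviour described above.

```typescript
// Hypothetical illustration of the suspected behaviour; NOT pdfnative's code.
function encodeLatin1Lossy(text: string): Uint8Array {
  const out: number[] = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0x3f;
    // Latin-1 covers only U+0000 to U+00FF; anything else becomes '?' (0x3F).
    out.push(cp <= 0xff ? cp : 0x3f);
  }
  return Uint8Array.from(out);
}

const lossy = encodeLatin1Lossy('12.00 €');
console.log(Array.from(lossy, (b) => b.toString(16).padStart(2, '0')).join(' '));
// prints "31 32 2e 30 30 20 3f": the final byte is '?' where 0x80 was expected
```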
Suggested fix
- Use /Encoding /WinAnsiEncoding for the standard Type 1 base fonts and
map Unicode → WinAnsi (so U+20AC → 0x80, U+2019 ’ → 0x92, etc.) before
writing the single-byte string.
- When a code point is outside WinAnsi, prefer embedding a Unicode TrueType
font with Identity-H (composite font + ToUnicode CMap) instead of
emitting ?, so non-CP-1252 scripts keep working.
- As a minimum, replace the silent ? fallback with the correct WinAnsi byte
whenever one exists; only fall back when truly no glyph is available.
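A minimal sketch of the Unicode → WinAnsi mapping proposed in the list above, assuming the encoder works one code point at a time. The names `WIN_ANSI_HIGH`, `encodeWinAnsi` and `WinAnsiResult` are hypothetical and do not exist in pdfnative; the byte assignments are the standard CP-1252 ones. Characters with no WinAnsi byte are reported back so the caller can decide whether to switch to an embedded Unicode font.

```typescript
// CP-1252 / WinAnsi bytes 0x80-0x9F, keyed by Unicode code point.
// 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in CP-1252 and are omitted.
const WIN_ANSI_HIGH: ReadonlyArray<readonly [number, number]> = [
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84],
  [0x2026, 0x85], [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88],
  [0x2030, 0x89], [0x0160, 0x8a], [0x2039, 0x8b], [0x0152, 0x8c],
  [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92], [0x201c, 0x93],
  [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b],
  [0x0153, 0x9c], [0x017e, 0x9e], [0x0178, 0x9f],
];
const UNICODE_TO_WIN_ANSI = new Map<number, number>(WIN_ANSI_HIGH);

interface WinAnsiResult {
  bytes: Uint8Array;    // one byte per input character
  unmappable: string[]; // characters that have no WinAnsi byte
}

function encodeWinAnsi(text: string): WinAnsiResult {
  const bytes: number[] = [];
  const unmappable: string[] = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0x3f;
    const mapped = UNICODE_TO_WIN_ANSI.get(cp);
    if (mapped !== undefined) {
      bytes.push(mapped); // CP-1252-only character, e.g. U+20AC -> 0x80
    } else if (cp <= 0x7f || (cp >= 0xa0 && cp <= 0xff)) {
      bytes.push(cp); // ASCII and the upper Latin-1 half share their codes
    } else {
      unmappable.push(ch); // caller may fall back to a Unicode font here
      bytes.push(0x3f);
    }
  }
  return { bytes: Uint8Array.from(bytes), unmappable };
}

const encoded = encodeWinAnsi('12.00 €');
console.log(encoded.bytes[encoded.bytes.length - 1].toString(16)); // prints "80"
```

Returning the unmappable characters instead of silently writing ? keeps the decision about a fallback font with the caller. The resulting bytes would still need the usual PDF string escaping for parentheses and backslashes before being written to a content stream.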
Acceptance: the minimal reproduction above must render 12.00 € (and the
CP-1252 punctuation set) correctly, and a ToUnicode map should make the text
selectable/searchable as €.
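For the searchable-text part of the acceptance criterion, a ToUnicode CMap for a single-byte font could be generated along the following lines. This is a sketch only: `buildToUnicodeCMap` is a hypothetical helper, not a pdfnative function, and it assumes at most 100 pairs (the per-block limit for bfchar sections) and code points inside the Basic Multilingual Plane, which holds for all WinAnsi characters.

```typescript
// Hypothetical helper: builds the body of a ToUnicode CMap stream for a
// single-byte font from [character code, Unicode code point] pairs.
// Assumes pairs.length <= 100 and every code point <= U+FFFF.
function buildToUnicodeCMap(
  pairs: ReadonlyArray<readonly [number, number]>,
): string {
  const hex = (n: number, width: number): string =>
    n.toString(16).toUpperCase().padStart(width, '0');
  const entries = pairs.map(
    ([code, cp]) => `<${hex(code, 2)}> <${hex(cp, 4)}>`,
  );
  return [
    '/CIDInit /ProcSet findresource begin',
    '12 dict begin',
    'begincmap',
    '/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def',
    '/CMapName /Adobe-Identity-UCS def',
    '/CMapType 2 def',
    '1 begincodespacerange',
    '<00> <FF>', // single-byte character codes
    'endcodespacerange',
    `${entries.length} beginbfchar`,
    ...entries,
    'endbfchar',
    'endcmap',
    'CMapName currentdict /CMap defineresource pop',
    'end',
    'end',
  ].join('\n');
}

// Example: map byte 0x80 to the Euro sign and 0x92 to the right single quote.
console.log(buildToUnicodeCMap([[0x80, 0x20ac], [0x92, 0x2019]]));
```

The returned text would be stored as a stream object and referenced from the font dictionary's /ToUnicode entry; how pdfnative would wire that up is outside this sketch.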
References
- ISO 32000-1 §9.6.6.4 / Annex D — WinAnsiEncoding and the standard Latin
character set (Euro at 0x80).
- Unicode: € = U+20AC; CP-1252 byte 0x80.
- Downstream call site (no encoding logic, passes text through):
pdfnative-cloud → apps/api/src/lib/generation.ts (generateTablePdf /
generateDocumentPdf).