Language Detection And Highlighting

Implementation roots:

Core detection entrypoint: ../crates/localpaste_core/src/detection/mod.rs
GUI highlight pipeline entrypoints:
- ../crates/localpaste_gui/src/app/highlight/mod.rs
- ../crates/localpaste_gui/src/app/highlight/worker.rs

Feature Topology

localpaste_core keeps magika as opt-in (default = []).
localpaste_gui and localpaste_server enable magika by default.
localpaste_cli uses localpaste_core defaults, so it is heuristic-only unless explicitly feature-enabled in downstream builds.

This keeps GUI/server detection broad by default while preserving portability for core/CLI users.

Detection Flow

For auto-detected language (language_is_manual == false):

If the magika feature is enabled:
- run Magika detection,
- reject non-text results,
- reject generic labels (txt, randomtxt, unknown, empty, undefined),
- normalize the label and return it if the result is non-empty and not text.
Otherwise (or if Magika is unavailable, fails, or returns a generic label), run the heuristic fallback.
Normalize the heuristic label and return it unless the result is empty or text.
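
The ordering above can be sketched as a single function. This is a minimal illustration, not the code in detection/mod.rs: ModelGuess, resolve_auto_language, and the closure-based heuristic are hypothetical names, and the normalizer is reduced to trim plus lowercase.

```rust
/// Simplified stand-in for a model result (hypothetical, not the real Magika type).
pub struct ModelGuess {
    pub label: String,
    pub is_text: bool,
}

/// Labels the flow treats as "no useful answer" from the model.
const GENERIC_LABELS: [&str; 5] = ["txt", "randomtxt", "unknown", "empty", "undefined"];

/// Placeholder normalization (trim + lowercase); empty and "text" count as unresolved.
fn concrete(label: &str) -> Option<String> {
    let normalized = label.trim().to_lowercase();
    if normalized.is_empty() || normalized == "text" {
        None
    } else {
        Some(normalized)
    }
}

/// `model` is None when the feature is off and Some(Err(..)) when the model failed.
/// The heuristic closure only runs when the model path yields nothing concrete.
pub fn resolve_auto_language(
    model: Option<Result<ModelGuess, String>>,
    heuristic: impl FnOnce() -> Option<String>,
) -> Option<String> {
    if let Some(Ok(guess)) = model {
        let lowered = guess.label.trim().to_lowercase();
        let generic = GENERIC_LABELS.contains(&lowered.as_str());
        if guess.is_text && !generic {
            if let Some(label) = concrete(&guess.label) {
                return Some(label);
            }
        }
    }
    heuristic().and_then(|label| concrete(&label))
}
```

A None result here corresponds to an unresolved language; the caller decides how to store that.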

Auto mode is intentionally "pending detection":

switching to auto clears the resolved language label,
the next content edit re-runs detection,
if detection resolves a concrete language, that value is locked (language_is_manual = true) until explicitly switched back to auto.
API create requests that omit language_is_manual detect immediately and lock when detection resolves; pass language_is_manual: false to start unresolved auto mode and defer detection until a later edit.

For manual language (language_is_manual == true), content edits do not re-run auto detection.
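
The auto/manual transitions can be modeled as a small state type. This is a sketch under assumptions: LanguageState and its methods are hypothetical names, and only the language_is_manual field name comes from this document.

```rust
/// Hypothetical model of the language fields on a paste.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LanguageState {
    pub language: Option<String>,
    pub language_is_manual: bool,
}

impl LanguageState {
    /// Switching to auto clears the resolved label; detection waits for the next edit.
    pub fn switch_to_auto(&mut self) {
        self.language = None;
        self.language_is_manual = false;
    }

    /// Called on each content edit. `detect` is only invoked while in auto mode.
    pub fn on_content_edit(&mut self, content: &str, detect: impl Fn(&str) -> Option<String>) {
        if self.language_is_manual {
            return; // manual (or locked) language: no re-detection
        }
        if let Some(label) = detect(content) {
            self.language = Some(label);
            self.language_is_manual = true; // lock until switched back to auto
        }
    }
}
```

If detection does not resolve a concrete language, the state stays in pending auto mode and the next edit tries again.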

Important

Manual language selection disables automatic re-detection on edits until you switch back to auto mode.

Magika session lifecycle:

lazy singleton (OnceLock<Result<Mutex<magika::Session>, String>>),
guarded with Mutex because Magika identify calls require &mut self,
prewarm() is called in GUI/server startup paths to avoid first-save load latency.
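
The singleton shape can be illustrated with a stand-in session type. In this sketch, Session, its load and identify methods, and the free identify function are placeholders rather than the magika crate's API; only the OnceLock<Result<Mutex<..>, String>> layout and the prewarm() name come from the description above.

```rust
use std::sync::{Mutex, OnceLock};

/// Stand-in for `magika::Session`; the real constructor and methods are not shown here.
pub struct Session {
    calls: u64,
}

impl Session {
    fn load() -> Result<Self, String> {
        Ok(Session { calls: 0 })
    }

    /// Takes `&mut self`, which is why the shared instance sits behind a Mutex.
    fn identify(&mut self, content: &str) -> String {
        self.calls += 1;
        let label = if content.trim_start().starts_with('{') { "json" } else { "txt" };
        label.to_string()
    }
}

/// Initialized at most once; a load failure is cached as Err instead of being retried.
static SESSION: OnceLock<Result<Mutex<Session>, String>> = OnceLock::new();

fn session() -> Result<&'static Mutex<Session>, String> {
    SESSION
        .get_or_init(|| Session::load().map(Mutex::new))
        .as_ref()
        .map_err(|err| err.clone())
}

/// Forces initialization so the first real detection does not pay the load cost.
pub fn prewarm() {
    let _ = session();
}

pub fn identify(content: &str) -> Result<String, String> {
    let mutex = session()?;
    let mut guard = mutex
        .lock()
        .map_err(|_| "session mutex poisoned".to_string())?;
    Ok(guard.identify(content))
}
```

Caching the Err variant means a failed load is reported on every call without repeating the expensive initialization.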

Normalization Contract

Normalization maps legacy aliases and user-entered variants to stable labels (examples):

csharp, c# -> cs
c++ -> cpp
bash, sh, zsh -> shell
pwsh, ps1 -> powershell
yml -> yaml
js -> javascript
ts -> typescript
md -> markdown
plaintext, plain text, plain, txt -> text

Unknown values pass through in lowercase.
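
A minimal sketch of this contract, covering only the example aliases listed above. The function name normalize_language and the whitespace trimming are assumptions; the real table may contain more entries.

```rust
/// Maps the documented example aliases to stable labels; unknown values are lowercased.
pub fn normalize_language(raw: &str) -> String {
    let lowered = raw.trim().to_lowercase();
    let stable = match lowered.as_str() {
        "csharp" | "c#" => "cs",
        "c++" => "cpp",
        "bash" | "sh" | "zsh" => "shell",
        "pwsh" | "ps1" => "powershell",
        "yml" => "yaml",
        "js" => "javascript",
        "ts" => "typescript",
        "md" => "markdown",
        "plaintext" | "plain text" | "plain" | "txt" => "text",
        other => other, // pass-through for unknown labels
    };
    stable.to_string()
}
```

For example, normalize_language("C#") yields cs, while an unlisted value such as Zig passes through as zig.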

Manual language picker values are defined centrally in MANUAL_LANGUAGE_OPTIONS and stored as normalized values.

Filter And Search Semantics

Language filter matching normalizes both:

stored language metadata,
incoming filter value.

This preserves interoperability across legacy and current labels (for example, csharp and cs).

Search ranking also checks normalized language values to avoid losing metadata relevance as stored labels evolve.
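
Normalizing both sides of the comparison might look like the sketch below. canonical_label is a reduced stand-in for the real normalizer, and matches_language_filter is a hypothetical name.

```rust
/// Reduced stand-in for the normalizer: two aliases plus lowercase pass-through.
fn canonical_label(raw: &str) -> String {
    let lowered = raw.trim().to_lowercase();
    match lowered.as_str() {
        "csharp" | "c#" => "cs".to_string(),
        "yml" => "yaml".to_string(),
        other => other.to_string(),
    }
}

/// Compares stored metadata and the incoming filter after normalizing both.
pub fn matches_language_filter(stored: Option<&str>, filter: &str) -> bool {
    match stored {
        Some(label) => canonical_label(label) == canonical_label(filter),
        None => false,
    }
}
```

With this shape, a paste stored under the legacy label csharp still matches a filter for cs, and the reverse also holds.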

GUI Highlight Resolution

GUI highlight resolution uses a multi-step strategy instead of a fixed name table:

exact syntax name
exact extension
case-insensitive name
normalized-name match (alphanumeric only)
case-insensitive extension scan
explicit fallback candidates for known mismatches/high-priority labels
plain text
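
The resolution order can be sketched as a chain of lookups. The Syntax struct, resolve_syntax, and fallback_candidates are simplified, hypothetical stand-ins for the GUI's syntax set and for syntax_fallback_candidates; the typescript to JavaScript entry is illustrative only.

```rust
/// Simplified stand-in for a loaded grammar.
pub struct Syntax {
    pub name: &'static str,
    pub extensions: &'static [&'static str],
}

/// Lowercased, alphanumeric-only form used by the normalized-name step.
fn alnum_lower(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Illustrative fallback table; the real mapping is narrower and lives elsewhere.
fn fallback_candidates(label: &str) -> &'static [&'static str] {
    match label {
        "typescript" => &["JavaScript"],
        _ => &[],
    }
}

/// Returns None when every step misses; the caller then renders plain text.
pub fn resolve_syntax<'a>(syntaxes: &'a [Syntax], label: &str) -> Option<&'a Syntax> {
    let lower = label.to_ascii_lowercase();
    let squashed = alnum_lower(label);
    syntaxes
        .iter()
        .find(|s| s.name == label) // 1. exact syntax name
        .or_else(|| syntaxes.iter().find(|s| s.extensions.iter().any(|e| *e == label))) // 2. exact extension
        .or_else(|| syntaxes.iter().find(|s| s.name.eq_ignore_ascii_case(label))) // 3. case-insensitive name
        .or_else(|| {
            // 4. normalized-name match (alphanumeric only)
            syntaxes
                .iter()
                .find(|s| !squashed.is_empty() && alnum_lower(s.name) == squashed)
        })
        .or_else(|| {
            // 5. case-insensitive extension scan
            syntaxes
                .iter()
                .find(|s| s.extensions.iter().any(|e| e.eq_ignore_ascii_case(label)))
        })
        .or_else(|| {
            // 6. explicit fallback candidates
            fallback_candidates(&lower)
                .iter()
                .find_map(|name| syntaxes.iter().find(|s| s.name == *name))
        })
}
```

Because the function only returns a grammar reference, the stored language label is untouched when resolution falls through to plain text.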

Policy:

Keep explicit fallback mapping narrow and intentional.
Preserve unsupported-language visibility by keeping their metadata labels even when rendering falls back to plain text.

Fallback candidate mapping lives in:

../crates/localpaste_gui/src/app/highlight/syntax.rs (syntax_fallback_candidates)

Fallback coverage tests live in:

../crates/localpaste_gui/src/app/highlight/worker.rs (resolver_tests)

Examples covered by the resolver tests include:

fallback-to-grammar labels (for example: typescript, powershell, sass)
metadata-only/plain-render labels (for example: zig, kotlin, dart)

Virtual Editor Async Highlight Flow

Virtual-editor highlight behavior is async and staged to avoid mid-burst visual churn while typing.

Flow:

UI sends a highlight request keyed by paste/context (paste_id, revision, text_len, language_hint, theme_key).
Worker coalesces queued requests and computes either:
- full render (HighlightRender), or
- changed-range patch (HighlightPatch) when the UI base snapshot matches the worker cache base.
UI merges matching patches into staged/current highlight state.
Staged highlight applies:
- immediately only when there is no current render,
- otherwise only after idle threshold.
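
Two decisions in this flow are small enough to sketch directly: patch versus full render, and when a staged result becomes visible. The names below are hypothetical, the base snapshot is modeled as a plain revision number, and the inclusive threshold comparison is an assumption.

```rust
use std::time::Duration;

/// What the worker would produce for a request (payloads omitted).
#[derive(Debug, PartialEq, Eq)]
pub enum WorkerOutput {
    FullRender,
    Patch,
}

/// A patch is only valid when the UI and the worker cache agree on the base snapshot.
pub fn choose_output(ui_base: Option<u64>, worker_cache_base: Option<u64>) -> WorkerOutput {
    match (ui_base, worker_cache_base) {
        (Some(ui), Some(cache)) if ui == cache => WorkerOutput::Patch,
        _ => WorkerOutput::FullRender,
    }
}

/// Staged highlight applies at once when nothing is on screen, otherwise after idle.
pub fn should_apply_staged(
    has_current_render: bool,
    idle_for: Duration,
    idle_threshold: Duration,
) -> bool {
    !has_current_render || idle_for >= idle_threshold
}
```

Holding the staged result until the editor is idle is what keeps colors from shifting in the middle of a typing burst.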

Current policy constants (virtual editor):

idle apply threshold: 200ms
adaptive debounce windows:
- tiny edits (<=4 changed chars, <=2 touched lines): 15ms
- medium edits: 35ms
- larger supported buffers (>=64KB): 50ms
- async highlighting disabled: 0ms (synchronous/no debounce path)
plain rendering guardrail: >=256KB content
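
The constants above could be combined into a selection function along these lines. EditHint, debounce_window, and renders_plain are hypothetical names. Checking buffer size before edit size and treating 1KB as 1024 bytes are assumptions, since the list does not state either.

```rust
use std::time::Duration;

/// Size of the most recent edit, as captured by the editor (hypothetical shape).
pub struct EditHint {
    pub changed_chars: usize,
    pub touched_lines: usize,
}

const LARGE_BUFFER_BYTES: usize = 64 * 1024;
const PLAIN_RENDER_BYTES: usize = 256 * 1024;

/// Picks the debounce window for the next highlight request.
pub fn debounce_window(async_enabled: bool, buffer_len: usize, hint: &EditHint) -> Duration {
    if !async_enabled {
        return Duration::ZERO; // synchronous path, no debounce
    }
    if buffer_len >= LARGE_BUFFER_BYTES {
        return Duration::from_millis(50); // assumed to take precedence over edit size
    }
    if hint.changed_chars <= 4 && hint.touched_lines <= 2 {
        return Duration::from_millis(15); // tiny edit
    }
    Duration::from_millis(35) // medium edit
}

/// Content at or above the guardrail is rendered as plain text.
pub fn renders_plain(content_len: usize) -> bool {
    content_len >= PLAIN_RENDER_BYTES
}
```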

Primary implementation:

request/stage/apply lifecycle: ../crates/localpaste_gui/src/app/highlight_flow.rs
virtual edit hint capture: ../crates/localpaste_gui/src/app/virtual_ops.rs
editor dispatch and debounce usage: ../crates/localpaste_gui/src/app/ui/editor_panel.rs

Runtime Provider Default (Magika)

When Magika is enabled, the runtime defaults to the CPU execution provider:

env var: MAGIKA_FORCE_CPU
default: true
falsy values (0, false, no, off) allow the runtime/provider defaults to apply
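
Reading the flag could look like the sketch below. The function names are hypothetical, and trimming plus case-insensitive matching of the falsy values are assumptions not stated above.

```rust
/// Interprets a raw MAGIKA_FORCE_CPU value; unset means the default (true).
pub fn force_cpu_from(value: Option<&str>) -> bool {
    match value {
        None => true,
        Some(raw) => !matches!(
            raw.trim().to_ascii_lowercase().as_str(),
            "0" | "false" | "no" | "off"
        ),
    }
}

/// Reads the process environment; a non-UTF-8 value is treated as unset here.
pub fn force_cpu() -> bool {
    force_cpu_from(std::env::var("MAGIKA_FORCE_CPU").ok().as_deref())
}
```

Splitting the parsing from the environment read keeps the value handling testable without mutating process state.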

Reference: ../.env.example

YAML Refinement Guardrail

YAML auto-detection keeps single-line mappings valid (for example, key: value) while still rejecting obvious non-YAML noise when refining model output.

Primary implementation:

../crates/localpaste_core/src/detection/mod.rs

Validation Targets

When touching detection/highlight behavior, validate:

core detection tests (localpaste_core::detection::tests),
GUI resolver/worker tests (localpaste_gui::app::highlight::worker::resolver_tests),
GUI manual checks in dev/gui-notes.md,
GUI perf checks in dev/gui-perf-protocol.md.
