An implementation of the Knuth-Plass algorithm in Emacs Lisp, with support for mixed typesetting of CJK and Latin-based languages.

Kinneyzhang/emacs-kp

Chinese documentation

Introduction

Emacs-kp implements the Knuth-Plass typesetting algorithm, and its capabilities extend beyond English typesetting: with further optimization of the algorithm, it supports hybrid typesetting of CJK and Latin-based languages.

Demo

First, let's look at a demo of the typesetting effect:

[Image: ekp-demo]

Limitations

Currently, it only supports hybrid typesetting between CJK and one Latin-based language. Mixed typesetting with multiple Latin-based languages is not supported. This limitation arises because the system cannot precisely determine which language a word belongs to in order to perform hyphenation.

Usage

Configuration

ekp-latin-lang sets the primary Latin-based language of the text. The default is "en_US". All supported languages can be found in the "dictionaries" directory, and the language name must match the part of the dictionary file name that follows "hyph_". The test cases include examples of German and French typesetting. Other languages have not been tested extensively; they should work in theory, but may need finer customization.
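As an illustration of the naming rule above, the following sketch lists the language names available in a dictionary directory and then selects German. It rests on several assumptions: `my/ekp-available-languages` is a hypothetical helper that is not part of emacs-kp, the dictionary files are assumed to carry a ".dic" extension, "de_DE" is assumed to be one of the shipped dictionaries, and `ekp-latin-lang` is assumed to be an ordinary variable that can be set with `setq`.

```emacs-lisp
;; Sketch only: `my/ekp-available-languages' is a hypothetical helper,
;; not part of emacs-kp.
(defun my/ekp-available-languages (dict-dir)
  "Return the language names found in DICT-DIR.
A file named hyph_NAME.dic contributes NAME to the result."
  (mapcar (lambda (file)
            ;; Drop the leading "hyph_" (5 characters) and the extension.
            (file-name-sans-extension (substring file 5)))
          ;; Only file names of the form hyph_*.dic are considered.
          (directory-files dict-dir nil "\\`hyph_.*\\.dic\\'")))

;; Assumption: a dictionary named hyph_de_DE.dic is present.
(setq ekp-latin-lang "de_DE")
```

Calling the helper on the "dictionaries" directory of a local checkout would show which strings are valid values for ekp-latin-lang.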

ekp-param-set is a function used to configure fundamental typesetting parameters. These parameters include:

| Parameter | Meaning |
| --- | --- |
| ekp-lws-ideal-pixel | Ideal pixel width between Latin words |
| ekp-lws-stretch-pixel | Stretchable pixel width between Latin words |
| ekp-lws-shrink-pixel | Shrinkable pixel width between Latin words |
| ekp-mws-ideal-pixel | Ideal pixel width between Latin words and CJK characters |
| ekp-mws-stretch-pixel | Stretchable pixel width between Latin words and CJK characters |
| ekp-mws-shrink-pixel | Shrinkable pixel width between Latin words and CJK characters |
| ekp-cws-ideal-pixel | Ideal pixel width between CJK characters |
| ekp-cws-stretch-pixel | Stretchable pixel width between CJK characters |
| ekp-cws-shrink-pixel | Shrinkable pixel width between CJK characters |

For example, (ekp-param-set 7 3 2 5 2 1 0 2 0) sets the nine parameters in the order listed above. Do not modify these variables directly; always use this function for configuration.
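The same call can be written with one group of three arguments per line, which makes the mapping to the parameters easier to read. This is only a sketch: it assumes that the arguments follow the order of the parameter list above and that the package provides the feature `ekp`.

```emacs-lisp
(require 'ekp) ; assumption: the package provides the feature `ekp'

;; Assumption: arguments follow the order of the parameter list.
(ekp-param-set
 7 3 2   ; lws: ideal, stretch, shrink (between Latin words)
 5 2 1   ; mws: ideal, stretch, shrink (between Latin words and CJK characters)
 0 2 0)  ; cws: ideal, stretch, shrink (between CJK characters)
```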

If the parameters are not configured manually, the defaults for spaces between Latin words follow the Knuth-Plass recommendations:

  • The ideal width is set to the pixel width of a space character.
  • The stretchable width defaults to 1/2 of the ideal width.
  • The shrinkable width defaults to 1/3 of the ideal width.

For spaces between a Latin word and a CJK character, the ideal width is 2 pixels less than the one between Latin words (ekp-mws-ideal-pixel = ekp-lws-ideal-pixel - 2), and the stretchable and shrinkable widths keep the same proportions.

For spaces between CJK characters:

  • The ideal width defaults to 0.
  • The stretchable width defaults to 2 pixels.
  • The shrinkable width defaults to 0 (non-compressible).
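The default rules can be worked through in a few lines of Emacs Lisp. The function below is a hypothetical illustration, not the package's own code, and it assumes truncating integer division for the 1/2 and 1/3 ratios; how emacs-kp actually rounds is not stated here. Under that assumption, a space that is 7 pixels wide yields exactly the nine values used in the ekp-param-set example.

```emacs-lisp
;; Sketch only: `my/ekp-default-params' is a hypothetical helper that
;; restates the documented defaults; it is not part of emacs-kp.
(defun my/ekp-default-params (space-pixel)
  "Return the nine default spacing values for SPACE-PIXEL as a list.
The order is lws ideal/stretch/shrink, mws ideal/stretch/shrink,
cws ideal/stretch/shrink."
  (let* ((lws-ideal space-pixel)        ; width of a space character
         (mws-ideal (- lws-ideal 2)))   ; 2 pixels less than lws
    (list lws-ideal (/ lws-ideal 2) (/ lws-ideal 3)  ; 1/2 and 1/3 of ideal
          mws-ideal (/ mws-ideal 2) (/ mws-ideal 3)  ; same proportions
          0 2 0)))                      ; CJK: ideal 0, stretch 2, shrink 0

(my/ekp-default-params 7) ; => (7 3 2 5 2 1 0 2 0)
```

In a graphical session on Emacs 29 or later, the space width could be measured with (string-pixel-width " ") and passed in as SPACE-PIXEL.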

Core Functions

Two functions are provided:

(ekp-pixel-justify string line-pixel)

Formats the text STRING to fit a pixel width of LINE-PIXEL per line and returns the justified text.
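A minimal usage sketch follows. It assumes that the package provides the feature `ekp`, that Emacs runs in a graphical frame so pixel widths can be measured, and that inserting the returned string into a buffer is enough to display the justified result. The sample text, the 400-pixel width, and the buffer name are arbitrary choices for illustration.

```emacs-lisp
(require 'ekp) ; assumption: the package provides the feature `ekp'

(let* ((text "Emacs-kp 实现了 Knuth-Plass 排版算法，支持 CJK 与 Latin 语言的混合排版。")
       ;; Justify TEXT so that each line is 400 pixels wide.
       (justified (ekp-pixel-justify text 400)))
  (with-current-buffer (get-buffer-create "*ekp-example*")
    (erase-buffer)
    (insert justified)
    (display-buffer (current-buffer))))
```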

(ekp-pixel-range-justify string min-pixel max-pixel)

Searches for the optimal typesetting within the range from MIN-PIXEL to MAX-PIXEL. Returns a cons cell whose car is the formatted text and whose cdr is the pixel width that achieves the best typesetting result. Note: this function iteratively computes the typesetting cost for line widths between the minimum and maximum pixel values and picks the width with the lowest cost. If the specified range is too large, execution time may increase significantly. Future updates plan to use Rust dynamic libraries for parallel computation to improve performance.
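The returned cons cell can be taken apart with car and cdr, as in the sketch below. It assumes the feature name `ekp` and a graphical frame; the sample text, the 380 to 420 pixel range, and the buffer name are arbitrary. Keeping the range narrow limits the number of widths that have to be evaluated.

```emacs-lisp
(require 'ekp) ; assumption: the package provides the feature `ekp'

(let* ((text "Emacs-kp 实现了 Knuth-Plass 排版算法，支持 CJK 与 Latin 语言的混合排版。")
       ;; RESULT is a cons cell: (FORMATTED-TEXT . BEST-PIXEL).
       (result (ekp-pixel-range-justify text 380 420)))
  (message "Best line width: %s pixels" (cdr result))
  (with-current-buffer (get-buffer-create "*ekp-range-example*")
    (erase-buffer)
    (insert (car result))
    (display-buffer (current-buffer))))
```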

Next Todos

  • Preserve the original text's text properties.
  • Refactor using Rust dynamic modules: use Rust's parallel computing capabilities to improve rendering performance.
  • Implement autocorrection for punctuation: correct English punctuation mistakenly used in Chinese text, and Chinese punctuation mistakenly used in English text.

Credits

  • The core algorithm is derived from the seminal paper "Breaking Paragraphs into Lines" by Donald E. Knuth and Michael F. Plass.

  • The implementation of Latin word hyphenation is adapted from the source code of the Pyphen Python library, and the corresponding dictionaries originate from that project: https://github.com/Kozea/Pyphen
