GitHub - Baksalyar/Metagrapher: A convenient Python library for obtaining readable Universal Resource Identifiers (URIs) and other Latin script identifiers generated from strings written in different languages

Metagrapher — convenient Python library for obtaining readable Universal Resource Identifiers (URIs) and other Latin script identifiers generated from strings written in different languages, using non-standard characters and UTF-8 symbols. Also, this technique is called transliteration or romanization.

Installation

Coming soon...

pip3 install metagrapher

Supported Languages

At the moment, Metagrapher supports the following language pairs (in alphabetic order):

Arabic → English (ar_en) [draft]
Armenian → English (hy_en)
Azerbaijani → English (az_en)
Burmese → English (br_en) [draft]
Chinese → English (zh_en)
Czech → English (cz_en)
Deutsch → English (de_en)
Divehi → English (dv_en)
English → Russian (en_ru)
Finnish → English (fi_en)
French → English (fr_en)
Greek → English (el_en)
Hungarian → English (hu_en)
Japanese → English (jp_en)
Lithuanian → English (lt_en)
Latin → English (la_en)
Latvian → English (lv_en)
Makedonian → English (mk_en)
Polish → English (pl_en)
Portuguese → English (pt_en)
Romanian → English (ro_en)
Russian → English (ru_en)
Serbian → English (sr_en)
Slovak → English (sk_en)
Thai → English (th_en) [draft]
Turkish → English (tr_en)
Turkmen → English (tk_en)
Ukrainian → English (uk_en)
Vietnamese → English (vi_en)
Unicode Symbols → English* ('symbols_en')

Why Metagrapher?

A primary intention of this library is to create readable identifiers for any items, mostly — so called semantic URLs, which is more readable to live users, legacy software and to search engines.

This library will help to generate a readable identifier from the any string with “non-standard” (for Latin alphabet) characters, as well as from strings written in another language. If your primary language is English, I bet that the following URL will be absolutely unreadable for you: http://goodi.es/волшебство/الرغبات السرية/. Furthermore, this URL cannot be read for some legacy software and hardware. Metagrapher will kindly convert the above mentioned URL to user- and machine-readable: http://goodi.es/volshebstvo/alrghbat-alsryh.

Examples

Language-specific conversion (if you want to convert from specific language to English consciously):

import Metagrapher

print( Metagrapher('матрёшка водка бабушка балалайка').ru_en() )

Will output: matryoshka vodka babushka balalayka

If you want to convert only additional UTF-characters like ♥ or ☣, for instance, you can use the symbols_en() method:

Metagrapher('I ♥ NY and want a cup of ☕ right now!').symbols_en().lower().slugify('_')

Will generate: i_love_ny_and_want_a_cup_of_coffee_right_now

Oh, by the way, slugify() will remove any non-readable characters and provide you with a ready-to-use string. As an argument, method slugify() will accept any string for delimiting words, - (will be used by default).

And, as you might think, a lower() method will lower-case all upper-cased characters.

One of the strong sides of Metagrapher is a delicate case conversion. For instance, Metagrapher will preserve the case for transliterated string: 'ШАПКА Юлии' by default will translate to 'SHAPKA Yulii', other libraries will convert this string to 'ShAPKA Yulii' or even to 'SHAPKA YUlii', and 'ШАШКА ЧАПАЯ' to 'ShAShKA ChAPAYa'.

If you intend to translate only currency signs, like $ or €, or £, etc. to readable text, you can use currencies_en() method:

Metagrapher('new t-shirts for only € 3,60').currencies().slugify()

Will generate: New-t-shirts-for-only-EUR-360

Finally, you can chain methods (and languages, accordingly) in an elegant and logical way:

Metagrapher('Здравствуйте, ♥ Labą dieną! ∞ teşekkür 100 ₺ için')
	.ru_en().pl_en().tr_en().symbols_en().currencies_en().upper().slugify()

Will output: ZDRAVSTVUYTE-LOVE-LABA-DIENA-INFINITY-TESHEKKUR-100-TRY-ICHIN

To use Chinese, Japanese and/or Korean languages, please instantiate Metagrapher with an argument take_from_cjk='c, j, k', where 'c' — Chinese, 'j' — Japanese, and 'k' is Korean. This flag will allow you to load relatively “heavy” dictionaries at the library startup (this procedure will add ~0.02 s. for Chinese, ~0.1 s. for Japanese, and ~0.05 s. for Korean on the initialization stage). Also you have an option to load these languages at any later time — just use load_cjk() with the same keyword argument take_from_cjk, or as the first positional argument:

# 'j' loads Japanese dicts:
Metagrapher('ワウ', take_from_cjk='j').jp_en().get()

# 'c' will load Chinese dictionaries:
Metagrapher().load_cjk('c').zh_en('哇').get()

# 'k,c' will hook up Chinese and Korean languages:
Metagrapher("好运").load_cjk(take_from_cjk='k,c').zh_en().get()

# You can freely combine markers 'c' (Chinese), 'j' (Japanese) and 'k' (Korean)
# in the positional argument, or keyword argument take_from_cjk():
Metagrapher(take_from_cjk='j,k').jp_en('できます').load_cjk('c').zh_en('改变语言').get()

Completely automatic non-language-specific conversion available by methods with any_ at the beginning of name. For example, transliteration of all characters from all languages to English can be done with the magic method any_en(). It is generally usable for programmatically slugifying any headers or user names:

transliteration = Metagrapher('嚜鮙گرอาพέρ', 'cjk').any_en()
print(transliteration)

Output: metagrapher

P.S. Don't forget to use the take_from_cjk=True or load_cjk() method for Chinese, Japanese and Korean ideographs!

You can customize substitutions manually and override language pairs this way:

Metagrapher('My ♥ is beating').symbols_en(override={'♥': 'heart'})

Output without overriding: My love is beating

With overriding: My heart is beating

Or you can even provide your custom set of substitutions:

print( Metagrapher('Making bad things again')
	.custom({'bad': 'good', 'again': 'as always', ' ': ' ♥ '}) )

Result: Making ♥ good ♥ things ♥ as ♥ always

P.S. Useful resources, that you can borrow from Metagrapher:

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
imgs		imgs
.gitignore		.gitignore
README.md		README.md
README.rst		README.rst
bencher.py		bencher.py
dictionary.py		dictionary.py
index.py		index.py
jp_dict.dict		jp_dict.dict
jp_dict.json		jp_dict.json
jp_freq_words.dict		jp_freq_words.dict
jp_kakasi_decomposition.json		jp_kakasi_decomposition.json
setup.py		setup.py
tests.py		tests.py
zh_dict.json		zh_dict.json
zh_freq_words.dict		zh_freq_words.dict

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Installation

Supported Languages

Why Metagrapher?

Examples

About

Uh oh!

Releases

Packages

Languages

Baksalyar/Metagrapher

Folders and files

Latest commit

History

Repository files navigation

Installation

Supported Languages

Why Metagrapher?

Examples

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages