normality

Normality is a Python micro-package that contains a small set of text normalization functions for easier re-use. These functions accept a snippet of Unicode or UTF-8 encoded text and remove various classes of characters, such as diacritics and punctuation. This is useful as preparation for further text analysis.
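To make "removing classes of characters" concrete, here is a minimal standard-library sketch of the general technique: decompose the text, drop combining marks, replace punctuation with spaces, then lowercase and collapse whitespace. The function name `sketch_normalize` is hypothetical and this is not normality's own implementation, which covers many more cases.

```python
import unicodedata


def sketch_normalize(text: str) -> str:
    """Illustrative only: a stdlib approximation, not normality's code."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for char in decomposed:
        category = unicodedata.category(char)
        if category == "Mn":
            # Non-spacing marks (e.g. the diaeresis in "ü") are dropped.
            continue
        if category[0] in ("P", "S", "Z", "C"):
            # Punctuation, symbols, separators and control characters
            # become plain spaces.
            kept.append(" ")
        else:
            kept.append(char)
    # Lowercase, then collapse runs of whitespace into single spaces.
    return " ".join("".join(kept).lower().split())


print(sketch_normalize('Nie wieder "Grüne Süppchen" kochen!'))
```

The sketch handles only the simple Latin-script case; for real work, use the package's own functions.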

WARNING: As of version 3.0, normality requires pyicu as a mandatory dependency. If you cannot install pyicu, consider using normality < 3.0.0.
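One way to act on this warning is to check whether PyICU is importable before choosing a version constraint. The sketch below assumes only that PyICU exposes a top-level module named `icu`; the helper name `suggested_requirement` is hypothetical and not part of normality.

```python
import importlib.util


def suggested_requirement() -> str:
    """Return a pip requirement string based on whether PyICU is importable."""
    # PyICU installs a top-level module named "icu".
    has_icu = importlib.util.find_spec("icu") is not None
    # Without PyICU, fall back to the pre-3.0 series as the warning suggests.
    return "normality" if has_icu else "normality<3.0.0"


print(suggested_requirement())
```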

Example

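# coding: utf-8
from normality import normalize, slugify, collapse_spaces, ascii_text, latinize_text

# normalize() lowercases the text and strips diacritics and punctuation.
text = normalize('Nie wieder "Grüne Süppchen" kochen!')
assert text == 'nie wieder grune suppchen kochen'

# slugify() produces a lowercase, hyphen-separated slug.
slug = slugify('My first blog post!')
assert slug == 'my-first-blog-post'

# collapse_spaces() reduces any run of whitespace to a single space.
text = 'this \n\n\r\nhas\tlots of \nodd spacing.'
assert collapse_spaces(text) == 'this has lots of odd spacing.'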
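For readers without the package installed, the slug and whitespace behaviour shown above can be approximated with regular expressions. This is a rough ASCII-only sketch under that assumption; `sketch_slugify` and `sketch_collapse_spaces` are hypothetical names, and the real functions handle non-ASCII input that this version does not.

```python
import re


def sketch_collapse_spaces(text: str) -> str:
    """Replace every run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def sketch_slugify(text: str, sep: str = "-") -> str:
    """ASCII-only slug: lowercase, non-alphanumerics become the separator."""
    slug = re.sub(r"[^a-z0-9]+", sep, text.lower())
    # Remove separators left over at the start or end.
    return slug.strip(sep)


print(sketch_slugify("My first blog post!"))
print(sketch_collapse_spaces("this \n\n\r\nhas\tlots of \nodd spacing."))
```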

License

normality is open source, licensed under a standard MIT license (included in this repository as LICENSE).

