Crystal Ball

A Python library to guess and extract data from inconsistent structures.

This project is meant for cases where you need to read and parse text that has no stable structure in order to extract valuable data from it.

For example, consider the JSON document below, and assume you don't know in advance where a domain name might appear.

[
    {"origin": "somewhere", "domain": "foo.example.com"},
    {"origin": "somewhere", "extra_data": "domain=bar.example.com"}
]

Now let's parse it with crystalball (here the document is saved as weird_stuff.json) and try an extraction:

>>> import crystalball
>>> cb = crystalball.parse(open('weird_stuff.json', 'r'))
>>> cb.findall('domain')
[Match('foo.example.com'), Match('bar.example.com')]

Each Match object records the strategy used to obtain its value from the document, such as the key hierarchy it went through and any additional parsing it had to perform.
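
To make the idea concrete, here is a minimal standard-library sketch of how such a lookup could work. It is not crystalball's implementation or API: the `Found` class, the `find_all` function, and the `"key"` / `"embedded"` strategy labels are hypothetical names chosen for illustration. It walks the parsed JSON, records the path to each hit, and falls back to scanning strings for `name=value` pairs.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Found:
    """Hypothetical stand-in for a match: the value plus how it was reached."""
    value: str
    path: tuple      # keys and list indexes traversed to reach the value
    strategy: str    # "key" for a direct key hit, "embedded" for name=value text


def find_all(node, name, path=()):
    """Walk nested dicts and lists, yielding every value associated with `name`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name and isinstance(value, str):
                yield Found(value, path + (key,), "key")
            else:
                yield from find_all(value, name, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_all(item, name, path + (index,))
    elif isinstance(node, str):
        # Fall back to scanning free text for "name=value" pairs.
        pattern = re.escape(name) + r"=([^\s;,&]+)"
        for match in re.finditer(pattern, node):
            yield Found(match.group(1), path, "embedded")


document = json.loads("""
[
    {"origin": "somewhere", "domain": "foo.example.com"},
    {"origin": "somewhere", "extra_data": "domain=bar.example.com"}
]
""")

results = list(find_all(document, "domain"))
for found in results:
    print(found.value, found.path, found.strategy)
# foo.example.com (0, 'domain') key
# bar.example.com (1, 'extra_data') embedded
```

Keeping the path and the strategy next to each value is what lets a caller tell a direct key hit apart from a value that had to be dug out of free text.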

Learn more in the documentation (TODO).
