xmltodict

xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":

>>> import json, xmltodict
>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute", 
        "and": {
            "many": [
                "elements", 
                "more elements"
            ]
        }, 
        "plus": {
            "@a": "complex", 
            "#text": "element as well"
        }
    }
}
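
Because the result is an ordinary nested dict, it can be navigated like any decoded JSON. The sketch below assumes `doc` holds the structure printed above; it is rebuilt from a literal here so the snippet runs without the XML input, and the variable names are only illustrative.

```python
import json

# Stand-in for the xmltodict.parse(...) call: same structure as the output above.
doc = json.loads("""
{
    "mydocument": {
        "@has": "an attribute",
        "and": {"many": ["elements", "more elements"]},
        "plus": {"@a": "complex", "#text": "element as well"}
    }
}
""")

root = doc["mydocument"]
attribute = root["@has"]        # attributes are keyed with an "@" prefix
repeated = root["and"]["many"]  # repeated sibling elements become a list
text = root["plus"]["#text"]    # text next to attributes lives under "#text"

print(attribute)
print(repeated[1])
print(text)
```

Note that an element that occurs only once is stored as a single value rather than a one-item list, so code that expects repetition should check the type before indexing.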

Namespace support

By default, xmltodict does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing process_namespaces=True will make it expand namespaces for you:

>>> xml = """
... <root xmlns="http://defaultns.com/"
...       xmlns:a="http://a.com/"
...       xmlns:b="http://b.com/">
...   <x>1</x>
...   <a:y>2</a:y>
...   <b:z>3</b:z>
... </root>
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }
True

It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:

>>> namespaces = {
...     'http://defaultns.com/': None, # skip this namespace
...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True

Streaming mode

xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:

>>> from gzip import GzipFile
>>> def handle_artist(_, artist):
...     print(artist['name'])
...     return True
>>> 
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...
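
For trying streaming mode without downloading a dump, here is a minimal sketch on a small in-memory document. The XML content and the `names` list are made up for illustration; only `item_depth` and `item_callback` come from the example above.

```python
import io
import xmltodict

# A tiny stand-in for a large dump: items of interest sit at depth 2.
xml = b"""<artists>
  <artist><name>A Perfect Circle</name></artist>
  <artist><name>King Crimson</name></artist>
</artists>"""

names = []

def handle_artist(path, artist):
    # path describes the chain of elements leading to this item;
    # artist is the dict built for one <artist> element.
    names.append(artist["name"])
    return True  # a true value tells the parser to keep going

xmltodict.parse(io.BytesIO(xml), item_depth=2, item_callback=handle_artist)
print(names)
```

Each item is handed to the callback and then discarded, which is why memory use stays small regardless of the size of the input.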

It can also be used from the command line to pipe objects to a script like this:

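import sys, marshal
while True:
    try:
        # Each record is a marshalled (path, item) tuple; read bytes, not text.
        _, article = marshal.load(sys.stdin.buffer)
    except EOFError:
        break
    print(article['title'])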

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ cat enwiki.dicts.gz | gunzip | script1.py
$ cat enwiki.dicts.gz | gunzip | script2.py
...
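
The cache can also be read from Python directly instead of going through `cat | gunzip`. This is a sketch, assuming the cache is a gzip-compressed sequence of marshalled (path, item) tuples as the consumer script above implies; `iter_cached_items` is a hypothetical helper, and the two records written here are made-up stand-ins for real output.

```python
import gzip
import marshal
import os
import tempfile

def iter_cached_items(path):
    """Yield (path, item) tuples from a gzip-compressed marshal stream."""
    with gzip.open(path, "rb") as stream:
        while True:
            try:
                yield marshal.load(stream)
            except EOFError:
                return  # end of the cache file

# Made-up records standing in for what the command-line tool would emit.
records = [
    ([("mediawiki", None), ("page", None)], {"title": "Anarchism"}),
    ([("mediawiki", None), ("page", None)], {"title": "Autism"}),
]

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "enwiki.dicts.gz")
    with gzip.open(cache, "wb") as out:
        for record in records:
            marshal.dump(record, out)
    titles = [item["title"] for _, item in iter_cached_items(cache)]

print(titles)
```

The marshal format is specific to the Python version that wrote it, so the cache should be read with the same interpreter version that produced it.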

Roundtripping

You can also convert in the other direction, using the unparse() function:

>>> mydict = {
...     'response': {
...             'status': 'good',
...             'last_updated': '2014-02-16T23:10:12Z',
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<response>
	<status>good</status>
	<last_updated>2014-02-16T23:10:12Z</last_updated>
</response>

Text values for nodes can be specified with the cdata_key key in the Python dict, while node attributes can be specified by prefixing the key name with attr_prefix. The default value for attr_prefix is @ and the default value for cdata_key is #text.

>>> import xmltodict
>>> 
>>> mydict = {
...     'text': {
...         '@color':'red',
...         '@stroke':'2',
...         '#text':'This is a test'
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<text color="red" stroke="2">This is a test</text>
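
As a quick check that the two directions agree, the sketch below feeds the output of unparse() back into parse() and compares the result with the input. It also shows non-default values for attr_prefix and cdata_key; the replacement markers "!" and "_" are arbitrary choices for illustration.

```python
import xmltodict

original = {
    "text": {
        "@color": "red",
        "@stroke": "2",
        "#text": "This is a test",
    }
}

# Default markers: "@" for attributes, "#text" for character data.
xml = xmltodict.unparse(original)
restored = xmltodict.parse(xml)
print(restored == original)

# Same idea with custom markers, passed to both directions.
custom = {"text": {"!color": "red", "_": "This is a test"}}
custom_xml = xmltodict.unparse(custom, attr_prefix="!", cdata_key="_")
custom_restored = xmltodict.parse(custom_xml, attr_prefix="!", cdata_key="_")
print(custom_restored == custom)
```

The comparison holds here because every value is a string. parse() returns strings for all text and attribute values, so a dict containing numbers would come back with those values as strings.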

Ok, how do I get it?

Using PyPI

You just need to

$ pip install xmltodict

RPM-based distro (Fedora, RHEL, …)

There is an official Fedora package for xmltodict.

$ sudo yum install python-xmltodict

Arch Linux

There is an official Arch Linux package for xmltodict.

$ sudo pacman -S python-xmltodict

Debian-based distro (Debian, Ubuntu, …)

There is an official Debian package for xmltodict.

$ sudo apt install python-xmltodict
