Handling complex nested dicts in Python

If you want to easily process CSV and JSON files with Python, check out dataknead, my data parsing library.

Python is a lovely language for data processing, but it can get a little verbose when dealing with large nested dictionaries.

Let’s say you’re working with parsed JSON, for example from the Wikidata API. The structure is mostly predictable, but not always: some keys in the dictionary might be missing from some items.

Consider a structure like this:

animals = [
    {
        "animal" : "bunny"
    },
    {}
]

If you try to access the animal key directly, like this:

for item in animals:
    print(item["animal"])

You would get an error like this:

bunny
Traceback (most recent call last):
    KeyError: 'animal'    

This happens because the animal key is missing from the second item in the list. You could use the handy get method instead:

for item in animals:
    print(item.get("animal", "no animal available"))

The second argument to get is a default value that will be used if the key is not available:

bunny
no animal available

Excellent! However, this leads to problems with a nested structure:

animals = [
    {
        "animal" : {
            "type" : "bunny"
        }
    },
    {
        "animal" : {}
    },
    {}
]

You could chain the get calls:

for item in animals:
    print(item.get("animal").get("type"))

But this leads to an AttributeError on the third item: the animal key is missing there, so the first get returns None, and None has no get method.

You could check for the key first:

for item in animals:
    if "animal" in item:
        print(item.get("animal").get("type"))

But with deeply nested structures (I counted seven levels in the Wikidata API) this gets unwieldy pretty fast.
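
Another plain-Python workaround, shown here as a minimal sketch, is to pass an empty dict as the default to every intermediate get, so the chain never hits None. It works on the animals list above, but every extra level needs another .get(key, {}) call, and it still breaks if a key is present with an explicit None value:

```python
animals = [
    {"animal": {"type": "bunny"}},
    {"animal": {}},
    {},
]

# Each intermediate get falls back to an empty dict, so the next
# get in the chain always has a dict to work on.
types = [item.get("animal", {}).get("type") for item in animals]

for animal_type in types:
    print(animal_type)
```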

Wouldn’t it be awesome if you could simply do this?

for item in animals:
    print(item.get("animal/type"))

Note the / in the argument to get.

Unfortunately, this is not possible in vanilla Python, but with a really small helper class you can easily make this happen:

class DictQuery(dict):
    def get(self, path, default=None):
        keys = path.split("/")
        val = None

        for key in keys:
            if val:
                if isinstance(val, list):
                    # Apply the key to every item in the list
                    val = [v.get(key, default) if v else None for v in val]
                else:
                    val = val.get(key, default)
            else:
                # First key: look it up in the wrapped dict itself
                val = dict.get(self, key, default)

            # Stop at the first missing or empty value
            if not val:
                break

        return val

Now you can do this:

for item in animals:
    print(DictQuery(item).get("animal/type"))

bunny
{}
None
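
Note that the second item prints {} rather than None: the loop stops at the first falsy value and returns it as-is. If you would rather get the default whenever the full path cannot be resolved, one possible variation is sketched below. The function name get_path, the sep parameter and the _MISSING sentinel are hypothetical names made up for this example, and this version does not descend into lists.

```python
# Hypothetical sentinel: tells "key absent" apart from falsy values like 0 or {}
_MISSING = object()


def get_path(data, path, default=None, sep="/"):
    """Walk `data` along `path`; return `default` if any step is missing."""
    val = data
    for key in path.split(sep):
        # Only dicts can be descended into; anything else ends the walk.
        if not isinstance(val, dict):
            return default
        val = val.get(key, _MISSING)
        if val is _MISSING:
            return default
    return val


animals = [
    {"animal": {"type": "bunny"}},
    {"animal": {}},
    {},
]

results = [get_path(item, "animal/type") for item in animals]
for result in results:
    print(result)
```

Because the check uses a sentinel instead of truthiness, a value that exists but is falsy (0, an empty string, an empty list) is returned unchanged instead of being treated as missing.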

Nice, huh?
