
How to access nested data in Python


A beginner’s guide to lists in dictionaries inside lists in dictionaries…

In this short guide you will learn how to access data that is deeply nested in Python’s lists and dictionaries.

You might have run into the following errors while working with JSON data:

KeyError: 0
TypeError: list indices must be integers or slices, not str

If that is the case then this guide is for you, as you will learn some tricks for dealing with these situations.

The basics of indexing dictionaries and lists:

A dictionary contains key and value pairs.

a_dict = {"key": "value"}

To get the value of the corresponding key you put the key in square brackets after the dictionary variable, like this:

your_dictionary[key]

hero_dict = {
'Clark': 'Superman',
'Bruce': 'Batman',
'You_reading_this_guide': 'Pythonman'
}
print(hero_dict['Clark'])
-> Superman

Similarly, to access the values of a list you also use square brackets, but instead of providing a key you use the index of the item you want, like this:

your_list[index]

The index must be an integer.

hero_list = ['Superman', 'Batman', 'Pythonman']
print(hero_list[0])
-> Superman
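The two errors quoted at the start of this guide usually come from mixing up these two kinds of indexing. Here is a small sketch that triggers both on purpose; the errors list is only there for illustration, so the messages can be collected and printed without stopping the script:

```python
hero_dict = {'Clark': 'Superman', 'Bruce': 'Batman'}
hero_list = ['Superman', 'Batman']

errors = []  # illustrative helper: collects the messages instead of crashing

# Mistake 1: treating a dictionary like a list (a position instead of a key)
try:
    hero_dict[0]
except KeyError as error:
    errors.append(f"KeyError: {error}")

# Mistake 2: treating a list like a dictionary (a string instead of an integer)
try:
    hero_list['Clark']
except TypeError as error:
    errors.append(f"TypeError: {error}")

for message in errors:
    print(message)
```

When you see one of these messages, it is a hint to check whether the value you are indexing is really a dictionary or a list at that point.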

Example of how to extract nested data

Take a look at the JSON data below. It comes from an apartment building and has some data about its address and the apartments inside it. Apartments 1 and 2 each have two residents.

The data is available here:
https://github.com/JacobToftgaardRasmussen/medium_nested_data

Now imagine that you want to get the name of the first person in the first apartment. You would write the following:

import json
with open("data.json") as f:
    data = json.load(f)
first_resident = data["ApartmentBuilding"]["Apartments"][0]["Residents"][0]["Name"]

First you import the json module, which lets you transform the data into a Python dictionary via the json.load() function. Next you open the data file and save the data to the variable data. If you look at the data, you can see that the first key is “ApartmentBuilding”. Writing the name of the key in square brackets gives you the corresponding value, which is another dictionary. This dictionary has two key-value pairs:

  • “Address”
  • “Apartments”

In this example you want to access the first resident in the first apartment, so you put “Apartments” in square brackets to get the corresponding value, which is a list. You follow this with 0 in square brackets to access the first item of the list. Next comes “Residents”, followed by another 0 to access the first person, and finally “Name” to get the name of that person.
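If the long chain of brackets is hard to follow, it can help to take it one level at a time and check the type at each step. The sketch below uses a hypothetical stand-in for data.json: only the key names and resident names come from this guide, while the address value and the exact layout of the apartments are assumptions.

```python
# Hypothetical stand-in for data.json. Only the key names and the resident
# names come from the guide; the address is a placeholder and the real file
# may contain more fields.
data = {
    "ApartmentBuilding": {
        "Address": "placeholder address",
        "Apartments": [
            {"Residents": [{"Name": "Bob"}, {"Name": "Alice"}]},
            {"Residents": [{"Name": "Jane"}, {"Name": "William"}]},
        ],
    }
}

# Walk down one level per line instead of chaining all the brackets
building = data["ApartmentBuilding"]       # dictionary -> index with a key
apartments = building["Apartments"]        # list       -> index with an integer
first_apartment = apartments[0]            # dictionary
residents = first_apartment["Residents"]   # list
first_resident = residents[0]              # dictionary
name = first_resident["Name"]              # string

# Print the type found at each level, then the final value
for step in (building, apartments, first_apartment, residents, first_resident):
    print(type(step).__name__)
print(name)
```

Each line uses a key when the current value is a dictionary and an integer when it is a list, which is exactly the pattern in the one-line version.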

I encourage you to try it out yourself! Either use the data I have provided or make your own data structure and try to traverse it.

Example of how you would extract nested data with a loop

Now imagine that you are not only interested in the name of the first resident, but instead you would like the names of all the residents. If you just look at the data you can quickly see that it would be the following list: [Bob, Alice, Jane, William].
This is how you can get that result with code:

import json
with open("data.json") as f:
    data = json.load(f)
resident_names = []
apartments = data['ApartmentBuilding']['Apartments']
for apartment in apartments:
    residents = apartment['Residents']
    for resident in residents:
        name = resident['Name']
        resident_names.append(name)
print(resident_names)
-> ['Bob', 'Alice', 'Jane', 'William']

Alright, let’s walk through it one line at a time. Just like before we start by importing the json module and the data. This time, however, we also initialize the empty list resident_names to store the names in. If you look at the data you will see that the first two levels are dictionaries, so we use the keys “ApartmentBuilding” and “Apartments” in square brackets and save the corresponding value in the variable apartments. This variable is now a list containing all the apartments.

Since we want to go through all of the apartments we use a for loop. In each iteration of the loop we save that apartment’s list of residents in the variable residents. Then we can again use a loop to go through all the residents of that current apartment. This is called a double loop, and it can be a bit tricky to wrap your head around the first time you see it, but don’t worry you will learn it with time and practice.

Alright, you are now looping through all the apartments one by one, and for each apartment you are looping through its residents. Now the final step is to add each resident’s name to the resident_names list. We access the name of the resident with the key “Name” and save the value to the variable name. Then we use the append() method and pass it name. The append() method is available on all Python lists, and it simply adds whatever you give it to the end of the list.
If you print the list, you will see the result as we expected it.
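Once the double loop feels comfortable, you may come across the same idea written as a list comprehension. The sketch below is an alternative to the loop, not a replacement for understanding it; the data dictionary is again a hypothetical stand-in for data.json that only reuses the key names and resident names from this guide.

```python
# Hypothetical stand-in for data.json (address value is a placeholder)
data = {
    "ApartmentBuilding": {
        "Address": "placeholder address",
        "Apartments": [
            {"Residents": [{"Name": "Bob"}, {"Name": "Alice"}]},
            {"Residents": [{"Name": "Jane"}, {"Name": "William"}]},
        ],
    }
}

# The two "for" clauses run in the same order as the nested loops:
# the outer loop over apartments first, then the inner loop over residents.
resident_names = [
    resident["Name"]
    for apartment in data["ApartmentBuilding"]["Apartments"]
    for resident in apartment["Residents"]
]
print(resident_names)
```

The result is the same list as before. Whether you prefer the loop or the comprehension is mostly a matter of readability.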

Great job! You now know how to access nested data!

A tip for avoiding a program crash

Sometimes when working with dictionaries you cannot be sure that a key is actually present in the dictionary. Imagine that in the above data one resident’s age is not recorded. For some unknown reason the “Age” key is not in the dictionary for “Bob”.

And let’s say that now you want to get a list of the ages of all the residents in all the apartments. The code would look exactly like in the example before, except we would be using the key “Age” in the final step instead of “Name”.

But this will not work… The script will crash…
And you will be met with:

File "c:\Users\your_directory\nested_data.py", line 25, in <module>
    resident_ages.append(resident['Age'])
KeyError: 'Age'

The dreaded KeyError

But this is actually very valuable information, as the error message tells us that the key ‘Age’ does not exist in the dictionary where we are looking for it. However, we would still like the program to keep running and to find all the other ages. Maybe we would also like the script to add a default value whenever an age is not found. This can be done with the get() method, which is available on all Python dictionaries.
Take a look at the code below. The key change is the call to get() in the inner loop:

import json
with open("data_bob_no_age.json") as f:
    data = json.load(f)

resident_ages = []
apartments = data['ApartmentBuilding']['Apartments']
for apartment in apartments:
    residents = apartment['Residents']
    for resident in residents:
        age = resident.get('Age', 'N/A')
        resident_ages.append(age)
print(resident_ages)
-> ['N/A', 42, 43, 42]

The get() method takes two arguments: the key you are expecting and an optional default value that is returned instead if the key is not present.
As you can see, the resident_ages list contains the value ‘N/A’ at the first index, and the script did not crash.
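To see the behaviour of get() in isolation, here is a minimal sketch with two hand-written resident dictionaries. They are illustrative stand-ins for entries in the data file, and the variable names are made up for this example. It also shows what happens when you leave out the default value.

```python
# Illustrative stand-ins for two entries in the data file
bob = {"Name": "Bob"}                  # no "Age" key, as in the scenario above
alice = {"Name": "Alice", "Age": 42}

age_with_default = bob.get("Age", "N/A")    # key missing -> the default you passed
age_without_default = bob.get("Age")        # key missing, no default -> None
age_present = alice.get("Age", "N/A")       # key present -> the stored value

print(age_with_default)
print(age_without_default)
print(age_present)
```

Note that get() never raises a KeyError: when the key is missing and no default is given, it returns None.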

Using get() can be useful in situations like the one above, but remember to use it cautiously, as you will not receive an error message that could have been important for your understanding of the data structure.
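If you want the script to keep running but still want to know which entries were incomplete, one option is to catch the KeyError yourself and record where it happened. This is a sketch of that idea rather than the method used in this guide; the residents list is a hand-written stand-in for the data, and missing_age is a hypothetical helper list.

```python
# Hand-written stand-in for the residents in the data file
residents = [
    {"Name": "Bob"},                 # "Age" is missing here
    {"Name": "Alice", "Age": 42},
]

resident_ages = []
missing_age = []  # hypothetical helper: names of residents without an age

for resident in residents:
    try:
        resident_ages.append(resident["Age"])
    except KeyError:
        # Keep going with a default, but remember who was affected
        resident_ages.append("N/A")
        missing_age.append(resident["Name"])

print(resident_ages)
print(missing_age)
```

This way the missing key does not crash the program, and you still get a record of where the data differs from what you expected.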

Thank you for reading this article, I hope that you learned something useful!
If you have any questions or comments feel free to reach out to me.
Remember that the code for this article can be found in the GitHub repository linked above.

Keep learning!
— Jacob Toftgaard Rasmussen

Source: https://towardsdatascience.com/how-to-access-nested-data-in-python-d65efa53ade4
