How to extract values from a list of dictionaries in pandas

Hello!
I was automating the process of getting data from a website and pushing it to Google Sheets and encountered a problem and I want to share the solution I found.

The website was returning a very nested JSON that was very difficult to handle even with pandas.json_normalize(). This function returned a nice dataframe but there was only one problem: one of the columns contained a list of dictionaries from which I needed just one value from each dictionary… The example is below:

[{'value': {'files': [{'mimetype': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
      'perma_link': None,
      'hosted_by': 'podio',
      'description': None,
      'hosted_by_humanized_name': 'Podio',
      'size': 43477,
      'thumbnail_link': None,
      'link': 'www.test.com',
      'file_id': 1356087,
      'external_file_id': None,
      'link_target': '_self',
      'name': 'file.docx'}],
    'sharefile_vault_url': None,
    'space': {'sharefile_vault_url': None,
     'name': 'AIESEC Italy',
     'url': 'www.test.com',
     'url_label': 'crm-igcdp-test',
     'item_accounting_info': None,
     'org_id': 845436,
     'type': 'regular',
     'space_id': 43396326},
    'app_item_id': 15253,
    'app': {'status': 'active',
     'sharefile_vault_url': None,
     'name': User Selection',
     'space_id': 43396326,
     'default_view_id': None,
     'url_add': 'www.test.com',
     'icon_id': 192,
     'link_add': 'www.test.com',
     'app_id': 169234471,
     'current_revision': 44,
     'is_default': False,
     'item_name': 'User',
     'link': 'www.test.com',
     'url': 'www.test.com',
     'url_label': 'user-selection',
     'config': {'item_name': 'User',
      'icon_id': 192,
      'type': 'standard',
      'name': 'User Selection',
      'icon': '192.png'},
     'item_accounting_info': None,
     'icon': '192.png'},
    'title': 'Marta Wilson',
    'initial_revision': {'item_revision_id': 4312859392,
     'created_via': {'url': None,
      'auth_client_id': 18,
      'display': True,
      'name': 'Webform',
      'id': 18},
     'created_by': {'user_id': -1,
      'name': 'User Selection',
      'url': 'www.test.com',
      'type': 'app',
      'image': None,
      'avatar_type': 'icon',
      'avatar': -1,
      'id': 163457471,
      'avatar_id': 192,
      'last_seen_on': None},
     'created_on': '2020-09-29 21:00:24',
     'user': {'user_id': -1,
      'name': 'User Selection',
      'url': 'www.test.com',
      'type': 'app',
      'image': None,
      'avatar_type': 'icon',
      'avatar': -1,
      'id': 163245471,
      'avatar_id': 192,
      'last_seen_on': None},
     'type': 'creation',
     'revision': 0},
    'created_via': {'url': None,
     'auth_client_id': 18,
     'display': True,
     'name': 'Webform',
     'id': 18},
    'created_by': {'user_id': -1,
     'name': 'User Selection',
     'url': 'www.test.com',
     'type': 'app',
     'image': None,
     'avatar_type': 'icon',
     'avatar': -1,
     'id': 162354471,
     'avatar_id': 192,
     'last_seen_on': None},
    'created_on': '2020-09-29 21:18:24',
    'link': 'www.test.com',
    'item_id': 1823412172,
    'sharefile_vault_folder_id': None,
    'revision': 8}}
]

This list could contain multiple dictionaries and I need just this value: ‘title’: ‘Marta Wilson’ (fake name) from each of the dictionaries. Hence I had to somehow loop through this list (which could be variable in length).

And here is my solution:

def extract_names(row):
    names = []
    for i in range(len(row)):
        names.append(row[i]["value"]["title"])
    return names

I then applied this function to the columns:

df_norm["User"] = df_norm["User"].apply(lambda row: extract_names(row) if np.all(pd.notnull(row)) else row)

The if...else statements is very important here because otherwise, the function will throw a TypeError:

TypeError: object of type 'float' has no len()

because NaN values are “float” which length cannot be measured.

Hope it can be useful to somebody:) Check out my article on how I automated a mundane process in a non-profit with Python.

Happy coding :smile:

1 Like