Hidden treasures of Python: rarely used libraries and how to use them

There are so many thousands of Python libraries that the title of this article could apply to almost all of them, except for a couple of hundred. Describing every Python library would probably take an entire library of real books. In this article, though, we'll get a taste of just a few of them, designed to solve specific tasks or simply for fun.

To practice with these libraries, we'll download a dataset from Kaggle: Animal Care and Control Adopted Animals.

import pandas as pd
df = pd.read_csv('animal-data-1.csv')
print('Number of pets:', len(df))
print(df.head(3))
Number of pets: 10290

          id           intakedate intakereason  istransfer sheltercode  \
    0  15801  2009-11-28 00:00:00       Moving           0   C09115463   
    1  15932  2009-12-08 00:00:00       Moving           0   D09125594   
    2  28859  2012-08-10 00:00:00    Abandoned           0   D12082309   
    
      identichipnumber animalname                breedname basecolour speciesname  \
    0       0A115D7358     Jadzia      Domestic Short Hair     Tortie         Cat   
    1       0A11675477      Gonzo  German Shepherd Dog/Mix        Tan         Dog   
    2       0A13253C7B     Maggie  Shep Mix/Siberian Husky    Various         Dog   
    
       ...         movementdate movementtype istrial returndate returnedreason  \
    0  ...  2017-05-13 00:00:00     Adoption     0.0        NaN          Stray   
    1  ...  2017-04-24 00:00:00     Adoption     0.0        NaN          Stray   
    2  ...  2017-04-15 00:00:00     Adoption     0.0        NaN          Stray   
    
       deceaseddate deceasedreason diedoffshelter puttosleep isdoa  
    0           NaN   Died in care              0          0     0  
    1           NaN   Died in care              0          0     0  
    2           NaN   Died in care              0          0     0  
    
    [3 rows x 23 columns]

1. Missingno

Library installation: pip install missingno


Missingno is a library dedicated to visualizing missing values in a dataframe. Of course, we could use a seaborn heatmap or a bar plot from any visualization library for this purpose. In that case, however, we would first have to build a series with the number of missing values in each column using df.isnull().sum(), while missingno does everything under the hood. The library offers a few types of charts:

  • matrix displays density patterns in data completion for up to 50 columns of a dataframe and is analogous to a seaborn missing-value heatmap. In addition, the sparkline on the right shows the general shape of data completeness by row, highlighting the rows with the maximum and minimum nullity.
  • bar chart shows, for each column, the number of non-null values as a bar.
  • heatmap shows nullity correlation, which ranges from -1 to 1: essentially, how strongly the presence or absence of one variable affects the presence of another. Columns with no missing values, or, conversely, completely empty ones, are excluded from the visualization, since they have no meaningful correlation.
  • dendrogram, like the heatmap, measures nullity relationships between columns, but not pairwise: it works between groups of columns, revealing clusters of missing data. Variables located closer to each other on the chart have a stronger nullity correlation. For dataframes with fewer than 50 columns, the dendrogram is vertical; otherwise, it flips to a horizontal orientation.
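Two of these charts boil down to simple pandas operations. The sketch below approximates them on a small made-up dataframe (the pets data and its column names are hypothetical): the bar chart counts non-null values per column, and the heatmap is taken here to be the Pearson correlation of the columns' null indicators, after dropping columns that are always or never null. It is meant for intuition, not as missingno's own code.

```python
import pandas as pd

# Hypothetical mini dataset: None marks a missing value
pets = pd.DataFrame({
    "chip":     ["A1", None, "C3", None, "E5", "F6"],
    "returned": [None, None, "2018-05-14", None, "2019-01-02", "2020-03-03"],
    "name":     ["Jadzia", "Gonzo", "Maggie", "Rex", "Leo", "Aries"],
})

# Bar chart idea: how many non-null values each column has
non_null_counts = pets.notnull().sum()

# Heatmap idea: 1 where a value is missing, 0 where it is present
nulls = pets.isnull().astype(int)

# Columns whose nullity never varies (complete or fully empty) have no
# meaningful correlation, so they are dropped before correlating
varying = nulls.loc[:, nulls.var() > 0]

# Pairwise Pearson correlation of the null indicators, between -1 and 1
nullity_corr = varying.corr()

print(non_null_counts)
print(nullity_corr.round(2))
```

Here the name column is dropped because it is never missing, and the chip/returned nullity correlation comes out as 1/√2, roughly 0.71: where a chip number is missing, the return date tends to be missing too.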

Let’s try all these charts with their default settings on our pet dataset:

import missingno as msno
%matplotlib inline
msno.matrix(df)

msno.bar(df)

msno.heatmap(df)

msno.dendrogram(df)

We can make the following observations about the dataset:

  • Overall, there are relatively few missing values.
  • The emptiest columns are deceaseddate and returndate.
  • The majority of pets are chipped.
  • Nullity correlation is:
    • slightly negative between being chipped and being deceased,
    • slightly positive between being chipped and being returned, and between being returned and being deceased.

There are a few options to customize missingno charts. Several of them are shared across chart types:

  • figsize and fontsize,
  • sort, which sorts the rows by completeness in either ascending or descending order,
  • labels (True or False), which controls whether the column labels are shown.

Other parameters are chart-specific:

  • color for the matrix and bar charts,
  • sparkline (whether to draw it) and width_ratios (matrix width to sparkline width) for the matrix,
  • log (logarithmic scale) for the bar chart,
  • cmap (colormap) for the heatmap,
  • orientation for the dendrogram.

Let's apply some of them to one of our charts above:

msno.matrix(
            df, 
            figsize=(25,7),
            fontsize=30, 
            sort='descending',
            color=(0.494, 0.184, 0.556), 
            width_ratios=(10, 1)
            ) 

Finally, if there is still something we'd like to tune, we can always add any matplotlib functionality to missingno charts. To do this, we need to pass inline=False. Let's add a title to our matrix chart:

import matplotlib.pyplot as plt
msno.matrix(
            df, 
            figsize=(25,7),
            fontsize=30, 
            sort='descending',
            color=(0.494, 0.184, 0.556), 
            width_ratios=(10, 1),
            inline=False
            ) 
plt.title('Missing Values Pet Dataset', fontsize=55)
plt.show()

For further practice, let's keep only the most interesting columns of our dataframe:

columns = ['identichipnumber', 'animalname', 'breedname', 'speciesname', 'sexname', 'returndate', 
           'returnedreason']
# .copy() gives an independent dataframe, avoiding SettingWithCopyWarning on later assignments
df = df[columns].copy()

Missingno Documentation

2. Tabulate

Library installation: pip install tabulate


This library is used for pretty-printing tabular data in Python. It supports smart and customizable column alignment, number and text formatting, and alignment on the decimal point.

The tabulate() function takes tabular data (a dataframe, a list of lists or of dictionaries, a dictionary, a NumPy array, etc.) along with optional parameters and returns a nicely formatted table as a string. Let's practice on a fragment of our pet dataset, starting with the most basic pretty-printed table:

from tabulate import tabulate
df_pretty_printed = df.iloc[:5, [1,2,4,6]]
print(tabulate(df_pretty_printed))
    -  -----------  -----------------------  ------  -----
    0  Jadzia       Domestic Short Hair      Female  Stray
    1  Gonzo        German Shepherd Dog/Mix  Male    Stray
    2  Maggie       Shep Mix/Siberian Husky  Female  Stray
    3  Pretty Girl  Domestic Short Hair      Female  Stray
    4  Pretty Girl  Domestic Short Hair      Female  Stray
    -  -----------  -----------------------  ------  -----

We can add headers with the headers parameter: headers='firstrow' uses the first row of data, while headers='keys' uses the keys of a dataframe or dictionary (for a dataframe, its column names); we can also pass our own list of column names. To format the table, we can use the tablefmt parameter, which accepts one of numerous format names as a string: simple, github, grid, fancy_grid, pipe, orgtbl, jira, presto, pretty, etc.

By default, tabulate aligns columns of floats on the decimal point, integers to the right, and text to the left. This can be overridden with the numalign and stralign parameters (right, center, left, decimal for numbers only, or None). For text columns, we can also disable the default removal of leading and trailing whitespace by setting the module-level flag tabulate.PRESERVE_WHITESPACE = True.

Let’s customize our table:

print(tabulate(
               df_pretty_printed, 
               headers='keys',
               tablefmt='fancy_grid',
               stralign='center'
               ))
    ╒════╤══════════════╤═════════════════════════╤═══════════╤══════════════════╕
    │    │  animalname  │        breedname        │  sexname  │  returnedreason  │
    ╞════╪══════════════╪═════════════════════════╪═══════════╪══════════════════╡
    │  0 │    Jadzia    │   Domestic Short Hair   │  Female   │      Stray       │
    ├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
    │  1 │    Gonzo     │ German Shepherd Dog/Mix │   Male    │      Stray       │
    ├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
    │  2 │    Maggie    │ Shep Mix/Siberian Husky │  Female   │      Stray       │
    ├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
    │  3 │ Pretty Girl  │   Domestic Short Hair   │  Female   │      Stray       │
    ├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
    │  4 │ Pretty Girl  │   Domestic Short Hair   │  Female   │      Stray       │
    ╘════╧══════════════╧═════════════════════════╧═══════════╧══════════════════╛

The only thing to keep in mind is that pretty-printed tables display best on laptops and desktop computers and can sometimes render poorly on smaller screens, such as smartphones.

Tabulate Documentation

3. Wikipedia

Library installation: pip install wikipedia


The wikipedia library, as its name suggests, makes it easy to access and fetch information from Wikipedia. Among other things, it lets us:

  • search Wikipedia – search(),
  • get article summaries – summary(),
  • get full page contents, including images, links, and other metadata of a Wikipedia page – page(),
  • select the language of the pages – set_lang().

In the pretty-printed table above, we saw a dog breed called Siberian Husky. As an exercise, we'll set the language to Russian (my native language :slightly_smiling_face:) and search for suggested Wikipedia pages:

import wikipedia 
wikipedia.set_lang('ru') 
print(wikipedia.search('Siberian Husky')) 
['Сибирский хаски', 'Древние породы собак', 'Маккензи Ривер Хаски', 'Породы собак по классификации кинологических организаций', 'Ричардсон, Кевин Майкл']

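In English, these suggestions are: Siberian Husky, Ancient dog breeds, Mackenzie River Husky, Dog breeds by the classification of cynological organizations, and Richardson, Kevin Michael. Let's take the first suggestion and fetch the first sentence of that page's summary: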

print(wikipedia.summary('Сибирский хаски', sentences=1)) 
Сибирский хаски — заводская специализированная порода собак, выведенная чукчами северо-восточной части Сибири и зарегистрированная американскими кинологами в 1930-х годах как ездовая собака, полученная от аборигенных собак Дальнего Востока России, в основном из Анадыря, Колымы, Камчатки у местных оседлых приморских племён — юкагиров, кереков, азиатских эскимосов и приморских чукчей — анкальын (приморские, поморы — от анкы (море)).

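In English, this reads roughly: "The Siberian Husky is a purpose-bred specialized dog breed, developed by the Chukchi of north-eastern Siberia and registered by American dog breeders in the 1930s as a sled dog descended from the aboriginal dogs of the Russian Far East, mainly from Anadyr, Kolyma, and Kamchatka, kept by the local settled coastal tribes: the Yukaghirs, Kereks, Asian Eskimos, and coastal Chukchi, the Ankalyn (coastal people, or pomors, from anky, meaning sea)."

Now, let's get a link to a picture of a Husky from this page: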

print(wikipedia.page('Сибирский хаски').images[0])

https://upload.wikimedia.org/wikipedia/commons/a/a3/Black-Magic-Big-Boy.jpg

and visualize this beautiful creature:

Husky
Wikipedia Documentation

4. Wget

Library installation: pip install wget


The wget library lets us download files in Python without having to open them. Optionally, we can also pass the path where the file should be saved as a second argument.

Let’s download the picture of Husky above:

import wget
wget.download('https://upload.wikimedia.org/wikipedia/commons/a/a3/Black-Magic-Big-Boy.jpg')
'Black-Magic-Big-Boy.jpg'

Now we can find the picture in the same folder as this notebook, since we didn’t specify a path where to save it.

Since a webpage is essentially an HTML file, another useful application of this library is downloading the HTML source of a webpage. Note that only the HTML file itself is saved, not the images, scripts, or stylesheets it refers to. Let's download the Kaggle webpage where our dataset is located:

wget.download('https://www.kaggle.com/jinbonnie/animal-data')
'animal-data'

The resulting animal-data file looks like this (we'll show only the first few lines):

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Animal Care and Control Adopted Animals | Kaggle</title>
    <meta charset="utf-8" />
    <meta name="robots" content="index, follow" />
    <meta name="description" content="animal situation in Bloomington Animal Shelter from 2017-2020" />
    <meta name="turbolinks-cache-control" content="no-cache" />

Wget Documentation

5. Faker

Library installation: pip install Faker


This module generates fake data, including names, addresses, emails, phone numbers, jobs, texts, sentences, colors, currencies, etc. The Faker generator can take a locale (or a list of locales) as an argument to return localized data; the default is en_US. To generate a piece of text or a sentence, we can use the default lorem ipsum or provide our own list of words. To ensure that all generated values are unique for a specific instance (for example, when we want to create a long list of fake names), we use the .unique property. If, instead, we need to reproduce the same value or dataset every time, we use the seed() method.

Let’s look at some examples.

from faker import Faker
fake = Faker()

print(
      'Fake color:', fake.color(), '\n'
      'Fake job:',   fake.job(),   '\n'
      'Fake email:', fake.email(), '\n'
      )

# Printing a list of fake Korean and Portuguese addresses
fake = Faker(['ko_KR', 'pt_BR'])
for _ in range(5):
    print(fake.unique.address())     # using the `.unique` property  

print('\n')

# Assigning a seed number to print always the same value / data set
fake = Faker()
Faker.seed(3920)
print('This English fake name is always the same:', fake.name())
Fake color: #212591 
Fake job: Occupational therapist 
Fake email: nancymoody@hotmail.com 
    
Estrada Lavínia da Luz, 62
Oeste
85775858 Moura / SE
Residencial de Moreira, 57
Morro Dos Macacos
75273529 Farias / TO
세종특별자치시 강남구 가락거리 (예원박김마을)
전라북도 광주시 백제고분길 (승민우리)
경상남도 당진시 가락53가

This English fake name is always the same: Kim Lopez

Returning to our dataset, we can see that at least two unlucky pets have rather unflattering names:

df_bad_names = df[df['animalname'].str.contains('Stink|Pooh', na=False)]  # na=False: missing names don't match
print(df_bad_names)
         identichipnumber animalname            breedname speciesname sexname  \
    1692              NaN    Stinker  Domestic Short Hair         Cat    Male   
    3336  981020023417175       Pooh  German Shepherd Dog         Dog  Female   
    3337  981020023417175       Pooh  German Shepherd Dog         Dog  Female   
    
                   returndate                     returnedreason  
    1692                  NaN                              Stray  
    3336  2018-05-14 00:00:00  Incompatible with owner lifestyle  
    3337                  NaN                              Stray  

The dog in the last two rows is actually the same animal (note the identical chip number), returned to the shelter for being incompatible with the owner's lifestyle. With our new skills, let's save both animals' reputations and give them more decent names. Since the dog is a German Shepherd, we'll choose a German name for her. As for the cat, according to Wikipedia, Domestic Short Hair is the most common breed in the US, so we'll choose an English name for him.

# Defining a function to rename the unlucky pets
def rename_pets(name):
    if name == 'Stinker':
        fake = Faker()
        Faker.seed(162)   
        name = fake.name()
    if name == 'Pooh':
        fake = Faker(['de_DE'])
        Faker.seed(20387)
        name = fake.name()
    return name

# Renaming the pets
df['animalname'] = df['animalname'].apply(rename_pets)

# Checking the results
print(df.loc[df_bad_names.index])  # select by index labels rather than positions
         identichipnumber            animalname            breedname speciesname  \
    1692              NaN         Steven Harris  Domestic Short Hair         Cat   
    3336  981020023417175  Helena Fliegner-Karz  German Shepherd Dog         Dog   
    3337  981020023417175  Helena Fliegner-Karz  German Shepherd Dog         Dog   
    
         sexname           returndate                     returnedreason  
    1692    Male                  NaN                              Stray  
    3336  Female  2018-05-14 00:00:00  Incompatible with owner lifestyle  
    3337  Female                  NaN                              Stray  

Steven Harris and Helena Fliegner-Karz sound a little bit too bombastic for a cat and a dog, but definitely much better than their previous names!

Faker Documentation

6. Numerizer

Library installation: pip install numerizer


This small Python package converts natural-language numbers in a string into digits (integers and floats) and is built around a single function, numerize().

Let's try it right away on our dataset. Some pets' names contain number words:

df_numerized_names = df[['identichipnumber', 'animalname', 'speciesname']]\
                        [df['animalname'].str.contains('Two|Seven|Fifty', na=False)]   
df_numerized_names
         identichipnumber animalname speciesname
    2127              NaN      Seven         Dog
    4040  981020025503945  Fifty Lee         Cat
    6519  981020021481875   Two Toes         Cat
    6520  981020021481875   Two Toes         Cat
    7757  981020029737857    Mew Two         Cat
    7758  981020029737857    Mew Two         Cat
    7759  981020029737857    Mew Two         Cat

We’re going to convert the numeric part of these names into real numbers:

from numerizer import numerize
df['animalname'] = df['animalname'].apply(lambda x: numerize(x))
df[['identichipnumber', 'animalname', 'speciesname']].loc[df_numerized_names.index]
         identichipnumber animalname speciesname
    2127              NaN          7         Dog
    4040  981020025503945     50 Lee         Cat
    6519  981020021481875     2 Toes         Cat
    6520  981020021481875     2 Toes         Cat
    7757  981020029737857      Mew 2         Cat
    7758  981020029737857      Mew 2         Cat
    7759  981020029737857      Mew 2         Cat

Numerizer Documentation

7. Emoji

Library installation: pip install emoji


With this library, we can convert emoji codes in strings into emoji, according to the emoji names defined by the Unicode Consortium and, if use_aliases=True is specified, their aliases as well (in emoji 2.0 and later, aliases are enabled with language='alias' instead). The two core functions of the emoji package are emojize() and demojize(). The default language, English (language='en'), can be changed to Spanish (es), Portuguese (pt), or Italian (it).

import emoji
print(emoji.emojize(':koala:'))
print(emoji.demojize('🐨'))
print(emoji.emojize(':rana:', language='it'))
🐨
:koala:
🐸

Let’s emojize our animals. First, we’ll check their unique species names:

print(df['speciesname'].unique())
['Cat' 'Dog' 'House Rabbit' 'Rat' 'Bird' 'Opossum' 'Chicken' 'Wildlife'
 'Ferret' 'Tortoise' 'Pig' 'Hamster' 'Guinea Pig' 'Gerbil' 'Lizard'
'Hedgehog' 'Chinchilla' 'Goat' 'Snake' 'Squirrel' 'Sugar Glider' 'Turtle'
'Tarantula' 'Mouse' 'Raccoon' 'Livestock' 'Fish']

We need to convert these names to lowercase, wrap each in colons, and then apply emojize() to the result:

df['speciesname'] = df['speciesname'].apply(lambda x: emoji.emojize(f':{x.lower()}:', 
                                                                    use_aliases=True))
print(df['speciesname'].unique())
['🐱' '🐶' ':house rabbit:' '🐀' '🐦' ':opossum:' '🐔' ':wildlife:' ':ferret:'
':tortoise:' '🐷' '🐹' ':guinea pig:' ':gerbil:' '🦎' '🦔' ':chinchilla:' '🐐'
'🐍' ':squirrel:' ':sugar glider:' '🐢' ':tarantula:' '🐭' '🦝' ':livestock:'
'🐟']

Let's replace house rabbit, tortoise, and squirrel with synonyms the emoji library recognizes (rabbit, turtle, and chipmunk) and try emojizing them again:

# Series.replace with a dict swaps whole values that exactly match a key
df['speciesname'] = df['speciesname'].replace({':house rabbit:': ':rabbit:',
                                               ':tortoise:':     ':turtle:',
                                               ':squirrel:':     ':chipmunk:'})
df['speciesname'] = df['speciesname'].apply(lambda x: emoji.emojize(x, variant='emoji_type'))
print(df['speciesname'].unique())
['🐱' '🐶' '🐇️' '🐀' '🐦' ':opossum:️' '🐔' ':wildlife:️' ':ferret:️' '🐢️' '🐷'
'🐹' ':guinea pig:' ':gerbil:️' '🦎' '🦔' ':chinchilla:️' '🐐' '🐍' '🐿️'
':sugar glider:' '🐢' ':tarantula:️' '🐭' '🦝' ':livestock:️' '🐟']

The remaining species are either collective names (wildlife and livestock) or don't have an emoji equivalent, at least not yet. We'll leave them as they are, only removing the colons and converting the names back to title case:

df['speciesname'] = df['speciesname'].str.replace(':', '').apply(lambda x: x.title())
print(df['speciesname'].unique())
df[['animalname', 'speciesname', 'breedname']].head(3)
['🐱' '🐶' '🐇️' '🐀' '🐦' 'Opossum️' '🐔' 'Wildlife️' 'Ferret️' '🐢️' '🐷' '🐹'
'Guinea Pig' 'Gerbil️' '🦎' '🦔' 'Chinchilla️' '🐐' '🐍' '🐿️' 'Sugar Glider'
'🐢' 'Tarantula️' '🐭' '🦝' 'Livestock️' '🐟']
      animalname speciesname               breedname
    0     Jadzia           🐱     Domestic Short Hair
    1      Gonzo           🐶 German Shepherd Dog/Mix
    2     Maggie           🐶 Shep Mix/Siberian Husky

Emoji Documentation

8. PyAztro

Library installation: pip install pyaztro


PyAztro seems to be designed more for fun than for work. This library provides a daily horoscope for each zodiac sign. The prediction includes the description of the sign for that day, the sign's date range, mood, lucky number, lucky time, lucky color, and compatibility with other signs. For example:

import pyaztro
pyaztro.Aztro(sign='taurus').description

'You need to make a radical change in some aspect of your life - probably related to your home. It could be time to buy or sell or just to move on to some more promising location.'

Great! I’m already running to buy a new house :grinning:

In our dataset, there are a cat and a dog called Aries:

df[['animalname', 'speciesname']][(df['animalname'] == 'Aries')]
animalname speciesname
3036 Aries 🐱
9255 Aries 🐶

and plenty of pets called Leo:

print('Leo:', df['animalname'][(df['animalname'] == 'Leo')].count())
Leo: 18

Let’s assume that those are their corresponding zodiac signs :wink: With PyAztro, we can check what the stars have prepared for these animals for today:

aries = pyaztro.Aztro(sign='aries')
leo = pyaztro.Aztro(sign='leo')

print('ARIES: \n',
      'Sign:',             aries.sign,          '\n',
      'Current date:',     aries.current_date,  '\n',
      'Date range:',       aries.date_range,    '\n', 
      'Sign description:', aries.description,   '\n',
      'Mood:',             aries.mood,          '\n', 
      'Compatibility:',    aries.compatibility, '\n',
      'Lucky number:',     aries.lucky_number,  '\n',
      'Lucky time:',       aries.lucky_time,    '\n',
      'Lucky color:',      aries.color,       2*'\n',
    
      'LEO: \n',
      'Sign:',             leo.sign,            '\n',
      'Current date:',     leo.current_date,    '\n',
      'Date range:',       leo.date_range,      '\n', 
      'Sign description:', leo.description,     '\n',
      'Mood:',             leo.mood,            '\n', 
      'Compatibility:',    leo.compatibility,   '\n',
      'Lucky number:',     leo.lucky_number,    '\n',
      'Lucky time:',       leo.lucky_time,      '\n',
      'Lucky color:',      leo.color)   
ARIES: 
 Sign: aries 
 Current date: 2021-02-06 
 Date range: [datetime.datetime(2021, 3, 21, 0, 0), datetime.datetime(2021, 4, 20, 0, 0)] 
 Sign description: It's a little harder to convince people your way is best today -- in part because it's much tougher to play on their emotions. Go for the intellectual arguments and you should do just fine. 
 Mood: Helpful 
 Compatibility: Leo 
 Lucky number: 18 
 Lucky time: 8am 
 Lucky color: Gold 

 LEO: 
 Sign: leo 
 Current date: 2021-02-06 
 Date range: [datetime.datetime(2021, 7, 23, 0, 0), datetime.datetime(2021, 8, 22, 0, 0)] 
 Sign description: Big problems need big solutions -- but none of the obvious ones seem to be working today! You need to stretch your mind as far as it will go in order to really make sense of today's issues. 
 Mood: Irritated 
 Compatibility: Libra 
 Lucky number: 44 
 Lucky time: 12am 
 Lucky color: Navy Blue

These forecasts are for February 6, 2021, so to check our pets' horoscopes (or your own) for the current day, you'll need to re-run the code above. All the properties except, evidently, sign and date_range change every day for each zodiac sign at midnight GMT.

PyAztro Documentation


Certainly, there are many other fun Python libraries like PyAztro, including:

  • Art – for converting text to ASCII art, like this: ʕ •`ᴥ•´ʔ
  • Turtle – for drawing,
  • Chess – for playing chess,
  • Santa – for randomly pairing Secret Santa gifters and recipients,

and even

  • Pynder – for using Tinder.

We can be sure that with Python we’ll never get bored!

Conclusion

To sum up, I hope that all the pets from the dataset find loving and caring owners, and that Python users discover more amazing libraries and apply them in their projects.


@Elena_Kosourova Awesome post! Thanks for sharing these useful and fun packages! :grinning:


Thank you @veratsien! I really enjoyed exploring these libraries, they are really cool! :joy:


Absolutely incredible! Thanks for this nice article, I discovered Faker which is what I was dreaming about:)

Happy coding :smile:


Thanks a lot, Artur! :grinning: It’s indeed a super-useful library for creating the whole dataframes, not only for renaming a couple of pets, like in my case :sweat_smile:


Amazing post, Elena! It’s very helpful to my friend Antholino


Thank you Vincenzo! I’m very glad that it was useful! :star_struck: