_bootstrap.Column datatype in Postgres for Data Engineers Course

I’m working on the Managing Created Tables mission in the Postgres for Data Engineers course and I notice that cur.description returns a tuple of <class ‘importlib._bootstrap.Column’>. I’m not familiar with the <class ‘importlib._bootstrap.Column’> datatype and I’m having trouble finding more information.

The first <class ‘importlib._bootstrap.Column’> in the tuple is:

Column(name=‘id’, type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None)

As a start, I’d like to know how to access the various elements. I can convert it to a series, but I loose some information (all the column names, like name, code_type, display_size, etc.).

Here is the code from the mission and a couple lines I added:

import pandas as pd
# The `cur` object is provided for you.

conn = psycopg2.connect("dbname=dq user=dq")
cur = conn.cursor()
cur.execute('ALTER TABLE old_ign_reviews RENAME TO ign_reviews')
conn.commit()
cur.execute('SELECT * FROM ign_reviews LIMIT 0')
print(cur.description)

cur_description_0 = cur.description[0]
print(type(cur_description_0))

print(pd.Series(cur.description[0]))

Here is what prints out:

(Column(name='id', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None), Column(name='score_phrase', type_code=1043, display_size=None, internal_size=11, precision=None, scale=None, null_ok=None), Column(name='title_of_game_review', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None), Column(name='url', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None), Column(name='platform', type_code=1043, display_size=None, internal_size=20, precision=None, scale=None, null_ok=None), Column(name='score', type_code=1700, display_size=None, internal_size=2, precision=2, scale=1, null_ok=None), Column(name='genre', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None), Column(name='editors_choice', type_code=16, display_size=None, internal_size=1, precision=None, scale=None, null_ok=None), Column(name='release_year', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None), Column(name='release_month', type_code=21, display_size=None, internal_size=2, precision=None, scale=None, null_ok=None), Column(name='release_day', type_code=21, display_size=None, internal_size=2, precision=None, scale=None, null_ok=None), Column(name='full_url', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None))
<class 'importlib._bootstrap.Column'>
0      id
1      23
2    None
3       4
4    None
5    None
6    None
dtype: object

Okay. Gear up. This is what I found after digging a bit.

It turns out that when we do cur.description() we get an iterable object of Column objects.

So, because cur.description() returns an iterable object so we just have to go through it to get each column. For instance, to print each column we could do this:

for column in cur.description():  
    print(column)

In your case it will return this:

Column(name='id', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None)
Column(name='score_phrase', type_code=1043, display_size=None, internal_size=11, precision=None, scale=None, null_ok=None)
Column(name='title_of_game_review', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None)
Column(name='url', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None)
Column(name='platform', type_code=1043, display_size=None, internal_size=20, precision=None, scale=None, null_ok=None)
Column(name='score', type_code=1700, display_size=None, internal_size=2, precision=2, scale=1, null_ok=None)
Column(name='genre', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None)
Column(name='editors_choice', type_code=16, display_size=None, internal_size=1, precision=None, scale=None, null_ok=None)
Column(name='release_year', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None)
Column(name='release_month', type_code=21, display_size=None, internal_size=2, precision=None, scale=None, null_ok=None)
Column(name='release_day', type_code=21, display_size=None, internal_size=2, precision=None, scale=None, null_ok=None)
Column(name='full_url', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None)

Now, each of those Column object is a named tuple.

If you iterate through the tuple you won’t get the names, just the elements. So:

for entry in cur_description_0:
    print(entry)

will return:

id
23
None
4
None
None
None

If you want the names of the entries then you have to do: ._fields. That will return an unnamed tuple containing just the fields names. cur_description_0._fields will return ('name', 'type_code', 'display_size', 'internal_size', 'precision', 'scale', 'null_ok').

So if you really wanted to have it displayed as a series with the fields name you could try:

print(pd.Series(cur.description[0], index=cur.description[0]._fields))

And you will get:

name               id
type_code          23
display_size     None
internal_size       4
precision        None
scale            None
null_ok          None
2 Likes

Thanks so much for this reply! It’s very clear and exactly what I was looking for!

1 Like