Join on indexed column shows no improvement in total cost

Course : Data Engineering
Mission: Using an Index
Slide: Index Performance on Join

In this slide, the output of the exercise is showing that the total cost before and after the index creation in column state is same, but am expecting the cost is lesser when we create an index. Any specific reasons for this kind of output?

import pprint as pp
conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()
cur.execute("CREATE INDEX state_idx ON homeless_by_coc(state)")
conn.commit()
cur.execute("EXPLAIN (format json, ANALYZE)SELECT hbc.state, hbc.coc_number, hbc.coc_name, si.name FROM homeless_by_coc as hbc, state_info as si WHERE hbc.state = si.postal")
pp.pprint(cur.fetchall())
cur.execute("DROP INDEX if EXISTS state_idx")
conn.commit()
cur.execute("EXPLAIN (format json, ANALYZE)SELECT hbc.state, hbc.coc_number, hbc.coc_name, si.name FROM homeless_by_coc as hbc, state_info as si WHERE hbc.state = si.postal")
pp.pprint(cur.fetchall())

Both the queries have the total cost as 9956.15

Hi @sreeramkolluru

The short answer is: both queries have the same query plan and therefore the same estimated cost.

Long answer:

If you look at the result of EXPLAIN with and without the index you will see that they have the same query plan.

HashJoin
    Seq Scan
    Hash

The cost is estimated depending on the query plan so it is natural to get the same cost.

This means that Postgres decided not to use the state_idx. In general, Postgres will decide to ignore an index and revert back to a sequential scan if the query results in too many rows because in these situations it is actually slower to use an index.

If you actually execute the query (removing the EXPLAIN) and print the number of rows in the result you will see that it has 85449 rows. In total the homeless_by_coc table has 86529 rows.

As you can see, the query outputs nearly every row so Postgres decided not to use the index since to go over all rows it is faster to do a simple sequential scan.

I hope this answers your question. Do not hesitate to ask for more clarifications if you are still having trouble understanding it.

2 Likes

Thanks for such a clear explanation.