Postgres — Partial Index usage with Dynamic date filter

Virender Singla
Nerd For Tech
Published in
3 min readApr 2, 2021

--

Recently I fine tuned a slow running query in Postgres. AS usual I checked different bind variables passed for that query and the count for those values in the table and finally came up with an additional Index to speed up the query.

Now the problem is that the Prod table is of size 900GB (unfortunately partitioning is not implemented on this table) and creating Index on such a huge table was taking around 9 hours (tested on database restored from snapshot, of course after doing EBS initialization). We wanted to finish the index creation in a low peak time window of 4 hours and minimize possible impact on overall database performance due to this activity.

Query: Actual query was a bit complex with multiple tables joined but for the sake of this post we will focus on partial index so below is just a dummy query.

SELECT * FROM TESTTABLE WHERE ((asn, vsid ) IN ('B09JKJK', 'Unad45F')) and last_updated > current_Date — 360;

As we can see data is needed for last one year on the predicate column last_updated then why to create Index on complete table data (we have 20 years of data in the table). Hence we thought of creating a partial index with hard coded date (to have last 1 year data) in the WHERE clause. This Partial Index will be compact in size and gets created in lesser time (In our case Index creation time reduced to 4 hours).

create index CONCURRENTLY idx_tab_new  ON testtable USING btree (vsid,asn,last_updated,last_updated) where last_updated  > '01-01-2020';

Quick Notes:

  1. We can not create a Partial Index with dynamic filters and that is obvious.
create index idx on testtable (last_updated) where last_updated > current_date — 360;ERROR: functions in index predicate must be marked IMMUTABLE

2. I added last_updated column along with vsid and asn in the Index definition because in future this Index will keep growing and contain more than a year’s data and that time last_updated column within the Index will be helpful to filter the data further.

My initial thought was, now query will pick the partial Index and run in a shorter time, but that was not the case and it wasstill going for seq scan.

Why? Query is having dynamic filter last_updated > current_Date — 360 so planner is not able to recognize that partial index is still suitable as it is having last one year of data (as of in year 2021). What planner is expecting that gives the same WHERE clause in the query as we have given in the Index. Then let’s do that.

New code was deployed in the application with change number as a hint (so in future anyone can understand why we added another predicate). At 1st glance these two filter conditions look childish but that is how I got it working.

SELECT * FROM TABLE WHERE 
last_updated > current_Date — 360
and last_updated >= '01-01-2020' --CHANGE NUMBER 'NNNNNN'
;

Now the query could pick the Index and run faster.

On another note, Partial Indexes can be quite useful where we can give a SELECT query where clause as it is in Partial Index. There will be some overhead for DMLs (Insert/Update) as that where clause need to be calculated for every tuple insert/update.

We have another query where “bitwise &” operation is being performed on a column status_flags and this query is executed thousands of times in a day. So we created a partial Index with exact same WHERE clause to speed up the query (Let me know if there is a more elegant way of doing this with some other type of Index)

create index idx on test (lid,vcode,isname) where
( (
Cast(status_flags AS BIGINT) & 128) > 0 )
AND ( (
Cast(status_flags AS BIGINT) & 2) = 0
AND (
Cast(status_flags AS BIGINT) & 32) = 0
AND (
Cast(status_flags AS BIGINT) & 64) = 0 );

--

--