Recently our team got multiple notifications from AWS for our Aurora PostgreSQL clusters regarding the deprecation of Postgres v9.6.x. We must upgrade these databases to a higher major version (Postgres v10.x.x).
While researching all the steps in the upgrade process, one thing came as a surprise to me: Postgres does not copy optimizer statistics during a major version upgrade. That means there may be a plan flip for a critical query, which will cause application latency to go high and hamper overall database health (CPU spikes).
As an obvious step, I thought to analyze the tables in parallel just after the upgrade (in…
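A parallel post-upgrade ANALYZE can be driven with the `vacuumdb` client that ships with Postgres. The sketch below is illustrative only; the endpoint and user are placeholders, not the actual cluster details:

```shell
# Hypothetical sketch: regenerate optimizer statistics for all databases
# right after a major version upgrade, using 8 parallel connections.
# Host and username below are placeholders.
vacuumdb --host=mycluster.cluster-xyz.us-east-1.rds.amazonaws.com \
         --username=postgres \
         --all \
         --analyze-only \
         --jobs=8
```

`vacuumdb --analyze-in-stages` is another option worth considering here: it produces minimal statistics quickly and then refines them, so the planner has something usable soon after the upgrade.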
One of the core concepts in PostgreSQL is its vacuuming process. I had read a few blog posts where people talked about big outages caused by transaction wraparound issues, and in some cases the downtime lasted a few hours. As the VACUUM process is very critical, one should understand it inside out; in case we hit those dangerous vacuuming/wraparound issues, at least we are better equipped to handle them. While reading about the vacuuming concept, some questions came to mind, and I am summarizing all of them here.
Q. Is there a parallel option for running a VACUUM job? A. Delayed but…
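For context, PostgreSQL 13 added a PARALLEL option to VACUUM, which lets the index-vacuuming phase use multiple workers (the table name below is just illustrative):

```sql
-- PostgreSQL 13+: vacuum with up to 4 parallel workers for the
-- index-vacuuming phase (table name is illustrative).
VACUUM (PARALLEL 4, VERBOSE) events_tab1;
```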
A few days back we got a CloudWatch alarm for CPU utilization on one of our Postgres databases hosted on Amazon Aurora. Looking at Performance Insights, we got the top CPU-consuming queries.
SET locked_by = $1,
last_updated_by = $2,
rtcount = rtcount + ?
WHERE id IN (
WHERE rtcount < $5
AND locked_at < $6) FOR UPDATE SKIP LOCKED LIMIT $7 ) RETURNING *

Indexes on events_tab1:
"events_tab1_pk" PRIMARY KEY, btree (id)
"events_tab1_la_rcnt" btree (locked_at, rtcount)
Recently I got a chance to fine-tune a slow-running query in Postgres. As usual, I checked the different bind variables passed to that query and the counts for those values in the table, and finally came up with an additional index to speed up the query.
Now the problem is that the Prod table is 900GB in size, and creating an index on such a huge table was taking around 9 hours (tested on a database restored from a snapshot). We wanted to create the index only in a low-peak time window.
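One way to keep such a long build from blocking writes, and to speed it up, is sketched below; the object names and settings are hypothetical, not the actual Prod ones:

```sql
-- Hypothetical names; the settings are illustrative sizes.
SET maintenance_work_mem = '2GB';          -- more memory for the index sort
SET max_parallel_maintenance_workers = 4;  -- parallel btree build (PG 11+)
CREATE INDEX CONCURRENTLY idx_bigtab_col1
    ON bigtab (col1);                      -- does not block writes
```

CONCURRENTLY makes the build slower overall (it scans the table twice), so the parallel-maintenance settings help claw back some of that time.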
Query: The actual query was a bit complex with multiple table joins but…
We have a Prod database running on Amazon Aurora PostgreSQL. During slow query analysis using CloudWatch Logs Insights (that is for another post), I found one query running in around 10 seconds (if all the blocks are in cache; in case of disk reads it was taking more than 5 minutes).
Yes, one option here is to simply cache the data (indexes) using the pg_prewarm extension.
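A minimal pg_prewarm sketch (the index name here is hypothetical):

```sql
-- One-time setup, then load the index blocks into shared buffers.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- Returns the number of blocks read into the cache.
SELECT pg_prewarm('testtable_processed_date_idx');
```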
explain (analyze, buffers) SELECT tabalias.rid, tabalias.col1, tabalias.col2, tabalias.col3, tabalias.col4, tabalias.dis_id, tabalias.col5, tabalias.processed_date, tabalias.col6, tabalias.col7, tabalias.col8, tabalias.col8, tabalias.col9, tabalias.col10, tabalias.col11, tabalias.o_id, tabalias.col12 FROM testtable tabalias WHERE tabalias.processed_date >= '18-dec-2020' AND tabalias.processed_date <= '18-jan-2021' AND (tabalias.o_id…
In the cloud world, things have become easier: we need to spend less time setting up infrastructure and doing operational work, and we can focus on building the core business logic.
Whenever we are building an AWS architecture or using any service, it's important to carefully consider all the cost aspects; missing this may give us a billing shock. There are hundreds or maybe thousands of articles on the internet about avoiding these cost surprises, and I am adding one more to the list.
The main part of building any solution is the core logic and which services we…
We had 43 static databases (no more data changes) running on EC2 instances, each around 20TB in size; however, those databases needed to be queried only once a month. Running EC2 instances around the clock with those large EBS volumes does not make sense in terms of cost efficiency, so we thought of doing some cost optimization.
Cost considerations: We took AMIs of those EC2 machines, and whenever the application team needs to query the databases, those can be restored from the AMIs. This way we need to pay only the AMI snapshot cost for most of…
In the last post we learnt about the differences in the MVCC nature of Oracle vs Postgres and how setting an appropriate FILLFACTOR value in Postgres can be beneficial. Let's do a deeper dive on this.
A completely compact table (FILLFACTOR = 100) will lead to bad performance of UPDATEs and may lead to index fragmentation as well, whereas a low FILLFACTOR value causes a sequential scan on the table to go slow because it has to read more partially filled blocks.
FILLFACTOR dependency factors:
To get the optimal FILLFACTOR value, one should know the complete UPDATE pattern on the table and how much space a…
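Setting the value itself is simple; the table name and the 85% figure below are purely illustrative:

```sql
-- Hypothetical table: leave 15% of each heap page free so UPDATEs can
-- stay on the same page (HOT updates) instead of touching every index.
ALTER TABLE orders SET (fillfactor = 85);
-- Applies only to newly written pages; rewrite the table to repack
-- existing pages to the new FILLFACTOR (takes an exclusive lock).
VACUUM FULL orders;
```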
Migrating from Oracle to Postgres? Be aware of FILLFACTOR and Indexing implications
We know how useful indexes are to speed up SQLs in relational databases. However, all these indexes add overhead to DML processing, because with any DML on the table, all the corresponding indexes need to be updated.
Postgres has become a quite popular open-source database, and many organizations are moving their workloads from Oracle to Postgres. One of the major differences between these two RDBMSs is their architecture for handling multi-version concurrency control (MVCC). Let's understand that.
How UPDATE query works in Oracle
PostgreSQL is a popular open-source relational database system, and offerings from the big cloud players like AWS, Azure and GCP make it a top database choice. Postgres supports many different types of indexes like GIN/GiST/Partial/BRIN. I was quite excited when I first read about the BRIN concept and explored it further.
What is BRIN Index?
BRIN stands for "Block Range INdex". It stores the min/max values for block ranges, whose size is defined by the parameter pages_per_range at the time of index creation.
create index idx_test on test using brin (last_updated_date) with (pages_per_range = 32);
By default pages_per_range is 128, so that means for 8kb…
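Because each summary entry covers a whole block range, BRIN indexes stay tiny compared to btrees. A quick way to see this, reusing the example table above (the btree here is created only for comparison):

```sql
-- Compare the footprint of the BRIN index from the example above with
-- an equivalent btree on the same column.
CREATE INDEX idx_test_btree ON test USING btree (last_updated_date);
SELECT pg_size_pretty(pg_relation_size('idx_test'))       AS brin_size,
       pg_size_pretty(pg_relation_size('idx_test_btree')) AS btree_size;
```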