Query Execution Pipeline
Detailed & structured explanation of how queries will be processed and execution in PostgreSQL Internally
The Query Execution Pipeline in PostgreSQL - From SQL string to Results
Have you ever wondered how PostgreSQL return the results magically out of nowhere, when you run a query with simple SQL Statement? Well this blog helps you understand in this blog with detailed explanation
Introduction
In reality, PostgreSQL follows a multi-stage pipeline that transforms your SQL statement into a well-optimized executable plan and then runs that plan to fetch the results.
In Big Picture
PostgreSQL's query execution pipeline can be thought of in five key stages:
Parser - Check syntax and create an internal referencable Parse Tree.
Analyzer - The analyzer carries out a semantic analysis of a parse tree and generates a Query Tree (unoptimized).
Rewriter - Rewriter transforms a query tree into better query tree (still unoptimized) by applying Rules, Views and Constraints.
Planner/Optimizer - The planner decides the best way to run the query by choosing the best plan tree from the set of generated plan trees that can most effectively be executed from the query tree.
Executor - Finally, the executor executes the query how the choosed plan tells by accessing the tables, views and indexes.
SQL --> Parser --> Analyzer --> Planner/Optimizer --> Executor --> Results
Stages Breakdown
A. Parsing
PostgreSQL takes our SQL statement and parse it into a
parse tree.It do certain checks on the query like:
Syntax (
SELECTvsSELEC?)Also checks for valid column names, functions exists (like avg(), sum()) etc.
At the end of this stage, we will posses a raw parse tree based on syntax (not optimized).
Example
SELECT id, first_name FROM users WHERE age > 30;
At this stage, PG only cares that:
Does the
userstable exists?Columns
id,first_nameandageexists inuserstable?>is valid?All
Keywordsused are valid?
B. Analyzer & Rewriting
Analyzer carries out a semantic analysis of a parse tree and generates a query tree.
In this stage, PostgreSQL rewriter tries to apply the transformation rules defined here. Few examples:
Views --> replaced with theire underlying query definition.
Rules --> rewrite logic applied
At the end of this stage, rewriter converts the parse tree into query tree on bases of semantics (not optimized)
Example (based on view)
CREATE VIEW active_users AS SELECT * FROM users WHERE active = true; -- active_users view
-- We queried as follows:
SELECT id FROM active_users WHERE age > 30;
Now, the rewriter will transforms the above query into something like:
SELECT id FROM users WHERE active = true AND age > 30;
C. Planner/Optimizer
The main master mind in the whole execution pipeline.
It takes the query tree as input and generates the multiple candidate execution plans for the query.
It estimates the cost of each plan generated using table statistics and selects the cheapest one among the all.
But the cost estimated is based purely on stats available, it might not 100% accurate.
Cost Factors:
Table Size
Index Availability
Join Algorithms (Nested Loop, Hash Join, Merge Join)
Disk I/O vs Memory Usage
Example
EXPLAIN SELECT id, first_name FROM users WHERE age > 30;
Might returns:
Seq Scan on users (cost=0.00..45.00 rows=300 width=12)
Filter: (age > 30)
Meaning
PostgreSQL will scan the whole
userstable.Seq Scan refers table scan.
Some other scan you might come across:
IndexScan,TidScan,BitmapScan,IndexOnlyScanetc.
Expected rows in the results will be
~300.
D. Execution
The executor follows the plan choosen by planner and executes it step by step.
Execution is Volcano-Style, meaning each node gets the results from its child node, and processes them and passes them upwards.
Example
EXPLAIN SELECT primary_category, sum(likes) FROM blogs GROUP BY primary_category;
QUERY PLAN
--------------------------------------------------------------
HashAggregate (cost=102.65..102.66 rows=1 width=13)
Group Key: primary_category
-> Seq Scan on blogs (cost=0.00..102.43 rows=43 width=9)
(3 rows)
As explained:
At first, postgresql will scan entire table
blogsand passes it's results to HashAggregate.HashAggregate will apply
Group Keyand computes thesum(likes)and gives the results to the users.
Conclusion
PostgreSQL’s query execution pipeline is a blend of parsing rigor, rewrite intelligence, and cost-based planning wizardry. By understanding each stage, you can write smarter SQL and make the database work for you, not against you.
Enjoyed this post?
2 reactions
Related Posts
Comments
Leave a comment
Stay Updated
Get notified when I publish new articles. No spam, unsubscribe anytime.