Your SQL is probably WRONG
Published:
Yes, the title is a bit click-baity, but I hope that this catches your attention for a big warning of a very subtle behavior of most SQL dialects. The main idea is that when you perform a left (or right) join that has clauses on the right (left) hand side table, it is very easy to mess up the integrity of the join and end up with results that are very different to what you are expecting. This behavior is very easy to miss and many people fall into this trap without even noticing, as it is basically a silent failure.
The objective
The main objective is simple: we have two clients, the clients
table and the orders
table. They contain respectively the information regarding the clients metadata and the clients records of orders. We are interested in obtaining the number of orders per client (displayed by their id), even if there are no entries in the orders . Then we will repeate the same exercise but for a particular day.
Setting up the tables
For this example I will assume that we have a running Postgres database, and I will proceed to create the tables from scratch and insert data manually to show the main point. In this section we will take care of the first part. Note that to keep things simple, I will be using simple incremental integer valued counters for the tables, which is something that I would most likely not do in a production environment.
Let’s start by creating a clients
table:
CREATE TABLE clients (
client_id SERIAL PRIMARY KEY,
client_name VARCHAR(255)
);
This table contains only the client_id
and their name. Let’s create now an orders
table:
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
product_id VARCHAR(255),
ordered_by INT,
ordered_at TIMESTAMP,
quantity INT
);
In this table we have some information about the orders, including who made the order. This means there is a direct relationship with the clients
table. We will insert some dummy data so we can work with these examples.
INSERT INTO clients (client_name) VALUES
('John Doe'),
('Jane Smith'),
('Sarah Johnson');
INSERT INTO orders (product_id, ordered_by, ordered_at, quantity) VALUES
('ABC123', 1, '2023-09-25 10:00:00', 2),
('DEF456', 1, '2023-09-26 12:30:00', 3);
Counting orders
Now we go to the task at hand. The solution is simple: we join the tables and apply a group by. A careful reader would note that only one of the clients has orders, so if we do an inner join, we will lose the other clients. That is why we must do a left join and coalesce the sum of the quantities to have a 0 value for the other clients. The query would look like this:
select
client_id,
coalesce(sum(quantity), 0) as quantity
from
clients as c
left join
orders as o
on
c.client_id = o.ordered_by
group by
client_id
order by
client_id
;
All good, business as usual. We get our neat results:
client_id | quantity |
---|---|
1 | 5 |
2 | 0 |
3 | 0 |
The problem
The problem arises when we start trying to impose conditions over the join. Suppose that now we are interested not on all the orders but only on the orders that occurred during 2023-09-26
. This means that we need to impose that condition somewhere in the query. The most natural place is to put it in the where
clause, just like this:
select
client_id,
coalesce(sum(quantity), 0) as quantity
from
clients as c
left join
orders as o
on
c.client_id = o.ordered_by
where
date_trunc('day', o.ordered_at) = '2023-09-26'
group by
client_id
order by
client_id
;
but let’s look at the results:
client_id | quantity |
---|---|
1 | 3 |
We have lost the other clients for which we had zero orders! Why? The answer is subtle: when you tell postgres date_trunc('day', o.ordered_at) = '2023-09-26'
, you are implicitely telling it that the column o.ordered_at
has to be not null and satisfy that condition AFTER the join, and then proceed with the aggregations. This means that it will join the tables, will then filter out the resulting joined rows that have the null field (for all the clients but John) and then it will sum the quantities of the orders.
The fix
The fix is quite simple here: you can put conditions like date_trunc('day', o.ordered_at) = '2023-09-26'
in the join clause. In this particular case, it would look like
select
client_id,
coalesce(sum(quantity), 0) as quantity
from
clients as c
left join
orders as o
on
c.client_id = o.ordered_by
and date_trunc('day', o.ordered_at) = '2023-09-26'
group by
client_id
order by
client_id;
which results in the following:
client_id | quantity |
---|---|
1 | 3 |
2 | 0 |
3 | 0 |
Voilà! To see what is going on here, think that the join will consist of all the joined rows that match on the client_id
and ordered_by
fields, as well as the having the ordered_at
to be within the day of ‘2023-09-26’. We are looking for all the rows of orders that match with Sarah and Jane that happened on the 26th, which is none, hence we get each of them matched with no rows, and consequently, just null values.
Conclusion
While this seems like a toy example, this situation can be found very often disguised in much larger and complex queries. It is important to understand well the order of the statements in a SQL query, specially when we are doing joins that are not inner.
Leave a Comment