Wednesday, April 26, 2006

UNION and Venn diagrams

I had a developer come around today with a problem piece of SQL where he needed to get a distinct list of users (userids) which the application hadn't set preferences correctly for.
The joys of no constraints in the db. Here was a classic example where a constraint/rule i.e. a user must have a preference (or at least a default) in the beginning would have saved this whole data cleansing process now.
This was MySQL 4.0 so there were no subqueries (not that they were needed anyway).

We spent, well I spent about 10 minutes understanding exactly what he was trying to achieve with the SQL he provided but the overall aim of the SQL.
In the end I got out the trusty pad and drew some Venn (Set) diagrams of the JOINs and shaded the areas he wanted back. This was enough for us both to see the existing SQL was going to return the wrong result!

So where does UNION come into it?

UNION exists in almost every relational database and is used to combine to rowsets of data together. UNION will remove duplicates, UNION ALL will leave duplicates.
So instead of using a temporary table to store the rowset and issue another SQL, with UNION you can get the desired rowset in one group. This type of operation validates a performance rule i.e. visit the same data once or a few times as possible.

Example:

select userid from t1 join t2 on t1.id = t2.id
union
select userid from t3 join t4 on t3.id = t4.id

If there is a table on both sides of the UNION there is potential to replace with the appropriate join or subquery, sometimes it is easier though (unless performance is a problem) to leave it as a UNION for readability.

The end result was SQL which provided the result required, without the complexity of temporary tables. It hasn't fix the poor database design yet, as we still need to use distinct and worry about dupes.

Have Fun

Paul