SQL SERVER - JOINS - I

on August 02, 2010
       
The first thing we learn to do with SQL is writing a SELECT statement to get data from one table. This kind of statement seems to be straightforward and very close to the language we speak.
But real-world queries are often much more sophisticated than those simple SELECT statements.
First of all, usually the data we need is split into several different tables. This is a natural consequence of data normalization, which is an essential feature of any well designed database model. And SQL gives you the power to put that data together.

In the past, DBAs and developers used to put all necessary tables and/or views in the FROM clause and then use the WHERE clause to define how the records from each table would combine with the other records.

But there's been a long time since we have a standard for bringing these data together. And this is done using the JOIN operator ( ANSI-SQL 92). Unfortunately, there are some details about JOIN operators that remain obscure for many people.
Below I will show different syntaxes of joins supported by T-SQL (that is SQL SERVER 2008). I will outline a few concepts I believe they shouldn't be forgot each time we combine the data from two tables or more.

Getting Started: 1 Table, no Join

When you have only one object to query, the syntax will be quite simple and no join will be used. The statement will be the good & old "SELECT fields FROM object" plus any other optional clause you might want to use (that is WHERE, GROUP BY, HAVING or ORDER BY).

One thing that end users don't know is that we DBAs usually hide lots of complex joins under one nice and easy-to-use view. This is done for several reasons, ranging from data security to database performance. For instance, DBAs can give permissions for the end users to access one single view instead of several production tables, obviously increasing data security. Or considering performance, DBAs can create a view using the right parameters to join the records from several tables, correctly using database indexes and thus boosting query performance.
All in all, joins might be there in the database even when the end users don't see them.

The Logic Behind Joining Tables

When I started working with SQL, I learned there were several types of joins. But it took me some time to understand what exactly I was doing when I brought those tables together. Maybe because people are so scared of mathematics, it is not frequently said that the whole idea behind joining tables is about Set Theory. Despite the fancy name, the concept is so simple we are taught it in elementary school.

The drawing in Figure 1 is quite similar to the ones found in kids' books from First Grade. The idea is to find correspondent objects in the different sets. Well, this is precisely what we do with SQL JOIN's!

  
Combining objects from different sets


Once you understand the analogy, things will start to make sense.
Consider that the 2 sets in the above figure are tables and the numbers we see are the keys we will use to join the tables. So in each set, instead of representing the whole records, we are only seeing the key fields from each table. The result set of this combination will be determined by the type of join we consider, and this is the topic I will show now. To illustrate our examples, consider we have 2 tables, shown below:

Table1:

CREATE TABLE Table1 (key1 INT NOT NULL,
                                         field1 VARCHAR(50),
                                         field2 INT,
                                         key2 INT NOT NULL,
                                         key3 INT NOT NULL,
                                         PRIMARY KEY (key1));

key1 field1 field2 key2 key3
3 Erik 8 1 6
4 John 3 4 4
6 Mark 3 7 1
7 Peter 6 8 5
8 Harry 0 9 2

 

Table2:

CREATE TABLE Table2 (key2 INT NOT NULL,
                                         field1 VARCHAR(50),
                                         field2 VARCHAR(1),
                                         field3 VARCHAR(1),
                                         PRIMARY KEY (key2));

key2 field1 field2 field3
1 New York A N
2 Sao Paulo B N
4 Paris C Y
5 London C Y
6 Rome C Y
9 Madrid C Y
0 Bangalore D N


You will notice this script does not fully implement referential integrity. I intentionally left the tables without foreign keys to better explain the functionality of the different types of joins. But I did so for didactical purposes only. Foreign keys are extremely useful to guarantee data consistency and they should not be left out of any real-world database.
Well, now we are ready to go. Let's check the types of joins we can use in T-SQL, the correspondent syntax and the result set that each one will generate.

The Inner Join

This is the most common join we use in SQL. It returns the intersection of two sets. Or in terms of tables, it brings only the records from both tables that match a given criteria.
We can see in Figure 2 the Venn diagram illustrating the inner join of two tables. The result set of the operation is the area in red.

 Representing the INNER JOIN

Now check the syntax to combine the data from Table1 and Table2 using a INNER JOIN.

SELECT t1.key1, t1.field1 as Name, t1.key2 as T1Key, 
   t2.key2 as T2Key, t2.field1 as City
FROM Table1 t1
   INNER JOIN Table2 t2 ON t1.key2 = t2.key2; 

The result set of this statement will be:

key1 Name T1Key T2Key City
3 Erik 1 1 New York
4 John 4 4 Paris
8 Harry 9 9 Madrid

Notice it returned only the data from records which have the same value for key2 on both Table1 and Table2.
Opposed to the INNER JOIN, there is also the OUTER JOIN. There are 3 types of OUTER JOINs, named full, left and right. We will look at each one in detail below.

The FULL JOIN

This is also known as the FULL OUTER JOIN (the reserved word OUTER is optional). FULL JOINs work like the union of two sets. Now we have in Figure 3 the Venn diagram illustrating the FULL JOIN of two tables. The result set of the operation is again the area in red.


 Representing the FULL JOIN

The syntax is almost exactly the same we saw before.

SELECT t1.key1, t1.field1 as Name, t1.key2 as T1Key,
    t2.key2 as T2Key, t2.field1 as City
FROM Table1 t1
    FULL JOIN Table2 t2 ON t1.key2 = t2.key2 ;

The result set of this statement will be:

key1 Name T1Key T2Key City
3 Erik 1 1 New York
4 John 4 4 Paris
6 Mark 7 null null
7 Peter 8 null null
8 Harry 9 9 Madrid
null null null 2 Sao Paulo
null null null 5 London
null null null 6 Rome
null null null 0 Bangalore

The FULL JOIN returns all records from Table1 and Table2, without duplicating data.

In the next article, we will see other joins.

Source: SQL Server Central


0 comments:

Post a Comment