3.2.2: SQL M-M Relationships
Learning Objectives
SQL M-M relationships require a junction table between the 2 related tables, where the junction table contains at least both primary keys of the related tables
Junction tables can also contain non-foreign-key data columns that store information about the relationship in each row of the junction table
Introduction
We will now learn to associate data with many-to-many (M-M) relationships, such as sightings and categories, songs and genres, people and personality traits. In each of these pairs, both entities have many of the other entity.
So far we have not been able to model M-M relationships because our table structures have limited us. Storing a foreign key on either of the tables in a M-M relationship would limit us, because that would imply that each row in the table with the foreign key could only belong to a single row in the related table. We do not wish to store arrays of foreign keys (or anything) in SQL columns because doing so makes querying less efficient, and we also cannot duplicate rows to specify multiple foreign key values because we cannot have multiple rows with the same ID or primary key.
In the following hypothetical People
table where each person has multiple (up to infinite) personality traits, it is not possible to associate each person with multiple personality traits, assuming each personality trait also belongs to many people.
id | name | PersonalityId |
---|---|---|
1 | Joe | 1 |
2 | Sally | 2 |
The solution is what we call a "junction" table, aka "join" or "through" table that sits "in between" the tables with an M-M relationship. This junction table consists minimally of 3 columns: id
(typically the primary key of the junction table) and 2 foreign keys corresponding to the IDs of the tables with an M-M relationship. The junction table works well because it allows us to specify combinations of each row in both related tables without repeating any unnecessary data.
Imagine the following People
table.
id | name |
---|---|
1 | Joe |
2 | Sally |
Now imagine the following Personalities
table.
id | trait |
---|---|
1 | Confident |
2 | Detail-oriented |
3 | Kind |
How would we specify that Joe is confident and detail-oriented, and Sally is confident and kind? Similarly, how would we specify that both Joe and Sally are confident? The answer is a junction table like the following.
id | PersonId | PersonalityId |
---|---|---|
1 | 1 | 1 |
2 | 1 | 2 |
3 | 2 | 1 |
4 | 2 | 3 |
Notice the junction table specifies all associations precisely without repeating any unnecessary data. This is what makes SQL M-M relationships efficient to store and to query.
Storing and retrieving M-M relationship data with SQL
Store M-M relationship data
In raw SQL we would insert data into the junction table manually. Assuming the table structures above and that all people and personalities are already in the database, we could write a SQL query like the following to specify that Joe is confident.
Notice how manual this is and prone to human error if we were to mistype one of the table names, column names or values. This is why SQL is primarily used for querying and not inserting new data from applications.
Notice we have assumed the junction table name is "PersonPersonalities". It is convention to name the junction table after the concatenation of the 2 related table names, where the 1st table in the concatenation is singular and the 2nd is plural. There is no rule for which table should go 1st or 2nd in the concatenation.
Retrieve M-M relationship data
If I wanted to query for what personality traits Joe has, I would query the junction table for Joe's personality traits.
The above query assumes I already know Joe's id
is 1, otherwise I would have to first query for Joe's ID. The query retrieves PersonalityId
s associated with Joe, and I would then need to map these IDs back to personality trait names. It's common in data analysis to use SQL joins to perform this query and the ID-to-name mapping in a single query.
Again, handling IDs manually is tedious and error-prone, hence why we use ORMs like Sequelize in apps.
Storing and retrieving non-foreign-key relationship data in junction table
Non-foreign-key data in junction table
A common feature of junction tables is to store additional columns beyond the IDs of the 2 related tables. For example, in our People
and Personalities
example, we may wish to store intensity of each personality trait. Even when 2 people are both confident and kind, one might be relatively more or less confident or kind than the other.
To store so-called non-foreign-key relationship data in our junction table, our junction table PersonPersonalities
could look like the following.
id | PersonId | PersonalityId | intensity |
---|---|---|---|
1 | 1 | 1 | 2 |
2 | 1 | 2 | 1 |
3 | 2 | 1 | 3 |
4 | 2 | 3 | 2 |
Notice the intensity
column in the above example. We define intensity as a number between 1-3 that represents how intensely this person embodies this personality. Notice how we would not be able to store intensity on either People
or Personalities
tables directly because each person does not have the same intensity for each personality, and each personality does not have the same intensity for each person.
The above table tells us Joe has intensity 2 for confidence and intensity 1 for detail-oriented. What about Sally?
Storing non-foreign-key-data in junction table
Similar to foreign key data, in raw SQL we could query the junction table directly to store non-foreign-key data such as intensity.
Retrieving non-foreign-key data from junction table
Also similar to foreign key data, we can retrieve non-foreign-key data from the junction table directly with SQL.
Additional Resources
Here is a video that visually explains the concept of a SQL M-M relationship with a junction table.