3.2.2: SQL M-M Relationships

Learning Objectives

  1. SQL M-M relationships require a junction table between the 2 related tables, where the junction table contains at least both primary keys of the related tables

  2. Junction tables can also contain non-foreign-key data columns that store information about the relationship in each row of the junction table

Introduction

We will now learn to associate data with many-to-many (M-M) relationships, such as sightings and categories, songs and genres, people and personality traits. In each of these pairs, both entities have many of the other entity.

So far we have not been able to model M-M relationships because our table structures have limited us. Storing a foreign key on either of the tables in a M-M relationship would limit us, because that would imply that each row in the table with the foreign key could only belong to a single row in the related table. We do not wish to store arrays of foreign keys (or anything) in SQL columns because doing so makes querying less efficient, and we also cannot duplicate rows to specify multiple foreign key values because we cannot have multiple rows with the same ID or primary key.

In the following hypothetical People table where each person has multiple (up to infinite) personality traits, it is not possible to associate each person with multiple personality traits, assuming each personality trait also belongs to many people.

id
name
PersonalityId

1

Joe

1

2

Sally

2

The solution is what we call a "junction" table, aka "join" or "through" table that sits "in between" the tables with an M-M relationship. This junction table consists minimally of 3 columns: id (typically the primary key of the junction table) and 2 foreign keys corresponding to the IDs of the tables with an M-M relationship. The junction table works well because it allows us to specify combinations of each row in both related tables without repeating any unnecessary data.

Imagine the following People table.

id
name

1

Joe

2

Sally

Now imagine the following Personalities table.

id
trait

1

Confident

2

Detail-oriented

3

Kind

How would we specify that Joe is confident and detail-oriented, and Sally is confident and kind? Similarly, how would we specify that both Joe and Sally are confident? The answer is a junction table like the following.

id
PersonId
PersonalityId

1

1

1

2

1

2

3

2

1

4

2

3

Notice the junction table specifies all associations precisely without repeating any unnecessary data. This is what makes SQL M-M relationships efficient to store and to query.

Storing and retrieving M-M relationship data with SQL

Store M-M relationship data

In raw SQL we would insert data into the junction table manually. Assuming the table structures above and that all people and personalities are already in the database, we could write a SQL query like the following to specify that Joe is confident.

INSERT INTO "PersonPersonalities" ("PersonId", "PersonalityId") VALUES (1, 1);

Notice how manual this is and prone to human error if we were to mistype one of the table names, column names or values. This is why SQL is primarily used for querying and not inserting new data from applications.

Notice we have assumed the junction table name is "PersonPersonalities". It is convention to name the junction table after the concatenation of the 2 related table names, where the 1st table in the concatenation is singular and the 2nd is plural. There is no rule for which table should go 1st or 2nd in the concatenation.

Retrieve M-M relationship data

If I wanted to query for what personality traits Joe has, I would query the junction table for Joe's personality traits.

SELECT "PersonalityId" FROM "PersonPersonalities" WHERE "PersonId"=1;

The above query assumes I already know Joe's id is 1, otherwise I would have to first query for Joe's ID. The query retrieves PersonalityIds associated with Joe, and I would then need to map these IDs back to personality trait names. It's common in data analysis to use SQL joins to perform this query and the ID-to-name mapping in a single query.

Again, handling IDs manually is tedious and error-prone, hence why we use ORMs like Sequelize in apps.

Storing and retrieving non-foreign-key relationship data in junction table

Non-foreign-key data in junction table

A common feature of junction tables is to store additional columns beyond the IDs of the 2 related tables. For example, in our People and Personalities example, we may wish to store intensity of each personality trait. Even when 2 people are both confident and kind, one might be relatively more or less confident or kind than the other.

To store so-called non-foreign-key relationship data in our junction table, our junction table PersonPersonalities could look like the following.

id
PersonId
PersonalityId
intensity

1

1

1

2

2

1

2

1

3

2

1

3

4

2

3

2

Notice the intensity column in the above example. We define intensity as a number between 1-3 that represents how intensely this person embodies this personality. Notice how we would not be able to store intensity on either People or Personalities tables directly because each person does not have the same intensity for each personality, and each personality does not have the same intensity for each person.

The above table tells us Joe has intensity 2 for confidence and intensity 1 for detail-oriented. What about Sally?

Storing non-foreign-key-data in junction table

Similar to foreign key data, in raw SQL we could query the junction table directly to store non-foreign-key data such as intensity.

INSERT INTO "PersonPersonalities" ("PersonId", "PersonalityId", intensity) VALUES (1, 1, 2);

Retrieving non-foreign-key data from junction table

Also similar to foreign key data, we can retrieve non-foreign-key data from the junction table directly with SQL.

SELECT "PersonalityId", intensity FROM "PersonPersonalities" WHERE "PersonId"=1;

Additional Resources

  1. Here is a video that visually explains the concept of a SQL M-M relationship with a junction table.