However, it is not a magic dust and sprinkling it liberally over a database design does not automatically normalise the design. Such myth, unfortunately, has not gone away and seems to be spreading wider. It is as if for every person enlightened, three more replaces him/her.
I think the myth is propagated by the popularity of ORM and difficulties of many ORMs in dealing with relation having non-surrogate keys. But that's a topic for some other time.
Take, for example, the following design:
Surrogate Key
Food
Food Id | Food |
---|---|
1 | Apple |
2 | Orange |
3 | Yam |
4 | Carrot |
Kind Id | Kind |
---|---|
1 | Fruit |
2 | Root |
Food Id | Kind Id |
---|---|
1 | 1 |
2 | 1 |
3 | 2 |
4 | 2 |
Would you believe that it is equally as normalised as the following:
Natural Key
Food
Food Id |
---|
Apple |
Orange |
Yam |
Carrot |
Kind Id |
---|
Fruit |
Root |
Food Id | Kind Id |
---|---|
Apple | Fruit |
Orange | Fruit |
Yam | Root |
Carrot | Root |
and vice-versa.
The natural key version is not any more de-normalised nor redundant than the surrogate key version. There is no more redundancy in any one version than the other. One may be tempted to point at the
Food Kind
relation of the Natural Key version and shout "AHA! There are two 'Fruit' and two 'Root' entries under the Kind Id
column. That's redundant." I'd shout back "'AHA!' There are two '1' and two '2' entries at the same place in the Surrogate Key version. They both are keys. They uniquely identify tuples. If you are referring to the same tuple multiple times, there will be that many copy of the key be it represented with integers or strings.
This article is not debating which key is 'better'. There are many articles addressing the pros and cons of each key type. Simply google for "natural surrogate key".
There are other kind of having multiple copies of the same values. It is known as denormalised state.
De-Normalisation and redundancy goes hand-in-hand. A relation (table in SQL-speak) contains redundant information if the entries contain derivable information. The goal of normalisation is to perform loss-less decomposition such that there is no more derivable information in any relation.
De-Normalised Design
Food
Food Id | Is Crunchy |
---|---|
Apple | true |
Orange | false |
Yam | false |
Carrot | true |
Kind Id |
---|
Fruit |
Root |
Food Id | Kind Id | Is Crunchy |
---|---|---|
Apple | Fruit | true |
Orange | Fruit | false |
Yam | Root | false |
Carrot | Root | true |
The above is de-normalised design because the Is Crunchy field in Food Kind relation is derivable from Food itself.
Normalised Design
Food
Food Id | Is Crunchy |
---|---|
Apple | true |
Orange | false |
Yam | false |
Carrot | true |
Kind Id |
---|
Fruit |
Root |
Food Id | Kind Id |
---|---|
Apple | Fruit |
Orange | Fruit |
Yam | Root |
Carrot | Root |