2007-05-17

Surrogate key does not make normalisation

I am not a surrogate-key lover but I understand that there are people who are and situations where it is advantageous.

However, it is not a magic dust and sprinkling it liberally over a database design does not automatically normalise the design. Such myth, unfortunately, has not gone away and seems to be spreading wider. It is as if for every person enlightened, three more replaces him/her.

I think the myth is propagated by the popularity of ORM and difficulties of many ORMs in dealing with relation having non-surrogate keys. But that's a topic for some other time.

Take, for example, the following design:
 
Surrogate Key
Food
Food IdFood
1Apple
2Orange
3Yam
4Carrot
Kind
Kind IdKind
1Fruit
2Root
Food Kind
Food IdKind Id
11
21
32
42


Would you believe that it is equally as normalised as the following:
Natural Key
Food
Food Id
Apple
Orange
Yam
Carrot
Kind
Kind Id
Fruit
Root
Food Kind
Food IdKind Id
AppleFruit
OrangeFruit
YamRoot
CarrotRoot

and vice-versa.

The natural key version is not any more de-normalised nor redundant than the surrogate key version. There is no more redundancy in any one version than the other. One may be tempted to point at the Food Kind relation of the Natural Key version and shout "AHA! There are two 'Fruit' and two 'Root' entries under the Kind Id column. That's redundant." I'd shout back "'AHA!' There are two '1' and two '2' entries at the same place in the Surrogate Key version.

They both are keys. They uniquely identify tuples. If you are referring to the same tuple multiple times, there will be that many copy of the key be it represented with integers or strings.

This article is not debating which key is 'better'. There are many articles addressing the pros and cons of each key type. Simply google for "natural surrogate key".

There are other kind of having multiple copies of the same values. It is known as denormalised state.
De-Normalisation and redundancy goes hand-in-hand. A relation (table in SQL-speak) contains redundant information if the entries contain derivable information. The goal of normalisation is to perform loss-less decomposition such that there is no more derivable information in any relation.
 
De-Normalised Design
Food
Food IdIs Crunchy
Appletrue
Orangefalse
Yamfalse
Carrottrue
Kind
Kind Id
Fruit
Root
Food Kind
Food IdKind IdIs Crunchy
AppleFruittrue
OrangeFruitfalse
YamRootfalse
CarrotRoottrue


The above is de-normalised design because the Is Crunchy field in Food Kind relation is derivable from Food itself.
Normalised Design
Food
Food IdIs Crunchy
Appletrue
Orangefalse
Yamfalse
Carrottrue
Kind
Kind Id
Fruit
Root
Food Kind
Food IdKind Id
AppleFruit
OrangeFruit
YamRoot
CarrotRoot

No comments: