Workiis: Surrogate key does not make normalisation

I am not a surrogate-key lover but I understand that there are people who are and situations where it is advantageous.

However, it is not a magic dust and sprinkling it liberally over a database design does not automatically normalise the design. Such myth, unfortunately, has not gone away and seems to be spreading wider. It is as if for every person enlightened, three more replaces him/her.

I think the myth is propagated by the popularity of ORM and difficulties of many ORMs in dealing with relation having non-surrogate keys. But that's a topic for some other time.

Take, for example, the following design:

Surrogate Key
Food

Food Id	`Food`
1	Apple
2	Orange
3	Yam
4	Carrot

Kind

Kind Id	`Kind`
1	Fruit
2	Root

Food Kind

Food Id	Kind Id
1	1
2	1
3	2
4	2

Would you believe that it is equally as normalised as the following:
Natural Key
Food

Food Id
Apple
Orange
Yam
Carrot

Kind

Kind Id
Fruit
Root

Food Kind

Food Id	Kind Id
Apple	Fruit
Orange	Fruit
Yam	Root
Carrot	Root

and vice-versa.

The natural key version is not any more de-normalised nor redundant than the surrogate key version. There is no more redundancy in any one version than the other. One may be tempted to point at the Food Kind relation of the Natural Key version and shout "AHA! There are two 'Fruit' and two 'Root' entries under the Kind Id column. That's redundant." I'd shout back "'AHA!' There are two '1' and two '2' entries at the same place in the Surrogate Key version.

They both are keys. They uniquely identify tuples. If you are referring to the same tuple multiple times, there will be that many copy of the key be it represented with integers or strings.

This article is not debating which key is 'better'. There are many articles addressing the pros and cons of each key type. Simply google for "natural surrogate key".

There are other kind of having multiple copies of the same values. It is known as denormalised state.
De-Normalisation and redundancy goes hand-in-hand. A relation (table in SQL-speak) contains redundant information if the entries contain derivable information. The goal of normalisation is to perform loss-less decomposition such that there is no more derivable information in any relation.

De-Normalised Design
Food

Food Id	Is Crunchy
Apple	true
Orange	false
Yam	false
Carrot	true

Kind

Kind Id
Fruit
Root

Food Kind

Food Id	Kind Id	Is Crunchy
Apple	Fruit	true
Orange	Fruit	false
Yam	Root	false
Carrot	Root	true

The above is de-normalised design because the Is Crunchy field in Food Kind relation is derivable from Food itself.
Normalised Design
Food

Food Id	Is Crunchy
Apple	true
Orange	false
Yam	false
Carrot	true

Kind

Kind Id
Fruit
Root

Food Kind

Food Id	Kind Id
Apple	Fruit
Orange	Fruit
Yam	Root
Carrot	Root

Workiis

2007-05-17

Surrogate key does not make normalisation

No comments: