2007-05-17

Surrogate key does not make normalisation

I am not a surrogate-key lover but I understand that there are people who are and situations where it is advantageous.

However, it is not a magic dust and sprinkling it liberally over a database design does not automatically normalise the design. Such myth, unfortunately, has not gone away and seems to be spreading wider. It is as if for every person enlightened, three more replaces him/her.

I think the myth is propagated by the popularity of ORM and difficulties of many ORMs in dealing with relation having non-surrogate keys. But that's a topic for some other time.

Take, for example, the following design:
 
Surrogate Key
Food
Food IdFood
1Apple
2Orange
3Yam
4Carrot
Kind
Kind IdKind
1Fruit
2Root
Food Kind
Food IdKind Id
11
21
32
42


Would you believe that it is equally as normalised as the following:
Natural Key
Food
Food Id
Apple
Orange
Yam
Carrot
Kind
Kind Id
Fruit
Root
Food Kind
Food IdKind Id
AppleFruit
OrangeFruit
YamRoot
CarrotRoot

and vice-versa.

The natural key version is not any more de-normalised nor redundant than the surrogate key version. There is no more redundancy in any one version than the other. One may be tempted to point at the Food Kind relation of the Natural Key version and shout "AHA! There are two 'Fruit' and two 'Root' entries under the Kind Id column. That's redundant." I'd shout back "'AHA!' There are two '1' and two '2' entries at the same place in the Surrogate Key version.

They both are keys. They uniquely identify tuples. If you are referring to the same tuple multiple times, there will be that many copy of the key be it represented with integers or strings.

This article is not debating which key is 'better'. There are many articles addressing the pros and cons of each key type. Simply google for "natural surrogate key".

There are other kind of having multiple copies of the same values. It is known as denormalised state.
De-Normalisation and redundancy goes hand-in-hand. A relation (table in SQL-speak) contains redundant information if the entries contain derivable information. The goal of normalisation is to perform loss-less decomposition such that there is no more derivable information in any relation.
 
De-Normalised Design
Food
Food IdIs Crunchy
Appletrue
Orangefalse
Yamfalse
Carrottrue
Kind
Kind Id
Fruit
Root
Food Kind
Food IdKind IdIs Crunchy
AppleFruittrue
OrangeFruitfalse
YamRootfalse
CarrotRoottrue


The above is de-normalised design because the Is Crunchy field in Food Kind relation is derivable from Food itself.
Normalised Design
Food
Food IdIs Crunchy
Appletrue
Orangefalse
Yamfalse
Carrottrue
Kind
Kind Id
Fruit
Root
Food Kind
Food IdKind Id
AppleFruit
OrangeFruit
YamRoot
CarrotRoot

2007-05-11

Hypermedia, the forgotten aspect of REST

In a post to rest-discuss mailing list, Roy Fielding quizzed if anyone else know what was missing from the REST presentation slides made by Steve Bjorg.
Without looking at the slides and the following posts, can you guess what was missing?
If someone is going to give a presentation on REST, it is likely that they will emphasise the more popular aspects of REST: resource, resource identification, representation of the resource and the uniform methods for interacting and manipulating the resource.
Many presentations that I have seen would further concentrate on the look-and-feel of "REST-ful" URLs, giving the audiences the wrong message that REST is primarily about URL construction. Josh Sled brought a good example of how even a big-name (I'm not telling who, go figure it out from the look-and-feel of the URL) development group got REST wrong: http://developer.yahoo.com/photos/V3.0/createAlbum.html. Whatever do they mean by "Passed in as a REST parameter.", I wonder. Quick, the server ate too many REST parameters, pass in the non-REST ones.
What is missing is hypermedia. It's the last constraint mentioned in the dissertation. http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm section 5.1.5:
REST is defined by four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.
But whatever does "hypermedia as the engine of application state" mean? I like Mark Baker's explanation:
Hypermedia as the engine of application state" is simply a long-winded way of saying that REST clients make progress via links embedded in the data they retrieve. That sounds like an almost information-free statement, but it's not difficult to violate this constraint, and it can be quite costly to do so as well.
For example, if you have a blog, and there is an entry for 2007-05-11, from your home page, there should be a link pointing to that entry. User and client should not be expected to guess how to get there.
Just as state is explicitly represented and transferred, so must the transitions.