2007-07-14

REST-ified Message Queue on HTTP

REST-ified Message Queue on HTTP[1]

In a post to rest-discuss mailing list, Paul Winkler noted that the REST API offered by activemq is not actually REST-ful at all.

* it's not REST that says GET is a read-only operation; it's HTTP. So
their HTTP implementation is broken. Sadly seems to be pretty common.
* DELETE on a URL representing a queue means you want to delete the
entire queue, not a single message!

He ended with a question "Has anybody thought of a good way to model a queue?". The following is my solution.
A message queue can be thought of as having two ends: one for the producers and the other for the consumers. The producers insert messages into the consumers which will be popped by the consumers. A queue is a simple concept and can be implemented simply in programs. However, once networking capability creeps into the requirements, things can get complicated because network is not reliable. Simple implementations of queue work only because it assumes data transfer within a process is reliable which is a reasonable assumption for in-process activity.
When network is factored in, new problems crop up:

How do you reliably insert messages?

This problem can be paraphrased as doing reliable delivery over HTTP. There are two techniques that address this: Bill de hÓra's HTTPLR and Paul Prescod's.
Using any of these two technique, a producer can POST a message to the same queue URL shared by other producers.

==> POST /queue
    some message
<== 201 Created
    Location: /message/433

How do you reliably allow multiple consumers to pop messages?

This problem expands on the previous one. There is a pop action that has to be considered. In a queue, the pop operation removes and return the head of the queue to the caller. With ActiveMQ, you'd send a DELETE to the queue URL. As Paul Winkler noted, that means the DELETE is applied to the queue, not to individual message.
If one can get a list of messages, then one can repeatedly attempt to pop (by doing DELETE) on each messages. That is, one repeatedly poll for an available message.

==> GET /queue/messages
<== 200 OK
    [list of messages]

// oops, some other client has taken msg #100
==> DELETE /message/100
<== 410 Gone 

==> DELETE /message/102
<== 200 OK
    [message 102]

The polling technique is a form of busy waiting. It is an efficient technique if there is a high probability that the conditions will be met within a short period of time relative to the frequency of polling. To reduce collisions (the 410 Gone in the example above), consumers can try to DELETE randomly.

However, with an increase in number of clients, the frequency of collision increases and efficiency decreases. Furthermore, network is not reliable and client/server can die at any point. When a client issues a DELETE, and it does not receive any response back from the server, it cannot be certain that a message has not been popped for it.

Desirable Properties

So, what are the desirable properties a solution must have?

Client should not Care About Other Client's Existence

The reason polling is done by clients in the example above is because a client has to care about other clients. If it knows for certain that it is the only client, it does not have to poll.
We can put each client in its own virtual world where it can think that it is the only client in existence so it does not have to poll. This is accomplished by giving each client its own access point. Let's implement it by giving each client its own personalised de-queue process resource identified by a unique (per client) URL.

Forgiving to Unreliable Network, Client and Server

In real world, things break unexpectedly², be it some network admin unplugging a port incidentally, or the server runs out of disk and errors out temporarily, or the client was killed by a OOM condition.
Any implementation of a reliable message queue must be able to at least recover from the above breakages. We can simplify the possible conditions to two cases:

Server does not receive the request
Client does not receive the response

as the other cases depends on how the client and server are implemented, which I am saving for future topic.

A Solution

The only thing required is for the consumer client to have its own consumer ID. This is not an unreasonable requirement and can be implemented easily (GUID or other simpler mechanisms).
A de-queue is implemented as a sequence of one or more GET requests followed with a DELETE. A GET causes the server to returns a previously bound message to the consumer, if there is any. If there isn't any, return the next unbound message. A DELETE on a message removes the binding on that message.

==> GET /queue/pending_message?consumer_id=12345
<== 200, MSG #32

// repeated GET should yield the same message
==> GET /queue/pending_message?consumer_id=12345
<== 200, MSG #32


// A way to access a particular message
==> GET /queue/message/32
<== 200, MSG #32

// Unbind message 32
==> DELETE /queue/message/32
<== 204

// Since no message is bound to the customer, this GET binds one
GET /queue/pending_message?consumer_id=12345
<== 200, MSG #34

DELETE /queue/message/34
<== 204

// A personal access point allows personalised retrieval criteria
POST /queue/pending_message/filter?consumer_id=12345
<== Return oldest message first
<== 204

// Now retrieves from the oldest to the newest
GET /queue/pending_message?consumer_id=12345
<== 200, MSG #1

^[1] I explicitly mentioned 'on HTTP' because there are groups who seem to tied REST and HTTP too strongly for my liking; some even goes as far as equating REST to HTTP.
^[2] Does expecting things to break unexpectedly make such breakage expected?

2007-05-17

Surrogate key does not make normalisation

I am not a surrogate-key lover but I understand that there are people who are and situations where it is advantageous.

However, it is not a magic dust and sprinkling it liberally over a database design does not automatically normalise the design. Such myth, unfortunately, has not gone away and seems to be spreading wider. It is as if for every person enlightened, three more replaces him/her.

I think the myth is propagated by the popularity of ORM and difficulties of many ORMs in dealing with relation having non-surrogate keys. But that's a topic for some other time.

Take, for example, the following design:

Surrogate Key
Food

Food Id	`Food`
1	Apple
2	Orange
3	Yam
4	Carrot

Kind

Kind Id	`Kind`
1	Fruit
2	Root

Food Kind

Food Id	Kind Id
1	1
2	1
3	2
4	2

Would you believe that it is equally as normalised as the following:
Natural Key
Food

Food Id
Apple
Orange
Yam
Carrot

Kind

Kind Id
Fruit
Root

Food Kind

Food Id	Kind Id
Apple	Fruit
Orange	Fruit
Yam	Root
Carrot	Root

and vice-versa.

The natural key version is not any more de-normalised nor redundant than the surrogate key version. There is no more redundancy in any one version than the other. One may be tempted to point at the Food Kind relation of the Natural Key version and shout "AHA! There are two 'Fruit' and two 'Root' entries under the Kind Id column. That's redundant." I'd shout back "'AHA!' There are two '1' and two '2' entries at the same place in the Surrogate Key version.

They both are keys. They uniquely identify tuples. If you are referring to the same tuple multiple times, there will be that many copy of the key be it represented with integers or strings.

This article is not debating which key is 'better'. There are many articles addressing the pros and cons of each key type. Simply google for "natural surrogate key".

There are other kind of having multiple copies of the same values. It is known as denormalised state.
De-Normalisation and redundancy goes hand-in-hand. A relation (table in SQL-speak) contains redundant information if the entries contain derivable information. The goal of normalisation is to perform loss-less decomposition such that there is no more derivable information in any relation.

De-Normalised Design
Food

Food Id	Is Crunchy
Apple	true
Orange	false
Yam	false
Carrot	true

Kind

Kind Id
Fruit
Root

Food Kind

Food Id	Kind Id	Is Crunchy
Apple	Fruit	true
Orange	Fruit	false
Yam	Root	false
Carrot	Root	true

The above is de-normalised design because the Is Crunchy field in Food Kind relation is derivable from Food itself.
Normalised Design
Food

Food Id	Is Crunchy
Apple	true
Orange	false
Yam	false
Carrot	true

Kind

Kind Id
Fruit
Root

Food Kind

Food Id	Kind Id
Apple	Fruit
Orange	Fruit
Yam	Root
Carrot	Root

2007-05-11

Hypermedia, the forgotten aspect of REST

In a post to rest-discuss mailing list, Roy Fielding quizzed if anyone else know what was missing from the REST presentation slides made by Steve Bjorg.
Without looking at the slides and the following posts, can you guess what was missing?
If someone is going to give a presentation on REST, it is likely that they will emphasise the more popular aspects of REST: resource, resource identification, representation of the resource and the uniform methods for interacting and manipulating the resource.
Many presentations that I have seen would further concentrate on the look-and-feel of "REST-ful" URLs, giving the audiences the wrong message that REST is primarily about URL construction. Josh Sled brought a good example of how even a big-name (I'm not telling who, go figure it out from the look-and-feel of the URL) development group got REST wrong: http://developer.yahoo.com/photos/V3.0/createAlbum.html. Whatever do they mean by "Passed in as a REST parameter.", I wonder. Quick, the server ate too many REST parameters, pass in the non-REST ones.
What is missing is hypermedia. It's the last constraint mentioned in the dissertation. http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm section 5.1.5:

REST is defined by four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.

But whatever does "hypermedia as the engine of application state" mean? I like Mark Baker's explanation:

Hypermedia as the engine of application state" is simply a long-winded way of saying that REST clients make progress via links embedded in the data they retrieve. That sounds like an almost information-free statement, but it's not difficult to violate this constraint, and it can be quite costly to do so as well.

For example, if you have a blog, and there is an entry for 2007-05-11, from your home page, there should be a link pointing to that entry. User and client should not be expected to guess how to get there.
Just as state is explicitly represented and transferred, so must the transitions.

2007-03-28

mdadm Cares About Partition Type

mdadm now starts caring for the partition type when creating a raid volume on partitions. It has to be 'fd' (Linux raid auto).

[used fdisk to create sda1, sda2, sdb1, sdb2. 
sda1 and sdb2's type is 82 (linux swap).
sda2 and sdb2's type is 83 (linux)]

~ $ sudo mdadm --create  /dev/md3 -n 2 -l raid1 --name='raid1 volume set' /dev/sda2 /dev/sdb2

mdadm: /dev/sda2 is too small: 0K
mdadm: create aborted



~ $ sudo mdadm --version

mdadm - v2.5.6 - 9 November 2006


[used fdisk to change sda2 and sdb2 to fd.
"Changed system type of partition 2 to fd (Linux raid autodetect)"]


~ $ sudo mdadm --create  /dev/md3 -n 2 -l raid1 --name='raid1 volume set' /dev/sda2 /dev/sdb

mdadm: array /dev/md3 started.

(originally from http://microjet.ath.cx/WebWiki/2007.03.28_mdadm_Cares_About_Partition_Type.html)

2007-03-15

1994 nissan sentra fuel injector wierdness

We have two 1994 Nissan Sentra here, Put-Put and Pit-Pit. Put is 160K and Pit is 200K. They both still run strong but that is thanks to having 2 fuel injectors replaced.

It started last November when Pit's cylinder #1 fuel injector went bad. It was so unexpected because fuel injector rarely goes bad. I chalked it up to bad fuel filter. So I replaced the fuel injector and filter.

Then two weeks later, Put's cylinder #1 fuel injector also went bad. That was so surprising. What was the chance of two Sentras having bad fuel injector within weeks of each other. I can't blame it on gasoline because the other car, a 1991 Mazda Protege, was not experiencing any problem, and googling for the problem did not indicate that there was a widespread problem with Sentra's fuel injector. I replaced the fuel injector and filter.

The next day, Put's cylinder #3 fuel injector went bad. In less than 24 hours! I replaced it, and there is no more problem so far.

But last week, Pit's cylinder #4 fuel injector went dead again. By this time, I was so good at replacing Sentra's fuel injector, I did the whole thing from start to finish in under 30 minutes. For comparison purpose, the first time I did it I took 6 hours splitted across 2 days.