2007-07-14

REST-ified Message Queue on HTTP[1]

In a post to the rest-discuss mailing list, Paul Winkler noted that the REST API offered by ActiveMQ is not actually RESTful at all.

* it's not REST that says GET is a read-only operation; it's HTTP. So
their HTTP implementation is broken. Sadly seems to be pretty common.
* DELETE on a URL representing a queue means you want to delete the
entire queue, not a single message!

He ended with the question "Has anybody thought of a good way to model a queue?". The following is my solution.
A message queue can be thought of as having two ends: one for the producers and the other for the consumers. Producers insert messages into the queue, and consumers pop them off. A queue is a simple concept and can be implemented simply within a program. However, once networking capability creeps into the requirements, things get complicated because the network is not reliable. Simple queue implementations work only because they assume data transfer is reliable, which is a reasonable assumption for in-process activity.
When the network is factored in, new problems crop up:

How do you reliably insert messages?

This problem can be paraphrased as achieving reliable delivery over HTTP. There are two techniques that address this: Bill de hÓra's HTTPLR and Paul Prescod's approach.
Using either of these two techniques, a producer can POST a message to the same queue URL shared by other producers.
==> POST /queue
    some message
<== 201 Created
    Location: /message/433
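As a concrete sketch of the producer side: the hostname, timeout, and retry count below are illustrative assumptions, and a bare retry loop like this does not by itself give the delivery guarantees of HTTPLR (a retry after a lost response may enqueue a duplicate).

```python
# Hypothetical producer: POST a message to the shared queue URL and
# retry on network failure until the server acknowledges with 201.
import http.client

def enqueue(host, message, retries=3):
    for _ in range(retries):
        conn = http.client.HTTPConnection(host, timeout=5)
        try:
            conn.request("POST", "/queue", body=message)
            resp = conn.getresponse()
            if resp.status == 201:
                # e.g. Location: /message/433
                return resp.getheader("Location")
        except OSError:
            pass  # network hiccup: try again
        finally:
            conn.close()
    raise RuntimeError("enqueue failed after %d attempts" % retries)
```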

How do you reliably allow multiple consumers to pop messages?

This problem expands on the previous one: there is now a pop action to consider. In a queue, the pop operation removes the head of the queue and returns it to the caller. With ActiveMQ, you'd send a DELETE to the queue URL. As Paul Winkler noted, that means the DELETE is applied to the whole queue, not to an individual message.
If a client can get a list of messages, it can repeatedly attempt to pop (by issuing a DELETE) each message. That is, it repeatedly polls for an available message.
==> GET /queue/messages
<== 200 OK
    [list of messages]

// oops, some other client has taken msg #100
==> DELETE /message/100
<== 410 Gone 

==> DELETE /message/102
<== 200 OK
    [message 102]

The polling technique is a form of busy waiting. It is efficient only if there is a high probability that the condition will be met within a short period relative to the polling frequency. To reduce collisions (the 410 Gone in the example above), consumers can attempt their DELETEs in random order.

However, as the number of clients increases, the frequency of collisions rises and efficiency decreases. Furthermore, the network is not reliable, and either client or server can die at any point. When a client issues a DELETE and receives no response back from the server, it cannot be certain whether a message has been popped for it or not.
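To make the polling behaviour concrete, here is a minimal sketch of the consumer loop; `try_delete` stands in for an HTTP DELETE on a message URL, and all names are illustrative:

```python
# Sketch of a polling consumer: attempt to pop messages in random
# order, treating 410 Gone as a collision with another consumer.
import random

def poll_pop(message_ids, try_delete):
    """Try to pop one message; return its id, or None if all are gone."""
    candidates = list(message_ids)
    random.shuffle(candidates)      # random order reduces 410 collisions
    for msg_id in candidates:
        status = try_delete(msg_id)
        if status == 200:
            return msg_id           # successfully popped this message
        # 410 Gone: another client got there first; try the next one
    return None                     # list exhausted; GET a fresh list and retry
```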

Desirable Properties

So, what are the desirable properties a solution must have?

Clients Should Not Care About Other Clients' Existence

Polling is done by clients in the example above because each client has to care about the other clients. If a client knew for certain that it was the only client, it would not have to poll.
We can put each client in its own virtual world, where it can act as if it were the only client in existence, so it does not have to poll. This is accomplished by giving each client its own access point. Let's implement it by giving each client its own personalised de-queue process resource, identified by a unique (per-client) URL.

Forgiving of an Unreliable Network, Client and Server

In the real world, things break unexpectedly[2], be it a network admin accidentally unplugging a port, a server running out of disk space and erroring out temporarily, or a client being killed by an OOM condition.
Any implementation of a reliable message queue must be able to recover from at least the above breakages. We can simplify the possible failure conditions to two cases:
  • Server does not receive the request
  • Client does not receive the response
as the other cases depend on how the client and server are implemented, which I am saving for a future topic.

A Solution

The only thing required is for the consumer client to have its own consumer ID. This is not an unreasonable requirement, and it can be implemented easily (with a GUID or other, simpler mechanisms).
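For instance, a consumer ID can be minted with a standard UUID:

```python
# Mint a unique consumer ID; a random UUID is one easy mechanism.
import uuid

consumer_id = str(uuid.uuid4())  # unique per consumer instance
```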
A de-queue is implemented as a sequence of one or more GET requests followed by a DELETE. A GET causes the server to return the message previously bound to the consumer, if there is one; if there isn't, the server binds the next unbound message to the consumer and returns it. A DELETE on a message removes the binding on that message.
==> GET /queue/pending_message?consumer_id=12345
<== 200, MSG #32

// repeated GET should yield the same message
==> GET /queue/pending_message?consumer_id=12345
<== 200, MSG #32


// A way to access a particular message
==> GET /queue/message/32
<== 200, MSG #32

// Unbind message 32
==> DELETE /queue/message/32
<== 204

// Since no message is bound to the consumer, this GET binds one
==> GET /queue/pending_message?consumer_id=12345
<== 200, MSG #34

==> DELETE /queue/message/34
<== 204

// A personal access point allows personalised retrieval criteria
==> POST /queue/pending_message/filter?consumer_id=12345
    Return oldest message first
<== 204

// Now retrieves from the oldest to the newest
==> GET /queue/pending_message?consumer_id=12345
<== 200, MSG #1
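The server side of this protocol can be sketched as follows. The in-memory storage and all names here are illustrative assumptions (a real queue would persist both the messages and the bindings), and following the example exchange above, a DELETE both consumes the message and drops its binding:

```python
# Sketch of the server-side binding logic behind /queue/pending_message.
from collections import OrderedDict

class BindingQueue:
    def __init__(self):
        self.messages = OrderedDict()   # msg_id -> body, in arrival order
        self.bindings = {}              # consumer_id -> msg_id

    def put(self, msg_id, body):
        self.messages[msg_id] = body

    def get_pending(self, consumer_id):
        """GET /queue/pending_message: return the message already bound to
        this consumer, or bind the next unbound one. Repeated GETs yield
        the same message, so a lost response is safe to retry."""
        if consumer_id in self.bindings:
            mid = self.bindings[consumer_id]
            return mid, self.messages[mid]
        bound = set(self.bindings.values())
        for mid, body in self.messages.items():
            if mid not in bound:
                self.bindings[consumer_id] = mid
                return mid, body
        return None                     # nothing pending for this consumer

    def delete(self, msg_id):
        """DELETE /queue/message/<id>: consume the message and remove
        its binding, freeing the consumer to bind the next message."""
        self.messages.pop(msg_id, None)
        self.bindings = {c: m for c, m in self.bindings.items()
                         if m != msg_id}
```

Because GET only ever re-reads or extends the binding and DELETE can be retried harmlessly, both halves of the de-queue survive the two failure cases above: a lost request changes nothing, and a lost response can simply be retried.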



[1] I explicitly mentioned 'on HTTP' because there are groups who seem to tie REST and HTTP together too strongly for my liking; some even go as far as equating REST with HTTP.
[2] Does expecting things to break unexpectedly make such breakage expected?