2005-12-09

Implementing application-level timeout

I was tasked with fixing an app that had been in production for two years. The app was having trouble communicating with a new customer's server. At times, it would get stuck in a read() call even though it was supposed to have timed out after 2 minutes.
So I took a look at the source:
 
  public void createConnection()
  {
    socket = new Socket(/*....*/);
    /* socket-level read timeout, in milliseconds */
    socket.setSoTimeout(TWO_MINUTES);
    output = socket.getOutputStream();
    input = socket.getInputStream();
  }

  public void sendPing(Socket s) 
  {
    output.write(PING_MESSAGE);
    /* ... */
    input.read();
    /* ... */
  }


This is the wrong way to implement an application-level timeout. For, you see, it turned out this particular server was sending a TCP Keep Alive packet every 1 minute 15 seconds. I have no idea why they did that (the default value should be around 2 hours), but they did anyway, and they were a valuable customer.
Telling the customer, 'you are incapable of configuring a server' just wouldn't do, and it was also not entirely their fault.


Our fault was in using the socket's timeout as an application-level timeout. Java's Socket#setSoTimeout method (the SO_TIMEOUT option) roughly corresponds to setting the SO_RCVTIMEO socket option in the BSD Socket API: it bounds each blocking read, not the whole exchange.
Each packet received, whether it contains application-level data or not, resets the timeout timer. Those Keep Alive packets were preventing the timer from ever reaching the two-minute mark.
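
To make the per-read nature of the socket timeout concrete, here is a small sketch. It does not reproduce the Keep Alive behaviour described above; it uses ordinary application bytes as a stand-in, and every name in it (SoTimeoutDemo, the constants, the Result holder) is made up for illustration. The peer trickles one byte every 50 ms for about a second while the client reads with a 500 ms socket timeout.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; none of these names come from the original app.
class SoTimeoutDemo {
    static final int SO_TIMEOUT_MS = 500; // limit for each single read()
    static final int GAP_MS = 50;         // pause between bytes from the peer
    static final int BYTE_COUNT = 20;     // 20 * 50 ms = about one second in total

    static class Result {
        int bytesRead;
        boolean timedOut;
        long elapsedMs;
    }

    static Result run() throws Exception {
        Result result = new Result();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // The peer trickles one byte every GAP_MS, then closes the connection.
            Thread peer = new Thread(() -> {
                try (Socket s = server.accept()) {
                    OutputStream out = s.getOutputStream();
                    for (int i = 0; i < BYTE_COUNT; i++) {
                        Thread.sleep(GAP_MS);
                        out.write('x');
                        out.flush();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            peer.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(SO_TIMEOUT_MS);
                InputStream in = client.getInputStream();
                long start = System.nanoTime();
                try {
                    // Each read() call gets a fresh SO_TIMEOUT_MS budget.
                    while (in.read() != -1) {
                        result.bytesRead++;
                    }
                } catch (SocketTimeoutException e) {
                    result.timedOut = true;
                }
                result.elapsedMs = (System.nanoTime() - start) / 1_000_000;
            }
            peer.join();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Result r = run();
        System.out.println("bytes=" + r.bytesRead + " timedOut=" + r.timedOut
                + " elapsedMs=" + r.elapsedMs);
    }
}
```

The read loop is expected to keep going for roughly a second, well past the 500 ms setting, without ever seeing a SocketTimeoutException, because no single read() waits longer than the gap between bytes.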

The easiest solution to this problem was to create a timer service. Each time you want to do a socket operation, you register with the timer service and unregister afterward.

import javautils.fun.VoidToVoid;


public class TimerService extends Thread
{
  /* a singleton */
  public static synchronized TimerService getInstance() { /* ... */ }

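  public void run()
  {
    try {
      while (true) {
        mutex.acquire();
        try {
          /* interrupt every registered thread whose deadline has passed */
          interruptTimedOutThread();
        } finally {
          mutex.release();
        }
        sleep(ONE_SECOND);
      }
    } catch (InterruptedException e) {
      /* interrupting the timer thread itself stops the service */
    }
  }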

  public void timeout(int timeout_millisecond, VoidToVoid func)
  {
    try {
      register(Thread.currentThread(), timeout_millisecond);
      try {
        func.with();
      } finally {
        unregister(Thread.currentThread());
      }
    /* 
       Convert exceptions caused by interruption outside
       of the register-unregister block because we don't want
       exception handling to be interrupted
    */
    } catch (ClosedByInterruptException e) {
      throw new TimedOutException("timed out", e);
    } catch (InterruptedException e) {
      throw new TimedOutException("timed out", e);
    }
  }
}

public class ConnectionToServer 
{
  public void sendPing(Socket s)
  {
    TimerService timer = TimerService.getInstance();
    try {
      timer.timeout(TWO_MINUTES, new VoidToVoid() {
        public void with() 
        {
          output.write(PING_MESSAGE);
          /* ... */
          input.read();
          /* ... */
        }});
    } catch (TimedOutException e) {
      /* ... */
    }
  }
} 
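
For anyone who wants something that compiles, here is a minimal, self-contained sketch of the same register/unregister idea. It is not the original TimerService: DeadlineWatchdog, its helper methods and this TimedOutException are hypothetical names, java.util.concurrent.Callable stands in for VoidToVoid, and a synchronized map stands in for the mutex. Keep in mind that Thread.interrupt() only unblocks interruptible operations (Thread.sleep, or NIO channels, which throw ClosedByInterruptException); the sketch assumes the wrapped work is one of those.

```java
import java.nio.channels.ClosedByInterruptException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical names throughout; this is not the original TimerService.
class TimedOutException extends Exception {
    TimedOutException(String message, Throwable cause) {
        super(message, cause);
    }
}

class DeadlineWatchdog extends Thread {
    private final Map<Thread, Long> deadlines = new HashMap<>();
    private final long pollMillis;

    DeadlineWatchdog(long pollMillis) {
        this.pollMillis = pollMillis;
        setDaemon(true); // do not keep the JVM alive just for the watchdog
    }

    @Override
    public void run() {
        try {
            while (true) {
                interruptTimedOutThreads();
                Thread.sleep(pollMillis);
            }
        } catch (InterruptedException e) {
            // Interrupting the watchdog itself is treated as a stop request.
        }
    }

    private synchronized void register(Thread t, long millis) {
        deadlines.put(t, System.nanoTime() + millis * 1_000_000L);
    }

    // Returns true when the watchdog already fired for this thread.
    private synchronized boolean unregister(Thread t) {
        return deadlines.remove(t) == null;
    }

    private synchronized void interruptTimedOutThreads() {
        long now = System.nanoTime();
        Iterator<Map.Entry<Thread, Long>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Thread, Long> entry = it.next();
            if (now - entry.getValue() >= 0) {
                entry.getKey().interrupt();
                it.remove(); // interrupt each registration at most once
            }
        }
    }

    // Runs work on the calling thread; nested calls on one thread are not supported.
    <T> T timeout(long millis, Callable<T> work) throws Exception {
        Thread me = Thread.currentThread();
        boolean fired = false;
        register(me, millis);
        try {
            try {
                return work.call();
            } finally {
                fired = unregister(me);
            }
        } catch (InterruptedException | ClosedByInterruptException e) {
            if (fired) {
                throw new TimedOutException("timed out", e);
            }
            throw e; // interrupted by someone else; not our timeout
        } finally {
            if (fired) {
                Thread.interrupted(); // clear the flag left by the watchdog
            }
        }
    }
}
```

Usage would be along the lines of creating one watchdog, calling start() on it once, and then wrapping each exchange in watchdog.timeout(millis, work). Because unregister reports whether the watchdog already fired, the sketch can clear a stray interrupt flag when the work finishes just as the deadline hits.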
 
(originally from http://microjet.ath.cx/WebWiki/2005.12.09_Implementing_Application-Level_Timeout.html)
