2008-03-11

qmail in a resource-constrained environment

An important thing to note when using a virtualised environment such as OpenVZ is that resources are artificially constrained. This can cause all sorts of odd, head-scratching behaviour.

My OpenVZ VE (plan VZ 128) from quantact.com runs apache2 and qmail (as a spooler).
I had to lower tcpserver's connection limit to 3 (tcpserver -c 3) and qmail-send's concurrencyremote to 1 (echo 1 > /var/qmail/control/concurrencyremote).

Each incoming SMTP connection triggers an RBL lookup and may trigger a spamassassin check. tcpserver's default incoming connection limit is 40, and there just aren't enough resources to support 40 simultaneous spamassassin checks (1 spamd with 40 spamc processes). The checks will fail with various puzzling error messages. The only clear indication that they are failing because of resource constraints comes from comparing the contents of /proc/user_beancounters before and after the failures.
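
The before/after comparison can be scripted. Below is a minimal sketch, assuming the usual /proc/user_beancounters layout in which each data row ends with the columns held, maxheld, barrier, limit and failcnt, and assuming a single container (as seen from inside a VE). The function name beancounter_diff and the snapshot paths in the comments are made up for illustration.

```shell
#!/bin/sh
# beancounter_diff BEFORE AFTER
# Prints every resource whose failcnt (last column) grew between two
# saved snapshots of /proc/user_beancounters.
#
# Hypothetical usage (snapshot paths are placeholders; run as root):
#   cat /proc/user_beancounters > /tmp/ubc.before
#   ... reproduce the failure ...
#   cat /proc/user_beancounters > /tmp/ubc.after
#   beancounter_diff /tmp/ubc.before /tmp/ubc.after
beancounter_diff() {
    awk '
        # Data rows end in a numeric failcnt; this skips the Version and header lines.
        NF >= 6 && $NF ~ /^[0-9]+$/ {
            name = $(NF - 5)    # resource name (the first row also carries "uid:")
            if (FILENAME == ARGV[1]) {
                before[name] = $NF
            } else if ((name in before) && $NF + 0 > before[name] + 0) {
                printf "%s: failcnt %s -> %s\n", name, before[name], $NF
            }
        }
    ' "$1" "$2"
}
```

Any resource it prints hit its barrier or limit between the two snapshots, which makes it a likely suspect for the failures seen in that window.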

Furthermore, if the remote host (where the real mail server is) is down long enough for mail to pile up in the queue, then when it comes back up qmail-send will start up to 20 qmail-remote processes (the default) to send the queued mail all at once. Each process sends its mail to the real mail server through a stunnel connection, but there just aren't enough resources to support 20 qmail-remote processes along with 20 stunnel connections. Again, the only way to know this for certain is to compare the contents of /proc/user_beancounters before and after the failures. Otherwise, all you get are error messages like the one below, which could have been caused by any number of things, such as a network or firewall problem:

2008-03-11T06:53:02.78164 2008.03.11 02:53:02 LOG3[11407:3058117552]: SSL_accept: Peer suddenly disconnected


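Once the concurrency level has been brought down, force qmail to retry everything in the queue (qmail-tcpok clears qmail-remote's TCP timeout table, and the ALRM signal tells qmail-send to reschedule every queued message for immediate delivery):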

qmail-qstat; qmail-tcpok; pkill -ALRM qmail-send; qmail-qstat
