Pen goes extremely slow occasionally

From: Chris St. Pierre <stpierre_at_NebrWesleyan.edu>
Date: Mon, 3 Nov 2008 10:12:00 -0600 (CST)

We've been using pen to load balance among our four LDAP servers for
several months now, and are quite pleased with it. Last week, quite
suddenly, pen would occasionally (and suddenly) start taking _ages_ to
answer requests. Sometimes restarting it would solve the problem;
some times it wouldn't. At first we thought the problem was the
number of connections to pen, but we've seen this happen with no
connections at all.

This is our pen command line:

pen -u pen -l /var/log/pen/ldap.log -p /var/run/pen/ldap.pid \
  -n -C 127.0.0.1:10389 -x 2048 -d 10.1.1.57:389 chico groucho harpo zeppo

Strangely enough, despite the fact that I've given it the -n flag, it
doesn't appear to be in no-block mode:

# penctl 127.0.0.1:10389 mode
block no delayed_forward no hash no roundrobin no stubborn no weight no prio

Putting the server in no-block mode doesn't appear to help.

Running an LDAP query and looking at the debug logs, it looks like pen
doesn't even log the connection for several seconds. E.g.:

% date; time ldapsearch -xLLL -h 10.1.1.57 "uid=stpierre" dn
Mon Nov 3 10:02:20 CST 2008
dn: uid=stpierre,ou=People,o=NebrWesleyan.edu,o=isp

ldapsearch -xLLL -h 10.1.1.57 "uid=stpierre" dn 0.00s user 0.00s system 0% cpu 14.007 total

(14 seconds!! Normal LDAP queries take approx. 0.015 seconds.)

That doesn't appear in the pen logs for eleven full seconds:

Nov 3 10:02:31 zeppo pen: Client 10.9.3.7 has index 46
Nov 3 10:02:31 zeppo pen: lookup_client returns 46
Nov 3 10:02:31 zeppo pen: Trying server 3 at time 1225728151
Nov 3 10:02:31 zeppo pen: match_acl(0, 117639434)
Nov 3 10:02:31 zeppo pen: Successful connect to server 3
Nov 3 10:02:31 zeppo pen: Client 10.9.3.7 has index 46 and server 3

(Yes, clocks are all synched.)

Those logs were collected when only two other hosts were trying to use
pen, both low volume. There were a total of 75 connections to pen;
we've done _loads_ more than that before.

Nothing obviously changed before this started.

I'm running RHEL 5 x86_64.

All ideas are welcome. Thanks!

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
Received on Mon Nov 03 2008 - 17:19:37 CET

This archive was generated by hypermail 2.2.0 : Mon Nov 03 2008 - 17:19:37 CET