Originally posted to the Scary Devil Monastery on 2004.02.09. To be fair, OpenLDAP has gotten much more stable since then.
Rune Kristian Viken wrote:
> I suddenly realize that the ldap database contains far too few users, that is, no users since November seem to exist in the database. Curses fly through the room.
OpenLDAP, at least, seriously sucks. If it weren't better than all the alternatives for our setup, I'd be tempted to find all of the authors and go back in time to kill all four grandparents of each one, just to be sure.
I discovered the marvelous failure-to-sync behaviour in a couple different ways as I underwent the joys of a NIS-to-LDAP migration. The first time around, the database appeared to have taken some corruption and was no longer writing anything to disk, although it was handling new entries fine from the memory allocated by the processes. I discovered this when investigating why simple operations that used to be merely annoyingly slow[1] had become sanity-sucking slow, and when I brought it down for a few seconds to reindex it, as I had done a few times before when I was still working out which attributes to index, slapindex hung.
So, puzzled, and not wanting users on my back while I figured it out[2], I restarted slapd. Or tried to, anyway.
I knew I should never have let the stock of black candles get so low. There weren't enough left for a proper ritual.
Anyway, after much cursing, and the discovery that of course the backups of the database directory directly are *also* corrupt, and more cursing, and telling users who can no longer log in to go away and leave me alone for more than a minute at a time so I CAN ACTUALLY GET SOME WORK DONE, I manage to get a slapcat out of it, that hangs *IN THE MIDDLE OF SPITTING OUT AN ATTRIBUTE IN THE MIDDLE OF AN ENTRY, AND, IMPOSSIBLY, IN A DIFFERENT FSCKING PLACE DEPENDING ON HOW I PIPE IT!*
Ahem, excuse me. I've been told that since the last budget cuts, any more increases on the graduate student health insurance premiums will come out of my paycheck, so I haven't been getting my necessary stress relief in. It's been so bad that I haven't even bothered to replace the batteries in the cattle prod.
So anyway, the last few entries that we added were unrecoverable, including an account made for an attractive undergraduate that my PFY has been getting friendly with. PFY isn't entirely unhappy, because she'll have to come in to see him to get her password reset.
Still, I'm a little nervous, so I make a cronjob to start making straight text LDIF dumps of the database so that in the case that something like this happens again, I'll have an up-to-date backup from the previous night that can't be corrupt.
By now, all the monks here are probably laughing at me, but that's okay. I've laughed at a lot of you, too. What would the monastery be without the camaraderie of mutual schadenfreude?
So of course, a month later, I come in and note that the system is frightfully sluggish again. I fight the adrenaline hit and calmly clone the ldap data to a second server to check it out. Corrupt. Okay, no problem, I know exactly how to get around it now, so downtime will be on the order of seconds. No problem. I'll just check on the backup LDIF just to make sure that everything is ...
WHAT DO YOU MEAN IT TRUNCATED IN THE MIDDLE OF AN ENTRY *EVEN BEFORE* THE END OF MY LAST MANUAL DUMP THAT WAS FINE?! IT WAS RUNNING PERFECTLY WHEN I LEFT LAST NIGHT!
*cough*
Anyway, no helping it now, so I recover as much as I can, and reload the system. The users don't even notice, with the exception of the owners of a couple recently recreated accounts, including the previously mentioned undergraduate, who is now starting to get a little suspicious at how often she's ending up in the PFY's office. PFY takes one look at my expression and wisely keeps his comments neutral.
So, anyway, I have a much more solid workaround in place now, but describing it would be UI, so all you buggers who were snickering earlier can just hang if you need it. It should have been transparently obvious to me in the first place, so it should be transparently obvious to you by now.
And I haven't had any problems with it since then, probably because whatever mocking power it is that keeps poking at my life has decided that there's easier targets. Like the versioning issues with perl and db, for instance, that cause perl to spin its wheels endlessly on some indexing operations, or the half-dozen lines of code I added to my automatic updates system that failed last Saturday in a manner utterly inconsistent with the perfectly clean results I observed on my independent test systems only last Friday. But that's another rant.
> I need some whisky.
Annoyingly, I now find nauseating all the stuff I used to like when I was a teenager, though at last year's Usenix I discovered a new fondness for margaritas if sufficiently smooth (a concept which nobody seems to understand out where I live, where margaritas seem to be made almost entirely out of poor tequila, though I suppose I can start making my own). What are people recommending now?
[1] Lightweight my ass. The fact that X.509 has the weight of an 18-wheel rig doesn't make a minivan something you shove in your backpack.
[2] Yes, I know what a replica server is. It wasn't working for reasons I'm not going to get into. Go away.

OpenLDAP Really Really SUCKS
I agree, and have experienced all of these same issues ...
If the OpenLDAP team blames Berkeley DB, why the hell don't they use some other backend database as the default?
My problem is simple but very critical: my LDAP data regularly gets corrupted, with errors pointing to an indexing issue. I tried turning off all indexing and it worked fine, but LDAP_SEARCH was damn slow. In my case indexing is vital.
I don't know why OpenLDAP even exists; it's good only as a toy for a very small number of entries (I am talking about fewer than 100).
If a solution exists, I would be more than glad to know it. Otherwise I will be shifting to the Great Microsoft World, where everything is at least usable!
I am really, really pissed off with OpenLDAP data corruption.
I posted on several mailing lists and http://www.ldapguru.com, but no one has ever replied ... !!!
If you know any solution other than turning indexes off, that will be worth more than a bottle of wine to me.
It's a pity...
... that everything else is worse. Even with the problems, I'm very happy to be finally through with NIS, and I'm not ready to move to a complete Kerberos infrastructure. That didn't keep me from being royally pissed at the time, of course.
To be fair, with the current version (2.1.30) my only problem seems to be with very sporadic performance breakdowns (it grinds nearly to a halt every other month or so), and on SMP systems only (my uniprocessor servers are fine). Those can be fixed simply by restarting slapd periodically. Changing backends didn't help; I experienced exactly the same problem with LDBM as with BDB. I suspect a very obscure race condition somewhere.
I'm running about 1200 entries on the SMP system, and only 200 on the uniprocessor system, though, which may have something to do with it.
By the way, I have seen ActiveDirectory collapse as well, and it's far harder to clean up, so going to Microsoft won't help you there, plus it gets you all of the baggage of a Microsoft server. If you want commercial support, you might want to give Netscape Directory Server a try, or go with one of the places that provides support for OpenLDAP. (I'm not endorsing any of the places in those links, but just noting them as a possibility.)
It's scary how close our situations are...
I also have a script to backup my ldap daily.
I even modified the postfix config NOT to start up if LDAP is in its successfully-started-but-hung state.
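One way to detect that started-but-hung state is to ask slapd a trivial question and give up if no answer arrives within a few seconds. What follows is only a minimal sketch, not the actual postfix change described above: the function name ldap_responds, the URI, and the five-second default are illustrative assumptions, and it assumes the server permits an anonymous read of the root DSE and that coreutils timeout is installed.

```shell
#!/bin/sh
# Sketch: succeed only if slapd answers a trivial query in time.
# Usage: ldap_responds URI [SECONDS]   (name and defaults are hypothetical)
ldap_responds() {
    uri=$1
    limit=${2:-5}
    # Read the root DSE (empty base DN, base scope). A hung slapd may accept
    # the connection but never reply, in which case timeout(1) kills the
    # client and the function returns non-zero.
    timeout "$limit" ldapsearch -x -H "$uri" -s base -b "" \
        "(objectClass=*)" namingContexts >/dev/null 2>&1
}
```

A start script for a dependent service could then gate on it, for example: ldap_responds ldap://localhost && /etc/init.d/postfix start (the init script path is an assumption).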
Here is my restore script:
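# Stop slapd, then kill any hung processes that ignored the stop
/etc/init.d/slapd stop
killall -9 slapd
# Throw away the corrupt database directory and recreate it empty
rm -rf /var/lib/openldap-data
mkdir /var/lib/openldap-data
chown ldap:ldap /var/lib/openldap-data
# Start a fresh slapd and reload every entry from the LDIF backup
/etc/init.d/slapd start
ldapadd -x -D cn=admin,dc=blah,dc=co,dc=za -W < backup.ldif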
I've done this many times, and all forums do is blame filesystem/os/db instability... but no clue on how to resolve it.
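Since a restore like the one above deletes the live data directory before loading the backup, it may be worth checking first that the LDIF is not itself truncated. The sketch below is a heuristic, not a guarantee: it assumes a slapcat-style dump in which every entry, including the last, is followed by a blank line, and all three function names are hypothetical. A file cut off exactly at an entry boundary would pass the end-of-file test, which is why a minimum entry count is checked as well.

```shell
#!/bin/sh
# Sketch: sanity-check an LDIF backup before wiping the database.

# Count entries: each one begins with a "dn:" (or base64 "dn::") line.
ldif_entry_count() {
    grep -c '^dn:' "$1"
}

# True if the file is non-empty and its final line is blank, i.e. the
# dump was not cut off in the middle of an entry or attribute.
ldif_ends_cleanly() {
    [ -s "$1" ] && [ -z "$(tail -n 1 "$1")" ]
}

# Usage: ldif_safe_to_restore FILE MIN_ENTRIES
ldif_safe_to_restore() {
    ldif_ends_cleanly "$1" && [ "$(ldif_entry_count "$1")" -ge "$2" ]
}
```

For a directory of roughly 1200 entries, something like ldif_safe_to_restore backup.ldif 1000 || exit 1 placed at the top of the restore script would stop it before the rm -rf.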
-cry-
I think LDAP is the only really good and neat solution, but if I cannot get some stability (the LDAP db corrupts practically after every system crash and some plain reboots) ...
Marius van Wyk
LDAP backups
Make sure your backup scripts generate staggered backups over the last week or two. It's quite possible to generate two days in a row worth of corrupt backups without realizing it.
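A staggered scheme along these lines could be sketched as follows. This is an illustration under stated assumptions rather than a tested production script: the function names, the two-week window, and the file naming are all made up for the example, and it assumes slapcat can run against the default configuration and that date supports +%s.

```shell
#!/bin/sh
# Sketch: keep one dump per day for two weeks, reusing the oldest slot.

# Map a timestamp (seconds since the epoch) to a slot number 0..13.
backup_slot() {
    echo $(( ($1 / 86400) % 14 ))
}

# Usage: staggered_backup DIRECTORY
# Dump to a temporary file and move it into today's slot only if slapcat
# exited cleanly and wrote something, so a failed run never overwrites
# an older, possibly good, backup.
staggered_backup() {
    dir=$1
    slot=$(backup_slot "$(date +%s)")
    tmp="$dir/ldap.ldif.tmp"
    if slapcat > "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$dir/ldap.$slot.ldif"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Run nightly from cron against a directory such as /var/backups/ldap (a hypothetical path), this leaves fourteen files, so a corruption that goes unnoticed for a few days still has an older dump behind it.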
Also, I've only ever had problems on SMP machines. You may want to make your main auth server single-processor and see if that helps.