[Eug-lug] Mailing list search

Garl Grigsby badd_karma at comcast.net
Wed Sep 5 15:22:19 PDT 2007


I am currently the administrator for a mailing list archive/search site 
for a number of internal-only mailing lists (about 20). The current 
search engine is based on HTDig and works "ok". By ok I mean it seems to 
work most of the time. Occasionally it fails to reindex for no reason I 
can find, it is horrifically slow to index (~40hrs on a Dual Processor 
Opteron w/4GB of RAM), and the database that the search indices are 
stored in corrupts far too easily. To add to all of this, development of 
HTDig seems to have stalled or died completely (not sure which).

Due to a hardware failure on a box that was not being backed up (this 
was not my box), and a few personnel changes, I am now forced to rebuild 
the archive/search system from the ground up. What I am being given is 
access to the mbox files for each mailing list, and pretty much nothing 
else. I have no access to the admin functions on the mailing list 
server, nor can I get any changes made to its configuration.

I am leaning toward using MHonArc to create the archive. What I need 
suggestions on is a search engine. I am looking for something that can 
handle a fairly large archive of messages, say on the order of 100-150k 
messages, that can easily index only new messages, and that can search 
groups of messages (i.e. I would like it so that you can search across a 
selected group of mailing lists, all lists, or only a single list). I'd 
also like something that uses a standard DB as the backend (MySQL, 
Postgres, or something similar).
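
To make those requirements concrete, here is a rough sketch of the 
behaviour I am after, not a real solution. It assumes Python's standard 
mailbox module for reading the mbox files and uses SQLite only as a 
stand-in for MySQL/Postgres. The table layout and the function names 
(open_index, index_mbox, search) are made up for illustration, and plain 
LIKE matching would not scale to 100-150k messages.

```python
import mailbox
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the hypothetical index database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " list_name TEXT NOT NULL,"
        " message_id TEXT NOT NULL,"
        " subject TEXT,"
        " body TEXT,"
        " PRIMARY KEY (list_name, message_id))"
    )
    return db


def _body_text(msg):
    """Return the first text/plain part of a message as a string."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            return payload.decode("utf-8", errors="replace")
    return ""


def index_mbox(db, list_name, mbox_path):
    """Index one list's mbox; return how many messages were new."""
    before = db.total_changes
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            message_id = msg.get("Message-ID")
            if not message_id:
                continue  # nothing to key on, so skip it
            # The primary key makes already-indexed messages a no-op.
            db.execute(
                "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
                (list_name, str(message_id).strip(),
                 str(msg.get("Subject", "")), _body_text(msg)),
            )
    finally:
        box.close()
    db.commit()
    return db.total_changes - before


def search(db, term, lists=None):
    """Search all lists, or only those named in `lists`."""
    sql = ("SELECT list_name, subject FROM messages"
           " WHERE (subject LIKE ? OR body LIKE ?)")
    params = ["%" + term + "%", "%" + term + "%"]
    if lists:
        sql += " AND list_name IN (%s)" % ",".join("?" * len(lists))
        params.extend(lists)
    sql += " ORDER BY list_name, subject"
    return db.execute(sql, params).fetchall()
```

Running index_mbox again on the same file only picks up messages whose 
Message-ID has not been seen for that list, and search(db, "term", 
["list-a", "list-b"]) restricts the results to a chosen group of lists. 
This version still rereads the whole mbox on every run; a real engine 
would presumably track an offset and use a proper full-text index.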

Due to the nature of the lists, I cannot use an external search engine. 
Everything must be kept in house. The server I have to host this on is 
running RHEL 5 and Apache. I have complete control of this server, so I 
can make changes as I see fit (other than changing the OS).

So does anybody have any suggestions on a search engine that they have 
used that seems to work well? Did I leave anything out? I see a kitchen 
sink in the corner I didn't mention, but.....

-G

