Desktop search engines
Posted on December 6th, 2008 by Alex. Filed under Linux.
Yesterday I was eagerly searching for a very specific file. However, I had forgotten the filename and could only recall a few possibilities about its content. I did know that it was an OpenOffice document.
Since my standard operating system is Linux, my first attempt used command line tools such as find and fgrep or zgrep. OpenOffice saves its documents as zip files, which can be unzipped as usual. So it should be easy to find each candidate file, decompress it temporarily, search it for the keyword and proceed to the next file. However, both tools are very powerful, and just figuring out the necessary parameters took a while. In the end a test run did not give satisfying results, and I decided to look for alternatives.
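For readers who would rather script this than fight with find and grep parameters, here is a rough Python sketch of the same idea. It is not what I actually ran. It assumes the text of a document sits in a member called content.xml inside the zip archive (which is how the OpenDocument format stores it), and the function names and the list of extensions are my own choices for illustration.

```python
import os
import re
import sys
import zipfile

# Extensions to look at (an assumed set; extend as needed).
ODF_EXTENSIONS = (".odt", ".ods", ".odp")


def document_text(path):
    """Return the rough plain text of an OpenDocument file, or "" if unreadable."""
    try:
        with zipfile.ZipFile(path) as archive:
            xml = archive.read("content.xml").decode("utf-8", errors="replace")
    except (zipfile.BadZipFile, KeyError, OSError):
        # Not a zip archive, no content.xml inside, or the file cannot be opened.
        return ""
    # Crude tag stripping: enough for a keyword search, not a real XML parse.
    return re.sub(r"<[^>]+>", " ", xml)


def find_documents(root, keyword):
    """Yield paths of documents below root whose text contains keyword."""
    needle = keyword.lower()
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(ODF_EXTENSIONS):
                path = os.path.join(directory, name)
                if needle in document_text(path).lower():
                    yield path


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: the directory to search, then the keyword.
    for hit in find_documents(sys.argv[1], sys.argv[2]):
        print(hit)
```

Saved under a name of your choice, the script takes the directory and the keyword as its two arguments. The search is case-insensitive, and because every tag is replaced by a space, a keyword that is split by formatting in the middle of a word will be missed.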
Google Desktop
Help can come from a so-called desktop search program. First I tried out Deskbar by Google. Unfortunately it comes with lots of gimmicks that have nothing to do with the search itself (at least in the Windows version). In addition, I wanted to find the file quickly, but the indexer used by Google could not be persuaded to use all the resources it could get to build the index. It ran only when the computer was idle. So I waited for more than an hour (went for dinner), only to find out that the files I suspected of containing the content were not due to be indexed for another 2.5 hours of idle time. But I wanted to have the file now!
Google Deskbar sends data to the Google server, and as soon as I connected to the Internet I got the latest news, weather report, etc. I am also concerned about my privacy, since the content of my files is none of any third party's business. I aborted the indexing and removed the program again. So far I had not found the file.
Summary:
+ easy and well known interface
+ free of cost
+ available for multiple operating systems
+ extendable with plug-ins
- comes with lots of gimmicks
- very slow indexer with a strict interpretation of idle time
- does not search OpenOffice documents (despite the plugin)
- sends data to the Google server to identify you
- privacy issues (e.g. allows storing the index on the Google server to search your network drives)
Copernic
The next try was Copernic Desktop Search, which is also free of cost. Fortunately it is a program that does exactly what it is supposed to do. The indexer was quite fast, and in the preferences I could define the drives and directories to be processed. This setting alone gave a tremendous speedup. The indexer was not as careful with resources as the one in Deskbar: whenever the computer was idle, it took everything it could get. However, there is no Linux version of this program.
Summary:
+ free of cost for non-commercial home use
+ fast indexer
+ search can be limited to drives/directories
- only available in Windows
Beagle
An alternative that runs on Linux is Beagle. Many repositories offer ready-made packages to install Beagle on a Linux machine. It provides an intuitive and easy-to-use graphical interface.
Summary (not yet complete):
+ free of cost
+ open source, so anyone can check how it handles sensitive data
- requires Mono, which could be a huge overhead
Others
You may also want to have a look at:
- Searchmonkey (http://searchmonkey.sourceforge.net/index.php/Main_Page)
Hides the functional power of grep, find, etc. behind a GUI.
- Recoll (http://www.lesbonscomptes.com/recoll/index.html)
Should be very lightweight; no server, daemon, etc. necessary; extendable by helpers to index PDF and other files; supports stemming