a PHP spider and search engine
If in doubt as to which version to download, read this.
PHP 5.4 or greater
MySQL 5.5.3+ or MariaDB
PHP 5.4 or greater
MySQL 5.5.3+ or MariaDB
Be sure to check this blog post before requesting the PDO version.
Binaries and source code
Converts .doc, .ppt, & .xls files to text
Which version should I download?
The version to download depends on your needs. SphiderLite is most like the original Sphider, while Sphider (MB) has added capabilities: it can also index images and RSS feeds. Both require the MySQLnd extension, which not all hosts provide; where it is unavailable, the PDO version is needed instead. Older PHP installations may need a version that does not use prepared statements. Read on for more details.
Sphider 4.0.0-MB is the most robust version. It can index not just text content, but also images. There is also an RSS feed indexing capability. This version DOES require both the MySQLi and MySQLnd extensions to PHP. Some hosting providers have DISABLED the MySQLnd extension, particularly for clients using shared hosting. Check with your system administrator or hosting provider to ensure that the MySQLnd extension is available.¹ The database used is either MySQL 5.5.3+ or MariaDB. This version can handle multibyte strings. While the PHP mbstring extension is recommended, it is not required. In the absence of the mbstring extension, multibyte string functions will be emulated.
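How Sphider emulates the multibyte functions is not described here. As a rough illustration of the general idea only, the sketch below counts UTF-8 characters with mbstring when it is loaded and falls back to a PCRE pattern when it is not. The function names are hypothetical and are not part of Sphider.

```php
<?php
// Hypothetical helpers for illustration; not Sphider's actual emulation code.

// Count characters in a UTF-8 string without mbstring.
function sketch_utf8_strlen_fallback($text)
{
    // With the /u modifier "." matches one UTF-8 character, not one byte;
    // the /s modifier lets "." match newlines as well.
    $count = preg_match_all('/./us', $text);

    // preg_match_all() returns false on error (e.g. invalid UTF-8 input).
    return $count === false ? 0 : $count;
}

// Prefer the real extension when it is available.
function sketch_utf8_strlen($text)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($text, 'UTF-8');
    }
    return sketch_utf8_strlen_fallback($text);
}

// "caf\xC3\xA9" is "café" in UTF-8: five bytes, four characters.
$word = "caf\xC3\xA9";
echo 'bytes: ', strlen($word), ', characters: ', sketch_utf8_strlen($word), "\n";
```

The same pattern (test for the extension, otherwise fall back) would apply to other multibyte functions, though each needs its own fallback logic.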
SphiderLite 2.0.0 has the same requirements as Sphider 4.0.0-MB. For normal site indexing, it is the same as Sphider 4.0.0-MB. However, SphiderLite has no RSS or image indexing/search capabilities.
Sphider 2.4.3 PDO does not use any MySQL extensions, but DOES use PDO (PHP Data Objects). This variation is intended for situations in which the MySQLnd extension has not been made available. (Run the mysqlnd_check tool to confirm this.) Like Sphider and SphiderLite, the database used is MySQL 5.5.3+ or MariaDB. This version is obsolete and WILL NOT WORK UNDER PHP 8!!! Available for download by request only.
Sphider 1.4.2 is the last version that does not use prepared statements. A few installations still run a version of PHP old enough that later versions of Sphider, which use more secure prepared statements, just won't work. The good part is that 1.4.2 is usable in those cases. The downside is that it is not as secure and lacks the bug fixes and improvements of later versions. From PHP 7.1 onward, this version will not work. If your PHP predates 5.4, this is the version for you. (Sphider 1.4.2 is available for download upon request only. No support is provided.)
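The choice described above can be summarized as a small decision function. This is only a sketch of the rules as stated on this page; the function name and return labels are hypothetical, and the assumption that the PDO version needs at least PHP 5.4 is ours, not something stated above.

```php
<?php
// Hypothetical helper summarizing the version guidance; not part of Sphider.
// Returns a short label, or null when none of the described versions fits.
function sketch_pick_sphider($phpVersion, $hasMysqli, $hasMysqlnd, $hasPdoMysql)
{
    // PHP older than 5.4: only the legacy release applies.
    if (version_compare($phpVersion, '5.4.0', '<')) {
        return 'legacy-1.4.2';
    }

    // Sphider 4.0.0-MB and SphiderLite 2.0.0 both need MySQLi and MySQLnd.
    if ($hasMysqli && $hasMysqlnd) {
        return 'mb-or-lite';
    }

    // The PDO version avoids MySQLnd but does not work under PHP 8.
    // (Assumption: it needs PHP 5.4 or later, like the other current versions.)
    if ($hasPdoMysql && version_compare($phpVersion, '8.0.0', '<')) {
        return 'pdo-2.4.3';
    }

    return null;
}

// Example: evaluate the PHP installation running this script.
$choice = sketch_pick_sphider(
    PHP_VERSION,
    extension_loaded('mysqli'),
    extension_loaded('mysqlnd'),
    extension_loaded('pdo_mysql')
);
echo 'Suggested download: ', var_export($choice, true), "\n";
```

A null result means the host needs a configuration change (for example, enabling MySQLnd) before any of the versions above will run.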
Optional utility for Windows
Catdoc is an optional, third-party add-on for Windows users who wish to convert *.doc, *.ppt, and *.xls files to text. On Linux systems, check with your system administrator or hosting provider to see if this feature is available. It is NOT required for the conversion of pdf files to text.
Catdoc is a Windows port of the catdoc, catppt, and xls2csv utilities found on Linux. This is a third-party compilation containing Windows binaries as well as source code. We have done some basic testing from a Windows command prompt on a Windows 7 x64 machine. First, a directory C:\bin\linux2winports was created. From the zip file (provided as a download) we extracted the three exe files and the charsets directory into that directory. The extraction reported two errors about failing to set timestamps on two of the files; these can be ignored. The extracted binaries are win32 (for x86), but they worked in our x64 environment. We did not try any of the options, only simple commands such as "catdoc somefile.doc", "catppt someppt.ppt", and "xls2csv spreadsheet.xls". We received the expected output, so the port does work, at least on a basic level.
We recommend using the pre-compiled binaries provided, but if you have the know-how to make your own binaries, you are free to do so. The binaries are win32, but it may be possible to use the source to make x64 binaries. We don't know; we haven't tried. This package is provided as-is.
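The three commands tested above map one-to-one onto file extensions. As a sketch of how a PHP script might choose the right tool for a file, the hypothetical helper below builds the command line; it is not Sphider's own code, and it assumes the three executables are on the system PATH.

```php
<?php
// Hypothetical helper; not part of Sphider. Assumes catdoc, catppt and
// xls2csv can be found on the PATH.
function sketch_converter_command($path)
{
    // Extension => converter, matching the commands tested above.
    $tools = array(
        'doc' => 'catdoc',
        'ppt' => 'catppt',
        'xls' => 'xls2csv',
    );

    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!isset($tools[$extension])) {
        return null; // Not a file type these utilities handle.
    }

    // escapeshellarg() quotes the file name for the current platform's shell.
    return $tools[$extension] . ' ' . escapeshellarg($path);
}

// Print the commands rather than running them.
foreach (array('somefile.doc', 'someppt.ppt', 'spreadsheet.xls') as $file) {
    echo sketch_converter_command($file), "\n";
}
```

The resulting string could then be passed to shell_exec() to capture the converted text, provided the host allows shell commands.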
Common Text Files
While indexing, Sphider excludes common words. If you wish to see what those words are, check your Sphider installation for include/common.txt. This is a simple text file listing the excluded words. The problem is that this is a list of English words. If you are indexing a site in some other language, the list is of little use.
While you may replace common.txt with another of your own making, here are a couple of pre-made lists which can be substituted for common.txt. Simply rename the existing common.txt to something like en_common.txt, and rename one of these to common.txt.
The number of pre-made lists is short, but feel free to make your own, and maybe even share them!
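To show why the language of the list matters, here is a minimal sketch of common-word filtering. It assumes a file with one word per line, which is our guess at the layout of common.txt; the function names are hypothetical and this is not Sphider's indexing code.

```php
<?php
// Hypothetical helpers; not Sphider's code. Assumes one word per line.
function sketch_load_common_words($file)
{
    if (!is_readable($file)) {
        return array();
    }
    $words = array();
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        // strtolower() only lowercases ASCII reliably; mb_strtolower()
        // would be the better choice for non-English lists when available.
        $word = strtolower(trim($line));
        if ($word !== '') {
            $words[$word] = true; // Keys give fast lookups.
        }
    }
    return $words;
}

// Drop every token that appears in the common-word list.
function sketch_remove_common_words(array $tokens, array $common)
{
    $kept = array();
    foreach ($tokens as $token) {
        if (!isset($common[strtolower($token)])) {
            $kept[] = $token;
        }
    }
    return $kept;
}

// Demonstration with a throwaway list instead of a real common.txt.
$listFile = tempnam(sys_get_temp_dir(), 'common');
file_put_contents($listFile, "the\nand\nof\n");
$common = sketch_load_common_words($listFile);
unlink($listFile);

$tokens = array('The', 'history', 'of', 'search', 'and', 'indexing');
echo implode(' ', sketch_remove_common_words($tokens, $common)), "\n";
```

An English list like this one would leave words such as "der", "und", or "les" in the index, which is why swapping in a list for the site's own language helps.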
NOTE 1: Download, extract, and run this script.
This will tell you which Sphider you can use, and its result is definitive.
Alternatively, you can create a script of your own:
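The script itself is not reproduced in this text. The "mysqlnd" section and "API Extensions" line mentioned next are part of what PHP's phpinfo() prints, so a script along the following lines should serve. Treat it as a sketch under that assumption; the function name is hypothetical and the short summary at the top is our addition.

```php
<?php
// Sketch only; not the original script from this page.
function sketch_extension_report()
{
    return array(
        'php_version' => PHP_VERSION,
        'mysqli'      => extension_loaded('mysqli'),
        'mysqlnd'     => extension_loaded('mysqlnd'),
        'pdo_mysql'   => extension_loaded('pdo_mysql'),
        // mysqli_stmt::get_result() exists only when mysqli is built
        // against mysqlnd, so it doubles as a practical check.
        'mysqli_uses_mysqlnd' => method_exists('mysqli_stmt', 'get_result'),
    );
}

$isCli = (PHP_SAPI === 'cli');
echo $isCli ? '' : '<pre>';
foreach (sketch_extension_report() as $name => $value) {
    $shown = is_bool($value) ? ($value ? 'yes' : 'no') : $value;
    echo $name, ': ', $shown, "\n";
}
echo $isCli ? '' : '</pre>';

// In a browser, also print the full module information, which includes
// the "mysqlnd" section when that extension is loaded.
if (!$isCli) {
    phpinfo(INFO_MODULES);
}
```

Remove the script from the server once you have the answer, since phpinfo() output reveals configuration details.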
Upload this to your server and run it. If you get results like this (in the "mysqlnd" section, the last line, "API Extensions", should contain at least "mysqli"), you SHOULD be good, but SOME hosting companies may STILL block actual access to MySQLnd for users on shared hosting plans. If there is no "mysqlnd" section, or "API Extensions" shows "no value", mysqlnd is not enabled.
If you find that mysqlnd is not enabled, you may still be able to enable it. Here is a blog post which may help.