Sphider
a PHP spider and search engine
If in doubt as to which version to download, read this.
Available Downloads
PHP 7.0 or greater
MySQL 5.5.3+ or MariaDB
PHP 7.0 or greater
MySQL 5.5.3+ or MariaDB
Be sure to check this blog post before requesting the PDO version.
Binaries and source code
Converts .doc, .ppt, & .xls files to text
Which version should I download?
Which version to download depends on your needs. SphiderLite is most like the original Sphider, while Sphider adds capabilities: it can index images and RSS feeds. Both require the MySQLnd extension, which not all hosts provide. Where it is unavailable, the PDO version is needed. Older PHP installations may need a version that does not use prepared statements. Read on for more details.
Sphider 5.3.0 is the most robust version. It can index not just text content, but also images. There is
also an RSS feed indexing capability. This version DOES require both the MySQLi and MySQLnd extensions to PHP.
Some hosting providers have DISABLED the MySQLnd extension, particularly for clients using shared hosting. Check with
your system administrator or hosting provider to confirm that the MySQLnd extension is available.¹
The database used is either MySQL 5.5.3+ or MariaDB. This version can handle multibyte strings. The PHP
mbstring extension is required. 4.2.0-MB requires PHP 7.0.0 or greater!
(4.1.0-MB will work with PHP 5.5.3+ through PHP 8.0.
There are issues in PHP 8.1.)
SphiderLite 2.4.0 has the same requirements as Sphider 5.3.0. For normal site indexing, it behaves the same as Sphider 5.3.0.
However, SphiderLite has no RSS or image indexing/search capabilities, nor the ability to produce a links report. 2.3.1
requires PHP 7.0.0 or greater!
(2.1.0 will work with PHP 5.5.3+ through PHP 8.0. There are issues in PHP 8.1.)
(Sphider 4.1.0-MB or SphiderLite 2.1.0 can be obtained upon request.)
Sphider 2.4.3 PDO does not use any MySQL extensions, but DOES use PDO (PHP Data Objects). This variation is intended for situations in which the MySQLnd extension has not been made available. (Run the mysqlnd_check tool to confirm this.) Like Sphider and SphiderLite, the database used is MySQL 5.5.3+ or MariaDB. This version is obsolete and WILL NOT WORK UNDER PHP 8!!! Available for download by request only. NO support is provided.
Sphider 1.4.2 is the last version that does not use prepared statements. There are still a few installations running a version of PHP old enough that later versions of Sphider, which use the more secure prepared statements, just won't work. The good part is that 1.4.2 is usable in those cases. The downside is that it is not as secure and lacks the bug fixes and improvements of later versions. As of PHP 7.1, this version no longer works. If your PHP predates 5.4, this is the version for you. (Sphider 1.4.2 is available for download upon request only. NO support is provided.)
Optional utility for Windows
Catdoc is an optional, third-party add-on for Windows users who wish to be able to convert *.doc, *.ppt, and *.xls files to text. On Linux systems, check with your system administrator or hosting provider to see if this feature is available. It is NOT required for the conversion of pdf files to text.
Catdoc is a Windows port of the catdoc, catppt, and xls2csv utilities found in Linux. This is a third-party compilation containing Windows binaries as well as source code. We have done some basic testing from a Windows command prompt on a Windows 7 x64 machine. First, we created the directory C:\bin\linux2winports. From the zip file (provided as a download) we extracted the three exe files and the charsets directory into it. The extraction reported two errors about failing to set timestamps on two of the files; these can be ignored. The extracted binaries are win32 (for x86), but they worked in our x64 environment. We did not try any of the options, only simple commands like "catdoc somefile.doc", "catppt someppt.ppt", and "xls2csv spreadsheet.xls". We received the expected output, so the port does work, at least on a basic level.
The recommendations are to use the pre-compiled binaries provided, but if you have the know-how to make your own binaries, you are free to do so. The binaries are win32, but it may be possible to use the source to make x64 binaries. We don't know, haven't tried. This package is provided as-is.
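For anyone who wants to call these utilities from their own PHP code, the basic commands tried above could be assembled roughly as follows. This is a minimal sketch, not Sphider's own conversion code: the function name build_convert_command is hypothetical, and it assumes the directory holding the binaries contains no spaces, since only the file argument is quoted.

```php
<?php
// Sketch only: NOT Sphider's own conversion code.
// Picks the converter by file extension and returns the command line to run.
// Assumption: $binDir contains no spaces (only the file argument is quoted).

function build_convert_command($binDir, $file)
{
    // Utility names as listed above: catdoc, catppt, xls2csv.
    $tools = array('doc' => 'catdoc', 'ppt' => 'catppt', 'xls' => 'xls2csv');

    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    if (!isset($tools[$ext])) {
        return null; // not a format these utilities handle
    }

    // Strip any trailing slash or backslash before joining.
    $exe = rtrim($binDir, '/\\') . DIRECTORY_SEPARATOR . $tools[$ext];

    // escapeshellarg() quotes the file name for the current platform's shell.
    return $exe . ' ' . escapeshellarg($file);
}

// Example, using the directory from the test described above:
// $cmd  = build_convert_command('C:\\bin\\linux2winports', 'somefile.doc');
// $text = ($cmd !== null) ? shell_exec($cmd) : null;
```

The execution step is left commented out because shell_exec() is disabled on some hosts. Check with your hosting provider before relying on it.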
Common Text Files
While indexing, Sphider excludes common words from indexing. If you wish to see what those words are, check your Sphider installation for include/common.txt. This is a simple text file listing the excluded words. The problem is, this is a list of English words. If you are indexing a site in some other language, it becomes pretty useless.
While you may replace common.txt with another of your own making, here are a couple of pre-made lists which can be substituted for common.txt. Simply rename the existing common.txt to something like en_common.txt, and rename one of these to common.txt.
The number of pre-made lists is short, but feel free to make your own, and maybe even share them!
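If you would rather script the swap than rename the files by hand, something like the following could do it. This is a rough sketch under stated assumptions: the function name swap_common_list and the file name de_common.txt are hypothetical, and the include directory is the one mentioned above.

```php
<?php
// Sketch only: swaps the active common-word list while keeping a backup.
// The function name and the example file names are hypothetical.

function swap_common_list($includeDir, $newList, $backupName)
{
    $active = $includeDir . '/common.txt';
    $backup = $includeDir . '/' . $backupName;

    if (!is_file($newList)) {
        return false; // replacement list not found
    }
    if (file_exists($backup)) {
        return false; // refuse to overwrite an earlier backup
    }
    if (is_file($active) && !rename($active, $backup)) {
        return false; // could not set the current list aside
    }

    // Install the replacement list as common.txt.
    return copy($newList, $active);
}

// Example: keep the English list as en_common.txt and activate another list.
// var_dump(swap_common_list('include', 'de_common.txt', 'en_common.txt'));
```

The function refuses to run a second time with the same backup name, so the original English list cannot be overwritten by accident.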
NOTE 1: Download, extract, and run this script.
This will tell you whether the latest Sphider will work for you. This is a definitive check.
You can also create a script:
<?php
phpinfo();
?>
Upload this to your server and run it. If, in the "mysqlnd" section, the last line, "API Extensions", contains at least
"mysqli", you SHOULD be good, but SOME hosting companies may STILL block actual access to MySQLnd for users
on shared hosting plans. If there is no "mysqlnd" section, or "API Extensions" shows "no value", mysqlnd is not enabled.
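As an alternative to reading through the phpinfo() output, a short script can report the relevant extensions directly. This is a minimal sketch and is NOT the mysqlnd_check tool mentioned in this note. The function name sphider_extension_report is hypothetical, and like the phpinfo() method it cannot detect a host that blocks access to MySQLnd by other means. It uses array() and no type declarations so that it can also run on older PHP installations.

```php
<?php
// Sketch only: NOT the mysqlnd_check tool.
// Reports whether the extensions named in the requirements are loaded.

function sphider_extension_report()
{
    return array(
        'php_version'         => PHP_VERSION,
        'php_7_or_greater'    => version_compare(PHP_VERSION, '7.0.0', '>='),
        'mysqli'              => extension_loaded('mysqli'),
        'mysqlnd'             => extension_loaded('mysqlnd'),
        'mbstring'            => extension_loaded('mbstring'),
        // Only relevant to the PDO version: PDO needs its MySQL driver.
        'pdo_mysql'           => extension_loaded('pdo_mysql'),
        // mysqli_stmt::get_result() exists only when mysqli uses mysqlnd.
        'mysqli_uses_mysqlnd' => method_exists('mysqli_stmt', 'get_result'),
    );
}

foreach (sphider_extension_report() as $name => $value) {
    $shown = is_bool($value) ? ($value ? 'yes' : 'no') : $value;
    echo $name, ': ', $shown, PHP_EOL;
}
```

If "mysqli" and "mysqlnd" both show "yes", the result agrees with what the "API Extensions" line in phpinfo() would tell you.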
CPanel settings to enable mysqlnd:
If you find that mysqlnd is not enabled, you may still be able to enable it. Here is a blog post which may help.