a PHP spider and search engine
The roots of Sphider go back to 2005, when Ando Saabas made the first release. Releases continued into 2009. Since then, apart from a single security fix in 2013, Sphider has gone unsupported. There have been a number of forks of the original code since that time. Sphider-Pro and Sphider-Plus are the most notable examples, but both are paid versions. In 2015, worldspaceflight.com began improving the original Sphider and making the newer versions available without charge. The current version is 2.0.0.
Sphider is made available without any warranty, although support is provided as best we can via the forum. Other users are invited to leave tips and suggestions, or to lend aid as they see fit.
Database support
- MySQL (MySQLi/MySQLnd or PDO)
- MariaDB (MySQLi/MySQLnd or PDO)
- SQLite (PDO)
- PostgreSQL (PDO)
Spidering and indexing
- Performs full text indexing.
- Can index both static and dynamic pages.
- Can index images contained in the pages.
- Respects the robots.txt protocol and the nofollow and noindex tags.
- Follows server-side redirects.
- Allows spidering to be limited by depth (i.e., the maximum number of clicks from the starting page), by (sub)domain, or by directory.
- Allows spidering only the URLs matching (or not matching) certain keywords or regular expressions.
- Supports indexing of PDF, DOC and XLS files (using external binaries for file conversion).
- Can exclude common words from being indexed.
- Can index RSS feeds.
Searching
- Supports AND, OR and phrase searches.
- Supports excluding words (by putting a '-' in front of a word, any page including the word will be omitted from the results).
- Supports wildcard (*) searches.
- Option to add and group sites into categories.
- Possibility to limit searching to a given category and its subcategories.
- Possibility of searching in a specified domain only.
- "Did you mean" search suggestion on mistyped queries.
- Context-sensitive auto-completion on search terms (a la Google Suggest).
- Word stemming for English (searching for "run" finds "running", "runs", etc.).
- Can search for images.
- Can search RSS feeds.
Administration
- Includes a sophisticated web-based administration interface.
- Supports indexing via a web interface as well as from the command line, which makes it easy to set up cron jobs.
- Comprehensive site and search statistics.
- Simple template system that is easy to integrate into a site.
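
To make the search syntax listed above more concrete, here is a minimal sketch of how a query combining required words, excluded words ('-'), wildcards (*) and phrases could be evaluated against a page's text. It is written in Python purely for illustration and is not Sphider's PHP code. The function names (parse_query, term_pattern, matches) are hypothetical, and the use of double quotes to mark a phrase is an assumption rather than something stated in this document. Stemming and OR searches are left out.

```python
import re


def parse_query(query):
    """Split a query into phrases, required terms and excluded terms.

    Assumption: phrases are written in double quotes.
    A leading '-' marks a term whose presence rejects the page.
    """
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]*"', " ", query)
    required, excluded = [], []
    for token in rest.lower().split():
        token = token.strip(".,;:!?")
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        elif token and token != "-":
            required.append(token)
    return phrases, required, excluded


def term_pattern(term):
    """Compile a term into a regex in which '*' matches any run of word characters."""
    parts = [re.escape(part) for part in term.split("*")]
    return re.compile(r"\w*".join(parts))


def matches(query, text):
    """Return True if the text satisfies every part of the query."""
    phrases, required, excluded = parse_query(query)
    words = re.findall(r"\w+", text.lower())
    padded = " " + " ".join(words) + " "

    def has_term(term):
        pattern = term_pattern(term)
        return any(pattern.fullmatch(word) for word in words)

    def has_phrase(phrase):
        normalized = " ".join(re.findall(r"\w+", phrase))
        return (" " + normalized + " ") in padded

    if any(has_term(term) for term in excluded):
        return False
    if not all(has_phrase(phrase) for phrase in phrases):
        return False
    return all(has_term(term) for term in required)


if __name__ == "__main__":
    page = "A PHP spider and search engine"
    print(matches('"search engine" sp* -paid', page))
    print(matches("spider -php", page))
```

In this sketch every plain term must be present (AND behaviour), and a page is rejected as soon as any excluded term is found. A real search engine would normally answer such queries from its keyword index instead of scanning page text.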