a PHP spider and search engine
Each variant of Sphider has its own change log. The log for the original Sphider (as released by Ando Saabas) appears by default. You may also view the change logs for the PDO fork, the SQLite fork, or the PostgreSQL fork.
Sphider Change Log
Sphider 2.2.0, release date 3 December 2018

  Removed the use of tables from the search forms (RAP)
  Added CSS to replace the use of tables (RAP)
  Added a new template for use on mobile devices (RAP, aided by ReddWebDev)
  Added a mobile browser auto-detect to the search function (RAP)
  Cleaned up several language files (RAP)
  Security fix: Turned off PHP error reporting used for development (RAP)
  Increased db user_agent column to 50 (from 15) (RAP, suggested by ReddWebDev)
  Added a tabbed search toggle to the settings page (RAP)

  Changed files:
    admin/admin.php
    admin/auth.php
    admin/configset.php
    admin/db_backup.php
    admin/db_main.php
    admin/install.php
    admin/messages.php
    admin/rss_spider.php
    admin/spider.php
    admin/update_rollup.php
    common_template/footer.php
    common_template/img_search_form.php
    common_template/rss_search_form.php
    common_template/rss_search_results.php
    common_template/search_form.php
    common_template/search_results.php
    include/searchfuncs.php
    js_suggest/suggest.php
    settings/conf.php
    sql/tables.php
    language/am-language.php
    language/ar-language.php
    language/da-language.php
    language/en-language.php
    language/fa-language.php
    language/no-language.php
    language/sk-language.php
    language/sq-language.php
    language/tr-language.php
    templates/*/search.css (x6) (in templates/air, dark, earth, fire, standard, water)
    search.php

  New files:
    include/detectmobilebrowser.php
    templates/mobile/header.html
    templates/mobile/search.css

  Deleted files:
    search-160.php

Sphider 2.1.0, release date 23 November 2018

  Found wildcard search was broken, fixed it (RAP)
  Found the negate search broken, fixed it (RAP)

  Changed files:
    admin/db_backup.php
    admin/install.php
    admin/update_rollup.php
    include/commonfuncs.php
    sql/tables.php

Sphider 2.0.1, release date 15 November 2018

  Purged ^M characters (RAP)
  Updated internal references (RAP)
  Updated to later version of jQuery, 3.3.1 (RAP)

  Changed files:
    admin/db_backup.php
    admin/install.php
    admin/messages.php
    admin/spider.php
    admin/spiderfuncs.php
    admin/update_rollup.php
    sql/tables.sql
    templates/*/header.html (x6) (in templates/air, dark, earth, fire, standard, water)
    search.php
    search-160.php

Sphider 2.0.0, release date 26 October 2017

  Changed some potentially problematic code (RAP)
  Improved indexing reliability for pages in encodings other than latin-1 or utf-8 (RAP)
  Improved responsiveness of "phrase" searches (RAP)
  Avoided potential problem in "and" and "or" searches (RAP)
  Fixed a PHP 7.1 issue with "too few arguments" for a function (RAP)
  Added indexing of RSS feeds and associated search (RAP)
  Integrated Image Indexing into the main spider (RAP)
  Uncluttered the main search screen when multiple domains are present (RAP)
  Integrated main search, RSS search, and image search into a single tabbed page (RAP)
    (Retained search.php from Sphider 1.6.0 as search-160.php for those who desire only a simple content search. It is slightly modified to stay compatible with other 2.0.0 code changes. To be useful, it needs to be renamed to search.php, with the distributed search.php renamed to NEWsearch.php.)
  Renamed some language files to conform with ISO standards (RAP)
    (ISO renames): alb- to sq-, cns- to zh_cn-, cnt- to zh_tw-, cz- to cs-, ee- to et-, si- to sl-, swa- to sw-, se- to sv-
  Created Danish and Norwegian language files (RAP)
  Updated code to come closer to "best PHP coding practices" (PSR-2) (RAP)
    (This entailed renaming many of the functions to meet standards recommendations.)
  Removed several unused functions (RAP)
  Updated to later version of jquery, 2.2.4 (RAP)

  Changed files:
    admin/admin.css
    admin/admin.php
    admin/auth.php
    admin/configset.php
    admin/db_backup.php
    admin/db_main.php
    admin/dbmain.js
    admin/install.php
    admin/messages.php
    admin/spider.php
    admin/spiderfuncs.php
    admin/update_rollup.php
    common_template/categories.php
    common_template/footer.php
    common_template/img_search_form.php (new if it doesn't exist)
    common_template/img_search_results.php (new if it doesn't exist)
    common_template/search_form.php
    common_template/search_results.php
    include/categoryfuncs.php
    include/commonfuncs.php
    include/pstem_class.php
    include/searchfuncs.php
    js_suggest/suggest.php
    settings/conf.php
    settings/database.php
    languages/*-language.php (all changed)
    sql/tables.sql
    templates/*/header.html (x6) (in templates/air, dark, earth, fire, standard, water)
    templates/*/search.css (x6) (in templates/air, dark, earth, fire, standard, water)
    search.php

  New files:
    admin/rss_spider.php
    calendar/* (new folder, sub-folders and files)
    common_template/rss_search_form.php
    common_template/rss_search_results.php
    language/da-language.php (Danish)
    language/no-language.php (Norwegian)

  Deleted files:
    images/images.php (if exists)
    images/imgsearchfuncs.php (if exists)
    img_search.php (if exists)
    include/localutils.php
    SII_changelog (if exists)
    SII_Installation_Notes.txt (if exists)

  Moved files:
    simple_html_dom.php (/images to /include) (new if not exist)

  Folders:
    /tmp (new)
    /images (removed if exists)

Sphider 1.6.0, release date 14 July 2017

  Added the ability to truncate tables (RAP)
  Added the ability to clear site data while retaining the site and its settings (RAP)
  Added the ability of crawling from a sitemap (RAP)
  Added the option of previewing pages on the results listing (RAP)
  Added support for the optional Sphider Image Indexer (RAP)
  Fixed error in the Clean links routine (RAP)
  Fixed problem that caused suspended indexing not to resume (RAP)
  Fixed error on sites index screen (RAP)
  Fixed another potential problem with table prefixes containing a hyphen (RAP)
  Removed more deprecated html for HTML5 compliance (RAP)

  Changed files:
    admin.php
    admin.css
    commonfuncs.php
    conf.php
    configset.php
    db_backup.php
    db_main.php
    install.php
    install.txt
    messages.php
    search.css (x6) (in air, dark, earth, fire, standard, water)
    search.php
    searchfuncs.php
    search_results.php
    spider.php
    spiderfuncs.php
    tables.sql
    update_rollup.php
    SphiderUserGuide.pdf

Sphider 1.5.4, release date 29 May 2017

  Added the ability to index decimal numbers (RAP)
  Added a filter to strip emoticons from text (presence of emoticons interferes with indexing) (RAP)
  Fixed another potential problem with table prefixes containing a hyphen (RAP)
  Changed many language files to display properly in UTF-8 (not a linguist, so ...) (RAP)
  Added Albanian language (Fatih Ibrahimi)
  Added Amharic language (sheshu)
  Added Swahili language (mdoja)

  Changed files:
    admin.php
    conf.php
    configset.php
    db_backup.php
    install.php
    searchfuncs.php
    sphider.php
    sphiderfuncs.php
    tables.sql
    update_rollup.php
    SphiderUserGuide.pdf

  Changed files in the languages folder:
    ar-language.php
    bg-language.php
    cns-language.php
    cnt-language.php
    cz-language.php
    de-language.php
    ee-language.php
    en-language.php
    es-language.php
    fa-language.php
    fi-language.php
    fr-language.php
    hr-language.php
    hu-language.php
    it-language.php
    lv-language.php
    nl-language.php
    pl-language.php
    pt-language.php
    ro-language.php
    ru-language.php
    se-language.php
    si-language.php
    sr-language.php
    sk-language.php
    tr-language.php

  New files:
    /languages/alb-language.php
    /languages/am-language.php
    /languages/swa-language.php

  Deleted files:
    cn-language.php (unused and superseded by cns-language.php and cnt-language.php)

Sphider 1.5.3, release date 15 Apr 2017

  Fixed a problem where the Clean Domains routine would hang when the sites table was empty (RAP)
  Fixed a problem in which the robots.txt may not be properly recognized and parsed (RAP)
  Added ability to read the robots.txt file on HTTPS sites (RAP)
  Fixed a potential PHP error on a break statement (RAP)
  Fixed a potential problem with table prefixes containing a hyphen (RAP)

  Changed files:
    spiderfuncs.php
    categoryfuncs.php
    searchfuncs.php
    search.php
    admin.php
    spider.php
    db_backup.php
    install.php
    update_rollup.php
    tables.sql
    SphiderUserGuide.pdf

Sphider 1.5.2, release date 14-Dec-2016

  Fixed issue in which Sphider would abort a crawl when an improperly coded character was encountered on a web page. A non-fatal error is now thrown, but execution continues on the next page. (RAP)
  Corrected possible database error when updating settings due to a possible null value (RAP)

Sphider 1.5.1, release date 22-Dec-2015

  Corrected bug in keyword cleanup routine (RAP)
  Corrected bug on editsiteform (RAP)
  Corrected bug in the display of "next" pages resulting from a phrase search (RAP)
  Corrected an introduced bug affecting weighting in results (RAP)
  Corrected introduced error of possibly overlong descriptions being inserted into links table (RAP)
  Corrected sql error in links cleaning (RAP)
  Removed a piece of developmental code that was overriding weighted_keyword setting! (RAP)
  Found and replaced 3 more instances of deprecated code in the backup process (associated with next item) (RAP)
  Completely rewrote the backup and restore processes, wildly significant improvement in restore time (How wild? 1000x improvement on test database!) (RAP)
  Added security when an url is displayed (RAP)
  Altered tables to default to utf-8 (RAP)
  Changed the way the database tab reports the number of rows. (SHOW TABLE STATUS number of rows reporting is unreliable on InnoDB tables) (RAP)
  Added sql error handling to all execute statements (RAP)
  Found and repaired a memory leak (RAP)
  Fixed bug in the wildcard search that caused ALL domains to be searched, regardless of setting (RAP)
  Restored stemming algorithm to class status (RAP)
  Fixed bug, category is now retained for category searches yielding multi-page results (RAP)
  Fixed bug, category or domain is now preserved when performing a "did you mean" search (RAP)
  Added ability to clear and reset search form (RAP)

Sphider 1.5.0, release date 01-Dec-2015

  SuggestFramework, which has proven to be a security risk, buggy, and generally unreliable, has been ditched for jQuery - suggestions now work reliably and safely (RAP)
  The database Optimize function, which was found to be broken since AT LEAST 1.3.6, was repaired and now functions (RAP)
  It is likely that when the Optimize feature did nothing, neither did backup and restore. Good thing. The backup was flawed and the restore was deadly. Fixed, now safe (RAP)
  All queries (except in install and update_rollup) now use only prepared statements, making sql injection virtually impossible (RAP)
  All $_GET/$_REQUEST/$_POST data is escaped (RAP)
  All database changes since 1.3.1 have been rolled up into a single script, making updates easier even if you skip a version (RAP)
  Fixed bug in which all domains were always searched in an advanced search. Now have a choice (RAP)
  Fixed bug arising from the use of the sph_htmlentities function (introduced in 1.4.1), changed all references to the PHP htmlentities function (RAP)
  Removed sph_htmlentities from commonfuncs.php (RAP)
  Subcategories now appear properly when editing a site (RAP)
  When results are displaying only the two most relevant links per site, the "More results" link now works (RAP)
  When on a search by category screen, choosing "Search all sites" now functions (RAP)
  Capping the number of results returned now functions (RAP)
  Added the ability to purge the domains table of unused domains (RAP)
  Added the ability to restore settings to default values. Useful after a structure-only restore (RAP)

Sphider 1.4.2, release date 15-Nov-2015

  Improved language support when entering a search query (RAP)
  Wildcard (*) search now supported (RAP, MySQLi update of an original mod by Tec)
  Removed vulnerability of seeing snippets of PHP code from html files (RAP)
  Added new templates (RAP)
  Updated html/xhtml to HTML5 (RAP)
  Indexing of pdf, doc, xls, and ppt files is now possible in a Windows environment (RAP, influenced by a mod by doogle)
  Bug fix: "Did you mean" suggestions for PHRASE searches now work (RAP)

Sphider 1.4.1, release date 06-Nov-2015

  THIS VERSION CHAIN DOES NOT INCLUDE 1.3.7 (Stefan Sorg s.sorg[a t]yappadoo.org). HOWEVER, THE FOLLOWING ITEMS HAVE BEEN CHANGED IN ACCORDANCE WITH STEFAN'S CHANGES:
    Escaped SQL-Queries with mysqli::real_escape_string() (RAP following Stefan Sorg changes, but using non-deprecated code)
    Escaped HTML-Output with sph_htmlentities() (RAP following Stefan Sorg changes and using his sph_htmlentities function)
    Moved directory "js_suggest" out of "include" (because it made sense) (RAP following Stefan Sorg lead)
  ALSO MORE code cleanup! (Sphider code is a real mess) (RAP)
  SuggestFramework updated to ver 0.31 (RAP)
  Bug fix in beta: Sanitization filter for search query now allows utf-8 letters, numbers, spaces, hyphens, periods (points), and apostrophes. (RAP)
  Bug fix: Search type (AND, OR, PHRASE) now retained when clicking on a "Did you mean" suggestion. (RAP)
  Stefan's securing of selected directories with .htaccess was not incorporated. Many users are not using Unix/Xenix/Linux and it would only be clutter. Plus, this is something a user can do on his/her own. Securing directories is never a bad idea. For example, the admin directory should be password secured and accessible ONLY by https.

Sphider 1.4, release date 14-Sep-2015

  Code updated to conform to PHP 5.6, removed/changed deprecated php code (RAP)
  Converted code to MySQLi extension (Original MySQL extension deprecated as of PHP 5.5.x, likely to be removed in PHP 7) (RAP)
  Replaced php shortcut tags (RAP)
  Added sanitization code to prevent many of the code injection vulnerabilities (RAP)
  Created a table to hold configuration settings, thus eliminating a huge code injection risk (RAP)
  Revised install.php to remove a create error, make tables InnoDB, create and insert initial values for the new table 'settings' (RAP)
  Equivalent changes to tables.sql (RAP)
  Created update_1-3-6.php to upgrade the 1.3.6 database to InnoDB and add the settings table.
  General code cleanup, reformatting, and bug fixes (RAP)

Sphider 1.3.6, release date 04-06-2013

  Code injection vulnerability bug fix (Ando Saabas)

Sphider 1.3.5, release date 13-12-2009

  Fixed ereg warnings - PHP 6 compatible now (Ando Saabas)
  Updated Bulgarian language file (Martin Halachev)

Sphider 1.3.4, release date 29-04-2008

  An XSS vulnerability bug fixed (Ando Saabas)

Sphider 1.3.4b, release date 11-12-2007

  Bug in file download function fixed (Viorel Irimia)
  Bug with possible bold tag bleeding in result titles fixed (Ando Saabas)
  Index all does not load keyword table multiple times any more (Ando Saabas)
  Bug with certain searches returning too many results fixed (Ando Saabas)

Sphider 1.3.3, release date 15-09-2007

  Sphider now also works on ports other than default 80 (Ando Saabas)
  By default socket connectability checking removed (Ando Saabas)
  "Url must contain option" now an OR option instead of AND (Ando Saabas)
  "Duplicate entry" bug fixed (Ando Saabas)
  Limit max title size in search results (Tec)
  A bug in "not" words query fixed (Ando Saabas)

Sphider 1.3.2, release date 28-07-2007

  Indexing speed improvements (Ando Saabas)
  Bug with > sign in title fixed (Ando Saabas)
  Accent conversion bug fixed (Ando Saabas)
  Bug with accented characters in suggest fixed (Tec)
  Other minor bug fixes (Ando Saabas)
  Czech language file added (Marek Čapla)

Sphider 1.3.1f, release date 17-03-2007

  "Did you mean" string no longer hardcoded in templates (Ando Saabas)
  Formatting bug due to extra </div> in template fixed (Ando Saabas, Frank Carius)
  Sphider can now find links which are given via the base tag (Jason Judge)
  Serbian language file added (Aleks)
  Slovenian language file added (Damir Kervina)
  Some minor bug fixes (Ando Saabas)

Sphider 1.3.1e, release date 22-11-2006

  Problem with reaching urls where trailing slash after domain has been omitted fixed (Ando Saabas)

Sphider 1.3.1d, release date 10-11-2006

  Spaces in urls are now treated properly (Ando Saabas)
  Script now also works with short tags turned off (Ando Saabas)
  Turkish language file added (Ibrahim Kaplan)

Sphider 1.3.1c, release date 30-08-2006

  Bug with unique word counting fixed (Adam Schneider)
  Latvian language file added (Kaspars)

Sphider 1.3.1b, release date 07-06-2006

  Security related bugfixes (Ando Saabas)

Sphider 1.3.1, release date 21-05-2006

  Stemming support added for English. Uses the stemming algorithm by Martin Porter, implemented in PHP by Richard Heyes (Ando Saabas)
  As-you-type search suggestions added (a la Google Suggest). Uses the Suggest Framework (http://sourceforge.net/projects/suggest) (Tanel Tõnnisson)
  "Did you mean" spelling suggestions added (Ando Saabas)
  Several major speed optimizations for both indexing and searching (Ando Saabas)
  Optional domain grouping implemented, such that no more than 2 results from each domain are displayed (a la Google) (Ando Saabas)
  Session IDs can now be stripped from urls (Ando Saabas)
  Database tab in admin section for backing up and optimizing the database (Manu Arponen, Ando Saabas)
  Security bugfixes (Ando Saabas)
  Farsi language file added (Sepehr Esmaeili)

Sphider 1.3, release date 21-02-2006

  Changes:
    Some minor bugfixes (Ando Saabas)
    Russian language file added (Thanks to Mihail Korobov)

Sphider 1.3 RC2, release date 10-12-2005

  Changes:
    Indexing words with more than 30 characters does not produce a "duplicate entry" warning any more (Ando Saabas)
    Multiple searches with an apostrophe in keywords now possible (Ando Saabas)
    Bug with highlighting words with a '+' in front of them fixed (Ando Saabas)
    Slovak language file added (Fedor Tirsel)
    Traditional Chinese language file added (Benny)

Sphider 1.3 RC1, release date 03-12-2005

  Changes:
    Update of look and feel of admin section (Ando Saabas, Rich Pedley)
    Configuration of Sphider settings now possible through admin section (Albert Bohlmeijer, Ando Saabas)
    Indexing results logging into files now possible (Ando Saabas)
    Spidering notice can be sent to admin e-mail (Ando Saabas)
    Outputting spidering results to standard out can be turned off (Ando Saabas)
    Showing categories can be turned off in conf file (Ando Saabas, Albert Bohlmeijer)
    Possibility to set a minimum delay between file downloads (e.g. to keep from spamming the server with too frequent requests) (Ando Saabas)
    Simple template system introduced, searching and result presentation completely separated (Tanel Tõnnisson, Ando Saabas)
    Language file format changed to a more sensible one (Ando Saabas)
    Some missing stripslashes added in admin section (Maxxer)
    Bug when digging an url from meta refresh tag fixed (Maxxer)
    Apache fancy indexing parameters are now ignored (Manu Arponen, Ando Saabas)
    Wrappers added for indexing powerpoint and excel files (Manu Arponen)
    Bug with trailing backslashes at url ends fixed.

Sphider 1.2.7c, release date 03-11-2005

  Changes:
    A bug with OR searches fixed.
    Simplified Chinese language file added (thanks to Ben).
    French language file added (thanks to Dan Delsol).
    Arabic language file added (thanks to Marzook Alshammary).

Sphider 1.2.7b, release date 21-10-2005

  Changes:
    Swedish language file added (thanks to Mikael Brodin).
    Polish language file added (thanks to Michal Charko).
    Bulgarian language file added (thanks to Martin Halachev).

Sphider 1.2.7, release date 29-09-2005

  Changes:
    Support for indexing pdf and doc files via external binaries added.
    Stopwords are not highlighted in page summary any more.

Sphider 1.2.6a RC1, release date 27-06-2005

  "Reindex all" now works properly even when indexing parameters haven't been set.
  Argument variables argv and argc now accessed through $_SERVER superglobal.
  Empty disallow field in robots.txt treated properly.
  &amp; tags in urls now converted to &.
  Italian language file added (thanks to Stefano Paganini).

Sphider 1.2.6 RC1, release date 20-06-2005

  "Reindex all" option both from command line and admin interface.
  Indexing options saved with the rest of the site data (used when reindexing).
  When phrase searching, only the full search phrases in search results are coloured.
  Possibility to define url must include/must not include string/regular-expression list for a site.
  Reindexing now checks if a page status has been changed and deletes it from the index if necessary.
  Some code cleanup.

Sphider 1.2.5a RC1, release date 25-05-2005

  Indexing and searching numbers is now possible (set in conf.php).

Sphider 1.2.5 RC1, release date 13-05-2005

  Changes:
    Support for rel="nofollow" attribute in <a href..> links.
    Url scheme is now saved in the database, so indexing https pages is possible.
    Meta descriptions can now be used as page description in results page (set in conf.php).
    Meta keywords can now be indexed and weight assigned to them.
    Advanced search form added (can be set in conf.php).
    OR search added (available via advanced search form).
    Output by the script is always flushed (immediate feedback in browser).
    Category list can now have an arbitrary number of columns (set in conf.php).
    Results and categories page more customizable via css.
    Main search page renamed from index.php to search.php.
    German language file added (thanks to Sascha Kuhn).
    Portuguese language file added (thanks to Static Bit).
    Some minor bugfixes.

Sphider 1.2.3 RC1, release date 27-03-2005

  Changes:
    Possibility to add an arbitrary prefix to Sphider tables in MySQL (thanks to Albert Bohlmeijer).
    Bug where opened socket was not closed fixed (thanks to Albert Bohlmeijer).
    Dutch language file added (Jeroen de Bruijn).
    Reindexing improved.
    Indexing words in domain name and path now added as an option (turned off by default).
    Word weight calculation algorithm changed.
    A bug with calculating page size when page is over 1Mb fixed.
    Fixed a bug which could prevent words with non-western characters from being indexed.
    Indexing speed improved.
Sphider 1.2.2, release date 20-03-2005

  Changes:
    MD5 sum of each page is checked upon adding them to the database to avoid duplicate entries (eg due to aliases such as http://www.domain.com/ vs. http://www.domain.com/index.html).
    Authentication changed to session-based.
    Phrase searching now works with magic_quotes_gpc = On.
    Fixed a small bug in HEAD query.
    Fixed some charset issues.

Sphider 1.2.1, release date 09-03-2005

  Changes:
    Browsing through multiple search result pages does not increase search count anymore.
    Spidering now works with allow_url_fopen = Off.
    Correct reporting of response timeout from server.
    Handles http code 302 properly.
    Reporting of http codes in spidering log.
    Fixed a minor bug with displaying empty brackets when $show_query_scores was turned off (thanks to Shdwdrgn).

Sphider 1.2 RC 1, release date 24-02-2005

  Changes:
    Many improvements and new features in Administrator tools
    Option to exclude parts of pages from being indexed (for example menus appearing in each page) via and tags.
    Spanish language file (thanks to Claudio Tavares Mastrangelo)
    Many small fixes and improvements.

Sphider 1.1.0, release date 28-01-2005

  Changes:
    Install script added.
    Bug in calculating the order of pages when searching for multiple words fixed.
    HTTP query header improved.
    Deleting categories and sites removes all unnecessary data.
    Database sometimes not being updated when re-indexing bug fixed.
    Empty array sorting with certain queries bug fixed.
    Some HTML and CSS changes and improvements.
    Internationalization of the search script (language files).