PHP DevCenter


Building a Simple Search Engine with PHP

The Search Interface



Of course, users will not be able to work with the MySQL database directly. Therefore, we'll create another PHP script that provides an HTML form to query the database. This works just like any other search engine. The user enters a word in a textbox, hits Enter, and receives a page of results linked to the appropriate pages. The result order depends on the number of times a keyword appears in each document. The search.php script is listed below.

<?php

/*
* search.php
*
* Script for searching a database populated with keywords by the
* populate.php-script.

*/

print "<html><head><title>My Search Engine</title></head><body>\n";

if( !empty($_POST['keyword']) )
{
   /* Connect to the database: */
   mysql_pconnect("localhost","root","secret")
       or die("ERROR: Could not connect to database!");
   mysql_select_db("test");

   /* Get timestamp before executing the query: */
   $start_time = getmicrotime();

   /* Set $keyword and $results. Escape the keyword and force the
    *  result count to an integer to minimize the risk of executing
    *  unwanted SQL commands: */
   $keyword = mysql_real_escape_string( $_POST['keyword'] );
   $results = isset($_POST['results']) ? max(1, (int) $_POST['results']) : 5;

   /* Execute the query that performs the actual search in the DB: */
   $result = mysql_query(" SELECT p.page_url AS url,
                           COUNT(*) AS occurrences 
                           FROM page p, word w, occurrence o
                           WHERE p.page_id = o.page_id AND
                           w.word_id = o.word_id AND
                           w.word_word = \"$keyword\"
                           GROUP BY p.page_id
                           ORDER BY occurrences DESC
                           LIMIT $results" );

   /* Get timestamp when the query is finished: */
   $end_time = getmicrotime();

   /* Present the search-results: */
   print "<h2>Search results for '".htmlspecialchars($_POST['keyword'])."':</h2>\n";
   for( $i = 1; $row = mysql_fetch_array($result); $i++ )
   {
      print "$i. <a href='".$row['url']."'>".$row['url']."</a>\n";
      print "(occurrences: ".$row['occurrences'].")<br><br>\n";
   }

   /* Present how long it took to execute the query: */
   print "query executed in ".(substr($end_time-$start_time,0,5))." seconds.";
}
else
{
   /* If no keyword is defined, present the search page instead: */
   print "<form method='post'> Keyword: 
          <input type='text' size='20' name='keyword'>\n";
   print "Results: <select name='results'><option value='5'>5</option>\n";
   print "<option value='10'>10</option><option value='15'>15</option>\n";
   print "<option value='20'>20</option></select>\n";

   print "<input type='submit' value='Search'></form>\n";
}

print "</body></html>\n";

/* Simple function for retrieving the current timestamp in microseconds: */
function getmicrotime()
{
   list($usec, $sec) = explode(" ",microtime());
   return ((float)$usec + (float)$sec);
}

?>

The script may be called with or without the keyword argument. If it's defined, the script searches for that word in the database. It will also show the length of time it took to process the query. Otherwise, the script presents the search page instead. That page will resemble Figure 1.
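
The number of results also comes straight from the form, so it is worth validating before it reaches the LIMIT clause. Here is a minimal sketch, assuming a hypothetical helper named clamp_result_limit() that is not part of the original search.php; the default of 5 and the maximum of 20 simply mirror the options in the form above.

```php
<?php
/* Hypothetical helper (not in the original search.php): turn the raw
 * 'results' form value into a safe integer for the LIMIT clause. */
function clamp_result_limit($raw, $default = 5, $max = 20)
{
   /* Accept real integers, or strings made of digits only: */
   if (is_int($raw)) {
      $n = $raw;
   } elseif (is_string($raw) && preg_match('/^\d+$/', $raw)) {
      $n = (int) $raw;
   } else {
      return $default;
   }

   /* Zero or negative values fall back to the default: */
   if ($n < 1) {
      return $default;
   }

   /* Never return more rows than the largest option in the form: */
   return min($n, $max);
}
```

In the script it could be called as clamp_result_limit(isset($_POST['results']) ? $_POST['results'] : null), so anything that is not a plain positive number falls back to the default.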


Figure 1 - our simple search page

Let's search on the keyword linux. Our dataset produces results similar to Figure 2.


Figure 2 - the search results page

As expected, onlamp.com appears first on the results page because the keyword linux appears more frequently on this site than on the others. A search for java would probably put onjava.com at the top, and a search for xml would most likely produce the most hits for xml.com. Also note that we've limited the results to the five pages with the most occurrences.

Speeding Up the Database

As the bottom of the results page shows, the query took 0.393 seconds to execute. While this may not seem like an incredibly long time, it does represent quite a hit as the database grows. Fortunately, since we're using a database, there's a very simple solution.

CREATE INDEX word_word_ix ON word (word_word);

This will create an index in the word table on the word_word column. Since all of our searches start with this column, the database will find the appropriate pages much more quickly. To prove this point, we will search for the keyword linux again, to see if we gained any performance. See Figure 3.


Figure 3 - searching with an index

Nice. It took 0.028 seconds, which is 0.365 seconds less than before and makes the query about 14 times faster. If this engine handled an average of 1,000 queries per hour, this would mean a savings of about 146 minutes of query time per day.
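
The arithmetic behind these figures can be checked with a few lines of PHP. This is only a sketch: it uses the two timings reported in Figures 2 and 3 and assumes the load of 1,000 queries per hour is spread evenly over 24 hours.

```php
<?php
/* Timings reported in Figures 2 and 3: */
$before = 0.393;   // seconds per query without the index
$after  = 0.028;   // seconds per query with the index

$saved_per_query = $before - $after;   // time saved on each query
$speedup         = $before / $after;   // how many times faster

/* Assumed load: 1,000 queries per hour, around the clock. */
$queries_per_day = 1000 * 24;
$saved_minutes   = $saved_per_query * $queries_per_day / 60;

printf("saved per query: %.3f s\n", $saved_per_query);
printf("speedup: %.1fx\n", $speedup);
printf("saved per day: %.0f minutes\n", $saved_minutes);
```

With these inputs the saving works out to 0.365 seconds per query, or roughly 146 minutes over 24,000 queries.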

Summary

As shown in this article, useful search engines can be built pretty simply. Without much hassle, you could develop this concept further to handle multiple keywords, boolean operators, stop words, and other features you find in many commercial search facilities. It would also be interesting to populate the database further with a few hundred megs of data. Would the speed still be reasonable? Probably. One thing we could be absolutely sure of, however, is that for an intranet of a mid-sized company with just a few dozen searches per hour, this solution can offer stunning performance with minimal setup.
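
As a starting point for multiple keywords and stop words, the query string could first be split into individual terms. The sketch below is an assumption about how that might look, not part of the original scripts: the function name extract_keywords() and the short stop-word list are made up for illustration.

```php
<?php
/* Hypothetical helper: split a raw query into lowercase keywords,
 * dropping duplicates and a small, assumed list of stop words. */
function extract_keywords($query,
                          $stop_words = array('a', 'an', 'and', 'of', 'or', 'the', 'to'))
{
   /* Collect runs of word characters from the lowercased query: */
   preg_match_all('/\w+/', strtolower($query), $matches);

   $keywords = array();
   foreach ($matches[0] as $word) {
      /* Strict comparison avoids surprises with numeric strings: */
      if (!in_array($word, $stop_words, true) &&
          !in_array($word, $keywords, true)) {
         $keywords[] = $word;
      }
   }
   return $keywords;
}
```

Each returned keyword could then be run through the single-keyword query and the per-page counts added together.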

Whether you're planning to develop a big-scale commercial search engine, or are just playing around, http://www.robotstxt.org/wc/robots.html offers lots of helpful and interesting reading on this topic. For example, it describes the use of the standardized robots.txt file, which every Internet spider should use to determine what it can and can't do on a specific site. Please read and follow the rules if you don't control the sites you want to search.
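
To give an idea of what following those rules involves, here is a deliberately simplified sketch of a robots.txt check. It is an illustration only: the function name is hypothetical, and it looks solely at Disallow lines in "User-agent: *" groups, ignoring Allow lines, wildcards, and agent-specific groups that a real spider would also need to handle.

```php
<?php
/* Simplified sketch: returns false if $path is disallowed for all
 * robots ("User-agent: *") in the given robots.txt text. */
function path_allowed_for_all_robots($robots_txt, $path)
{
   $applies   = false;   // are we inside a "User-agent: *" group?
   $in_agents = false;   // was the previous field a User-agent line?

   foreach (preg_split('/\r\n|\r|\n/', $robots_txt) as $line) {
      /* Strip comments and surrounding whitespace: */
      $line = trim(preg_replace('/#.*$/', '', $line));
      if ($line === '' || strpos($line, ':') === false) {
         continue;
      }

      list($field, $value) = explode(':', $line, 2);
      $field = strtolower(trim($field));
      $value = trim($value);

      if ($field === 'user-agent') {
         /* Consecutive User-agent lines share one group: */
         if (!$in_agents) {
            $applies = false;
         }
         if ($value === '*') {
            $applies = true;
         }
         $in_agents = true;
         continue;
      }
      $in_agents = false;

      /* An empty Disallow value means nothing is disallowed: */
      if ($field === 'disallow' && $applies && $value !== '' &&
          strpos($path, $value) === 0) {
         return false;
      }
   }
   return true;
}
```

A spider would fetch /robots.txt from the site once, then call this check with the path part of each URL before requesting the page.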

I wish you good luck and look forward to getting a visit from your spider soon. :)

Daniel Solin is a freelance writer and Linux consultant whose specialty is GUI programming. His first book, SAMS Teach Yourself Qt Programming in 24 hours, was published in May, 2000.




What enhancements would you make?

Showing messages 1 through 62 of 62.

  • Help
    2010-04-29 14:24:22  joe--- [Reply | View]

    Help I created populate.php and connected to my database how do I populate the database?
  • Bugfixes, Better Error Checking and Full Project Download
    2009-01-10 06:56:33  ZIMSICAL.com [Reply | View]

    I have posted a .zip download of all of the project files, including the PHP files, the database .sql file, and an .htaccess file (for users having permission issues with their PHP setup). This should take care of everyone's problems. Instructions are included; please email me with any questions.

    http://www.zimsical.com/portfolio/php/oreilly-search-engine-tutorial-fixed.zip
    • Bugfixes, Better Error Checking and Full Project Download
      2010-02-10 00:51:18  Gyverr [Reply | View]

      Hi

      The link doesn't seem to work for me; it would be awesome if you could fix this asap :)
  • auto index lower level pages
    2008-05-02 15:39:37  lektrikpuke [Reply | View]

    It's nice that it does one page, but what about a site (all lower levels)?
  • DEFINE A URL
    2007-12-30 11:44:57  Arsench [Reply | View]

    Hello, I'm new to PHP and need some help, please. When I run the code I receive the error "You need to define a URL to process." Can you please give an example of where I have to define a URL, and what kind of URL?


    Thanks
  • IIS problems
    2007-10-16 06:49:35  snoski3 [Reply | View]

    I guess this is supposed to work from what most posts have been saying. However, the populate and search php files don't work for me. The populate php doesn't give me an error yet there are no new entries in the database. The search page looks like this:

    \n"; for( $i = 1; $row = mysql_fetch_array($result); $i++ ) { print "$i. ".$row['url']."\n"; print "(occurrences: ".$row['occurrences'].")

    \n"; } /* Present how long it took the execute the query: */ print "query executed in ".(substr($end_time-$start_time,0,5))." seconds."; } else { /* If no keyword is defined, present the search page instead: */ print "
    Keyword: \n"; print "Results: 5\n"; print "1015\n"; print "20\n"; print "
    \n"; } print "\n"; /* Simple function for retrieving the current timestamp in microseconds: */ function getmicrotime() { list($usec, $sec) = explode(" ",microtime()); return ((float)$usec + (float)$sec); } ?>

    Thank you for your help.
  • Help
    2007-09-14 10:46:30  Chuddy [Reply | View]

    I just created my database, copied these two scripts into two separate Notepad files, and saved them on my PHP server.
    But when I run search.php, the following error message appears: "Notice: Undefined index: keyword in c:\program files\easyphp1-8\www\search.php on line 13". Line 13 of the code is "if( $_POST['keyword'] )". How do I fix this problem?
  • Am i doing something wrong
    2007-09-02 10:42:05  louiscbrooks [Reply | View]

    Hi, I tried this tutorial and everything seems to be running smoothly. There's one problem, though: the populate.php script seems to index words 2 or 3 times each. Is this normal, or have I misconfigured something?

    Many Thanks Louis

  • MULTIPLE KEYWORD SEARCH
    2007-07-13 09:46:17  -MJD- [Reply | View]

    Hi,
    Great tutorial, converted it to ASP, works great.

    Now need a little guidance on multiple keywords!

    Any advice on the SQL statement?

    Cheers.
    MJD
  • Help - please!
    2007-06-02 12:20:19  88guy [Reply | View]

    I have spent days on this thing and I have one enormous problem. By the way, I'm 55 and fairly new to all of this (PHP and MySQL). On my Linux server I easily created the database and added the tables. However, when I try to run the populate.php script in a browser I get a parsing error. I have added "populate.php?url=http://www.cnn.com/" at the bottom of the script, as advised by a previous poster. However, with little knowledge of PHP, I am assuming that I am adding it minus brackets, in the wrong place, etc.; something is not right. If I do not add that line, I simply get the message about needing to add a URL for indexing.

    Very specifically, please: where does a line like this go and what, precisely, is its syntax (should it be preceded by } and followed by {, etc.)?
    • Help - please!
      2007-06-02 13:51:36  88guy [Reply | View]

      I figured it out...... I was doing something stupid.
  • Natural Language search engine
    2007-06-01 12:50:42  tennis_dunlop [Reply | View]

    Hi, this tutorial was great. But as many have mentioned so far, there are quite a few extra things that can be done. One of the main issues I have with this example is that it only searches for one word, and there is no real relationship between words other than the number of times a word is present in the text.

    I would like to suggest a PHP search engine that I wrote (it's GNU licensed). It includes full on-screen installation, the ability to create different index zones on your website (perfect for multilingual sites), natural language search and query expansion using MySQL 5.0+, on-screen stats and reports about your users' searches, aggregation of users' searches to suggest complementary search terms, and much more. Have a look at http://blog.dstmichel.ca/index.php/2007/05/14/11-invenio-a-php-web-search-engine

    And feel free to play and learn with all the included well-comented files.

    I would of course greatly appreciate your comments and would gladly help if you need assistance.

    Thanks,

    Dennis
  • populate.php
    2007-03-06 14:45:44  adevesa [Reply | View]

    I'm new to all of this PHP/MYSQL. I've been trying for days now to do this. My problems are:
    1.How do I make the mysql server to run on the localhost.
    2.when I index http://localhost/populate.php?url=http://www.macdevcenter.com/ I get exactly the same answer in the PAGE table, instead of only http://www.macdevcenter.com/
    3.How does the populate do all the things it is supposed to do regarding reading the database?

    I NEED HELP!!!!! I've been working hard to create a search engine with knowing nothing about computer programming until a few days ago.
    Thank you.
  • modify
    2007-02-28 15:00:04  katie_P [Reply | View]

    Hi as mentioned already im new to php & mysql, I must say its quite interesting - far better then my media studies course i used to do lol

    1. I was wondering is there any way in tailoring the search engine so that it displays a description under the associated links ?

    2. Lastly, am I right in saying that you would add an else statement to show "please enter a search term" when there is no valid search word inside the box?

    Thanks for your help guys
  • Simplifying the query
    2007-01-25 15:45:35  pjdevitt [Reply | View]

    This seems obvious to me, but why are you storing every occurrence of a keyword within a document? A simpler solution would be to just count the number of occurrences of a word within the PHP code and write the value to the database. That would remove the GROUP BY used in most of your queries. The occurrence table would need to be modified to include a 'count' field. Here's a snippet of PHP that will create an array of words and the number of times they occur in the document.


    $wordbank = array();

    preg_match_all("/(\b[\w+]+\b)/", $buf, $words);
    for($j=0; $j<count($words[0]); $j++) {
       $cur_word = addslashes(strtolower($words[0][$j]));
       /* $filterWords: array of words to exclude */
       if(!in_array($cur_word, $filterWords)) {
          if(!isset($wordbank[$cur_word])) {
             $wordbank[$cur_word] = 0;
          }
          $wordbank[$cur_word]++;
       }
    }

    You can then iterate through the $wordbank array and add a new record to the occurrence table for each word.

  • Giving Back
    2007-01-23 16:52:24  PHPchick [Reply | View]

    I found this really helpful in getting up & running fast and wanted to give back as a "thanks". So here's a little bit that I added in my version:

    /* create an array of words you want to exclude */
    $filterWords = array('a', 'about', 'an', 'and', 'are', 'as', 'at', 'be', 'by', 'from', 'how', 'i', 'in', 'is', 'it', 'nbsp', 'of', 'on', 'or', 'that', 'the', 'this', 'to', 'was', 'we', 'what', 'when', 'where', 'which', 'with');

    ... later in the code ...

    /* Does the current word already have a record in the word-table? */
    $cur_word = addslashes( strtolower($words[$i][$j]) );

    /* add the following to filter unwanted words */
    if (!in_array( $cur_word, $filterWords)) {
    ... database selects/inserts...
    }
    • Giving Back
      2007-03-05 03:59:33  katie_P [Reply | View]

      Hi, sorry to bother you, but do you know how to search for more than one word? There was code provided earlier in this thread, but it doesn't work for me:

      $result = mysql_query(" SELECT p.page_url AS url,
      COUNT(*) AS occurrences
      FROM page p, word w1, occurrence o1, word w2, occurrence o2
      WHERE p.page_id = o1.page_id AND
      w1.word_id = o1.word_id AND
      w1.word_word = \"$keyword[1]\" AND
      w2.word_id = o2.word_id AND
      w2.word_word = \"$keyword[2]\" AND
      GROUP BY p.page_id
      ORDER BY occurrences DESC
      LIMIT $results" );

      I think you need to change an SQL statment, could you please help me

      Thanks
    • Giving Back
      2007-02-28 14:51:59  katie_P [Reply | View]

      Hiya, I'm kind of new to PHP. Am I right in saying that you have to insert this code into the search document?

      thanks
  • TSEP - ready, well featured PHP search engine
    2007-01-20 23:48:53  ONG [Reply | View]

    Hi

    I was happy to read the article, but it's one of many: There are several articles out on the net talking about search engine development.

    Since I am the admin of a search engine on sourceforge (TSEP) ( http://www.tsep.info ) I want to take the chance to invite developers to join in on an advanced search engine development progress: We are looking to dedicated developers right now.

    Olaf
  • Fantastic Article
    2006-08-25 06:44:05  JAS168 [Reply | View]

    This was a great article. I used the same concept to create a search engine for all types of resources. My only suggestion would be to keep in mind that this article is not the end-all solution for searching on a website, but rather a place to start in coding.
  • Passing session id through fopen
    2005-03-20 22:42:59  TennisOne [Reply | View]

    Thanks for the article. I am attempting to use it for our website. The populate.php script which executes the fopen passes in the URL to an article on our website. Every article on our website ensures that the user accessing the article is a member. I am executing the populate.php script as a member, however, when I execute the fopen call I lose all of my session information and get redirected to a join page because the article page does not think that I am a member.

    Is there a way for me to pass session information via the URL even though

    session.use_trans_sid = 0

    in my php.ini file. Which I believe from a security standpoint is the right thing to do.

    I tried passing my session information by defining

    $url_with_sid = $url."?PHPSESSID=".session_id()
    if (!($fd = fopen($url_with_sid, "r"))

    Unfortunately the fopen fails with "failed to open stream: HTTP request failed!"

    I can define the url with a query string parameter such as

    $url_with_sid = $url."?hello=world";

    And this works fine. Consequently, the fopen is not allowing the PHPSESSID query string parameter to be passed. Any thoughts or ideas would be greatly appreciated.

  • another success
    2005-02-15 14:51:33  Lykerus [Reply | View]

    For those that were having trouble with the

    Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource

    Try something like this code:
    $MySQLPassword = "password";
    $HostName = "localhost";
    $UserName = "username";

    mysql_connect($HostName,$UserName,$MySQLPassword)
    or die("ERROR: Could not connect to database!");
    mysql_select_db("database_name");


    I haven't worked on this engine for a few weeks, but decided to rewrite the connection script and see if I could get things working. It now works great!

    Thanks for this tutorial! It is a great one!
  • another success
    2005-02-15 14:50:47  Lykerus [Reply | View]

    For those that were having trouble with the

    Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource

    Try something like this code:
    $MySQLPassword = "password";
    $HostName = "localhost";
    $UserName = "username";

    mysql_connect($HostName,$UserName,$MySQLPassword)
    or die("ERROR: Could not connect to database!");
    mysql_select_db("database_name");


    I haven't worked on this engine for a few weeks, but decided to rewrite the connection script and see if I could get things working. It now works great!

    Thanks for this tutorial! It is a great one!
    • oops
      2005-02-15 14:52:03  Lykerus [Reply | View]

      sorry that got posted twice!

      --Lykerus
  • success
    2005-01-26 17:18:07  peetycox [Reply | View]

    Hi

    Please ignore my previous post; it works a treat (although a little slower on my server).

    I have one question: the example uses localhost and root with a password, which I understand is needed for populating the database. However, for the search, could I use localhost with any user that has limited permissions? Or am I just being paranoid?

    Thanks great example.

    Peetycox
  • mysql_fetch_array() error
    2005-01-21 14:43:31  Lykerus [Reply | View]

    Hello,

    I am having a strange problem when trying to populate the db. I am new to db's so be patient with me. I noticed that Sean had this same problem earlier, but I couldn't find the solution he found.

    The error I get is:

    mysql_fetch_array(): supplied argument is not a valid MySQL result resource

    I tried a roundabout way of using this ($row = mysql_fetch_array($result);) tag, but it doesn't work. When I try and search, it seems as though the database hasn't been populated.

    Can anyone help? Please and thank you!
    • mysql_fetch_array() error
      2005-01-26 15:54:55  peetycox [Reply | View]

      Hi

      I'am getting the same error it reads:

      Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in d:\appserv\www\scn\search.php on line 30

      Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in d:\appserv\www\scn\search.php on line 70
      Indexing: social

      It's driving me mad. This thing is great and just what I need. Please, someone tell me what I need to do.

      Thanks

      Peetycox
  • help me for populate.php
    2004-12-03 00:36:42  setiawan77th [Reply | View]

    Hi all..

    please help me with my problem, when I run
    populate.php it always say "You need to define a
    URL to process." but actually I alreade put
    some url addres in $url = addslashes( $_GET['http://localhost/newweb/start/'] ); well some body can help me??
    • help me for populate.php
      2005-01-21 14:59:09  Lykerus [Reply | View]

      One other thing that I forgot to add to your question is that you shouldn't put you url address after the '$_GET' tag. The url will not be able to be processed if you keep that in there. If you change the 'http://localhost/newweb/start/' to 'url' like it was before, and then do as I stated in my last message, you should be up and running.

      Good luck.
    • help me for populate.php
      2005-01-21 14:35:57  Lykerus [Reply | View]

      If you haven't already been answered to this or figured this out yet, in order to process a URL, you need to add the url at the end of the populate.php file.

      e.g. ...populate.php?url=http://www.cnn.com/

      This basically tells the populate.php file which url to index.

      Hope this helps!
  • help about the search engine
    2004-03-03 02:00:56  dinesh2037 [Reply | View]

    hi,

    I need 2 add (a)no. of hits in the results, (b) add previous and next link and (c) display message when no parameter is supplied.

    i need the search engine in php but i am quite new 2 php.

    plz write 2 me that whoever could help me.

    bye
  • Search 0-40 words
    2004-01-01 18:39:58  anonymous2 [Reply | View]

    Hi all

    I am still having problems with searching for up to 40 keywords. Thank you, Giff, for editing the SQL statement, but that would only solve my problem for two-word searches.

    My train of thought is that the PHP script would have to break the string of words into substrings, run each substring as an individual word search, and then look for articles that contain more than one of the searched words.

    The more I play with this, the bigger the mess becomes. Can anyone help?

    -Sean
    • Search 0-40 words
      2007-03-04 05:27:58  katie_P [Reply | View]

      hi can i ask how u managed to search for two words, i cant even managed to that

      Thanks

      If you could give any sort of code that would be great
  • Add on for foreign language
    2003-12-31 00:35:06  anonymous2 [Reply | View]

    Hi All,

    Thanks for this useful search engine.

    The accents ("à" in french for exemple) are encoded as "à" by html editors.

    To get your search engine dealing with it, I put the following lines in populate.php :

    /* Foreign site : convert french characters made by html editors : */

    $patterns[0] = "/&nbsp;/";
    $patterns[1] = "/&agrave;/";
    $patterns[2] = "/&acirc;/";
    $patterns[3] = "/&eacute;/";
    $patterns[4] = "/&egrave;/";
    $patterns[5] = "/&ecirc;/";
    $patterns[6] = "/&icirc;/";
    $patterns[7] = "/&ugrave;/";
    $patterns[8] = "/&ucirc;/";
    $patterns[9] = "/&ccedil;/";
    $patterns[10] = "/&oelig;/";
    $patterns[11] = "/&euro;/";
    $patterns[12] = "/&copy;/";

    $replacements[0] = " ";
    $replacements[1] = "à";
    $replacements[2] = "â";
    $replacements[3] = "é";
    $replacements[4] = "è";
    $replacements[5] = "ê";
    $replacements[6] = "î";
    $replacements[7] = "ù";
    $replacements[8] = "û";
    $replacements[9] = "ç";
    $replacements[10] = "œ";
    $replacements[11] = "€";
    $replacements[12] = "©";

    $buf = preg_replace($patterns, $replacements, $buf);

    BETWEEN LINE

    $buf = ereg_replace('/&\w;/', '', $buf);

    AND LINE

    /* Extract all words matching the regexp from the current line: */

    It's not big deal but it works and it is easy to adapt to foreign languages.

    Regards,

    Louis
    http://www.interactive-trails.com
    • Add on for foreign language
      2003-12-31 00:41:31  anonymous2 [Reply | View]

      I meant "à" is modified to "& a g r a v e ;" by html editors (and by this site too)

      Space = & n b s p ;
      à = & a g r a v e ;
      â = & a c i r c ;
      é = & e a c u t e ;
      € = & e u r o ;

      and so on...
  • A more detailed query...
    2003-12-23 06:52:17  anonymous2 [Reply | View]

    Hi Daniel,

    Thanks again for your fantastic tutorial. I was wondering how one might submit a query which exclueds certain words and includes others. I know that the following should work for finding several words:

    $result = mysql_query(" SELECT p.page_url AS url,
    COUNT(*) AS occurrences
    FROM page p, word w1, occurrence o1, word w2, occurrence o2
    WHERE p.page_id = o1.page_id AND
    w1.word_id = o1.word_id AND
    w1.word_word = \"$keyword[1]\" AND
    w2.word_id = o2.word_id AND
    w2.word_word = \"$keyword[2]\" AND
    GROUP BY p.page_id
    ORDER BY occurrences DESC
    LIMIT $results" );

    But how can I also tell it to NOT provide me with an article that contains a word I don't want as well?

    - Giff
    • A more detailed query...
      2003-12-23 22:43:15  anonymous2 [Reply | View]

      Hi Giff

      Thank you for that. However, there is still a minor defect. This is what came up when i pasted your code in. And keep in mind that i have not altered anything stated in the article.

      "Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /var/www/html/jeo/dev/search/search.php on line 48
      query executed in 0.000 seconds."


      So does anyone have a suggestion?
      Please help!

      Thank you all for helping me so far.

      Best wishes for the festive season!

      Regards,
      Sean
    • RE: A more detailed query...
      2003-12-23 07:15:55  dsolin [Reply | View]

      Hi Giff,

      I'm not sure if I got you right here, but couldn't you just use the != or NOT LIKE operators? This would result in something like:

      w2.word_word != \"$keyword[1]\"

      or:

      w2.word_word NOT LIKE \"$keyword[1]\"

      Hope that helps!

      -Daniel
      • RE: A more detailed query...
        2003-12-23 07:44:06  anonymous2 [Reply | View]

        I tried that earlier, but it doesn't seem to work. The reason (I think) is that there ARE occurrences of that word that are not in that article (namely, all the occurrences in other articles).
        • RE: A more detailed query...
          2003-12-23 08:12:37  anonymous2 [Reply | View]

          Let me try to explain a little more clearly.

          Say I have an article with the word "science" and the word "computer" in it, and the search asked for all articles with "science" but without "computer."

          If I use this query:

          w2.word_word != "computer"

          then it will look for an occurrence o2 that doesn't point to "computer". However, this constraint would be satisfied instantly by any other word existing in the article; even the word "science" would satisfy it. As long as there is one instance of a word other than "computer" in that article, the search will bring it up.

          If I put the "!= " further up with the article definition, we still get a similar problem:

          p.page_id != o1.page_id

          This occurrence o1 of "computer" may indeed not be on the page, simply because it is an occurrence of "computer" on another page. As long as there is any other page with "computer" on it, the query will still bring up the page, so the search doesn't work.

          What I really want is not:

          "Find a page where there EXISTS a word "computer" not on that page"

          but:

          "Find a page that, for ALL occurrences of the word "computer", NONE of them are on it"
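
One way to express that last condition in SQL is a NOT EXISTS subquery. The following is only a sketch under a few assumptions: it needs a MySQL version with subquery support (4.1 or later), it uses the table and column names from the article's query, and build_exclusion_query() is a hypothetical helper name. It returns the SQL with ? placeholders plus the values to bind, so it is meant for a prepared statement rather than mysql_query().

```php
<?php
/* Sketch only: build a query for pages that contain $include but
 * have no occurrence of $exclude at all. */
function build_exclusion_query($include, $exclude, $limit = 5)
{
   $sql = "SELECT p.page_url AS url, COUNT(*) AS occurrences
           FROM page p, word w, occurrence o
           WHERE p.page_id = o.page_id AND
                 w.word_id = o.word_id AND
                 w.word_word = ? AND
                 NOT EXISTS (SELECT 1
                             FROM word w2, occurrence o2
                             WHERE w2.word_id = o2.word_id AND
                                   o2.page_id = p.page_id AND
                                   w2.word_word = ?)
           GROUP BY p.page_id
           ORDER BY occurrences DESC
           LIMIT " . (int) $limit;

   /* First placeholder: word to find; second: word to exclude. */
   return array($sql, array($include, $exclude));
}
```

The subquery is tied to the outer page through o2.page_id = p.page_id, so a page is dropped only when the excluded word occurs on that same page, not merely somewhere else in the index.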

  • multiple keywords
    2003-12-22 01:39:29  anonymous2 [Reply | View]

    hi

    thank you for those codes. they worked like a charm.

    i was just wondering if there was additional syntax involved whereby i would be able to search more than 1 keyword at a time. For example, a phrase?

    I have tried it on this search engine and it comes back with a null result.

    Your help would be most welcomed.

    Thank you,
    Sean
    • RE: multiple keywords
      2003-12-22 01:57:16  dsolin [Reply | View]

      Hi Sean,

      As pointed out by a previous poster, the easiest way to implement multiple keyword searching would probably be to use MySQL's Full-text Search. Detailed information about this feature can be found at:

      http://www.mysql.com/doc/en/Fulltext_Search.html

      You will need to rewrite the example program used in the article for it to use Full-text Search, but that should be quite simple to do.

      Good luck!

      Daniel
      • RE: multiple keywords
        2003-12-22 17:59:54  anonymous2 [Reply | View]

        Thank you Daniel.

        I have been looking at your link above, but have had no luck in seeing this through. First of all, I still do not understand the FULLTEXT concept, or how I would alter the table to incorporate this feature and then bring it into the search.php script.

        I hope you can help me here.

        Best regards,
        Sean
        • RE: multiple keywords
          2004-09-21 10:57:36  cityslicker [Reply | View]

          Hi All,

          Thanks for the code - it is great.

          I just want to tell you about my edits and how I handled multiple keywords.

          1. First of all, I put the keyword extraction script in a function.

          I use it to index the titles and keywords of pages as well as the main body text. I call the function three times for words in a title, twice for keywords, and once for main body text. This is a way of scoring a page.

          E.g. when a search is done on 'business' and a page title has the word 'business' in it, it will show 3 occurrences (although I don't show occurrences, I just use the score to order the list).

          2. Multiple keywords.

          Basically, I use the explode() function to get an array of keywords and loop through them, applying each one to the query. I keep the scores for each word and add them together before displaying the list, highest score first.

          //CODE

          /* Get timestamp before executing the query: */
          $start_time = getmicrotime();

          $keyword_array = explode(" ", $_GET['keyword']);

          $score = array();
          foreach($keyword_array as $keyword)
          {

          /* Use addslashes() on each keyword to minimize the risk
          * of executing unwanted SQL commands: */
          $keyword = addslashes( $keyword );

          /* Execute the query that performs the actual search in the DB: */
          $result = mysql_query(" SELECT p.page_title AS title,
          COUNT(*) AS occurrences
          FROM pages p, word w, occurrence o
          WHERE p.pageID = o.page_id AND
          w.word_id = o.word_id AND
          w.word_word = \"$keyword\"
          GROUP BY p.pageID
          ORDER BY occurrences DESC
          LIMIT 0, 5" );


          for( $i = 1; $row = mysql_fetch_array($result); $i++ )
          {

          $score[$row['title']] += $row['occurrences']; //Array of scores

          }
          }

          if(count($score) > 0)
          {
          arsort($score); //Reverse sort the associative array scores by highest

          /* Get timestamp when the query is finished: */
          $end_time = getmicrotime();


          /* Present the search-results: */
          print "<h2>Search results for '".$_GET['keyword']."':</h2>\n";
          //Loop through array and display results

          while ($element = each($score)) //Loop through array and output results
          {

          echo $element[ "key" ];
          echo " - ";
          echo $element[ "value"];
          echo "
          ";

          }


          /* Present how long it took the execute the query: */
          print "query executed in ".(substr($end_time-$start_time,0,5))." seconds.";
          }
          else
          {

          //Display a no pages found page

          }


          // END CODE

          This works fine but is a little slower if the user wants to search for a sentence. All in all, it is an easy add-on to the already supplied code that provides multiple keyword searching.

          Hope this helps someone!
          • RE: multiple keywords EDIT
            2004-09-21 11:10:36  cityslicker

            Hi again,

            Just a small edit from the code above.

            If a user searched for "good web sites" and a page contained hundreds of occurrences of 'good' but no 'web' or 'sites', it would rank higher than a page that contains all three. This is not what we want, so amend the above code with this part:

            for( $i = 1; $row = mysql_fetch_array($result); $i++ )
            {

            $score[$row['title']] += $row['occurrences']; //Array of scores
            if($row['occurences'] > 0) { $score[$row['title']] += 1000; } //This makes pages containing all keywords rank highest
            }

            You can set the 1000 to whatever you like but you should be safe with that number.
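
The ranking idea in this thread can be tried out without a database. What follows is only a minimal sketch, assuming the per-keyword query results have already been collected into arrays of title => occurrences. The function name rank_pages() and the $bonus parameter are hypothetical and not part of the posted script. Unlike the snippet above, which adds the bonus once per matching keyword, this variant adds it only to pages that matched every keyword.

```php
<?php
/**
 * Combine per-keyword results into one ranked list.
 *
 * $hits_per_keyword: array( keyword => array( title => occurrences ) )
 * (assumed shape; in the thread these rows come from mysql_fetch_array()).
 * $bonus: added once to pages that matched every keyword (hypothetical).
 */
function rank_pages(array $hits_per_keyword, $bonus = 1000)
{
    $score   = array(); // title => summed occurrences
    $matched = array(); // title => number of keywords that matched

    foreach ($hits_per_keyword as $keyword => $hits) {
        foreach ($hits as $title => $occurrences) {
            if (!isset($score[$title])) {
                $score[$title]   = 0;
                $matched[$title] = 0;
            }
            $score[$title] += $occurrences;
            $matched[$title]++;
        }
    }

    // Reward only the pages that contain all of the keywords.
    $keyword_count = count($hits_per_keyword);
    foreach ($matched as $title => $n) {
        if ($n === $keyword_count) {
            $score[$title] += $bonus;
        }
    }

    arsort($score); // highest score first, keys preserved
    return $score;
}

// "Page A" has many hits for one word, "Page B" has a few hits for all three.
$ranked = rank_pages(array(
    'good'  => array('Page A' => 120, 'Page B' => 2),
    'web'   => array('Page B' => 1),
    'sites' => array('Page B' => 3),
));
print_r($ranked); // "Page B" (1006) is listed before "Page A" (120)
```

The bonus only works as intended while it is larger than any realistic occurrence count, which is the same caveat as with the 1000 in the post above.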

            • RE: multiple keywords EDIT
              2004-09-21 11:19:15  cityslicker

              Hi once more!!

              Make sure you spell occurrences correctly unlike in my code above!!

              • RE: multiple keywords EDIT
                2009-11-04 09:45:30  xoqqa

                Would you be so kind as to send me your version of this search engine, please? I've been trying to figure out what is wrong with mine and noticed that yours is somewhat different. For example, I don't have page_title but just URLs... I think you've also modified the populate script.

                I would appreciate it if you could send me the script files at sammutmatu[at]gmail.com

                Thanks
  • A more difficult search engine
    2003-12-18 08:16:26  anonymous2

    I need to develop a PHP/MySQL search engine for my company (the publisher of an academic magazine), and my boss has expressed a desire for an "exact phrase option." For instance, when a user types in "martin luther king", the search engine would only bring up articles on Dr. King, rather than articles about Martin Luther and his dealings with German royalty. However, this seems like a terribly difficult thing to implement. Your tutorial has been a great deal of help on this topic, and I was wondering if you might point me in the right direction.

    - Giff
    • RE: A more difficult search engine
      2003-12-19 00:49:31  dsolin

      Hi Giff,

      From what I can understand by reading your description, this is something that you will need to implement in the indexing mechanism of your search engine -- when the user provides the engine with a search phrase, the backend needs to already know the difference between "an article on Dr. King" and "an article about Martin Luther and his dealings with German royalty".

      So, without being able to get into much detail, I think you need to implement this in your backend database. Maybe you should add a column that indicates the state of a certain URL: is it an article on Dr. King or one about Martin Luther's dealings with German royalty? Of course, the hardest part of such a project would be to implement logic in the indexing mechanism that calculates the value of this column. Maybe it needs to be done manually?

      If you find a working solution, Giff, please feel free to post it here. I'm sure that would be interesting reading for many of us. Good luck!

      -Daniel
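
One common way to approach an "exact phrase option" is to record the position of every word while indexing, and then to require that the words of the phrase occur at consecutive positions. The following is only a minimal in-memory sketch of that idea. The helper names index_positions() and contains_phrase() are hypothetical, and the tables from the article do not store word positions, so a real implementation would need an extra position column (an assumption, not something the article provides).

```php
<?php
/**
 * Build a positional index for one document: word => list of positions.
 * Hypothetical helper; the article's tables do not store positions.
 */
function index_positions($text)
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $index = array();
    foreach ($words as $pos => $word) {
        $index[$word][] = $pos;
    }
    return $index;
}

/**
 * Return true if the words of $phrase occur at consecutive positions.
 */
function contains_phrase(array $index, $phrase)
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($phrase), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0 || !isset($index[$words[0]])) {
        return false;
    }
    // Try every position of the first word as a possible phrase start.
    foreach ($index[$words[0]] as $start) {
        $found = true;
        foreach ($words as $offset => $word) {
            if (!isset($index[$word]) || !in_array($start + $offset, $index[$word], true)) {
                $found = false;
                break;
            }
        }
        if ($found) {
            return true;
        }
    }
    return false;
}

$king   = index_positions("A speech by Martin Luther King in 1963.");
$luther = index_positions("Martin Luther and the German princes; the king was absent.");

var_dump(contains_phrase($king, "martin luther king"));   // bool(true)
var_dump(contains_phrase($luther, "martin luther king")); // bool(false)
```

The second document contains all three words, but not next to each other, so it is rejected. That is the distinction the question asks for.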
  • Search in dynamic page
    2003-10-20 03:29:22  anonymous2

    As far as I can see, this search method does not search dynamic pages such as mypage.php?id=46 or something similar ...

    How can I do that? Run populate.php in a loop for all parameters in the URL?

    for ($i=1; $i < 50; $i++) {
    system("populate.php?id=".$i);
    }

    ...

    Matt
    • RE: Search in dynamic page
      2003-10-20 04:22:54  dsolin

      Hi Matt,

      With this (simple) example, all pages need to be indexed via HTTP individually. However, just as you imply, you could quite easily automate this task to index several pages in a batch. As you might know, you can make HTTP requests using PHP's fopen() function, so you could do something like this:

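      for($i = 1; $i < 50; $i++)
      {
         /* fopen() requires a mode argument; "r" opens the URL for reading: */
         $fd = fopen("http://www.mysite.com/mypage.php?id=".$i, "r");
         if( $fd ) fclose($fd);
      }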

      Good luck!

      Daniel
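
The loop above can be wrapped in a function so that failed requests become visible. This is only a sketch: the function name fetch_pages() and the injectable $fetch callback are hypothetical, and the URL is the placeholder from the post. By default it uses file_get_contents(), which can read http:// URLs when allow_url_fopen is enabled. How each response is then handed to the indexing code depends on populate.php, which is not reproduced here.

```php
<?php
/**
 * Request a range of dynamic pages and report which ones failed.
 * $fetch is a hypothetical hook so the loop can be tried without a network.
 */
function fetch_pages($base_url, $first, $last, $fetch = 'file_get_contents')
{
    $failed = array();
    for ($i = $first; $i <= $last; $i++) {
        $url = $base_url . urlencode((string) $i);
        // Both fopen() and file_get_contents() return false on failure.
        if (call_user_func($fetch, $url) === false) {
            $failed[] = $url;
        }
    }
    return $failed;
}

// Dry run with a stand-in fetcher that pretends id=3 is broken.
$fake = function ($url) {
    return substr($url, -5) === '?id=3' ? false : '<html>...</html>';
};
$failed = fetch_pages('http://www.mysite.com/mypage.php?id=', 1, 5, $fake);
print_r($failed); // lists only the URL ending in ?id=3
```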
  • it doesn't work?
    2003-09-20 16:31:11  anonymous2

    I'm a total PHP noob. I uploaded the 2 pages to my web host (Lycos), but it won't print the search results. I adjusted the files to my database and so on, and it prints everything but the results...?
    • re: it doesn't work?
      2003-09-21 13:23:22  dsolin

      Hi there!

      Could you post the URL to the search engine on your lycos account? No possible solution pops up right away, but maybe I would be able to help if I could take a look at it.

      Best,
      Daniel
  • Nice
    2003-09-04 12:12:14  anonymous2

    Great solution! I tried it out tonight and got it working like a charm in half the time I thought it would take me!

    Keep up the good work!

    Quality Kingdom

  • Index through the file system
    2002-11-04 03:48:05  anonymous2

    For building a local index, it is much more
    efficient to do it through the filesystem instead
    of through the http server.

  • Use Exclusion Tags
    2002-11-04 03:27:27  anonymous2

    Many search engines recognize tags that instruct them NOT to parse the markup that they enclose.

    For example ...

    <!-- stop_indexing -->

    Here there might be a menu or other markup
    that should not be indexed

    <!-- start_indexing -->

    It ensures that only the words derived from
    significant content are indexed ... makes it
    more precise for the user, and of course the
    index is smaller -- so the whole thing works
    faster.
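
An indexing script could honor such markers by cutting the enclosed markup out before it extracts keywords. The following is a minimal sketch, assuming the marker comments are written exactly as in the example above. The function name strip_excluded() is hypothetical and does not exist in the article's populate.php.

```php
<?php
/**
 * Remove everything between stop/start markers before extracting keywords.
 * The marker names follow the post above; strip_excluded() is hypothetical.
 */
function strip_excluded($html)
{
    // /s lets "." match newlines; the lazy ".*?" stops at the nearest start marker.
    // If no start marker follows, the rest of the document is dropped.
    return preg_replace(
        '/<!--\s*stop_indexing\s*-->.*?(<!--\s*start_indexing\s*-->|\z)/s',
        '',
        $html
    );
}

$page = "<h1>News</h1>\n"
      . "<!-- stop_indexing -->\n<ul><li>Home</li><li>Contact</li></ul>\n<!-- start_indexing -->\n"
      . "<p>PHP search engines are fun.</p>";

$clean = strip_excluded($page);
echo strip_tags($clean); // the menu entries "Home" and "Contact" are gone
```

Treating an unclosed stop marker as "exclude to the end of the page" is a design choice made for this sketch; a stricter script might report it as an error instead.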
  • htDig
    2002-10-31 13:21:44  anonymous2

    Stand on some giant's shoulders -
    www.htdig.org - an open-sourced spider/search engine used extensively throughout the world.
  • Optimizing multiple occurrences of same word on same page
    2002-10-30 08:07:18  anonymous2

    Instead of inserting several records for the same word on the same page, you could add another field to the occurrence table that indicates the number of occurrences of the word on the referenced page. This reduces the number of records per page to the number of distinct words. Also consider stop words: excluding some words from the index (such as "a", "the", "and") is always good.
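
Both suggestions can be sketched in a few lines of plain PHP: count each distinct word once per page and skip the stop words. This is only an illustration. The function name count_words() is hypothetical, and the count column that each entry would be stored in is an assumption, since the article's schema does not have one.

```php
<?php
/**
 * Count how often each distinct word occurs on a page, skipping stop words.
 * Each entry could then be stored as one row with a count column
 * (the column itself is an assumption, not part of the article's schema).
 */
function count_words($text, array $stop_words = array('a', 'the', 'and'))
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $words = array_diff($words, $stop_words); // drop stop words
    return array_count_values($words);        // word => number of occurrences
}

$counts = count_words("The cat and the hat: a cat sat.");
print_r($counts); // cat => 2, hat => 1, sat => 1
```

Eight words on the page become three rows, which is the reduction the post describes.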
  • simple search
    2002-10-28 23:33:02  anonymous2

    Nice solution. I did the same thing in Perl, using the same approach.
    The only difference is that I just dump all the data and do a refill each night, instead of checking whether there's a record.
  • MySQL Fulltext?
    2002-10-28 00:01:55  anonymous2

    Hey,

    Why not use the MySQL Fulltext?

    This gives you support for boolean operations and stopwords.

    It might be a good idea.

    -- Brian
    • MySQL Fulltext?
      2003-05-21 12:52:26  anonymous2

      Great idea, if your ISP supports MySQL 3.23....

      >Hey,
      >Why not use the MySQL Fulltext?

      >This gives you support for boolean operations and stopwords.

      >It might be a good idea.

      -- Brian
      • MySQL Fulltext?
        2008-05-13 23:07:27  manrah

        I have a problem with search.
        Suppose I do a search on name, age, and sex, which I collect through POST on the search page. Based on these parameters I get 10 result pages with 10 profiles per page. But when I follow the link to the next or previous result page, I lose the parameters (i.e. name, age, sex). How can I overcome this problem?
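
One possible way around this is to carry the search parameters along in the query string of every paging link, so that the next request can read them again. The following is a minimal sketch: the function name page_link(), the page parameter, and the sample values are hypothetical, and it assumes the search script accepts the criteria via GET as well as POST (for example by reading $_REQUEST).

```php
<?php
/**
 * Build a link to another result page that carries the search parameters
 * along in the query string. page_link() and the parameter names are
 * hypothetical, chosen to match the post above.
 */
function page_link($script, array $params, $page)
{
    $params['page'] = $page;
    // http_build_query() URL-encodes each value; htmlspecialchars() makes
    // the "&" separators safe inside an HTML attribute.
    return htmlspecialchars($script . '?' . http_build_query($params, '', '&'));
}

// Sample criteria; in a real script these would come from the submitted form.
$criteria = array('name' => 'Ann Lee', 'age' => '30', 'sex' => 'f');

echo '<a href="' . page_link('search.php', $criteria, 2) . '">Next</a>' . "\n";
// prints: <a href="search.php?name=Ann+Lee&amp;age=30&amp;sex=f&amp;page=2">Next</a>
```

Storing the criteria in a session would be an alternative; the query-string approach has the advantage that result pages can be bookmarked.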


©2010, O'Reilly Media, Inc.
(707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.