
A. Introduction
With keyword searching for Web pages, we are asking the system to do a lot of work behind the scenes in order to generate a few KB of text for display. Ordinary HTML pages are simply a collection of data bytes that the server has to send down the wire to the browser for display. On the other hand, keyword searching means running a query program on the server, which has to search through a large database to find the pages the viewer is interested in. Searching requires a lot of disk I/O to build the indexes ahead of time, as well as for the search, and then both system CPU cycles and disk I/O are needed to calculate the score for each "hit" page and sort the hits by rank order.
The process is both machine- and labor-intensive, so for the sake of efficiency, we will not create separate indexes for every portion of the Web. Instead, the query software permits restricting the displayed results to those within a limited realm (for example, limiting the results to pages whose URLs start with "http://www.ohiou.edu/oupress/"). The next section describes how you can use the new software to build a realm-limited search into your pages, and the final section describes the procedures we use "behind the scenes" to create the searchable databases that the software uses.
CAUTION!
We will be retiring our ThunderStone Search Appliance, replacing it with a Google Search Appliance ("GSA"). The GSA is now in production, so the time has come to update any existing custom searches that use the old code.
To update an existing custom search to use the GSA instead of the ThunderStone machine, replace the FORM and INPUT fields in your old HTML code with those listed below.
There will be a period of time when both search appliances are in place, before we permanently retire the ThunderStone machine. Hence, you will not have to update your code overnight. We will update this page to announce the power-off date for the ThunderStone Appliance as soon as that date is established. The ThunderStone power-off date will be no sooner than September 30, 2008.
If the only search you have on your pages is a copy of the one used in the upper-right corner of the Front Door (which searches the entire Ohio University web presence using a FORM tag with action="http://www.ohio.edu/progtools/searchRoute.cfm"), then you will not need to make any changes at all: we have updated that code already. It is the custom searches, as described below, that require updating to use the GSA.
The custom search method described here is intended to permit any Ohio University pagemaster to include, on any of his or her pages, a search option that quickly returns only those hits that are part of that subsite.
You can control the pages that are searched in two ways: specifying the "collection" to be used (i.e., which of the specific databases that the search engine maintains should be used for the search), and specifying the "realm" to be reported on (i.e., the initial parts of the URL that all of your subsite's pages have in common).
At this time there are two collections available:
Therefore, for example, this method can be used to search only those pages whose URLs start with "http://www.ohiou.edu/perspectives/" (using the "default" collection), but it cannot be used to create a combined search of all pages whose URLs start with either "http://www.ohiou.edu/perspectives/" or "http://www.ohiou.edu/researchnews/", because there is no combination of one collection and one realm that will include all of those pages and no others. If you need to create such a complex search, please contact the Office of Information Technology Web Services team, at 593-1017 or by E-mail to webteam@ohio.edu, to determine whether an existing collection will work or whether the search engine would have to be reconfigured to create a new collection.
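To see why a single realm cannot cover both subsites, it may help to treat a realm as a plain URL prefix. The sketch below is only an illustration under that assumption (the function name `shared_realm` is ours, not part of the search software): the longest prefix shared by the two realms is the whole site, which would also admit unrelated subsites such as the "oupress" pages mentioned in the Introduction.

```python
import os.path

# Realms taken from the example above.
PERSPECTIVES = "http://www.ohiou.edu/perspectives/"
RESEARCHNEWS = "http://www.ohiou.edu/researchnews/"

# Another subsite, used here only to show what a too-broad realm lets in.
OUPRESS = "http://www.ohiou.edu/oupress/"


def shared_realm(*realms):
    """Return the longest character-wise prefix common to all realms.

    This models a realm as a simple URL prefix, which is an assumption
    made for illustration, not a description of the appliance itself.
    """
    return os.path.commonprefix(list(realms))


combined = shared_realm(PERSPECTIVES, RESEARCHNEWS)
print(combined)                      # the only prefix both realms share
print(OUPRESS.startswith(combined))  # True: unrelated pages would match too
```

Because the only shared prefix is the site root, a combined search of just those two subsites needs either two separate forms or a purpose-built collection.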
There are several steps to building your own custom search:
Including a terminal slash restricts the search results to include only pages at that level, not in any sub-subsites. For example, specifying a realm of "http://www.ohiou.edu/pagemasters/" would exclude http://www.ohiou.edu/pagemasters/memo85/append4.html, which would have been included without the terminal slash.
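The effect of the terminal slash can be sketched as a small matching function. This is a model of the behavior as described in the paragraph above, not the appliance's actual matching code, and the name `in_realm` is hypothetical.

```python
def in_realm(url: str, realm: str) -> bool:
    """Model of realm matching as described in this memo (an assumption).

    Without a terminal slash, any URL that starts with the realm matches.
    With a terminal slash, only pages directly at that level match.
    """
    if not url.startswith(realm):
        return False
    if realm.endswith("/"):
        # Anything left after the realm must not descend into a deeper level.
        return "/" not in url[len(realm):]
    return True


DEEP_PAGE = "http://www.ohiou.edu/pagemasters/memo85/append4.html"

# With the terminal slash, the deeper page is excluded.
print(in_realm(DEEP_PAGE, "http://www.ohiou.edu/pagemasters/"))
# Without it, the same page is included.
print(in_realm(DEEP_PAGE, "http://www.ohiou.edu/pagemasters"))
```

Note that a pure prefix model without the slash would also match any other URL that merely begins with the same characters, so choose the realm with care.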
<form method="get" action="http://google.ohio.edu/search">
<input type="hidden" name="sort" value="date:D:L:d1">
<input type="hidden" name="entqr" value="0">
<input type="hidden" name="ud" value="1">
<input type="hidden" name="client" value="ou_front">
<input type="hidden" name="output" value="xml_no_dtd">
<input type="hidden" name="proxystylesheet" value="ou_front">
<input type="hidden" name="ie" value="UTF-8">
<input type="hidden" name="oe" value="UTF-8">
<input type="hidden" name="as_dt" value="i">
<input type="hidden" name="site" value="default_collection">
<input type="hidden" name="as_sitesearch" value="http://www.ohiou.edu/pagemasters">
Click here and type to enter your search keywords:
<input type="text" size="25" name="q" value="" maxlength="255"> <input type="submit" name="btnG" value="Search">
</form>
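Because the form uses method="get", submitting it simply requests a URL whose query string carries every field. For testing a realm or collection before editing a page, the same URL can be assembled by hand. The following is a minimal sketch, assuming only the field names and values shown in the form above; the function name `build_search_url` and the sample keywords are ours.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://google.ohio.edu/search"

# Hidden fields copied from the form above; these stay the same for every search.
FIXED_FIELDS = {
    "sort": "date:D:L:d1",
    "entqr": "0",
    "ud": "1",
    "client": "ou_front",
    "output": "xml_no_dtd",
    "proxystylesheet": "ou_front",
    "ie": "UTF-8",
    "oe": "UTF-8",
    "as_dt": "i",
}


def build_search_url(keywords, realm, collection="default_collection"):
    """Return the GET URL a browser would request when the form is submitted."""
    params = dict(FIXED_FIELDS)
    params["site"] = collection        # which collection to search
    params["as_sitesearch"] = realm    # initial part of the URLs to report
    params["q"] = keywords             # what the viewer typed
    params["btnG"] = "Search"          # the submit button's name and value
    return SEARCH_URL + "?" + urlencode(params)


url = build_search_url("custom search", "http://www.ohiou.edu/pagemasters")
print(url)
```

Only the "site" and "as_sitesearch" values need to change to adapt the search to a different subsite; the remaining fields can be copied as they are.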
Dick Piccard revised this file (http://www.ohiou.edu/pagemasters/memo85/append4.html) on September 5, 2008.
Please E-mail comments or suggestions to "webteam@ohio.edu".