Maintaining Web Publications at Ohio University

Appendix IV: Using the Google Search Appliance for
Keyword Searching Ohio University Web Pages





Appendix IV Table of Contents

A. Introduction

B. Custom Searches

C. Example Custom Search

D. Behind The Scenes



A.  Introduction

This Appendix discusses the Web page keyword searching that is available from the "Search" button on the Ohio University Front Door. There are three reasons for including this discussion in Memo 85:

With keyword searching for Web pages, we are asking the system to do a lot of work behind the scenes in order to generate a few KB of text for display. Ordinary HTML pages are simply a collection of data bytes that the server has to send down the wire to the browser for display. On the other hand, keyword searching means running a query program on the server, which has to search through a large database to find the pages the viewer is interested in. Searching requires a lot of disk I/O to build the indexes ahead of time, as well as for the search, and then both system CPU cycles and disk I/O are needed to calculate the score for each "hit" page and sort the hits by rank order.

The process is both machine- and labor-intensive, so for the sake of efficiency, we will not create separate indexes for every portion of the Web. Instead, the query software permits restricting the displayed results to those that are within a limited realm (for example, limiting the results to any pages whose URL starts out with "http://www.ohiou.edu/oupress/"). The next section describes how you can use the new software to build a realm-limited search into your pages, and the final section describes the procedures we use "behind the scenes" to create the searchable databases that the software uses.
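The idea of a realm-limited search can be pictured as a simple prefix test on each hit's URL. The sketch below is only an illustration of that idea, not the search appliance's actual code; the function name `limit_to_realm` and the individual page names are hypothetical.

```python
def limit_to_realm(hit_urls, realm):
    """Keep only the hits whose URL starts with the realm prefix."""
    return [url for url in hit_urls if url.startswith(realm)]


# Hypothetical hit list; only the realm prefix comes from the text above.
hits = [
    "http://www.ohiou.edu/oupress/index.html",
    "http://www.ohiou.edu/athletics/index.html",
    "http://www.ohiou.edu/oupress/catalog.html",
]

oupress_hits = limit_to_realm(hits, "http://www.ohiou.edu/oupress/")
for url in oupress_hits:
    print(url)
```

One shared index is searched, and the results are narrowed afterward, which is why no separate index is needed for each portion of the Web.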


B.  Custom Searches


CAUTION!

We will be retiring our ThunderStone Search Appliance, replacing it with a Google Search Appliance ("GSA"). The GSA is now in production, so the time has come to update any existing custom searches that use the old code.

Updating your existing custom search to use the GSA instead of the ThunderStone machine requires replacement of the FORM and INPUT fields in your old HTML code with those listed below.

There will be a period of time when both search appliances are in place, before we permanently retire the ThunderStone machine. Hence, you will not have to update your code overnight. We will update this page to announce the power-off date for the ThunderStone Appliance as soon as that date is established. The ThunderStone power-off date will be no sooner than September 30, 2008.

If the only search you have on your pages is a copy of the one used in the upper-right corner of the Front Door (which searches the entire Ohio University web presence using a FORM tag with action="http://www.ohio.edu/progtools/searchRoute.cfm"), then you will not need to make any changes at all: we have updated that code already. It is the custom searches, as described below, that require updating to use the GSA.



The custom search method described here is intended to let any Ohio University pagemaster include, on any of his or her pages, a search option that quickly returns only those hits that are part of that subsite. You can control which pages are searched in two ways: by specifying the "collection" to be used (i.e., which of the specific databases that the search engine maintains should be used for the search), and by specifying the "realm" to be reported on (i.e., the initial part of the URL that all of your subsite's pages have in common).

At this time there are two collections available:

Therefore, for example, this method can be used to search only those pages whose URLs start with "http://www.ohiou.edu/perspectives/" (using the "default" collection), but it cannot be used to create a combined search of all pages whose URLs start with either "http://www.ohiou.edu/perspectives/" or "http://www.ohiou.edu/researchnews/", because there is no combination of one collection and one realm that will include all of those pages and no others. If you need to create such a complex search, please contact the Office of Information Technology Web Services team, at 593-1017, or by E-mail to webteam@ohio.edu, in order to determine whether an existing collection will work, or whether the search engine would have to be re-configured to create a new collection.
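To see why a single realm cannot cover both subsites, it may help to compute the longest URL prefix the two have in common. The short sketch below does this with Python's standard library; the extra page used for comparison is a hypothetical example.

```python
import os.path

realms = [
    "http://www.ohiou.edu/perspectives/",
    "http://www.ohiou.edu/researchnews/",
]

# commonprefix compares character by character, so it works on any strings.
shared = os.path.commonprefix(realms)
print(shared)

# A hypothetical page from an unrelated subsite also starts with that prefix,
# so using the shared prefix as the realm would let it into the results.
other_page = "http://www.ohiou.edu/oupress/index.html"
print(other_page.startswith(shared))
```

The only realm that matches both subsites is the prefix they share, and that prefix also matches pages that belong to neither of them.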

There are several steps to building your own custom search:

  1. Identify a "collection" that includes all of your pages (choosing the collection that includes the fewest other pages will speed your search slightly), and identify the "realm" you will specify to restrict the search to only your pages. Typically this will be the full URL up to the point where the pages vary. For example, "http://www.ohiou.edu/pagemasters" would specify a realm that includes all of the Pagemasters Toolbox pages.

    Including a terminal slash restricts the search results to include only pages at that level, not in any sub-subsites. For example, specifying a realm of "http://www.ohiou.edu/pagemasters/" would exclude http://www.ohiou.edu/pagemasters/memo85/append4.html, which would have been included without the terminal slash.

  2. Use your mouse to select the HTML code displayed here, and copy it:

    <form method="get" action="http://google.ohio.edu/search">
    <input type="hidden" name="sort" value="date:D:L:d1">
    <input type="hidden" name="entqr" value="0">
    <input type="hidden" name="ud" value="1">
    <input type="hidden" name="client" value="ou_front">
    <input type="hidden" name="output" value="xml_no_dtd">
    <input type="hidden" name="proxystylesheet" value="ou_front">
    <input type="hidden" name="ie" value="UTF-8">
    <input type="hidden" name="oe" value="UTF-8">
    <input type="hidden" name="as_dt" value="i">
    <input type="hidden" name="site" value="default_collection">
    <input type="hidden" name="as_sitesearch" value="http://www.ohiou.edu/pagemasters">
    Click here and type to enter your search keywords:
    <input type="text" size="25" name="q" value="" maxlength="255"> <input type="submit" name="btnG" value="Search">
    </form>

  3. Go to your page editor, open the page you want to add the search into, view the HTML code if that isn't the default, position the insertion point appropriately, and paste.

  4. Find the hidden "site" input tag in the HTML you just pasted into your file, and if necessary change the value from "default_collection" to the appropriate collection for your pages, as you decided in step 1 (for those with graphical browsers, the part to change is displayed in bold, blue type, above).

  5. Find the hidden "as_sitesearch" input tag in the HTML you just pasted into your file. If the collection you have specified includes no pages other than the ones you want to search, remove that entire tag. If the collection you have specified does include other pages, change the value from "http://www.ohiou.edu/pagemasters" to the appropriate realm for your pages, as you decided in step 1 (for those with graphical browsers, the part to change is displayed in bold, green type, above). Be sure to include or exclude the terminal slash as appropriate for the results you want, according to the commentary in the second paragraph of step 1.

  6. Revise the prompt text as appropriate (for those with graphical browsers, the part to change is displayed in bold, red type, above). In some situations a much more terse prompt would be appropriate.

  7. Save the modified page and test the search.
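When testing in step 7, it can be useful to know what request the form sends. Because the form uses method="get", the browser appends every field to the action URL as a query string. The sketch below assembles that URL from the field names and values shown in step 2; the helper name `build_search_url` and the sample keywords are hypothetical, and the code only builds the string without contacting the search appliance.

```python
from urllib.parse import urlencode

# Taken from the "action" attribute of the form in step 2.
SEARCH_URL = "http://google.ohio.edu/search"


def build_search_url(keywords, collection="default_collection", realm=None):
    """Return the GET URL the step 2 form would request (hypothetical helper)."""
    params = {
        "sort": "date:D:L:d1",
        "entqr": "0",
        "ud": "1",
        "client": "ou_front",
        "output": "xml_no_dtd",
        "proxystylesheet": "ou_front",
        "ie": "UTF-8",
        "oe": "UTF-8",
        "as_dt": "i",
        "site": collection,  # step 4: the collection to search
    }
    if realm is not None:
        # step 5: leave this field out when the collection holds only your pages
        params["as_sitesearch"] = realm
    params["q"] = keywords
    params["btnG"] = "Search"
    return SEARCH_URL + "?" + urlencode(params)


url = build_search_url("style guide", realm="http://www.ohiou.edu/pagemasters")
print(url)
```

Comparing the address shown in your browser after a test search against a URL built this way is one way to confirm that the "site" and "as_sitesearch" values you edited are the ones actually being sent.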


C.  Example Custom Search

The HTML given in step 2, above, will produce the following form (try the search now to see what the results page looks like):

Click here and type to enter your search keywords:




Dick Piccard revised this file (http://www.ohiou.edu/pagemasters/memo85/append4.html) on September 5, 2008.

Please E-Mail comments or suggestions to "webteam@ohio.edu".