[Fwd: LINKS -- Tutorial Number Five (HTML mail)]

Phyllis Wilson (burmkat@sonic.net)
Sun, 27 Jul 1997 18:13:50 -0700

This is a multi-part message in MIME format.

--------------752C15A5909
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Do you want me to continue sending these?

--------------752C15A5909
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Received: from kiki.arlington.com (daemon@kiki.arlington.com [140.174.170.5]) by sub.sonic.net (8.8.8/8.8.5) with ESMTP id FAA27088 for <burmkat@sonic.net>; Mon, 27 Apr 1998 05:21:40 -0700
From: owner-links-apr@arlington.com
X-envelope-info: <owner-links-apr@arlington.com>
Received: from localhost (daemon@localhost)
by kiki.arlington.com (8.8.8/8.8.5) with SMTP id FAA18874;
Mon, 27 Apr 1998 05:08:53 -0700 (PDT)
Received: by kiki.arlington.com (bulk_mailer v1.6); Mon, 27 Apr 1998 05:06:34 -0700
Received: (from majordom@localhost)
by kiki.arlington.com (8.8.8/8.8.5) id FAA18762;
Mon, 27 Apr 1998 05:06:33 -0700 (PDT)
Date: Mon, 27 Apr 1998 05:06:33 -0700 (PDT)
Message-Id: <199804271206.FAA18762@kiki.arlington.com>
Subject: LINKS -- Tutorial Number Five (HTML mail)
To: links-apr-outgoing@kiki.arlington.com
Reply-To: tcopley@arlington.com
MIME-Version: 1.0
Content-type: multipart/alternative; boundary="boundary5021"
Content-Transfer-Encoding: 7bit
Sender: owner-links-apr@arlington.com

Dear Workshop participants,

This is an HTML mail document meant to be read with a MIME
(multipurpose Internet mail extensions) compatible, HTML-capable,
e-mail client, such as Netscape Mail 3.0 or later. If you are reading
this message, it probably means that your e-mail client program does
not have these features.

An alternative method for reading this document is to connect to the
Arlington Courseware Web site. In order to view this document on our
Web site, please start your Web browser, and connect to the following
URL:

http://www.arlington.com/links/tutorial5/html/

You will be prompted for a user name and a password by your browser.
For the user name, please type (in lower case):

links

and for the password, also type (in lower case):

links

Your browser will then fetch these workshop materials and display them
for you.

TEXT VERSION

A plain text (non-HTML) version of this document is also available. To
request a copy of this tutorial in plain text, please send an e-mail
message to my automatic resending facility at the e-mail address:

mbot-apr@arlington.com

and type in the Subject line of your message:

send tut5

Please leave the body of the message blank.

If you have any questions about these procedures, please send an e-mail
message to <tcopley@arlington.com>.

TPC

[ ******** PLEASE IGNORE THE REST OF THIS DOCUMENT ******** ]

--boundary5021
Content-Type: text/plain; charset=us-ascii

Dear Workshop participants,

This is an HTML mail document meant to be read with a MIME
(multipurpose Internet mail extensions) compatible, HTML-capable,
e-mail client, such as Netscape Mail 3.0 or later. If you are not
using an HTML-capable mail program, the HTML section of this document
may show up as an attachment. You can save the HTML attachment to your
hard drive, and open it with your Web browser in order to view it.

--boundary5021
Content-Type: text/html; charset=us-ascii

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
Tutorial Number Five -- Searching with Style

NOTE: This document is meant to be viewed with an HTML-capable e-mail client, such as Netscape Mail 3.0. If this presentation appears strange, we suggest that you view the document on our Web site instead: <http://www.arlington.com/links/tutorial5/html/>. The Web server will ask you for a username and password. Please enter "links" for both.

Make the Link Workshop Logo
Tutorial Number Five
Searching with Style

In this tutorial you will learn how to:

  • Conceptualize the need for coping strategies in dealing with the exponential growth of information.
  • Analyze the process of carrying out an efficient and effective Web search.
  • Plan out your Web search.
  • Synthesize your own view of WWW search engines and how to best utilize them.

Just for a moment, please imagine what it might be like if you went to a store where, instead of paying the storekeeper for goods and services, the storekeeper had to pay you a fee to come and take the merchandise away. In this upside-down, imaginary world it would be completely unheard of to pay for anything. Sounds ridiculous, right? Yet, in the information sphere we are rapidly approaching the point where information is becoming such an unlimited commodity that it can be compared to air or sea water in its availability. We can have as much as we want at no cost. We are only limited by our storage capacity.

According to an estimate by The International Data Corporation, a computer industry marketing research firm:

In 1985, the number of documents in the world was doubling every five years. By 1989 the amount was doubling every three years. In 1991, they doubled every year, and in 1994 they are estimated to double in nine months.(1)
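To put these doubling times on a common footing, each one can be converted to an equivalent yearly growth factor: a quantity that doubles every T years grows by a factor of 2^(1/T) per year. The short Python sketch below does only that arithmetic for the four figures quoted above. The function name is illustrative, and the results are no more reliable than the quoted figures themselves.

```python
def annual_growth_factor(doubling_time_years: float) -> float:
    """Growth factor per year for a quantity that doubles every given number of years."""
    return 2 ** (1 / doubling_time_years)

# Doubling times quoted above, expressed in years (nine months = 0.75 year).
quoted = {1985: 5.0, 1989: 3.0, 1991: 1.0, 1994: 0.75}

for year, doubling_time in quoted.items():
    factor = annual_growth_factor(doubling_time)
    print(f"{year}: doubles every {doubling_time} years -> grows {factor:.2f}x per year")
```

On these figures, yearly growth rises from roughly 15 percent in 1985 to roughly 150 percent in 1994.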

Even allowing for a trace of hyperbole because the source of this claim is an advertisement for text-searching software, it does seem to square with the direct personal experience of many of us who have been participating in the global Internet. The production of raw information has clearly been in an exponential growth phase for a number of years, with no end in sight. Would it be totally surprising then, if information providers were willing to pay us to consume their information? One is tempted to ask whether any more production of information is really warranted under the circumstances. However, we need information to live--although it must be information that is directly relevant to our daily lives and the challenges that each of us faces. This is the rub: we need meaningful information to live, but without question the growing amount of information consists mainly of the irrelevant.

Twenty-Two Centuries Ago

To see that things were not always so, let us turn for a moment to the legendary library of Alexandria in Egypt twenty-two centuries ago. It represented the flowering of Greek scholarship in its time, having more than 500,000 scrolls by the end of King Ptolemy's reign. The catalog alone is said to have numbered 120 volumes--and the collection continued to expand through an aggressive program of acquisition.

In fact, the expansion program was nothing short of ruthlessly zealous. Alexandria was one of the greatest seaports of the day, and the Ptolemaic regime had a policy of inspecting all visiting ships for books. When books were found that were deemed valuable, they were confiscated and taken to the library for copying. However, only the copies were returned to the original owners. This attitude reflected the relative scarcity of information at the time. It made these written artifacts valuable booty to be amassed as a symbol of the prestige and power of the rulers of Alexandria. Ptolemy III went to the extent of striking a bargain with the authorities in Athens to make copies of the tragedies of Aeschylus, Sophocles, and Euripides. As security for the safe return of these precious manuscripts, Ptolemy had to pay the present-day equivalent of millions of dollars. However, once the manuscripts were in his hands, Ptolemy III had copies made and returned only the copies to the Athenians, keeping the priceless originals and gleefully forfeiting his deposit.

This brilliant yet devious campaign amassed the largest collection in the world at that time of the intellectual treasures of the ancients. It included virtually all of the Greek classics, as well as the best knowledge of the day in science, mathematics, and engineering.(2) Setting aside their value as rare books and documents, who could imagine today going to such lengths when most of these documents can simply be downloaded from the Web at no cost at all?

In fact, there is perhaps declining merit in maintaining collections of books even in digital form. If they can be downloaded at any time, unless there is an immediate requirement for them, these tracts simply take up space on the computer--better someone else's than yours.

This may seem like a radical and perhaps overly grim view of the value of collections of books, yet I offer it in order to demonstrate that the meaning and usefulness of information have become separate from its physical presence. In effect, the copy, or rather the ability to make a copy, has superseded the value of the original work.

Static Versus Dynamic Sources of Information

It used to be that when one became stumped trying to find information about a subject, one turned, usually in desperation, to the reference librarian for help. The librarian knew his or her collection well, knew how to use such arcane resources as the card catalog, and could find what one sought with ease. With "the collection" now residing on millions of networked computers throughout the world, it is inconceivable that any one individual could ever fathom even a significant portion of this assemblage. Yet the skills that librarians practice are as valid today as ever. In fact, knowledge workers, scholars, and professionals of all kinds will need to practice some of the same skills previously performed only by librarians if they are to cope with the ever-growing mountains of information.

One traditional way of handling collections of information is to categorize--that is, to "Divide and conquer!" as the old maxim goes. If we can just organize all of the material into pigeon-holes, then we can label each pigeon-hole, and have the power to find the things within them when we want them. In today's world of exponentially expanding information, categorization has the potential to backfire. If, instead of pigeon-holes, we think of categories as tin cups and the flow of new information as a fire hose, the problem becomes more obvious. Similarly, bibliographies, another traditional tool of the scholar, have been rendered less than completely useful by the rapidity with which new information is being produced. A bibliography represents only a snapshot of the state of knowledge in a field at a point in time. A newly produced bibliography may seem sadly out-of-date in a matter of a few months. Only information-organizing schemes that are subject to frequent updates can be completely relied upon.

Strategy Five

Rely on dynamic information resources, that is, those that assure currency of the information.

Overcoming Infoglut

What are some of the information-handling coping skills that can help us survive this wildly overflowing cornucopia of information known as the Internet? Here are a few that come to mind.

1. Be very selective about what you keep. Today's situation is the very antithesis of the one faced by Ptolemy III. In his day there was a dearth of information that was written down. The relative scarcity provided some motivation, or at least an explanation, for why a powerful king might choose to amass a huge collection of books or scrolls (physical informational artifacts). Today one can neither afford the luxury of such ambitions nor realistically afford the cost of maintaining large collections of static information. It is much better to maintain a small collection of immediately relevant information and download the rest from the Internet when required. Remember, the information you need is only a point and click away with the World Wide Web.
2. Seek out and evaluate information sources and providers on the basis of their potential to be dynamic, meaningful interpreters of the information scene. Some would argue that teachers and librarians have been performing these jobs since time immemorial. However, at any point in time some sources are simply more useful than others. In the past, great emphasis was placed on making as few errors as possible. While accuracy is as important as ever, the fact is that documents on the Internet can be updated on a moment's notice and be just as available as ever to the public. If a mistake is discovered, or new relevant information comes to light, the material can simply be revised. Only information that cannot or will not be kept current is of lesser value.
3. Use power tools. During the same time that the mounting flow of information led to a heightened level of info-stress on the part of millions, the countless contributors to the information resources of the Internet were busily dreaming up new schemes to make the Internet manageable and navigable. Perhaps some of you have had the opportunity to try the InfoSeek URL that I gave at the end of Tutorial Four. This is but one example of a power tool designed to help you find things in Webspace, and it is quite effective at this task. There are others, and I will write more about them later in this tutorial.
4. Have a plan when seeking information on the net.

The Spider

The spider loves to entertain
Her neighbors and relations,
But woe to any bugs or flies
Who accept her invitation!
So have a care, be wary of
This most accomplished spinner.
When she murmurs, "Be my guest!"
What she means is, "Be my dinner!"

--Ethel Jacobson

There you have it. Due caution must be taken!

The WWW can be an intoxicating and seductive place. It will probably not be long before some would-be guru urges the youth of the world to drop out and tune in to the Web. Many do report feeling that they are in a trance-like state as they meander through Webspace with too many interesting ways to go. Perhaps this is related to the "informational myopia" to which Jeff Conklin referred (see Tutorial 3).

Having a plan is like having a road map to your destination in Webspace. It gives you something against which to benchmark your progress. Even if you end up revising your original plan, you still have a basis for interpreting your results.

5. Bootstrap as much as possible. It is often the chance discovery of a resource that can lead you to some of the most valuable information on the Internet. There is no one way to search for information. As a general rule, use whatever means work. But once you find some information related to your goal, it is often serendipity that will lead you to the most valuable finds. Intuition, combined with a logical plan, is the best recipe for effective and efficient searching.

WWW Search Tools

You must also understand the search tools of the WWW, and how to use them.

At this point, I would like you to turn your attention to a specific Web page, namely the WWW Power Index, compliments of Web Communications, an Internet presence provider located in Santa Cruz, California. The URL is:

By the way, there is nothing sacred about this nice collection of URLs. It is not unlike several others that I have seen on the net. When I refer to a specific menu item, I will also give its URL, so there is no reason to become wedded to this particular index. Nevertheless, it was kind of the folks at Web Communications to provide this, so if you are in the market for a Web presence provider, you may want to check out their home page.

Assuming that you have found the WWW Power Index without any difficulty, you should see this menu:

Select "Internet/WWW Search Tools," the first menu item. Next you will see the menu below.

Finally, once again select the first menu item, "WWW Search Tools" and you will then be presented with this menu:

Taxonomy of Search Engines

Part of my motivation for taking you through this exercise is for you to get some idea of the range of search engines and other sources for finding things on the Web. The only ones that I am going to cover, as they are personal favorites, are the following,

and in Tutorial Six I will also cover:

and Veronica via the University of Minnesota.

Of course, these are only some of my personal favorites, and tastes will surely vary. As you can see from the WWW Power Index, there are many possible choices and this menu is far from exhaustive. The principles of searching that I will discuss are valid regardless of the particular search engine.

What are the major differences between the various search engines? There seem to be four major types. A search engine may be classified as:

  • a database,
  • a gatherer,
  • a retrieval program, or
  • a harvester.

Some examples include:

1. Indexes

These consist of databases of URLs that have been suggested or collected from around the WWW. Perhaps the best known index is Yahoo, which got started at Stanford University but has now gone commercial.

2. Automatic Collectors, or Gatherers

These are special purpose programs that are capable of traversing the WWW and collecting URLs as they go. These programs are sometimes called "spiders" or "robots" for obvious reasons. These can be of two types:

  • Depth first -- these examine a Web page and follow each link in as much depth as possible and then follow the links to other external Web pages. Lycos is the best example of this.
  • Breadth first -- these collectors start at the top level of a Web page and follow all of the top level links to other Web pages.
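The difference between the two orders of traversal can be sketched in a few lines of Python. This is the standard textbook form of depth-first and breadth-first traversal, run over a tiny made-up set of pages. It is not the code of Lycos or of any other real collector, and the page names in `LINKS` are hypothetical.

```python
from collections import deque

# A hypothetical miniature Web: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["staff", "history"],
    "news": ["archive"],
    "staff": [],
    "history": [],
    "archive": [],
}

def depth_first(start):
    """Follow each link as deep as it goes before trying the next one."""
    visited, stack = [], [start]
    while stack:
        page = stack.pop()
        if page in visited:
            continue
        visited.append(page)
        # Push links in reverse so the first link on the page is followed first.
        stack.extend(reversed(LINKS[page]))
    return visited

def breadth_first(start):
    """Visit every page one link away, then every page two links away, and so on."""
    visited, queue = [start], deque([start])
    while queue:
        page = queue.popleft()
        for link in LINKS[page]:
            if link not in visited:
                visited.append(link)
                queue.append(link)
    return visited

print(depth_first("home"))    # ['home', 'about', 'staff', 'history', 'news', 'archive']
print(breadth_first("home"))  # ['home', 'about', 'news', 'staff', 'history', 'archive']
```

Both functions reach the same six pages. Only the order differs: the depth-first version finishes everything under "about" before it looks at "news," while the breadth-first version visits both top-level links first.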

3. Non-Indexing Retrieval Programs

These robots wait until they receive a request and then go out directly and look for URLs that match the query. WebCrawler is an example.

4. Harvesters

The best known and perhaps only real harvester is Veronica. It goes out and examines all of the titles of the documents it finds in a gopher tree and records them in a searchable dataset. It then moves on to the next gopher and repeats the process. It keeps on going until it has exhausted all the gophers that it can find. A harvester is different from a gatherer because the former attempts to record every URL in existence. In contrast, a gatherer works continuously, traversing back and forth and is never done. The Veronica harvest is complete and comprehensive, but it only applies to gopher servers and must be updated periodically in order to stay current.
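The harvest-then-search idea can be illustrated with a toy example in Python. The sketch below walks a small nested menu structure once, records every document title it finds, and then answers queries from that recorded dataset alone. It is only a sketch under stated assumptions: the menu tree, titles, and function names are all hypothetical, and nothing here reflects how Veronica itself is implemented.

```python
# A hypothetical gopher tree: each menu is a dict whose values are either
# sub-menus (dicts) or document titles (strings).
GOPHER_TREE = {
    "Library": {
        "catalog": "Card Catalog Search",
        "Guides": {"net": "Guide to Internet Searching"},
    },
    "Campus": {"news": "Campus News Digest"},
}

def harvest(menu, path=""):
    """Walk every menu once and return a list of (title, path) records."""
    records = []
    for name, item in menu.items():
        item_path = f"{path}/{name}"
        if isinstance(item, dict):
            records.extend(harvest(item, item_path))  # descend into the sub-menu
        else:
            records.append((item, item_path))         # record the document title
    return records

def search(records, word):
    """Return the paths of all documents whose title contains the word."""
    word = word.lower()
    return [path for title, path in records if word in title.lower()]

dataset = harvest(GOPHER_TREE)     # the one-time harvest
print(search(dataset, "search"))   # ['/Library/catalog', '/Library/Guides/net']
```

Because queries are answered from `dataset` rather than from the tree, a document added to the tree after the harvest will not be found until the harvest is run again.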

An Experiment

All five of the search engines whose URLs I have noted roughly fall into type 2 above, that is, gatherers. Perhaps the most complete are AltaVista and Lycos.

Let's try a little experiment to see how many "hits" we can get. Use the key words:

    "Melrose Place"

Please try these words on all five search engines and compare the results you get.

    Note: AltaVista is one of the best search engines around. Here is an interesting side experiment to try with it. Select a URL of interest to you. For example, how about trying the URL of the home page for your school or business? Enter the URL for this page into AltaVista's search field. AltaVista will find any Web pages with links to the page that you have selected. You can also do the same thing with HotBot. Try it with both engines and compare the results that you get!

As previously mentioned, an effective search begins with a strategy or a plan. This means making use of the best resources at hand in order to gain leverage over the process of working toward the solution of a problem. This may mean discarding some routes to the object of the search, in favor of those more likely to succeed. The diagram below will help to visualize the process:

Guidelines For Effective Searching

A checklist can be a handy way to get started searching the Web. The list should be limited to those search engines that you have found to be the most productive. The value of a checklist is:

  1. You are less likely to overlook key resources.
  2. The more you organize the information that you retrieve, the less likely you are to get disoriented by it.

Here are some things to think about as you conduct your search.

  1. Be specific. Define the unique qualities of your subject.
  2. Follow a logical plan, but trust your instincts about the correct path to search.
  3. As you gather more information about your subject, build a mental model of the intended outcome of your search. Update it when you find significant new information.
  4. Be alert to new pathways. If you find a relevant Web page it can be valuable not only in terms of immediate benefits, but also in terms of where it might lead.
  5. Work toward building up a kit of investigative tools so that you can use multiple approaches.
  6. Lastly, keep in mind that there are information professionals, brokers, and other experts who make a living searching out information. They can be found on the net. If time equals money, sometimes it pays to hire help.

Exercise

So that you might get some experience with these tools, here is another URL for you to try. This Web Gopher page was put together here at Arlington Courseware. It lists several of the best search engines, although it is less complete than the WWW Power Index:

As an alternative, here is another URL to try for MetaCrawler, which acts as an intelligent agent to contact several Internet search engines:

I suggest that you create a new HTML file that you can use to build a "research launch-pad." I will explain this exercise further below. As a first step, view the source HTML file for the Arlington Web Gopher page. There will be a few things that may be a little mystifying in this HTML file, but most of it should be easy to comprehend. The most important new concept in this HTML file is the use of a definition list. The elements of this construct are, first, the matching beginning and end tags of the (D)efinition (L)ist, <DL> and </DL>, respectively. This form is sometimes called a container, because the pair of tags surrounds the content of the definition list.

Secondly, for each term to be defined, the (D)efinition (T)erm is labeled with a <DT> tag. This is used in much the same way as the <LI> tag in ordered and unordered lists covered in previous tutorials.

Finally, each definition term must have a (D)efinition (D)escription, which is tagged <DD>. In actual practice definition lists are used for many things besides lists of definitions. They are really an excellent alternative way to make an unnumbered list.
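To make the <DL>, <DT>, and <DD> pattern concrete, here is a minimal Python sketch that prints the markup for a two-entry definition list. It is an illustration only, not the source of the Arlington Web Gopher page. The function name is made up, and the entry names and example.com URLs are placeholders that should be replaced with the real search engines.

```python
from html import escape

def definition_list(entries):
    """Build an HTML definition list from (term, url, description) tuples."""
    lines = ["<DL>"]
    for term, url, description in entries:
        # <DT> holds the term, here a link; <DD> holds its description.
        lines.append(f'<DT><A HREF="{escape(url)}">{escape(term)}</A>')
        lines.append(f"<DD>{escape(description)}")
    lines.append("</DL>")
    return "\n".join(lines)

# Hypothetical entries; the names and example.com URLs are placeholders only.
entries = [
    ("First Engine", "http://example.com/first", "A subject index of the Web."),
    ("Second Engine", "http://example.com/second", "A keyword search of Web pages."),
]

print(definition_list(entries))
```

The printed markup has one <DT> line and one <DD> line per entry, all enclosed by the <DL> and </DL> container tags.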

In any event, as an exercise, using the <H2> and </H2> heading tags, make up a heading. You may call it anything you like, but I call mine my "Research Launch-pad." Use the normal HTML boilerplate for <HTML>, <HEAD>, <BODY>, etc. Copy over the definition lists from the Arlington Web Gopher page. You do not have to copy the HTML for each search engine, only the ones on this list that we are going to use. These include:

  • Yahoo
  • Infoseek
  • CMU Lycos
  • Veronica

To complete our menu, add to it lines for AltaVista, WebCrawler, and HotBot, whose URLs were mentioned previously. Use similar HTML for a definition list in order to complete the menu using the URLs for these three engines. (NOTE: You may include the in-line graphics if you want. That is a purely optional part of the exercise.)

You now have taken an important step forward in constructing your launch-pad. We will finish it up in the next tutorial and add more features.

Review Question

Regarding the search for "Melrose Place," which search engine got the most hits? How were they different? Which one(s) got the best hits? Do you know why?

Search Engine Critique

Lastly, here are a few of the limitations that are sometimes mentioned about search engines:

  • They are inconsistent in what they index and users do not always understand how they work.
  • Most indexing schemes accept the content owner's word for what is there.
  • The indexing is inconsistent among tools.
  • The owner usually determines the content.
  • Search engines are frequently unavailable.
  • Searching by keyword often results in too many false hits and not enough good ones.
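The last point can be seen in a toy Python comparison between matching keywords anywhere on a page and matching an exact phrase. The page titles below are invented purely for illustration, and real search engines match and rank results in their own ways, so treat this only as a sketch of why keyword matching tends to produce false hits.

```python
# Hypothetical page titles, invented purely for illustration.
PAGES = [
    "Melrose Place episode guide",
    "Melrose town council: a place to live",
    "My favorite place in Melrose, Massachusetts",
    "Fan club for Melrose Place",
]

def keyword_hits(pages, query):
    """Pages containing every query word, anywhere and in any order."""
    words = query.lower().split()
    return [p for p in pages if all(w in p.lower() for w in words)]

def phrase_hits(pages, query):
    """Pages containing the query as one exact phrase."""
    return [p for p in pages if query.lower() in p.lower()]

print(len(keyword_hits(PAGES, "Melrose Place")))  # 4
print(len(phrase_hits(PAGES, "Melrose Place")))   # 2
```

All four titles contain both words, but only two of them contain the phrase itself. The other two are the kind of false hit described above.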

Overall, the Web would be much more difficult to navigate without search engines, despite some of their obvious shortcomings.

End Notes

(1) Open Text Corporation, "Open Text 5: High Performance Text Retrieval Tools," (Open Text Corp., Waterloo, Ontario, Canada, 1994 [Advertisement]).

(2) Kennedy, Noah, The Industrialization of Intelligence: Mind and Machine in the Modern Age (London: Unwin Hyman, Ltd., 1989), pp. 2-3.


Copyright ©1997 by Thomas P. and Barbara L. Copley. All rights reserved.
No reproduction in any form is permitted without the express written permission of the authors.

--boundary5021--

--------------752C15A5909--