--------------752C15A5909
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Do you want me to continue sending these?
--------------752C15A5909
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Received: from kiki.arlington.com (daemon@kiki.arlington.com [140.174.170.5]) by sub.sonic.net (8.8.8/8.8.5) with ESMTP id FAA27088 for <burmkat@sonic.net>; Mon, 27 Apr 1998 05:21:40 -0700
From: owner-links-apr@arlington.com
X-envelope-info: <owner-links-apr@arlington.com>
Received: from localhost (daemon@localhost)
by kiki.arlington.com (8.8.8/8.8.5) with SMTP id FAA18874;
Mon, 27 Apr 1998 05:08:53 -0700 (PDT)
Received: by kiki.arlington.com (bulk_mailer v1.6); Mon, 27 Apr 1998 05:06:34 -0700
Received: (from majordom@localhost)
by kiki.arlington.com (8.8.8/8.8.5) id FAA18762;
Mon, 27 Apr 1998 05:06:33 -0700 (PDT)
Date: Mon, 27 Apr 1998 05:06:33 -0700 (PDT)
Message-Id: <199804271206.FAA18762@kiki.arlington.com>
Subject: LINKS -- Tutorial Number Five (HTML mail)
To: links-apr-outgoing@kiki.arlington.com
Reply-To: tcopley@arlington.com
MIME-Version: 1.0
Content-type: multipart/alternative; boundary="boundary5021"
Content-Transfer-Encoding: 7bit
Sender: owner-links-apr@arlington.com
Dear Workshop participants,
This is an HTML mail document meant to be read with a MIME
(multipurpose Internet mail extensions) compatible, HTML-capable,
e-mail client, such as Netscape Mail 3.0 or later. If you are reading
this message, it probably means that your e-mail client program does
not have these features.
An alternative method for reading this document is to connect to the
Arlington Courseware Web site. In order to view this document on our
Web site, please start your Web browser, and connect to the following
URL:
http://www.arlington.com/links/tutorial5/html/
You will be prompted for a user name and a password by your browser.
For the user name, please type (in lower case):
links
and for the password, also type (in lower case):
links
Your browser will then fetch these workshop materials and display them
for you.
TEXT VERSION
A plain text (non-HTML) version of this document is also available. To
request a copy of this tutorial in plain text, please send an e-mail
message to my automatic resending facility at the e-mail address:
mbot-apr@arlington.com
and type in the Subject line of your message:
send tut5
Please leave the body of the message blank.
If you have any questions about these procedures, please send an e-mail
message to <tcopley@arlington.com>.
TPC
[ ******** PLEASE IGNORE THE REST OF THIS DOCUMENT ******** ]
--boundary5021
Content-Type: text/plain; charset=us-ascii
Dear Workshop participants,
This is an HTML mail document meant to be read with a MIME
(multipurpose Internet mail extensions) compatible, HTML-capable,
e-mail client, such as Netscape Mail 3.0 or later. If you are not
using an HTML-capable mail program, the HTML section of this document
may show up as an attachment. You can save the HTML attachment to your
hard drive, and open it with your Web browser in order to view it.
An alternative method for reading this document is to connect to the
Arlington Courseware Web site. In order to view this document on our
Web site, please start your Web browser, and connect to the following
URL:
http://www.arlington.com/links/tutorial5/html/
You will be prompted for a user name and a password by your browser.
For the user name, please type (in lower case):
links
and for the password, also type (in lower case):
links
Your browser will then fetch these workshop materials and display them
for you.
TEXT VERSION
A plain text (non-HTML) version of this document is also available. To
request a copy of this tutorial in plain text, please send an e-mail
message to my automatic resending facility at the e-mail address:
mbot-apr@arlington.com
and type in the Subject line of your message:
send tut5
Please leave the body of the message blank.
If you have any questions about these procedures, please send an e-mail
message to <tcopley@arlington.com>.
TPC
--boundary5021
Content-Type: text/html; charset=us-ascii
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
NOTE: This document is meant to be viewed with a browser-capable e-mail client, such as Netscape Mail 3.0. If this presentation appears strange, we suggest that you connect to our Web site to view this document instead <http://www.arlington.com/links/tutorial5/html/>. The Web server will ask you for a username and password. Please put in "links" for both.
In this tutorial you will learn how to:
Just for a moment, please imagine what it might be like if you went to a store where, instead of your paying the storekeeper for goods and services, the storekeeper had to pay you a fee to come and take the merchandise away. In this upside-down, imaginary world it would be completely unheard of to pay for anything. Sounds ridiculous, right? Yet, in the information sphere we are rapidly approaching the point where information is becoming such an unlimited commodity that it can be compared to air or sea water in its availability. We can have as much as we want at no cost. We are only limited by our storage capacity.
According to an estimate by The International Data Corporation, a computer industry marketing research firm:
In 1985, the number of documents in the world was doubling every five years. By 1989 the amount was doubling every three years. In 1991, they doubled every year, and in 1994 they are estimated to double in nine months.(1)
Even allowing for a trace of hyperbole because the source of this claim is an advertisement for text-searching software, it does seem to square with the direct personal experience of many of us who have been participating in the global Internet. The production of raw information has clearly been in an exponential growth phase for a number of years, with no end in sight.
Would it be totally surprising, then, if information providers were willing to pay us to consume their information? One is tempted to ask whether any more production of information is really warranted under the circumstances. However, we need information to live--although it must be information that is directly relevant to our daily lives and the challenges that each of us faces. This is the rub: we need meaningful information to live, but without question the growing amount of information consists mainly of the irrelevant.
Twenty-Two Centuries Ago
To see that things were not always so, let us turn for a moment to the legendary library of Alexandria in Egypt twenty-two centuries ago. It represented the flowering of Greek scholarship in its time, having more than 500,000 scrolls by the end of King Ptolemy's reign. The catalog alone is said to have numbered 120 volumes--and the collection continued to expand through an aggressive program of acquisition.
In fact, the expansion program was nothing short of ruthlessly zealous. Alexandria was one of the greatest seaports of the day, and the Ptolemaic regime had a policy of inspecting all visiting ships for books. When books were found that were deemed valuable, they were confiscated and taken to the library for copying. However, only the copies were returned to the original owners. This attitude reflected the relative scarcity of information at the time. It made these written artifacts valuable booty to be amassed as a symbol of the prestige and power of the rulers of Alexandria.
Ptolemy III went to the extent of striking a bargain with the authorities in Athens to make copies of the tragedies of Aeschylus, Sophocles, and Euripides. As security for the safe return of these precious manuscripts, Ptolemy had to pay the present-day equivalent of millions of dollars. However, once the manuscripts were in his hands, Ptolemy III had copies made and returned only the copies to the Athenians, thereby gleefully forfeiting his deposit.
This brilliant yet devious campaign amassed the largest collection in the world at that time of the intellectual treasures of the ancients. It included virtually all of the Greek classics, as well as the best knowledge of the day in science, mathematics, and engineering.(2) Setting aside their value as rare books and documents, who could imagine today going to such lengths when most of these documents can simply be downloaded from the Web for no cost at all?
In fact, there is perhaps declining merit in maintaining collections of books even in digital form. If they can be downloaded at any time, then unless there is an immediate requirement for them, these tracts simply take up space on the computer--better someone else's than yours. This may seem like a radical and overly grim view of the value of collections of books, yet I offer it in order to demonstrate that the meaning and usefulness of information have become separate from its physical presence. In effect, the copy, that is, the ability to make a copy, has superseded the value of the original work.
Static Versus Dynamic Sources of Information
It used to be that when one became stumped trying to find information about a subject, one turned, usually in desperation, to the reference librarian for help. The librarian knew his or her collection well, and how to use such arcane resources as the card catalog, and could find what one sought with ease. With "the collection" now residing on millions of networked computers throughout the world, it is inconceivable that any one individual could ever fathom even a significant portion of this assemblage. Yet the skills that librarians practice are as valid today as ever. In fact, knowledge workers, scholars, and professionals of all kinds will need to practice some of the same skills previously performed only by librarians if they are to cope with the ever-growing mountains of information.
One traditional way of handling collections of information is to categorize--that is, to "Divide and conquer!" as the old maxim goes. If we can just organize all of the material into pigeon-holes, then we can label each pigeon-hole, and have the power to find the things within them when we want them. In today's world of exponentially expanding information, categorization has the potential to back-fire.
If, instead of pigeon-holes, we think of categories as tin cups and the flow of new information as a fire-hose, the problem becomes more obvious. Similarly, bibliographies, another traditional tool of the scholar, have been rendered less than completely useful by the rapidity with which new information is being produced. Bibliographies represent only a snap-shot of the state of the knowledge in a field at a point in time. A newly produced bibliography may seem sadly out-of-date in a matter of a few months. Only information-organizing schemes that are subject to frequent updates can be completely relied upon.
Strategy Five
Rely on dynamic information resources, that is, those that assure currency of the information.
Overcoming Infoglut
What are some of the information-handling coping skills that can help us survive this wildly overflowing cornucopia of information known as the Internet? Here are a few that come to mind.
WWW Search Tools
You must also understand the search tools of the WWW, and how to use them. At this point, I would like you to turn your attention to a specific Web page, namely the WWW Power Index, compliments of Web Communications, an Internet presence provider located in Santa Cruz, California. The URL is:
http://www.webcom.com/webcom/power/index.html
By the way, there is nothing sacred about this nice collection of URLs. It is not unlike several others that I have seen on the net. When I refer to a specific menu item, I will also give its URL, so there is no reason to become wedded to this particular index. Nevertheless, it was kind of the folks at Web Communications to provide this, so if you are in the market for a Web presence provider, you may want to check out their home page.
Assuming that you have found the WWW Power Index without any difficulty, you should see this menu:
Select "Internet/WWW Search Tools," the first menu item. Next you will see the menu below.
Finally, once again select the first menu item, "WWW Search Tools" and you will then be presented with this menu:
Taxonomy of Search Engines
Part of my motivation for taking you through this exercise is for you to get some idea of the range of search engines and other sources for finding things on the Web. The only ones that I am going to cover, as they are personal favorites, are the following, and in Tutorial Six I will also cover: and Veronica via the University of Minnesota. Of course, these are only some of my personal favorites, and tastes will surely vary. As you can see from the WWW Power Index, there are many possible choices and this menu is far from exhaustive. The principles of searching that I will discuss are valid regardless of the particular search engine.
What are the major differences between the various search engines? There seem to be four major types. A search engine may be classified as:
Some examples include:
1. Indexes
These consist of databases of URLs that have been suggested or collected from around the WWW. Perhaps the best known index is Yahoo, which got started at Stanford University but has now gone commercial.
2. Automatic Collectors, or Gatherers
These are special purpose programs that are capable of traversing the WWW and collecting URLs as they go. These programs are sometimes called "spiders" or "robots" for obvious reasons. These can be of two types:
3. Non-Indexing Retrieval Programs
These robots wait until they receive a request and then directly go out and look for URLs that match the query. WebCrawler is an example.
4. Harvesters
The best known and perhaps only real harvester is Veronica. It goes out and examines all of the titles of the documents it finds in a gopher tree and records them in a searchable dataset. It then moves on to the next gopher and repeats the process. It keeps on going until it has exhausted all the gophers that it can find. A harvester is different from a gatherer because the former attempts to record every URL in existence. In contrast, a gatherer works continuously, traversing back and forth, and is never done. The Veronica harvest is complete and comprehensive, but it only applies to gopher servers and must be updated periodically in order to stay current.
An Experiment
All five of the search engines whose URLs I have noted roughly fall into type 2 above, that is, gatherers. Perhaps the most complete are AltaVista and Lycos. Let's try a little experiment to see how many "hits" we can get. Use the key words:
"Melrose Place"
Please try these words on all five search engines and compare the results you get.
Note: AltaVista is one of the best search engines around. Here is an interesting side experiment to try with it. Select a URL of interest to you. For example, how about trying the URL of the home page for your school or business? Enter the URL for this page into AltaVista's search field. AltaVista will find any Web pages with links to the page that you have selected. You can also do the same thing with HotBot. Try it with both engines and compare the results that you get!
As previously mentioned, an effective search begins with a strategy or a plan. This means making use of the best resources at hand in order to gain leverage over the process of working toward the solution of a problem.
This may mean discarding some routes to the object of the search, in favor of those more likely to succeed. The diagram below will help to visualize the process:
Guidelines For Effective Searching
A checklist can be a handy way to get started searching the Web. The list should be limited to those search engines that you have found to be the most productive. The value of a checklist is:
Here are some things to think about as you conduct your search.
Exercise
So that you might get some experience with these tools, here is another URL for you to try. This Web Gopher page was put together here at Arlington Courseware. It lists several of the best search engines. (However, it is less complete than the WWW Power Index.): As an alternative, here is another URL to try for MetaCrawler, which acts as an intelligent agent to contact several Internet search engines: Note: if you are not sure what to search for, try MetaSpy, which lets you watch other people's searches underway.
I suggest that you edit a new HTML file that you can use to build a "research launch-pad." I will explain this exercise further below. As a first step, view the source HTML file for the Arlington Web Gopher page. There will be a few things that may be a little mystifying in this HTML file, but most of it should be easy to comprehend.
The most important new concept in this HTML file is the use of a definition list. The elements of this construct are, first, the corresponding beginning and end of the (D)efinition (L)ist, denoted by <DL> and </DL>, respectively. This form is sometimes called a container, because this pair of tags surrounds the content of the definition list. Secondly, for each term to be defined, the (D)efinition (T)erm is labeled with a <DT> tag. This is used in much the same way as the <LI> tag in ordered and unordered lists covered in previous tutorials. Finally, each definition term must have a (D)escription that is tagged <DD>. In actual practice definition lists are used for many things besides lists of definitions. It is really an alternative and excellent way to make an unnumbered list.
In any event, as an exercise, using the <H2> and </H2> heading tags, make up a heading. You may call it anything you like, but I call mine my "Research Launch-pad." Use the normal HTML boilerplate for <HTML>, <HEAD>, <BODY>, etc. Copy over the definition lists from the Arlington Web Gopher page.
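As a rough illustration of the definition-list construct described above, here is a minimal sketch of what such a launch-pad file might look like. The engine names, the www.example.com URLs, and the descriptions are placeholders chosen for illustration only; they are not the actual entries on the Arlington Web Gopher page, so substitute the real ones as you copy them over.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Research Launch-pad</TITLE>
</HEAD>
<BODY>
<H2>Research Launch-pad</H2>
<!-- DL and /DL form the container around the whole list -->
<DL>
  <!-- Each DT holds one term; here the term is a link to a search engine.
       The URL below is a placeholder, not a real search engine address. -->
  <DT><A HREF="http://www.example.com/engine-one/">Search Engine One</A>
  <!-- Each DD holds the description of the DT just before it -->
  <DD>Placeholder description of what the first engine is good for.
  <DT><A HREF="http://www.example.com/engine-two/">Search Engine Two</A>
  <DD>Placeholder description of what the second engine is good for.
</DL>
</BODY>
</HTML>
```

In HTML 3.2 the closing tags for <DT> and <DD> are optional, which is why only the <DL> container is closed explicitly in this sketch.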
You do not have to copy the HTML for each search engine, only the ones on this list that we are going to use. These include:
To complete our menu, add to it lines for AltaVista, WebCrawler and HotBot, whose URLs were mentioned previously. Use similar HTML for a definition list in order to complete the menu using the URLs for these engines. (NOTE: You may include the in-line graphics if you want. That is a purely optional part of the exercise.) You now have taken an important step forward in constructing your launch-pad. We will finish it up in the next tutorial and add more features.
Review Question
Regarding the search for "Melrose Place," which search engine got the most hits? How were they different? Which one(s) got the best hits? Do you know why?
Search Engine Critique
Lastly, here are a few of the limitations that are sometimes mentioned about search engines:
Overall, the Web would be much more difficult to navigate without search engines, despite some of their obvious shortcomings.
End Notes
(1) Open Text Corporation, "Open Text 5: High Performance Text Retrieval Tools" (Waterloo, Ontario, Canada: Open Text Corp., 1994) [Advertisement].
(2) Kennedy, Noah, The Industrialization of Intelligence: Mind and Machine in the Modern Age (London: Unwin Hyman, Ltd., 1989), pp. 2-3.
Copyright ©1997 by Thomas P. and Barbara L. Copley.
All rights reserved.
--------------752C15A5909--