ML20134K941

Comments on Encl Article Entitled, "Evaluation of Retrieval Effectiveness for Full-Text Document Retrieval Sys," Per 850327 Request. Article Critical of IBM Storage & Info Retrieval Sys (STAIRS)
Issue date: 04/11/1985
From: Altomare P
NRC OFFICE OF NUCLEAR MATERIAL SAFETY & SAFEGUARDS (NMSS)
To: Mausshardt D
NRC OFFICE OF NUCLEAR MATERIAL SAFETY & SAFEGUARDS (NMSS)
Shared Package
ML20134K933 List:
References
REF-WM-1 NUDOCS 8508300520


Text

THIS MEMORANDUM REPLACES THE MEMORANDUM DATED APRIL 5, 1985, SAME SUBJECT.


DISTRIBUTION:
WM s/f, JGreeves, NMSS r/f, JOBunting, Section r/f, MKnapp, Originator, HMiller, PAltomare, REBrowning, 403/PA/85/04/05, MKearney, MJBell, DMattson, JSurmeier, LHigginbotham

MEMORANDUM FOR: Donald B. Mausshardt, Deputy Director, Office of Nuclear Material Safety and Safeguards

THRU: Joseph O. Bunting, Jr., Chief, Policy and Program Control Branch, Division of Waste Management, NMSS

FROM: Philip M. Altomare, Section Leader, Program Planning Section, Policy and Program Control Branch, Division of Waste Management, NMSS

SUBJECT: COMMENTS ON "STAIRS" TYPE FULL-TEXT RETRIEVAL SYSTEMS

Your memorandum of March 27 requested comments on an article, "An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System," contained in a memorandum from William Besaw to Jack Roe. This article was critical of the "STAIRS" type full-text document-retrieval system. I have provided comments on the subject article below.

As you are well aware, the Division of Waste Management has a pressing need for an improved document storage and retrieval system to support our activities under the Nuclear Waste Policy Act of 1982 (NWPA). We are investigating system requirements with assistance from the Office of Resource Management and the Office of Administration. We have already identified the potential problem raised in the article and have provided for a corrective system design.

The central point of the subject article is that large full-text document retrieval systems have poor search and recall capability: in a test of STAIRS (IBM Storage And Information Retrieval System) on a database containing 350,000 pages of text, only 20 percent of the relevant documents were retrieved.

The approach used in the study was to enter the full text of each document into the system and to rely on computer searching of the text, using search words or word combinations, to locate documents of interest. Different people, particularly those with different backgrounds, use different descriptor words; therefore, many relevant documents were missed in the large data base tested.
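The two ratios behind this criticism are Recall (the share of all relevant documents that a search finds) and Precision (the share of found documents that are relevant). The sketch below is a hypothetical illustration, not data from the study: the four documents, the relevance judgments, and the helper `precision_recall` are invented. It shows how a search on one person's descriptor word can be perfectly precise while still missing most of the relevant material.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) as fractions for a single search."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical documents: m1-m3 discuss the same subject in different words.
documents = {
    "m1": "HLW repository siting criteria",
    "m2": "high-level waste repository schedule",
    "m3": "geologic disposal licensing plan",
    "m4": "office supply order",
}
relevant = {"m1", "m2", "m3"}  # hypothetical relevance judgments

# One searcher's descriptor word for the subject is "HLW".
retrieved = {d for d, text in documents.items() if "hlw" in text.lower().split()}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=100% recall=33%
```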

The skill of the person designing the search and the time spent are significant factors in the number of relevant documents recovered. The approach used in this study did not, however, take advantage of the full capabilities of a STAIRS type system. STAIRS allows a document to be divided into separate fields (author, date, subject, abstract, text, etc.), any or all of which can be searched. With a little creativity, a "descriptor" field can be added to include file codes, standardized keywords, user-desired keywords, or other indexing-type information. Since this is the same information used in manual or computer indexing document identification systems, STAIRS type systems are always as good as these systems; in addition, they provide the option of full-text search plus the capability of bringing the document up on the screen for quick scanning. This concept was tested in a recent ADM demonstration of the GESCAN system (a full-text storage and retrieval system that is reportedly faster and cheaper than STAIRS), in which the descriptor field was included for WM. It worked as expected.

We have also contacted a coauthor of the article, D.C. Blair. Mr. Blair's intent was to point out a problem in large data bases. He is not against full-text storage and retrieval systems and feels that the added search field could significantly improve the recall capability. One should not conclude from the article that manual or index-type document storage and retrieval systems would necessarily have better recall capability for large data bases than a STAIRS type full-text search system.

OFC: WMPC | WMPC
NAME: PAltomare:dh | JOBunting
DATE: 85/04/10

8508300520 850724 PDR WASTE WM-1 PDR

Considering the above, the recall ability is not the deciding factor in determining the extent to which full-text storage and retrieval is implemented.
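The descriptor-field argument above can be sketched as a record that carries both the full text and a field of file codes or standardized keywords, so a search may use either or both. A minimal sketch under stated assumptions: the `Record` layout, the descriptor code "WM-QA", the sample records, and the `search` helper are hypothetical and are not the STAIRS or GESCAN interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """One stored document: separately searchable fields plus the full text."""
    doc_id: str
    author: str
    date: str  # e.g. "1985-04-10"
    text: str
    # Hypothetical "descriptor" field: file codes and standardized keywords.
    descriptors: set[str] = field(default_factory=set)

def search(records, *, words=(), descriptors=()):
    """Return IDs of records whose text contains every word in `words`
    OR whose descriptor field shares a code with `descriptors`."""
    wanted_codes = set(descriptors)
    hits = set()
    for rec in records:
        tokens = set(rec.text.lower().split())
        if words and all(w.lower() in tokens for w in words):
            hits.add(rec.doc_id)  # full-text match
        if rec.descriptors & wanted_codes:
            hits.add(rec.doc_id)  # descriptor (index-style) match
    return hits

records = [
    Record("r1", "A. Smith", "1985-01-15", "quality assurance plan for the repository", {"WM-QA"}),
    Record("r2", "B. Jones", "1985-02-02", "revised QA procedures", {"WM-QA"}),
    Record("r3", "C. Brown", "1985-03-20", "budget review"),
]

print(sorted(search(records, words=["quality", "assurance"])))  # ['r1']
print(sorted(search(records, words=["quality", "assurance"], descriptors=["WM-QA"])))  # ['r1', 'r2']
```

Because descriptor hits are OR-ed with full-text hits, the descriptor field can only add documents to a full-text result, which is the sense in which the memo calls such a system "always as good as" an index-only system.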

We consider cost to be a major factor; it is being investigated, along with recall capability and many other items, in a pilot project we are undertaking.

We also hope to evaluate the GESCAN system, optical character readers, digital image scanners, and conventional document storage and retrieval methods. With a carefully planned system design, costs can be significantly reduced, and the system may even result in cost savings. This is particularly true when the real costs (and delays) of satisfying FOIA and hearing "discovery" requirements are considered.

I have underlined the term "extent" above because, in point of fact, there is an increasing, though uncontrolled, use of full-text storage taking place in the agency. The question of whether to use full-text computer-compatible storage is becoming academic. A standard practice with the IBM 5520 word processor is to archive documents to floppy disks, which are then saved. Quite frequently, the full text of documents is retrieved from these disks, which in effect constitutes full-text storage and retrieval. Unfortunately, very little control is exercised over what is saved, and these disks contain early drafts and much of what is best described as junk. A well-worded FOIA request could cause us much trouble. Therefore, as part of our pilot program, we are considering procedures and guidelines for division document handling as well as methods of storage and retrieval.

It is worthwhile to note that a STAIRS type system can serve as an archival system for IBM 5520 documents. In the pilot program, it is planned that documents will be entered into the storage and retrieval system through the IBM 5520 in accordance with set procedures and under the control of a system administrator.

WM is specifically concerned with defining the requirements for a document storage and retrieval system that will allow us to fulfill the NWPA directive to approve or disapprove a license application for a high-level waste (HLW) repository within three years of the license application, or within four years if we choose to make a special request to Congress. We note that this will be the first-of-a-kind high-level waste repository; that a very large number of documents will be generated before the license application is received in 1990; and that this is expected to be a contentious hearing and probably the largest single licensing action the NRC will undertake. Accordingly, a document and data management system capable of storing, searching, and retrieving in a timely manner the information of importance to licensing must be available. For this reason we have entered into our present study and pilot program. This work is of interest to many others, however, and we are working closely with DOE to help establish the requirements for their system (they are now controlling 200,000 documents, compared with 20,000 for WM); with RM on the pilot program and the Corporate Data Network (CDN); and with ADM in defining system requirements through their Integrated Information Handling System Working Group. We feel that substantial progress has been made to date but recognize that we are working near the state of the art of information management systems and that problems should be anticipated. With a continuation of the cooperation we have received, we feel that we can fulfill our needs as well as contribute to the overall NRC effort.

Philip M. Altomare, Section Leader
Program Planning Section
Policy and Program Control Branch
Division of Waste Management


ROUTING AND TRANSMITTAL SLIP (OPTIONAL FORM 41, Rev. 7-76), with a "WM Record File / WM Project / Docket No." stamp; handwritten entries illegible.

COMPUTING PRACTICES

Edgar H. Sibley
Panel Editor

An evaluation of a large, operational full-text document-retrieval system (containing roughly 350,000 pages of text) shows the system to be retrieving less than 20 percent of the documents relevant to a particular search. The findings are discussed in terms of the theory and practice of full-text document retrieval.

AN EVALUATION OF RETRIEVAL EFFECTIVENESS FOR A FULL-TEXT DOCUMENT-RETRIEVAL SYSTEM

DAVID C. BLAIR and M. E. MARON

Document retrieval is the problem of finding stored documents that contain useful information. There exist a set of documents on a range of topics, written by different authors, at different times, and at varying levels of depth, detail, clarity, and precision, and a set of individuals who, at different times and for different reasons, search for recorded information that may be contained in some of the documents in this set. In each instance in which an individual seeks information, he or she will find some documents of the set useful and other documents not useful; the documents found useful are, we say, relevant; the others, not relevant.

How should a collection of documents be organized so that a person can find all and only the relevant items? One answer is automatic full-text retrieval, which on its surface is disarmingly simple: Store the full text of all documents in the collection on a computer so that every character of every word in every sentence of every document can be located by the machine. Then, when a person wants information from that stored collection, the computer is instructed to search for all documents containing certain specified words and word combinations, which the user has specified.

Two elements make the idea of automatic full-text retrieval even more attractive. On the one hand, digital technology continues to provide computers that are larger, faster, cheaper, more reliable, and easier to use; and, on the other hand, full-text retrieval avoids the need for human indexers whose employment is increasingly costly and whose work often appears inconsistent and less than fully effective.

A pioneering test to evaluate the feasibility of full-text search and retrieval was conducted by Don Swanson and reported in Science in 1960 [6]. Swanson concluded that text searching by computer was significantly better than conventional retrieval using human subject indexing. Ten years later, in 1970, Salton, also in Science, reported optimistically on a series of experiments on automatic full-text searching [3].

This paper describes a large-scale, full-text search and retrieval experiment aimed at evaluating the effectiveness of full-text retrieval. For the purposes of our study, we examined IBM's full-text retrieval system, STAIRS. STAIRS, an acronym for "Storage And Information Retrieval System," is a very fast, large-capacity, full-text document-retrieval system. Our empirical study of STAIRS in a litigation support situation showed its retrieval effectiveness to be surprisingly poor. We offer theoretical reasons to explain why this poor performance should not be surprising and also why our experimental results are not inconsistent with the earlier more favorable results cited above. The retrieval problems we describe would be problems with any large-scale, full-text retrieval system, and this study should not be seen as a critique of STAIRS alone but rather as a critique of the principles on which it and other full-text document-retrieval systems are based.
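The "disarmingly simple" scheme described above is commonly implemented with an inverted index that records, for every word, the documents and word positions where it occurs; the positions are what allow "word combinations" such as two adjacent words (the article later cites "New" adjacent "York" as a STAIRS example). This is a minimal sketch under stated assumptions, not the actual STAIRS design: tokenizing on letters only, case-insensitive matching, the sample documents, and the function names are all illustrative choices.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Inverted index: word -> {doc_id: [positions of the word in that doc]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def docs_with_all(index, *words):
    """Boolean AND: IDs of documents containing every one of `words`."""
    postings = [set(index.get(w.lower(), {})) for w in words]
    return set.intersection(*postings) if postings else set()

def docs_with_adjacent(index, first, second):
    """IDs of documents in which `second` immediately follows `first`."""
    first, second = first.lower(), second.lower()
    hits = set()
    for doc_id in docs_with_all(index, first, second):
        following = set(index[second][doc_id])
        if any(p + 1 in following for p in index[first][doc_id]):
            hits.add(doc_id)
    return hits

docs = {
    "a": "New York office memo on the trial.",
    "b": "Memo from the York office about new filings.",
    "c": "Budget for the Chicago office.",
}
index = build_index(docs)
print(sorted(docs_with_all(index, "new", "york")))       # ['a', 'b']
print(sorted(docs_with_adjacent(index, "New", "York")))  # ['a']
```

Document b contains both words, so a plain AND search returns it, but only document a has them side by side. Proximity operators of this kind can sharpen a query; they cannot find a document that never uses the searcher's words at all.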

Communications of the ACM, March 1985, Volume 28, Number 3

[...] defense of a large corporate law suit. Access to the documents was provided by IBM's STAIRS/TLS software (Storage And Information Retrieval System/Thesaurus Linguistic System). STAIRS software represents state-of-the-art software in full-text retrieval. It provides facilities for retrieving text where specified words appear either singly or in complex Boolean combinations. The user can specify the retrieval of text in which words appear together anywhere in the document, within the same paragraph, within the same sentence, or adjacent to each other (as in "New" adjacent "York"). Retrieval can also be performed on fields such as author, date, and document number. STAIRS provides ranking functions that permit the user to order retrieved sets of 200 documents or less in either ascending or descending numerical (e.g., by date) or alphabetic (e.g., by author) order. In addition, retrieved sets of less than 200 documents can also be ordered by the frequency with which specified search terms occur in the retrieved documents. The Thesaurus Linguistic System (TLS) provides the facilities to manually create an interactive thesaurus that can be called up by the user to semantically broaden (or narrow) his or her searches; it allows the designer to specify semantic relationships between search terms such as "narrower than," "broader than," "related to," and "synonymous with," as well as automatic phrase decomposition. STAIRS/TLS thus represents a comprehensive full-text document-retrieval system.

THE EXPERIMENTAL PROTOCOL

To test how well STAIRS could be used to retrieve all and only the documents relevant to a given request for information, we wanted in essence to determine the values of Recall (percentage of relevant documents retrieved) and Precision (percentage of retrieved documents that are relevant). Although Precision is an important measure of retrieval effectiveness, it is meaningless unless compared to the level of Recall desired by the user. In this case, the lawyers who were to use the system for litigation support stipulated that they must be able to retrieve at least 75 percent of all the documents relevant to a given request for information, and that they regarded this entire 75 percent as essential to the defense of the case. (The lawyers divided the relevant retrieved documents into three groups: "vital," "satisfactory," and "marginally relevant." All other retrieved documents were considered "irrelevant.")

FIGURE 2. Definitions of Precision and Recall

$$\text{Recall} = \frac{\text{Number Relevant and Retrieved}}{\text{Total Number Relevant}}$$

$$\text{Precision} = \frac{\text{Number Relevant and Retrieved}}{\text{Total Number Retrieved}}$$

CONDUCT OF THE TEST

For the test, we attempted to have the retrieval system used in the same way it would have been during actual litigation. Two lawyers, the principal defense attorneys in the suit, participated in the experiment. They generated a total of 51 different information requests, which were translated into formal queries by either of two paralegals, both of whom were familiar with the case and experienced with the STAIRS system. The paralegals searched on the database until they found a set of documents they believed would satisfy one of the initial requests. The original hard copies of these documents were retrieved from files, and xerox copies were sent to the lawyer who originated the request. The lawyer then evaluated the documents, ranking them according to whether they were "vital," "satisfactory," "marginally relevant," or "irrelevant" to the original request. The lawyer then made an overall judgment concerning the set of documents received, stating whether he or she wanted further refinement of the query and further searching. The reasons for any subsequent query revisions were made in writing and were fully recorded. The information-request and query-formulation procedures were considered complete only when the lawyer stated in writing that he or she was satisfied with the search results for that particular query (i.e., in his or her judgment, more than 75 percent of the "vital," "satisfactory," and "marginally relevant" documents had been retrieved). It was only at this point that the task of measuring Precision and Recall was begun. (A diagram of the information-request procedure is given in Figure 3.)

FIGURE 3. The Information-Request Procedure. [Flow diagram: Start → Information Request → Paralegal (Clarification with the inquirer) → Formal Query to System → Retrieved Set → Submit to Inquirer → Inquirer's Evaluation → if not satisfied, Revise Formal Query and repeat the step; if satisfied, Evaluation of Effectiveness by Experimenters → Stop.]

The lawyers and paralegals were permitted as much interaction as they thought necessary to ensure highly effective retrieval. The paralegals were able to seek clarification of the lawyers' information request in as much detail and as often as they desired, and the lawyers were encouraged to continue requesting information from the database until they were satisfied they had enough information to defend the lawsuit on that particular issue or query.

In the test, each query required a number of revisions, and the lawyers were not generally satisfied until many retrieved sets of documents had been generated and evaluated.

Precision was calculated by dividing the total number of relevant (i.e., "vital," "satisfactory," and "marginally relevant") documents retrieved by the total number of retrieved documents. If two or more retrieved sets were generated before the lawyer was satisfied with the results of the search, then the retrieved set considered for calculating Precision was computed as the union of all retrieved sets generated for that request. (Documents that appeared in more than one retrieved set were automatically excluded from all but one set.)

Recall was considerably more difficult to calculate, since it required finding relevant documents that had not been retrieved in the course of the lawyers' search. To find the unretrieved relevant documents, we developed sample frames consisting of subsets of the unretrieved database that we believed to be rich in relevant documents (and from which duplicates of retrieved relevant documents had been excluded). Random samples were taken from these subsets, and the samples were examined by the lawyers in a blind evaluation: the lawyers were not aware they were evaluating sample sets rather than retrieved sets they had personally generated. The total number of relevant documents that existed in these subsets could then be estimated. We sampled from subsets of the database rather than the entire database because, for most queries, the percentage of relevant documents in the database was less than

2 percent, making it almost impossible to have both manageable sample sizes and a high level of confidence in the resulting Recall estimates. Of course, no extrapolation to the entire database could be made from these Recall calculations. Nonetheless, the estimation of the number of relevant unretrieved documents in the subsets did give us a maximum value for Recall for each request.

TEST RESULTS

Of the 51 retrieval requests processed, values of Precision and Recall were calculated for 40. The other 11 requests were used to check our sampling techniques and to control for possible bias in the evaluation of retrieved and sample sets.

In Table I we show the values of Precision and Recall for each of the 40 requests. The values of Precision ranged from a maximum of 100.0 percent to a minimum of 19.6 percent. The unweighted average value of Precision turned out to be 79.0 percent (standard deviation = 23.2). The weighted average was 75.5 percent. This meant that, on average, 79 out of every 100 documents retrieved using STAIRS were judged to be relevant.

The values of Recall ranged from a maximum of 78.7 percent to a minimum of 2.5 percent. The unweighted average value of Recall was 20 percent (standard deviation = 15.9), and the weighted average value was 20.26 percent. This meant that, on average, STAIRS could be used to retrieve only 20 percent of the relevant documents, whereas the lawyers using the system believed they were retrieving a much higher percentage (i.e., over 75 percent).

TABLE I. Recall and Precision Values for Each Information Request (entries legible in the scanned copy)

Request   Recall   Precision
2         45.5%     92.6%
6          8.9%     60.0%
7         20.6%     64.7%
9         13.3%     48.9%
11        12.8%    100.0%
12         9.6%     84.2%
18        13.0%     38.0%
19        15.8%     42.1%
21        41.0%     33.8%
26         7.2%     95.3%
27        50.0%     42.6%
28        50.0%     19.6%
30         7.0%    100.0%
32        12.5%    100.0%
33        18.2%     79.5%
34        14.1%     45.1%
38        24.7%     83.3%
43        18.9%    100.0%
44        10.6%    100.0%
47        13.4%    100.0%
48        13.7%     87.5%
49        17.4%     87.8%
51         4.7%    100.0%

Average Recall = 20.0% (standard deviation = 15.9)
Average Precision = 79.0% (standard deviation = 23.2)

When we plot the value of Precision against the corresponding value of Recall for each of the 40 information requests, we get the scatter diagram given in Figure 4. Although Figure 4 contains no more data than Table I, it does show the relationships in a more explicit way. For example, the heavy clustering of points in the lower right corner shows that in over 50 percent of the cases we get values of Precision above 30 percent with Recall at or below 20 percent. The clustering in the lower portion of the diagram shows that in 80 percent of the information requests the value of Recall was at or below 20 percent. Figure 4 also depicts the frequently observed inverse relationship between Recall and Precision, where high values of Precision are often accompanied by low values for Recall, and vice versa [8].

OTHER FINDINGS

After the initial Recall/Precision estimations were done, several other statistical calculations were carried out in the hope that additional inferences could be made. First, the results were broken down by lawyer to ascertain whether certain individuals were prima facie

more adept at using the system than others. The results were as follows:

            Recall        Precision
Lawyer 1    22.7%         76.0%
Lawyer 2    (illegible)   81.4%

Although there is some difference between the results for each lawyer, the variance is not statistically significant at the .05 level. Although this was a very limited test, we can conclude that, at least for this experiment, the results were independent of the particular user involved.

Another area of interest related to the revisions made to requests when the lawyer was not completely satisfied with the initial retrieved sets of documents. We hypothesized that if the values of Recall and Precision for the requests where substantial revisions had to be made (about 30 percent of the total) were significantly different from the overall mean values, we might be able to infer something about the requesting procedure. Unfortunately, the values for Recall and Precision for the substantially revised queries (23.9 percent and 62.1 percent, respectively) did not indicate a statistically significant difference.

Finally, we tested the hypothesis that extremely high values of Precision for the retrieved sets would correlate directly with the lawyers' judgments of satisfaction with that set of documents (which might indicate that the lawyers were confusing Precision with Recall). To do this, we computed the mean Precision for all requests where the lawyers were satisfied with the initial retrieved set, and compared this value to the mean Precision for all requests. Although the Precision for requests that were not revised came out to be 85.4 percent, again the results were not statistically significant at the .05 level.

FIGURE 4. Plot of Precision versus Recall for All Information Requests. [Scatter diagram; the mean value for R and P is marked.]

The Retrieval Effectiveness of Lawyers versus Paralegals

The argument can be made that, because STAIRS is a high-speed, on-line, interactive system, the searcher at the terminal can quickly and effectively evaluate the output of STAIRS during the query modification process. Therefore, retrieval effectiveness might be significantly improved if the person originating the information request is actually doing the searching at the terminal. This would mean that if a lawyer worked directly on the query formulation and query modification at the STAIRS terminal, rather than using a paralegal as intermediary, retrieval effectiveness might be improved.

We tested this conjecture by comparing the retrieval effectiveness of the lawyer vis-à-vis the paralegal on the same information request. We selected (at random) five information requests for which the searches had already been completed by the paralegal, and for which retrieved sets had been evaluated by the lawyer and values of Recall computed. (Neither the lawyer who made the relevance judgments nor the paralegal knew the Recall figures for these original requests.) We invited the lawyer to use STAIRS directly to access the database, giving the lawyer copies of his or her original information requests. The lawyer translated these requests into formal queries, evaluating the text displayed on the screen, modifying the queries as he or she saw fit, and finally deciding when to terminate the search. For each of the five information requests we estimated the minimum number of relevant documents in the entire file, and, knowing which documents the lawyer had previously judged relevant, we were able to compute the values of Recall for the lawyer at the terminal as we had already done for the paralegal. If it were true that STAIRS would give better results when the lawyers themselves worked at the terminal, the values of Recall for the lawyers would have to be significantly higher than the values of Recall when the paralegals did the searching. The results were as follows:

Request   Recall          Recall
number    (paralegal)     (lawyer)
1          7.2%            6.6%
2         19.4%           10.3%
3          4.2%           26.4%
4          4.1%            7.4%
5         18.9%           25.3%
Mean      10.7%           15.2%
          (s.d. = 7.65)   (s.d. = 9.83)

Although there is a marked improvement in the lawyer's Recall for requests 3, 4, and 5, and in the average Recall for all five information requests, the improvement is not statistically significant at the .05 level (z = -0.81). Hence we cannot reject the hypothesis that

both the lawyer and the paralegal get the same results for Recall.

WHY WAS RECALL SO LOW

The realization that STAIRS may be retrieving only one out of five relevant documents in response to an information request may surprise those who have used STAIRS or had it demonstrated to them. This is because they will have seen only the retrieved set of documents and not the total corpus of relevant documents; that is, they have seen that the proportion of relevant documents in the retrieved set (i.e., Precision) is quite good (around 80 percent). The important issues to consider here are (1) why was Recall so low and (2) why did the users (lawyers and paralegals) believe they were retrieving 75 percent of the relevant documents when, in fact, they were only retrieving 20 percent.

The low values of Recall occurred because full-text retrieval is difficult to use to retrieve documents by subject, because its design is based on the assumption that it is a simple matter for users to foresee the exact words and phrases that will be used in the documents they will find useful, and only in those documents. This assumption is not a new one; it goes back over 25 years to the early days of computing. The basic idea is that one can use the formal aspects of text to predict its meaning or subject content: formal aspects such as the occurrence, location, and frequency of words; and, to the extent that it can be precisely described, the syntactic structure of word phrases. It was hoped that by exploiting the high speed of a computer to analyze the formal aspects of text, one could get the computer to deal with text in a "comprehending-like" way (i.e., to identify the subject content of texts). This endeavor is known as "Automatic Indexing" or, in a more general sense, "Natural Language Processing." During the past two decades, many experiments in automatic indexing (of which full-text searching is the simplest form) have been carried out, and many discussions by linguists, psychologists, philosophers, and computer scientists have analyzed the results and the issues [5]. These experiments show that full-text document retrieval has worked well only on unrealistically small databases.

The belief in the predictability of the words and phrases that may be used to discuss a particular subject is a difficult prejudice to overcome. In a naive sort of way, it is an appealing prejudice, but a prejudice nonetheless, because the effectiveness of full-text retrieval has not been substantiated by reliable Recall measures on realistically large databases. Stated succinctly, it is impossibly difficult for users to predict the exact words, word combinations, and phrases that are used by all (or most) relevant documents and only (or primarily) by those documents, as can be seen in the following examples.

In the legal case in question, one concern of the lawyers was an accident that had occurred and was now an object of litigation. The lawyers wanted all the re[...] were constructed that contained the word "accident(s)" along with several relevant proper nouns. [...] The manner in which an individual referred to the incident was frequently dependent on his or her point of view. Those who discussed the event in a critical or accusatory way referred to it quite directly, as an "accident." Those who were personally involved in the event, and perhaps culpable, tended to refer to it euphemistically as, inter alia, an "unfortunate situation" or a "difficulty." Sometimes the accident was referred to obliquely as "the subject of your last letter," "what happened last week was . . . ," or, as in the opening lines of the minutes of a meeting on the issue, "Mr. A: We all know why we're here . . ." Sometimes relevant documents dealt with the problem by mentioning only the technical aspects of why the accident occurred, but neither the accident itself nor the people involved. Finally, much relevant information discussed the situation prior to the accident and, naturally, contained no reference to the accident itself.

Another information request resulted in the identification of 3 key terms or phrases that were used to retrieve relevant information; later, we were able to find 26 other words and phrases that retrieved additional relevant documents. The 3 original key terms could not have been used individually, as they would have retrieved 420 documents, or approximately 4000 pages of hard copy, an unreasonably large set, most of which contained irrelevant information. Another request identified 4 key terms/phrases that retrieved relevant documents, which we were later able to enlarge by 44 additional terms and combinations of terms to retrieve relevant documents that had been missed.

Sometimes we followed a trail of linguistic creativity through the database. In searching for documents discussing "trap correction" (one of the key phrases), we discovered that relevant, unretrieved documents had discussed the same issue but referred to it as the "wire warp." Continuing our search, we found that in still other documents trap correction was referred to in a third and novel way: the "shunt correction system." Finally, we discovered the inventor of this system was a man named "Coxwell," which directed us to some documents he had authored, only he referred to the system as the "Roman circle method." Using the Roman circle method in a query directed us to still more relevant but unretrieved documents, but this was not the end either. Further searching revealed that the system had been tested in another city, and all documents germane to those tests referred to the system as the "air truck." At this point the search ended, having consumed over an entire 40-hour week of on-line searching, but there is no reason to believe that we had reached the end of the trail; we simply ran out of time.

ports, correspondence, memoranda, and minutes of As the database included many items of personal cor.

meetings that discussed this aqcident. Formal queries 95

%, e :3 %meer 1 O-mmwm et me AC.t

' 'r:, : %

I

. . 1 Compating ?gsctices respondence as well as the verbatim minutes of meete lection of relevant information to be put on line. Might l ings, the use of slang frequently changed the way in it not be reasonable to expect them to be suspicious which one would riormall

  • ta k abcut a subject.9E- that thsy were not retrieving evarything they wanted?

WN, phied Not really. Because the database was so large (providing access to over 350.000 pages of hard copy. all of which was in some way pertinent to the lawsuit), it would be unreasonable to expect four individuals (two lawyers and two paralegals) to have total recall of all the impor-

'Esen misspeWamsesed an eW. Key search tant supporting facts. testimony, and related data tha't terms like -flattening.* " gauge."

  • memos.* and *corre. were germane to the case. If they had such recall they spondence." which were essential parts of phrases, would have no need for a computerized. interactive were used effectively to retrieve relevant documents. retrieval system. It is well known among cognitive psy.

However, the misspellings "flatoning." "guage." gage.* chologists that man's power of literal recall is much less "memoes.* and "correspondance." using the same effective than his power of recognition. The lawyers phrases, also retrieved relevant documents. Misspell- could remember the exact text of some of the impor-ings like these, which are tolerable in normal everyday tant information, but as we have already stated, this correspondence, when included in a computerized da- was a very small subset of the totalinformation rele-tabase become literal traps for users who are asked not vant to a particular issue. They could recognize the im-only to anticipate the key words and phrases that may portant information when they saw it, and they could be used to discuss an issue but also to foresee the wliole do so with uncanny consistency. (As a control, we sub-range of possible misspellings. letter transpositions, and mitted some retrieved sets and sample sets of docu-typographical errors that are likely to be committed. ments to the lawyers several times in a blind test of

$smo intur==8Ea= requests placed ahnent W their evaluation consistency, and found that their con-demands on the ingenuity of the indiv6 dual _cggstructp sistency was almost perfect.) Also, since the lawyers ing the query. In one situation. the Tawyer wanted I were not experts in information retrieval system de-

" Company A's comments concerning ...." Losiums et sign, there were no a priori reasons for them to suspect the documents authored by Company A was not the Recall levels of STAIRS.

enough. as many relevant' comments were embedded ia the minutes of meetings or recorded secondhand in the DETERIORATION OF RECA!.I. AS documents authored by others. Retrieving all the docu- A FUNCTION OF FII.E SIZE ments in which Company A was mentioned was too One reason why Recall evaluations done on small data-f bread a search; it retrieved over 5.000 documents bases cannot be used to estimate Recall on larger dase<

(about 40.000+ pages of hard copy). However, predict- bases is because, ceteris partbus the value of Recalle ing the exact phraseology of the text in which Com- decreases as the size of the database increases, or, from, pany A commented on the issue was almost impossible: a different point of view, the amount of search effort l sometimes Company A was not even mentioned. only reqgired to obtain the same Recall level increases as that so.and so (representing Company A)"said/consid. the database increases, often at a faster rate than the ered/ remarked / pointed out/ commented /noted/ex- increase in database size. On the database we studied.

plained/ discussed.' etc. there were many search terms that, used by them.

In some requests. the most important terme and i selves, would retrieve over 10.000 documents. Such phrases were not used at all in relevant documents. For output overload is a frequent problem of full. text re-example. " steel quantity" was a key phrase used to trieval systems.

l retrieve important relevant documents germane to an As a retrieved set of several thousand documents is j actionable issue, but unretrieved relevant documents impracticJ1. the user must reduce the output overload were also found that did not report steel quantity at all, by reformulating the single. term query so that it re-l but merely the number of such things as girders." trieves fewer documents. If a single term query wi re-l

" beams."-frames.* -bracings." etc. In another request, it trieves too many documents, the user may add another was important to find documents that discussed "non- term, wi. so as to form the new query wi and wa -(or expendable components."In this case, relevant unre- "wi adjacent wi." or "wi same w a-). The reformulated trieved documents merely listed the names of the com. query cannot retrieve more documents than the origi-ponents (of which there were hundreds) and made no nal: most probably, it will retrieve many fewer. The mention of the broader generic description of these process of adding intersecting terms to a query can be items as " nonexpendable." . continued until the size of the output reaches a man.

Why didn't the lawyers realize they were not getting ageable number. (This strategy, and its consequences, is all of the information relevant to a particular issue? - discussed in more detail in [1j.) However, as the user Certainly they knew the lawsuit. They had been in- norseus the else of the astpug hyademydeessemeningf volved with it from ihe beginning and were the princi. ternes the vehse of RseeN geis down h wier pal attorneys representing the defense. In addition, one seek new tesum the M'leithefsame relavedt of the paralegals had been instrumental not only in deemssenesiis45 he emeluded by.that seessimulmeedre j setting up the database but also in supervising the se. query.J
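The narrowing strategy described in the last paragraph can be sketched with a toy Boolean index. This is a minimal illustration under stated assumptions, not a model of STAIRS itself: the six documents, the relevance judgments, and the helper names (`build_index`, `and_query`, `recall`) are all hypothetical, and only the plain "and" operator is modeled (the "adjacent" and "same" proximity operators are left out). Document 3 stands in for the "wire warp" problem: it is relevant but uses different vocabulary.

```python
"""Toy Boolean full-text search (hypothetical data): ANDing terms narrows output and Recall."""
from collections import defaultdict

# Hypothetical mini-collection; not taken from the STAIRS test database.
DOCS = {
    1: "trap correction accident report",
    2: "trap correction problem in meeting minutes",
    3: "wire warp difficulty memo",  # relevant, but uses different vocabulary
    4: "trap correction cost estimate",
    5: "trap correction accident follow up",
    6: "trap door inventory",
}
RELEVANT = {1, 2, 3, 5}  # hypothetical relevance judgments


def build_index(docs):
    """Inverted index: lowercase word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def and_query(index, terms):
    """Conjunctive query 'w1 and w2 and ...': ids of documents containing every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def recall(retrieved, relevant):
    """Proportion of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


INDEX = build_index(DOCS)

for terms in (["trap"], ["trap", "correction"], ["trap", "correction", "accident"]):
    hits = and_query(INDEX, terms)
    print(" and ".join(terms), "->", sorted(hits), f"Recall = {recall(hits, RELEVANT):.2f}")
```

Running it prints the output size and Recall for each query: the output shrinks from five documents to two while Recall falls from .75 to .50, and no conjunction of these terms ever reaches document 3.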

Communications of the ACM, March 1985, Volume 28, Number 3

would undoubtedly have observed similar phenomena (Swanson was later to comment perceptively on the expectations about document retrieval from experiments using small databases [7]). In addition, it has only recently been observed that information-retrieval systems do not scale up [2]. That is, retrieval strategies that work well on small systems do not necessarily work well on larger systems (primarily because of output overload). This means that studies of retrieval effectiveness must be done on full-sized retrieval systems if the results are to be indicative of how a large, operational system would perform. However, large-scale, detailed retrieval-effectiveness studies, like the one reported here, are unprecedented because they are incredibly expensive and time consuming: our experiment took six months, involved two researchers and six support staff, and, taking into account all direct and indirect expenses, cost almost half a million dollars. Nevertheless, Swanson's and Salton's earlier full-text evaluations remain pioneering studies and, rather than contradicting our findings, have an illuminating value of their own.

An objection that might be made to our evaluation of STAIRS is that the low Recall observed was due not to STAIRS but to query formulation error. This objection is based on the realization that, at least in principle, virtually any subset of the database is retrievable by some simple or complex combination of search terms. The user's task is simply to find the right combination of search terms to retrieve all and only the relevant documents. However, we believe that users should not be asked to shoulder the blame, and perhaps an analogy will indicate why. Suppose you ask a company to make a lock for you, and they oblige by providing a combination lock, but when you ask them for the combination to open the lock, they say that finding the correct combination is your problem, not theirs. Now, it is possible, in principle, to find the correct combination, but in practice it may be impossibly difficult to do so. A full-text retrieval system bears the burden of retrieval failure because it places the user in the position of having to find (in a relatively short time) an impossibly difficult combination of search terms. The person using a full-text retrieval system to find information on a relatively large database is in the same unenviable position as the individual who ordered the lock. It is true that we, as evaluators, found the combinations of search terms necessary to retrieve many of the unretrieved relevant documents, but three things should be kept in mind. First, we make no claim to having found all the relevant unretrieved documents; we may not have found even half of them, as our sampling technique covered only a small percentage of the database. Second, a tremendous amount of search time was involved with each request (sometimes over 40 hours of on-line time), and the entire test took almost 6 months. Such inefficiency is clearly not consonant with the high speed desired for computerized retrieval. Third, the evaluators in this case represented, together, over 40 years of practical and theoretical experience in information systems analysis and should be expected to be more skilled searchers than the typical STAIRS searcher. Moreover, STAIRS is sold under the premise that it is easy to use and requires no sophisticated training on the part of the user. Yet this study is a clear demonstration of just how sophisticated search skills must be to use STAIRS or, mutatis mutandis, any other full-text retrieval system. The limitations of full-text retrieval have been recognized by at least one major vendor: WESTLAW, which made its reputation by offering full-text searching of legal cases, has now begun to supplement its full-text searching with manually assigned index terms.

SUMMARY

This paper has presented a major, detailed evaluation of a full-text document retrieval system. We have shown that the system did not work well in the environment in which it was tested and that there are theoretical reasons why full-text retrieval systems applied to large databases are unlikely to perform well in any retrieval environment. The success of the early studies was based on the small size of the databases used, and those studies were geared toward showing only that full-text search was competitive with searching based on manually assigned index terms, under the assumption that, if it were competitive, full-text retrieval would eliminate the cost of indexing. However, there are costs associated with a full-text system that a manual system does not incur. First, there is the increased time and cost of entering the full text of a document rather than a set of manually assigned subject and context descriptors. The average length of a document record on the system we evaluated was about 10,000 characters. In a manually assigned index term system of the same type, we found the average document record to be less than 500 characters. Thus, the full-text system incurs the additional cost of inputting and verifying 20 times the amount of information that a manually indexed system would need to deal with. This difference alone would more than compensate for the added time needed for manual indexing and vocabulary construction. The 20-fold increase in document record size also means that the database for a full-text system will be some 20 times larger than a manually indexed database and will entail increased storage and searching costs. Finally, because the average number of searchable subject terms per document for the full-text retrieval system described here was approximately 500, whereas a manually indexed system might have a subject indexing depth of about 10, the dictionary that lists and keeps track of these assignments (i.e., provides pointers to the database) can be as much as 50 times larger on a full-text system than on a manually indexed system. A full-text retrieval system does not give us something for nothing. Full-text searching is one of those things, as Samuel Johnson put it so succinctly, that ". . . is never done well, and one is surprised to see it done at all."
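The 20-fold and 50-fold figures in the Summary follow from simple division. The sketch below is a back-of-envelope illustration, not a cost model: it uses the averages given in the text, treats 500 characters as the indexed-record size (the text says "less than 500," so the true ratio is at least 20), and rounds the collection to 40,000 documents from "almost 40,000." All constant and function names are hypothetical.

```python
"""Back-of-envelope storage ratios using the averages reported in the Summary."""

FULL_TEXT_RECORD_CHARS = 10_000  # average full-text record on the evaluated system
INDEXED_RECORD_CHARS = 500       # manually indexed record: "less than 500" (used as an upper bound)
FULL_TEXT_TERMS_PER_DOC = 500    # searchable subject terms per full-text document
INDEXED_TERMS_PER_DOC = 10       # manual indexing depth of "about 10"


def storage_ratios():
    """Return (record-size ratio, dictionary-pointer ratio), full text vs. manual indexing."""
    record_ratio = FULL_TEXT_RECORD_CHARS / INDEXED_RECORD_CHARS
    pointer_ratio = FULL_TEXT_TERMS_PER_DOC / INDEXED_TERMS_PER_DOC
    return record_ratio, pointer_ratio


def database_totals(n_docs):
    """Characters to input and term assignments to track for a collection of n_docs."""
    return {
        "full_text_chars": n_docs * FULL_TEXT_RECORD_CHARS,
        "indexed_chars": n_docs * INDEXED_RECORD_CHARS,
        "full_text_postings": n_docs * FULL_TEXT_TERMS_PER_DOC,
        "indexed_postings": n_docs * INDEXED_TERMS_PER_DOC,
    }


print(storage_ratios())         # (20.0, 50.0)
print(database_totals(40_000))  # rounded size of the test database
```

Because both ratios are per-document averages, they carry over unchanged to the whole collection; only the absolute totals depend on the number of documents.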


Acknowledgments. The authors would like to thank William Cooper of the University of California at Berkeley for his comments on an earlier version of this manuscript, and Barbara Blair for making the drawings that accompany the text.

CR Categories and Subject Descriptors: H.1.0 [Models and Principles]: General; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval: search process, query formulation

General Terms: Design, Human Factors, Theory

Additional Key Words and Phrases: full-text document retrieval, litigation support, retrieval evaluation, Recall and Precision

Received 4/84; accepted 9/84

Authors' Present Addresses: David C. Blair, Graduate School of Business […]; M.E. Maron, School of Library and Information Studies, The University of California, Berkeley, CA 94720.

REFERENCES

1. Blair, D.C. Searching biases in large interactive document retrieval systems. J. Am. Soc. Inf. Sci. 31 (July 1980), 271-277.
2. Resnikoff, H.L. The national need for research in information science. STI Issues and Options Workshop, House Subcommittee on Science, Research and Technology, Washington, D.C., Nov. 3, 1976.
3. Salton, G. Automatic text analysis. Science 168, 3929 (Apr. 1970), 335-343.
4. Saracevic, T. Relevance: A review of and a framework for thinking on the notion in information science. J. Am. Soc. Inf. Sci. 26 (1975), 321-343.
5. Sparck Jones, K. Automatic Keyword Classification for Information Retrieval. Butterworths, London, 1971.
6. Swanson, D.R. Searching natural language text by computer. Science 132, 3434 (Oct. 1960), 1099-1104.
7. Swanson, D.R. Information retrieval as a trial-and-error process. Library Q. 47, 2 (1977), 128-148.
8. Swets, J.A. Information retrieval systems. Science 141 (1963), 245-250.
9. Zunde, P., and Dexter, M.E. Indexing consistency and quality. Am. Doc. 20, 3 (July 1969), 259-267.


The deterioration of Recall from a probabilistic point of view is quite startling. For each query, there is a class of relevant documents that we designate as R. We represent the probability that each of those documents will contain some word w1 as p, and the probability that a relevant document will contain some other word w2 as q. Thus, the value of Recall for a request using only w1 will be equal to p, and Recall for a request using only w2 will be equal to q. Now the probability that a relevant document will contain both w1 and w2 is no greater than the smaller of p and q. If we assume that the respective appearances of w1 and w2 in a relevant document are independent events, then the probability of both of them appearing in a relevant document is equal to the product of p and q. Since both p and q are usually numbers less than unity, their product usually will be smaller than either p or q. This means that Recall, which can also be thought of as the probability of retrieving a relevant document, is now equal to the product of p and q. In other words, reducing the number of documents retrieved by intersecting an increasing number of terms in the formal query causes Recall for that query also to decrease.

However, the problem is really much worse. In order for a relevant document that contains w1 and w2 to be retrieved by a single query, a searcher must select and use those words in his or her query. The probability that the searcher will select w1 is, of course, generally less than 1.0, and the probability that w1 will occur in a relevant document is also usually less than 1.0. However, these probabilities must be multiplied by the probability that the searcher will select w2 as part of his or her query, and by the probability that w2 will occur in a relevant document. Thus, calculating Recall for a two-term search involves the multiplication of four numbers, each of which is usually less than 1.0. As a result, the value of Recall gets very small (see Table II). When we consider a three- or four-term query, the value of Recall drops off even more sharply.

The problem of output overload is especially critical in full-text retrieval systems like STAIRS, where the frequency of occurrence of search terms is considerably larger than (and increases faster than) the frequency of occurrence (or "breadth") of index terms in a database where the terms are manually assigned to documents. This means that the user of a full-text system will encounter the problem of output overload sooner than the user of a manually indexed system. The solution that STAIRS offers (conjunctively adding search terms to the query) does reduce the number of documents retrieved to a manageable number, but it also eliminates relevant documents. Search queries employing four or five intersecting terms were not uncommon among the queries used in our test. However, the probability that a query that intersects five terms will retrieve relevant documents is quite small. If we were to assign a probability of .7 to all the respective probabilities in a hypothetical five-term query, as we did for the two-term query in Table II (and .7 is an optimistic average value), the Recall level for that query would be .028, since each of the five terms contributes both a selection and an occurrence probability and .7 multiplied by itself ten times is about .028. In other words, that query could be expected to retrieve less than 3 percent of the relevant documents in the database. If the probabilities for the five-term query were a more realistic average of .5, the Recall value for that query would be .001 (.5 to the tenth power is about .00098). This means that if there were 1000 relevant documents on the database, it is likely that this query would retrieve only one of them. The searcher must submit many such low-yield queries to the system if he or she wants to retrieve a high percentage of the relevant documents.

DISCUSSION

The reader who is surprised at the results of this test of retrieval effectiveness is not alone. The lawyers who participated in the test were equally astonished. Although there are sound theoretical reasons why we should expect these results, they seem to run counter to previous tests of retrieval effectiveness for full-text retrieval.
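The arithmetic behind Table II and the five-term examples in the probabilistic argument above can be checked in a few lines. This is a sketch assuming, as the text does, that term selection and term occurrence are independent events; the function name `conjunctive_recall` is hypothetical. Each term contributes two factors, so a five-term query multiplies ten probabilities together.

```python
"""Expected Recall of a conjunctive query under the text's independence assumption."""
from math import prod


def conjunctive_recall(selection_probs, occurrence_probs):
    """Product over all terms of P(searcher selects term) * P(term appears in a relevant doc).

    selection_probs[i] plays the role of P(S w_i), occurrence_probs[i] of P(D w_i).
    Assumes all of these events are independent.
    """
    if len(selection_probs) != len(occurrence_probs):
        raise ValueError("need one selection and one occurrence probability per term")
    return prod(selection_probs) * prod(occurrence_probs)


# Two-term query from Table II: P(Sw1)=.6, P(Dw1)=.7, P(Sw2)=.5, P(Dw2)=.8
two_term = conjunctive_recall([0.6, 0.5], [0.7, 0.8])
# Hypothetical five-term queries: every probability .7 ("optimistic") or .5 ("more realistic")
five_term_optimistic = conjunctive_recall([0.7] * 5, [0.7] * 5)
five_term_realistic = conjunctive_recall([0.5] * 5, [0.5] * 5)

print(f"{two_term:.3f}")                  # 0.168
print(f"{five_term_optimistic:.3f}")      # 0.028
print(f"{five_term_realistic:.4f}")       # 0.0010
print(round(1000 * five_term_realistic))  # 1 expected hit out of 1000 relevant documents
```

Every additional term multiplies in two more factors below 1.0, which is why Recall falls so quickly as intersecting terms are added.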

TABLE II. The Probability of Retrieving a Relevant Document Containing Terms w1 and w2

P(Sw1) = .6   Probability that the searcher uses term w1 in the query
P(Dw1) = .7   Probability that w1 appears in a relevant document
P(Sw2) = .5   Probability that the searcher uses term w2 in the query
P(Dw2) = .8   Probability that w2 appears in a relevant document

Probability of the searcher selecting w1 and w2 and of both appearing in a relevant document:
P(Sw1) × P(Dw1) × P(Sw2) × P(Dw2), e.g., .6 × .7 × .5 × .8 = .168

Two pioneering evaluations of full-text retrieval systems by respected researchers in the field (Swanson [6] and Salton [3]) determined to their satisfaction that full-text document retrieval systems could retrieve relevant documents at a satisfactory level while avoiding the problems of manual indexing. Our study, on the other hand, shows that full-text document retrieval does not operate at satisfactory levels and that there are sound theoretical reasons to expect this to be so. Who is right? Well, we all are, and this is not an equivocation. The two earlier studies drew the correct conclusions from their evaluations, but these conclusions were different from ours because they were based on small experimental databases of less than 750 documents. Our study was done not on an experimental database but on an actual, operational database of almost 40,000 documents. Had Swanson and Salton been fortunate enough to study a retrieval system as large as ours, they


THE ALLURE OF FULL-TEXT DOCUMENT RETRIEVAL

Retrieving document texts by subject content occupies a special place in the province of information retrieval because, unlike data retrieval, the richness and flexibility of natural language have a significant impact on the conduct of a search. The indexer chooses subject terms that will describe the informational content of the documents included in the database, and the user describes his or her information need in terms of the subject descriptors actually assigned to the documents (Figure 1). However, there are no clear and precise rules to govern the indexers' choice of appropriate subject terms, so that even trained indexers may be inconsistent in their application of subject terms. Experimental studies have demonstrated that different indexers will generally index the same document differently [9], and even the same individual will not always select the identical index terms if asked at a later time to index a document he or she has already indexed. The problems associated with manual assignment of subject descriptors make computerized, full-text document retrieval extremely appealing. By entering the entire text of a document, or its most significant part, into the database, one is freed, it is argued, from the inherent evils of manually creating document records reflecting the subject content of a particular document; among these, the construction of an indexing vocabulary, the training of indexers, and the time consumed in scanning/reading documents and assigning context and subject terms. […]

MEASURING RETRIEVAL EFFECTIVENESS

Two of the most widely used measures of document-retrieval effectiveness are Recall and Precision. Recall measures how well a system retrieves all the relevant documents; Precision, how well the system retrieves only the relevant documents. For the purposes of this study, we define a document as relevant if it is judged useful by the user who initiated the search. If not, then it is nonrelevant (see [4]). More precisely, Recall is the proportion of relevant documents that the system retrieves: the ratio r/n2 (Figure 2), where r is the number of relevant documents retrieved and n2 is the total number of relevant documents. Notice that one can interpret Recall as the probability that a relevant document will be retrieved. Precision, on the other hand, measures how well a system retrieves only the relevant documents: it is defined as the ratio r/n1, where n1 is the total number of documents retrieved, and can be interpreted as the probability that a retrieved document will be relevant.

THE TEST ENVIRONMENT

The database examined in this study consisted of just under 40,000 documents, representing roughly 350,000 pages of hard-copy text, which were to be used in the

FIGURE 1. The Dynamics of Information Retrieval (diagram: incoming documents; identification of content; index and record; vocabulary or thesaurus; inquirer in search of information; query formulation; formal query; retrieval system; output to inquirer; relevant items)
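The Recall and Precision ratios defined under MEASURING RETRIEVAL EFFECTIVENESS (Recall = r/n2, Precision = r/n1) can be computed from two sets of document ids. This is a minimal sketch, assuming documents are identified by ids held in Python sets; the function name is hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not one the text specifies.

```python
"""Recall and Precision in the r, n1, n2 notation of the text."""


def recall_precision(retrieved, relevant):
    """Return (Recall, Precision) for sets of retrieved and relevant document ids.

    r  = relevant documents that were retrieved
    n1 = documents retrieved            -> Precision = r / n1
    n2 = relevant documents in total    -> Recall    = r / n2
    """
    r = len(retrieved & relevant)
    n1 = len(retrieved)
    n2 = len(relevant)
    recall = r / n2 if n2 else 0.0     # assumed convention when nothing is relevant
    precision = r / n1 if n1 else 0.0  # assumed convention when nothing is retrieved
    return recall, precision


# Hypothetical example: 4 documents retrieved, 5 relevant, 2 in common.
print(recall_precision({1, 2, 3, 4}, {2, 4, 6, 8, 10}))  # (0.4, 0.5)
```

Read as probabilities, the example says a relevant document had a 40 percent chance of being retrieved, and a retrieved document had a 50 percent chance of being relevant.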