ML25329A181
| ML25329A181 | |
| Person / Time | |
|---|---|
| Issue date: | 06/08/1990 |
| From: | Hoyle J NRC/Chairman |
| To: | |
| References | |
UNITED STATES NUCLEAR REGULATORY COMMISSION
WASHINGTON, D.C. 20555

CERTIFICATION
MINUTES OF SECOND MEETING OF THE LSS ADVISORY REVIEW PANEL

I certify that the attached Minutes of the Meeting of the LSS Advisory Review Panel, held on March 20-21, 1990, are accurate to the best of my knowledge and belief.
These minutes were approved by the Panel at the June 7, 1990, meeting.
Date
Minutes of LSSARP Meeting
March 20-21, 1990

The second meeting of the Licensing Support System Advisory Review Panel (LSSARP or Panel) was held in open session in Bethesda, Maryland, on March 20, 1990, with a site visit and tour of the U.S. Patent and Trademark Office in Crystal City, Virginia, on March 21, 1990.
Enclosure 1 is a copy of the meeting agenda. Enclosure 2 is a list of attendees.
ADMINISTRATIVE ISSUES

Mr. Hoyle began a discussion of administrative issues with the minutes of the December 19, 1989, meeting.
Mr. Treby of the NRC's Office of the General Counsel asked that a change be made on page 2.
In the third paragraph under the role of the LSSARP, the word "consensus" should be changed to "majority."
Mr. Silberg, an attorney representing the utility group, said that consensus means the absence of an objection.
Mr. Cameron of the LSSA staff asked for clarification of the definition for "consensus."
Mr. Hoyle stated that "consensus" means no dissent among the members.
The Panel would also provide advice on the basis of majority views, that is at least five of the seven members, with dissenting views attached.
Mr. Hoyle noted that coalitions (several persons representing a group with only one vote) must agree among themselves and will have one vote.
With this change, the minutes were approved and will be placed in the NRC's Public Document Room (PDR).
Mr. Hoyle proposed that in the future he place a draft of the minutes in the PDR soon after each meeting and replace it with the final version when it is approved by the Panel.
There was no objection to this proposal.
Since the LSSARP is a federal advisory committee, all meetings will be held in open session and minutes of each meeting will be placed in the PDR.
Mr. Hoyle will send a letter report to the LSS Administrator after each meeting and will provide a copy to each Panel member.
Next Mr. Hoyle asked Mr. Lloyd Donnelly, the LSS Administrator (LSSA), about the report which must be sent to the Commission in June evaluating DOE's compliance with the LSS rule.
Mr. Donnelly noted that this date was selected prior to the delay in the repository licensing schedule from 1995 to 2001.
He is currently discussing the matter with DOE and will be notifying the Commission about future LSSA planning for compliance evaluation.
Mr. Hoyle suggested that transcripts be produced for each meeting.
Some members felt that transcripts were unnecessary; others wanted them.
After discussion, it was agreed that, on a trial basis, there will be a transcript for the next meeting.
STATUS OF LSS DEVELOPMENT

Next was a presentation on the status of LSS development by Dan Graser of DOE, who is the project manager for the current LSS design effort and the contracting officer's technical representative (COTR) for the current LSS design contract with Science Applications International Corporation (SAIC).
He will also be the COTR for any subsequent LSS procurements.
His handout is included as Enclosure 3.
The current design contract with SAIC, which is in the process of being modified to bring it to an orderly conclusion, will be used to develop the detailed design procurement specifications.
The final contract deliverables, covering all functional areas of the system, are due in July, August, and September 1990 with a two- to three-month DOE review period before final acceptance.
Because DOE has a very limited FY 1990/1991 budget, it intends to rely on technical support from the LSSA staff to assist in preparation of solicitation packages for the LSS procurement.
In addition, DOE will attempt to augment its professional staff on LSS design by having DOD personnel detailed to DOE for an extended period of time.
DOE is endeavoring to obtain acquisition support assistance from the Federal Systems Integration & Management Center (FEDSIM), a GSA organization established to assist agencies in efficiently and effectively using their own information resources.
FEDSIM has about 20 pre-approved contractors who are invited to bid on projects.
This could save months in getting contracts in place.
DOE may use FEDSIM to arrange a support contract for (1) development of specifications, (2) live testing and acceptance criteria, (3) support as tests are conducted, and (4) assistance in compiling results of tests.
If DOE uses FEDSIM and gets support from the LSSA staff, a request for proposal (RFP) could be issued in August 1991 with a contract for the LSS awarded in early April 1992.
Once the contract is in place, the vendor would begin working towards installation of the first node of the LSS in December 1992.
A node has the full functional capability of the LSS for capture plus search and retrieval of both full text and images.
By the next Panel meeting, DOE plans to have a final strategy and a schedule, and be able to present options that can be built in to create as stable a schedule as possible.
Ms. Barbara Cerny of DOE's Office of Civilian Radioactive Waste Management added that one of the ways to remain flexible is to go into a GSA program called "trail boss."
Under this concept, a person is designated the trail boss for a particular procurement.
The trail boss has authority for the entire procurement even though only a part of it has been specified at the time.
Under the previous procurement strategy, which was discussed at the December 1989 LSSARP meeting, DOE would not have used just a single source or done only a single procurement for the entire LSS.
After all the specifications were developed, there would have been additional procurements for the computer platform, the software, etc.
Now, due to the delay in the repository schedule, DOE is going to combine the functional requirements into one solicitation and make a single award.
The contract will require establishing one node, ironing out all the bugs, and making sure it operates properly before proceeding.
Additional nodes of the system will be procured as required.
Ms. Cerny reminded the Panel that the new schedule assumes availability of funding.
When DOE envisioned using approximately six capture stations, there was an assumption that once the capture stations had been used to eliminate the document backlog, all but about three would be decommissioned and surplused.
At a cost of about $2 million per capture station, approximately $10 million would be surplused.
DOE was willing to absorb that cost as a necessary expenditure to permit timely elimination of the backlog.
Under the present plan, hardware architecture will be adaptable to whatever is being done, i.e., once the backlog has been processed, the equipment will be used for something else.
Therefore, there will be an estimated cost saving of approximately $10 million.
When asked about cataloging costs, etc., Mr. Graser responded that he estimates that at least 70% of the cost will be labor.
SAIC has been asked to reexamine the estimates used and the status of the backlog of documents, look at the new schedule, and determine whether or not the same number of documents will be generated by the time the license application is submitted.
Currently only 12% of the documents being generated within DOE HQ are relevant to the LSS.
Until a decision is made on the permit issue between DOE and the State of Nevada, there is not likely to be a near-term increase in document volume over the level anticipated under the previous reporting program schedule.
HEADERS

The next discussion was led by Ms. Betsy Shelburne of the LSSA staff.
A copy of the slides used in her presentation is included as Enclosure 4.
Her handouts, a letter from B. Cerny dated January 31, 1990, a letter from D. Graser dated September 21, 1989, and a letter from F. X. Cameron dated August 7, 1989, are included as Enclosures 5, 6, and 7, respectively.
Ms. Shelburne talked about LSS header needs and the elements of information that could be picked up.
She requested that a working group be established to develop recommendations for required header elements.
After a discussion, Mr. Hoyle proposed establishing a header working group with membership from Nevada, NRC, DOE, and National Congress of American Indians (NCAI).
Industry and adjacent counties indicated they have funding constraints at this time and cannot participate.
Mr. Kirk Balcom, representing Nevada, was appointed Chairman.
Mr. Hoyle will assign an NRC participant.
Ms. Cerny stated that she would send an SAIC employee as DOE's representative because DOE had no one available from its staff.
The Panel elected to have Ms. Betsy Shelburne be a member of the working group.
Mr. Donnelly offered to provide office space and clerical support.
The recommendations of the working group should be provided to Mr. Hoyle by mid-May for review and approval by the full Panel before being forwarded to the LSSA.
Meetings of the working group are not covered under the Federal Advisory Committee Act since the group is a fact-finding committee.
Therefore, working group meetings need not be announced in the Federal Register.
LSS DESIGN

Mr. Balcom said that he is also interested in the current DOE efforts to design the LSS and feels that the Panel should have input into that now rather than in August or September when the SAIC design documents are received by DOE.
Ms. Cerny said that there are many documents that, though they are final deliverables, are in no way fixed in terms of the final RFP (request for proposal) for the LSS.
She suggested that these documents could be reviewed by the Panel and Panel comments could be incorporated in the RFP.
Mr. Balcom felt that would be too late; Nevada, particularly, would like to be involved in the design process earlier, perhaps in monthly meetings.
He stated that the design issue is as important as the header issue.
Mr. Hoyle asked Ms. Cerny if a working group could be provided information to review as it becomes available.
Ms. Cerny responded that DOE periodically conducts major design review meetings and recommended that the Panel either take part in the design meetings or thoroughly review the documents as they come out.
She stated that she would welcome input to the design process.
Mr. Hoyle noted that a major concern is whether the Panel input will be injected into the design process in time to affect the design.
He pointed out that section 2.1011(f) of the rule specifies that the Panel shall provide advice to DOE on the fundamental issues of the design and development of the computer system.
When the design documents are delivered, a working group should review them and make a recommendation to the Panel for submission to the LSSA.
Mr. Graser noted that there is about a four-month period for the FEDSIM contractor to become familiar with the design documents; the Panel could also review the documents during that time.
Ms. Cerny said that the FEDSIM review period will occur after the documents are delivered to DOE, and agreed there would be no problem with the Panel's reviewing the documents then.
Mr. Donnelly reminded the Panel that DOE has the responsibility to design and develop the system, and must be given the opportunity to do that without undue interference by either his office or the LSSARP.
He noted that when the Panel reviews the design documents, perhaps some of their current concerns will be alleviated.
OTHER FEDERAL AGENCIES' INFORMATION MANAGEMENT SYSTEMS

There were presentations by three Federal agency representatives who discussed their experiences with design, procurement, and operation of large automated information management systems.
First was Mr. Boyd Alexander of the U.S. Patent and Trademark Office (PTO).
A copy of the slides used in his presentation is given in Enclosure 8.
Mr. Alexander was asked about the costs of the system.
The PTO charges a database user fee of $40 per hour for text search.
They break even with that fee.
The lifecycle costs for the total system approach $500 million.
This includes future development, equipment, contractor costs, etc.
PTO is now attempting to make a policy decision regarding whether information should be available to the public with no user fee.
Mr. Bill Holmes, the Director of the Archival Research and Evaluation Staff at the National Archives and Records Administration, discussed the automated data system at the National Archives and showed some of the ways they enhance images made from very poor quality originals.
As examples he used documents from the Civil War period.
Mr. David Copenhafer of the Securities and Exchange Commission spoke about the EDGAR (electronic data gathering, analysis, and retrieval) system.
The EDGAR system is a pilot project that SEC has found to be a useful testing ground for different approaches.
The system contains about 64,000 records or about one million pages.
Due to the nature of their business, SEC does not intend to put old data into EDGAR.
When asked about costs, Mr. Copenhafer noted that members of the public can get anything in their Public Reference Room at no charge.
Total system lifecycle costs are about $100 million.
Following the Federal agencies' presentations, Mr. Hoyle asked Mr. Donnelly if his office planned to do a "lessons learned" study summarizing the experience of other agencies with large information management systems.
Mr. Donnelly stated that this would be done.
REVISION OF TOPICAL GUIDELINES

Mr. Treby discussed the revision of topical guidelines.
The topical guidelines published in the rule were intended to be interim guidelines pending issuance of a Regulatory Guide.
The LSS Internal Steering Committee (LSSISC) has established a task force to propose final guidelines.
The LSSISC expects to send recommendations to the Commission in May and to provide them to the Panel by early July.
The LSSISC task force's recommendations will be mailed to Panel members for a discussion at the fall meeting.
LSSA/DOE MEMORANDUM OF UNDERSTANDING

Mr. Cameron, the Deputy LSS Administrator, gave a brief update on the Memorandum of Understanding (MOU) between DOE and LSSA.
The MOU will set forth mutual responsibilities for design, development, and operation of the LSS.
The MOU will list the major procurement activities, the deliverables involved, and the schedule.
The schedule will be included in a management plan attached to the MOU.
The LSSA sent a draft MOU to DOE.
DOE is incorporating its comments for response to the LSSA in April.
Negotiations will begin in mid-April with hopes of completion by June or July.
The MOU must be reviewed and approved by the Commission.
DOCUMENT LOADING PRIORITIES

Mr. Cameron also discussed the prioritized loading of documents into the LSS.
Under the present schedule, the first node of the LSS will be ready for operation in late December 1992.
Between 500,000 and 750,000 pages per year will be processed (captured) by that first node.
With recommendations from Panel members on priority loading categories, the LSSA staff will compile a document loading priority schedule for circulation to the Panel and discussion at the fall meeting.
The first node could be used for loading of priority documents, with the second, third, etc., nodes used to work off the backlog of lower priority documents.
The Commission will review the priority loading schedule and a "costs and needs" analysis before any documents are loaded into the LSS beyond those that are needed to fully test and evaluate the first node.
Mr. Cameron reminded Panel members that their priority recommendations on the Prioritized Document Production Schedule-LSS Participant Worksheet (which was distributed and discussed at the December 1989 Panel meeting) are due to the LSSA by September 1, 1990.
FUTURE MEETINGS

Mr. Hoyle proposed that a short Panel meeting be held in early June to review the header working group's recommendations.
It was agreed that the next meeting will be June 7, 1990, in the Washington, D.C., area.
He will attempt to set up videoconferencing equipment for the meeting.
A proposed planning agenda for future meetings was distributed. See Enclosure 9.
The fall Panel meeting will include discussions of SAIC products, priority loading categories, access to technical data, revision of topical guidelines, and the compliance evaluation program.
The Panel members agreed that the fall meeting will be a two-day session on October 10 and 11, 1990, in Reno, Nevada.
Mr. Hoyle reminded Ms. Cerny that the Panel must have the design documents as soon as DOE receives them.
Ms. Cerny agreed that as soon as she receives the documents, she will send them to Mr. Hoyle.
He in turn will send them to the Panel members with a request for written comments.
Panel members' comments will be distributed and a discussion meeting arranged, if necessary.
On the morning of March 21, 1990, the Panel met at Mr. Boyd Alexander's office at the U.S. Patent and Trademark Office for a demonstration and tour of their automated document system.
Enclosure 10 is a copy of the handout used in the PTO demonstration.
Enclosures:
1. Agenda
2. Attendance List
3. D. Graser Handout
4. B. Shelburne Slides
5. B. Cerny letter dtd 1/31/90
6. D. Graser letter dtd 9/21/89
7. F.X. Cameron letter dtd 8/7/89
8. B. Alexander Slides
9. Planning Agenda
10. PTO - Automation
AGENDA
LSS ADVISORY REVIEW PANEL MEETING
MARCH 20-21, 1990

Tuesday, March 20, 1990
9:00 Agenda Overview and Panel Administrative Issues (John Hoyle, LSSARP Chairman)
10:15 Break
10:30 Status of LSS Development (Barbara Cerny - DOE)
10:50 Headers (Betsy Shelburne - LSSA)
12:00 Lunch Break
1:15 Information Items (LSSA/DOE Memorandum of Understanding (MOU); Revision of Topical Guidelines; Priority Loading Categories)
2:00 Automated Information Management Systems: Experiences from Other Federal Agencies:
    2:00 Patent and Trademark Office (Boyd Alexander)
    2:30 National Archives and Records Administration (Bill Holmes, Director, Archival Research and Evaluation Staff)
    3:00 Break
    3:15 Securities and Exchange Commission - EDGAR (David Copenhafer, Deputy Director, Office of EDGAR Management)
4:30 Schedule and Agenda Planning
5:00 Adjourn

Wednesday, March 21, 1990
9:00 Site Visit to U.S. Patent and Trademark Office, Room 916, Crystal Park 2, 2121 Crystal Drive, Arlington, Virginia (Convenient to Crystal City stop on either Blue Line or Yellow Line of Metro)
Attendance List
LSS Advisory Review Panel Meeting, March 20-21, 1990

Panel Members
Nuclear Regulatory Commission: John C. Hoyle (Panel Chairman), Stuart A. Treby, Phillip Altomare
Department of Energy: Barbara Cerny, Dan Graser
State of Nevada: Kirk Balcom
Local Government - Site: Steve Bradhurst
Local Government - Adjacent: Dennis Bechtel, Liza Vibert, Peter Cummings
National Coalition of American Indians: Loretta V. Metoxen
Nuclear Industry: Jay Silberg, Felix Killar
U.S. Patent and Trademark Office (Non-Voting Member): Boyd Alexander

Others
Lloyd Donnelly, NRC/LSSA
Chip Cameron, NRC/LSSA
Betsy Shelburne, NRC/LSSA
Avi Bender, NRC/LSSA
Lynn Scattolini, NRC/LSSA
Marilee Rood, NRC/LSSA
Rosetta Virgilio, NRC/GPA/SP
Jack Whetstine, NRC/ASLBP
John Frye, NRC/ASLBP
Kathryn Winsberg, NRC/OGC
Steve Scott, NRC/IRM
Susan Bilhorn, NRC/OCM/KR
Eileen Tana, NRC/NMSS
W. Richard Pierce, SAIC
Roger B. Bradford, SAIC
Stephen Spector, CNWRA
Robbie Cooke, Wang Labs, Inc.
Jim Smith, Government Computer News

LSS DELIVERABLES FROM THE REVISED SAIC CONTRACT DESIGN EFFORT

System Concept Feasibility Report 7/88

INFORMATION MANAGEMENT
Archives Operations Procedures 3/89
LSS Thesaurus Maintenance Procedures 9/90
LSS Thesaurus (Draft) 3/90
Controlled Vocabulary 9/90
Controlled Vocabulary Maint. Procedures 9/90
LSS Prototype Cataloging Manual 1/89

PROTOTYPE
Prototype Effort Analysis Report 2/90

CAPTURE SYSTEM
Capture System Design Document 3/89
Capture System (Stand alone) Specs 3/90

SEARCH SYSTEM
Search System Design Documentation 9/90
Search System Text DBMS Functional Design 8/90
Search System H/W Configuration Designs 8/90
Search System Custom Applications S/W Design 8/90
Search System DBMS S/W Architecture Design 8/90

IMAGE SYSTEM
Image System Design Documentation 8/90
Image System H/W Configuration Designs 7/90
Image System Custom Applications S/W Design 7/90
Image System DBMS S/W Architecture Design 7/90

WORKSTATIONS
Workstation H/W Configuration Design 8/90
Workstation Applications S/W Architecture Design 8/90

COMMUNICATIONS
Communications H/W Design 8/90
Communications Circuit Design 8/90
| Task Name | Start Date | Duration (Months) | End Date |
|---|---|---|---|
| TASK 1-2 ** FEDSIM EFFORT (Rollup) | 1-Mar-90 | 41 | 28-Sep-93 |
| Review SAIC Documents | | | |
| Develop Acquisition Strat. | 2-Apr-90 | 6 | 5-Oct-90 |
| Acquire FEDSIM Contractor | 2-May-90 | 4 | 5-Sep-90 |
| Manage Contract | 6-Sep-90 | 35 | 28-Sep-93 |
| TASK 3 ** ACQUIS. DOCS. (Rollup) | 6-Sep-90 | 11 | 22-Aug-91 |
| Requirements Analysis | 6-Sep-90 | 4 | 14-Jan-91 |
| Alternatives Analysis | 8-Nov-90 | 3 | 14-Feb-91 |
| Market Survey | 6-Sep-90 | 3 | 11-Dec-90 |
| Economic Analysis | 6-Sep-90 | 4 | 14-Jan-91 |
| RFP Draft | 9-Oct-90 | 6 | 18-Apr-91 |
| RFP Final | 21-May-91 | 2 | 23-Jul-91 |
| Perform. Valid. Method. | 15-Jan-91 | 7 | 22-Aug-91 |
| TASK 4 ** NEGOT. EVAL. (Rollup) | 21-Aug-91 | 6 | 4-Mar-92 |
| Issue RFP | 23-Aug-91 | 0 | 23-Aug-91 |
| Ques./Respons. to Vendor | 23-Aug-91 | 2 | 24-Oct-91 |
| Evaluation (Incl. PV) | 25-Oct-91 | 4 | 4-Mar-92 |
| TASK 5 ** ASSIST LSS IMPL (Rollup) | 6-Apr-92 | 17 | 28-Sep-93 |
| Award LSS Contract | 6-Apr-92 | 0 | 6-Apr-92 |
| IV&V, Review Deliverables | 6-May-92 | 6 | 12-Nov-92 |
| Acceptance Test, 1st St. | 13-Nov-92 | | 15-Dec-92 |
| Acceptance Test, Addl. | 16-Dec-92 | 9 | 28-Sep-93 |
TASK AREAS TO UTILIZE NRC SUPPORT

PROCUREMENT/DEVELOPMENT
Plan:
o Develop SOW for FEDSIM Support Contractor.
o Obtain and Review SAIC's Design Deliverables.
o Support Contractor Turns Deliverables into "turnkey Solicitation".
Requires:
o Participate in solicitation, evaluation, award of FEDSIM's support contractor (DOE/NRC must provide 2 staff for the FEDSIM solicitation process).
o Reviews of SAIC Deliverables done in conjunction with FEDSIM will ensure that they can be used for a

INFORMATION MANAGEMENT
Plan:
o Establish screening criteria and plan for RIS current and backlog collection migrating to LSS.
o Revise and update OCRWM Indexing Manuals moving toward LSS submitter's profile.
o Survey, then develop plans, procedures, schedules for document & data backlog processing. (Includes information located with participants such as USGS.)
o RIS Keyword, Organization Codes, Thesaurus control with enhanced tools.
Requires:
o Relevancy criteria development.
o Prioritization criteria development.
o Header development for LSS submitter records within broader LSS Header Record.
o Thesaurus development, controlled vocabulary management, resolution and concordance of "document types".
o Operations procedure development consistent with QA requirements.
OFFICE OF THE LSS ADMINISTRATOR
DEVELOPMENT OF LSSARP HEADER RECOMMENDATIONS
A BRIEFING BY THE OFFICE OF THE LSS ADMINISTRATOR IN SUPPORT OF THE LSSARP
LSS ADVISORY REVIEW PANEL MEETING
MARCH 20, 1990
BETHESDA, MARYLAND

OUTLINE OF THE PRESENTATION
THE HEADER ISSUES
o WHY HAVE A HEADER?
o WHAT CAPTURED?
o WHO CAPTURES?
NEXT STEPS
BRIEFING HANDOUTS:
o Cerny, DOE, to Hoyle, LSSARP, dated Jan. 31, 1990
o Graser, DOE, to Cameron, LSSA, dated Sept. 21, 1989
o Cameron, NRC, to Graser, DOE, dated Aug. 7, 1989
THE HEADER ISSUE:
WHAT INFORMATION ELEMENTS SHOULD BE CAPTURED BY WHOM?
WHY START RESOLVING HEADER ISSUES NOW?
o PARTICIPANT PLANNING
o INPUT INTO SUBMITTER'S INTERNAL RECORDS MANAGEMENT SYSTEMS AND PROCEDURES
o BUDGETING (Staff & Dollars) AND CONTRACTING
o BEGIN PROCESSING DOCUMENTS FOR FIRST LSS NODE
o IDENTIFY AND RESOLVE ISSUES THAT IMPACT LSS DESIGN
o IDENTIFY AND RESOLVE ISSUES THAT IMPACT LSS OPERATIONS & MAINTENANCE
o THE MATRIX OF POSSIBLE WHO AND WHAT ALTERNATIVES HAS SIGNIFICANT COST-BENEFIT RAMIFICATIONS
o THE PROCESS FOR REASONED CONSIDERATION OF THE ALTERNATIVES WILL TAKE TIME

DECIDING ON HEADER NEEDS NOW MEANS DOING SO WITHOUT PERFECT INFORMATION, e.g.
o DUPLICATE CHECKING ALGORITHM
o SEARCH & RETRIEVAL MECHANISMS
FINAL DESIGN NOT LIKELY TO IMPACT BIBLIOGRAPHIC HEADER, BUT COULD CHANGE ENHANCED HEADER REQUIREMENTS, e.g.
o ABSTRACT -- HUMAN vs. SOFTWARE COMPOSED
DEVELOPMENTS TO DATE
o PRELIMINARY LIST OF FIELDS FOR BIBLIOGRAPHIC HEADER (SUBMITTERS' HEADER)
o DEVELOPED BY TECHNICAL WORKGROUP OF THE ADVISORY COMMITTEE ON LSS RULE
o DATED MAY 1988
o LISTED TWENTY FIVE REQUIRED FIELDS
! ! NOT SIMPLE ! !
o DOE/SAIC PROTOTYPE TEST CONDUCTED IN FALL, 1989
o DOE/SAIC PROTOTYPE TEST REPORT RELEASED IN FEBRUARY, 1990
o PROVIDES INPUT RELEVANT TO HEADER DESIGN AND CAPTURE PROCEDURES
o NOT ONLY THE WHAT?, BUT ISSUES RELATED TO HOW?
"IF A FULL-TEXT DATABASE, WHY DO WE NEED A HEADER?"

IMPROVE USER'S SEARCH RESULTS: RECALL and PRECISION
o PROVIDE ADDITIONAL ACCESS POINTS WHICH:
  -- MIGHT BE IN TEXT, BUT NOT IN CONSISTENT FORMAT (names, numbers, subject)
  -- MIGHT NOT BE IN FULL-TEXT (contract number, classification codes, project number)
o IMPROVE RECALL GIVEN:
  -- VARIETY OF DOCUMENTS IN LSS COLLECTION
  -- UNSTRUCTURED TEXT
o IMPROVE PRECISION - NARROW or EXPAND UNIVERSE PRIOR TO FULL-TEXT SEARCH

PROVIDE DESCRIPTIVE INFORMATION ABOUT DOCUMENTS
o FOR ON-LINE REVIEW OF SEARCH RESULTS
o FOR PRINTED LISTINGS, BIBLIOGRAPHIES, ANNOUNCEMENTS OF NEW ENTRIES

IMPROVE SPEED OF CERTAIN QUERIES
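The slide's two search-quality terms have standard definitions: recall is the fraction of relevant documents that a search retrieves, and precision is the fraction of retrieved documents that are relevant. The following is a minimal sketch of that calculation. The function name, the document IDs, and the two result sets are hypothetical and are not taken from the briefing; they only illustrate how narrowing a search with a header field might change both measures.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for a single query.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all documents retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical document IDs, for illustration only.
relevant = {"DOC-001", "DOC-002", "DOC-003", "DOC-004"}

# A full-text search alone: two relevant hits among six results.
full_text_only = ["DOC-001", "DOC-002", "DOC-007", "DOC-008", "DOC-009", "DOC-010"]

# The same search scoped by a header field: three relevant hits among four results.
with_header_scope = ["DOC-001", "DOC-002", "DOC-003", "DOC-007"]

print(recall_and_precision(full_text_only, relevant))
print(recall_and_precision(with_header_scope, relevant))
```

In this made-up case the header-scoped search finds more of the relevant documents while returning fewer irrelevant ones, which is the kind of improvement the slide attributes to header fields.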
BREAKDOWN OF QUESTION:
WHAT ELEMENTS ARE CAPTURED BY WHOM?

WHAT?
o MINIMUM BIBLIOGRAPHIC ELEMENTS -- Date, Author, Title, etc.
o EXTENSIVE BIBLIOGRAPHIC ELEMENTS -- Contract and Report numbers, Project numbers, Witnesses, Sponsoring Organization, etc.
o SUBJECT INDEXING
  Descriptors -- Controlled Vocabulary (Thesaurus)
  Identifiers -- Free Form Words and Phrases
o ABSTRACTS
o CLASSIFICATION NUMBERS OR CATEGORIES -- Based on Subject, Topical Guidelines, or DOE Mission Plan

WHO?
o LSS PARTICIPANTS (SUBMITTERS)
OR
o LSS ADMINISTRATOR'S CONTRACTOR

WHAT ?
QUESTION:
GIVEN ALL THE BENEFITS, WHY NOT DEVELOP THE MOST EXTENSIVE HEADER?
ANSWER:
DIFFERENT LEVELS IN CODING HAVE DIFFERENT BENEFITS AND VERY DIFFERENT COSTS
HEADER ELEMENTS: SUMMARY OF BENEFITS AND COST FACTORS

Types of header elements, with examples, uses, and the recall (content) rating:
o DESCRIPTIVE -- Date, Author, Pages, Title, Report Number, Condition. Uses: structured access, presentation. Recall (content): low.
o TAGS GROUPING LIKE DOCUMENTS -- Project No., Event Date, Contract No. (not always in text). Uses: structured access, scoping. Recall (content): average.
o LINKINGS -- References, Pointers. Uses: structured access. Recall (content): average.
o SUBJECT INDEXING, CONTROLLED -- Thesaurus. Uses: structured access, scoping. Recall (content): average.
o SUBJECT INDEXING, UNCONTROLLED -- Free Form. Uses: "structured" access and thesaurus update. Recall (content): average.
o ABSTRACTING -- Annotative, Indicative, Informative. Uses: access and presentation. Recall (content): low (annotative), average (indicative), high (informative).
o CLASSIFICATION -- Topical Guidelines. Uses: access, scoping, presentation. Recall (content): average.

Remaining rating columns as scanned (row alignment lost in the source):
RECALL, OTHER: high high high low low low average
PRECISION, CONTENT and OTHER: low high average high average high average low average low - average - average low - high high low
COST FACTORS (TRAINING, QC, and MAINTENANCE; VOLUME PER HOUR; LABOR HOUR RATE, SALARY): high average low average average average average average low low high high average average low average average average lowest high high lowest high high high average average

* COST PER DOCUMENT = STAFF HOURLY PAY RATE, INCLUDING OVERHEAD / DOCUMENTS PROCESSED PER HOUR
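The summary slide defines cost per document as the staff hourly pay rate, including overhead, divided by the number of documents processed per hour. A minimal sketch of that relation follows. The function name and every dollar and volume figure are hypothetical placeholders chosen for illustration; none of them comes from the briefing or from DOE or SAIC estimates.

```python
def cost_per_document(hourly_rate_with_overhead, documents_per_hour):
    """Cost per document = staff hourly pay rate (incl. overhead) / documents per hour."""
    if documents_per_hour <= 0:
        raise ValueError("documents_per_hour must be positive")
    return hourly_rate_with_overhead / documents_per_hour


# Hypothetical (hourly rate in dollars, documents per hour) for three coding levels.
coding_levels = {
    "descriptive fields": (30.0, 12.0),
    "subject indexing": (40.0, 5.0),
    "informative abstract": (50.0, 2.0),
}

costs = {
    level: cost_per_document(rate, volume)
    for level, (rate, volume) in coding_levels.items()
}

for level, cost in costs.items():
    print(f"{level}: ${cost:.2f} per document")
```

Because volume sits in the denominator, a coding level that needs both higher-paid staff and more time per document costs disproportionately more, which is consistent with the slide's point that different levels of coding carry very different costs.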
Mr. Avi Bender - WMPC
Policy & Program Control Branch
Division of Waste Management
Office of NMSS
Mail Stop 623-SS
U.S. Nuclear Regulatory Commission
Washington, D.C. 20555

Dear Mr. Bender:

TRANSMITTAL OF REVISION 2 REQUIREMENTS DEFINITION FOR A LICENSING INFORMATION MANAGEMENT SYSTEM FOR NUCLEAR WASTES

Reference:
Draft Report Requirements Definition for an Information Management System for Nuclear Waste, Aerospace Corporation, 31 January 1986 (6812-04.86.rlj.O5)

Enclosed are ten draft copies of the subject report incorporating the definition and rationale for the requirement of full text storage and retrieval of LIMS records. There will be a final version of this report in the late Spring following the Pilot Project Demonstration Tests. The final draft will refine the requirements determined during the demonstration program.
So far, these include:
(1) an update on the projected number of future records with an estimate on how many would be in the NRC system and in the DOE system,
(2) a new section on applicable standards,
(3) a new section on the functional requirements of document capture, and
(4) any other relevant requirements that can be defined between your staff and ours.
Comments on this latest draft would be appreciated.

R. L. Johnson
Systems Director
Eastern Technical Division

RLJ:gbf
Enclosures
cc: P. Altomare - WMPC
G.E. Aichinger - SO/PMR (letter only)
Aerospace Report No. WPR-85(5812-04)-1
DRAFT
Revision 2

Requirements Definition for a Licensing Information Management System for Nuclear Waste
Subtask 1, Task Order 002 of FIN 4167
Programmatic System Studies and Analyses
March 1986

Prepared for
Policy and Program Control Branch
Division of Waste Management
U.S. NUCLEAR REGULATORY COMMISSION
Washington, D.C.

Prepared by
Government Support Division
THE AEROSPACE CORPORATION
Washington, D.C.
Contract No. F04701-83-C-0084
WHO ?
QUESTION:
WHO CAPTURES WHAT ELEMENTS OF THE HEADER?
ANSWER: MUST GO BACK TO UNDERSTANDING AS REFLECTED IN THE LSS RULE
o PARTICIPANTS WILL SUBMIT A MINIMUM SERIES OF DESCRIPTIVE FIELDS (PARAPHRASE OF DEFINITION OF "BIBLIOGRAPHIC" HEADER IN THE LSS RULE)
o LSS ADMINISTRATOR WILL ENHANCE TO A FULL HEADER WITH SUBJECT TERMS AND OTHER INFORMATION, AS NECESSARY
SO WHY WORRY ABOUT THE "WHO"?
o LSS RULE DOES NOT CLEARLY DRAW THE LINE WHERE MINIMUM ENDS & ENHANCED BEGINS
o OLD WORKING GROUP RECOMMENDED TWENTY FIVE REQUIRED FIELDS FOR SUBMITTER'S HEADER
o IS THIS MINIMUM?

CRITERIA TO CONSIDER
KNOWLEDGE:
o SOME ELEMENTS ONLY KNOWN BY SUBMITTER
QUALITY:
o SOME ELEMENTS "BEST" KNOWN BY SUBMITTER
o CONSISTENCY AND QUALITY MIGHT BE BETTER IF DONE BY CENTRAL STAFF -- LSSA
COSTS:
o SOME ELEMENTS ALREADY DONE BY SOME SUBMITTERS IN THEIR OWN RECORDS MANAGEMENT PROCESSES -- WHY DUPLICATE EFFORT?
o BURDEN ON SUBMITTERS TO DO MORE SOPHISTICATED CATALOGING
CRITERIA NEED TO BE APPLIED IN TWO AREAS:
o ALL BIBLIOGRAPHIC ELEMENTS
o CLASSIFICATION CODES
SUGGESTED NEXT STEPS
FORM LSSARP WORKING GROUP TO RECOMMEND SUBMITTER & ENHANCED HEADERS TO LSSARP
GOAL:
o MAKE REASONED STUDY BASED ON PREVIOUS WORK TO DATE
o NOT TO REINVENT AND REDO PREVIOUS WORK
TASKS:
o WORKING GROUP DEVELOPS WORK PLAN, SUBMITS HEADER RECOMMENDATIONS TO LSSARP MEMBERS FOR WRITTEN COMMENTS, and REVISES RECOMMENDATION BASED ON MEMBERS' COMMENTS
SUGGESTED DECISIONS FOR TODAY
MEMBERSHIP OF LSSARP WORKING GROUP
o PANEL MEMBER ORGANIZATION REPRESENTATIVES HAVING KNOWLEDGE OF HEADER DESIGN & USE
o SAIC REPRESENTATION
LSS ADMINISTRATOR'S ROLE
.. WILLING TO SERVE AS WORKING GROUP MEMBER
.. WILLING TO PROVIDE SPACE AND CLERICAL SUPPORT
.. WILLING TO PROVIDE LIMITED TECHNICAL ASSISTANCE THROUGH CONSULTANTS
TENTATIVE SCHEDULE
WHEN IS A FINAL DECISION NEEDED?
.. AS SOON AS PRACTICAL, TENTATIVELY SCHEDULED FOR FALL '90 MEETING
Department of Energy
Washington, DC 20585

JAN 31 1990

Mr. John Hoyle
Chairman
LSS Advisory Review Panel
U.S. Nuclear Regulatory Commission
Washington, D.C. 20555

Re: Background Materials on LSS Headers

Dear Mr. Hoyle:
In response to the discussions held at the December, 1989 Advisory Review Panel meeting in Reno, we are forwarding materials related to LSS bibliographic header development. Four documents trace the header development process from late 1987 through May, 1988 and are enclosed for your information.
We also checked with Mr. Richard Pierce (of SAIC's LSS design project team), who participated in the technical working group during the negotiated rulemaking process, as to the status of the headers at the time when the rulemaking process was completed. His recollection is that the technical working group and the negotiating committee were able to develop a list of fields only for the submitters' headers but not the more comprehensive version required for the LSS environment.
He stated that the composition of the LSS header was an issue that was deferred, to be addressed at a later time by the Panel and the Office of the LSS Administrator. This seems to be consistent with the final rule, Sections 2.1011(f)(1), where the Panel is to provide advice "... on the fundamental issues of the design and development ...", and 2.1011(f)(2)(i), where the Panel is to provide advice on "... format standards for the submission of documentary material ... such as ... bibliographic headers ...", and with the broader mandate to the Office of the LSS Administrator and the Advisory Review Panel provided in Sections 2.1011(d)(8) and 2.1011(d)(14).
In reviewing our files, however, we noted that the documentation trail ends somewhat abruptly and that there is a "missing" piece of documentation -- that being some acceptance or affirmation by the negotiating committee of the final piece of documentation entitled "Draft Bibliographic Header Fields, Rev. 3, 5-17-88".
I think it would be useful to all the potential participants if the Panel can definitize the list of fields for submitters' headers at the next meeting.
Please feel free to contact Dan Graser of my staff at 586-4589 if you require any further background information or assistance.
Sincerely,
Barbara A. Cerny
Director
Information Resources Management Division
Office of Civilian Radioactive Waste Management
Enclosures:
- 1. "Draft Bibliographic Header Fields", Rev.3, 5-17-88
- 2. Draft "Minutes of the HLW Licensing Support System Advisory Committee Meeting", April 18-19, 1988, Washington, D.C.
- 3. "Information Retrieval Systems: A Tutorial" Prepared by Negotiated Rulemaking Technical Staff, February 3, 1988
- 4. Attendance List and Attachment 8 (Glossary of Terms), "Meeting of the HLW Licensing Support System Advisory Committee", November 19-20, 1987
cc: L. Desell, RW-331
DRAFT BIBLIOGRAPHIC HEADER FIELDS
Rev. 3
5-17-88
The fields in the following list are considered by the Technical Staff to be either required to be filled in by each participating organization submitting documents to the LSS, or in some cases optional. They are expected to be a subset of the "full" header to be used in the LSS.
Some fields are applicable to only certain types of documents, however. For this purpose, a document is considered to be any document which can stand alone and could possibly be searched by a user, whether or not it is an attachment or enclosure to another document. A letter with three stand-alone attachments would require 4 bibliographic headers to be submitted - one for each of the letter and attachments.
It will, of course, be necessary to develop detailed coding instructions on how to fill out the bibliographic header.
REQUIRED FIELDS:
Accession No. [2] (non-system) - This would be a unique alphanumeric consecutive number assigned by the submitting agency for two purposes:
- 1. To distinguish one agency's submitted documents from another's, thus allowing an agency to retrieve all of its documents.
- 2. To perform a control function, i.e., ensuring that every submitted document from an agency is received and entered into the LSS.
Submitter Center [1] - the office, site, division, etc. that is submitting the document to the LSS.
Document Type [1] - the format in which the information is presented, e.g. correspondence, report, regulation, etc.
Number of pages [2] - the length of the entire document represented as one number.
Title [2] - the title that appears on the document.
Description - in cases where there is no title or the title does not convey sufficient information, this is a brief description of the document, e.g., "Letter concerning Negotiated Rule-Making Committee Meeting Agenda" or "Progress report for April 1988 - June 1988".
Author(s) [2] - the name of each individual authoring the article, report, etc.
Author Organization(s) [1] - the name of the organization, corporation, or agency producing the document or the corresponding organization, corporation or agency to which the author belongs.
Sponsoring Agency [1] - the agency(ies) that provided the funding for the work performed in the document.
Recipient(s) [2] - the name(s) of those persons receiving the document either as the addressee(s), the distribution list, or the recipients of copies ("cc" or "bcc").
Recipient Organization(s) [1] - the corresponding organization, corporation, or agency to which the recipient belongs.
Journal Information [1] - if the document is an article from a journal, the name and other journal information that would distinguish the article.
Document Date [2] - the date contained on the document that is the date that the document was created or printed.
Errata Date [2] - if the document is an errata sheet, the date of these corrections.
Contract No. [2] - the contract number, if any, under which the work reported in the document was performed.
Document or Report No.(s) [2] - the number(s) assigned to the document by the producers and by the sponsoring agency(ies), if any.
Edition - the version of a document, whether draft, revision, supplement, etc.
Meeting Date [2] - the date referenced in or included in the text of a document of a meeting that has taken or will take place.
Site of Activity [1] - the location, if pertinent, to which the work in the document pertains.
Document Reference [1] - the document whose content or production is influenced by the submitted document.
Image/ASCII Identifier [2] - microform frame number or file identification of corresponding image and file identification of corresponding ASCII file.
Protected [1] - the type of privilege or protection (if any) being claimed for the document.
Document Condition [1] - terms such as pages missing, illegible portions, attachments missing, marginalia present, etc.
Parent Document Identification [2] - accession number of the parent document if this is a stand-alone attachment or enclosure, or the accession number(s) of the stand-alone attachments or enclosures if this is a parent document.
Abstract for Non-Documents - a full description of the item including such information as dates, purpose, physical description, location, etc. For raw data, a full description of the data including such items as how the data was collected, format, purpose, type, dates, etc.
THE FOLLOWING FIELDS ARE OPTIONAL:
Descriptors [1] - terms assigned from the LSS Thesaurus that best represent the content of the document. (Use of this field requires adherence to additional LSS coding procedures.)
Identifiers - terms that are not contained in the Thesaurus that the submitter believes will assist a user in retrieving the document; these may be "buzz words" or words representing new concepts that have not yet appeared in the Thesaurus.
Comments - any information not contained in the listed fields that would be helpful to the LSS catalogers.
Abstract - a summary of the contents of the document.
Notes:
[1] - governed by an authority list
[2] - governed by format rules
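The rule stated earlier in this list, that a letter with three stand-alone attachments requires four headers linked through the Parent Document Identification field, can be illustrated with a minimal sketch. The class name, the accession number format, and the choice of which fields to include are assumptions made for illustration only; the draft list does not prescribe any record layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubmitterHeader:
    """Hypothetical record holding a few of the required fields listed above."""
    accession_no: str                 # assigned by the submitting agency
    submitter_center: str
    document_type: str
    number_of_pages: int
    title: str = ""
    # Parent Document Identification, split into two illustrative attributes:
    parent_accession_no: Optional[str] = None
    attachment_accession_nos: List[str] = field(default_factory=list)


def headers_for_package(prefix, start, center, parts):
    """Build one header per stand-alone document.

    parts is a list of (document_type, title, pages); the first entry is
    treated as the parent and the rest as its attachments.
    """
    headers = [
        SubmitterHeader(
            accession_no=f"{prefix}-{start + offset:06d}",  # assumed format
            submitter_center=center,
            document_type=doc_type,
            number_of_pages=pages,
            title=title,
        )
        for offset, (doc_type, title, pages) in enumerate(parts)
    ]
    parent, attachments = headers[0], headers[1:]
    for attachment in attachments:
        attachment.parent_accession_no = parent.accession_no
        parent.attachment_accession_nos.append(attachment.accession_no)
    return headers


package = headers_for_package("EXAMPLE", 1, "Example Center", [
    ("correspondence", "Transmittal letter", 2),
    ("report", "Attachment A", 40),
    ("report", "Attachment B", 12),
    ("report", "Attachment C", 7),
])
print(len(package), "headers submitted")
```

Consecutive numbering under one prefix also supports the control function described for the Accession No. field, since a gap in the sequence would indicate a document that was not received.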
May 3, 1988
DRAFT
MINUTES OF THE HLW LICENSING SUPPORT SYSTEM ADVISORY COMMITTEE MEETING
APRIL 18-19, 1988 Washington, D.C.
MEETING LOCATION AND ATTENDANCE
The sixth meeting of the HLW Licensing Support System Advisory Committee (hereafter referred to as the committee) was held on April 18, 1988 from 9:00 a.m. to 5:00 p.m. and April 19, 1988 from 9:00 a.m. to 3:30 p.m. The meeting was held in the offices of The Conservation Foundation in Washington, D.C.
A list of committee members and members of the public who attended this meeting is appended hereto as Attachment 1.
APPROVAL OF THE MINUTES
As its first item of business, the committee discussed the draft minutes from the committee's March 22-24, 1987 meeting.
Several committee members indicated that they had not had time to review these draft minutes in sufficient detail. Others indicated that they would provide suggestions to the facilitator for changes that they felt were relatively minor and non-substantive in nature. Thus, no changes to the draft minutes of the March meeting were officially approved by the committee.
EXPLANATION OF CHANGES MADE TO THE NRC'S DRAFT RULE
NRC representatives explained that the draft text of a new Subpart J to 10 CFR Part 2 that was distributed to committee [...]
With no other general questions or comments, the committee agreed to take a recess to provide committee members who had not yet seen the newly revised text an opportunity to review it in detail. The committee also agreed that upon reconvening, they would discuss the draft rule section by section.
DISCUSSION OF THE DRAFT RULE
Section 2.1000 - Scope of Subpart
NRC representatives explained that the intent of this section was to incorporate by reference certain provisions of Subpart G, NRC's rule of general applicability, to the rule for the HLW licensing proceeding which will be published as Subpart J in Part 2.
NRC was asked why sections 2.740 and 2.741 were not listed in the provisions of Subpart G that would be incorporated by reference.
NRC representatives responded that these sections were essentially lifted verbatim, with minor changes to accommodate the special circumstances of the HLW licensing proceeding and the proposed use of the LSS, into sections 2.1018 and 2.1019 of this draft rule.
Section 2.1001 - Definitions
Bibliographic Header
The representative of the environmental coalition stated that the definition used for this term might be a problem because of the limitations that are placed on public access to the LSS under section 2.1001.
The facilitator briefly reported on the activities of the technical work group which, he explained, is likely to recommend that the parties be required to complete a simple "bibliographic header," which would include information on such items as the date, author, recipient and subject of the document, and that the LSS Administrator would be required to prepare a more complete header for the document which would include more information than that supplied by the party. This additional information might include such items as keywords and an abstract of the document.
NRC representatives explained that their intent was to leave this issue open for now and resolve it at some later date through the LSS Administrator and the use of the proposed advisory review board which will make recommendations to the LSS Administrator.
No specific changes to this definition were suggested.
Document
NRC representatives were asked what the phrase "associated with the business of" was meant to imply.
They replied that they intended that this phrase would make it clear that contractor documents as well as agency documents were meant to be included in the LSS.
The committee agreed to strike the part of this definition that was added by the NRC negotiating team from the definition used in the original text, such that the definition would read:
"Document means any written, printed, recorded, magnetic, graphic matter or other documentary material, regardless of form or characteristic."
EEI representatives stated that the term documentary material was not defined in this definition section but it was defined in the text of the rule under Section 2.1003.
The committee agreed that the sentence which defined this term in
INFORMATION RETRIEVAL SYSTEMS
A TUTORIAL
Prepared By Negotiated Rulemaking Technical Staff
FEBRUARY 3, 1988
CONTENTS
1.0 INTRODUCTION
    1.1 PURPOSE
    1.2 HOW TO USE THIS DOCUMENT
2.0 SEARCH AND RETRIEVAL
    2.1 BIBLIOGRAPHIC HEADER
    2.2 BIBLIOGRAPHIC HEADER WITH ABSTRACT
    2.3 BIBLIOGRAPHIC HEADER WITH SUBJECT TERMS
    2.4 BIBLIOGRAPHIC HEADER WITH ABSTRACT AND SUBJECT TERMS
    2.5 FULL TEXT
    2.6 ENHANCED FULL TEXT
    2.7 RETRIEVAL ENHANCEMENTS
3.0 DATA CAPTURE
    3.1 IMAGES
        3.1.1 Electronic
        3.1.2 Microform
    3.2 FULL TEXT
        3.2.1 Optical Character Recognition (OCR) Process
        3.2.2 Rekeying
        3.2.3 Word Processing
    3.3 HARD COPY
4.0 CATALOGING AND INDEXING
    4.1 HEADERS
        4.1.1 Bibliographic Headers
        4.1.2 Subject Terms
        4.1.3 Abstract
    4.2 FULL TEXT
5.0 STORAGE
    5.1 HARD COPY
    5.2 MICROFORM
    5.3 ELECTRONIC
        5.3.1 Optical Disk
        5.3.2 Magnetic Tape
        5.3.3 Magnetic Disk
6.0 DISPLAY
    6.1 IMAGE
    6.2 ASCII TEXT
    6.3 HEADER
7.0 DOCUMENT OUTPUT
    7.1 HARD COPY
    7.2 MICROFORM
    7.3 FACSIMILE
8.0 REPRESENTATIVE SCENARIOS
9.0 ADDITIONAL SYSTEM PARAMETERS
APPENDIX
GLOSSARY
1.0 INTRODUCTION
This document has been prepared jointly by technical staff of the Conservation Foundation, the Nuclear Regulatory Commission, and Science Applications International Corporation (SAIC), the DOE LSS contractor.
Opinions expressed in this document are those of the authors and are based on review of the literature and "hands-on" experience in designing and using on-line information and litigation support systems.
For further information or clarification, please contact:
Kirk Balcom (703) 476-1100
Avi Bender (301) 492-9914
Dick Pierce (703) 821-4350
1.1 PURPOSE
The purpose of this document is to provide the Negotiated Rulemaking Advisory Committee with a tutorial on basic information retrieval concepts and to establish a common framework and vocabulary for all future discussions.
The document provides an explanation of search and retrieval methods, and a discussion of various storage, indexing and display techniques.
This is followed by a description of common options for database creation and for the retrieval process. A glossary is included to define the most commonly used terms.
A very important system requirement, and the ultimate measure of success, is to provide accurate and timely access to all information within the LSS.
There are other requirements as well and each imposes a different design specification.
A major premise in developing this guide was to focus attention on a major technical driving factor, information search and retrieval concepts, and less on the hardware, cost and design aspects.
These latter issues will be addressed at a later stage when more definitive requirements are established.
1.2 HOW TO USE THIS DOCUMENT
Section 2 of the report will guide you through the common ways to search and retrieve documents from an on-line database and will describe some of the advantages and disadvantages of each option. Section 3 describes how the information can be captured from hard copy or directly from word processing equipment in order to create the electronic database.
Section 4 then takes you through the various options for cataloging and indexing.
Storage options are described in Section 5 and document display and output options are described in Sections 6 and 7.
Using Section 2 as a menu, the reader can then turn to Section 8 to see the various options for creating a system to achieve the desired search and retrieval alternative.
For example, if it is determined that only an abstract/bibliographic search will be required then all the options described under scenario B are possible. If enhanced full text search is the option then all the options under scenario F are possible.
Closer scrutiny of scenarios A through F reveals redundancy of options in storage, display, database creation, indexing, and workstations.
Specific requirements such as "perform full text search and retrieve original highlighted ASCII text within 60 seconds and image within 24 hours" will begin to eliminate some of the options. Otherwise almost every conceivable scenario is possible but not necessarily practical. The actual approach for developing the LSS may involve some or all of scenarios A through F.
Finally, while search and retrieval techniques are certainly important factors in determining system requirements, there are additional performance parameters which must be defined in order to specify a system.
These are discussed briefly in Section 9.
2.0 SEARCH AND RETRIEVAL
Documents are searched and retrieved either manually through physical files, or electronically through computer searches of bibliographic headers, subject terms, abstracts, or full document text, and are then available for review in electronic or hard copy readable form.
A search strategy generally retrieves one or more "hits" (those documents which meet the terms of the search query).
The success of the search strategy is measured by two factors--recall and precision. Recall is the number of documents retrieved in relation to the number of documents that exist on the query.
Perfect or 100% recall is retrieving all of the documents that satisfy the query.
Precision is the number of retrieved documents that actually pertain to the query in relation to the total number of documents retrieved. Perfect or 100% precision means that there are no "false drops" (irrelevant documents). Retrieval systems are usually rated by how well they perform on recall and precision.
In general, as recall improves, precision decreases.
As the database grows, the user tends to reduce the number of hits by more restrictive searches, i.e. adding conditions which reduce recall. The third factor to consider is whether the amount of information displayed for each "hit" is sufficient to ascertain whether the "hit" is useful.
Good system design as well as experience in using on-line databases are important factors in improving document retrieval.
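The two measures defined above reduce to simple ratios, as the following minimal sketch shows. The function name and the document identifiers are invented for illustration; the counts are not drawn from any real search.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all documents retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical query for which four documents in the database are relevant.
relevant = {"D1", "D2", "D3", "D4"}

# A broad search finds all four but also four "false drops".
broad = ["D1", "D2", "D3", "D4", "D7", "D8", "D9", "D10"]
# A more restrictive search drops the false hits but misses two documents.
narrow = ["D1", "D2"]

print(recall_and_precision(broad, relevant))   # (1.0, 0.5)
print(recall_and_precision(narrow, relevant))  # (0.5, 1.0)
```

The two example searches show the trade-off noted above: adding conditions to cut the number of hits raised precision here, but only at the cost of recall.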
2.1 BIBLIOGRAPHIC HEADER
A bibliographic header is composed of the essential parts of the document, such as author, title, date, etc., along with descriptive features, such as type of document, number of pages, etc. A search can be conducted on any word or date in the header. This type of system provides excellent recall and precision for such queries as "give me a list of all documents written by author x" or "give me a list of all documents published in the year 19xx."
The system does not lend itself to content based searches since a search term must appear in the header. Therefore recall and precision are poor for content based searches.
In addition, while the display of information is sufficient for an author or date search, it gives little or no indication of the validity or usefulness of the document in a subject search.
Generally a review of the document is needed to determine usefulness.
2.2 BIBLIOGRAPHIC HEADER WITH ABSTRACT
The addition of a searchable abstract to the header improves the recall and precision for subject searches, as well as the ability to determine the usefulness of each document. A searcher must take into account, however, all possible synonyms for the subject term in order to increase recall.
A well-written abstract that includes those words most likely to be used for retrieving that document will also substantially increase recall. In some cases, an extensive abstract can actually eliminate the need for obtaining a hard copy of the document.
As a whole, recall is poor to average and precision is about average for this system, while the display of information is greatly improved over a bibliographic header. This is a more costly system than the header-only system since the author or an abstractor is needed to provide the abstract.
2.3 BIBLIOGRAPHIC HEADER WITH SUBJECT TERMS
This system adds subject terms to the header, also improving recall and precision for subject searches. However, the information displayed for each "hit" is a poor indication of the usefulness of the document as subject terms are frequently limited in number and therefore are only an indication of the subject matter of the document. A hard copy of the document is generally necessary to determine its usefulness in meeting the search criteria. Subject terms are also useful in eliminating ambiguities of words in the header.
Overall, the system is about average for recall and precision and below average for display.
2.4 BIBLIOGRAPHIC HEADER WITH ABSTRACT AND SUBJECT TERMS The addition of both an abstract and subject terms to the header allows for a greater degree of recall than the previous systems. A searcher can also improve precision by looking at keywords assigned to a useful document and limit a search by using the same keywords.
Again, the abstract assists in determining whether the document is useful.
Recall is rated average to good, precision is average, and display is above average.
2.5 FULL TEXT
Full text indexing allows the searcher to search on every word within the document.
If such a search is performed in conjunction with a synonym file, the resulting recall of documents may be higher than any of the preceding methods but with a relatively lower than average level of precision.
Without the benefit of a synonym file the researcher (unless very knowledgeable in the field) will run into problems of semantics.
For example, searching on volcanic may not result in documents using the words earthquake, ground movement, slip fault, tectonic...
Full text search is a superior method for content based searches used to identify places, people, and terms within the documents.
Searching for concepts, however, is not an easy matter since concepts generally do not appear as words in the text. Full text indexing without any enhancement can create an unwieldy document retrieval situation where instead of finding the needle in the haystack the user retrieves the needle and the haystack.
Depending on the software package used, display is generally above average since one can see the highlighted words within context.
Built in term weighting algorithms are also available to display documents according to an importance ranking factor based on the frequency of the hit word within the document.
Compared to abstracts and subject terms, full text requires the least amount of human intervention during the database indexing process.
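A minimal sketch of the two ideas in this section, a synonym file that widens a full text search and a ranking based on how often the hit words occur, might look as follows. The synonym entries reuse the related terms from the example above; the sample documents, function names, and the plain occurrence count used as the weight are assumptions for illustration, not a description of any particular software package.

```python
import re
from collections import Counter

# Hypothetical synonym file: each search term maps to the words searched for it.
SYNONYM_FILE = {
    "volcanic": {"volcanic", "earthquake", "tectonic"},
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def full_text_search(docs, term, synonym_file=None):
    """Return (document id, score) pairs, highest score first.

    The score is the number of times the term, or any word listed for it
    in the synonym file, occurs in the document text.
    """
    terms = (synonym_file or {}).get(term, {term})
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


docs = {
    "R1": "Volcanic hazards and volcanic ash near the site.",
    "R2": "Tectonic setting and earthquake history; earthquake catalog.",
    "R3": "Groundwater travel time.",
}

print(full_text_search(docs, "volcanic"))                # [('R1', 2)]
print(full_text_search(docs, "volcanic", SYNONYM_FILE))  # [('R2', 3), ('R1', 2)]
```

Without the synonym file the second report is missed entirely; with it, recall improves, although any document that merely mentions one of the related words would also be returned.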
2.6 ENHANCED FULL TEXT
The approach that maximizes the virtues of all the preceding indexing schemes is enhanced full text.
By combining the bibliographic header, which provides a structure for the information before it enters the database, with the full text, which provides for content based searches, and subject terms, which provide concepts, the resulting recall and precision are superior. The user now has greater flexibility to use either full text search, bibliographic header, subject terms, or a combination of the three.
2.7 RETRIEVAL ENHANCEMENTS
Regardless of which system is chosen for a database, there are certain retrieval enhancements that should also be considered to improve searching.
These include:
a) Boolean Logic - the use of connectors such as "and," "or," and "not."
b) Range Searching - the use of phrases such as "from ... to ..." or "between ... and ..." and other similar phrases for searching date or other ranges.
c) Field Searching - the capability of limiting the search to a specific field, such as author, date, title, etc.
d) Phrase Searching - the ability to use phrases such as "nuclear waste" or "nuclear power plant."
e) Proximity - searching for a word within x number of words of another word, e.g., the word "nuclear" within 3 words of "power."
f) Sorting - sorting the output chronologically, alphabetically by author, etc.
g) Limiting - limiting the output to certain years, a specific language, a geographical area.
h) KWIC or keyword in context format - displays the keyword surrounded by the 25 or so words before and after.
These are only some of the major enhancements to be considered.
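Three of the enhancements listed above (Boolean logic, proximity, and KWIC display) can be sketched in a few lines. The two sample sentences and the function names are invented for illustration, and "within x words" is assumed here to mean that the word positions differ by at most x.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(tokens, first, second, distance):
    """Proximity: True if the two words occur within `distance` positions."""
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= distance for i in pos_first for j in pos_second)


def kwic(tokens, keyword, window=25):
    """Keyword in context: the keyword with up to `window` words on each side."""
    return [
        " ".join(tokens[max(0, i - window): i + window + 1])
        for i, t in enumerate(tokens)
        if t == keyword
    ]


docs = {
    "A": "The nuclear power plant filed a waste report.",
    "B": "Nuclear waste shipments do not require electrical power.",
}
tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}

# Proximity: "nuclear" within 3 words of "power".
near = [d for d, t in tokens.items() if within(t, "nuclear", "power", 3)]

# Boolean logic: "nuclear" AND "waste" NOT "plant".
boolean = [
    d for d, t in tokens.items()
    if "nuclear" in t and "waste" in t and "plant" not in t
]

print(near)                                 # ['A']
print(boolean)                              # ['B']
print(kwic(tokens["A"], "power", window=2)) # ['the nuclear power plant filed']
```

Both sample sentences contain "nuclear" and "power", so a plain "and" search would return both; the proximity condition is what separates the document about a power plant from the one that merely mentions power.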
3.0 DATA CAPTURE
Data capture is the process by which documents and information become a part of the LSS.
The process can take several forms including placing documents into a file cabinet, entering the full text of a document into machine readable (ASCII) form, and capturing the image on microfilm or in an electronic (bit-mapped) image file.
3.1 IMAGES
3.1.1 Electronic
Capturing an electronic image of a document from hard copy (paper) is a straightforward process consisting of feeding documents into a scanning device, checking the resultant image, and entering a file identification of the document.
The image is a replica of the original, including margin notes, signatures, graphics, date stamps, etc. which can not be captured in ASCII form.
Images are the only reasonable method of capturing graphic oriented documents.
Electronic images require relatively large amounts of storage, typically 50,000 to 100,000 bytes per 8 1/2 x 11 inch page, as compared to ASCII at 2500 to 3000 bytes per page.
Thus the use of images requires high density storage devices such as optical disks.
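The per-page figures quoted above can be scaled up to show why high density storage becomes necessary. The sketch below is illustrative arithmetic only: the collection size of one million pages is a hypothetical number chosen for the example, not an LSS estimate.

```python
# Per-page figures quoted in the text above.
IMAGE_BYTES_PER_PAGE = (50_000, 100_000)
ASCII_BYTES_PER_PAGE = (2_500, 3_000)

PAGES = 1_000_000  # hypothetical collection size, for illustration only


def storage_bytes(pages, bytes_per_page):
    """Total storage needed for a number of pages at a given size per page."""
    return pages * bytes_per_page


image_low, image_high = (storage_bytes(PAGES, b) for b in IMAGE_BYTES_PER_PAGE)
ascii_low, ascii_high = (storage_bytes(PAGES, b) for b in ASCII_BYTES_PER_PAGE)

# Smallest and largest ratio of image size to ASCII size for the same page.
ratio_low = IMAGE_BYTES_PER_PAGE[0] / ASCII_BYTES_PER_PAGE[1]
ratio_high = IMAGE_BYTES_PER_PAGE[1] / ASCII_BYTES_PER_PAGE[0]

print(f"images: {image_low / 1e9:.0f} to {image_high / 1e9:.0f} billion bytes")
print(f"ASCII:  {ascii_low / 1e9:.1f} to {ascii_high / 1e9:.1f} billion bytes")
print(f"an image takes roughly {ratio_low:.0f} to {ratio_high:.0f} times the space")
```

On these figures an image of a page occupies somewhere between about 17 and 40 times the space of its ASCII text.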
Although images are electronic, the characters or words on the page cannot be recognized by the computer until the image is processed by optical character recognition.
3.1.2 Microform Microform is used to describe all of the reduced size photographic capture processes such as microfilm and microfiche.
This type of document capture has been used for several years and is fairly automated and inexpensive.
Retrieval of the proper image must be assisted by a computerized index if the files are large, and viewing of the document is usually accomplished by a projection process.
Recent developments have combined the storage capabilities of microfilm with the versatility of electronic images.
In this configuration, a microfilm image is located automatically in a storage device, scanned electronically, and transmitted to a terminal for viewing.
This process is slower than retrieving electronic images from optical disks.
3.2 FULL-TEXT The full text of a document may be entered into the LSS to be available to browse or read as part of the document selection process, or more likely to be used for full-text search by software or hardware.
The three processes which are used to enter the full text of a document into the system are optical character recognition, rekeying, and conversion from machine readable form from word processing.
3.2.1 Optical Character Recognition (OCR) Process
The OCR process converts an electronic (bit-mapped) image of a page into ASCII text (a bit pattern for each character and punctuation).
The quality of the text produced is highly dependent on the quality of the image which is submitted to the process - i.e. an original printed page with uniform type will produce better results than a fourth generation photocopy with smudges and extraneous markings. Current generation OCR devices can produce text with 99.5% to 99.9% accuracy under optimum conditions. Note that this would still result in 3 to 15 errors in a 3000 character page.
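The error estimate above follows directly from the accuracy figures: 99.9% accuracy is one wrong character per thousand and 99.5% is five per thousand. A minimal sketch of the arithmetic, with a hypothetical function name, is:

```python
def expected_errors(chars_per_page, errors_per_thousand):
    """Expected number of wrong characters on a page.

    99.9% accuracy corresponds to 1 error per 1000 characters,
    99.5% accuracy to 5 errors per 1000 characters.
    """
    return chars_per_page * errors_per_thousand / 1000


CHARS_PER_PAGE = 3000  # page size used in the text above

best = expected_errors(CHARS_PER_PAGE, 1)   # 99.9% accuracy
worst = expected_errors(CHARS_PER_PAGE, 5)  # 99.5% accuracy
print(best, worst)  # 3.0 15.0
```

Even at the better rate, a hypothetical 100-page report would carry about 300 character errors before any correction.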
Correction of errors is a manual process although tools such as spelling checkers can assist.
(A nontrivial consideration is whether or not to correct spelling errors in the original text. ) The necessity to correct the errors is dependent on their magnitude and other factors such as:
- The effect of the errors on full-text retrieval.
- The use of the ASCII text in reading or browsing the document.
- The use of the ASCII text for downloading and file transfer.
The advantage of the OCR process is that it is relatively automated and can be performed without much human intervention up to the point of review and correction.
If correction is minimal or not required (i.e. high quality documents), costs can be as low as $0.20 to $0.40 per page.
With many corrections (i.e. low quality documents), costs can be as much as $2.50 to $3.00 per page.
If the total costs exceed $3.00 per page, it can be less expensive to key in the document directly.
Continuous improvements are being made in OCR technology which will increase speed of production and reduce the error rate. Presently OCR of an image made from scanning of a good quality paper copy can be reasonably performed, however OCR from an image produced by blow-back of a microfiche or microfilm is not considered feasible.
3.2.2 Rekeying Keying a document into a computer is accomplished simply by typing the characters directly on the keyboard. This rather low-tech approach is also the most costly method.
At typical local service center rates of $1.00 per 1000 characters, a readable page will cost $2.50 to $3.00 to enter in ASCII form.
Rekeying is the only reliable method for poor quality documents such as those produced from microform or deteriorated paper.
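The per-page rekeying cost quoted above, and the break-even point against OCR with correction, can be checked with a short calculation. Amounts are held in whole cents to avoid rounding problems; the function names and the 2500 to 3000 character page size (taken from the ASCII figures given earlier in this tutorial) are the only assumptions.

```python
REKEY_CENTS_PER_1000_CHARS = 100  # $1.00 per 1000 characters, as quoted above


def rekey_cost_cents(chars_per_page):
    """Cost in cents to key one page at the quoted service center rate."""
    return chars_per_page * REKEY_CENTS_PER_1000_CHARS // 1000


def cheaper_method(ocr_cents_per_page, chars_per_page):
    """Pick the less expensive capture method for one page."""
    if ocr_cents_per_page > rekey_cost_cents(chars_per_page):
        return "rekey"
    return "ocr"


print(rekey_cost_cents(2500), rekey_cost_cents(3000))  # 250 300
print(cheaper_method(40, 3000))    # clean original, little correction: ocr
print(cheaper_method(350, 3000))   # heavy correction: rekey
```

This comparison covers cost only; as noted above, for poor quality originals rekeying may be the only reliable method regardless of price.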
3.2.3 Word Processing Documents which have been prepared on a computer by word processing software, for example, are already in machine readable format.
However due to the fact that most full-text programs require that files be entered in ASCII form and computer communications are not standardized, some conversion is required. Generally speaking, tools are available for this purpose.
The major problem with receiving data in machine readable format is the quality assurance. It is necessary that the machine readable version of the document be verified as a true representation of the hard copy.
(In many cases last minute changes to a document are made on a typewriter.)
Costs for this process can be minimal if the document is produced on the same computer and the conversion process is automated.
Given the variety of parties and contractors associated with the repository, it is not expected that costs will be negligible for this method, but they will certainly be less than rekeying and probably less than OCR with correction.
3.3 HARD COPY Filing of information in hard copy is the simplest and most direct form; however, it is probably the most unwieldy. Given the geographic distribution of retrieval, at least two, and probably more, copies of the data would be required.
As with microform capture, a computer aided index is a requirement for large databases.
One of the major problems with hard copy storage is security. Documents are not always returned to the files or may be misfiled. Hard copy, provided the copy is faithful to the original, is easy to read, requiring no projection device or display terminal.
4.0 CATALOGING AND INDEXING Cataloging and indexing are processes for preparing the LSS records for retrieval.
The type of cataloging is directly related to the search and retrieval techniques to be employed.
4.1 HEADERS 4.1.1 Bibliographic Headers Bibliographic cataloging is the simplest form of a description of a document.
It results in a series of descriptive terms, usually objective in nature, which can be assigned by relatively unskilled clerical personnel.
Examples are author, recipient, date, title, type of document, etc.
The bibliographic header represents the minimum information which might be entered into an information system about a document.
It is the opinion of the technical staff that all records in the LSS should have a bibliographic header, even if more complete indexing including full-text is used.
The bibliographic header is generally typed into a "fill in the blanks" form as a document is entered into the system. The information could conceivably be provided by the organization submitting the document as part of the submission process.
4.1.2 Subject Terms Subject terms represent an addition to the header which provides information about the material in the document.
They are particularly useful for technical reports and similar lengthy documents and less important for correspondence.
There are differences of opinion over the best method to assign subject terms to a document, whether by an information management (librarian) specialist, the author, an independent subject expert, or some combination.
The assignment of subject terms to a document, if it is to result in successful retrieval, should be made by a highly skilled individual together with such tools as an authority list and controlled vocabulary. Cost may therefore be a major factor in considering the utility of adding subject terms to the header. While the assignment is subjective and dependent upon the skill of the individual, subject terms can enhance retrieval by incorporating terms which are not used in the text itself but are the terms normally used by the searcher. Subject terms are typically entered into fixed fields of a structured database.
4.1.3 Abstract Adding the abstract to a header can be less costly in cases where it has been provided as part of the document. If the abstract must be created for the header, costs and the requirement for skilled individuals become a consideration.
Most database programs have text fields which are sufficiently large to hold the abstract.
In effect the abstract is searched in "full-text".
If a document contains an abstract and is entered in searchable full-text, the abstract will of course be included automatically as a search mechanism.
4.2 FULL TEXT In order for all the words in documents to be searched by software the text must be indexed. All software full-text search programs include the tools to be used in this process; thus it is a relatively automated process and does not require skilled information management personnel. The resulting file, sometimes referred to as an inverted file, contains a sorted list of all words in the documents (except common words such as a, an, the, was, is, etc.) and a pointer to the location(s) of the words in the documents.
The size of the inverted file is a function of the program which is used for the indexing, but it can vary from 50% to 200% of the original ASCII file.
Even after the inverted file has been created, new documents can be added to the system and the index modified to accommodate the additional information.
Eventually, however, a modified index becomes inefficient to use, and a reindexing of the entire file is required.
Full text indexing, although not labor intensive, requires major computer resources and time to process large files. There are several examples, however, of commercial and government full text retrieval applications that are large and complex and still deliver reasonable indexing and retrieval response times. The files will require segmentation, although this may be invisible to the user.
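As a rough illustration of the indexing step described in this section, the following Python sketch builds a small inverted file: a sorted list of words, with common words left out, each carrying pointers to its locations. The stop-word list, the function name `build_inverted_file`, and the (document, word position) pointer format are assumptions made for illustration; they are not taken from any particular full-text product.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption; real products supply their own).
STOP_WORDS = {"a", "an", "the", "was", "is"}

def build_inverted_file(documents):
    """Map each non-common word to a list of (doc_id, word_position) pointers."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word not in STOP_WORDS:
                index[word].append((doc_id, position))
    # Keep the words in sorted order, as in the inverted file described above.
    return dict(sorted(index.items()))

docs = {
    "doc1": "The repository design was reviewed.",
    "doc2": "A review of the repository license.",
}
inverted = build_inverted_file(docs)
print(inverted["repository"])  # [('doc1', 1), ('doc2', 4)]
```

Adding a document later would mean appending new pointers to the existing entries, which is one way to picture why a heavily modified index eventually calls for reindexing.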
5.0 STORAGE 5.1 HARD COPY Hard copy (paper) is one possible mechanism for storing the information required in the LSS.
The major problems with this method are the difficulties of locating documents, missing documents and pages due to misfiling or borrowing, and the space required. For 10 million pages approximately 600-700 filing cabinets occupying 4000-5000 square feet would be required.
Advantages of hard copy include the readability of the document and the fact that the document is a true representation of the original including signatures.
5.2 MICROFORM Storage in microfilm or microfiche provides a more condensed medium and therefore reduces the storage volume.
Automated machinery is available to assist in locating a specific frame, but once it is found, a projection device is required in order to read the page.
Quality of microform varies widely in readability and depends to a great extent on the quality of the original document. Missing documents can also be a problem with microform, but missing pages are not typical assuming the whole document was originally captured.
5.3 ELECTRONIC To understand the electronic storage requirements for various techniques of capture and retrieval, consider an example document consisting of 5 pages of text and one page of graphic information. Storage requirements for the various cataloging and indexing forms are as follows:
| Item | Assumption | Bytes |
|---|---|---|
| Bibliographic header | 1500 characters | 1,500 |
| Index to bibliographic header | Not all terms indexed | 1,000 |
| Subject terms | 10 phrases at 30 char/phrase | 300 |
| Index for subject terms | All terms indexed | 300 |
| Abstract | One-half page | 1,500 |
| Inverted file of abstract | Abstract full-text searchable | 1,500 |
| ASCII text of document | 3000 characters/page | 15,000 |
| Inverted file of text | Full-text searchable by software | 15,000 |
| Image of graphic page | 300 dpi compressed @ 20:1 | 55,000 |
| Image of text pages | 300 dpi compressed @ 20:1 | 275,000 |
| TOTAL | | 366,100 |
From this example, one can judge the relative impact on storage requirements of various search, retrieval, and display options.
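The arithmetic behind the example can be checked with a short Python sketch. The byte figures are copied from the example above; the dictionary layout and the variable names are illustrative assumptions only.

```python
# Byte estimates copied from the example document above
# (5 pages of text plus one page of graphics).
STORAGE_BYTES = {
    "Bibliographic header": 1_500,
    "Index to bibliographic header": 1_000,
    "Subject terms": 300,
    "Index for subject terms": 300,
    "Abstract": 1_500,
    "Inverted file of abstract": 1_500,
    "ASCII text of document": 15_000,
    "Inverted file of text": 15_000,
    "Image of graphic page": 55_000,
    "Image of text pages": 275_000,
}

total = sum(STORAGE_BYTES.values())
image_bytes = sum(v for k, v in STORAGE_BYTES.items() if k.startswith("Image"))
image_share = image_bytes / total

print(total)                  # 366100
print(round(image_share, 2))  # 0.9
```

On these figures the page images account for roughly nine-tenths of the storage, which is why the choice of whether to store images dominates the sizing.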
5.3.1 Optical Disk Optical disks represent the least cost electronic medium of storage for large volumes of data. Current optical disk technology is "write-once-read-many" (WORM), which means that the information cannot be erased or changed.
Such a medium is ideal for archival documents.
Erasable optical disks are now arriving on the market, but the technology and storage density are not as advanced as WORM.
A 12" optical disk storing 6.4 gigabytes can contain 100,000 pages in image form, 1,000,000 pages in indexed full-text, or headers for about 1,000,000 documents.
Optical disks can be searched randomly for files, thus resulting in faster response than serial devices such as microfilm.
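The capacity figures quoted for the 12" disk can be turned into per-page allowances with a small Python sketch. It assumes a gigabyte means 10^9 bytes; the names are hypothetical.

```python
DISK_BYTES = 6.4e9  # 12" optical disk capacity quoted above (assuming 1 GB = 10**9 bytes)

def bytes_per_unit(disk_bytes, units):
    """Average number of bytes available for each stored page or header."""
    return disk_bytes / units

image_page = bytes_per_unit(DISK_BYTES, 100_000)     # pages in image form
text_page = bytes_per_unit(DISK_BYTES, 1_000_000)    # pages in indexed full text

print(image_page)  # 64000.0
print(text_page)   # 6400.0
```

These allowances are of the same order as the section 5.3 example, which uses about 55,000 bytes for a compressed page image and about 6,000 bytes for a page of ASCII text plus its inverted file.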
5.3.2 Magnetic Tape Magnetic tape is a relatively low cost storage medium; however, it requires manual intervention (to mount the right tape on the tape reader) and retrieval is relatively slow. Magnetic tape is therefore not often used for information which must be accessed frequently, but is well suited for backup storage which is only accessed in the event of failure of the primary storage media.
5.3.3 Magnetic Disk Magnetic disks are probably the highest cost storage media for large (gigabyte) storage requirements.
Their advantage is primarily the speed of retrieval.
6.0 DISPLAY All retrieval techniques will result in a list of "hits", i.e., documents which meet the query.
Since no query technique is 100% efficient, additional review is probably required to make the final determination if the hits are indeed documents of interest to the user. This may be done on the screen by reviewing additional information on each document which may be stored in the system.
Such information could be the image of each page, the ASCII text, the header, or a report such as a list of all documents by a specific author.
6.1 IMAGE The electronic image of the page, displayed on a high-resolution terminal, provides a true representation of the original document in a form which can be read or skimmed.
All markings on the page, including marginalia, signatures, and date stamps will be reproduced in the image as well as figures and graphics which cannot be stored electronically in any other form.
Images must be viewed on a high-resolution (100 dots per inch minimum) screen to be readable.
The interface device between the screen and the computer will include a compression/decompression board which permits the image to be stored in a compressed form, approximately 1/10 to 1/30 of the size of the original scanned image.
This hardware is of course more expensive than standard monochrome monitors and interface devices.
Due to the fact that images, even in the compressed form, require some 50,000 to 100,000 bytes per page, remote transmission of images is not very practical.
One page transmitted over a 2400 baud modem would take about 4 minutes.
Images can also be provided in microform and projected locally on a microfilm or microfiche reader.
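The transmission estimate given above for a 2400 baud modem can be reproduced with a short Python sketch. It assumes the modem carries 2400 bits per second and that each byte costs 10 bits on the line (8 data bits plus start and stop bits); both the framing assumption and the function name are illustrative.

```python
def transmission_minutes(size_bytes, bits_per_second=2400, bits_per_byte=10):
    """Estimate the time to send a file over a serial modem link, in minutes.

    bits_per_byte=10 assumes asynchronous framing (8 data bits plus a start
    bit and a stop bit), which is an assumption, not a figure from the text.
    """
    seconds = size_bytes * bits_per_byte / bits_per_second
    return seconds / 60

for size in (50_000, 55_000, 100_000):
    print(size, round(transmission_minutes(size), 1))
```

Under these assumptions a compressed page image of 50,000 to 100,000 bytes takes roughly 3.5 to 7 minutes, consistent with the figure of about 4 minutes per page.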
6.2 ASCII TEXT The text of the document may be available in machine readable form or it may have been created by the OCR process for the purpose of indexing the text for full-text search.
If this ASCII form of the text is stored in the system, it can be viewed on demand in order to help determine if the document is indeed of interest. Note that even if the document is available for full-text search, it is the index of the text that is used by the software and the ASCII text is not necessarily maintained.
ASCII code is relatively compact storage compared to images, incorporating compression techniques to provide even more efficiency.
Thus remote transmission of text is reasonable to accomplish.
If the text can be transmitted to a personal computer, it can be stored, printed, and extracted for inclusion as quotes in other documents.
The text of a document contains only the alphanumeric characters and punctuation which were contained in the original document.
It will not include signatures, hand-written notes, figures, or graphics.
6.3 HEADER Output of the entire header of a document, including subject terms and abstract if they have been included, may be sufficient to determine if the document is of interest. This information will require the least amount of storage and transmission time of the possible screen outputs, and like ASCII text, will contain only alphanumeric characters.
7.0 DOCUMENT OUTPUT Once it has been determined that a document is of interest and a more permanent record of the document is desired for detailed reading, it can be obtained in hard copy or microform.
7.1 HARD COPY A copy of the document can be obtained in several ways:
If the stored copy is in paper form, a photo copy can be made.
If the stored copy is in electronic image form, a copy can be printed on a laser printer.
If the stored copy is in microform, a "blowback" of the frame can be printed.
Any of these copies could be obtained at the LSS site, the user site, or sent by express or regular mail.
7.2 MICROFORM A microfiche or microfilm copy of the document can be made from any of the stored forms noted above, and similarly transmitted to the user. Although storage space requirements of the user are reduced when the documents are in microform, a reader or reader/printer will be required.
7.3 FACSIMILE Particularly when time is critical, copies of the selected documents can be transmitted to the user by facsimile devices. The cost of this alternative will be the highest, since it involves not only transmission costs but also a receiving device.
8.0 REPRESENTATIVE SCENARIOS In this section we have attempted to define certain scenarios based on the search and retrieval techniques presented in section 2.
The alternatives listed in sections 2 through 7 can be combined in many forms to represent a system.
These scenarios define the choices which must be made for each search and retrieval option, still leaving open the various remaining options. A possible set of scenarios is as follows:
A. A system which provides for search and retrieval on information contained in bibliographic headers only. The document could be stored on microform, electronic images, or hard copy.
B. In addition to the capabilities described in A., an abstract is added to the header which can be searched in full text.
C. In addition to the capabilities described in A., subject terms are added which can be searched.
D. A combination of B. and C. which permits searches on all header information including bibliographic, subject terms, and abstract.
E. A system which provides for full-text search of documents along with an abbreviated header. The document could be stored on microform, electronic image, or hard copy.
F. A combination of the system described in E with the capability to search headers with subject terms (C).
A. BIBLIOGRAPHIC HEADER
Document Database Creation
Options include:
- Scanning pages to capture bit-mapped image
- Filming pages for microfilm or microfiche
- Maintaining hard copy
Cataloging/Indexing
- Bibliographic header comprised of objective fields such as author, title, date, document type, accession number, etc.
Storage
Options include:
- Magnetic disk
- Magnetic tape
- Optical disk
- Microform
- Hard copy
Display
- Standard alphanumeric monitor for header information and interaction with the data base.
- Optional high resolution monitor for electronic image and/or microform reader.
Document Output
Options include:
- Microform or hardcopy by mail or express
- Microform available at local workstation and printed locally
- Electronic image available at local workstation and printed locally
- Copy via facsimile device
B. BIBLIOGRAPHIC HEADER WITH ABSTRACT
Cataloging/Indexing
- Bibliographic header comprised of objective fields plus the preparation of an abstract of the document.
C. BIBLIOGRAPHIC HEADER WITH SUBJECT TERMS
All categories and options remain the same as Scenario A. except for:
Cataloging/Indexing
- Bibliographic header comprised of objective fields plus the selection of subject terms.
D. BIBLIOGRAPHIC HEADER WITH ABSTRACT AND SUBJECT TERMS
All categories and options remain the same as for Scenario A. except for:
Cataloging/Indexing
- Bibliographic header comprised of objective fields plus the preparation of an abstract and the selection of subject terms.
E. FULL TEXT
Document Database Creation
- Preparation of machine readable (ASCII) text of the document by conversion of hard copy using optical character recognition process or rekeying and conversion of documents available in word processing files.
- Image of the document may optionally be prepared by: scanning pages to capture bit-mapped image, filming pages for microfilm or microfiche, or maintaining hard copy.
Cataloging/Indexing
- Preparation of a bibliographic header which may be less detailed than in Scenarios A through D.
- Indexing of the full text if software full text retrieval is employed.
Storage
- Same options as for Scenario A.
Display
- Standard alphanumeric monitor for header and text information and interaction with the data base.
- Optional high resolution monitor for electronic image and/or microform reader.
Document Output
Options include:
- Microform or hardcopy by mail or express
- Microform available at local workstation and printed locally
- Printing of ASCII text on local printer
- Downloading of ASCII text to local workstation
- Electronic image available at local workstation and printed locally
- Copy via facsimile device
F. ENHANCED FULL TEXT
All categories and options remain the same as Scenario E. except for:
Cataloging/Indexing
- Preparation of a bibliographic header plus the selection of subject terms.
- Indexing of the text if software full text retrieval is employed.
9.0 ADDITIONAL SYSTEM PARAMETERS The preceding sections have focused on the search and retrieval aspects of the LSS system, including the impact of certain aspects on system design.
There are several additional parameters which have a significant effect on the system, and since they are related to aspects of search and retrieval or display, we will mention them here. Decisions on these aspects must be made as well before the system requirements can be complete and design specifications can be formulated.
These parameters include:
1) Data volume - total number of documents and pages.
2) Response time - time to respond to a request such as a query or a request to print.
3) Geographic distribution - locations of end users and data input.
4) Number of users - especially the number who may use the system simultaneously.
5) Type of users - which will affect types of queries and the user interface.
6) Centralized versus distributed - location(s) of the data base.
7) Technology - constantly providing new capabilities and lowering the cost of existing capabilities.
8) Cost.
APPENDIX GLOSSARY OF THE HLW ADVISORY COMMITTEE
GLOSSARY
ABSTRACT Summary of the main points in a document, usually organized around the theory of the case or subject matter at issue; also called digest; most common use in discovery systems is to summarize portions of transcripts.
ASCII ASCII is the acronym for American Standard Code for Information Interchange.
This is the system by which letters, punctuation characters, spaces, some special symbols and control codes are encoded into numeric values for interpretation and storage by a computer.
ASCII FILE An ASCII FILE is a TEXT FILE containing the ASCII codes which represent characters and symbols (as opposed to an IMAGE FILE which contains the data to actually draw these characters). See also BIT-MAP.
BIT BIT stands for Binary digiT. It represents the smallest unit of information in a digital computer.
It can have a value of either 1 or 0, and can be represented by a switch (which is either on or off).
BIT-MAP Rather than storing the information on a page of text as a series of ASCII codes which represent the characters on that page, an IMAGE of that page may be created and stored in a computer.
This IMAGE consists of a large number of BITS (ranging from x to y per page of typed text), where the zeros and ones stored by the BITS represent the white and black portions of the page at high RESOLUTION.
Such an image is called a BIT-MAP.
When displayed, a BIT-MAP can be interpreted only by a human user who "reads" the image; it is not meaningful to computer programs. A FILE containing a BIT-MAP may be copied, moved, displayed or printed by a computer system.
BOOLEAN LOGIC Boolean logic (or Boolean algebra) is a system of logical functions and operators which permit computations and operations on binary (true/false) values. This system was developed by and named after George Boole, an English mathematician (1815-1864).
BYTE A BYTE is the basic unit of data storage. A BYTE is made up of a certain number of BITS. This number depends on the architecture of the computer, but is always divisible by two (with no remainder). The full ASCII code requires at least 8 BITS per BYTE, which is the minimum number found in conventional computers.
CATALOGING CATALOGING is the process of describing a document being entered into a collection (e.g., a library or DATA BASE management system).
The object of CATALOGING is to extract (or assign) the information necessary to access (find) the document without having to examine sequentially each document in the collection.
CATALOGING information may be used in INDICES of the collection.
(See HEADER)
CD-ROM (or Compact Disk - Read Only Memory)
Some OPTICAL DISK systems use disks which have had data written to the disk by special reproduction equipment, and can only be read by the computer system onto which they are installed. When such disks (or disk systems) are Compact Disk format, they are called CD-ROMs.
CD-WORM (or Compact Disk - Write Once, Read Many-times)
Some OPTICAL DISK systems can write to disks as well as read them.
Unlike magnetic disk storage devices, these systems can not erase and re-write information.
When such disks (or disk systems) are Compact Disk format, they are called CD-WORMs. To modify a FILE stored on such a system, the entire file (including the correction) must be re-written. The new and old versions are distinguished by VERSION NUMBERS.
CODING See CATALOGING.
CONTROLLED VOCABULARY List of terms or phrases which are maintained for continuity of spelling and usage, such as authors, addresses, organizational abbreviations, document types, subject terms. (Also known as authority list)
CHARACTER RECOGNITION ENGINE A device designed to convert a BIT MAP IMAGE of a document into an ASCII file is called a CHARACTER RECOGNITION ENGINE. Simple versions are designed to recognize specific character sets (font recognition devices) while more complex versions are programmed to recognize specific characters by their unique topology.
DATA BASE An organized body of information on a pre-determined topic is a DATA BASE.
Related DATA BASES can be logically or physically combined to constitute a larger and more detailed DATA BASE on a broader subject.
A DATA BASE can be envisioned as a set of file cabinets, containing completed forms of a given kind. Each completed form is called a RECORD, each question on the form is a FIELD, and each completed question is the contents of that FIELD.
DOCUMENT FILES A DOCUMENT FILE (or simply a "document", when this usage would not confuse the FILE with the physical document it represents) is the basic type of data stored in a computerized archive system such as the LSS. A DOCUMENT FILE is a TEXT FILE which contains the contents of a physical document; it may also contain a HEADER.
E-MAIL "Electronic Mail"; creation, storage and transmission of word processing documents from computer to computer.
FIELD A RECORD may be subdivided into FIELDS, just as a form can consist of a number of blanks into which information can be entered. The data to be entered in a FIELD is determined by the FIELD'S definition.
A completed set of FIELDS is called a RECORD.
Examples include author, date, title, abstract.
FILE A FILE is a unit of data storage. A FILE is identified by a FILENAME, and contains a collection of related data. These data need not be further organized (i.e., they may simply be a STRING of BYTES) or they may be subdivided further into named FIELDS.
FILENAME Each FILE stored on a computer system can be identified by a FILENAME.
Such a name is either unique to a FILE, or files with the same name can be distinguished by their location within the computer's FILE STRUCTURE, or by the VERSION NUMBER of the FILE.
FULL TEXT The version of the document as it resides on a computer system for display ("linear file" in retrieval terms).
FULL TEXT SEARCHING FULL TEXT SEARCHING is a computerized text processing technique which locates the occurrence of specific words or groups of words within a TEXT FILE.
Logical relationships can be specified by Boolean logic expressions and proximity expressions when stating the search condition (e.g., "Find places in the text where 'hot' and 'cold' occur within the same physical paragraph"). Software FULL TEXT SEARCHING techniques require INVERTED FILES while hardware techniques stream the entire portion of the DATA BASE being examined through a hardware comparator, and do not require such files.
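The "hot and cold in the same paragraph" example can be sketched in Python as follows. This is a minimal illustration of a software search using an inverted structure, not a description of any particular product: paragraphs are assumed to be separated by blank lines, and the function names are hypothetical.

```python
import re
from collections import defaultdict

def index_paragraphs(text):
    """Build a word -> set of paragraph numbers index (paragraphs split on blank lines)."""
    index = defaultdict(set)
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for number, paragraph in enumerate(paragraphs):
        for word in re.findall(r"[a-z0-9]+", paragraph.lower()):
            index[word].add(number)
    return index

def paragraphs_with_all(index, *words):
    """Boolean AND restricted to one paragraph: every word must occur in it."""
    sets = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []

text = (
    "The hot leg was inspected.\n\n"
    "Hot and cold water lines were tested.\n\n"
    "The cold trap is sealed."
)
index = index_paragraphs(text)
print(paragraphs_with_all(index, "hot", "cold"))  # [1]
```

Only the second paragraph (number 1, counting from zero) contains both words, so it is the single hit.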
HARD COPY A HARD COPY is a paper copy of a document. It can be the paper original, a photocopy or a telefax copy, for example.
HEADER A TEXT FILE in a computerized archive system such as the LSS generally contains the contents of a physical document, stored as ASCII codes of the text within that document.
In addition to this text, CATALOGING information can be appended to the beginning (or "head") of the document. Such a HEADER may contain a variety of information in FIELDS, which may be accessed directly by DATA BASE management software (for INDEXED SEARCHING) or may be accessed by FULL TEXT SEARCH software (either independently or along with the body of the text from the document).
Headers are also known as surrogates, document coding forms, DCFs, bibliographic citations and "identified" in the NRC consensus document on the rulemaking issues.
IMAGE An IMAGE of a page visually presents the information on that page.
This image is meaningful only to a human user, and can not be interpreted by computer programs. Examples of document images are photocopies, telefax copies, microfiche and BIT-MAP IMAGE FILES.
IMAGE COMPRESSION The number of BITS in an uncompressed IMAGE FILE of a page of text is equal to the area of the page times the RESOLUTION of the IMAGE (plus a few additional BITS required by all FILES). The amount of memory required to store this IMAGE can be reduced by IMAGE COMPRESSION techniques.
IMAGE FILE An IMAGE FILE is a computer FILE containing a BIT-MAP of a document IMAGE. The number of BITS in an uncompressed IMAGE FILE of a page of text is equal to the area of the page times the RESOLUTION of the IMAGE (plus a few additional BITS required by all FILES).
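The size rule stated above (page area times RESOLUTION, one BIT per PIXEL) can be worked through in a few lines of Python. The 8.5 by 11 inch page is an assumption for illustration; the 300 dots per inch and 20:1 compression figures are the ones used in the section 5.3 storage example, and the small per-file overhead is ignored.

```python
def uncompressed_bits(width_in, height_in, dots_per_inch):
    """Bits in a black-and-white bit-map: page area times pixels per square inch."""
    pixels_per_square_inch = dots_per_inch ** 2
    return round(width_in * height_in * pixels_per_square_inch)

def compressed_bytes(bits, compression_ratio):
    """Bytes needed after compression, ignoring any per-file overhead."""
    return bits / 8 / compression_ratio

bits = uncompressed_bits(8.5, 11, 300)   # assumed letter-size page
print(bits)                              # 8415000
print(compressed_bytes(bits, 20))        # 52593.75
```

The result, a little over 52,000 bytes, is in line with the estimate of 55,000 bytes per compressed page image used in section 5.3.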
INDEX (plural INDICES)
There are a variety of logical ways to physically arrange a collection of documents (e.g., alphabetically by author or by title, chronologically by date produced or entered into the collection).
Each of these ways is designed to help access (find) a document based on a specific strategy for finding it. Unfortunately, a collection cannot be organized simultaneously in each of these ways. In order to make each strategy possible, surrogate collections can be created which contain the key information (sorted appropriately) and the location of the document.
In libraries, these surrogate collections are the author catalog and subject catalog. Such DATA BASE surrogates constitute INDICES of the collection.
INDEXED SEARCH INDEXED SEARCHING, the conventional method used by DATA BASE management software to access data, searches INDICES constructed to support the specific type of queries. This is distinguished from FULL TEXT SEARCHING, which searches the TEXT FILE (or corresponding INVERTED FILE, in the case of FULL TEXT SEARCH software) that has not been otherwise organized for retrieval.
INVERTED FILE Software FULL TEXT SEARCH techniques do not directly search a TEXT FILE at the time the search request is made (as do word processing programs when searching for a STRING). Rather, the TEXT FILE is pre-processed to create a file containing the words in the TEXT FILE and pointers to their locations. The INVERTED FILE can be searched much faster than the original FILE since it has been pre-sorted.
KEYWORD Accessing documents in a collection can be facilitated by assigning KEYWORDS to the document (or a RECORD representing it in a DATA BASE) during CATALOGING.
KEYWORDS are words that describe the document's contents and are best assigned from a CONTROLLED VOCABULARY, preferably with the aid of a THESAURUS.
KEYWORD IN CONTEXT (KWIC)
Words in the FULL TEXT document, including words located before and after the keyword.
KEYWORDING A part of CATALOGING, KEYWORDING is the process of assigning KEYWORDS.
KEYWORDS are generally assigned from a CONTROLLED VOCABULARY, and are most useful when based upon a THESAURUS.
OCR (or Optical Character Recognition)
A device or process which converts HARD COPY text into an ASCII file by using a CHARACTER RECOGNITION ENGINE.
OPTICAL DISK An OPTICAL DISK is a computer data storage system, such as a CD-ROM or CD-WORM disk drive, which records BITS as the presence or absence of minute pits on a glass disk. The system is "optical" since laser light is used to write and read this data from the disk.
PIXEL An IMAGE can be represented by a large number of small spots (usually in rows and columns). These spots, which can be either black or white, are called PIXELS (from "picture elements").
PROTOTYPE In compiling the information necessary to design and build a large DATA BASE management system, a system PROTOTYPE can be used to estimate quantitative performance information about components of a larger system to be built, and can be used to quantify and evaluate the behavior and response of users to software while it is being developed. Such a PROTOTYPE consists of a hardware test environment in which specific components can be interfaced and evaluated, a software environment which can run a simulation (or simplified version) of software to be used in the complete system, and a test DATA BASE (representative of, but significantly smaller than, the final DATA BASE) which can be used to test user behavior, software and hardware performance and DATA BASE organization.
RECORD A RECORD is a group of one or more related FIELDS, containing data. A DATA BASE generally consists of a group of RECORDS, each containing a group of related data on the subject of the DATA BASE. These can be considered individual completed forms in a file cabinet which represents the DATA BASE.
RESOLUTION The RESOLUTION of a BIT MAP IMAGE is the number of PIXELS per unit area. If no IMAGE COMPRESSION has occurred, the number of BITS needed to store an IMAGE FILE is equal to the number of PIXELS in the IMAGE.
SCANNER A SCANNER is a device which converts HARD COPY text into a BIT-MAP IMAGE.
STRING A character STRING is a series of characters represented by their ASCII codes.
SUBJECT TERMS Words or phrases assigned to a document during subjective CATALOGING, to represent the overall concept presented by a document. SUBJECT TERMS are usually selected from a hierarchical CONTROLLED VOCABULARY list, such as the DOE Keyword Dictionary, and are assigned at the closest level of detail.
SYNONYM FILE One aspect of a THESAURUS is to identify words (or phrases) which have the same meaning (synonyms), and to select one which is used to represent and replace the others during KEYWORDING. A FILE containing such groups of related words is a SYNONYM FILE.
Such a FILE can be used with some sophisticated FULL TEXT SEARCH software, so that each synonym is found in a search if any of a group of synonyms from the FILE are sought.
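A SYNONYM FILE used this way can be pictured as a set of word groups consulted before the search is run. The Python sketch below is a hypothetical illustration: the groups and the `expand_query` function are invented for the example and do not come from any actual thesaurus or search product.

```python
# Hypothetical synonym groups; a real SYNONYM FILE would be derived from the THESAURUS.
SYNONYM_GROUPS = [
    {"canister", "cask", "container"},
    {"repository", "disposal site"},
]

def expand_query(term, groups=SYNONYM_GROUPS):
    """Return every term to search for when `term` is sought."""
    term = term.lower()
    for group in groups:
        if term in group:
            return sorted(group)
    return [term]  # no synonyms known: search for the term alone

print(expand_query("Cask"))     # ['canister', 'cask', 'container']
print(expand_query("license"))  # ['license']
```

Each expanded term would then be looked up in the inverted file, and the hits combined.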
TEXT FILE A TEXT FILE has its characters stored as ASCII codes, as opposed to IMAGE FILES where the shape of the character is stored in BIT-MAP form.
TEXT FILES in the LSS generally contain the text of documents in the system, and are therefore often referred to as DOCUMENT FILES (or simply, "documents", when this would not confuse them with physical documents).
THESAURUS A THESAURUS is a CONTROLLED VOCABULARY with embedded instructions and relationships which assist in assigning KEYWORDS or SUBJECT TERMS consistently and logically during CATALOGING. THESAURI can be used for developing a search strategy at a precise level of detail and may contain broader, narrower, and related terms (synonyms). Also called taxonomy and classification scheme.
VERSION NUMBER When FILES are modified in many computer systems, previous versions of the FILE are retained under the same FILENAME. To distinguish between versions, VERSION NUMBERS are assigned.
ATTENDANCE LIST
Meeting of the HLW Licensing Support System Advisory Committee
November 19-20, 1987
COMMITTEE MEMBERS (Including Spokesperson and Alternate)
Priscilla Attean, Penobscot Nation
Dennis Bachtel, Clark County, Nevada
Steve Bradhurst, Nye County, Nevada
Francis X. Cameron, Office of the General Counsel, U.S. Nuclear Regulatory Commission
Barbara Cerny, DOE
Don Christy, Nuclear Waste Office, State of Mississippi
Bill Clausen, State of Minnesota
Stan Echols, Office of the General Counsel, U.S. Department of Energy
Kevin Gover, Special Counsel, Nez Perce Nuclear Waste Program
Ronald T. Halfmoon, Nuclear Waste Program, Nez Perce Tribe
Robert Halstead, Radioactive Waste Review Board, State of Wisconsin
Alice Hector, Attorney for the Texas Nuclear Waste Task Force, Hector and Associates
ATTACHMENT 8
GLOSSARY OF TECHNICAL TERMS
The following represents an initial consensus on the definition of technical terms following the November meeting in Denver. It is not complete and will be enlarged as the participants request clarification. In some instances, the terms are somewhat specific to the HLW terminology already developed, rather than the most representative or precise definition in current "discovery" or "litigation support" glossaries.
Header: Technique of coding a document, process or materials by describing its parts, usually known as "fields":
  Bibliographic Header (simple coding): Document Number; Date; Author(s); Addressee(s); Copies Sent To; Title; Description (if title not clear); Document Type
  Enhanced Header (usually includes some subjective analysis of the content of a document): Abstract; Thesaurus, taxonomy; Subject Terms
  Additional case-specific Fields, e.g.: Docket; File Code; Contract Number; Report Number; Concurrence List
  Headers are also known as surrogates, DCF's, "coding forms", or bibliographic citations. The term "identified in the LSS" has been used in the NRC Position Paper to signify the use of a header.
Searchable Header: The information in the header after it has been indexed by a computer program and made available for searching on a computerized retrieval system
Hard Copy Document: The paper document or copy of it ("hard copy")
Image: The microfilm, microfiche or optical disk ("bit-mapped") version of the hard copy document
Full Text: The version of the document as it resides in a computer system for display ("linear file" in retrieval terms)
Searchable Full Text: All the words (except "stop" words) in the document after it has been indexed by a "full text" computer program and made available for searching on a computerized "full text" retrieval system ("inverted file" in retrieval terms)
Enhanced Full Text: Full text plus header or some additional way of describing a document
Keywords: Words in the searchable full text document; to avoid confusion, not used here to refer to a field in a header
Subject Terms: Words, terms and phrases created especially for a specific case or fact situation; usually included in an "enhanced header"
Fields: Parts which make up headers, e.g., author, title, date, abstract
OCR: Optical Character Reader; a device which converts hard copy text into computer-readable words
Optical Disk: A media (plastic disk) for storing large quantities of electronic data in the form of images, text or searchable words and phrases
CD-ROM: A form of optical disk commonly used for storage of electronic data
E-Mail: "Electronic Mail"; creation, storage and transmission of word processing documents from computer to computer
Record: e.g., hard copy document, geologic core sample, photograph, image, magnetic tape or disk
Department of Energy
Washington, DC 20585
SEP Z t 1989

Mr. Francis X. Cameron
Office of the LSS Administrator
U.S. Nuclear Regulatory Commission
Washington, D.C. 20555

Re: Your Letter of August 7, 1989
Comments on Prototype System Cataloging Manual
Dear Chip:
I have reviewed the above noted letter and its enclosure in some detail, and have forwarded them to SAIC for their consideration.
I will be happy to review the next version of NUDOCS header design, and I agree that we should cooperate on the coordination of header design efforts. However, I feel that a better defined effort than merely exchanging preliminary study documents, internal system design/redesign, etc., is needed.
We need to move toward a definition of LSS header record content. A focused work group should begin work on the development of header record designs so that all potential parties have a more definitive statement of the formats they should be moving toward.
This initiative should commence sooner rather than later.
Regarding the comments you have forwarded, I would like to address what seems to be a persistent tendency within NRC to assume that the processing protocols, headers, and other record fields utilized in the instrumented test bed processing may pre-determine the eventual LSS header design.
Likewise, there seems to be a tendency to perceive our test bed environment as having more objectives than, in fact, it does.
Your letter implies that the headers used for the instrumented test bed reflect, or will reflect, the failure in LSS "to ensure the completeness and the unique identification of this critical set of documents" by our treatment of the Document Type and Detailed Document Type fields for the instrumented test bed.
Let me assure you that the instrumented test bed header treatments are not pre-determinative of the LSS header formats.
The information management questions being addressed by the instrumented test bed can be summarized by the following:
How will the system be used?
What aids or hindrances are evidenced in our overall concept designs?
What are the effects of partitioning text?
How will header fields be utilized in conjunction with text search capabilities?
How will descriptors be used in full text search?
How effective are printed aids such as a thesaurus and a retrieval manual?
The instrumented test bed, by intent, does not have the validation or testing of the specific level of document type treatment as an objective, although such by-products will be duly considered.
I would also like to make a general observation which is meant to be a constructive one.
Your letter notes that NRC's upgrade of NUDOCS makes the continuing dialogue on the issues related to header design particularly important, and that you would like to ensure consistency between the LSS and NUDOCS headers.
I read this, in conjunction with the detailed comparisons with 'the way things are done in NUDOCS' that are found throughout the 11 pages of comments you have provided, and am left with the impression that NRC perceives the LSS to be simply a restatement of NUDOCS in a new hardware and software environment.
For example, take the question from the comments:
Should the LSS detailed document types "be mapped to NRC document type codes"?
Why not ask, rather, "To what degree will one be the subset of the other after we have met our design objectives?" It should not become a question of whose system drives the header design: records from all the participants need to be entered; the DOE collection will be preponderant in volume; NRC will be the critical user during the hearings; and, nothing in the LSS implementation should prevent the use of LSS as a records system by a given party.
The LSS is to serve multiple purposes which include its use as a surrogate for discovery, a tool to support motions practice, and, the Commission's docket and official record for the licensing proceeding.
We are now attempting to design an LSS which meets all of these objectives. Perhaps NUDOCS already meets most of these design requirements, but, it is my observation that NRC's existing methodologies, document type codes, detail document type, and other treatments are, in fact, constrained by NRC's existing hardware configurations and software capabilities -- as would be the LSS if it simply mirrored NUDOCS (or ARS, for that matter).
The point is that these are already dated technologies to some extent, whereas we have a unique opportunity to let our required functionality drive the hardware and software we procure (rather than having to build an application using whatever computers and software happen to be available). Decisions about header design should be made in light of the LSS' unique objectives and what the LSS will allow us to do with the 'tabula rasa' of new technology.
During design stages it is important to remember that the LSS does not have to inherit the baggage of DOE, NRC, and other parties' existing systems' limitations, be they hardware, software, or limitations inherent in a system designed for other purposes.
At the same time, we recognize that products already developed, as represented by existing systems, can and must be used in building the LSS data base.
I am suggesting that it is more important to determine what fields, field contents, and field formats are necessary to support the organization, search, and retrieval of a record in the LSS header and text environment.
We need to do this with the intent of fully utilizing and maximizing the retrieval software's capabilities, as much as they may be anticipated. If we provide this sort of definition to the potential parties, each can begin the process of moving toward the acceptable LSS header record format with minimal rework being necessary at a later time.
A review of existing systems, such as your NUDOCS redesign effort, is useful in that it may provide a checklist of items that need to be addressed and is a source for lessons learned.
On the other hand, close scrutiny of cataloging procedures used for our instrumented test bed is premature since the LSS header record formats are not as yet defined.
The prototype cataloging procedures are not even a worthwhile point of departure for such a definition because the test bed environment does not attempt to define the anticipated LSS hardware or software environment -- it only emulates anticipated functionality in its study of the attributes which affect that environment.
I hope that these observations will be helpful in our mutual efforts to maintain the perspective of what our LSS design efforts should be based upon.
We look forward to participating in the initiative where developing the LSS headers needed to meet LSS functionality is the primary design objective.
Sincerely,

Daniel J. Graser
Program Analyst
Information Resources Management Division
Office of Civilian Radioactive Waste Management

cc:
B. Cerny, RW-14
UNITED STATES NUCLEAR REGULATORY COMMISSION
WASHINGTON, D.C. 20555

August 7, 1989

Mr. Daniel J. Graser
Information Resources Management
Office of Civilian Radioactive Waste Management
U.S. Department of Energy
Forrestal Building
1000 Independence Avenue
Washington, D.C. 20006
Dear Mr. Graser:
As part of the NRC efforts to review the design of the LSS, I am enclosing NRC comments on the SAIC reports "LSS Prototype Header Design," and "LSS Prototype Cataloging Manual." Although these reports focus on the LSS Prototype, our comments will need to be considered in establishing the header design and cataloging manual for the final LSS.
Most importantly, the manner in which the NRC adjudicatory record has been incorporated into the Document Type and Detailed Document Type fields fails to ensure the completeness and the unique identification of this critical set of documents.
In this regard, we would be interested in discussing the resolution of our comments and questions at your convenience.
A continuing dialogue on the issues related to header design is particularly important in light of the NRC upgrade of its document control system (NUDOCS). Part of the upgrade process is a re-evaluation of the headers, indexing manuals, and authority files for NUDOCS.
We would like to ensure consistency between the LSS and NUDOCS headers and would encourage coordination of these header design efforts. In this regard, we would invite you and your contractor to evaluate the next version of the NUDOCS header design which will be ready for review in October 1989.
If I can provide any further information on our comments, please feel free to contact me.
Sincerely,

Francis X. Cameron
Chairman, LSS Internal Steering Committee
Enclosure:
As stated
7/26/89
COMMENTS ON SAIC REPORTS ENTITLED "LICENSING SUPPORT SYSTEM PROTOTYPE HEADER DESIGN" March 7, 1989 version and "LICENSING SUPPORT SYSTEM PROTOTYPE CATALOGING MANUAL" March 14, 1989 version
I. GENERAL COMMENTS AND QUESTIONS:
1. The Cataloging Manual (CM) states that 120,000 pages of documents will be captured.
We understand that this represents about 2,600 documents including the SCP, its references, some of the "administrative record" and some handwritten notes. We are concerned that these documents are not a representative sample of the document types that will populate the system later on.
At a minimum, this will affect the validation of the Document Type authority files.
Also and more important, it will limit the ability of the various classes of searchers to fully evaluate the prototype in the "test phase".
In what areas do you expect the header might change as the true makeup of the database evolves?
2. What is the source or basis for some of the specific format requirements in the Cataloging Manual?
Is it patterned after any existing system, such as the DOE's ARS?
NRC has provided SAIC with the NRC's NUDOCS header record layout, indexing manuals, and authority files.
What, if any, are the reasons why some of the NRC conventions (such as Document Type structure and Affiliation codes) were not adopted?
3. How are numeric and alpha-numeric fields structured so as to allow for sorting and listing? Will indexers have to "zero-fill" or will the software justify appropriately?
4. What procedures are envisioned for the modification and update of the authority files based on submitter's suggestions and needs?
5. There needs to be much more discussion internally within NRC and between the parties about the following issues:
A. One issue that is not addressed to any degree in the header design document is the extent to which the submitter's authors or authoring offices, as opposed to the submitter's catalogers, will complete portions of the header, specifically the title and/or abstract.
From the experience with NRC's NUDOCS, it may be better for the submitting office to at least "propose" as much of the subjective information as possible in order to limit the number of errors or misrepresentations committed by the catalogers unfamiliar with the context or the subject matter.
From the description of the bibliographic fields, it appears that most can be completed by the submitting office with party's catalogers performing review functions for quality control and for format or classification consistency.
There are cost/benefit issues to debate.
B. Abstracting
- the need and purpose of an Abstract
- for all documents or
- for just selected documents by type
- if so, which types of documents.
- who, personally and organizationally, will prepare the abstract, depending on document type.
- When will the abstract be prepared.
- Any differing considerations on above issues between "backfit" phase versus the "real-time" phase when the timeliness of entry requirement will compete with the requirement for the quality of indexing for long-term retrieval.
C. More work must be done on the Document Type classification scheme. See fields 5 and 6 for more information.
6. Section 3.1 of CM.
Windows and pull-down menus are high-tech.
They will certainly help new indexers and eliminate inconsistent entries.
However, experienced indexers may want faster entry.
Will it be possible for such authorized indexers to bypass windows and enter directly?
Entries in specific fields could then be automatically checked against authority files at "end" before record is "closed out".
7. In Section 3.1.3 of CM, the Query function as discussed seems cumbersome and unsophisticated.
Can it handle multi-parameter searches? If so, how will it handle embedded Boolean statements within statements?
How does it differentiate between ((A and B) or C) versus (A and (B or C))?
Is it thought that the prototype cataloguers did not need such sophisticated search capabilities?
In the full system, both cataloguers and searchers will need such a capability.
8. Section 3.2.1 of CM implies that documents come to the station with LSS Accession numbers already assigned.
What are the pre-indexing procedures and rules?
Who defines and determines the "cataloging units"? When and how are accession numbers assigned?
Who searches for duplicates, and how?
9. In Section 3.2.3 of CM, "Deleting a Record", it is stated that the phrase "Delete Number?" will appear before an entire record is deleted.
This could be confusing to the cataloger in that he or she may assume that only the LSS accession number will be deleted rather than the entire record.
Also, is it possible to archive these "deletions" at least temporarily instead of erasing them so that they can be recovered if needed?
10. In Section 3.2.4 of CM, "Using Query," on pages 17 and 19 of CM, the method of performing a search is described.
It is assumed that the described method is only for the use of catalogers or other individuals who have extensive experience with the LSS.
The search software for most LSS users must be much more helpful.
11. Section 3.3.2 of CM -- How are "batches" defined?
What if more than one batch is done in a day? Or if one batch spans more than one day?
The command "After what date (YYMMDD)" does not seem to allow for this.
12. Section 4 of CM - Quality Control -- there definitely should be more than one level of QC.
Also, the initials of the QC persons should be carried on the data record.
Each submitting party will have their own Quality Control procedures.
However, QC should be given a lot of attention and the responsibility should be a major line function, not relegated to a committee.
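Comment 7 above asks how the Query function distinguishes ((A and B) or C) from (A and (B or C)). A small sketch shows why the grouping matters. It assumes, purely for illustration, that each search term maps to a set of record numbers and that a query is written as a nested tuple; the sample index and the function name are hypothetical and do not describe the prototype's software.

```python
def evaluate(query, index):
    """Evaluate a nested Boolean query against a term -> record-set index.

    A query is either a term (str) or a tuple (op, left, right),
    where op is "and" or "or". The nesting carries the grouping.
    """
    if isinstance(query, str):
        return set(index.get(query, set()))
    op, left, right = query
    left_hits = evaluate(left, index)
    right_hits = evaluate(right, index)
    if op == "and":
        return left_hits & right_hits
    if op == "or":
        return left_hits | right_hits
    raise ValueError("unknown operator: " + repr(op))


# Hypothetical index: which records contain each term.
index = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}

grouped_left = evaluate(("or", ("and", "A", "B"), "C"), index)   # ((A and B) or C)
grouped_right = evaluate(("and", "A", ("or", "B", "C")), index)  # (A and (B or C))
```

In this sample, the first grouping returns records 2, 3 and 5, while the second returns only records 2 and 3, so a query language that cannot express the grouping cannot tell the two requests apart.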
II. COMMENTS ON SAIC PROPOSED FIELDS AND ASSOCIATED CATALOGING RULES:
Field #1, LSS Accession Number:
NRC places their Accession Number in the lower left corner.
It would be of interest to know if there was a reason for your decision to place the LSS number in the upper right corner.
Many NRC documents have notations in the upper right corner.
One alternative placement would be the lower right corner, although some organizations place page revision numbers there.
Another alternative would be to place the number vertically in the middle of the left margin.
We are confused as to how the "Package" header will differ from the header of the "parent" document.
Or will the parent records just carry two Accession Numbers?
In the NRC systems, the Accession Number of the Parent or Mother is carried on the data record of all the "children" and the Parent document carries a flag to denote the existence of "children".
How will your method affect the hit counts, the sorting, and printouts? More explanation and some examples are needed here.
Field #2, Title/Subject:
The "subject line" on correspondence is usually a very broad characterization with little thought given toward long term retrieval or distinguishing it from other documents.
It is acknowledged that brief abstracts (NRC NUDOCS has 4 lines) prepared by catalogers are time consuming and not always the best.
It is also acknowledged that with the full-text of documents available for on-line searchers, this short abstract may not be critical for search and retrieval purposes.
However, for the purpose of listings, bibliographies, announcements, court certifications, and for scanning large "hitlists" to determine the relevant documents for further review, something more than the "subject line" will be required.
Remember, not all end-users of the LSS will be on-line.
Also, most letters do not have a "subject line" like Memoranda.
Maybe this is the purpose of the LSS "Abstract" Field #22.
If so, it is not clearly stated.
On page 5 of SAIC LSS Prototype Header Design Report (HD), the last two lines of the discussion of Field 2, Title, state that the title of an encompassing work will be in the "Bibliographic Citation" field.
It is not clear to which of the fields this statement is referring.
Cataloging Rules:
What is the proposed length of this field?
Why are the format rules so very specific?
Given the number of varied catalogers from different parties over time, will it not be a real burden on the indexers to follow these strict professional-type cataloging rules and on LSS staff to assure compliance and consistency?
For what end? If it is to do sorting (and filing) by title alphabetically, then couldn't some software routine be written to ignore preceding articles?
p. 27 of CM, 2nd paragraph -- there should be more explicit rules about what to cover in title descriptions.
Phrases like "meaningful" and "reflecting the content" are too vague.
The NRC system has more specific rules on what aspects of the document content should be covered varying by document type.
While they may not be perfect, at least they should be reviewed.
Shouldn't the same convention as with Abstract Field be used to denote actual wording of the Title versus indexer-composed description?
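The question above about a software routine that ignores preceding articles when sorting titles can be illustrated with a short sketch. The article list, the sample titles, and the function names are assumptions for illustration only; they are not taken from the Cataloging Manual. The stored title is left unchanged and only the sort key drops the leading article.

```python
# Assumed list of English leading articles to ignore when filing by title.
LEADING_ARTICLES = ("a", "an", "the")


def title_sort_key(title):
    """Return a lower-cased sort key with one leading article removed."""
    words = title.strip().split()
    if len(words) > 1 and words[0].lower() in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words).lower()


def sort_titles(titles):
    """Sort titles alphabetically, ignoring a leading article."""
    return sorted(titles, key=title_sort_key)


# Hypothetical titles, entered exactly as they appear on the documents.
titles = [
    "The Site Characterization Plan",
    "An Overview of Headers",
    "Cataloging Manual",
    "A Thesaurus Guide",
]
filed = sort_titles(titles)
```

Under this approach catalogers could enter titles as written, and the filing order would be handled by the software rather than by format rules imposed on the indexer.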
Field #22 Abstract Field is discussed here due to its interrelationship with the Title/Subject Field.
It is unclear in the SAIC Prototype reports which documents will be abstracted.
The CM states that a "brief description on the content" will be entered.
In the final LSS design, much more must be decided and said about the Type of Abstract required or accepted.
Also, what is the proposed length of this "brief" description?
Field #3, LSS Pointer
p. 30 of CM, 2nd paragraph of "instructions" -- how will the cataloger know that a "revision" is already in the system?
Can't or shouldn't that be caught in the pre-indexing review (duplicate check)?
Using this field and maybe others, how will marked up copies of the same document be handled, i.e. reviewers' handwritten comments and editing on a report or "pen & ink" changes?
How will such a document be indexed?
Also, how will a cover letter forwarding various/selected replacement pages to a previously-submitted document be handled?
How will drafts, revisions, errata, etc. be linked together?
The listed codes in the "controlled vocabularies" do not seem to cover such a case.
This information must be captured somehow and the search software must utilize it to notify searchers passively that previous or later versions and/or errata exist and are on the system.
Field #5, Document Type
This should be a repeating field!
Is this and the "detailed document type" scheme already in use at DOE?
As you know, NRC has their own scheme based on their own terminology.
Somehow a mutual interagency list should be devised.
There is a major concern regarding the instruction for completion of this field on page 34 of the CM.
There, it is stated that the cataloger will select the first (and no other) document type that matches the form of the document from the provided list.
The eighth document type in that list is Legal Materials which (as discussed in the description of Field 6, Detailed Document Type, on pages 36 to 40) includes those documents associated with the NRC adjudicatory record.
Under the current instructions for the catalogers, a document that is part of the adjudicatory record may not be identified as such if the cataloger finds a document type in the list that matches the document prior to reaching the Legal Material document type.
This is a serious problem and must be corrected as soon as possible.
THIS FIELD CAN NOT BE USED AS THE DELIMITER FOR THE HLW ADJUDICATORY FILES!!
See Section III for proposed new field.
A "legal" document in some other proceeding may be submitted to LSS.
It should get a "legal" document type BUT may or may not be part of the HLW adjudicatory record.
Any non-legal document type, i.e. drawing, journal article, letter, etc., at first may be entered as they are.
Then later that document becomes an exhibit. It then must also carry the Legal document type while keeping its original document type code.
Field #6, Detailed Document Type
This also should be a repeating field.
Some of the elements here are not mutually exclusive within one document.
These codes should be mapped to NRC document type codes.
If for no other reason than to test clarity of both systems.
Specifically, more work must be done on the legal document types.
Field #7, Document Date
How will transcripts and minutes of meetings spanning multiple days be coded?
Field #8, Document/Report Number
Is this field only for numbers of the specific document being cataloged, i.e. (1) contract number for actual contract and amendments, not reports done under that contract; (2) the USGS or NUREG report and revisions, not other documents, memos, letters about the USGS or NUREG report?
The description appears this way.
Assuming this is true, how will documents commenting on or "about" such reports be coded?
What is the purpose of the preceding alpha codes listed on p. 45 of the CM?
This appears redundant to the Document Type Code.
Does this not put a burden on the searcher to know what kind of number he/she has been given to search?
If final retrieval software has a "wildcard" character, then this problem could be eliminated, but I think the classification is not justified.
Rule 7 on page 47 of the CM states that common abbreviations should be used where possible.
While this suggestion is acceptable in theory, the examples provided are not common to all LSS users.
It may be best to refrain from using abbreviations except where their meaning is obvious and unambiguous.
NUDOCS has attempted to keep an authority file of accepted abbreviations and it is not always up to date or used.
The problem will get much worse given the multiple parties contributing to the LSS over a long period of time.
Field #9, Edition, Version/Revision
This field will require more detailed instructions to handle:
selected pages submitted as Amendment 9 of looseleaf document such as this indexing manual or the application
versus
whole indexing manual or application including revised interfiled pages thru Amendment 9
Shouldn't this field be linked to occurrences in the previous field?
Might there not be cases where Rev. 6 to a Sandia report then becomes NRC NUREG - ####, which is later supplemented 6 times?
Sad but true.
In reference to the use of this field "for describing computer codes and code manuals", more explanation is required.
As one reviewer of these reports stated, "I think I know what this means, but surely this needs to be spelled out better so we are all singing from the same hymnal."
Field #10 Author Name Should concurrences, either by name or organization, be picked up if they appear on the document?
Field #11, Author Organization
How will the authority file rules handle organizational name changes, subsidiaries, reorganizations, etc.?
Must develop rules for authors who write in two or more capacities, i.e.
letterhead says ACME utility, but author is writing as the head of the utility owners group.
OR lawyer works for DEWY, CHEATEM & HOWE but is representing EXXON.
OR NMSS staff chairing inter-agency or intra-agency review group?
How handled? -- will you pick up both?
In the NRC system, the Affiliations (and the Document Type Codes) have a hierarchical scheme to classify document authors and recipients.
NRC AFFILIATION SCHEME
First level - E for external vs N for internal. This would not be appropriate in this system.
Second level - type of organization, i.e.
SG = state government
UT = utility
LO = local government
US = Federal agency
LG = legal firm
MV = manufacturer or vendor
etc.
DOCUMENT TYPE CODE SCHEME
CLUTN = correspondence/letter/utility to NRC
TRUTIN = text/report/utility inspection report
POINT:
These codes can be very powerful in searching, especially along with the Boolean "Not Equal" to narrow scope of searches to their essence.
Many times after searching known parameters, the resultant hitlist is still too large to be useful.
At this point, the searcher may not know what he/she wants or be able to positively select a narrowing concept, but he/she knows what he/she does not want.
Then, using the 1st (and second) level Affiliation codes (and/or Document Type codes) truncated, he/she can exclude classes of documents by type of author, by type of recipient or by type of document.
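The narrowing technique described above, excluding whole classes of documents by a truncated code, can be sketched as a prefix filter over a hitlist. The record layout, the field name doc_type, and the function name are assumptions for illustration; only the two document type codes CLUTN and TRUTIN come from the text, and the sketch does not represent the NUDOCS search software.

```python
def exclude_by_prefix(records, field, prefixes):
    """Drop records whose code in `field` starts with any given prefix.

    This mimics a Boolean "Not Equal" applied to a truncated code,
    e.g. excluding every document type that begins with "CL".
    """
    prefixes = tuple(prefixes)
    return [r for r in records if not r.get(field, "").startswith(prefixes)]


# Hypothetical hitlist left over after searching the known parameters.
hitlist = [
    {"accession": "0001", "doc_type": "CLUTN"},   # correspondence/letter/utility to NRC
    {"accession": "0002", "doc_type": "TRUTIN"},  # text/report/utility inspection report
    {"accession": "0003", "doc_type": "CLUTN"},
]

# The searcher knows correspondence is not wanted, so "CL..." is excluded.
narrowed = exclude_by_prefix(hitlist, "doc_type", ["CL"])
```

The same filter could be applied to an affiliation field, which is the case the comment makes for hierarchical codes: one truncated prefix removes an entire class of authors, recipients, or document types from the hitlist.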
Field #12 Recipient Name Proposed instructions state that this field would include attendees at a meeting as recipients.
Some meetings may have a long attached attendance list and it may not be feasible or beneficial to list all of them in this header field.
More specific rules must be developed to narrow the scope and intent of this data capture.
Consideration: if a smaller number of attendees (i.e. less than twelve) are listed at the first of meeting minutes or meeting summary and it was a "participatory"-type meeting, then such persons should be captured.
In this case, one could argue that such persons are more "authors" than "recipients".
Better yet, have another field for "attendees".
The requirement to complete this different field could be triggered for all records having certain document types.
Field #17, Publication Data
Instructions state that an entry is required.
From the description of this field, however, it is not clear whether an entry will be appropriate in all instances.
Field #18, Subject Term Please provide more information as to the intent of this field.
The broad nature of the terms may cause this field to be of little value in searching for particular documents.
Is it to be used to segment the database?
If so, there may be problems because many documents may address several of the listed terms such that the submitter and cataloger would have difficulty in assigning a single term to a document.
Also the searchers may take issue with the view of the cataloger. It will be hard to make the segments mutually exclusive by the subject scheme. Page 63 of the CM states that this field will not be used in the Prototype.
Therefore, it will be impossible to test the usefulness of this item.
Field #21, Special Class
More discussion is required on this field because it appears that this field and the Document Type fields are being used in combination to "segment" the Adjudicatory Record file for the adjudicatory Boards.
In this field or in the "Project" field, documents related to rulemakings and documents referenced/cited in other documents should be captured.
In the proposed list of "special classes", it is not clear what documents will be encompassed by the following terms:
EA-AR (Part of the Environmental Assessment Administrative Record), LA-AR (Part of the License Application Administrative Record), and Lit (Part of EA Siting Litigation).
Are these DOE-specific classes? If not, it would be difficult for others to assign such codes.
Other parties will have their own "special" codes.
The LSS Administrator will have to maintain authority list.
The description of NRC evidence in the special class list should be revised to read "Unit is evidence in an adjudicatory proceeding" because evidence may be oral or written.
Field #22, Abstract
See comments in section on the Title/Subject field (#2).
Field #25, QA Level Code
Please provide more information on the scope and usage of this field.
Field #27, Page Count
How will the page count for package records be handled? This has been a sticky issue in the NRC's NUDOCS, especially when an enclosure in the new "package" is a document already indexed earlier and therefore is a "duplicate" which must be tagged to this new package for completeness.
III. PROPOSED ADDITIONS.
The following elements of information were not included as separate fields but may be of value in performing search tasks:
- Date docketed
- HLW adjudicatory document "tag" -- see comments on fields #5, #6 and #21 for more information.
- Concurrence Names
- Reference Affiliation/Organization (use same Controlled Vocabulary as used for Author or Recipient Organization.)
- Referenced Documents and/or regulations (parts of CFR)
- Event Date -- dates of meetings, inspections, "incidents"
- Alternate availability -- other sources of same document, i.e. NTIS, GPO, ORNL and/or location and contact of core samples, data tapes, maps, travel vouchers, etc.
In addition, there are certain elements of information that are captured in more generic fields which might warrant their own specific field:
- Witnesses & Speakers (currently in "author field")
- Attendees (currently in "recipient field". If kept as part of more generic field, I could debate that attendees should go in "author field", especially for small meetings - less than ten people.)
- Contract numbers (currently in the "Report field")
There is an argument which states that some of the above listed information could be found by searching the full-text.
Also, some of this information could be loaded into more general fields.
However, consistency of capture and format would argue for a specific field.
The existence of such fields would trigger indexers to capture the information in a standard format.
This would relieve the burden on the searchers.
A paragraph or so explaining and justifying the exclusion of such data capture and alternate retrieval methods should be provided.
It would be helpful to those of us who follow (advisory committees) and wonder "why not?".
Further, weren't there other fields proposed or discussed during the negotiations?
If so, what was their disposition?
Those fields that were considered but not included should be listed and discussed somewhere.
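The argument above for dedicated fields can be illustrated with a minimal sketch. The record layout, field names, and sample entries below are hypothetical and are not taken from the LSS header design; the sketch only shows how a dedicated attendee field lets a searcher separate attendees from recipients without relying on full-text search.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class HeaderRecord:
    """Hypothetical header record; every field name here is illustrative."""
    title: str
    authors: List[str] = field(default_factory=list)
    recipients: List[str] = field(default_factory=list)
    # Dedicated fields of the kind proposed in the comments
    witnesses_speakers: List[str] = field(default_factory=list)
    attendees: List[str] = field(default_factory=list)
    contract_numbers: List[str] = field(default_factory=list)
    event_date: Optional[date] = None


def find_by_attendee(records, name):
    """Return records whose dedicated attendee field contains `name`."""
    wanted = name.strip().lower()
    return [r for r in records
            if wanted in (a.strip().lower() for a in r.attendees)]


# Made-up sample records, for illustration only
records = [
    HeaderRecord(title="Site visit summary",
                 attendees=["Smith, J.", "Jones, A."],
                 event_date=date(1990, 3, 21)),
    HeaderRecord(title="Transmittal letter", recipients=["Smith, J."]),
]
matches = find_by_attendee(records, "smith, j.")
```

If attendees were stored in the generic recipient field instead, the same query would also return the transmittal letter, which is the burden on searchers that the comments describe.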
[Organization chart: Office of the Assistant Commissioner for Information Systems. Legible units:]
- Assistant Commissioner for Information Systems
- Deputy Assistant Commissioner
- APS Contracts Program Staff
- Support Staff
- Directorate for Automated TM & Admin. Systems (R. Rihn, Acting): Office of Automated Trademark Systems; Office of Admin. Systems and Microcomputer Applications
- Directorate for System Engineering and Evaluation: Office of Systems Engineering and Communications; Office of Systems Test and Evaluation; Office of Technical Review and Evaluation
- Directorate for Computer Operations: Office of Operating Systems Support; Office of Computer Operations
Patent Operations Weekly Work Volumes
Incoming Application 2900 Filing Receipts 2900 Incoming Correspondence 100,000 Outgoing Correspondence (ACTIONS) 7132 246,000 Pending Files 2200 Grants Foreign Documents 7000 Non-Patent Literature 578 Archives Gazette 400
Document Data Base (Search File)
- 29,900,000 Documents
- 237,102,729 Pages
- 3% Reclassified Annually
- Organized Into: 406 Classes, 122,045 Subclasses
- Average Subclass: 127 U.S., 108 Foreign, 9 Non Patent Literature
- U.S. Patents: 5.5 Text Pages (30 K Characters), 2.6 Drawing Pages
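As a rough consistency check on the search-file figures above, the short calculation below multiplies the number of subclasses by the average subclass size and compares pages per document with the stated U.S. patent page counts. It is only a sketch: it assumes the per-subclass averages can simply be summed, and it ignores any documents filed in more than one subclass.

```python
# Figures copied from the slide
documents = 29_900_000
pages = 237_102_729
subclasses = 122_045
avg_us, avg_foreign, avg_npl = 127, 108, 9
text_pages, drawing_pages = 5.5, 2.6

# Average documents per subclass, summing the three categories
avg_per_subclass = avg_us + avg_foreign + avg_npl

# Documents implied by subclasses x average subclass size
implied_documents = subclasses * avg_per_subclass

# Average pages per document across the whole search file
pages_per_document = pages / documents

# Average pages in a U.S. patent, text plus drawings
us_patent_pages = text_pages + drawing_pages

print(f"implied documents:  {implied_documents:,}")
print(f"pages per document: {pages_per_document:.2f}")
print(f"U.S. patent pages:  {us_patent_pages:.1f}")
```

The implied total comes out within about half a percent of the stated 29.9 million documents, and the overall pages-per-document figure is close to the page count given for a U.S. patent, so the slide's numbers appear to be mutually consistent.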
Figure 2. Patent Examiners and Applications Filed, 1988-1995 [chart not reproduced]
PTO's Automation Objectives
- Provide Automated Searching Services to Patent and Trademark Examiners
- Create Electronic Data Bases Containing U.S. and Foreign Patents and U.S. Trademarks
- Broad Dissemination of Patent Information in Electronic Form
PTO's Automation Objectives (continued)
- Permit Filing of Applications in Electronic Form
- Enhance all Patent and Trademark Processes through Automation
System Characteristics
- Large Mainframes for Text Search
- Sophisticated Workstations for Digital Image Searching
- Massive Data Base on Optical Disks
- High Speed Communications Network
Automated Patent System General Concept
Backfile Processing / On-Going Processing
- Character Recognition
- Digitization
- Keying
- Archiving
Computers and Local Communications
Electronic Data Bases
Electronic Workstation
- Application Review
- Search & Retrieval
- Office Actions
- MIS Reports
Photo Comp Input
Camera Copy
APS Development Strategy
- Production System
- Operational Testbed - Group 220
- Group 220 Composition: Small Number of Examiners; All Technologies (Electrical, Mechanical, Chemical)
- Long-Term Optional Quantity Contracts for Deployment
- Modular Architecture to Allow for Technology Enhancements
- Conversion of Complete U.S. Data Bases
- Exchanges - European and Japanese Data Bases
Fully Deployed APS
- Number of U.S. Patents: 5 million
- Number of Foreign Patents: 7-10 million
- Total Optical Data Base Size: 32 terabytes
- Number of Image Workstations: 1000
- Projected Capacity of Communications Network: 400-500 megabits/second
Database Development for APS
Full Text of U.S. Patents from Printing Process
Capture Digitized Images of U.S. Patents
- Scan Patents at 300 DPI
- Write to Optical Discs
- Image Capture Complete in First Quarter of FY-19
- Display Images at 150 DPI
Images of Foreign Patents
- Develop Trilateral Image Standards
- Via Trilateral Agreement with European Patent Office and Japanese Patent Office - Exchange Images
Load Images and Install Discs on APS as Needed
Access to Commercial Data Bases
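To give a sense of scale for scanning at 300 DPI, the sketch below estimates the uncompressed size of one page image. It assumes an 8.5 x 11 inch page and a one-bit-per-pixel (black and white) scan; neither assumption comes from the slides, and images held in compressed form on the optical discs would be smaller.

```python
def bitonal_page_bytes(dpi, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of a 1-bit-per-pixel page scan.

    The page dimensions are an assumption (U.S. letter size).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px // 8  # 8 pixels per byte


scan_bytes = bitonal_page_bytes(300)   # at the stated scan resolution
half_bytes = bitonal_page_bytes(150)   # at half that resolution

print(f"300 DPI page: {scan_bytes:,} bytes uncompressed")
print(f"150 DPI page: {half_bytes:,} bytes uncompressed")
```

Under these assumptions a 300 DPI page is roughly one megabyte before compression, and halving the resolution cuts the pixel count to about a quarter, which is why displaying at a lower resolution than the scan is much cheaper.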
Automated Patent System Searchable Databases
Text Search
- English Language Japanese Abstracts
- Full Text of U.S. Patents Issued Since 1975 to date
- English Language European Abstracts
Image Search
- Images of Patents from the European Patent Office Search File
- Images of Japanese Patents
- Images of selected Non-Patent Literature Documents
- Images of U.S. Patents Issued since 1790
System Architecture for Automated Patent System
- Mainframe (NAS9080), two shown
- Peripherals
- Host-to-Network Interface (AUSCOM)
- Digital PBX LAN (INTECOM)
- Office Automation Processors, UNIX Based (FTC-MASSCOMP)
- Automation File Servers
- Rapid-Access Digital Document Storage: Optical Disk Single Drives (Oracle Sun/Optimem)
- High Density Storage Devices (Sony)
- High Speed Printers (Oracle)
- Workstations: Text Terminal (FTC-Mad); High Resolution Workstation (Oracle)
Note: Component Connection Using IEEE 802.3 Standard Interface (CMC)
BENEFITS
- High Quality Patents (More Comprehensive Search)
- Ability to Meet Ever Increasing Workloads
- Dissemination of Technology to the Public
- Access to Comprehensive Database of Foreign Patents
GROUP 220 STATUS
Group 220 examiners like the system and are using it full time operationally.
Examiners are using the system with advanced and sophisticated search strategies.
We believe the system to be productivity-neutral at this time - with improved quality.
Release #4 (requested by users) provided improved functional capabilities and up to 30% improvement in systems performance.
New display screens made by Techtronics are currently being tested in the Group cluster room.
Public Search Room
- Four APS Text Terminals Installed
- Over 600 Public Users Trained
- Public Use of the System is High
- Public User Fee is $40/Hour
- Image Workstations will be Added Soon
BOTTOM LINE
Text Search Is Operational and Deployed. Modest Evolutionary Enhancements Continuing.
Image Search Software Is Mature (i.e., Near End of Development. User Requirements for Additional Enhancements Identified and Programmed for Implementation). (Release #5 and #6)
Hardware Improvements Identified and Scheduled for Reprocurement.
Accelerated Deployment of Image Search (from the Schedule in the October 1988 Plan) Is Achievable and Can be Justified.
Need to Reinitiate Developmental Stages of the Electronic File Wrapper, PALM, Patent Copy Sales, Classification Data Systems and Photocomposition.
- Expand Test Bed to Second Examining Group
- Load Images of All U.S. Patents on Optical Disks
- Expand Text Search Data Base
LSS ADVISORY REVIEW PANEL - PLANNING AGENDA
[Timeline chart covering 1990, 1991, 1992 and 1993; column alignment is not recoverable. Legible entries are listed in chart order.]
LSSARP meetings: First Meeting; Second Meeting; October (Tentative)
Agenda:
- Review of Revised Topical Guidelines
- Review of ARP Subcommittee Recommendation on Header
- Review of SAIC Design Documentation
- Discussion of Priority Documents Production Schedule
- Presentation on Access to Technical Data
- Presentation on Compliance Evaluation Program
LSS/HLW Milestones (months, in chart order): February, March, August, September, January, August, April, November, January
LSS/HLW Milestones (entries, in chart order): SAIC LSS Prototype Report; SAIC Capture System Design Document; SAIC Image System Design Document; SAIC Search System Design Document; Surface Investigations Begin; Final RFP for LSS Contract; Award LSS Contract; Exploratory Shaft; First Station Operational
Other entries: SAIC Controlled Vocabulary Thesaurus (Draft); SAIC LSS Workstation Hardware Configuration Design
U.S. Patent and Trademark Office, Office of Information Systems - Automation
Automation in the Patent & Trademark Office
In 1980 the Patent & Trademark Office (PTO) began its current automation efforts by congressional mandate through Public Law 96-517, section 9, whereby the Commissioner was charged with preparing a plan to fully automate the operations of the agency.
In preparing, the PTO identified its current systems as cornerstones for the future systems. A comprehensive plan was drawn up to cover all operations of the agency.
In 1982, the PTO submitted to the Congress a plan to improve the quality of patents and trademarks through automation. Congress approved the plan's concepts and instructed the office to go ahead with the implementation of its plan.
TRADEMARK AUTOMATION
Since 1982, the entire Trademark Examination Operation has been automated and is now using a search and retrieval system with a data base of over 600,000 active Trademarks. Thirty-five percent of these trademarks contain picture images, stored electronically as digitized images, of the design elements in one data base, and 100 percent of the textual information in ASCII form stored in another. Text and Image searches are undertaken by trademark examining attorneys to accomplish the examination of applications for Trademark Registration. This search system is called T-Search. The T-Search software is a modified version of a commercial software package called ORBIT available from Maxwell Online Inc. This system is operated in a conventional IBM mainframe computer configuration connected to workstations. Both image information and text information are stored on magnetic disc media. The search software allows unique searching capabilities for both text and image in a combined search statement or separately as text or image searches. Simple text, phonetic, syllabic and numeric searches, either separately or in combination, are also possible. In a text search, both left and right hand word truncation operations may be performed in a single search statement. The workstation used by the trademark attorneys is a Burroughs B-22 microcomputer.
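Left and right hand truncation in a single search statement can be sketched as follows. This is not the T-Search or ORBIT query syntax, which the brochure does not describe; the `*` truncation symbol, the function names, and the sample marks are all assumptions made for illustration.

```python
import re


def truncation_pattern(term):
    """Build a regex for a term with optional leading/trailing '*'.

    A leading '*' allows any characters before the stem within a word
    (left truncation); a trailing '*' allows any after it (right truncation).
    The '*' symbol is an assumed convention, not the T-Search syntax.
    """
    left = term.startswith("*")
    right = term.endswith("*")
    core = re.escape(term.strip("*"))
    prefix = r"\w*" if left else ""
    suffix = r"\w*" if right else ""
    return re.compile(rf"\b{prefix}{core}{suffix}\b", re.IGNORECASE)


def search(marks, *terms):
    """Return marks that satisfy every term in one search statement."""
    patterns = [truncation_pattern(t) for t in terms]
    return [m for m in marks if all(p.search(m) for p in patterns)]


# Made-up sample marks, for illustration only
marks = ["SUNBRIGHT CLEANERS", "BRIGHTON FOODS", "NIGHT LIGHT", "BRIGHT IDEAS"]

both_sides = search(marks, "*bright*")          # truncated on both sides
combined = search(marks, "*bright", "clean*")   # one left, one right term
```

Here `both_sides` picks up the stem anywhere inside a word, while `combined` requires a word ending in the first stem and another beginning with the second, all within one statement.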
PATENT AUTOMATION
The Automated Patent System (APS) is being implemented in response to a need to improve patent quality. This system provides improved access for the prior art search performed by examiners as a preliminary to patentability decisions.
The first step towards automation is the availability of full text searching of all U.S. Patents which have issued since January of 1975 and English language abstracts of Japanese patents. All 1600 patent examiners have been trained to use full text search, which is available through the use of text terminals connected through the APS.
Eventually, full electronic search as depicted in the attached system architecture chart will be available to patent examiners at their high resolution, dual screen workstations. This system has already been installed as a production system in one of the 16 patent examining groups (Group 220).
The APS is not a conventional architecture, as can be seen from the attached architecture chart. Both Image and Text type searches may be performed, but unlike the T-Search system they may not both be searched in a single search statement. The Search software used for the Text Search portion of APS is also a commercially available package which has been augmented for Patent Full Text Search; it is called Messenger and is a Chemical Abstracts Service product. Image Search has been created for the Patent and Trademark Office by its contractors, Planning Research Corporation and Chemical Abstracts Service. The use of both Image and Text search is made possible through the use of a highly sophisticated workstation allowing what the PTO refers to as Full Electronic Search. Full Electronic Search capability, via a High Resolution dual screen workstation, allows the Group 220 examiner to search picture images of all of the U.S. Patents (over 163,000) assigned to areas of technology assigned to Group 220. In addition to the images, the examiner may also conduct a full text search of every word of over 1,000,000 U.S. Patents issued since January, 1975 and every word of over 1,170,000 English language abstracts of Japanese patents and over 6,000 English language abstracts of Published Chinese patent applications.
Examiners in Group 220 and all of the other examining groups may also access certain commercial data bases from their text terminals or workstations.
Approximately 4.6 million of the 4.8 million U.S. Patents have been scanned as digitized images; 163,000 of these patents (over one million images) are loaded on the APS for retrieval by the examiners from optical disks and can be displayed at the workstation ten seconds from the request command. Each page of each retrieved document may be seen at a rate slightly over one second per page if desired by the examiner. High resolution screens allow the examiner to view the printed text and complex drawings at somewhat over [illegible] of the actual printed page size. Each workstation is equipped with a laser printer allowing the user to walk away with a very high quality paper copy of the patent documents retrieved from the optical disk system. This optical disc system segment of the APS makes the PTO the largest government installation of optical disk technology.
Both single disk drive rapid access devices and multi-drive high density optical juke box devices make up the optical disc system.
In the late Fall of 1987 a blue ribbon Industry Review Panel was established by the Secretary of Commerce, headed by the Director of the Institute for Computer Science and Technology. The review panel conducted the first comprehensive external review of the work accomplished to date on the APS of the PTO. The panel found that the basic concepts of the automation master plan were sound, and that an automated system of high sophistication was necessary in this complex environment. However, the panel recommended that the PTO restructure its management of the program and gain firm control of all work done for it by its contractors. A total restructuring of the Automation program took place, placing the authority for the program in the hands of an Assistant Commissioner reporting directly to the Commissioner. A comprehensive project management [illegible] was put into place, and the contract with the integration contractor was renegotiated. The Industry Review Panel has since been formalized into an advisory group which will meet periodically to review the progress of the APS and recommend modification to the program and its progress as necessary.
APS System Architecture
1. Mainframe. Two NAS/9080 (IBM Lookalike) mainframes provide indexes to approximately 884 billion bytes of optical storage, and 80 billion bytes of on-line magnetic storage for text search, supporting the search and retrieval of text and image databases.
2. Digital Switch. A local area network based on high speed digital switches unites all elements of the system over a fiber optic [illegible] with connections transparent to end users. Each high-speed digital channel supports data transmission at speeds in excess of 700 kilobits per second. The total network capacity will exceed 1,000 megabits per second.
3. Host-to-network interface. Translates the network protocol between the mainframe and all other devices in the system.
4. Rapid-Access Devices (RAD). Optical discs contain frequently accessed information, such as U.S. Patents in digitized image compressed form.
5. High-Density Storage Devices. Optical disk libraries store a second copy of frequently used information, and two copies of less frequently accessed information.
6. Workstation servers. Office Automation file servers store individual user's work. Office automation processors will provide electronic mail, word processing, and other office automation software.
7. Central and Group Printers. Centralized printing provides 300 lines-per-inch patent page images.
8. Workstations. More than 800 workstations will provide the primary user interface to the system for search and retrieval and full office automation functions.
9. Text Terminals. Over 1,000 terminals will provide an additional user interface to the system for [illegible] text search, commercial data base access, and office automation functions.
10. External Systems Interface. Gateways provide interface to commercial data bases.
Additional information on activities or updated information on automation in the Patent and Trademark Office may be obtained from:
Office of Information Systems
U.S. Patent & Trademark Office
Washington, D.C. 20231
Telephone: (703) 557-[illegible]