Regardless of database size, FTP is the standard method to receive and return bibliographic and authority record files. A file transfer portal is available on LTI's website.
Data verification checks are made during and immediately after the transfer of records to LTI's computers to ensure that records are properly formatted in the USMARC communications format. Checks are also made to ensure that the record directories only contain numbers, that controlled heading fields do not contain non-MARC characters, etc.
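The directory check described above can be pictured with a short Python sketch. It assumes the standard MARC communications layout (a 24-byte leader followed by 12-byte directory entries that end with a field terminator, hex 1E). The function name directory_problems is hypothetical, and this is an illustration rather than LTI's actual program.

```python
FIELD_TERMINATOR = b"\x1e"
LEADER_LENGTH = 24
ENTRY_LENGTH = 12  # 3-byte tag + 4-byte field length + 5-byte starting position


def directory_problems(record: bytes) -> list:
    """Return a list of problems found in the directory of one raw MARC record."""
    end = record.find(FIELD_TERMINATOR, LEADER_LENGTH)
    if end == -1:
        return ["directory has no field terminator"]
    directory = record[LEADER_LENGTH:end]
    problems = []
    if len(directory) % ENTRY_LENGTH != 0:
        problems.append("directory length is not a multiple of 12")
    # Examine each complete 12-byte entry; every byte should be an ASCII digit.
    for offset in range(0, len(directory) - ENTRY_LENGTH + 1, ENTRY_LENGTH):
        entry = directory[offset:offset + ENTRY_LENGTH]
        if not entry.isdigit():
            problems.append(f"non-numeric directory entry at offset {offset}: {entry!r}")
    return problems
```

A record whose directory is clean yields an empty list; anything else would be reported for correction before further processing.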
Setting of non-filing indicators in eight title fields is one of several pre-authority control processing operations. Non-filing indicators specify the number of initial characters to be ignored during computer filing.
For the title field (tag 245), the only title field to which the language code generally applies, articles associated with the fixed field language code are compared against the initial text in the title field. Based on this comparison, the non-filing indicator is set to 0, if no match is made, or to its proper matched value. The program takes into account diacritics and special characters associated with an article, but preceding the first actual filing character.
If the fixed field language code in bytes 35-37 of the 008 field is blank or does not match a known language code, the algorithm compares the initial text of the title (245) field against a table of common articles drawn from multiple languages and sets the non-filing indicator to its proper value or to 0 as appropriate.
Because title fields other than 245 (e.g., X30, 240) do not necessarily correspond with the fixed field language code, LTI's program compares non-245 field initial text against the table of common articles and sets the non-filing indicator to its proper value or to 0 as appropriate.
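The table-driven comparison can be sketched in Python as follows. The article tables here are tiny, hypothetical stand-ins for LTI's real tables, the function name nonfiling_indicator is an assumption, and elided articles such as the French L' are not handled. Leading punctuation is counted along with the article, as described above.

```python
# Hypothetical, deliberately small article tables keyed by MARC language code.
ARTICLES_BY_LANGUAGE = {
    "eng": ("a", "an", "the"),
    "fre": ("la", "le", "les", "un", "une"),
    "ger": ("das", "der", "die", "ein", "eine"),
    "spa": ("el", "la", "las", "los", "un", "una"),
}
# Fallback table of common articles, used when the language code is blank or unknown.
COMMON_ARTICLES = tuple(sorted({a for arts in ARTICLES_BY_LANGUAGE.values() for a in arts}))


def nonfiling_indicator(title: str, language_code: str = "") -> int:
    """Return the number of leading characters to ignore in filing (0 if none)."""
    articles = ARTICLES_BY_LANGUAGE.get(language_code.strip().lower(), COMMON_ARTICLES)
    # Skip punctuation that precedes the first word (e.g. an opening quotation mark).
    start = 0
    while start < len(title) and not title[start].isalnum():
        start += 1
    lowered = title[start:].lower()
    for article in sorted(articles, key=len, reverse=True):
        if lowered.startswith(article + " "):
            skip = start + len(article) + 1
            # Also skip special characters standing before the first filing character.
            while skip < len(title) and not title[skip].isalnum():
                skip += 1
            # The indicator is a single digit and must leave some text to file on.
            return skip if skip < len(title) and skip <= 9 else 0
    return 0
```

For a 245 field the language code from the 008 would be passed in; for other title fields the code would be left blank so that the common table is used.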
As a rule, in controlled fields LC practice is simply to delete initial articles in the formulation of headings. LC authority records do not contain filing indicators in 1XX fields, e.g., the geographic heading is "Dalles (Or.)" -- not "The Dalles (Or.)" (n 82036146).
Automated non-filing indicator fix programs are sometimes unable to distinguish correctly between when a leading letter (e.g., A) or word (e.g., Lo) in a title is used as an article and when it is used as another part of speech that should not be ignored in filing. LTI's software uses, when appropriate, up to four words in the title to help determine if the initial word is actually used as an article. Examples of where the non-filing indicator is set to 0 based on an analysis of the second word of the title are listed below:
A is for apple
A B C of power brakes
El Salvador 1932
A la orilla del viento
A la recherche du temps perdu
Das ist mir lieb
Un de Baumugnes, etc.
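One way to picture the second-word analysis is a small exception table, sketched below in Python. The table contents are assumptions built only from the examples listed above, the function name is hypothetical, and LTI's software is described as looking at up to four words, which this sketch does not attempt.

```python
# Hypothetical exception table: for each candidate article, second words that
# signal the first word is NOT being used as an article.
NOT_AN_ARTICLE_BEFORE = {
    "a": {"is", "b", "la"},
    "el": {"salvador"},
    "das": {"ist"},
    "un": {"de"},
}


def second_word_rules_out_article(title: str) -> bool:
    """Return True when the second word shows the first word is not an article."""
    words = title.lower().split()
    if len(words) < 2:
        return False
    return words[1] in NOT_AN_ARTICLE_BEFORE.get(words[0], set())
```

When the function returns True, the non-filing indicator would be set to 0 regardless of what the article table suggests.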
While it is still theoretically possible for a filing indicator set correctly in the source record to be re-set to an incorrect value, the chances of that happening are extremely remote.
For libraries that wish to keep their original non-filing indicators in the 245 title field, LTI can preserve them on request. This is not an option in controlled title fields where the removal of initial articles is controlled by explicit LC authorized headings.
LTI creates an ASCII text report showing every change made to a title non-filing indicator. This report shows the before and after settings along with the relevant title text. For libraries profiled to correct non-filing indicators, the report provides reassurance that their non-filing indicators are being correctly set. It is just as useful for libraries that have chosen not to have LTI set their non-filing indicators because it lists the changes that would have been made had the library opted to have them fixed.
Optionally, the first indicator in the 245 field can be set based on the presence or absence of a 1XX field. In other words, the first indicator is checked and if necessary changed to 0 when there is no 1XX field present in the record, or to 1 if there is a 1XX field. Most libraries select this option based on LC rule interpretation [LCRI 21.30J] that titles proper should always be traced.
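The rule is simple enough to state as a few lines of Python. The record representation (a list of dictionaries with tag, ind1, and ind2 keys) and the function name are assumptions made for this sketch.

```python
def fix_245_first_indicator(fields: list) -> list:
    """Set the 245 first indicator to 1 if any 1XX field is present, else 0."""
    has_1xx = any(field["tag"].startswith("1") for field in fields)
    for field in fields:
        if field["tag"] == "245":
            field["ind1"] = "1" if has_1xx else "0"
    return fields
```

For example, a record with a 100 field and a 245 coded with first indicator 0 would come back with the 245 first indicator changed to 1.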
AuthPrep and MARC Update Processing
Authority control at LTI begins with a generalized database clean-up program, AuthPrep, whose purpose is both to increase the probability of catalog record heading matches against authority headings and to upgrade unauthorized headings to current forms.
AuthPrep normalizes headings to correct for a variety of typographical and punctuation errors. These include elimination of leading and trailing blank spaces, compression of multiple blank spaces to a single space in 1XX/4XX/6XX/7XX/8XX fields, and deletion of blank spaces on either side of subfield codes.
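The spacing corrections can be sketched with a few regular expressions. This Python example assumes headings are held as text in which "$" followed by one letter or digit stands for a subfield code, and that "$" does not otherwise occur in the data; normalize_heading is a hypothetical name, not part of AuthPrep.

```python
import re


def normalize_heading(heading: str) -> str:
    """Trim, collapse runs of blanks, and close up blanks around subfield codes."""
    heading = heading.strip()                    # leading and trailing blanks
    heading = re.sub(r" {2,}", " ", heading)     # multiple blanks to a single blank
    heading = re.sub(r" *(\$[a-z0-9]) *", r"\1", heading)  # blanks beside subfield codes
    return heading
```

Applied to "  $aSmith,  John, $d 1900-1980 ", the sketch returns "$aSmith, John,$d1900-1980".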
AuthPrep makes many types of changes to headings, some of which correct the omission or improper assignment of content designators while others correct punctuation or bring headings into compliance with AACR2. Examples of changes at the subfield code level include additions (such as inserting, where appropriate, $f, $l, $s, and $k in title fields, $c and $d in personal names, $b in corporate names, and $v in series); conversions (such as changing $b to $n in conference names, correcting errors caused by the omission or improper assignment of $c, $d, and $e in personal names); and deletions (optional removal of subfields $e and $4 from name headings).
Complex Bible, music, and other uniform title headings are parsed and updated to conform with current cataloging rules. Leading non-filing articles are removed from uniform titles and title portions of author/title headings, and unnecessary parentheses and brackets are deleted from name headings. If not already present, brackets are added surrounding GMDs in controlled title fields. AuthPrep even updates certain non-controlled heading fields -- e.g., the OCLC control number preface is changed from ocl7 to ocm0, and 301/305 fields are retagged as 300.
Obsolete subject subdivisions such as Addresses, essays, and lectures and Collected works are eliminated. As part of this preliminary clean-up the letters l and O are converted to 1 and 0 respectively in date subfields, and a check is made to ensure that subfield code $d precedes dates in personal names. 1XX fields with a second indicator of '1' generate the appropriate 6XX subject heading. All 1XX second indicators are set to blank. Format integration changes are also made at this time.
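The letter-for-digit repair in date subfields might look like the following Python sketch. It assumes "$d" marks the date subfield in a text representation of the heading, and it only touches tokens that already contain at least one digit, so that abbreviations such as "fl." are left alone. The names are hypothetical.

```python
import re

_DIGIT_LOOKALIKES = str.maketrans("lO", "10")
# A token made only of digits, lowercase l, and capital O, with at least one real digit.
_DATE_TOKEN = re.compile(r"\b[0-9lO]*[0-9][0-9lO]*\b")
_DATE_SUBFIELD = re.compile(r"(\$d)([^$]*)")


def fix_date_subfields(heading: str) -> str:
    """Convert l to 1 and O to 0 inside date tokens of $d subfields only."""
    def repair(match):
        fixed = _DATE_TOKEN.sub(
            lambda token: token.group(0).translate(_DIGIT_LOOKALIKES), match.group(2)
        )
        return match.group(1) + fixed

    return _DATE_SUBFIELD.sub(repair, heading)
```

Text outside $d, such as the name in $a, is never altered by this sketch.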
To achieve consistency with the current MARC standards, catalog records are modernized to reflect the latest MARC 21 Format for Bibliographic Data tagging and coding conventions. LTI's MARC Update service is an important clean-up step for records created prior to 1987. An exhaustive table of authority control pre-processing fixes is found in the document LTI MARC Update Changes.
Changes include deletion of obsolete fields and subfields; conversion of obsolete tags, indicators, and subfields to current usage; and conversion of outdated fixed-field element codes. Examples of MARC Update changes are deletion of 039 fields, conversion of subfields $d and $e in 245, 246, and 247 fields of the serials format to the currently defined codes $n and $p, and conversion of the obsolete 705 and 715 fields to 700 and 710 respectively. Several MARC Update options are offered, including deletion of LC Children's, Sears, NLM MeSH, or genre subject headings. Unless instructed otherwise, LTI deletes the obsolete OCLC-generated 87X fields.
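Two of the changes named above, deleting 039 fields and retagging 705/715, can be expressed as a table-driven pass. This Python sketch covers only those examples; the tables, the (tag, text) field representation, and the function name are assumptions, and the full list of changes lives in LTI's own documentation.

```python
OBSOLETE_TAGS = {"039"}                     # fields deleted outright
RETAGGED = {"705": "700", "715": "710"}     # obsolete tag -> current tag


def marc_update(fields: list) -> list:
    """Drop obsolete fields and retag the rest; fields are (tag, text) tuples."""
    return [
        (RETAGGED.get(tag, tag), text)
        for tag, text in fields
        if tag not in OBSOLETE_TAGS
    ]
```

Keeping the rules in tables rather than in code makes it easy to switch individual changes on or off for a given library profile.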
Other Pre-processing Routines
Changes to cataloging rules due to AACR2 require special processing on series, conference names, and titles prior to authority record linkage.
Series statements (400/410/411/440) are retagged as 490 fields and assigned a first indicator of 1. An AACR2 series entry is then added in the appropriate 800/810/811/830 field. Removal of initial articles, capitalization changes, and adjustment of filing indicators are frequently necessary as part of this processing. To illustrate, the traced title series:
440 4$aThe series in computer science
is retagged as:
490 1 $aThe series in computer science
and an AACR2 series field is added to the record:
830 0$aSeries in computer science
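The 440 case shown above can be sketched in Python. The sketch assumes the field text begins with "$a" and that the 440 second indicator already holds the correct non-filing count; it does not cover 400/410/411 fields or pronoun replacement. The function name and the (tag, indicators, text) tuples are hypothetical.

```python
def convert_440(second_indicator: str, text: str) -> list:
    """Turn one 440 field into a 490 (first indicator 1) plus an 830 series entry."""
    skip = int(second_indicator) if second_indicator.isdigit() else 0
    body = text[2:]                      # text after the "$a" subfield code
    traced = body[skip:]                 # drop the initial article
    traced = traced[:1].upper() + traced[1:]
    return [
        ("490", "1 ", "$a" + body),      # series statement as it appears on the item
        ("830", " 0", "$a" + traced),    # AACR2 series entry, no characters to skip
    ]
```

Called with a second indicator of 4 and the text "$aThe series in computer science", it produces the 490 and 830 fields shown in the illustration.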
If the original 4XX series begins with the pronoun His, Hers, Its, or Their, the pronoun is replaced in the 8XX field with the full heading from the catalog record's 1XX field. In addition, 840 series fields are retagged as 830 fields.
In conference name headings, the order and punctuation of data elements in subfields $b, $c, $d, and $n are changed to conform to AACR2. In the 111/611/711/811 fields, the obsolete subfield $b is converted to subfield $n and the number, place, and date are reformulated in parentheses with proper subfield coding and punctuation. To illustrate, the conference heading:
111 20$aInternational Conference on Elizabethan Theatre, $b1st, $cUniversity of Waterloo, $d1969
is converted to:
111 2 $aInternational Conference on Elizabethan Theatre $n(1st :$d1969 :$cUniversity of Waterloo)
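The reformulation in this example can be approximated with a short Python function. It assumes the heading is held as text with "$" subfield codes, that each subfield occurs at most once, and that the old punctuation is a trailing comma; reformulate_conference is a hypothetical name and the sketch ignores indicators.

```python
import re


def reformulate_conference(text: str) -> str:
    """Rebuild a conference heading as name $n(number :$ddate :$cplace)."""
    subfields = {
        code: value.strip(" ,")
        for code, value in re.findall(r"\$(\w)([^$]*)", text)
    }
    name = subfields.get("a", "")
    # The obsolete $b (number) becomes $n; the order is number, date, place.
    qualifiers = [
        ("n", subfields.get("b") or subfields.get("n")),
        ("d", subfields.get("d")),
        ("c", subfields.get("c")),
    ]
    qualifiers = [(code, value) for code, value in qualifiers if value]
    if not qualifiers:
        return f"$a{name}"
    first_code, first_value = qualifiers[0]
    rest = "".join(f" :${code}{value}" for code, value in qualifiers[1:])
    return f"$a{name} ${first_code}({first_value}{rest})"
```

Given the subfields of the pre-AACR2 heading above, the function returns the subfields of the converted heading.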
Controlled title fields are checked for proper punctuation and subfield coding. Omitted subfield coding, including subfield $l before languages and $f before dates, is inserted. Brackets are removed temporarily from media qualifiers in subfield $h. Media designators are checked to ensure they do not prevent an authority record link, e.g., $hPhonorecord and $hPhonodisc are changed to $hSound recording, $hMachine-readable data file and $hComputer file to $hElectronic resource. In accordance with current LC and OCLC practice, following authority control: 1) brackets are added surrounding GMDs ($h) in 245, 246, and 740 fields; and 2) GMDs ($h) in these same fields are corrected to current AACR2 forms.
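The media designator changes amount to a lookup table plus bracket handling, as in this Python sketch. Only the four obsolete forms named above are included; the table name and the function normalize_gmd are assumptions, and the function returns the bracketed form used in 245, 246, and 740 fields.

```python
# Obsolete designator (lowercased) -> current form, taken from the examples above.
GMD_CURRENT_FORM = {
    "phonorecord": "Sound recording",
    "phonodisc": "Sound recording",
    "machine-readable data file": "Electronic resource",
    "computer file": "Electronic resource",
}


def normalize_gmd(value: str) -> str:
    """Return the $h value in its current form, surrounded by brackets."""
    bare = value.strip().strip("[]").strip()
    current = GMD_CURRENT_FORM.get(bare.lower(), bare)
    return f"[{current}]"
```

Designators not in the table pass through unchanged apart from the brackets.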
Extraction of Controlled Headings
Following AuthPrep processing, headings eligible for authority control are extracted from catalog records and a unique, sequentially assigned number is added to the end of controlled heading fields. This number provides a link to permit reinsertion of the authority controlled heading into the catalog record later in the processing cycle.
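The extract-and-relink cycle can be pictured as follows. How LTI actually marks the link number in the field is not stated, so the "|#" marker, the abbreviated list of controlled tags, the dictionary-based field representation, and both function names are assumptions made purely for this Python sketch.

```python
from itertools import count

CONTROLLED_TAGS = {"100", "110", "111", "130", "600", "610", "611", "630",
                   "650", "651", "700", "710", "711", "730",
                   "800", "810", "811", "830"}
LINK_MARKER = "|#"  # hypothetical marker placed before the link number


def extract_headings(records: list) -> dict:
    """Number each controlled heading and return {link number: (tag, text)}."""
    link_numbers = count(1)
    extracted = {}
    for record in records:
        for field in record:
            if field["tag"] in CONTROLLED_TAGS:
                number = next(link_numbers)
                extracted[number] = (field["tag"], field["text"])
                field["text"] += f"{LINK_MARKER}{number}"
    return extracted


def reinsert_heading(records: list, number: int, new_text: str) -> bool:
    """Replace the field carrying the given link number; return True if found."""
    suffix = f"{LINK_MARKER}{number}"
    for record in records:
        for field in record:
            if field["text"].endswith(suffix):
                field["text"] = new_text
                return True
    return False
```

Because each number is unique across the whole file, the corrected heading can be put back into the right field without re-reading the record.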
Table I lists MARC record fields and subfields checked by LTI's authority service. With the exception of subfields $u, $w, $4, $5, $6, and $9, all subfields in catalog record headings are matched against all appropriate subfields in LC authority record headings. Subfield $v in 8XX fields is validated and corrected wherever possible, e.g., when volume designation information has been miscoded as part of $a or miscoded as $n or $p, or when other clear errors in formatting occur. In addition, subfield $v data is corrected based on the 642 field of the referenced authority record. For LC subject authority control, only subject fields (6XX) with a second indicator of 0 (i.e., LC subject headings) are validated. A blank second indicator is treated as if it were 0. LTI offers optional authority control of LC Children's subjects, NLM's MeSH subject headings, and some genre headings.
  Field  Subfields
  100    $a q b c d e k t n p l f g
  110    $a b e n d c k t p l f g
  111    $a q e g k t p l f
  130    $a t n p l f k s g d m o r h
  240    $a n p l f k s g d m o r h
* 400    $a q b c d k t n p l f g v
* 410    $a b n d c k t p l f g v
* 411    $a q e g k t p l f v
* 440    $a n p v
  490    1st ind. 0 - optional
  600    $a q b c d k t n p l f m o r s h g v x y z
  610    $a b n d c k t p l f m o r s h g v x y z
  611    $a q e g k t p l f s h v x y z
  630    $a t n p l f k s g d m o r h v x y z
  650    $a b v x y z
  651    $a v x y z
  655    $a 2
  700    $a q b c d e k t n p l f m o r s h g
  710    $a b e n d c t p l f m o r s h g
  711    $a q e g k t p l f s h
  730    $a t n p l f k s g d m o r h
  800    $a q b c d k t n p l f m o r s h g v
  810    $a b n d c k t p l f m o r s h g v
  811    $a q e g k t p l f s h v
  830    $a t n p l f k s g d m o r h v
  840    $a h v
* converted to corresponding 8XX
Table I. MARC fields and subfields validated by LTI's authority control service
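To show how the excluded subfields might be left out of a comparison, here is a minimal Python sketch that builds a match key from a heading. It assumes a text representation with "$" subfield codes; match_key is a hypothetical name, and real matching would also involve the normalization steps described earlier.

```python
import re

# Subfields not matched against authority headings, per the text above.
IGNORED_SUBFIELDS = set("uw4569")


def match_key(text: str) -> tuple:
    """Return the (code, value) pairs of a heading that take part in matching."""
    subfields = re.findall(r"\$(\w)([^$]*)", text)
    return tuple(
        (code, value.strip())
        for code, value in subfields
        if code not in IGNORED_SUBFIELDS
    )
```

Two headings that differ only in an excluded subfield, such as a $4 relator code, produce the same key.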