Authority Control

Limited vs. Full Manual Review

Some argue that full manual review of headings that failed to link, or only partially linked, to an authority record during automated processing is an essential component of any serious authority control service. Surely, there can be no question that manual review will always correct some headings not linkable via machine processing. On the downside, manual review of unlinked headings lengthens turnaround time and represents a major cost in delivering authority control services. This cost is passed along to the customer and is apparent in the wide price difference between machine processing and manual review processing.

Whether full manual review is critical depends to a large extent on the effectiveness of the vendor's machine processing. If full manual review corrects only a percentage point or two of the library's total headings, its considerably higher price may not be justified. For example, if during machine or limited review processing 96% of the library's subject headings are validated against authorized headings, the potential of increasing the link rate by several percentage points may not merit the additional expense. On the other hand, if the vendor's machine processing validates 80% of the library's subject headings and another 17% are validated during manual review, then manual review becomes a necessity with that particular authority control vendor.

In the absence of commonly accepted qualitative standards measuring the effectiveness of batch authority control services, libraries should consider two benchmarks prior to contracting for authority control: first, the vendor's anticipated validation rate of library name and subject headings to authority records; and second, the ratio of authority records to bibliographic records that the library receives from the vendor after processing is completed. These ratios vary with library type, size, and nature of the collection, but the vendor should be able to provide reliable estimates based on similar jobs already performed. Vendors should also offer guidance as to the most cost-effective and suitable processing options for a specific library.

Unless there are special circumstances, such as the library requesting that locally tagged subjects be re-tagged as LC subject headings prior to authority control, no US database adhering to nationally accepted cataloging standards is shipped from LTI unless at least 95% of the controlled headings have linked to authorized headings.

Looking at the second benchmark, for databases up to 150,000 catalog records, LTI returns about one LC authority record per catalog record. As the number of catalog records increases, the ratio of LC authority records to bibliographic records decreases because headings are more likely to recur. Public library databases are likely to yield fewer authority records than academic library databases because their collections are more likely to contain multiple titles by popular authors. For academic databases above one million records, the ratio of authority records to catalog records is about 0.65:1. In other words, a million bibliographic records might result in extraction of about 650,000 total name and subject authority records.

Cost benefit is yet another way to evaluate machine versus manual review processing. [These figures are based on statistics from LTI processing and are not applicable to other authority control vendors.] On average, for bibliographic records taken from OCLC, RLIN, or other distributors of LC MARC records, roughly one percent of the library's controlled headings will be changed during full manual review that would not have been changed during limited review processing. For example, assuming a database of 200,000 catalog records with each record containing 3.8 controlled headings (i.e., a total of 760,000 controlled headings), full manual review processing might change about 7,600 headings that would not be changed during limited review processing. To achieve this 1% link rate improvement, the cost of a library's authority control project will be more than double that of a limited review job. For some libraries, particularly those with unusual collections or whose records are of questionable quality, the additional fixes resulting from full manual review are justified. For most libraries, selecting limited review processing is an opportunity to save money without paying a huge penalty in quality. For other authority control vendors full manual review may have real utility, but with LTI's processing it is nearly always a waste of library resources.
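
The arithmetic in the example above can be reproduced with a short Python sketch. The function name and parameters are illustrative only, and the 1% rate is the approximate LTI average quoted above, not a guaranteed outcome for any particular database.

```python
def extra_headings_changed(catalog_records, headings_per_record, extra_change_rate):
    """Estimate the headings changed only by full manual review.

    All three inputs are assumptions supplied by the caller.
    """
    total_headings = catalog_records * headings_per_record
    return total_headings, total_headings * extra_change_rate


# Figures from the example: 200,000 records, 3.8 controlled headings each, 1%.
total, extra = extra_headings_changed(200_000, 3.8, 0.01)
print(round(total), round(extra))
```

A library can substitute its own record count and average number of controlled headings per record to see how many additional fixes the higher price would buy.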

Libraries are sometimes persuaded to select manual review for bogus reasons. They are led to believe that manual review is somehow going to eliminate or reduce greatly the number of mis-linked headings. The "Madonna myth" illustrates this well. An authority vendor might suggest that in machine processing the rock star "Madonna" could be confused with "Mary, Blessed Virgin, Saint." There is a certain logic to this. However, if one considers how full manual review authority control is in fact performed, the example makes no sense, other than to discourage libraries from requesting a less costly machine processing option.

Regardless of whether a library selects machine or manual review processing, controlled headings are extracted from bibliographic records and first run through machine processing. It is only after a library heading has not linked to an authorized heading during machine processing that the heading becomes a candidate for manual review by an editor.

The critical point is that if a heading is mis-linked during machine processing, that heading will never come to the attention of an editor because it has successfully (albeit incorrectly) linked to an authorized heading. Editors do not check every linked heading in every record to verify that a proper and correct link has been made. Instead, they examine only those headings that failed to link to an authorized heading during machine processing. If editors reviewed every heading linked during machine processing, the authority control vendor's costs might easily exceed one dollar per record. Few libraries could afford authority control at that price.
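
The workflow just described can be sketched in Python. This is a minimal illustration, not LTI's or any other vendor's actual code: the index contents, the function name, and the naive exact-match lookup are all assumptions made for the example, and the "Madonna" entry is a hypothetical unblocked cross-reference of the kind the myth relies on.

```python
# Hypothetical index mapping headings and see-references to authorized forms.
AUTHORITY_INDEX = {
    "Twain, Mark, 1835-1910": "Twain, Mark, 1835-1910",
    "Clemens, Samuel Langhorne, 1835-1910": "Twain, Mark, 1835-1910",
    # Hypothetical ambiguous see-reference that was never blocked.
    "Madonna": "Mary, Blessed Virgin, Saint",
}


def machine_link(headings):
    """Split headings into machine-linked ones and the manual review queue."""
    linked, review_queue = {}, []
    for heading in headings:
        if heading in AUTHORITY_INDEX:
            # A link was made, right or wrong, so no editor will see it.
            linked[heading] = AUTHORITY_INDEX[heading]
        else:
            # Only headings that failed to link go to an editor.
            review_queue.append(heading)
    return linked, review_queue


linked, review_queue = machine_link(
    ["Madonna", "Clemens, Samuel Langhorne, 1835-1910", "Twian, Mark"]
)
print(linked["Madonna"])
print(review_queue)
```

In this sketch the mis-linked "Madonna" heading never enters the review queue; only the misspelled, unlinked heading does. Whether the job is priced as machine or manual review makes no difference to the mis-link.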

Like the Madonna example, there are thousands of bad or ambiguous LC cross-references that need to be "blocked" prior to linking headings during machine processing. LTI blocks any heading containing five or fewer characters from linking during the initial machine link. Some of these headings (e.g., Asia, Iran, Iraq) are selectively unblocked where there is no likelihood of an incorrect link being made to an authority record. If blocked by LTI, these headings appear in the unlinked headings list. If the library believes the authority record is important to its catalog, for example when it contains a useful cross-reference or explanatory note, the authority record can always be downloaded from LC.
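
The length-based rule can be sketched as follows. This is an illustrative reading of the rule, not LTI's implementation: the names are hypothetical, the allowlist holds only the three examples given above, and how characters are counted (spaces, punctuation, subfield codes) is an assumption.

```python
# Hypothetical allowlist of short headings considered safe to link.
UNBLOCKED = {"Asia", "Iran", "Iraq"}
MAX_BLOCKED_LENGTH = 5  # headings of five or fewer characters are blocked


def is_blocked(heading):
    """Return True if the heading is kept out of the initial machine link."""
    return len(heading) <= MAX_BLOCKED_LENGTH and heading not in UNBLOCKED


for heading in ["AAS", "Iran", "Isaac", "International Society"]:
    print(heading, "->", "blocked" if is_blocked(heading) else "eligible to link")
```

The sketch covers only the length rule. Longer ambiguous cross-references would need an explicit block list in addition.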

Taken together, the LC name and subject authority files contain over 11 million 1XX headings and 4XX (see from) references, of which LTI blocks 163,000. As one might anticipate, four out of five of the blocks involve corporate/conference heading cross-references (41X). For example, the initials AAS appear as cross-references in 20 LC authority records, including those for the American Astronautical Society, African Academy of Sciences, Arkansas Archeological Society, American Arachnology Society, Aquaculture Advisory Service, and Association for Asian Studies. Neglecting to block ambiguous headings and cross-references can lead to some amusing mislinks.
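
One way to find candidates for blocking is to look for see-from references that lead to more than one authorized heading. The sketch below assumes the references have already been extracted into simple pairs; the sample data and function name are illustrative, and nothing here is taken from LTI's actual processing.

```python
from collections import defaultdict

# Hypothetical sample of (4XX see-from reference, 1XX authorized heading) pairs.
CROSS_REFERENCES = [
    ("AAS", "American Astronautical Society"),
    ("AAS", "African Academy of Sciences"),
    ("AAS", "Association for Asian Studies"),
    ("Clemens, Samuel Langhorne, 1835-1910", "Twain, Mark, 1835-1910"),
]


def find_ambiguous(pairs):
    """Return references that point to more than one authorized heading."""
    targets = defaultdict(set)
    for reference, authorized in pairs:
        targets[reference].add(authorized)
    return {ref: sorted(auths) for ref, auths in targets.items() if len(auths) > 1}


ambiguous = find_ambiguous(CROSS_REFERENCES)
print(ambiguous)
```

A reference that resolves to a single heading, such as the Clemens example, is left alone; a reference such as AAS is flagged because a machine link on it alone would be a guess.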

It is easy to identify databases where vendors have confused LC authority records with authority control within a given database. Actual examples of the mayhem caused by simplistic processing algorithms are listed below.

Catalog record heading as received from library:

610 20 $aInternational Society for Augmentative and Alternative Communication,$cthe patriarch$xJuvenile literature.

In pre-authorized record, heading probably read:

600 00 $aIsaac,$cthe patriarch,$xJuvenile literature.

Catalog record heading as received from library:

110 20 $aBibloiteca Estadual Celso Kelly.$c(Musician)

In pre-authorized record, heading probably read:

100 00 $aBeck,$c(Musician)

Catalog record heading as received from library:

600 20 $aIturralde Gomez, Antonieta,$cPrincess of Wales.

In pre-authorized record, heading probably read:

600 00 $aDiana,$cPrincess of Wales.

Similar problems result when authority control vendors use tables to expand parts of headings during a pre-processing procedure, without taking into account the entire heading. For example, for years one vendor routinely changed the geographic subdivision $zMelbourne to $zMelbourne (Vic.) when in fact many of the headings referred to the city in Florida.

Consider the following examples from an academic library database previously authorized by an established authority control vendor. As LTI received the catalog record for Joanna Cole's El autobus magico en tiempos de los dinosaurios [Magic school bus in the time of the dinosaurs] (1995), it contained two unjustified personal name/title subject added entries (tagged 600 - 2nd indicator 0), one for:

$aOliver,Rupert.$tDinosaurios$xLiteratura juvenil.$2bidex

[and another for]

$aWilson,Ron,$d1941-$tDinosaurios$xLiteratura juvenil.$2bidex

When examining the LC author/title authority records for Oliver, Rupert.$tDinosaurios [n 96117774] and Wilson, Ron,$d1941- $tDinosaurios [n 97004704], one finds 430 cross-references from the title Dinosaurios. In this case the library's source record (i.e., as delivered to the first authority control vendor) contained the Spanish language subject Dinosaurios$xNovela$2bidex incorrectly tagged as LCSH. Because the vendor found multiple cross-references in the LC index files from Dinosaurios to both the Oliver and the Wilson authority records, it hedged its bets by adding both author/title subjects to the library's catalog record. Presumably, had 27 LC authority records carried a see-reference from Dinosaurios, the vendor would have inserted 27 new subject headings, none having any relationship to the work at hand. Identical sets of the Oliver and Wilson bogus author/title subjects appear in other records in this database, including Spanish language editions of Michael Crichton's Jurassic Park and Syd Hoff's Danny and the dinosaur. If nothing else, these records demonstrate the ease with which computers can trash a library database.

In another record LTI found a subject heading for a Russian author who had no relationship to the publication. The catalog record describes a state government report issued by the Alaska Department of Fish and Game, Division of Wildlife Conservation, on the topic of wolves. The authority vendor added a spurious heading because there is a cross-reference from "Wolf" in the LC authority record having the control number no 96032788.

In still another catalog record, this one a handbook, one finds a puzzling uniform title (630) subject heading Handobukku$xWaterfowl management. What happened here makes sense when examining LC authority record "nr 93048295," where there is a 430 reference from Handbook to Handobukku. In all the above cases the headings are structurally good; they just have zero relevance to the catalog records in which they occur. Most catalog users coming across them would not exert the effort to understand why they appear in the catalog records, thinking perhaps that the cataloger knew something that they did not.

Unfortunately, once a heading has been mis-linked, a subsequent vendor will find it almost impossible to identify and fix it. Such headings are encountered only by chance or because they happen to contain invalid subfield codes. Manual review authority control is of no value if editors doing the review do not have easy access to the bibliographic record from which the unlinked heading was extracted. A large percentage of headings that remain unlinked following machine processing require that the editor examine both the source bibliographic record and the authority record in search of clues that establish which link should be made between the two.

Finally, full manual review should not be confused with either "choice of entry" or re-cataloging. Similarly, descriptive cataloging is unaffected by authority control and, while some vendors do offer various record enhancement services that can result in the addition of subjects and added entries to the library's bibliographic record, authority control per se is focused on controlled headings already present in the library's source records.

Prior to offering a full manual review authority control cost quote, LTI requires that the library submit its entire database for a no-charge evaluation. If our analysis shows that the library's cataloging is not consistent with U.S. national cataloging standards and practices, full manual review is not available as an option.
