lrts: Vol. 56 Issue 3: p. 155

Batchloading MARC bibliographic records into libraries’ online catalogs has become an increasingly common and necessary means of providing access to the electronic and microform resources that libraries collect or to which they provide access. While individual libraries are challenged to create title-level metadata for large collections like Early English Books Online (EEBO) and Eighteenth Century Collections Online, e-book collections from publishers such as Springer, Wiley, and Elsevier, and microform collections such as Papers of the NAACP and the Congressional Information Service CIS Congressional Committee Hearings on Microfiche, 1833–1969, batchloading records improves discoverability and ensures that a library's digital and microform holdings are accurately reflected by the catalog. Bibliographic records for large collections may be provided by the publisher, the aggregator, or a third-party vendor or utility, such as OCLC. Some collections are finite while others may grow over time, presenting the challenge of performing batchloads periodically for a single collection.

The acquisition of electronic and microform collections and their accompanying bibliographic records presents many challenges to technical services and other units involved in batchloading activities. Maintaining consistent record quality can be problematic. Vendors, including publishers, aggregators, and bibliographic utilities such as OCLC, often supply bibliographic records for collections that libraries purchase or to which they license access. These vendors do not consistently follow the cataloging standards that libraries have been accustomed to applying in their online catalogs. As licensed or purchased collections grow, many libraries have turned to acquiring bibliographic records for them. These ongoing updates can become a heavy workload issue for technical services and other units. Notifications of additional bibliographic record availability also can be problematic because some vendors do not have effective notification systems.

This paper reports on a study that examined how batchloading activities in large research libraries affect staffing, budgets, workflow, and the quality of records in the catalog. It examines how these libraries manage batchloading activities, how information technology issues support or hinder batchloading activities, and how libraries assess the effectiveness of batchloading. The authors also explore how libraries work together to address some of the issues presented by batchloading activities and needs. In the current economic climate, libraries must adopt and refine the most cost-effective methods available for facilitating access to digital collections. The methods chosen inevitably will be affected by high-level policy questions, many of which remain to be answered. The authors hope the survey will reveal some of the methods currently in use and point the way to further innovation and improvement.

Literature Review

As a relatively new practice, MARC bibliographic record batchloading has a limited literature, but the authors found seven useful examples. One offers a valuable overview, three present case studies, two address issues of record quality, and one focuses on the impact of batchloading on users. In her overview, Martin describes the challenges that libraries face in providing cataloging for e-books, maintaining that libraries have to consider many issues when cataloging e-books.¹ These include the source of bibliographic records, whether they can be batch processed, whether to combine print and online holdings on the same records, what modifications to the bibliographic records will be needed, how to maintain the records, and whether to add holdings for e-books to WorldCat.

In the first case study, Mugridge and Edmunds reported on how the Pennsylvania State University has addressed the challenges of loading large numbers of bibliographic records for electronic and microform collections into the online catalog.² They found that efficiencies can be gained by striving for a standardized workflow and documenting procedures while recognizing that some flexibility must be maintained to accommodate the needs of various stakeholders and library users.

Wu and Mitchell discussed the increasingly complex e-book landscape in a case study at the University of Houston Libraries (UHL).³ They reported on the use of vendor-supplied cataloging for a collection of approximately 400,000 e-books, finding that the benefits of batchloading vendor records outweighed the limitations of those records. UHL staff used MarcEdit extensively to batch edit bibliographic records before loading them into the online catalog. MarcEdit, developed by Terry Reese, is “a one click harvesting process for generating MARC metadata from a variety of metadata formats.”⁴ However, Wu and Mitchell reported that UHL intends to use its e-resource knowledgebase, SerialsSolutions KnowledgeWorks, to provide access to e-books, allowing the process to be further streamlined. KnowledgeWorks is a data repository for SerialsSolutions 360 services and is used to provide access, management, and assessment services for electronic resources. It can be an alternative or a supplement for libraries that do not want to batchload bibliographic records for their e-resources into the integrated library system (ILS).

In the third case study, Martin and Mundle reported on the University of Illinois at Chicago University Library's experiences batchloading MARC records for the Springer e-book collection.⁵ Martin and Mundle found that the free records made available for the e-books purchased through a consortial initiative required considerable improvement and cleanup after they were loaded in each of the consortium members’ online catalogs. They predicted that this situation is likely to persist as vendors continue to automate record creation. They indicated that MARC record subscription services for aggregated e-book collections might help libraries keep up with the rapidly growing e-book landscape. Martin and Mundle pointed out that libraries may need to provide quick, albeit less than ideal, access to e-books and other e-resources and follow up by improving accuracy and quality after that access is in place.

Minčić-Obradović offered a more general discussion of the deficiencies of records available.⁶ While many commercial publishers and vendors are able to supply MARC records for their e-books, these records still have numerous problems. The quality of the records varies widely and updates are not supplied when needed. The development of provider-neutral records—i.e., a single bibliographic record that can be used for all instances of an online resource—offers some promise, but provider-neutral records will only prove useful when they are widely adopted. Standalone programs such as MarcEdit have proven very helpful to libraries as they attempt to redress the significant lack in quality in bibliographic records supplied by vendors.

Quality issues with vendor-supplied records prompted a collaborative approach at OhioLINK, a consortium of eight Ohio college and university libraries and the State Library of Ohio. Preston described the efforts of OhioLINK's Database Management and Standards Committee (DMSC) to provide bibliographic records for more than 44,000 e-books purchased by the consortium.⁷ Concerns about the quality of vendor records raised during the DMSC projects included the lack of Library of Congress or medical subject headings, unauthorized author name headings, missing International Standard Book Numbers (ISBNs), and serial issues cataloged as monographs. DMSC members found that abandoning vendor-supplied records and instead locating them in bibliographic utilities, such as OCLC's WorldCat, or creating records from scratch, was often less laborious.

Grigson discussed the challenges inherent in making e-books visible to the public.⁸ She pointed out that while federated search and preharvested search (such as SerialsSolutions Summon Service) may offer complementary search options, the catalog remains an effective discovery tool for e-books. Grigson identified many issues that present ongoing challenges, such as keeping up with updates to e-book collections, deleting records for e-resource collections that are not renewed, inadequate bibliographic record quality, and ensuring the accuracy of links.

The present paper is intended to supplement existing literature by examining MARC bibliographic record batchloading workflow at 18 very large U.S. and Canadian research libraries that responded to a survey.

Research Method

The authors designed the survey to provide insight into the impact of batchloading records on technical service departments of large academic and research libraries. The authors created the survey using the SelectSurvey web-based survey management product. The survey included fifty-seven questions organized into ten sections: demographics, staffing, budgets, scope, management, workflow, quality standards, collaborative efforts, information technology (IT) support, and assessment. Some questions provided the option of adding comments. The survey was reviewed by Penn State's Office for Research Protections (ORP); because the survey did not collect information about human subjects, it did not require ORP approval. The survey can be found in the appendix.

In December 2010, the survey was distributed via e-mail to the members of the Association for Library Collections and Technical Services’ (ALCTS) Technical Services Directors of Large Research Libraries Interest Group (commonly known as “Big Heads”). This group consists of the technical services directors of the 24 largest Association of Research Libraries (ARL) university libraries (including Canadian libraries), 1 non-ARL university library (Stanford University), 2 public libraries, and 3 national libraries. This group was chosen because the authors believed that its members would be very familiar with batchloading activities within their libraries and would be facing many of the same challenges that are being investigated in this research. The authors also believed that they would be interested in the topic and therefore likely to respond. However, the initial response was low, with only 7 of the 30 member libraries responding. In June 2011, follow-up e-mail invitations were sent to the individual technical services directors specified as the Big Heads representatives, resulting in 18 submitted surveys (a 60 percent response rate). Respondents included 1 Canadian library and 17 United States libraries. Six were private universities and 12 were public universities. None of the responding libraries were public or national libraries. If one eliminates the 3 national U.S. libraries (Library of Congress, National Library of Medicine, and National Agricultural Library), which have quite a different mission and clientele, the response rate increases to 66.7 percent. Respondents were given the option of skipping questions if they were unable to determine the answer; in many cases, 17 or fewer respondents answered a particular question, and those instances are noted throughout the paper.
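The two response rates quoted above follow from simple division. As a minimal sketch in Python (the function name `response_rate` is illustrative and was not part of the study), the same calculation can be applied to any of the counts reported in this paper:

```python
def response_rate(responses: int, invited: int) -> float:
    """Return the response rate as a percentage, rounded to one decimal place."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return round(100 * responses / invited, 1)


# All 30 Big Heads member libraries were invited; 18 submitted surveys.
overall = response_rate(18, 30)

# Setting aside the 3 national libraries leaves 27 invited libraries.
without_national = response_rate(18, 27)

print(overall, without_national)
```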

Survey Results and Discussion

Demographics

All 18 respondents to the survey (table 1) have an ILS created by one of the three major vendors (Ex Libris, SirsiDynix, or Innovative Interfaces). The number of records in their respective ILSs varies from 1 million to more than 12 million, with the average close to 6 million records. Staff size of the responding institutions varied greatly, as did the relative proportion of librarians, staff, and hourly staff and student employees. The number of full-time equivalent (FTE) librarians ranged from 45 to 266 (with an average of 108), number of staff ranged from 60 to 488 (averaging 211), and the number of hourly staff and student employees ranged from 50 to 454 (averaging 166).

Staff Involved with Batchloading Activities

The number of FTE devoted to batchloading activities varied from 0.8 to 11.0. The number of librarians devoted to batchloading activities varied from 0.0 to 5.0 (averaging 1.7), with other professionals ranging from 0.0 to 4.5 (averaging 0.9), and support staff ranging from 0.0 to 5.0 (averaging 1.5). One institution reported 0.3 FTE “other staff” participating in batchloading efforts and no institution reported using student assistants. While some correlation exists between total staff and the number of employees devoted to batchloading activities, notable exceptions also were seen: the library that reported the largest number of staff devoted to batchloading (11) had 716 total staff, whereas a library with even more staff (750) reported only 6 FTE devoted to batchloading. Similarly, 1 library with 589 total staff had only 1.2 FTE devoted to batchloading, whereas another library with fewer total staff (513) had 7.0 FTE devoted to batchloading. The 6 private universities had an average of 666 total staff, whereas the public universities had an average of 434 total staff. However, both the public and private universities reported an average of almost 4.0 FTE devoted to batchloading activities.

An analysis of staff devoted to batchloading activities in comparison with total expenditures for e-resources does not indicate any correlation. For example, when the responding libraries were ranked according to expenditures for e-resources, the 6 libraries that spent the least on e-resources averaged 4.6 FTE staff devoted to batchloading, the middle 6 devoted 5.2 FTE to batchloading activities, and the 6 libraries that spent the most on e-resources devoted 2.6 FTE to batchloading activities.

All respondents reported that they had redefined existing positions to add responsibility for batchloading. Four respondents (22.2 percent) reported that they had redefined existing positions to be solely dedicated to batchloading. Another 4 reported that they had created new positions dedicated to batchloading.

Two-thirds of respondents (12) expected to devote more staff to batchloading in the next five years. Because two-thirds of the respondents anticipated growth in staff levels for batchloading, one could assume that the batchloading workload is increasing or growing more complex at most institutions polled. The institutions that responded that no increase in batchloading staff was projected may feel they have sufficient staff devoted to batchloading for the short term, or they may feel that the current budgetary climate and competing priorities would not allow allocation of more staff to batchloading, even if doing so were deemed to be advisable. Future research might explore the perceived priority of batchloading in the matrix of other services provided by libraries’ technical services departments.

Budgeting for Bibliographic Records and for Batchloading Activities

Only 23.5 percent of 17 respondents reported having a dedicated budget allocated for ongoing costs of batchloading activities. The amount of money spent each year on the purchase of bibliographic records varied widely. One should note that because many collections (especially e-content) include bibliographic records as part of the purchase, expenditures for records are not necessarily an indicator of number of records acquired. No institution reported spending less than $1,000 per year; 3 of 16 respondents (18.8 percent) reported spending between $1,000 and $5,000; 1 library spent between $5,000 and $10,000; 4 libraries (25 percent) spent between $10,000 and $50,000; 4 spent between $50,000 and $100,000; and another 4 spent more than $100,000 per year. Based on this rough snapshot, the cost of bibliographic records, although only a small percentage of acquisitions and collections budgets totaling $12,000,000 and more, is not insignificant for many institutions.

Funding for the purchase of bibliographic records came principally from collections and operations budgets. Of the 17 respondents to the question on sources of funding, 15 (88.2 percent) reported using monies from the collections budget; 11 (64.7 percent) reported using monies from the operations budget; 3 (17.6 percent) reported using special funds; 3 reported using endowments; and 1 reported using grant money. (Note that respondents could identify multiple funding sources if applicable.) One respondent specified that funding came from the operations portion of the technical services budget and another noted that the exact source of funding for bibliographic records is unknown because his or her library uses a decentralized funding model. Responses to this question make clear that institutions use various funding sources and the majority purchase their records using a combination of collections and operations budgets.

Nine respondents described their funding models in greater detail. Four respondents indicated that the costs of batchloading were covered ad hoc and distributed across job or organizational lines. Most respondents reported that costs were not explicitly tracked, aside from the costs associated with individual purchases of batch records. In one case, funds were taken from the supplies budget, which, at that institution, is distinct from the collections development budget. In two cases, a specific person (the assistant director for Technical Services and Collections Development officer; the head of Collections) was named as overseeing the costs. One respondent reported his or her library's process as “quite distributed, and not very efficient currently.”

The picture that emerges from the responses is of no single funding model, but an array of approaches. The relatively distributed nature of funding models likely makes tracking of resources and their assessment challenging. For example, assessing the cost-effectiveness of the purchase, loading, and ongoing maintenance of a set of records is more difficult if the costs associated with the workflow are distributed throughout an organization, with some costs being more transparent and easier to track than others.

Scope of Batchloading Activities

No respondents reported batchloading fewer than 100,000 records into their ILS during the past three years. Of 17 responding libraries, 2 (11.8 percent) reported loading between 100,000 and 200,000 records; 3 libraries (17.6 percent) reported loading between 200,000 and 500,000 records; and 12 libraries (70.6 percent) reported loading more than 500,000 records. An analysis of the number of records batchloaded did not reveal a correlation between that number and the respective libraries’ total expenditures. For example, the 2 libraries that reported loading between 100,000 and 200,000 records within the last three years averaged $19,207,552 in total acquisitions budget. The 3 libraries that loaded between 200,000 and 500,000 records had an average total acquisitions budget of $19,235,052. The 12 libraries that reported batchloading more than 500,000 records had an average total acquisitions budget of $17,513,333, lower than the libraries that batchloaded far fewer records.

These numbers indicate that many libraries added a sizable percentage of the total number of records in their respective catalogs in a short timeframe. Although 8 (47.1 percent) of 17 respondents batchloaded less than 10 percent of the total number of records in their catalogs in the past three years, 6 (35.3 percent) reported that such records amounted to 10–20 percent of their catalog, and 1 library reported that they totaled 30–40 percent. In one noteworthy case, a library responded that recently batchloaded records represent more than 50 percent of the total records in its ILS.

All respondents reported deleting records in batches, some in large numbers. Only 3 (17.6 percent) of the 17 responding libraries reported deleting fewer than 1,000 records each year; 5 (29.4 percent) reported deleting between 1,000 and 5,000 records each year; 4 (23.5 percent) reported deleting between 5,000 and 10,000; 2 (11.8 percent) reported deleting between 10,000 and 50,000; and 2 reported deleting between 50,000 and 100,000. One library reported deleting more than 100,000 records per year.

The most common reason (reported by all 17 libraries that answered this question) for deleting records or suppressing records from public view was cancellation of a subscription to an online database or collection. Nearly one-quarter (4 libraries; 23.5 percent) cited withdrawal of physical items, 8 (47.1 percent) cited invalid URLs, and another 4 cited errors found in records. Other reasons not explicitly listed in the survey question but identified by respondents included normal maintenance, e.g., the title was no longer available from a vendor or publisher, publishers were removed from EBL or ebrary profiles, and the contents of large sets and e-book packages changed. The number of responses to this multiple-choice question makes clear that most libraries have multiple reasons for batch deleting records from their ILSs.

The responses summarized above indicate that batchloaded records compose a remarkable proportion of records in many institutions’ ILSs and that batchloading is a dynamic process (or set of processes) that involves not only the batch adding of records but also batch removal. Online collections are perhaps more subject to change and in briefer timeframes than physical collections. When a substantial portion of an institution's collection is digital rather than physical, the fluidity of the catalog and thus the need to batchload and batch remove records is likely to increase dramatically. While monographic records for print materials and microforms are relatively static, records for monographic e-resources tend to be more subject to change. For catalogers with a purely monographic background, this fact may require change to a mind-set closer to that of a serials cataloger, who is accustomed to repeated edits of a single record. Given the current scope of batchloading (and its anticipated future), management and coordination of access to digital assets will no doubt be an important role for libraries’ technical services departments in the coming years.

Management of Batchloading Processes

Management of batchloading processes is commonly shared across participating libraries’ organizational units. The most frequently cited unit with responsibilities for batchloading activities is Cataloging, identified by 13 (76.5 percent) of the 17 respondents to this question. Additional units identified were IT or Systems (9; 52.9 percent), Acquisitions (7; 41.2 percent), Collection Development (3; 17.6 percent), and Public Services (1 library). Another 6 respondents (35.3 percent) reported additional units with responsibilities (E-Resources Management Section; E-Resources, Serials, and Database Management; Scholarly Resource Integration; and Knowledge Access and Resource Management Services), reflecting the diversity of organizational structures in place in responding institutions. Respondents were given the option of selecting more than one unit and had the option of providing their own response.

Respondents indicated that Cataloging held primary responsibility for managing batchloading activities in 12 (70.6 percent) of the 17 responding institutions. This is followed by IT or Systems (7; 41.2 percent), Acquisitions (2; 11.8 percent), and “other” (4; 23.5 percent), which includes E-Resources Management Section; E-Resources, Serials, and Database Management; Scholarly Resource Integration; and Knowledge Access and Resource Management Services. Since the responses to this question total more than 100 percent, the possibility exists that respondents had trouble deciding who was primarily responsible for managing batchloading activities. However, Cataloging and IT or Systems most frequently were identified as bearing primary responsibility for batchloading activities in the majority of libraries. None of the responding libraries reported that Public Services or Collection Development were primarily responsible.

Workflow

Of 17 respondents to the question about timeliness of batchloading, 6 (35.3 percent) reported that loading records for large packages took longer than three months; 4 (23.5 percent) said such loads occurred within one to three months; another 4 libraries said they occurred within one month; 1 respondent said within two to three weeks; and 2 reported batchloads occurred within a week. Asked whether the reported turnaround time was acceptable, 11 respondents (64.7 percent) said no, while the rest said yes.

In addition to batchloading records in the local ILS, 10 (58.8 percent) of 17 respondents reported that their libraries used other methods, such as a metasearch engine, to ensure access to collections. One might assume that libraries providing access via alternative routes may feel less pressure to load records for all their e-book holdings into their ILS, although these alternatives to batchloading may have their own challenges and disadvantages. The alternatives mentioned include WorldCat Local using the “treat as held option,” SFX Find It deep linking for conferences and proceedings within article databases, Summon from SerialsSolutions, e-indexes and aggregated index services such as Primo Central, MetaLib, and ebrary's platform. One respondent reported bypassing their ILS and loading records directly into their discovery layer (Ex Libris's Primo), with the possibility of moving the record sets to the ILS (Aleph) at a later date. The survey did not capture data that might have allowed the authors to assess the relative merits of these approaches. Further research is warranted in this area.

Respondents were asked if their libraries ensure access to titles in Google Books via the ILS. Of the 17 respondents to this question, 11 (64.7 percent) do so. Of libraries that do ensure access to Google Books, the vast majority (10 of the 11) ensure access only to Google Books titles for which their library holds a print version. Only 1 library reported that it ensured access to titles not held by the library. Ten libraries used the Google API to ensure access to Google Books through their ILS and 4 reported using other methods, such as a script written by Systems staff, persistent links for books scanned for the Google Books project (presumably the links are pushed into the corresponding print records), an OpenURL resolver (Umlaut, an open-source link resolver, in conjunction with SFX's knowledgebase), and selective searching of the Google Books database by subject. No respondents reported batchloading records into the ILS to ensure access to titles in Google Books. Because Google requires delivery of a bibliographic record for every volume a library sends for digitization, one can assume that libraries always have records for the titles they themselves have supplied to the Google Books project.

Six (35.3 percent) of 17 responding libraries reported that they ensure access to titles in HathiTrust via their ILS. Only 1 of those 6 reported that such access includes titles for which the library holds no print version. Five libraries reported using the HathiTrust API to ensure access, and 1 library reported batchloading records into the ILS to ensure access. Four libraries reported using other methods, including a cataloger manually adding links to print records and using an OpenURL resolver (Umlaut + SFX).

When asked if their libraries batchloaded records for freely available e-resources in addition to those available via Google Books and HathiTrust, 9 (56.3 percent) of 16 respondents reported batchloading records for these resources. Eight of these 9 reported that such titles are selected by subject specialists and bibliographers. One respondent said titles are loaded on the basis of patron suggestion. Four reported the titles are selected by vendors. Four respondents provided other details. One respondent reported that free titles are selected by the Digital Library Program and library administration, 1 reported that the majority of free e-resources loaded are U.S. government documents, and 1 reported loading MARCIVE records for U.S. government publications but was considering batchloading records for Open Access journals (via SFX and MarcIt) and for National Academies Press titles (all of which were made freely downloadable in PDF format in June 2011).

Workflow is directly affected by high-level policy questions, many of which remain to be answered either by individual libraries, or perhaps more usefully, in a coordinated fashion by consortia or professional library organizations. What is the role of the online catalog in a world increasingly shaped by cloud computing and network-level resource discovery? Should an institution's catalog include records for free resources? If so, which ones and how should they be selected? To what extent should an institution's online catalog replicate access to resources in HathiTrust and Google Books? Batchloading workflows (and workloads) will certainly change as these questions are addressed.

Quality Standards

A majority (14; 82.4 percent) of 17 responding libraries reported that the use of vendor-supplied metadata for digital resources has lowered their library's quality standards for bibliographic data; the rest said these metadata had not changed their standards. No respondents reported that use of vendor-supplied metadata had caused them to raise their standards.

Seventeen respondents answered the question about how they assess the quality of vendor-supplied records. All 17 reported that they used visual review by catalogers or other staff to assess the quality of vendor-supplied records, and 9 (52.9 percent) also reported using automated validation with MarcEdit or other software. (Given the size of many record sets, one might reasonably assume that any visual review by catalogers is of a sampling of records rather than of every record.) Two of the 17 responding libraries reported using other methods, specifically data analysis, in-house validation, and loading the records into a development system and evaluating them in the user interface. Ten respondents described the tools and applications they use in greater detail. Six mentioned MarcEdit; others mentioned Excel, the programming language Ruby, and locally devised Perl scripts or C++ programs to assess records. In one case, a respondent drew an important distinction between data integrity and record quality. One respondent reported using processes that are part of the ILS to identify invalid tags, indicators, and so on in external files before loading. One respondent reported loading the file of records into a test region on the local ILS for review.
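To illustrate what the locally devised validation scripts mentioned by respondents might look like, the following is a minimal sketch in Python. It does not reproduce any respondent's actual tool or MarcEdit's behavior. The simplified record structure (a dictionary mapping MARC tags to lists of field values), the function names, and the local policy that every e-resource record needs a control number (001), a title (245), and a URL (856) are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical simplified record: MARC tag -> list of field values.
Record = Dict[str, List[str]]

# Assumed local policy, not taken from the survey: every e-resource record
# must carry a control number (001), a title (245), and a URL (856).
REQUIRED_TAGS = ("001", "245", "856")


def validate_record(record: Record) -> List[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for tag in REQUIRED_TAGS:
        values = record.get(tag, [])
        if not any(value.strip() for value in values):
            problems.append(f"missing or empty field {tag}")
    for url in record.get("856", []):
        cleaned = url.strip()
        if cleaned and not cleaned.lower().startswith(("http://", "https://")):
            problems.append(f"suspicious URL in 856: {cleaned}")
    return problems


def validate_batch(records: List[Record]) -> Dict[int, List[str]]:
    """Map the position of each failing record to its list of problems."""
    report = {}
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            report[index] = problems
    return report


# Invented sample data: one clean record and two problem records.
sample_batch = [
    {"001": ["ocm0001"], "245": ["An example title"], "856": ["https://example.org/1"]},
    {"001": ["ocm0002"], "245": [""], "856": ["example.org/2"]},
    {"245": ["Another title"]},
]
report = validate_batch(sample_batch)
for position, problems in report.items():
    print(position, problems)
```

A report of this kind could direct the visual review described above toward the records most likely to need attention, rather than replacing that review.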

Seventeen libraries reported using a variety of methods to address quality issues in the bibliographic records they load. Respondents were given several options and were able to select more than one. Sixteen reported that they deal with different levels of quality and fullness from different vendors by editing records using MarcEdit or locally devised scripts to meet local or national standards. Five (29.4 percent) reported that they also edit records manually to meet local or national standards. Nine (52.9 percent) reported that they sometimes accept and load records “as is,” and 2 reported that they have rejected records that they determined were unacceptable. One respondent said the library tried to pressure vendors to supply improved data. From these responses one could assume that different approaches are taken with different record sets, which is not unexpected given the varying levels of quality reported.

Thirteen (76.5 percent) of 17 respondents said they had rejected sets of bibliographic records because of quality issues. The respondents who had rejected records cited many reasons:

lack of authority control or subject access
bad data that would have been difficult or impossible to clean up by automated means
incomplete title fields
character encoding errors
right-to-left text orientation errors
records lacking unique identifiers
nonstandard cataloging practices, such as cataloging groups of unrelated material as a whole or creating unexpected analytic records
concerns from public service librarians that too many nonspecific subject headings would have overloaded the subject areas and made books and journals too hard to find
technical limitations of the ILS (unable to match incoming records to existing records)
the possibility of pulling together better sets of records for the same resources
invalid URLs
overlap or duplication with print records

Batchloading vendor-supplied records for electronic resources can result in multiple records in the ILS for the same title. Thirteen (76.5 percent) of 17 respondents reported that they accepted multiple records for the same title, 5 (29.4 percent) followed a single-record approach and attempted to describe all instances of an e-resource with one record (but keeping print separate), and 4 (23.5 percent) followed a single-record approach and attempted to describe all instances of a resource with one record (print and electronic). Four respondents provided more detailed answers. One said they take a single-record approach for serials but do not deduplicate for monographs; they also do not yet have a plan for how they will implement the provider-neutral standard in the ILS with batchloaded records coming from a variety of sources. Another reported accepting multiple records when print and electronic records cannot be easily matched and accepting multiple records for e-resources when they are available from multiple vendors. The third respondent reported using a single record for print and electronic serials, but separate print and electronic records for monographs and attempting to use a single record for all vendor iterations of electronic access to a title. The final respondent reported that discussions of a single-record approach were ongoing, particularly regarding serials but also regarding monographs.
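
Whichever approach a library takes, it first has to detect records that describe the same title. The sketch below is a minimal illustration, assuming each record has already been reduced to a dictionary with hypothetical 'id' and 'isbn' keys. It groups records by normalized ISBN so that candidate duplicates can be reviewed. Production matching would likely need additional match points, since print and electronic versions often carry different ISBNs.

```python
from collections import defaultdict


def normalize_isbn(raw):
    """Strip hyphens, spaces, and qualifiers such as '(electronic bk.)'."""
    token = raw.split("(")[0]
    return "".join(ch for ch in token if ch.isdigit() or ch in "xX").upper()


def group_by_isbn(records):
    """Map each normalized ISBN to the ids of the records that carry it."""
    groups = defaultdict(list)
    for record in records:
        key = normalize_isbn(record.get("isbn", ""))
        if key:
            groups[key].append(record["id"])
    # Keep only ISBNs shared by more than one record.
    return {isbn: ids for isbn, ids in groups.items() if len(ids) > 1}
```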

One can assume that no institution has the resources to originally catalog or upgrade records for all the digital assets to which it has access, so the need to batchload records is unavoidable. As reported above, 82.4 percent of 17 responding libraries reported that the use of vendor-supplied metadata for digital resources had lowered local quality standards for bibliographic data. One might intuit that respondents perceive the quality of their catalogs as having declined as a result of batchloading. While some vendors have made efforts to improve their metadata to conform more closely to national standards, others have not. The authors believe that many public service librarians would agree that some access to resources, even if imperfect, is preferable to none, but this leaves the question of bibliographic quality standards unanswered. Technical services departments, and especially catalogers, may be increasingly called on to justify maintaining current quality standards in light of shrinking budgets and a ballooning universe of digital resources. The fundamental question will remain: are users able to find what they are seeking quickly and easily? If not, how can quality standards be modified to better serve users’ wants and needs?

Collaborative Efforts

Many respondents reported collaborative efforts to create bibliographic record sets. Two approaches were identified: collaborating with other libraries to address resources owned in common and collaborating with vendors and bibliographic utilities. Seven (38.9 percent) of 18 survey participants reported that they had collaborated with other libraries. Notably, all of these were public institutions, with 58.3 percent of the 12 public institutions reporting some collaboration with other libraries. Two libraries reported contributing to the CONSER (Cooperative Online Serials) project to catalog serials in the Directory of Open Access Journals (www.doaj.org). Two libraries mentioned that they had contributed to Committee on Institutional Cooperation (CIC) efforts to catalog collections purchased jointly, such as the Springer e-book collection. Some cataloging efforts were coordinated by OhioLINK and the University of California's Shared Cataloging Project.

Similarly, 7 (38.9 percent) of 18 survey respondents reported collaborating with vendors or utilities to create bibliographic record sets for electronic or microform collections. In this case, 3 of the 6 private institutions reported collaborative efforts with vendors, whereas only 4 of the 12 public libraries collaborated with vendors. Some libraries reported specific collaborative projects in which they participated. These fall into two categories. In the first, libraries provided feedback to vendors to help improve their ongoing services. An example of this included a library that collaborated with a vendor to develop and deliver records for e-books as part of a patron-driven acquisitions pilot project. Another library reported working through CONSER to improve the quality of serials records that are subsequently distributed through a vendor, in this case SerialsSolutions. The second type of collaboration with vendors included the cataloging of discrete collections of materials, such as The Making of Modern Law: Primary Sources, 1620–1926 collection (gdc.gale.com/products/the-making-of-modern-law-primary-sources-1620-1926) or the EEBO collections.

The survey findings demonstrate that some collaboration is taking place, but it is opportunistic rather than methodical or programmatic. Some collaboration arose from consortial purchases leading to consortial efforts to improve or create records for resources purchased jointly. Others resulted from individual libraries’ efforts to work with vendors to improve their products. This area has much room for improvement. More broadly based collaborative efforts could benefit more libraries, increase efficiency, and reduce costs.

Information Technology

Of the 17 respondents to this question, 8 (47.1 percent) reported using MarcEdit as part of the batchloading process; 3 (17.6 percent) reported using locally devised scripts; 10 (58.8 percent) reported using a combination of both MarcEdit and locally devised scripts; 4 (23.5 percent) said they use other software and scripts; 2 reported using the Millennium system load tables; 1 reported using UltraEdit and writing preprocessing programs specific to each load stream; and 1 reported using a suite of resources, including XSLT, SQL and Data Warehousing, Ruby, Perl, Primo Normalization Rules, Unix, Bash, Awk, and Excel.

A significant majority (14; 82.4 percent) of 17 respondents said that IT support is necessary to maintain batchloading at their institutions. Respondents described their IT needs in a variety of ways, including the following:

customizing records
programming
working on special projects or “tricky” problems
scripting for data transformation and automation
adapting load tables for new record sets
troubleshooting
running system reports or load programs
batch deleting records
creating or changing authorizations for staff access to servers
creating FTP scripts to retrieve files from external sources and move them into the correct location on servers
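
The last item in this list, retrieving files by FTP, might be sketched in Python as follows. The function name, directory names, and the ".mrc" suffix are assumptions made for illustration. The `ftp` argument is expected to behave like a logged-in `ftplib.FTP` object from the Python standard library.

```python
import os


def fetch_new_files(ftp, remote_dir, local_dir, suffix=".mrc"):
    """Download files from remote_dir that are not yet present in local_dir.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    Returns the names of the files that were downloaded.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    downloaded = []
    for name in sorted(ftp.nlst()):
        target = os.path.join(local_dir, name)
        # Skip files of other types and files retrieved on an earlier run.
        if not name.endswith(suffix) or os.path.exists(target):
            continue
        with open(target, "wb") as handle:
            ftp.retrbinary(f"RETR {name}", handle.write)
        downloaded.append(name)
    return downloaded
```

In use, a script would open a connection with `ftplib.FTP(host)`, call `login(user, password)`, and pass the connection to `fetch_new_files`. The host, credentials, and paths would be whatever the vendor supplies.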

These responses indicate a wide range of approaches, probably dependent on organizational structure, job descriptions, and local expertise. Fifteen (88.2 percent) of 17 respondents reported that their current ILS presents technical obstacles to managing batchloads. These respondents described the following obstacles:

an inability to mark records for deletion
lack of sophistication and customizability of ILS loaders
lack of levels of granularity to support multi-campus holdings structure
ILS software upgrades that cause previously devised processes to fail
inability to effectively match and replace or overlay records
limit of loading a certain (relatively small) number of records at one time
extremely limited global edit options
unsafe batch deletion features
lack of a unique record ID in every record that is usable as a match point
“all-or-nothing” security features that preclude authorizing staff to perform certain essential functions without giving them access to “everything”
inability to make batch updates to MARC holdings records
limited system resources that make scheduling a large number of loads challenging
absence of a deduplication utility
cryptic error messages that make data correction before load difficult or impossible

Some common themes that emerge from these responses are the challenges of record matching, batch deletion, MARC holdings management, system resources (i.e., scheduling loads including the number of records that can be loaded in a day, week, and so on), and global editing of records. No ILS is perfect, but the authors suggest the possibility of encouraging ILS vendors to build certain features deemed to be desirable for batchloading into future releases of their products. Following up with the 2 respondents who reported their ILSs present no obstacles to batchloading would be instructive. The question also arises as to whether locally maintained ILSs are the future of library asset management, or whether more and more data and database maintenance will be shifted to the cloud and third-party vendors. In cases where responsibility for batchloading is distributed across technical services and IT departments, coordination, thorough and up-to-date documentation, and effective communication are assumed to be highly desirable, if not essential.
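
Where an ILS limits the number of records per load, one common workaround is to split a large file into smaller batches before loading. The following is a minimal sketch of that idea. The function name and the `max_per_load` parameter are hypothetical, and the actual limit would depend on the local system.

```python
def split_into_batches(records, max_per_load):
    """Yield consecutive slices of at most max_per_load records."""
    if max_per_load < 1:
        raise ValueError("max_per_load must be at least 1")
    for start in range(0, len(records), max_per_load):
        yield records[start:start + max_per_load]
```

Each batch could then be written to its own file and scheduled separately, which also makes it easier to isolate the batch that triggered a load error.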

Assessment

Survey respondents reported various methods of assessing the quality and impact of their batchloading efforts. Nine (52.9 percent) of 17 responding libraries reported using usage data as an assessment tool. End user feedback was used in 7 libraries, while only 1 library used formal end user testing. Two libraries used faculty review, and 1 conducted focus groups. No libraries reported the use of an end user survey. Other assessment activities reported included review by library-wide task forces, review by bibliographers, quality assurance testing, staff review, and error reports. One respondent indicated that the library did not conduct assessment on a regular basis.

Two libraries made changes to their policies and procedures on the basis of the results of their assessment activities. In both cases, usage data revealed a notable increase in the use of resources after records for the resources were batchloaded into the catalog. This resulted in an increase in the staff resources devoted to batchloading activities to minimize the time between when the resource was available and when the records appeared in the online catalog.

All but 1 of the 17 responding libraries informed one or more constituents when a batchload was completed. Multiple answers were possible. Twelve informed all library staff, while 9 specifically informed subject specialists and selectors. Three informed academic department faculty who might be interested in particular loads. None of the responding libraries informed students or the general public about completed batchloads. One respondent's library used a wiki to post information about batchloads, 1 reported notifying the staff at the requesting library, and another indicated relying on “various informal notification channels within the Libraries.” Four libraries indicated that they relied on selectors to inform their respective communities. Some notification occurs at most libraries, but it is not consistent and often relies on the liaison role of subject specialists and selectors.

Survey respondents were asked to select what they felt were the biggest challenges related to batchloading activities facing their libraries. They could select more than one answer, but they were not asked to rank them, and they also could supply their own response. Of the 17 respondents, 14 (82.4 percent) considered inconsistent record quality to be one of the biggest batchloading challenges facing their respective libraries. This was followed by staffing (13; 76.5 percent), ongoing maintenance (10; 58.8 percent), vendor technical support (9; 52.9 percent), local technical support (8; 47.1 percent), and funding the purchase of records (6; 35.3 percent). Other challenges identified in the comments section include the need for good records from developing countries, time needed for initial analysis, managing record-quality expectations, achieving ongoing confidence in any particular source, and the proliferation of potential record sources, even for the same resource.

Respondents’ expectations for the future were very similar to their view of current challenges with batchloading. They were asked what they felt would be the biggest challenges they would face in the next five years. As in the previous question, they could select more than one answer, but were not asked to rank them, and they also could supply their own response. Fourteen (82.4 percent) of 17 responding libraries expected inconsistent record quality to continue to be their biggest challenge. Staffing was considered to be a future challenge by 14 libraries, and vendor technical support, ongoing maintenance, and funding records purchases were each identified by 11 (64.7 percent) libraries as additional challenges. Eight libraries (47.1 percent) identified local technical support as a challenge. Additional concerns identified by respondents included the future of the ILS and the number of resources with no records available at all. While inconsistent record quality was identified as the biggest current concern of respondents, the challenges libraries most expected to face in the next five years were funding, staffing, vendor technical support, and maintenance.

Maintenance

Responding libraries reported using several methods for maintenance of batchloaded records. The most frequently cited methods of the 17 responding libraries (and the percent who reported using these approaches) were the following:

notifications from vendors that titles have been deleted or added, or that new or updated MARC record sets are available (14 libraries; 82.4 percent)
feedback from patrons (13 libraries; 76.5 percent)
feedback from subject specialists (12 libraries; 70.6 percent)
regular review by catalogers or other staff (6 libraries; 35.3 percent)

Other methods reported were monthly reloading of batchloaded records, subscribing to OCLC WorldCat Collection Sets standing orders, setting up reminders in the library's ILS, running a URL checker report within the ILS, and running scripts to check with vendors for new or updated files.

Notifications about invalid links or other errors from patrons, faculty, staff, and other library users can take several forms. A website for reporting functionality problems is used in 15 (88.2 percent) of the 17 responding libraries. E-mail (used by 14; 82.4 percent) and telephone (used by 8; 47.1 percent) reports of inaccuracies also are common. Other options for reporting problems included QuestionPoint (www.questionpoint.org), chat reference, the online catalog's feedback form, in-person reporting to a staff member at a service point, and notifications sent to an error-reporting electronic discussion list (OhioLINK).

URLs are seldom checked in the respondents’ online catalogs. More than half (10 libraries; 62.5 percent) of the 16 that answered this question reported never checking the URLs in their catalogs, 3 (18.8 percent) checked URLs irregularly, 1 library checked them quarterly, and another checked monthly. One library reported running a link checker on resources included in a separate database, but never against the entire ILS. Another library reported using the SFX link resolver service whenever possible to avoid bad links.
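
For libraries whose ILS lacks a link checker, a periodic check could be scripted outside the system. The sketch below is one possible approach using only the Python standard library. It assumes the URLs have already been exported from the catalog, and the function names and the `opener` parameter (included so the check can be exercised without network access) are illustrative choices.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or an 'error: ...' string."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (ValueError, OSError) as error:
        # Malformed URL, DNS failure, refused connection, or timeout.
        return f"error: {error}"


def find_broken_links(urls, **kwargs):
    """Return a dict of url -> status for every URL that did not return 200."""
    results = {url: check_url(url, **kwargs) for url in urls}
    return {url: status for url, status in results.items() if status != 200}
```

Some servers reject HEAD requests or require authentication through a proxy, so a non-200 result from a script like this is a prompt for review rather than proof that a link is dead.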

Opportunities for Improvement

The batchloading process could be more efficient if the functionality of the ILS improved. URL checking and the ability to accurately match and overwrite records are especially important. Batch deletion also is crucial given the fluid nature of digital resources.

Collaborative efforts to improve sets of bibliographic records also would be highly advantageous. Working with vendors earlier in the bibliographic record production process would help them by creating a better product and would help libraries by providing record sets that are higher quality and easier to load. Widespread adoption of products like MarcEdit would be advantageous. The library profession as a whole would benefit from more frequent training opportunities at national and regional library conferences.

Batchloading efforts in academic libraries could benefit tremendously from a widespread adoption by vendors of a set of best practices, such as the MARC Record Guide for Monograph Aggregator Vendors, 2nd ed., which aims to provide vendors with information for producing high-quality MARC record sets acceptable to libraries.⁹ Other best practices could encourage vendors to perform quality checks, supply working links to correct titles, supply correct publication information, and ensure that the number of records matches the number of resources in the aggregated package.
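
The last of these checks, that a record set corresponds to the resources in a package, is simple to automate on either side. The following sketch assumes that both the vendor's title list and the record set expose some shared identifier. The function name and the keys of the returned report are hypothetical.

```python
def compare_package(title_list_ids, record_ids):
    """Compare identifiers in a package title list with those in a record set."""
    expected, received = set(title_list_ids), set(record_ids)
    return {
        # Resources in the package that have no record.
        "missing_records": sorted(expected - received),
        # Records that correspond to no resource in the package.
        "unexpected_records": sorted(received - expected),
        # A bare count comparison, which can pass even when the sets differ.
        "counts_match": len(title_list_ids) == len(record_ids),
    }
```

Comparing identifiers rather than counts alone matters because a set with one missing record and one stray record would still have the expected total.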

Finally, direct information exchange by library staff engaged in batchloading activities would be immensely useful. With consortial purchases and many libraries loading records for similar or identical lists of titles, libraries would benefit if technical services staff in different libraries could communicate directly with each other. Electronic discussion lists, blogs, and social media can be useful tools in building knowledgeable online communities available for quick consultation on technical issues. More effective communication can lead to collaborative cataloging and sharing of customized files. Forums suited to this kind of exchange do exist, such as MarcEdit-L (listserv.gmu.edu/cgi-bin/wa?A0=marcedit-l), an online discussion list. Others devoted to specific aspects of the batchloading process (or even specific vendors) would be helpful.

Areas for Future Research

This project has revealed three key aspects of batchloading bibliographic records that would prove fruitful for future research. First, the survey highlights concerns about the poor quality of vendor-supplied records. Respondents perceive that, as a result, the practice of batchloading lowers the level of quality in the online catalog. How has this affected the ability of library users to find, identify, select, and obtain library materials for their research or other needs? If an effect is established, does it vary from one discipline to another? The impact of quality variances in the online catalog may be minimized by the use of discovery interfaces such as SerialsSolutions’ Summon service. Will libraries choose to perform fewer batchloads and rely instead on access to electronic resources through other interfaces, or will libraries continue to attempt to maintain the online catalog as a database of record?

Assessment is the second area in which further research would be useful for the library community. This survey indicated more than half of responding libraries conduct assessments of their batchloading efforts. The type of assessment activities varied considerably, from reviewing usage data to determine whether use increased after the batchload to conducting focus groups with end users. Assessment is critical because it can show whether an activity is worthwhile and can result in better access or other positive outcomes. Assessment can inform library administrators whether the investment in staff and monetary resources is beneficial to the library's mission. A more in-depth study of assessment practices and assessment findings would be useful to other libraries and could provide guidance on how to approach such an activity.

The third area for future research is the effect of collaborative efforts. This survey revealed that many of the responding libraries had participated in one or more collaborative efforts related to batchloading projects, but they reported little consistency, and the efforts were not repeated. Delving more deeply into what collaborative efforts had taken place, whether they were successful, how the success or lack of it was determined (i.e., what were the assessment criteria?), and how future collaborative efforts can be fostered would be informative.

Conclusion

The literature reviewed for this paper revealed that many libraries are facing challenges in managing their batchloading activities. The researchers conducted a survey of how large research libraries manage the batchloading of MARC bibliographic records for electronic and microform resources into the online catalog with the aim of investigating the impact of batchloading records on the policies and procedures of academic and research libraries. The ALCTS Big Heads Interest Group was selected as an appropriate population for this study because the researchers believed that its member libraries were likely to be heavily involved in batchloading activities and would be likely to respond. The survey was completed by 18 (60 percent) of the Big Heads member libraries.

The survey results revealed that all of the responding libraries were involved in batchloading MARC bibliographic records into the online catalog, and a sizeable portion of their catalogs consisted of records that were batchloaded rather than individually loaded. A majority of respondents (65 percent) were not satisfied with their current workflow, indicating that the loading of records was not sufficiently timely. Quality of records was a major concern for survey respondents, and collaborative efforts to address these issues were sporadic and opportunistic. IT presented a number of problems to libraries engaged in batchloading activities, including ILS functionality problems, troubleshooting, and the need for specialized programming.

As long as online catalogs are considered the database of record for library collections, batchloading of bibliographic records will continue to be an important part of libraries’ strategy for providing access to aggregated electronic resources and microform collections. This study revealed that most libraries anticipate that batchloading activities will increase during the next five years and that the challenges libraries face (e.g., staffing, vendor technical support, ongoing maintenance, funds for record purchase, and local technical support) will remain important.

Improvements in several areas would strengthen libraries’ ability to perform batchloading activities, including better ILS functionality, improved vendor support, increased collaborative efforts, better training in the use of tools such as MarcEdit, and better communication. Tracking the application of discovery-layer software and its effect on the user experience, and whether it will replace some or all of libraries’ batchloading efforts, will be interesting. Best practices will be directly affected by high-level policy questions, many of which remain to be answered. Further research is warranted, especially in the areas of quality control, assessment, and collaboration. Increased collaboration among libraries and with vendors could significantly improve libraries’ ability to help users find and use the materials they need for their research.


Batchloading MARC Bibliographic Records: Current Practices and Future Challenges in Large Research Libraries
	Rebecca L. Mugridge, Jeff Edmunds
	Rebecca L. Mugridge is Head, Cataloging and Metadata Services; iym6@psu.edu
	Jeff Edmunds is Digital Access Coordinator, Pennsylvania State University Libraries, University Park, Pennsylvania; jhe2@psu.edu

Abstract	Research libraries are using batchloading to provide access to many resources that they would otherwise be unable to catalog given the staff and other resources available. To explore how such libraries are managing their batchloading activities, the authors conducted a survey of the Association for Library Collections and Technical Services Directors of Large Research Libraries Interest Group member libraries. The survey addressed staffing, budgets, scope, workflow, management, quality standards, information technology support, collaborative efforts, and assessment of batchloading activities. The authors provide an analysis of the survey results along with suggestions for process improvements and future research.

Institution	Public or Private	Total Staff (FTE)	Staff Devoted to Batchloading (FTE)	Total Acquisitions Budget	Expenditures for E-resources
Cornell University	Private	513	2.4	14,917,133	8,256,470
Harvard University	Private	938	2	32,341,358	9,335,310
New York University	Private	530	1.5	20,461,642	12,112,955
Princeton University	Private	370	3	23,156,840	10,487,102
Stanford University	Private	930	4	Not available	Not available
Yale University Library	Private	716	11	31,340,632	8,299,701
Indiana University	Public	341	7	13,490,434	7,623,775
Ohio State University	Public	750	6	11,954,846	7,191,692
Pennsylvania State University	Public	589	1.15	17,953,463	11,404,651
University of Alberta	Public	295	5	19,446,396	13,836,448
University of California, Berkeley	Public	571	7	17,846,646	7,648,665
University of Illinois at Urbana-Champaign	Public	513	7	15,281,388	7,908,799
University of Minnesota	Public	350	0.8	17,008,958	9,797,966
University of North Carolina at Chapel Hill	Public	155	4	16,970,946	7,046,460
University of Texas at Austin	Public	311	4	17,392,118	7,120,110
University of Virginia	Public	544	2	10,352,942	5,893,290
University of Washington	Public	440	2	14,842,396	8,581,484
University of Wisconsin—Madison	Public	353	1.5	11,522,129	7,081,468


1.	Kristin E. Martin, "Cataloging E-Books: An Overview of Issues and Challenges," Against the Grain (2007) 19, no. 1: 45–47.
2.	Rebecca L. Mugridge and Jeff Edmunds, "Using Batchloading to Improve Access to Electronic and Microform Collections," Library Resources & Technical Services (2009) 53, no. 1: 53–61.
3.	Annie Wu and Anne M. Mitchell, "Mass Management of E-Book Catalog Records: Approaches, Challenges, and Solutions," Library Resources & Technical Services (2010) 54, no. 3: 164–74.
4.	Terry Reese, "Automated Metadata Harvesting: Low-Barrier MARC Record Generation from OAI-PMH Repository Stores Using MarcEdit," Library Collections & Technical Services (Apr. 2009) 53, no. 2: 121.
5.	Kristin E. Martin and Kavita Mundle, "Cataloging E-Books and Vendor Records: A Case Study at the University of Illinois at Chicago," Library Resources & Technical Services (2010) 54, no. 4: 227–37.
6.	Ksenija Minčić-Obradović, E-Books in Academic Libraries (Oxford: Chandos, 2011): 89–94.
7.	Carrie A. Preston, "Cooperative E-Book Cataloging in the OhioLINK Library Consortium," Cataloging & Classification Quarterly (2011) 49, no. 4: 257–76.
8.	Anna Grigson, "Making E-Book Collections Visible to Readers," in E-Books in Libraries: A Practical Guide, ed. Kate Price and Virginia Havergal (London: Facet, 2011): 139–61.
9.	MARC Record Guide for Monograph Aggregator Vendors, 2nd edition: Includes Revisions to September 2011 (Washington, D.C.: Program for Cooperative Cataloging, 2011): www.loc.gov/catdir/pcc/sca/FinalVendorGuide.pdf (accessed Nov. 14, 2011).

