Copying & Pasting Can Cause Havoc in MARC Records

Written by Joan on September 17, 2018. Posted in Blog, General


Recently we have received records from libraries in which some fields contain data in incompatible character encodings. Most of these fields appear to be summaries, annotations, or tables of contents that may have been copied from websites such as Google Books or Amazon.com and pasted into the bibliographic record.

Some of the text used on websites in summaries or tables of contents is encoded in ways that are not compatible with a MARC bibliographic record. Conversely, MARC-8 characters are never appropriate on a web page, so any special MARC-8 data is always incompatible with web data. When text in the various web encodings (Unicode/UCS, UTF-8, UTF-32, Windows-1252, etc.) is added to MARC data, the record becomes a mix of character sets that is often incompatible with library systems. We have seen cases in which a special character that appeared as a quotation mark on a website was represented as a field terminator in the data received by MARCIVE, which makes the record more challenging to process.
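To make the encoding mismatch concrete, here is a minimal Python sketch. It shows how a single character from a web page becomes different bytes in several web encodings, and what happens when bytes written in one encoding are read as another. The function names `byte_forms` and `misread` are hypothetical, chosen only for this illustration; the sketch covers web encodings only and does not attempt MARC-8, which the Python standard library does not support.

```python
# Illustrative only: one character, several byte representations.
WEB_ENCODINGS = ["utf-8", "cp1252", "utf-16-le", "utf-32-le"]  # cp1252 = Windows-1252


def byte_forms(char: str) -> dict[str, bytes]:
    """Return the bytes that represent `char` in each encoding listed above."""
    return {name: char.encode(name) for name in WEB_ENCODINGS}


def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text one way and decode it another, as a mismatched system would."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # Right single quotation mark (the "curly" apostrophe) and n with tilde.
    for ch in ("\u2019", "\u00f1"):
        print(repr(ch), byte_forms(ch))
    # UTF-8 bytes read as Windows-1252 turn one character into several.
    print(misread("Jacob\u2019s", "utf-8", "cp1252"))
    print(misread("sue\u00f1o", "utf-8", "cp1252"))
```

None of these byte sequences is what a MARC-8 system expects, which is why pasted text needs to go through a conversion step rather than being stored as-is.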

Another issue arises when a very long summary or table of contents is pasted into a 520 or 505 field in a MARC record. There is a size limit for each individual field in a MARC record (9999 bytes), as well as a size limit for the entire MARC record (99999 bytes). Sometimes these summaries and tables of contents exceed the limits, either for a field or for the entire record.
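The size limits can be checked before a record is saved. The following Python sketch is one possible way to do that, not a description of any particular library system. It assumes the standard MARC record layout (a 24-byte leader, a 12-byte directory entry per field, a one-byte terminator after the directory and after each field, and a one-byte record terminator) and assumes each field is supplied as already-encoded bytes that include its indicators and subfield delimiters. The names `field_length`, `record_length`, and `size_problems` are hypothetical.

```python
LEADER_LENGTH = 24            # fixed-length leader at the start of every record
DIRECTORY_ENTRY_LENGTH = 12   # one directory entry per field
TERMINATOR_LENGTH = 1         # field terminator or record terminator
MAX_FIELD_LENGTH = 9999
MAX_RECORD_LENGTH = 99999


def field_length(field_bytes: bytes) -> int:
    """Length of one field as counted in the directory (data plus terminator)."""
    return len(field_bytes) + TERMINATOR_LENGTH


def record_length(fields: list[tuple[str, bytes]]) -> int:
    """Total record length for a list of (tag, encoded field bytes) pairs."""
    directory = DIRECTORY_ENTRY_LENGTH * len(fields) + TERMINATOR_LENGTH
    body = sum(field_length(data) for _tag, data in fields)
    return LEADER_LENGTH + directory + body + TERMINATOR_LENGTH


def size_problems(fields: list[tuple[str, bytes]]) -> list[str]:
    """Describe every field, and the record as a whole, that exceeds its limit."""
    problems = []
    for tag, data in fields:
        length = field_length(data)
        if length > MAX_FIELD_LENGTH:
            problems.append(f"{tag} field is {length} bytes (limit {MAX_FIELD_LENGTH})")
    total = record_length(fields)
    if total > MAX_RECORD_LENGTH:
        problems.append(f"record is {total} bytes (limit {MAX_RECORD_LENGTH})")
    return problems
```

Note that the limits are in bytes, not characters: in a UTF-8 record, a character such as ñ counts as two bytes, so a pasted summary can exceed the limit sooner than its character count suggests.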

A few examples of issues with this type of data:

⇒Quotation marks:  Appears on website   …chance for a “real job”….

The quotation marks are interpreted by the local system or text editor as something different, so they are incorrect when exported from the library’s system and therefore incorrect when received by MARCIVE. They can appear as question marks or other substitutes, depending on the editor or function used to view or process the records and on how it handles characters that are not valid for bibliographic records.

⇒Apostrophe:  Appears on website   …Jacob’s opinion…..

The apostrophe is interpreted by the local system or text editor as something different, and the data is received by MARCIVE with incompatible coding.

⇒Special characters – non-English:   Appears on website “sueño” and received by MARCIVE as “suñeo”

Depending on how the data is used or displayed, this may appear as “sue?no” or “suñeo” or some other form. The code for ñ is not being interpreted correctly because it is not the correct code for this special character in MARC records.
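The examples above suggest a simple check that could be run on text before it is pasted into a record. The Python sketch below is only an illustration under stated assumptions: it replaces common typographic punctuation with plain ASCII equivalents (one possible cleanup choice, not MARCIVE's procedure) and lists every remaining character outside printable ASCII so a cataloger can review it. Letters such as ñ are flagged rather than altered, since they are legitimate and should be entered through a cataloging tool that encodes them correctly. The names `replace_typographic_punctuation` and `characters_to_review` are hypothetical.

```python
import unicodedata

# Assumed mapping from common web punctuation to plain ASCII equivalents.
ASCII_REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}
_TABLE = str.maketrans(ASCII_REPLACEMENTS)


def replace_typographic_punctuation(text: str) -> str:
    """Swap the punctuation listed above for its ASCII equivalent."""
    return text.translate(_TABLE)


def characters_to_review(text: str) -> list[tuple[int, str, str]]:
    """List (position, character, Unicode name) for anything outside printable ASCII."""
    flagged = []
    for position, char in enumerate(text):
        if not (0x20 <= ord(char) <= 0x7E):
            # Control characters have no Unicode name, so supply a placeholder.
            name = unicodedata.name(char, "UNNAMED CONTROL CHARACTER")
            flagged.append((position, char, name))
    return flagged


if __name__ == "__main__":
    pasted = "\u2026chance for a \u201creal job\u201d\u2026."
    print(replace_typographic_punctuation(pasted))
    print(characters_to_review("sue\u00f1o"))
```

Running the review step after the replacement step leaves only the characters that genuinely need a decision, such as accented letters or stray control characters.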

While we are happy to resolve these problems, it is not always apparent to the end user where problems with diacritics and other special characters originated. It is therefore good to be aware of the issues that arise from copying text from websites and pasting it into a MARC record.

The best way to include summaries and tables of contents in your data is to have them added by a vendor like MARCIVE that has procedures in place to make the data compatible with MARC records, or to enter the data within your own system using its tools for cataloging bibliographic data or a MARC editor.

Written by Carol Love, Programmer/Analyst and Joan Chapa, MLS
