Data Clean-Up and Management - Kenneth Furuta, Margaret Hogarth

Data Clean-Up and Management (eBook)

A Practical Guide for Librarians

Kenneth Furuta, Margaret Hogarth (Authors)

eBook Download: EPUB

2012 | 1st edition
578 pages
Elsevier Science (Publisher)
978-1-78063-347-3 (ISBN)

Data use in the library has specific characteristics and common problems. Data Clean-Up and Management addresses these and provides methods to clean up frequently occurring data problems using readily available applications. The authors highlight the importance and methods of data analysis and presentation, and offer guidelines and recommendations for a data quality policy. The book gives step-by-step directions for common dirty-data issues.
- Focused on libraries and practicing librarians
- Deals with practical, real-life issues and addresses common problems that all libraries face
- Offers cradle-to-grave treatment for preparing and using data, including download, clean-up, management, analysis and presentation

Margaret Hogarth is Electronic Resources Coordinator and Subject Specialist for Environmental Sciences, Water and Soils for the University of California, Riverside Libraries. She has a B.A. in English from the University of California, Santa Barbara, an MLIS from San Jose State University and an M.S. in Environmental Studies from California State University, Fullerton. She has been a librarian since 1999.

Introduction (why this book is needed)

Abstract:

This book is a practical guide for librarians that introduces techniques for manipulating the data needed for reporting and decision making. The techniques enable you to assemble data sets from different sources to support decisions. It offers a cradle-to-grave discussion of issues and techniques in data gathering, cleaning, preparation and presentation.

Key words: needs, goals, practical, manipulate, data, techniques

A co-worker stops by your office, smiles and says, “I want to compare two databases for journal overlap and unique titles. Is there a way to get a list, or do I have to compare each side by side?”

You think about it for a second and realize that three lists will save her time: one containing the titles in both databases, and two more containing the titles unique to each database. So you say, “I think I can do that. Is there a title list for each?”

Your co-worker says, “Yes. I will send you some links.”

When you open the links, you discover one is a spreadsheet, the other is a list on a web page. You look for and find ISSNs for each title on both lists. You conclude, “I can create the lists in Access and present them in Excel fairly quickly.” And you do so later that evening. (Indeed, you would have finished faster but your favorite TV show was on.) The next morning you email the file to your co-worker. She opens it and stops by your office to say, “Thank you, this is just what I wanted.”
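The scenario uses Access and Excel. To illustrate the same matching logic outside those tools, here is a minimal Python sketch that builds the three lists by comparing ISSNs. It assumes both title lists have already been read into (ISSN, title) pairs; the function names are hypothetical, and the sample data is made up apart from the ISSNs for Nature and Science (the third title and its ISSN are invented).

```python
from typing import Dict, List, Tuple


def normalize_issn(raw: str) -> str:
    # Drop hyphens and spaces and uppercase an "x" check digit, so that
    # "0028-0836" and "00280836" compare as equal.
    return raw.replace("-", "").replace(" ", "").strip().upper()


def compare_title_lists(
    db_a: List[Tuple[str, str]], db_b: List[Tuple[str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    # Index each database's titles by normalized ISSN.
    a = {normalize_issn(issn): title for issn, title in db_a}
    b = {normalize_issn(issn): title for issn, title in db_b}
    both = sorted(a.keys() & b.keys())    # titles in both databases
    only_a = sorted(a.keys() - b.keys())  # unique to database A
    only_b = sorted(b.keys() - a.keys())  # unique to database B
    return {
        "in_both": [(i, a[i]) for i in both],
        "only_in_a": [(i, a[i]) for i in only_a],
        "only_in_b": [(i, b[i]) for i in only_b],
    }


# Hypothetical sample lists (one from a spreadsheet, one from a web page).
database_a = [("0028-0836", "Nature"), ("0036-8075", "Science")]
database_b = [("00280836", "Nature"), ("9999-9994", "Example Journal (hypothetical)")]

result = compare_title_lists(database_a, database_b)
for name, rows in result.items():
    print(name, rows)
```

Matching on a normalized ISSN rather than on title avoids false mismatches caused by differences in capitalization or punctuation. A journal's print and electronic ISSNs differ, so a real comparison may need to check both.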

Libraries need to look at data from different systems, combined in useful ways. How much do our journal subscriptions cost, and what are their circulation and usage? How many journals do we subscribe to in a particular discipline? What percentage of the full-text journal titles in an aggregator database covers that discipline? Can we apply the same percentage to our budget? We need to cut 10 percent of the titles from an expensive package; how do we determine which journals to cut? Which journals costing more than $2000/year are used so infrequently that we could rely on interlibrary loan, print on demand or other delivery options and still save money? Answering these questions may involve combining acquisitions, bibliographic, circulation, usage, vendor site, OpenURL, interlibrary loan and perhaps other data. Staff may understand the library’s data environment well enough to answer such questions, but only after laborious downloading, clean-up, combining and formatting. We hope this book and the readily available techniques described in it will make that process less onerous.
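The $2000/year question reduces to simple arithmetic once cost and usage data have been combined: a title is a candidate if its annual cost exceeds the threshold and its uses, each filled by interlibrary loan instead, would cost less than the subscription. The Python sketch below assumes a flat per-request ILL cost (the $30 default is a placeholder, not a measured figure); the `Journal` record, its field names and the sample titles are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Journal:
    title: str
    annual_cost: float  # subscription price per year, in dollars
    uses: int           # downloads plus circulation for the same year (assumed already combined)


def ill_candidates(
    journals: List[Journal],
    min_cost: float = 2000.0,
    ill_cost_per_use: float = 30.0,  # hypothetical flat cost per ILL request
) -> List[Journal]:
    """Titles costing more than min_cost whose yearly uses, if each were filled
    by ILL at ill_cost_per_use, would cost less than the subscription."""
    picks = [
        j for j in journals
        if j.annual_cost > min_cost and j.uses * ill_cost_per_use < j.annual_cost
    ]
    # Sort by cost per use, highest first; zero uses is treated as one to avoid dividing by zero.
    return sorted(picks, key=lambda j: j.annual_cost / max(j.uses, 1), reverse=True)


# Hypothetical titles and figures for illustration only.
journals = [
    Journal("Title A", 4500.0, 12),
    Journal("Title B", 2500.0, 400),
    Journal("Title C", 1800.0, 2),
    Journal("Title D", 3000.0, 0),
]
picks = ill_candidates(journals)
for j in picks:
    print(j.title, round(j.annual_cost / max(j.uses, 1), 2))
```

Title B is excluded because filling its 400 uses by ILL would cost more than the subscription, and Title C falls below the cost threshold.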

What makes this book unique?

This book is a practical guide for librarians. It introduces techniques that let you create lists like those in the scenario above and assemble other data sets from different sources to support decisions.

Our focus is on merging and manipulating library data. Our examples are based on issues and problems we face in our institutions. Library literature contains many excellent papers and books on using and presenting data. An example is the recent book, Library Data (Orcutt 2010), which contains innovative and compelling examples of the presentation of library statistics. Another example, Viewing Library Metrics from Different Perspectives (Dugan et al., 2009), makes an important contribution to selecting, using and presenting data for different stakeholders.

However, the literature is silent on how to create that final product. This book seeks to fill that void with a cradle-to-grave discussion of issues and techniques in data gathering, cleaning, preparation and presentation. Although many tools are available, we mainly discuss manipulating data with Excel and Access.

Why library data is important

Libraries are awash in data, collected from gate counts, reference transactions, subscription prices, average book costs and more. Although we gather it, we may not put it to its best use for the outcome we want or need. Indeed, the first step in assembling data is to think about the project’s needs and goals. For example, are you gathering statistics for the Association of Research Libraries (ARL), writing a grant, or determining the cost–benefit of a journal or service point? Each project requires its own resources, techniques and tools.

If the needs and goals are not defined at the beginning of a project, much of the thought and time spent assembling the analysis may be wasted. For example, in a journal cancellation project one of the most important pieces of information is a title’s use, both electronic (downloads) and print (circulation). Other measures, such as impact factor, may matter less in your final decision. In that case, your time is better spent on article download counts than on impact factor.

We need quality data about our libraries to make the best decisions about allocating scarce budget resources. If your journal cancellation data does not allow a good comparison of titles, you may cancel a highly used title and keep one that is less useful. This brings up the concept of “opportunity cost”: the value of the best alternative you give up when you make a choice. If your analysis cannot rank products reliably, your decisions will not be optimal.

There are many undesirable outcomes of using dirty or incomplete data. We may fail to pull data from the best sources and merge it into a coherent package that supports effective decision making. For example, a circulation desk’s traffic pattern may be best understood by viewing data from several sources: the times and dates of checkouts from the catalog, gate counts, and a tally of questions asked from a third source. Combining these can give a comprehensive picture for making staffing decisions.
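Combining the three sources usually means bringing them to a common unit, such as the hour of the day. Here is a minimal Python sketch, assuming checkout and question records carry timestamps while the gate counter already reports hourly totals; the input shapes, the function name and the sample figures are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple


def hourly_profile(
    checkouts: List[datetime],
    gate_counts: Dict[int, int],  # hour of day (0-23) -> people entering
    questions: List[datetime],
) -> Dict[int, Tuple[int, int, int]]:
    # Bucket each timestamped event by hour of day (all dates pooled together).
    checkouts_by_hour = Counter(ts.hour for ts in checkouts)
    questions_by_hour = Counter(ts.hour for ts in questions)
    hours = sorted(set(checkouts_by_hour) | set(gate_counts) | set(questions_by_hour))
    # An hour missing from a source is reported as 0 (Counter lookup / dict.get default).
    return {
        h: (checkouts_by_hour[h], gate_counts.get(h, 0), questions_by_hour[h])
        for h in hours
    }


# Hypothetical one-day sample.
checkouts = [datetime(2012, 10, 1, 9, 15), datetime(2012, 10, 1, 9, 40),
             datetime(2012, 10, 1, 14, 5)]
gate = {9: 120, 14: 85, 20: 30}
questions = [datetime(2012, 10, 1, 9, 50), datetime(2012, 10, 1, 20, 10)]

profile = hourly_profile(checkouts, gate, questions)
for hour, (c, g, q) in profile.items():
    print(f"{hour:02d}:00  checkouts={c}  entries={g}  questions={q}")
```

Reporting an hour that is absent from a source as zero is itself an assumption: a gate counter that was offline is not the same as an empty building.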

It is often impossible to obtain completely accurate data. In that case, it is vital to understand the limits to accuracy before making the decision. The decision is the important outcome, not clean data.

The poor decisions made from dirty, incomplete or poorly understood data may ultimately cost the library patrons. Consider setting reference desk hours: which patrons are you missing if you shorten the hours? A quality data set should support the underlying analysis: how many patrons come in a given time period, are they the primary clientele, what questions are they asking, and do they turn to chat reference when you are not available? If your data set cannot address these questions, managers will be forced to rely on anecdotal evidence to inform the decision.

The book’s outline

In this book, we will start by discussing common data sources and issues that all libraries face (Chapter 2). For example, we all get usage data from the same or similar platforms and vendors. We also track circulation and reference statistics from in-house systems or procedures. We use similar tools such as Excel or MarcEdit to process the data. We face similar challenges in cleaning up the data and processing it for use when making decisions in collection development or staffing, or advocating for larger budgets.

In Chapter 3 we note the importance of understanding what data really measures when using it. We touch on the importance of making sure you are really counting what you need and not something different, and the need for all staff members to use the same definitions and count the same way. If not, then you may be comparing apples to oranges.

We begin to discuss processing data in Chapter 4 with a general overview and issues that should be considered, not least combining the strengths of Excel and Access to produce a comprehensive result.

Chapters 5 through 8 continue to focus on library data clean-up using Excel and other tools. We address solutions to common problems such as Excel dropping the leading zero in an ISSN or ISBN, cleaning up non-printing characters, and combining data sets from different sources.
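Two of the problems named above can be sketched in a few lines of Python. The functions below are hypothetical helpers, not part of Excel or any tool discussed in the book: one pads an ISSN that a spreadsheet has stored as a number back to eight characters and restores the hyphen, and the other removes control and formatting characters (such as a byte-order mark or a zero-width space) and normalizes whitespace.

```python
import unicodedata


def restore_issn(value) -> str:
    """Rebuild an ISSN (NNNN-NNNC) from a value a spreadsheet may have
    converted to a number, e.g. 280836 -> '0028-0836'."""
    text = str(value).strip()
    if text.endswith(".0"):          # float artifact from a numeric cell
        text = text[:-2]
    text = text.replace("-", "").upper()
    text = text.zfill(8)             # put back the dropped leading zeros
    return f"{text[:4]}-{text[4:]}"


def strip_nonprinting(text: str) -> str:
    """Turn any whitespace (tab, newline, non-breaking space) into a plain
    space, drop other control (Cc) and format (Cf) characters, then
    collapse runs of spaces."""
    cleaned = []
    for ch in text:
        if ch.isspace():
            cleaned.append(" ")
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            continue
        else:
            cleaned.append(ch)
    return " ".join("".join(cleaned).split())


print(restore_issn(280836))
print(strip_nonprinting("\ufeffJournal\u00a0of\tChemistry\x00 "))
```

For a ten-digit ISBN the same padding idea would use `zfill(10)`, but ISBN hyphen positions vary, so this sketch does not insert them.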

Chapters 9 through 14 explore using Access. Topics range from creating input forms to generating reports and “querying” the data to retrieve just what is needed.

Chapter 15 looks at strategies for dealing with missing data. Although the main thrust of the book is on quantitative data, Chapter 16 addresses qualitative data, which is also important. Chapter 17 gives a brief overview of return on investment (ROI). Chapter 18 explores ways of presenting data to others. Here we complement the books mentioned above by exploring techniques to produce their results.

Chapter 19 closes the circle that started in Chapter 2. In some ways, the methods we present are work-arounds needed because of the poor quality of available data. In this chapter we focus on the need for library data policies in which data quality is explored and defined systematically. We close by considering next steps (Chapter 20): cloud computing has made data access easier, and we explore possible library applications.

Solution to scenario

This...

Publication date (per publisher)	October 22, 2012
Language	English
Subject areas	Humanities ► Linguistics / Literary Studies
	Social Sciences ► Communication / Media ► Book Trade / Library Science
	Business ► Business Administration / Management ► Corporate Management / Management
	Business ► Business Administration / Management ► Business Informatics
ISBN-10	1-78063-347-5 / 1780633475
ISBN-13	978-1-78063-347-3 / 9781780633473


EPUB (Adobe DRM)


Print edition

Book | Softcover

CHF 159.95