Practice R (eBook)

An interactive textbook

Edgar J. Treischl (Author)

eBook Download: EPUB

2023
397 pages
De Gruyter (Publisher)
978-3-11-070508-9 (ISBN)

Many students learn to analyze data using commercial packages, even though open-source software with cutting-edge possibilities is available: R, a programming language with countless cool features for applied empirical research.

Practice R introduces R to social science students, inspiring them to consider R as an excellent choice. In a non-technical pragmatic way, this book covers all typical steps of applied empirical research.

Learn how to prepare, analyze, and visualize data in R. Discover how to collect data, generate reports, or automate error-prone tasks.

The book is accompanied by an R package. This provides further learning materials that include interactive tutorials, challenging you with typical problems of applied research. This way, you can immediately practice the knowledge you have learned. The package also includes the source code of each chapter and templates that help to create reports.

Practice R has social science students in mind; nonetheless, a broader audience may use Practice R to become proficient R users.

Edgar J. Treischl is a postdoctoral researcher. He studied sociology in Munich and attained his PhD at the University of Mannheim. His research interests include educational research, evaluation, and data science.

Part I: The first steps

1 Introduction

R is a programming language and a powerful tool to analyze data, but R has a lot more to offer than statistics. To mention just a few options, R has many capabilities to visualize data, to collect data (e.g., from a website), or even to create interactive dashboards. From this perspective, it is no wonder that R has a huge fan base. Unfortunately, learning R can be tough. People who struggle may say that the data handling is complicated, some complain that R lacks a graphical interface, and probably all agree that beginners face a rather steep learning curve. Regardless of our perception, the best way to learn R is by means of practice. For this reason, this book introduces R, focuses on the most important steps for applied empirical research, and explains how to use R in practice. After reading and working on the materials in this book, you will be able to prepare and analyze data, make visualizations, and communicate key research insights.

Who should read this book? Overall, the book introduces R and is written for people with no prior knowledge about it. However, Practice R is a textbook for the social sciences, and it is assumed that the reader has prior knowledge in statistics and quantitative methods. Practice R might not be the first choice if you have yet to learn what a standard deviation, Pearson’s r, or a t-test is. The same applies to topics of quantitative empirical research. I presume that the reader has knowledge about research designs, is familiar with the difference between cross-sectional and longitudinal data, and knows other aspects that intermingle with statistics, since quantitative methods are a substantial part of the social science curriculum. Of course, this does not mean that only (social science) students can profit from reading the book. A diverse audience – holding the assumed prior knowledge – may use Practice R to become proficient R users.

To support you, the book is accompanied by an R package. An R package is a software add-on and extends the capabilities of R. In our case, the PracticeR package gives you access to tutorials to practice the discussed content, it provides the code of this book, and also further materials (e.g., a template to create reports) that are supposed to boost your skills. We will learn how to install R packages in the next chapter, but keep in mind that all materials of the book become available once the PracticeR package is installed.

Let me outline the idea of the tutorials and how they are related to the content of the book. The tutorials summarize the content and aim to familiarize you with the core concepts. The interactive tutorials are integrated in R and run on your computer. By clicking on the Run button, R code will be executed, and the tutorial shows the results. Don’t worry if something goes wrong; you can reload and start over at the click of a button. As an illustration, Figure 1.1 shows a screenshot of the Basics of Data Manipulation (Chapter 4) tutorial. It summarizes how to filter, arrange, and select data. Irrespective of the topic, each tutorial prompts you to apply the discussed content. The exercises in the tutorials aim to increase your coding skills, and they are ordered by increasing difficulty. Sometimes I’ll ask you to adjust the R code, which gives you a better understanding of how the code works. In most instances I will challenge you with typical data analysis problems. In the more advanced steps, you are supposed to transfer the discussed content to a similar or a new concept. Don’t worry, hints are provided to solve the exercises and the tutorials include the solutions. Now that the scope is set, we can turn to the content of Practice R.

Fig. 1.1: Example tutorial

The content

Part I lays the foundation and outlines the first steps to work with R:

– Chapter 2 introduces R and RStudio, which is an integrated development environment to work with R. The chapter contains the most important steps to understand how R behaves and outlines in depth how RStudio substantially helps us to increase our R skills. We install both software packages and we discover some of the cool features of RStudio. Next, I give a concise introduction to base R – the programming language – which is essential for subsequent steps. Moreover, the chapter familiarizes you with data types and structures.
– In Chapter 3 we start to explore data. We examine variables, we calculate and visualize descriptive statistics, and we explore how variables are related. We estimate the correlation between two variables, visualize the effect, and interpret the effect size. Data exploration is crucial when we start to work with data. For this reason, this chapter also highlights packages and ways to get a quick overview of new and unfamiliar data. For example, some packages implement graphs to examine several variables at once; others can generate a PDF report with summary statistics for all variables of a particular data set. Thus, we explore variables, and we get in touch with packages that help us to discover unfamiliar data.
– Chapter 4 focuses on data manipulation steps and introduces the dplyr package (Wickham, François, et al., 2022). The latter is the Swiss Army knife for manipulating data. I introduce the main functions of the package and we will focus on typical steps to prepare data for an analysis. Before we can dive into this topic in the second part, we should take one step back. The last part of this chapter highlights strategies to improve the workflow and, consequently, the efficiency of our work. For example, you may wonder how much R code you need to remember to become an efficient R user. The last section outlines in detail why there is no need to memorize code and introduces strategies to handle (complicated) code.

Part II introduces the basics to analyze data, visualize results, and create reports:

– Chapter 5 outlines the data preparation steps required before we can start to analyze data. We learn how to import data and how to cope with problems that may occur. Depending on the data, the import step may induce errors, but the same may apply during the data cleaning steps, and we should consider the problem of missing (and implausible) values. Finally, I introduce the main functions from the forcats package (Wickham, 2022a). The package is made for categorical variables and is a good supplement to our data manipulation skills since categorical variables are often used in the social sciences.
– We analyze data in Chapter 6. There is a broad range of possibilities to analyze data with R; however, we apply a linear regression analysis because it is the workhorse of social science research. First, I give a non-technical introduction for people with a different educational background. Next, we run an example analysis that we will improve step by step. We learn how to develop the model, we examine interaction effects, and we compare the performance of the estimated models. To compare models and to examine the assumptions of a linear regression analysis, we also focus on visualization techniques.
– To visualize research findings, Chapter 7 concentrates on the ggplot2 package (Wickham, Chang, et al., 2022). The package can be quite demanding in the beginning, but we will learn that creating a graph without much customization is far from rocket science. We first focus on typical steps to create and adjust a graph (e.g., adjust a title). Next, we increase the theoretical knowledge by exploring how ggplot2 works behind the curtain. Finally, there are a lot of packages that extend the possibilities of ggplot2. The last section highlights some of these possibilities.
– Chapter 8 focuses on reporting. After the analysis and the visualization step, we need to summarize the findings in a document, and the rmarkdown package makes it possible to create text documents with R (Allaire, Xie, McPherson, et al., 2022). An rmarkdown file contains text, graphs, or tables, just like any other text document. However, it is code-based and also contains output from R. Thus, we create tables and graphs with R and include them in the rmarkdown document. Using code to create the report increases the reproducibility of the work, and we avoid introducing errors because we eliminate the need to transfer output from R into a word processor.

Part III completes the basics and focuses on topics that – at first glance – seem less related to applied empirical research, but that will add to your skill set:

...

Publication date (per publisher)	8 May 2023
Additional info	28 b/w and 103 col. ill., 1 b/w and 5 col. tbl.
Language	English
Subject area	Social Sciences ► Sociology
Keywords	Data Analysis • Data Preparation • Data Processing • Empirical Research • R • Software • Visualization
ISBN-10	3-11-070508-7 / 3110705087
ISBN-13	978-3-11-070508-9 / 9783110705089

EPUB (watermarked)
Size: 11.1 MB