The Basics of S-PLUS (eBook)
XXII, 444 pages
Springer New York (publisher)
978-0-387-28390-6 (ISBN)
Proven bestseller: almost 6000 copies sold in the U.S. in two editions
New edition updated to cover S-PLUS 6.0
Can be used as an introduction to R, as well as S-PLUS
New exercises have been added; includes a comparison of S-PLUS and R
Well-suited for self-study
This is now the fourth edition of "The Basics of S-Plus" since 1997. S-Plus has seen steady growth in popularity and has established itself in many educational and business settings as a major data analysis tool. S-Plus is valued for its modern, interactive data analysis environment, whether it serves as the primary system or as a complement to other standards such as SAS (the latter is particularly true in the industry we work in, pharmaceuticals). We have followed the various releases with new editions of our book. Over time, these editions introduced major changes such as the incorporation of S Version 4 (the underlying language), Trellis graphs, and a graphical user interface (in particular for the Windows operating system). They also added a chapter on R and its differences from S-Plus, which are minor for the material covered in this book.

This edition updates the third edition to cover new functions and features of S-Plus Version 7.0 (working from the beta release for MS Windows and Linux). It also adds more practical tips and examples and corrects a few mistakes. We are very grateful to all our readers, in particular those who have sent us suggestions, comments, and any other kind of feedback. You will see some of this feedback reflected in the book.
Preface 6
Contents 9
Figures 17
Tables 21
1 Introduction 23
1.1 The History of S and S-Plus 24
1.2 S-Plus on Different Operating Systems 26
1.3 Notational Conventions 28
2 Graphical User Interface 31
2.1 Introduction 31
2.2 System Overview 32
2.2.1 Using a Mouse 33
2.2.2 Object Explorer 33
2.2.3 Commands Window 33
2.2.4 Toolbars 34
2.2.5 Graph Sheets 34
2.2.6 Script Window 34
2.3 Getting Started with the Interface 35
2.3.1 Importing Data 35
2.3.2 Graphs 35
2.3.3 Data and Statistics 37
2.3.4 Customizing the Toolbars 37
2.3.5 Chapters 38
2.4 Detailed Use of the GUI Interface 40
2.5 Object Explorer 40
2.6 Help 41
2.7 Data Export 43
2.8 Working Directory 45
2.9 Data Import 46
2.10 Data Summaries 49
2.11 Graphs 51
2.12 Trellis Graphs 58
2.13 Linear Regression 60
2.14 PowerPoint (Windows Only) 64
2.15 Excel (Windows Only) 66
2.16 Script Window 67
2.17 UNIX/Linux GUI 69
2.18 Summary 78
2.19 Exercises 79
2.20 Solutions 80
3 A First Session 95
3.1 General Information 95
3.1.1 Starting and Quitting 96
3.1.2 The Help System 97
3.1.3 Before Beginning 97
3.2 Simple Structures 98
3.2.1 Arithmetic Operators 98
3.2.2 Assignments 99
3.2.3 The Concatenate Command: c 101
3.2.4 The Sequence Command: seq 102
3.2.5 The Replicate Command: rep 103
3.3 Mathematical Operations 104
3.4 Use of Brackets 106
3.5 Logical Values 107
3.6 Review 110
3.7 Exercises 113
3.8 Solutions 114
4 A Second Session 117
4.1 Constructing and Manipulating Data 117
4.1.1 Matrices 118
4.1.2 Arrays 123
4.1.3 Data Frames 126
4.1.4 Lists 129
4.2 Introduction to Functions 130
4.3 Introduction to Missing Values 131
4.4 Merging Data 132
4.5 Putting It All Together 133
4.6 Exercises 136
4.7 Solutions 138
5 Graphics 147
5.1 Basic Graphics Commands 147
5.2 Graphics Devices 148
5.2.1 Working with Multiple Graphics Devices 150
5.3 Plotting Data 150
5.3.1 The plot Command 151
5.3.2 Modifying the Data Display 152
5.3.3 Modifying Figure Elements 153
5.4 Adding Elements to Existing Plots 155
5.4.1 Functions to Add Elements to Graphs 155
5.4.2 More About 157
5.4.3 More on Adding Axes 157
5.4.4 Adding Text to Graphs 159
5.5 Setting Options 160
5.6 Figure Layouts 162
5.6.1 Layouts Using Trellis Graphs 162
5.6.2 Matrices of Graphs 162
5.6.3 Multiple-Screen Graphs 163
5.6.4 Figures of Specified Size 164
5.7 Exercises 167
5.8 Solutions 168
6 Trellis Graphics 175
6.1 An Example 176
6.2 Trellis Basics 178
6.2.1 Trellis Syntax 178
6.2.2 Trellis Functions 179
6.2.3 Displaying and Storing Graphs 179
6.3 Output Devices 180
6.4 Customizing Trellis Graphs 182
6.4.1 Setting Options 182
6.4.2 Arranging the Layout of a Trellis Graph 183
6.4.3 Ordering of Graphs 185
6.4.4 Axis Customization 186
6.4.5 Modifying Panel Strips 187
6.4.6 Arranging Several Graphs on a Single Page 187
6.4.7 Updating Existing Trellis Graphs 189
6.4.8 Writing Panel Functions 190
6.5 Further Trellis Hints 193
6.5.1 Useful General Trellis Settings 194
6.5.2 Graphing Individual Profiles 195
6.5.3 Preparing Data to Use for Trellis 196
6.5.4 The subset Option 197
6.5.5 Adding a Key 197
6.5.6 The subscripts Option in Panel Functions 199
6.6 Exercises 203
6.7 Solutions 205
7 Exploring Data 215
7.1 Descriptive Data Exploration 215
7.2 Graphical Exploration 226
7.2.1 Interactive Dynamic Graphics 241
7.2.2 Old-Style Graphics 241
7.3 Distributions and Related Functions 242
7.4 Confirmatory Statistics and Hypothesis Testing 247
7.5 Missing and Infinite Values 253
7.5.1 Testing for Missing Values 254
7.5.2 Supplying Data with Missing Values to Functions 254
7.5.3 Missing Values in Graphs 255
7.5.4 Infinite Values 255
7.6 Exercises 257
7.7 Solutions 260
8 Statistical Modeling 273
8.1 Introductory Examples 273
8.1.1 Regression 273
8.1.2 Regression Diagnostics 275
8.2 Statistical Models 277
8.3 Model Syntax 278
8.4 Regression 279
8.4.1 Linear Regression and Modeling Techniques 280
8.4.2 ANOVA 283
8.4.3 Logistic Regression 285
8.4.4 Survival Data Analysis 287
8.4.5 Endnote 289
8.5 Exercises 290
8.6 Solutions 293
9 Programming 307
9.1 Lists 307
9.1.1 Adding and Deleting List Elements 309
9.1.2 Naming List Elements 310
9.1.3 Applying the Same Function to List Elements 312
9.1.4 Unlisting a List 316
9.1.5 Generating a List by Using 316
9.2 Writing Functions 316
9.2.1 Documenting Functions 319
9.2.2 Scope of Variables 319
9.2.3 Parameters and Defaults 320
9.2.4 Passing an Unspecified Number of Parameters to a Function 322
9.2.5 Testing for Existence of an Argument 323
9.2.6 Returning Warnings and Errors 323
9.2.7 Using Function Arguments in Graphics Labels 324
9.3 Iteration 325
9.3.1 The for Loop 325
9.3.2 The while Loop 326
9.3.3 The repeat Loop 327
9.3.4 Vectorizing a Loop 327
9.3.5 Large Loops 329
9.4 Debugging: Searching for Errors 330
9.4.1 Syntax Errors 331
9.4.2 Invalid Arguments 332
9.4.3 Execution or Run-Time Errors 332
9.4.4 Logical Errors 333
9.5 Output Using the 336
9.6 The paste Function 338
9.7 Exercises 340
9.8 Solutions 341
10 Object-Oriented Programming 345
10.1 Creating Classes and Objects 347
10.2 Creating Methods 350
10.3 Debugging 355
10.4 Help 356
10.5 Summary and Overview 356
10.6 Exercises 357
10.7 Solutions 358
11 Input and Output 371
11.1 Reading Commands from a File: The source Function 371
11.2 Data Import/Export: Easiest Method 372
11.3 Data Import/Export: General Method 374
11.4 Data Import/Export: Basic Method 375
11.5 Reading Data from the Terminal 376
11.6 Editing Data 377
11.7 Transferring Data: The data.dump and data.restore Functions 378
11.8 Recording a Session 378
11.9 Exercises 380
11.10 Solutions 381
12 Tips and Tricks 385
12.1 Useful Techniques 385
12.1.1 Housekeeping: Cleaning Up Directories 385
12.1.2 Storing and Restoring Graphical Parameters 386
12.1.3 Naming of Objects 386
12.1.4 Repeating Commands 387
12.2 Programming Environment and Techniques 388
12.2.1 The Process of Developing a Function 388
12.2.2 Setting up an Editor and Running the Code in S-Plus 388
12.2.3 Treating Data Frames as Lists 390
12.2.4 Working with Graph Sheets 391
12.2.5 Incorporating and Accessing C and Fortran Programs 393
12.2.6 Batch Jobs 396
12.2.7 Libraries 398
12.3 Factors 400
12.3.1 Creating Factors and Ordered Factors 400
12.3.2 Internal Representation of Factors 402
12.3.3 Where Levels Play a Role 403
12.3.4 Where Factors Can Lead Their Own Lives 404
12.3.5 How Factors Come Into Life 406
12.3.6 Adding and Dropping Factor Levels 407
12.4 Including Graphs in Text Processors 408
12.4.1 Generating Graphs for Windows Applications 409
12.4.2 Generating PostScript Graphs 410
12.4.3 PostScript Graphs in LaTeX 411
12.4.4 If You Don’t Have a PostScript Printer 412
12.4.5 Greek Letters in Graphs 412
12.5 Exercises 414
12.6 Solutions 416
13 S-Plus Internals 423
13.1 How S-Plus Works Under UNIX 423
13.1.1 The Working Chapter 424
13.1.2 Customization on Start-Up and Exit 424
13.2 How S-Plus Works Under Windows 426
13.2.1 Command Line Options 426
13.2.2 Start-up and Exit Functions 427
13.2.3 How the Script Window works 428
13.3 Storing Mechanism 429
13.4 Levels of Calls 430
13.5 Exercises 432
13.6 Solutions 433
14 Information Sources on and Around S-Plus 435
14.1 Insightful 435
14.2 S-News: Exchanging Information with Other Users 436
14.3 The StatLib Server 436
14.4 What Next? 437
15 R 439
15.1 Development 440
15.2 Some Similarities Between R and S 440
15.3 Some Differences Between R and S 440
15.3.1 Language 441
15.3.2 Libraries 442
15.3.3 Trellis-Type Graphs 442
15.3.4 Colors and Lines 443
15.3.5 Data Import and Export Formats 443
15.3.6 Memory Handling 443
15.3.7 Mathematical Formulae in Graphs 443
15.3.8 Graphical User Interfaces 443
15.3.9 Start-Up Mechanism 444
15.3.10 Windows Integration 444
15.3.11 Support 444
15.4 Summary 445
16 Bibliography 447
16.1 Print Bibliography 447
16.2 On-Line Bibliography 449
16.2.1 S-PLUS-Related Sources 449
16.2.2 TeX-Related Sources 451
16.2.3 Other Sources 451
Index 453
7 Exploring Data (p. 193)
In the preceding chapters, we laid the foundation for understanding the concepts and ideas of the S-Plus system. We explored basic ideas and how to use S-Plus to perform calculations, and we saw how data can be generated, stored, and accessed. We also looked at how data can be displayed graphically. All of this will be useful as we explore real data sets in this chapter, specifically the Barley and Geyser data sets that come with S-Plus.
Rather than presenting a list of available statistical functions, we will go through a typical data analysis as a way of introducing the more useful and common commands and the kind of output we'll encounter. We chose to use S-Plus data sets so you can follow along with the analysis we present and complete the exercises at the end of this chapter. We divide the data analysis into two categories: "descriptive" and "graphical" exploration. Further sections cover distributions and related functions, confirmatory statistics and hypothesis testing, and missing and infinite values.
7.1 Descriptive Data Exploration
We will now explore the different variables contained in the Barley data set. We will first analyze each variable in one dimension; in other words, we will take a univariate approach. The analysis of the dependence between the variables and the exploration of higher-dimensional structure follow later.
The Barley Data Set
The Barley data are measurements of yield, in bushels per acre, at different sites. The experiment comprises 6 sites, each planting 10 different varieties of barley in 2 successive years, 1931 and 1932. The data set therefore contains 6 × 10 × 2 = 120 measurements of barley yield. Our main goal will be to investigate differences in barley yield across the different combinations of variety, year, and site. One example is comparing the 1931 harvest of the fifth variety at site 4 with the 1932 harvest of the seventh variety at the same site.
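The count of 120 measurements follows from the fully crossed design, and a short sketch can confirm it. The sketch builds a hypothetical skeleton of the design with `expand.grid`, which is available in both S-PLUS and R. The site and variety labels are placeholders, not the names stored in the real `barley` data.

```r
# Hypothetical skeleton of the Barley design: every combination of
# 6 sites, 10 varieties, and 2 years occurs exactly once.
# The labels are placeholders, not the names used in the real data set.
design <- expand.grid(site = paste("site", 1:6),
                      variety = paste("variety", 1:10),
                      year = c("1931", "1932"))

# One row per combination, i.e., one row per yield measurement
nrow(design)   # 6 * 10 * 2 = 120
```

Because each combination appears exactly once, tabulating any single factor gives balanced counts. For example, `table(design$year)` shows 60 rows for each year.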
Just enter
> barley
to see the data.
With this basic information about the Barley data in hand, the following analysis aims to extract more information and structural knowledge from the numbers we have.
A typical place to begin is, of course, looking at the data. If the data set is small, we can easily look at it simply by printing it out. We check the data size by entering
> dim(barley)
[1] 120   4
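Beyond `dim`, a few standard functions give a quick first look without printing all 120 rows. This sketch assumes the column names `yield`, `variety`, `year`, and `site`, as in the copy of `barley` distributed with R's lattice package. In S-PLUS, the data set is available without loading anything, so the `library` line applies to R only.

```r
library(lattice)       # R only: barley ships with lattice; omit in S-PLUS

dim(barley)            # number of rows and columns
names(barley)          # variable names
summary(barley$yield)  # univariate summary of yield (bushels per acre)
table(barley$year)     # measurements per year; 60 each in a balanced design
```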
Publication date (per publisher) | 15 December 2005 |
---|---|
Series | Statistics and Computing |
Additional information | XXII, 444 p., 31 illus. |
Place of publication | New York |
Language | English |
Subject areas | Mathematics / Computer Science ► Computer Science; Mathematics / Computer Science ► Mathematics ► Computer Programs / Computer Algebra; Mathematics / Computer Science ► Mathematics ► Statistics; Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics; Natural Sciences ► Biology; Engineering; Business and Economics |
Keywords | Modeling • programming • Sets • S Language • S-PLUS • Statistica • Statistical Analysis • Statistical Computing • statistical software |
ISBN-10 | 0-387-28390-0 / 0387283900 |
ISBN-13 | 978-0-387-28390-6 / 9780387283906 |
Size: 3.3 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but it is only of limited use on small displays (smartphones, eReaders).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g., Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need a PDF viewer, e.g., the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.