The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection of short articles - most of which have a review component - describing the state of the art of Nonparametric Statistics at the beginning of a new millennium.

Key features:
* algorithmic approaches
* wavelets and nonlinear smoothers
* graphical methods and data mining
* biostatistics and bioinformatics
* bagging and boosting
* support vector machines
* resampling methods
Front Cover 1
Recent Advances and Trends in Nonparametric Statistics 4
Copyright Page 5
Table of Contents 8
PREFACE 6
SECTION 1: ALGORITHMIC APPROACHES TO STATISTICS 12
Chapter 1. An introduction to support vector machines 14
Chapter 2. Bagging, subagging and bragging for improving some prediction algorithms 30
Chapter 3. Data compression by geometric quantization 46
SECTION 2: FUNCTIONAL DATA ANALYSIS 58
Chapter 4. Functional data analysis in evolutionary biology 60
Chapter 5. Functional nonparametric statistics: a double infinite 72
SECTION 3: NONPARAMETRIC MODEL BUILDING AND INFERENCE 88
Chapter 6. Nonparametric models for ANOVA and ANCOVA: a review 90
Chapter 7. Isotonic additive interaction models 104
Chapter 8. A nonparametric alternative to analysis of covariance 120
SECTION 4: GOODNESS OF FIT 132
Chapter 9. Assessing structural relationships between distributions–a quantile process approach based on Mallows distance 134
Chapter 10. Almost sure representations in survival analysis under censoring and truncation: applications to goodness-of-fit tests 150
SECTION 5: HIGH-DIMENSIONAL DATA AND VISUALIZATION 164
Chapter 11. Data depth: center-outward ordering of multivariate data and nonparametric multivariate statistics 166
Chapter 12. Visual exploration of data through their graph representations 180
SECTION 6: NONPARAMETRIC REGRESSION 192
Chapter 13. Inference for nonsmooth regression curves and surfaces using kernel-based methods 194
Chapter 14. Nonparametric smoothing methods for a class of non-standard curve estimation problems 214
Chapter 15. Weighted local linear approach to censored nonparametric regression 228
SECTION 7: TOPICS IN NONPARAMETRICS 244
Chapter 16. Adaptive quantile regression 246
Chapter 17. Set estimation: an overview and some recent developments 262
Chapter 18. Nonparametric methods for heavy tailed vector data: a survey with applications from finance and hydrology 276
SECTION 8: NONPARAMETRICS IN FINANCE AND RISK MANAGEMENT 292
Chapter 19. Nonparametric methods in continuous-time finance: a selective review 294
Chapter 20. Nonparametric estimation in a stochastic volatility model 314
Chapter 21. Dynamic nonparametric filtering with application to volatility estimation 326
Chapter 22. A normalizing and variance-stabilizing transformation for financial time series 346
SECTION 9: BIOINFORMATICS AND BIOSTATISTICS 360
Chapter 23. Biostochastics and nonparametrics: oranges and apples? 362
Chapter 24. Some issues concerning length-biased sampling in survival analysis 378
Chapter 25. Covariate centering and scaling in varying-coefficient regression with application to longitudinal growth studies 388
Chapter 26. Directed peeling and covering of patient rules 404
SECTION 10: RESAMPLING AND SUBSAMPLING 420
Chapter 27. Statistical analysis of survival models with Bayesian bootstrap 422
Chapter 28. On optimal variance estimation under different spatial subsampling schemes 432
Chapter 29. Locally stationary processes and the local block bootstrap 448
SECTION 11: TIME SERIES AND STOCHASTIC PROCESSES 456
Chapter 30. Spectral analysis and a class of nonstationary processes 458
Chapter 31. Curve estimation for locally stationary time series models 462
Chapter 32. Assessing spatial isotropy 478
SECTION 12: WAVELET AND MULTIRESOLUTION METHODS 488
Chapter 33. Automatic landmark registration of 1D curves 490
Chapter 34. Stochastic multiresolution models for turbulence 508
AUTHOR INDEX 522
An Introduction to Support Vector Machines
Bernhard Schölkopf (bernhard.schoelkopf@tuebingen.mpg.de), Max-Planck-Institut für biologische Kybernetik, Spemannstr. 38, Tübingen, Germany
This article gives a short introduction to the main ideas of statistical learning theory, support vector machines, and kernel feature spaces.
1 An Introductory Example
Suppose we are given empirical data
$(x_1, y_1), \ldots, (x_m, y_m) \in \mathcal{X} \times \{\pm 1\}.$ (1)
Here, the domain $\mathcal{X}$ is some nonempty set that the patterns xi are taken from; the yi are called labels or targets. Unless stated otherwise, indices i and j will always be understood to run over the training set, i.e., i, j = 1,…, m.
Note that we have not made any assumptions on the domain $\mathcal{X}$ other than it being a set. In order to study the problem of learning, we need additional structure. In learning, we want to be able to generalize to unseen data points. In the case of pattern recognition, given some new pattern $x \in \mathcal{X}$, we want to predict the corresponding $y \in \{\pm 1\}$. By this we mean, loosely speaking, that we choose y such that (x, y) is in some sense similar to the training examples. To this end, we need similarity measures in $\mathcal{X}$ and in $\{\pm 1\}$. The latter is easier, as two target values can only be identical or different. For the former, we require a similarity measure
$k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}, \quad (x, x') \mapsto k(x, x'),$ (2)
i.e., a function that, given two examples x and x′, returns a real number characterizing their similarity. For reasons that will become clear later, the function k is called a kernel [12,1,6].
A type of similarity measure that is of particular mathematical appeal is the dot product. For instance, given two vectors $x, x' \in \mathbb{R}^N$, the canonical dot product is defined as
$(x \cdot x') := \sum_{n=1}^{N} [x]_n [x']_n.$ (3)
Here, [x]n denotes the n-th entry of x.
The geometrical interpretation of this dot product is that it computes the cosine of the angle between the vectors x and x′, provided they are normalized to length 1. Moreover, it allows computation of the length of a vector x as $\sqrt{x \cdot x}$, and of the distance between two vectors as the length of the difference vector. Therefore, being able to compute dot products amounts to being able to carry out all geometrical constructions that can be formulated in terms of angles, lengths and distances.
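A minimal numerical sketch of these quantities (assuming NumPy; the two example vectors are arbitrary choices made for illustration):

```python
import numpy as np

x = np.array([3.0, 4.0])        # arbitrary example vectors
x_prime = np.array([1.0, 0.0])

dot = x @ x_prime                                        # canonical dot product (3)
length_x = np.sqrt(x @ x)                                # length of x
distance = np.sqrt((x - x_prime) @ (x - x_prime))        # length of the difference vector
cosine = dot / (length_x * np.sqrt(x_prime @ x_prime))   # cosine of the enclosed angle

print(dot, length_x, distance, cosine)   # 3.0  5.0  ~4.472  0.6
```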
Note, however, that we have not made the assumption that the patterns live in a dot product space. In order to be able to use a dot product as a similarity measure, we therefore first need to embed them into some dot product space $\mathcal{H}$, which need not be identical to $\mathbb{R}^N$. To this end, we use a map
$\Phi : \mathcal{X} \to \mathcal{H}, \quad x \mapsto \mathbf{x} := \Phi(x).$ (4)
The space $\mathcal{H}$ is called a feature space. To summarize, embedding the data into $\mathcal{H}$ has three benefits.
1. It lets us define a similarity measure from the dot product in $\mathcal{H}$,
$k(x, x') := (\mathbf{x} \cdot \mathbf{x}') = (\Phi(x) \cdot \Phi(x')).$ (5)
2. It allows us to deal with the patterns geometrically, and thus lets us study learning algorithms using linear algebra and analytic geometry.
3. The freedom to choose the mapping Φ will enable us to design a large variety of learning algorithms. For instance, consider a situation where the inputs already live in a dot product space. In that case, we could directly define a similarity measure as the dot product. However, we might still choose to first apply a nonlinear map Φ to change the representation into one that is more suitable for a given problem and learning algorithm.
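A minimal sketch of such an embedding, assuming the degree-2 monomial feature map on $\mathbb{R}^2$ (one possible choice of Φ, not prescribed by the text), for which the feature-space dot product (5) reduces to the simple kernel k(x, x′) = (x · x′)²:

```python
import numpy as np

def phi(x):
    """Degree-2 monomial feature map on R^2: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, x_prime):
    """Kernel computing the same dot product without mapping explicitly."""
    return (x @ x_prime) ** 2

x = np.array([1.0, 2.0])
x_prime = np.array([3.0, -1.0])

# The explicit feature-space dot product (5) agrees with the kernel evaluation.
print(phi(x) @ phi(x_prime))   # 1.0
print(k(x, x_prime))           # 1.0
```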
We are now in the position to describe a simple pattern recognition algorithm. The idea is to compute the means of the two classes in feature space,
$c_1 = \frac{1}{m_1} \sum_{i : y_i = +1} \mathbf{x}_i,$ (6)

$c_2 = \frac{1}{m_2} \sum_{i : y_i = -1} \mathbf{x}_i,$ (7)
where m1 and m2 are the number of examples with positive and negative labels, respectively. We then assign a new point x to the class whose mean is closer to it. This geometrical construction can be formulated in terms of dot products. Half-way in between c1 and c2 lies the point c := (c1 + c2) /2. We compute the class of x by checking whether the vector connecting c and x encloses an angle smaller than π/2 with the vector w := c1 − c2 connecting the class means, in other words
$y = \operatorname{sgn}((\mathbf{x} - c) \cdot w) = \operatorname{sgn}\left((\mathbf{x} - (c_1 + c_2)/2) \cdot (c_1 - c_2)\right) = \operatorname{sgn}((\mathbf{x} \cdot c_1) - (\mathbf{x} \cdot c_2) + b).$ (8)
Here, we have defined the offset
$b := \frac{1}{2}\left(\|c_2\|^2 - \|c_1\|^2\right).$ (9)
It will prove instructive to rewrite this expression in terms of the patterns xi in the input domain $\mathcal{X}$. Note that we do not have a dot product in $\mathcal{X}$; all we have is the similarity measure k (cf. (5)). Therefore, we need to rewrite everything in terms of the kernel k evaluated on input patterns. To this end, substitute (6) and (7) into (8) to get the decision function
$y = \operatorname{sgn}\left(\frac{1}{m_1} \sum_{i : y_i = +1} (\mathbf{x} \cdot \mathbf{x}_i) - \frac{1}{m_2} \sum_{i : y_i = -1} (\mathbf{x} \cdot \mathbf{x}_i) + b\right) = \operatorname{sgn}\left(\frac{1}{m_1} \sum_{i : y_i = +1} k(x, x_i) - \frac{1}{m_2} \sum_{i : y_i = -1} k(x, x_i) + b\right).$ (10)
Similarly, the offset becomes
$b := \frac{1}{2}\left(\frac{1}{m_2^2} \sum_{i,j : y_i = y_j = -1} k(x_i, x_j) - \frac{1}{m_1^2} \sum_{i,j : y_i = y_j = +1} k(x_i, x_j)\right).$ (11)
Let us consider one well-known special case of this type of classifier. Assume that the class means have the same distance to the origin (hence b = 0), and that k can be viewed as a density, i.e., it is positive and has integral 1,
$\int_{\mathcal{X}} k(x, x')\, dx = 1 \quad \text{for all } x' \in \mathcal{X}.$ (12)
In order to state this assumption, we have to require that we can define an integral on $\mathcal{X}$.
If the above holds true, then (10) corresponds to the so-called Bayes decision boundary separating the two classes, subject to the assumption that the two classes were generated from two probability distributions that are correctly estimated by the Parzen windows estimators of the two classes,
$p_1(x) := \frac{1}{m_1} \sum_{i : y_i = +1} k(x, x_i),$ (13)

$p_2(x) := \frac{1}{m_2} \sum_{i : y_i = -1} k(x, x_i).$ (14)
Given some point x, the label is then simply computed by checking which of the two, p1(x) or p2(x), is larger, leading to (10). Note that this decision is the best we can do if we have no prior information about the probabilities of the two classes, or if we assume a uniform prior distribution. For further details, see [15].
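A minimal sketch of this Parzen-window reading, assuming a normalized Gaussian kernel as one density-valued choice of k satisfying (12) (again an illustration with made-up data, not the author's code):

```python
import numpy as np

def gaussian_density_kernel(x, x_prime, h=1.0):
    """Normalized Gaussian kernel: positive and integrating to 1 over x, cf. (12)."""
    d = len(x)
    return (np.exp(-np.sum((x - x_prime) ** 2) / (2 * h ** 2))
            / ((2 * np.pi * h ** 2) ** (d / 2)))

def parzen_label(x, X_train, y_train, kernel=gaussian_density_kernel):
    """Estimate p1 and p2 as in (13), (14) and return the label of the larger one."""
    p1 = np.mean([kernel(x, xi) for xi in X_train[y_train == +1]])
    p2 = np.mean([kernel(x, xi) for xi in X_train[y_train == -1]])
    return +1 if p1 >= p2 else -1

X = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0], [2.5, 3.5]])
y = np.array([+1, +1, -1, -1])
print(parzen_label(np.array([0.3, 0.1]), X, y))   # expected: 1
```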
The classifier (10) is quite close to the types of learning machines that we will be interested in. It is linear in the feature space (Equation (8)), while in the input domain, it is represented by a kernel expansion (Equation (10)). It is example-based in the sense that the kernels are centered on the training examples, i.e., one of the two arguments of the kernels is always a training example. This is a general property of kernel methods, due to the Representer Theorem [11,15]. The main point where the more sophisticated techniques to be discussed later will deviate from (10) is in the selection of the examples that the kernels are centered on, and in the weight that is put on the individual kernels in the decision function. Namely, it will no longer be the case that all training examples appear in the kernel expansion, and the weights of the kernels in the expansion will no longer be uniform. In the feature space representation, this statement corresponds to saying that we will study all normal vectors w of decision hyperplanes that can be represented as linear combinations of the training examples. For instance, we might want to remove the influence of patterns that are very far away from the decision boundary, either since we expect that they will not improve the generalization error of the decision function, or since we...
Publication date (per publisher) | 31.10.2003
Language | English
ISBN-10 | 0-08-054037-6 / 0080540376 |
ISBN-13 | 978-0-08-054037-5 / 9780080540375 |