Textual Information Access

Statistical Models

Eric Gaussier, François Yvon (Editors)

Book | Hardcover

448 pages

2012
ISTE Ltd and John Wiley & Sons Inc (Publisher)
978-1-84821-322-7 (ISBN)

This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications that aim to facilitate information access:
- information extraction and retrieval;
- text classification and clustering;
- opinion mining;
- comprehension aids (automatic summarization, machine translation, visualization).
In order to give the reader as complete a description as possible, the focus is placed on the probability models used in the applications concerned, by highlighting the relationship between models and applications and by illustrating the behavior of each model on real collections.
Textual Information Access is organized around four themes: information retrieval and ranking models, classification and clustering (logistic regression, kernel methods, Markov fields, etc.), multilingualism and machine translation, and emerging applications such as information exploration.

Contents

Part 1: Information Retrieval
1. Probabilistic Models for Information Retrieval, Stéphane Clinchant and Eric Gaussier.
2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval, Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong and Nicolas Usunier.
Part 2: Classification and Clustering
3. Logistic Regression and Text Classification, Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet and Yves Denneulin.
4. Kernel Methods for Textual Information Access, Jean-Michel Renders.
5. Topic-Based Generative Models for Text Information Access, Jean-Cédric Chappelier.
6. Conditional Random Fields for Information Extraction, Isabelle Tellier and Marc Tommasi.
Part 3: Multilingualism
7. Statistical Methods for Machine Translation, Alexandre Allauzen and François Yvon.
Part 4: Emerging Applications
8. Information Mining: Methods and Interfaces for Accessing Complex Information, Josiane Mothe, Kurt Englmeier and Fionn Murtagh.
9. Opinion Detection as a Topic Classification Problem, Juan-Manuel Torres-Moreno, Marc El-Bèze, Patrice Bellot and Frédéric Béchet.

Eric Gaussier is deputy director of the Grenoble Informatics Laboratory, one of the largest Computer Science laboratories in France. François Yvon is professor of Computer Science at the University of Paris Sud in Orsay and member of the Spoken Language Processing group of LIMSI/CNRS, Paris, France.

Introduction xiii
Eric Gaussier and François Yvon

PART 1: INFORMATION RETRIEVAL 1

Chapter 1. Probabilistic Models for Information Retrieval 3
Stéphane Clinchant and Eric Gaussier

1.1. Introduction 3

1.3. Probability ranking principle (PRP) 10

1.4. Language models 15

1.5. Informational approaches 21

1.6. Experimental comparison 27

1.7. Tools for information retrieval 28

1.8. Conclusion 28

1.9. Bibliography 29

Chapter 2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval 33
Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong, and Nicolas Usunier

2.1. Introduction 33

2.2. Application to automatic text summarization 45

2.3. Application to information retrieval 49

2.4. Conclusion 54

2.5. Bibliography 54

PART 2: CLASSIFICATION AND CLUSTERING 59

Chapter 3. Logistic Regression and Text Classification 61
Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet, and Yves Denneulin

3.1. Introduction 61

3.2. Generalized linear model 62

3.3. Parameter estimation 65

3.4. Logistic regression 68

3.5. Model selection 70

3.6. Logistic regression applied to text classification 74

3.7. Conclusion 81

3.8. Bibliography 82

Chapter 4. Kernel Methods for Textual Information Access 85
Jean-Michel Renders

4.1. Kernel methods: context and intuitions 85

4.2. General principles of kernel methods 88

4.3. General problems with kernel choices (kernel engineering) 95

4.4. Kernel versions of standard algorithms: examples of solvers 97

4.5. Kernels for text entities 103

4.6. Summary 123

4.7. Bibliography 124

Chapter 5. Topic-Based Generative Models for Text Information Access 129
Jean-Cédric Chappelier

5.1. Introduction 129

5.2. Topic-based models 135

5.3. Topic models 142

5.4. Term models 161

5.5. Similarity measures between documents 164

5.6. Conclusion 168

5.7. Appendix: topic model software 169

5.8. Bibliography 170

Chapter 6. Conditional Random Fields for Information Extraction 179
Isabelle Tellier and Marc Tommasi

6.1. Introduction 179

6.2. Information extraction 180

6.3. Machine learning for information extraction 184

6.4. Introduction to conditional random fields 187

6.5. Conditional random fields 193

6.6. Conditional random fields and their applications 203

6.7. Conclusion 214

6.8. Bibliography 215

PART 3: MULTILINGUALISM 221

Chapter 7. Statistical Methods for Machine Translation 223
Alexandre Allauzen and François Yvon

7.1. Introduction 223

7.2. Probabilistic machine translation: an overview 227

7.3. Phrase-based models 235

7.4. Modeling reorderings 250

7.5. Translation: a search problem 259

7.6. Evaluating machine translation 272

7.7. State-of-the-art and recent developments 279

7.8. Useful resources 287

7.9. Conclusion 289

7.10. Acknowledgments 291

7.11. Bibliography 291

PART 4: EMERGING APPLICATIONS 305

Chapter 8. Information Mining: Methods and Interfaces for Accessing Complex Information 307
Josiane Mothe, Kurt Englmeier, and Fionn Murtagh

8.1. Introduction 307

8.2. The multidimensional visualization of information 309

8.3. Domain mapping via social networks 320

8.4. Analyzing the variability of searches and data merging 323

8.5. The seven types of evaluation measures used in IR 327

8.6. Conclusion 331

8.7. Acknowledgments 332

8.8. Bibliography 332

Chapter 9. Opinion Detection as a Topic Classification Problem 337
Juan-Manuel Torres-Moreno, Marc El-Bèze, Patrice Bellot, and Frédéric Béchet

9.1. Introduction 337

9.2. The TREC and TAC evaluation campaigns 339

9.3. Cosine weights - a second glance 347

9.4. Which components for opinion vectors? 348

9.5. Experiments 352

9.6. Extracting opinions from speech: automatic analysis of phone polls 357

9.7. Conclusion 365

9.8. Bibliography 366

Appendix A. Probabilistic Models: An Introduction 369
François Yvon

A.1. Introduction 369

A.2. Supervised categorization 370

A.3. Unsupervised learning: the multinomial mixture model 384

A.4. Markov models: statistical models for sequences 391

A.5. Hidden Markov models 397

A.6. Conclusion 410

A.7. A primer of probability theory 411

A.8. Bibliography 420

List of Authors 423

Index 425

Publication date (per publisher)	13 April 2012
Place of publication	London
Language	English
Dimensions	163 x 241 mm
Weight	794 g
Subject area	Mathematics / Computer Science ► Computer Science ► Office Applications
ISBN-10	1-84821-322-0 / 1848213220
ISBN-13	978-1-84821-322-7 / 9781848213227
Condition	New