Selective Visual Attention

Computational Models and Applications

Liming Zhang, Weisi Lin (Authors)

Book | Hardcover

352 pages

2013
Wiley-IEEE Press (Publisher)
978-0-470-82812-0 (ISBN)

Visual attention is a relatively new area of study combining a number of disciplines: artificial neural networks, artificial intelligence, vision science and psychology. The aim is to build computational models similar to human vision in order to solve tough problems for many potential applications, including object recognition, unmanned vehicle navigation, and image and video coding and processing. In this book, the authors provide an up-to-date and highly applied introduction to the topic of visual attention, aiding researchers in creating powerful computer vision systems. Areas covered include the significance of vision research, psychology and computer vision, existing computational visual attention models, the authors' contributions on visual attention models, and applications in various image and video processing tasks. This book is geared toward graduate students and researchers in neural networks, image processing, machine learning, computer vision, and other areas of biologically inspired model building and applications. The book can also be used by practicing engineers looking for techniques involving the application of image coding, video processing, machine vision and brain-like robots to real-world systems. Other students and researchers with interdisciplinary interests will also find this book appealing.

Provides a key knowledge boost to developers of image processing applications
Is unique in emphasizing the practical utility of attention mechanisms
Includes a number of real-world examples that readers can implement in their own work:
    robot navigation and object selection
    image and video quality assessment
    image and video coding
Provides code that readers can apply in practical attentional models and mechanisms

Liming Zhang is a Professor of Electronics at Fudan University, where she leads the Image and Intelligence Laboratory. Since the 1980s she has been engaged in biological modeling and its application to engineering, such as artificial neural network models, visual models and brain-like robot models, and has published three books in Chinese on artificial neural networks, image coding and intelligent image processing, as well as over 120 papers in the area. Since 2003 she has been studying problems in modeling visual attention and applying it in computer vision, robot vision, object tracking, remote sensing and image quality assessment. She has served as a Senior Visiting Scholar at the University of Notre Dame and the Technical University of Munich.

Weisi Lin is an Associate Professor in the division of computer communications at Nanyang Technological University's School of Computer Engineering. He also serves as Lab Head, Visual Processing, and Acting Department Manager, Media Processing, at the Institute for Infocomm Research. Lin has also participated in research at Shantou University (China), Bath University (UK), the National University of Singapore, the Institute of Microelectronics (Singapore) and the Centre for Signal Processing (Singapore). His research interests include image processing, perceptual modeling, video compression, multimedia communication and computer vision. He holds 10 patents, has written 4 book chapters, and has published over 130 refereed papers in international journals and conferences. He is a Chartered Engineer and a Fellow of the IET. Lin graduated from Zhongshan University, China, with a B.Sc. in Electronics and an M.Sc. in Digital Signal Processing, and from King’s College, London University, UK, with a Ph.D. in Computer Vision.

Preface

PART I BASIC CONCEPTS AND THEORY

1 Introduction to Visual Attention

1.1 The Concept of Visual Attention

1.1.1 Selective Visual Attention

1.1.2 What Areas in a Scene Can Attract Human Attention?

1.1.3 Selective Attention in Visual Processing

1.2 Types of Selective Visual Attention

1.2.1 Pre-attention and Attention

1.2.2 Bottom-up Attention and Top-down Attention

1.2.3 Parallel and Serial Processing

1.2.4 Overt and Covert Attention

1.3 Change Blindness and Inhibition of Return

1.3.1 Change Blindness

1.3.2 Inhibition of Return

1.4 Visual Attention Model Development

1.4.1 First Phase: Biological Studies

1.4.2 Second Phase: Computational Models

1.4.3 Third Phase: Visual Attention Applications

1.5 Scope of This Book

References

2 Background of Visual Attention – Theory and Experiments

2.1 Human Visual System (HVS)

2.1.1 Information Separation

2.1.2 Eye Movement and Involved Brain Regions

2.1.3 Visual Attention Processing in the Brain

2.2 Feature Integration Theory (FIT) of Visual Attention

2.2.1 Feature Integration Hypothesis

2.2.2 Confirmation by Visual Search Experiments

2.3 Guided Search Theory

2.3.1 Experiments: Parallel Process Guides Serial Search

2.3.2 Guided Search Model (GS1)

2.3.3 Revised Guided Search Model (GS2)

2.3.4 Other Modified Versions: (GS3, GS4)

2.4 Binding Theory Based on Oscillatory Synchrony

2.4.1 Models Based on Oscillatory Synchrony

2.4.2 Visual Attention of Neuronal Oscillatory Model

2.5 Competition, Normalization and Whitening

2.5.1 Competition and Visual Attention

2.5.2 Normalization in Primary Visual Cortex

2.5.3 Whitening in Retina Processing

2.6 Statistical Signal Processing

2.6.1 A Signal Detection Approach for Visual Attention

2.6.2 Estimation Theory and Visual Attention

2.6.3 Information Theory for Visual Attention

References

PART II COMPUTATIONAL ATTENTION MODELS

3 Computational Models in the Spatial Domain

3.1 Baseline Saliency Model for Images

3.1.1 Image Feature Pyramids

3.1.2 Centre–Surround Differences

3.1.3 Across-scale and Across-feature Combination

3.2 Modelling for Videos

3.2.1 Extension of BS Model for Video

3.2.2 Motion Feature Detection

3.2.3 Integration for Various Features

3.3 Variations and More Details of BS Model

3.3.1 Review of the Models with Variations

3.3.2 WTA and IoR Processing

3.3.3 Further Discussion

3.4 Graph-based Visual Saliency

3.4.1 Computation of the Activation Map

3.4.2 Normalization of the Activation Map

3.5 Attention Modelling Based on Information Maximizing

3.5.1 The Core of the AIM Model

3.5.2 Computation and Illustration of Model

3.6 Discriminant Saliency Based on Centre–Surround

3.6.1 Discriminant Criterion Defined on Centre–Surround

3.6.2 Mutual Information Estimation

3.6.3 Algorithm and Block Diagram of Bottom-up DISC Model

3.7 Saliency Using More Comprehensive Statistics

3.7.1 The Saliency in Bayesian Framework

3.7.2 Algorithm of SUN Model

3.8 Saliency Based on Bayesian Surprise

3.8.1 Bayesian Surprise

3.8.2 Saliency Computation Based on Surprise Theory

3.9 Summary

References

4 Fast Bottom-up Computational Models in the Spectral Domain

4.1 Frequency Spectrum of Images

4.1.1 Fourier Transform of Images

4.1.2 Properties of Amplitude Spectrum

4.1.3 Properties of the Phase Spectrum

4.2 Spectral Residual Approach

4.2.1 Idea of the Spectral Residual Model

4.2.2 Realization of Spectral Residual Model

4.2.3 Performance of SR Approach

4.3 Phase Fourier Transform Approach

4.3.1 Introduction to the Phase Fourier Transform

4.3.2 Phase Fourier Transform Approach

4.3.3 Results and Discussion

4.4 Phase Spectrum of the Quaternion Fourier Transform Approach

4.4.1 Biological Plausibility for Multichannel Representation

4.4.2 Quaternion and Its Properties

4.4.3 Phase Spectrum of Quaternion Fourier Transform (PQFT)

4.4.4 Results Comparison

4.4.5 Dynamic Saliency Detection of PQFT

4.5 Pulsed Discrete Cosine Transform Approach

4.5.1 Approach of Pulsed Principal Components Analysis

4.5.2 Approach of the Pulsed Discrete Cosine Transform

4.5.3 Multichannel PCT Model

4.6 Divisive Normalization Model in the Frequency Domain

4.6.1 Equivalent Processes with a Spatial Model in the Frequency Domain

4.6.2 FDN Algorithm

4.6.3 Patch FDN

4.7 Amplitude Spectrum of Quaternion Fourier Transform (AQFT) Approach

4.7.1 Saliency Value for Each Image Patch

4.7.2 The Amplitude Spectrum for Each Image Patch

4.7.3 Differences between Image Patches and their Weighting to Saliency Value

4.7.4 Patch Size and Scale for Final Saliency Value

4.8 Modelling from a Bit-stream

4.8.1 Feature Extraction from a JPEG Bit-stream

4.8.2 Saliency Detection in the Compressed Domain

4.9 Further Discussions of Frequency Domain Approach

References

5 Computational Models for Top-down Visual Attention

5.1 Attention of Population-based Inference

5.1.1 Features in Population Codes

5.1.2 Initial Conspicuity Values

5.1.3 Updating and Transformation of Conspicuity Values

5.2 Hierarchical Object Search with Top-down Instructions

5.2.1 Perceptual Grouping

5.2.2 Grouping-based Salience from Bottom-up Information

5.2.3 Top-down Instructions and Integrated Competition

5.2.4 Hierarchical Selection from Top-down Instruction

5.3 Computational Model under Top-down Influence

5.3.1 Bottom-up Low-level Feature Computation

5.3.2 Representation of Prior Knowledge

5.3.3 Saliency Map Computation using Object Representation

5.3.4 Using Attention for Object Recognition

5.3.5 Implementation

5.3.6 Optimizing the Selection of Top-down Bias

5.4 Attention with Memory of Learning and Amnesic Function

5.4.1 Visual Memory: Amnesic IHDR Tree

5.4.2 Competition Neural Network Under the Guidance of Amnesic IHDR

5.5 Top-down Computation in the Visual Attention System: VOCUS

5.5.1 Bottom-up Features and Bottom-up Saliency Map

5.5.2 Top-down Weights and Top-down Saliency Map

5.5.3 Global Saliency Map

5.6 Hybrid Model of Bottom-up Saliency with Top-down Attention Process

5.6.1 Computation of the Bottom-up Saliency Map

5.6.2 Learning of Fuzzy ART Networks and Top-down Decision

5.7 Top-down Modelling in the Bayesian Framework

5.7.1 Review of Basic Framework

5.7.2 The Estimation of Conditional Probability Density

5.8 Summary

References

6 Validation and Evaluation for Visual Attention Models

6.1 Simple Man-made Visual Patterns

6.2 Human-labelled Images

6.3 Eye-tracking Data

6.4 Quantitative Evaluation

6.4.1 Some Basic Measures

6.4.2 ROC Curve and AUC Score

6.4.3 Inter-subject ROC Area

6.5 Quantifying the Performance of a Saliency Model to Human Eye Movement in Static and Dynamic Scenes

6.6 Spearman’s Rank Order Correlation with Visual Conspicuity

References

PART III APPLICATIONS OF ATTENTION SELECTION MODELS

7 Applications in Computer Vision, Image Retrieval and Robotics

7.1 Object Detection and Recognition in Computer Vision

7.1.1 Basic Concepts

7.1.2 Feature Extraction

7.1.3 Object Detection and Classification

7.2 Attention Based Object Detection and Recognition in a Natural Scene

7.2.1 Object Detection Combined with Bottom-up Model

7.2.2 Object Detection based on Attention Elicitation

7.2.3 Object Detection with a Training Set

7.2.4 Object Recognition Combined with Bottom-up Attention

7.3 Object Detection and Recognition in Satellite Imagery

7.3.1 Ship Detection based on Visual Attention

7.3.2 Airport Detection in a Land Region

7.3.3 Saliency and Gist Feature for Target Detection

7.4 Image Retrieval via Visual Attention

7.4.1 Elements of General Image Retrieval

7.4.2 Attention Based Image Retrieval

7.5 Applications of Visual Attention in Robots

7.5.1 Robot Self-localization

7.5.2 Visual SLAM System with Attention

7.5.3 Moving Object Detection using Visual Attention

7.6 Summary

References

8 Application of Attention Models in Image Processing

8.1 Attention-modulated Just Noticeable Difference

8.1.1 JND Modelling

8.1.2 Modulation via Non-linear Mapping

8.1.3 Modulation via Foveation

8.2 Use of Visual Attention in Quality Assessment

8.2.1 Image/Video Quality Assessment

8.2.2 Weighted Quality Assessment by Salient Values

8.2.3 Weighting through Attention-modulated JND Map

8.2.4 Weighting through Fixation

8.2.5 Weighting through Quality Distribution

8.3 Applications in Image/Video Coding

8.3.1 Image and Video Coding

8.3.2 Attention-modulated JND based Coding

8.3.3 Visual Attention Map based Coding

8.4 Visual Attention for Image Retargeting

8.4.1 Literature Review for Image Retargeting

8.4.2 Saliency-based Image Retargeting in the Compressed Domain

8.5 Application in Compressive Sampling

8.5.1 Compressive Sampling

8.5.2 Compressive Sampling via Visual Attention

8.6 Summary

References

PART IV SUMMARY

9 Summary, Further Discussions and Conclusions

9.1 Summary

9.1.1 Research Results from Physiology and Anatomy

9.1.2 Research from Psychology and Neuroscience

9.1.3 Theory of Statistical Signal Processing

9.1.4 Computational Visual Attention Modelling

9.1.5 Applications of Visual Attention Models

9.2 Further Discussions

9.2.1 Interaction between Top-down Control and Bottom-up Processing in Visual Search

9.2.2 How to Deploy Visual Attention in the Brain?

9.2.3 Role of Memory in Visual Attention

9.2.4 Mechanism of Visual Attention in the Brain

9.2.5 Covert Visual Attention

9.2.6 Saliency of Large Smooth Objects

9.2.7 Invariable Feature Extraction

9.2.8 Role of Visual Attention Models in Applications

9.3 Conclusions

References

Index

Series	IEEE Press
Language	English
Dimensions	163 x 241 mm
Weight	658 g
Subject areas	Mathematics / Computer Science ► Mathematics ► Applied Mathematics
	Medical Studies ► First Stage (Preclinical) ► Physiology
	Engineering ► Electrical Engineering / Power Engineering
ISBN-10	0-470-82812-9 / 0470828129
ISBN-13	978-0-470-82812-0 / 9780470828120
Condition	New