GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book presents the latest insights and research in computer vision, electronic design automation, and emerging data-intensive applications. It also covers life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video and image processing. This book is intended to help those who face the challenge of programming systems to use GPUs effectively and meet efficiency and performance goals. It gives developers a window into diverse application areas and the opportunity to gain insights from others' algorithm work that they may apply to their own projects. Readers will learn from leading researchers in parallel programming, who have gathered their solutions and experience in one volume under the guidance of expert area editors. Each chapter is written to be accessible to researchers from other domains, allowing knowledge to cross-pollinate across the GPU spectrum. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution. The insights and ideas, as well as the practical hands-on skills, in the book can be put to use immediately. Computer programmers, software engineers, hardware engineers, and computer science students will find this volume a helpful resource.
For source code discussed throughout the book, the editors invite readers to visit the accompanying website.
- Covers the breadth of industry from scientific simulation and electronic design automation to audio / video processing, medical imaging, computer vision, and more
- Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution
- Offers insights and ideas as well as practical "hands-on" skills you can immediately put to use
Table of Contents
Editors, Reviewers, and Authors
Introduction
Section 1: Scientific Simulation
Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals
1.1. Introduction, Problem Statement, and Context
1.2. Core Method
1.3. Algorithms, Implementations, and Evaluations
1.4. Final Evaluation
1.5. Future Directions
References
Chapter 2. Large-Scale Chemical Informatics on GPUs
2.1. Introduction, Problem Statement, and Context
2.2. Core Methods
2.3. Gaussian Shape Overlay: Parallelization and Arithmetic Optimization
2.4. LINGO: Algorithmic Transformation and Memory Optimization
2.5. Final Evaluation
2.6. Future Directions
Acknowledgments
References
Chapter 3. Dynamical Quadrature Grids: Applications in Density Functional Calculations
3.1. Introduction
3.2. Core Method
3.3. Implementation
3.4. Performance Improvement
3.5. Future Work
References
Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs
4.1. Introduction, Problem Statement, and Context
4.2. Core Method
4.3. Algorithms, Implementations, and Evaluations
4.4. Final Evaluation
4.5. Future Directions
References
Chapter 5. Quantum Chemistry: Propagation of Electronic Structure on a GPU
5.1. Problem Statement
5.2. Core Technology and Algorithm
5.3. The Key Insight on the Implementation—the Choice of Building Blocks
5.4. Final Evaluation and Benefits
5.5. Conclusions and Future Directions
Acknowledgments
References
Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm
6.1. Introduction, Problem Statement, and Context
6.2. Core Methods
6.3. Algorithms and Implementations
6.4. Evaluation and Validation of Results, Total Benefits, and Limitations
6.5. Future Directions
Acknowledgments
References
Chapter 7. Leveraging the Untapped Computation Power of GPUs: Fast Spectral Synthesis Using Texture Interpolation
7.1. Background and Problem Statement
7.2. Flux Calculation and Aggregation
7.3. The GRASSY Platform
7.4. Initial Testing
7.5. Impact and Future Directions
Acknowledgments
References
Chapter 8. Black Hole Simulations with CUDA
8.1. Introduction
8.2. The Post-Newtonian Approximation
8.3. Numerical Algorithm
8.4. GPU Implementation
8.5. Performance Results
8.6. GPU Supercomputing Clusters
8.7. Statistical Results for Black Hole Inspirals
8.8. Conclusion
Acknowledgments
References
Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA
9.1. Introduction
9.2. Fast N-Body Simulation
9.3. CUDA Implementation of the Fast N-Body Algorithms
9.4. Improvements of Performance
9.5. Detailed Description of the GPU Kernels
9.6. Overview of Advanced Techniques
9.7. Conclusions
References
Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures
10.1. Introduction, Problem Statement, and Context
10.2. Core Method
10.3. Algorithms, Implementations, and Evaluations
10.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
10.5. Conclusions and Future Directions
References
Section 2: Life Sciences
Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm
11.1. Introduction, Problem Statement, and Context
11.2. Core Method
11.3. CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins
11.4. Discussion
11.5. Final Evaluation
References
Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching
12.1. Introduction, Problem Statement, and Context
12.2. Core Methods
12.3. Algorithms, Implementations, and Evaluations
12.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
12.5. Future Directions
References
Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching
13.1. Introduction, Problem Statement, and Context
13.2. Core Method
13.3. Algorithms, Implementations, and Evaluations
13.4. Final Evaluation
13.5. Future Direction
Acknowledgments
Appendix
References
Chapter 14. GPU Accelerated RNA Folding Algorithm
14.1. Problem Statement
14.2. Core Method
14.3. Algorithms, Implementations, and Evaluations
14.4. Final Evaluation
14.5. Future Directions
References
Chapter 15. Temporal Data Mining for Neuroscience
15.1. Introduction
15.2. Core Methodology
15.3. GPU Parallelization: Algorithms and Implementations
15.4. Experimental Results
15.5. Discussion
References
Section 3: Statistical Modeling
Chapter 16. Parallelization Techniques for Random Number Generators
16.1. Introduction
16.2. L'Ecuyer's Multiple Recursive Generator MRG32k3a
16.3. Sobol Generator
16.4. Mersenne Twister MT19937
16.5. Performance Benchmarks
Acknowledgments
References
Chapter 17. Monte Carlo Photon Transport on the GPU
17.1. Physics of Photon Transport
17.2. Photon Transport on the GPU
17.3. The Complete System
17.4. Results and Evaluation
17.5. Future Directions
References
Chapter 18. High-Performance Iterated Function Systems
18.1. Problem Statement and Mathematical Background
18.2. Core Technology
18.3. Implementation
18.4. Final Evaluation
18.5. Conclusion
References
Section 4: Emerging Data-Intensive Applications
Chapter 19. Large-Scale Machine Learning
19.1. Introduction
19.2. Core Technology
19.3. GPU Algorithm and Implementation
19.4. Improvements of Performance
19.5. Conclusions and Future Work
Acknowledgments
References
Chapter 20. Multiclass Support Vector Machine
20.1. Introduction, Problem Statement, and Context
20.2. Core Method
20.3. Algorithms, Implementations, and Evaluations
20.4. Final Evaluation
20.5. Future Direction
References
Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA
21.1. Introduction, Problem Statement, and Context
21.2. Final Evaluation and Validation of Results
21.3. Conclusions, Benefits and Limitations, and Future Work
References
Chapter 22. GPU-Accelerated Ant Colony Optimization
22.1. Introduction, Problem Statement, and Context
22.2. Core Method
22.3. Algorithms, Implementations, and Evaluations
22.4. Final Evaluation
22.5. Future Direction
Acknowledgments
References
Section 5: Electronic Design Automation
Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs
23.1. Introduction
23.2. Simulator Overview
23.3. Compilation and Simulation
23.4. Experimental Results
23.5. Future Directions
Related Work
References
Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization
24.1. Introduction, Problem Statement, and Context
24.2. Core Method
24.3. Algorithms, Implementations, and Evaluations
24.4. Final Evaluation
24.5. Future Direction
References
Section 6: Ray Tracing and Rendering
Chapter 25. Lattice Boltzmann Lighting Models
25.1. Introduction, Problem Statement, and Context
25.2. Core Methods
25.3. Algorithms, Implementation, and Evaluation
25.4. Final Evaluation
25.5. Future Directions
25.6. Derivation of the Diffusion Equation
Acknowledgments
References
Chapter 26. Path Regeneration for Random Walks
26.1. Introduction
26.2. Path Tracing as Case Study
26.3. Random Walks in Path Tracing
26.4. Implementation Details
26.5. Results
26.6. Discussion
Acknowledgments
References
Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation
27.1. System Overview
27.2. Background
27.3. Core Technology and Algorithms
27.4. Future Directions
Acknowledgments
References
Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency
28.1. Introduction, Problem Statement, and Context
28.2. Core Method
28.3. Algorithms, Implementations, and Evaluations
28.4. Final Evaluation
28.5. Future Direction
References
Section 7: Computer Vision
Chapter 29. Fast Graph Cuts for Computer Vision
29.1. Introduction, Problem Statement, and Context
29.2. Core Method
29.3. Algorithms, Implementations, and Evaluations
29.4. Final Evaluation and Validation of Results
29.5. Multilabel Graph Cuts
References
Chapter 30. Visual Saliency Model on Multi-GPU
30.1. Introduction
30.2. Visual Saliency Model
30.3. GPU Implementation
30.4. Results
30.5. Conclusion
References
Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows
31.1. Introduction, Problem Statement, and Context
31.2. Core Method
References
Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU
32.1. Introduction
32.2. Methods
32.3. Implementation
32.4. Results and Discussion
32.5. Conclusion and Future Work
References
Chapter 33. Haar Classifiers for Object Detection with CUDA
33.1. Introduction
33.2. Viola-Jones Object Detection Retrospective
33.3. Object Detection Pipeline with NVIDIA CUDA
33.4. Benchmarking and Implementation Details
33.5. Future Direction
33.6. Conclusion
References
Section 8: Video and Image Processing
Chapter 34. Experiences on Image and Video Processing with CUDA and OpenCL
34.1. Introduction, Problem Statement, and Background
34.2. Core Technology or Algorithm
34.3. Key Insights from Implementation and Evaluation
34.4. Final Evaluation
References
Chapter 35. Connected Component Labeling in CUDA
35.1. Introduction
35.2. Core Algorithm
35.3. CUDA Algorithm and Implementation
35.4. Final Evaluation and Results
References
Chapter 36. Image De-Mosaicing
36.1. Introduction, Problem Statement, and Context
36.2. Core Method
36.3. Algorithms, Implementations, and Evaluations
36.4. Final Evaluation
References
Section 9: Signal and Audio Processing
Chapter 37. Efficient Automatic Speech Recognition on the GPU
37.1. Introduction, Problem Statement, and Context
37.2. Core Methods
37.3. Algorithms, Implementations, and Evaluations
37.4. Conclusion and Future Directions
References
Chapter 38. Parallel LDPC Decoding
38.1. Introduction, Problem Statement, and Context
38.2. Core Technology
38.3. Algorithms, Implementations, and Evaluations
38.4. Final Evaluation
38.5. Future Directions
References
Chapter 39. Large-Scale Fast Fourier Transform
39.1. Introduction
39.2. Memory Hierarchy of GPU Clusters
39.3. Large-Scale Fast Fourier Transform
39.4. Algebraic Manipulation of Array Dimensions
39.5. Performance Results
39.6. Conclusion and Future Work
References
Section 10: Medical Imaging
Chapter 40. GPU Acceleration of Iterative Digital Breast Tomosynthesis
40.1. Introduction
40.2. Digital Breast Tomosynthesis
40.3. Accelerating Iterative DBT using GPUs
40.4. Conclusions
Acknowledgments
References
Chapter 41. Parallelization of Katsevich CT Image Reconstruction Algorithm on Generic Multi-Core Processors and GPGPU
41.1. Introduction, Problem, and Context
41.2. Core Methods
41.3. Algorithms, Implementations, and Evaluations
41.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
41.5. Related Work
41.6. Future Directions
41.7. Summary
References
Chapter 42. 3-D Tomographic Image Reconstruction from Randomly Ordered Lines with CUDA
42.1. Introduction
42.2. Core Methods
42.3. Implementation
42.4. Evaluation and Validation of Results, Total Benefits, and Limitations
42.5. Future Directions
References
Chapter 43. Using GPUs to Learn Effective Parameter Settings for GPU-Accelerated Iterative CT Reconstruction Algorithms
43.1. Introduction, Problem Statement, and Context
43.2. Core Method(s)
43.3. Algorithms, Implementations, and Evaluations
43.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
43.5. Future Directions
References
Chapter 44. Using GPUs to Accelerate Advanced MRI Reconstruction with Field Inhomogeneity Compensation
44.1. Introduction
44.2. Core Method: Advanced Image Reconstruction Toolbox for MRI
44.3. MRI Reconstruction Algorithms and Implementation on GPUs
44.4. Final Results and Evaluation
44.5. Conclusion and Future Directions
References
Chapter 45. ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction
45.1. Introduction, Problem Statement, and Context
45.2. Core Methods (High Level Description)
45.3. Algorithms, Implementations, and Evaluations (Detailed Description)
45.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
45.5. Discussion and Conclusion
References
Chapter 46. Medical Image Processing Using GPU-Accelerated ITK Image Filters
46.1. Introduction
46.2. Core Methods
46.3. Implementation
46.4. Results
46.5. Future Directions
46.6. Acknowledgments
References
Chapter 47. Deformable Volumetric Registration Using B-Splines
47.1. Introduction
47.2. An Overview of B-Spline Registration
47.3. Implementation Details
47.4. Results
47.5. Conclusions
References
Chapter 48. Multiscale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs
48.1. Introduction, Problem Statement, and Context
48.2. Core Methods
48.3. Algorithms, Implementations, and Evaluations
48.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
48.5. Future Directions
Acknowledgments
References
Chapter 49. GPU-Accelerated Brain Connectivity Reconstruction and Visualization in Large-Scale Electron Micrographs
49.1. Introduction
49.2. Core Methods
49.3. Implementation
49.4. Results
49.5. Future Directions
Acknowledgments
References
Chapter 50. Fast Simulation of Radiographic Images Using a Monte Carlo X-Ray Transport Algorithm Implemented in CUDA
50.1. Introduction, Problem Statement, and Context
50.2. Core Methods
50.3. Algorithms, Implementations, and Evaluations
50.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
50.5. Future Directions
References
Index
Publication date (per publisher) | 13 January 2011 |
---|---|
Contributors | Editor-in-chief: Wen-Mei W. Hwu |
Language | English |
Subject areas | Computer Science ► Graphics / Design ► Digital Image Processing |
 | Mathematics / Computer Science ► Computer Science ► Networks |
 | Mathematics / Computer Science ► Computer Science ► Software Development |
 | Mathematics / Computer Science ► Computer Science ► Theory / Study |
 | Computer Science ► Other Topics ► Hardware |
 | Engineering ► Electrical Engineering / Power Engineering |
ISBN-10 | 0-12-384989-6 / 0123849896 |
ISBN-13 | 978-0-12-384989-2 / 9780123849892 |
Size: 27.7 MB
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in a web browser.
Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
Size: 20.7 MB
Copy protection: Adobe DRM
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly suited to fiction and non-fiction. The running text adapts dynamically to the display and font size, so EPUB is also well suited to mobile reading devices.