Advances in Data Science and Analytics
Wiley-Scrivener (publisher)
978-1-119-79188-1 (ISBN)
- Title is unfortunately out of print; no new edition
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is closely related to data mining, deep learning, and big data. Data analytics is a more focused version of this and can be considered part of the larger process: it is devoted to realizing actionable insights that can be applied immediately based on existing queries. For the purposes of this volume, data science is an umbrella term that encompasses data analytics, data mining, machine learning, and several other related disciplines. While a data scientist is expected to forecast the future based on past patterns, a data analyst extracts meaningful insights from various data sources.
Although data mining and related areas have been around for a few decades, data science and analytics are still evolving quickly, and the processes and technologies change almost daily. This volume provides an overview of some of the most important advances in these areas today, including practical coverage of daily applications. Valuable as a learning tool for beginners as well as a daily reference for engineers and scientists working in these areas, this is a must-have for any library.
M. Niranjanamurthy, PhD, is an assistant professor in the Department of Computer Applications, M. S. Ramaiah Institute of Technology, Bangalore, Karnataka, India. He earned his PhD in computer science at JJTU. He has over 13 years of teaching experience and two years of industry experience as a software engineer. He has published four books and 85 papers in technical journals and conferences. He has six patents to his credit and has won numerous awards.
Hemant Kumar Gianey, PhD, is a senior assistant professor in the Computer Science Department at Vellore Institute of Technology, AP, India. He previously worked at Thapar Institute of Engineering and Technology, Patiala, Punjab, India, and as a post-doctoral researcher in the Computer Science and Engineering Department at National Cheng Kung University in Taiwan. He has over 15 years of teaching and industry experience. He has conducted many workshops and has been a guest speaker at various universities. He has also published many research papers in scientific and technical journals.
Amir H. Gandomi, PhD, is a professor of data science in the Department of Engineering and Information Technology, University of Technology Sydney. Before joining UTS, he was an assistant professor at the School of Business, Stevens Institute of Technology, NJ, and a distinguished research fellow at BEACON Center, Michigan State University. He has published over 150 journal papers and four books, and his work has been cited more than 14,000 times. He has been named one of the world’s most influential scientific minds and a Highly Cited Researcher (top 1%) for three consecutive years, from 2017 to 2019. He has also served as associate editor, editor, and guest editor for several prestigious journals and has delivered several keynote talks. He is also part of a NASA technology cluster on Big Data, Artificial Intelligence, and Machine Learning.
Preface
1 Implementation Tools for Generating Statistical Consequence Using Data Visualization Techniques
Dr. Ajay B. Gadicha, Dr. Vijay B. Gadicha, Prof. Sneha Bohra and Dr. Niranjanamurthy M.
1.1 Introduction
1.2 Literature Review
1.3 Tools in Data Visualization
1.4 Methodology
1.4.1 Plotting the Data
1.4.2 Plotting the Model on Data
1.4.3 Quantifying Linear Relationships
1.4.4 Covariance vs. Correlation
1.5 Conclusion
References
2 Decision Making and Predictive Analysis for Real Time Data
Umesh Pratap Singh
2.1 Introduction
2.2 Data Analytics
2.2.1 Descriptive Analytics
2.2.2 Diagnostic Analytics
2.2.3 Predictive Analytics
2.2.4 Prescriptive Analytics
2.3 Predictive Modeling
2.4 Categories of Predictive Models
2.5 Process of Predictive Modeling
2.5.1 Requirement Gathering
2.5.2 Data Gathering
2.5.3 Data Analysis and Massaging
2.5.4 Machine Learning Statistics
2.5.5 Predictive Modeling
2.5.6 Prediction and Decision Making
2.6 Predictive Analytics Opportunities
2.6.1 Detecting Fraud
2.6.2 Reduction of Risk
2.6.3 Marketing Campaign Optimization
2.6.4 Operation Improvement
2.6.5 Clinical Decision Support System
2.7 Classification of Predictive Analytics Models
2.7.1 Predictive Models
2.7.2 Descriptive Models
2.7.3 Decision Models
2.8 Predictive Analytics Techniques
2.8.1 Predictive Analytics Software
2.8.2 The Importance of Good Data
2.8.3 Predictive Analytics vs. Business Intelligence
2.8.4 Pricing Information
2.9 Data Analysis Tools
2.9.1 Excel
2.9.2 Tableau
2.9.3 Power BI
2.9.4 Fine Report
2.9.5 R & Python
2.10 Advantages & Disadvantages of Predictive Modeling
2.10.1 Advantages
2.10.2 Disadvantages
2.10.2.1 Data Labeling
2.10.2.2 Obtaining Massive Training Datasets
2.10.2.3 The Explainability Problem
2.10.2.4 Generalizability of Learning
2.10.2.5 Bias in Algorithms and Data
2.11 Predictive Analytics Biggest Impact
2.11.1 Predicting Demand
2.11.2 Transformation Using Technology and Process
2.11.3 Improved Pricing
2.11.4 Predictive Maintenance
2.12 Application of Predictive Analytics
2.12.1 Financial and Banking Services
2.12.2 Retail
2.12.3 Health and Insurance
2.12.4 Oil and Gas Utilities
2.12.5 Public Sector
2.13 Future Scope of Predictive Modeling
2.13.1 Technological Advancements
2.13.2 Changes in Work
2.13.3 Risk Mitigation
2.14 Conclusion
References
3 Optimizing Water Quality with Data Analytics and Machine Learning
Bin Liang, Zhidong Li, Hongda Tian, Shuming Liang, Yang Wang and Fang Chen
3.1 Introduction
3.2 Related Work
3.3 Data Sources and Collection
3.4 Water Demand Forecasting
3.4.1 Network Flow and Zone Demand Estimation
3.4.2 Demand Forecasting
3.4.2.1 Feature Importance
3.4.2.2 Forecast Horizon
3.4.3 Performance Characterization
3.5 Re-Chlorination Optimization
3.5.1 Data
3.5.2 Water Age Estimation
3.5.2.1 Travel Time Estimation
3.5.2.2 Residential Time Estimation
3.5.3 Ammonia Prediction
3.5.4 Optimization Model Definition
3.5.5 Improvements in Customer Water Quality
3.5.6 Plant Dosing Optimization
3.6 Conclusion
Acknowledgements
References
4 Lip Reading Framework using Deep Learning and Machine Learning
Hemant Kumar Gianey, Parth Khandelwal, Prakhar Goel, Rishav Maheshwari, Bhannu Galhotra and Divyanshu Pratap Singh
4.1 Introduction
4.1.1 Overview
4.1.2 Motivation
4.1.3 Lip Reading System Outcomes and Deliverables
4.2 The Emergence and Definition of the Lip-Reading System
4.2.1 Background of Domain
4.2.2 Identified Problems
4.2.3 Tools and Technologies Used
4.2.4 Implementation Aspects
4.2.4.1 Data Preparation
4.3 Design and Components of Lip-Reading System
4.4 Lip Reading System Architecture
4.5 Testing
4.6 Problems Encountered During Implementation
4.6.1 Assumptions and Constraints
4.7 Conclusion
4.8 Future Work
References
5 New Perspective to Management, Economic Growth and Debt Nexus Analysis: Evidence from Indian Economy
Edmund Ntom Udemba, Festus Victor Bekun, Dervis Kirikkaleli and Esra Sipahi Döngül
5.1 Introduction
5.2 Literature Review
5.2.1 External Debt and Economic Growth
5.2.2 Trade Openness, FDI, and Economic Growth
5.2.3 FDI and Economic Growth
5.3 Data
5.3.1 Analytical Framework and Data Description
5.3.2 Theoretical Background and Specifications
5.3.2.1 Model Specification
5.4 Methodology and Findings
5.4.1 Unit Root Testing
5.4.2 Cointegration
5.4.3 Vector Error Correction Model
5.4.4 Long-Run Relationship Estimation
5.4.5 Causality Test
5.5 Conclusion and Policy Implications
Declarations
Availability of Data and Materials
Competing Interests
Funding
Authors’ Contributions
Acknowledgments
References
6 Data-Driven Delay Analysis with Applications to Railway Networks
Boyu Li, Ting Guo, Yang Wang and Fang Chen
6.1 Introduction
6.2 Related Works
6.3 Background Knowledge
6.3.1 Background and Problem Formulation
6.3.1.1 Train Delay
6.3.1.2 Delay Propagation
6.3.2 Preliminaries
6.3.2.1 Bayesian Inference
6.3.2.2 Markov Property
6.4 Delay Propagation Model
6.4.1 Conditional Bayesian Delay Propagation
6.4.1.1 Delay Self-Propagation
6.4.1.2 Incremental Run-Time Delay
6.4.1.3 Incremental Dwell Time Delay
6.4.1.4 Accumulative Departure Delay
6.4.2 Cross-Line Propagation, Backward Propagation and Train Connection Propagation
6.5 Primary Delay Tracing Back
6.5.1 Delay Candidates Selection
6.5.2 Relation Construction
6.5.2.1 Preceding and Following Trains
6.5.2.2 Preceding and Connecting Trains
6.6 Evaluation on Dwell Time Improvement Strategy
6.7 Experiments
6.7.1 Experiment Setting
6.7.2 Temporal Prediction of Delay Propagation
6.7.3 Spatial Prediction of Delay Propagation
6.7.4 Case Study of Primary Delay Tracing Down
6.7.5 Evaluation of Dwell Time Improvement Strategy
6.8 Conclusion
References
7 Proposing a Framework to Analyze Breast Cancer in Mammogram Images Using Global Thresholding, Gray Level Co-Occurrence Matrix, and Convolutional Neural Network (CNN)
Ms. Tanishka Dixit and Ms. Namrata Singh
7.1 Introduction & Purpose of Study
7.1.1 Segmentation
7.1.1.1 Types of Segmentation
7.1.2 Compression
7.2 Literature Review & Motivation
7.3 Proposed Work
7.3.1 Algorithm
7.3.2 Explanation
7.3.3 Flowchart
7.4 Observation Tables and Figures
7.5 Conclusion
7.6 Future Work
References
8 IoT Technologies for Smart Healthcare
Rehab A. Rayan, Imran Zafar and Christos Tsagkaris
8.1 Introduction
8.2 Literature Review
8.2.1 IoT-Based Smart Health
8.2.2 Advantages of Applying IoT in Health
8.3 Findings
8.3.1 Significant Features and Applications of IoT in Health
8.3.1.1 Simultaneous Monitoring and Reporting
8.3.1.2 End-to-End Connectivity and Affordability
8.3.1.3 Data Analysis
8.3.1.4 Tracking, Alerts, and Remote Medical Care
8.3.1.5 Research
8.3.1.6 Patient-Generated Health Data (PGHD)
8.3.1.7 Management of Chronic Diseases and Preventative Care
8.3.1.8 Home-Based and Short-Term Care
8.4 Case Study: CyberMed as an IoT-Based Smart Health Model
8.5 Discussions
8.5.1 Limitations of Adopting IoT in Health
8.5.1.1 Data Security and Privacy
8.5.1.2 Connectivity
8.5.1.3 Compatibility and Data Integration
8.5.1.4 Implementation Cost
8.5.1.5 Complexity and Risk of Errors
8.6 Future Insights
8.7 Conclusions
References
9 Enhancement of Scalability of SVM Classifiers for Big Data
Vijaykumar Bhajantri, Shashikumar G. Totad and Geeta R. Bharamagoudar
9.1 Introduction
9.2 Support Vector Machine
9.2.1 Challenges
9.3 Parallel and Distributed Mechanism
9.3.1 Shared-Memory Parallelism
9.4 Distributed Big Data Architecture
9.4.1 Hadoop MapReduce
9.4.2 Spark
9.4.3 Akka
9.5 Distributed High Performance Computing
9.5.1 GASNet
9.5.2 Charm++
9.6 GPU Based Parallelism
9.6.1 CUDA
9.6.2 OpenCL
9.7 Parallel and Distributed SVM Algorithms
9.7.1 LS-SVM
9.7.2 Cascade SVM
9.7.3 DC-SVM
9.7.4 Parallel Distributed Multiclass SVM Algorithms
9.8 Conclusion and Future Research Directions
References
10 Electrical Network-Related Incident Prediction Based on Weather Factors
Hongda Tian, Jessie Nghiem and Fang Chen
10.1 Introduction
10.2 Related Work
10.3 Methodology
10.3.1 Binary Classification of Incident and Normality
10.3.2 Incident Categorization Using Natural Language Processing
10.3.3 Classification of Multiple Types of Incidents
10.4 Experiments
10.4.1 Data Sets
10.4.2 Evaluation Metrics
10.4.3 Binary Classification
10.4.4 Incident Categorization
10.4.5 Multi-Class Classification
10.5 Conclusion and Future Work
Acknowledgements
References
11 Green IoT: Environment-Friendly Approach to IoT
Abhishek Goel and Siddharth Gautam
11.1 Introduction
11.2 G-IoT (Green Internet of Things)
11.3 Layered Architecture of G-IoT
11.3.1 Data Center/Cloud
11.3.2 Data Analytics and Control Applications It
11.3.3 Data Aggregation and Storage
11.3.4 Edge Computing
11.3.5 Communication and Processing Unit
11.4 Techniques for Implementation of G-IoT
11.5 Power Saving Methods Based on Components
11.6 Applications of G-IoT
11.7 Challenges and Future Scope
11.8 Case Study
11.9 Conclusion
References
12 Big-Data Analytics: A New Paradigm Shift in Micro Finance Industry
Vinay Pal Singh, Rohit Bansal and Ram Singh
12.1 Introduction
12.2 Reality of Area and Transcendent Difficulties
12.2.1 Probable Overlending
12.2.2 Information Imbalance
12.2.3 Retreating Not-for-Profit Sector
12.2.4 Neighbourhood Pressure
12.3 Data Analytics in Microfinance
12.3.1 Types of Data Analytics Used in Microfinance
12.3.2 Use of Big Data in Microfinance Industry
12.3.3 Risk and Data Based Credit Decisions
12.3.4 Product Development and Selection
12.3.5 Product or Service Positioning
12.3.6 M-Commerce and E-Payments
12.3.7 Making Reliable Credit Decisions
12.3.8 Big Data-Driven Model Promises Psychometric Evaluations
12.3.9 Product Build-Up, Service Positioning, and Offering
12.4 Opportunities and Risks in Using Data Analytics
12.5 Risk in Utilizing Big Data
12.6 Conclusion
References
13 Big Data Storage and Analysis
Namrata Dhanda
13.1 Introduction
13.1.1 6 V’s of Big Data
13.1.2 Types of Data
13.1.3 Issues in Handling Big Data
13.2 Hadoop as a Solution to Challenges of Big Data
13.2.1 The Hadoop Ecosystem
13.2.2 Rack Awareness Policy in HDFS
13.3 In-Memory Storage and NoSQL
13.3.1 Key-Value Data Stores
13.3.2 Document Stores
13.3.3 Wide Column Stores
13.3.4 Graph Stores
13.3.5 Multi-Modal Databases
13.4 Advantages of NoSQL Database
13.5 Conclusion
References
14 A Framework for Analysing Social Media and Digital Data by Applying Machine Learning Techniques for Pandemic Management
Mutyala Sridevi
14.1 Introduction
14.2 Literature Review
14.3 Understanding Pandemic Analogous to a Disaster
14.4 Application of Machine Learning Techniques at Various Phases of Pandemic Management
14.4.1 Mitigation Phase
14.4.2 Preparedness Phase
14.4.3 Response Phase
14.4.4 Recovery Phase
14.5 Generalized Framework to Apply Machine Learning Techniques for Pandemic Management
14.6 Conclusion
References
About the Editors
Index
Publication date | 3 November 2022 |
---|---|
Language | English |
Weight | 794 g |
Subject area | Computer Science ► Databases ► Data Warehouse / Data Mining |
Subject area | Computer Science ► Office Programs ► Outlook |
Subject area | Engineering ► Civil Engineering |
ISBN-10 | 1-119-79188-X / 111979188X |
ISBN-13 | 978-1-119-79188-1 / 9781119791881 |
Condition | New |