Advances in Data Science and Analytics

Concepts and Paradigms

M. Niranjanamurthy, Hemant Kumar Gianey, Amir H. Gandomi (Editors)

Book | Hardcover

352 pages

2022
Wiley-Scrivener (Publisher)
9781119791881 (ISBN)

Presenting the concepts and advances of data science and analytics, this volume, written and edited by a global team of experts, also covers practical applications that can be used across multiple disciplines and industries, for both the engineer and the student, focusing on machine learning, big data, business intelligence, and analytics.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning, and big data. Data analytics software is a more focused version of this and can even be considered part of the larger process. Analytics is devoted to realizing actionable insights that can be applied immediately based on existing queries. For the purposes of this volume, data science is an umbrella term that encompasses data analytics, data mining, machine learning, and several other related disciplines. While a data scientist is expected to forecast the future based on past patterns, data analysts extract meaningful insights from various data sources.

Although data mining and other related areas have been around for a few decades, data science and analytics are still evolving quickly, and the processes and technologies change almost on a day-to-day basis. This volume provides an overview of some of the most important advances in these areas today, including practical coverage of the daily applications. Valuable as a learning tool for beginners in this area as well as a daily reference for engineers and scientists working in these areas, this is a must-have for any library.

M. Niranjanamurthy, PhD, is an assistant professor in the Department of Computer Applications, M. S. Ramaiah Institute of Technology, Bangalore, Karnataka, India. He earned his PhD in computer science at JJTU. He has over 13 years of teaching experience and two years of industry experience as a software engineer. He has published four books and 85 papers in technical journals and conferences. He has six patents to his credit and has won numerous awards.

Hemant Kumar Gianey, PhD, is a senior assistant professor in the Computer Science Department at Vellore Institute of Technology, AP, India. He also worked at Thapar Institute of Engineering and Technology, Patiala, Punjab, India, and as a post-doctoral researcher in the Computer Science and Engineering Department at National Cheng Kung University in Taiwan. He has over 15 years of teaching and industry experience. He has conducted many workshops and has been a guest speaker at various universities. He has also published many research papers in scientific and technical journals.

Amir H. Gandomi, PhD, is a professor of data science in the Department of Engineering and Information Technology, University of Technology Sydney. Before joining UTS, he was an assistant professor at the School of Business, Stevens Institute of Technology, NJ, and a distinguished research fellow at BEACON Center, Michigan State University. He has published over 150 journal papers and four books and collectively has been cited more than 14,000 times. He has been named as one of the world’s most influential scientific minds and a Highly Cited Researcher (top 1%) for three consecutive years, from 2017 to 2019. He has also served as associate editor, editor, and guest editor in several prestigious journals and has delivered several keynote talks. He is also part of a NASA technology cluster on Big Data, Artificial Intelligence, and Machine Learning.

Preface xv

1 Implementation Tools for Generating Statistical Consequence Using Data Visualization Techniques 1
Dr. Ajay B. Gadicha, Dr. Vijay B. Gadicha, Prof. Sneha Bohra and Dr. Niranjanamurthy M.

1.1 Introduction 2

1.2 Literature Review 4

1.3 Tools in Data Visualization 4

1.4 Methodology 14

1.4.1 Plotting the Data 14

1.4.2 Plotting the Model on Data 15

1.4.3 Quantifying Linear Relationships 16

1.4.4 Covariance vs. Correlation 17

1.5 Conclusion 18

References 18

2 Decision Making and Predictive Analysis for Real Time Data 21
Umesh Pratap Singh

2.1 Introduction 22

2.2 Data Analytics 23

2.2.1 Descriptive Analytics 23

2.2.2 Diagnostic Analytics 23

2.2.3 Predictive Analytics 23

2.2.4 Prescriptive Analytics 24

2.3 Predictive Modeling 24

2.4 Categories of Predictive Models 24

2.5 Process of Predictive Modeling 25

2.5.1 Requirement Gathering 26

2.5.2 Data Gathering 26

2.5.3 Data Analysis and Massaging 26

2.5.4 Machine Learning Statistics 26

2.5.5 Predictive Modeling 26

2.5.6 Prediction and Decision Making 27

2.6 Predictive Analytics Opportunities 27

2.6.1 Detecting Fraud 27

2.6.2 Reduction of Risk 27

2.6.3 Marketing Campaign Optimization 28

2.6.4 Operation Improvement 28

2.6.5 Clinical Decision Support System 28

2.7 Classification of Predictive Analytics Models 28

2.7.1 Predictive Models 28

2.7.2 Descriptive Models 29

2.7.3 Decision Models 29

2.8 Predictive Analytics Techniques 29

2.8.1 Predictive Analytics Software 29

2.8.2 The Importance of Good Data 30

2.8.3 Predictive Analytics vs. Business Intelligence 30

2.8.4 Pricing Information 30

2.9 Data Analysis Tools 30

2.9.1 Excel 30

2.9.2 Tableau 31

2.9.3 Power BI 31

2.9.4 Fine Report 31

2.9.5 R & Python 31

2.10 Advantages & Disadvantages of Predictive Modeling 31

2.10.1 Advantages 31

2.10.2 Disadvantages 32

2.10.2.1 Data Labeling 32

2.10.2.2 Obtaining Massive Training Datasets 32

2.10.2.3 The Explainability Problem 32

2.10.2.4 Generalizability of Learning 33

2.10.2.5 Bias in Algorithms and Data 33

2.11 Predictive Analytics Biggest Impact 33

2.11.1 Predicting Demand 33

2.11.2 Transformation Using Technology and Process 34

2.11.3 Improved Pricing 34

2.11.4 Predictive Maintenance 35

2.12 Application of Predictive Analytics 35

2.12.1 Financial and Banking Services 35

2.12.2 Retail 35

2.12.3 Health and Insurance 36

2.12.4 Oil and Gas Utilities 36

2.12.5 Public Sector 36

2.13 Future Scope of Predictive Modeling 36

2.13.1 Technological Advancements 37

2.13.2 Changes in Work 37

2.13.3 Risk Mitigation 37

2.14 Conclusion 37

References 38

3 Optimizing Water Quality with Data Analytics and Machine Learning 39
Bin Liang, Zhidong Li, Hongda Tian, Shuming Liang, Yang Wang and Fang Chen

3.1 Introduction 39

3.2 Related Work 41

3.3 Data Sources and Collection 42

3.4 Water Demand Forecasting 43

3.4.1 Network Flow and Zone Demand Estimation 43

3.4.2 Demand Forecasting 44

3.4.2.1 Feature Importance 45

3.4.2.2 Forecast Horizon 46

3.4.3 Performance Characterization 46

3.5 Re-Chlorination Optimization 49

3.5.1 Data 51

3.5.2 Water Age Estimation 52

3.5.2.1 Travel Time Estimation 53

3.5.2.2 Residential Time Estimation 54

3.5.3 Ammonia Prediction 54

3.5.4 Optimization Model Definition 57

3.5.5 Improvements in Customer Water Quality 59

3.5.6 Plant Dosing Optimization 62

3.6 Conclusion 63

Acknowledgements 63

References 63

4 Lip Reading Framework using Deep Learning and Machine Learning 67
Hemant Kumar Gianey, Parth Khandelwal, Prakhar Goel, Rishav Maheshwari, Bhannu Galhotra and Divyanshu Pratap Singh

4.1 Introduction 68

4.1.1 Overview 68

4.1.2 Motivation 68

4.1.3 Lip Reading System Outcomes and Deliverables 69

4.2 The Emergence and Definition of the Lip-Reading System 70

4.2.1 Background of Domain 70

4.2.2 Identified Problems 78

4.2.3 Tools and Technologies Used 78

4.2.4 Implementation Aspects 78

4.2.4.1 Data Preparation 79

4.3 Design and Components of Lip-Reading System 82

4.4 Lip Reading System Architecture 82

4.5 Testing 84

4.6 Problems Encountered During Implementation 84

4.6.1 Assumptions and Constraints 85

4.7 Conclusion 85

4.8 Future Work 85

References 86

5 New Perspective to Management, Economic Growth and Debt Nexus Analysis: Evidence from Indian Economy 89
Edmund Ntom Udemba, Festus Victor Bekun, Dervis Kirikkaleli and Esra Sipahi Döngül

5.1 Introduction 90

5.2 Literature Review 92

5.2.1 External Debt and Economic Growth 92

5.2.2 Trade Openness, FDI, and Economic Growth 94

5.2.3 FDI and Economic Growth 94

5.3 Data 95

5.3.1 Analytical Framework and Data Description 96

5.3.2 Theoretical Background and Specifications 96

5.3.2.1 Model Specification 98

5.4 Methodology and Findings 99

5.4.1 Unit Root Testing 99

5.4.2 Cointegration 99

5.4.3 Vector Error Correction Model 103

5.4.4 Long-Run Relationship Estimation 105

5.4.5 Causality Test 107

5.5 Conclusion and Policy Implications 108

Declarations 109

Availability of Data and Materials 109

Competing Interests 110

Funding 110

Authors’ Contributions 110

Acknowledgments 110

References 110

6 Data-Driven Delay Analysis with Applications to Railway Networks 115
Boyu Li, Ting Guo, Yang Wang and Fang Chen

6.1 Introduction 116

6.2 Related Works 118

6.3 Background Knowledge 119

6.3.1 Background and Problem Formulation 120

6.3.1.1 Train Delay 120

6.3.1.2 Delay Propagation 121

6.3.2 Preliminaries 122

6.3.2.1 Bayesian Inference 123

6.3.2.2 Markov Property 123

6.4 Delay Propagation Model 123

6.4.1 Conditional Bayesian Delay Propagation 123

6.4.1.1 Delay Self-Propagation 124

6.4.1.2 Incremental Run-Time Delay 125

6.4.1.3 Incremental Dwell Time Delay 125

6.4.1.4 Accumulative Departure Delay 126

6.4.2 Cross-Line Propagation, Backward Propagation and Train Connection Propagation 127

6.5 Primary Delay Tracing Back 130

6.5.1 Delay Candidates Selection 130

6.5.2 Relation Construction 131

6.5.2.1 Preceding and Following Trains 131

6.5.2.2 Preceding and Connecting Trains 131

6.6 Evaluation on Dwell Time Improvement Strategy 132

6.7 Experiments 135

6.7.1 Experiment Setting 135

6.7.2 Temporal Prediction of Delay Propagation 137

6.7.3 Spatial Prediction of Delay Propagation 138

6.7.4 Case Study of Primary Delay Tracing Down 139

6.7.5 Evaluation of Dwell Time Improvement Strategy 140

6.8 Conclusion 142

References 142

7 Proposing a Framework to Analyze Breast Cancer in Mammogram Images Using Global Thresholding, Gray Level Co-Occurrence Matrix, and Convolutional Neural Network (CNN) 145
Ms. Tanishka Dixit and Ms. Namrata Singh

7.1 Introduction & Purpose of Study 146

7.1.1 Segmentation 146

7.1.1.1 Types of Segmentation 147

7.1.2 Compression 150

7.2 Literature Review & Motivation 153

7.3 Proposed Work 161

7.3.1 Algorithm 161

7.3.2 Explanation 162

7.3.3 Flowchart 162

7.4 Observation Tables and Figures 163

7.5 Conclusion 176

7.6 Future Work 176

References 176

8 IoT Technologies for Smart Healthcare 181
Rehab A. Rayan, Imran Zafar and Christos Tsagkaris

8.1 Introduction 182

8.2 Literature Review 183

8.2.1 IoT-Based Smart Health 183

8.2.2 Advantages of Applying IoT in Health 186

8.3 Findings 187

8.3.1 Significant Features and Applications of IoT in Health 187

8.3.1.1 Simultaneous Monitoring and Reporting 189

8.3.1.2 End-to-End Connectivity and Affordability 190

8.3.1.3 Data Analysis 190

8.3.1.4 Tracking, Alerts, and Remote Medical Care 190

8.3.1.5 Research 191

8.3.1.6 Patient-Generated Health Data (PGHD) 191

8.3.1.7 Management of Chronic Diseases and Preventative Care 191

8.3.1.8 Home-Based and Short-Term Care 192

8.4 Case Study: CyberMed as an IoT-Based Smart Health Model 192

8.5 Discussions 193

8.5.1 Limitations of Adopting IoT in Health 193

8.5.1.1 Data Security and Privacy 193

8.5.1.2 Connectivity 194

8.5.1.3 Compatibility and Data Integration 195

8.5.1.4 Implementation Cost 195

8.5.1.5 Complexity and Risk of Errors 195

8.6 Future Insights 196

8.7 Conclusions 197

References 197

9 Enhancement of Scalability of SVM Classifiers for Big Data 203
Vijaykumar Bhajantri, Shashikumar G. Totad and Geeta R. Bharamagoudar

9.1 Introduction 204

9.2 Support Vector Machine 205

9.2.1 Challenges 208

9.3 Parallel and Distributed Mechanism 209

9.3.1 Shared-Memory Parallelism 209

9.4 Distributed Big Data Architecture 210

9.4.1 Hadoop MapReduce 210

9.4.2 Spark 210

9.4.3 Akka 211

9.5 Distributed High Performance Computing 212

9.5.1 GASNet 212

9.5.2 Charm++ 213

9.6 GPU Based Parallelism 214

9.6.1 CUDA 215

9.6.2 OpenCL 215

9.7 Parallel and Distributed SVM Algorithms 217

9.7.1 LS-SVM 218

9.7.2 Cascade SVM 219

9.7.3 DC-SVM 220

9.7.4 Parallel Distributed Multiclass SVM Algorithms 222

9.8 Conclusion and Future Research Directions 222

References 225

10 Electrical Network-Related Incident Prediction Based on Weather Factors 233
Hongda Tian, Jessie Nghiem and Fang Chen

10.1 Introduction 233

10.2 Related Work 235

10.3 Methodology 235

10.3.1 Binary Classification of Incident and Normality 235

10.3.2 Incident Categorization Using Natural Language Processing 236

10.3.3 Classification of Multiple Types of Incidents 236

10.4 Experiments 237

10.4.1 Data Sets 237

10.4.2 Evaluation Metrics 239

10.4.3 Binary Classification 239

10.4.4 Incident Categorization 241

10.4.5 Multi-Class Classification 242

10.5 Conclusion and Future Work 244

Acknowledgements 244

References 245

11 Green IoT: Environment-Friendly Approach to IoT 247
Abhishek Goel and Siddharth Gautam

11.1 Introduction 247

11.2 G-IoT (Green Internet of Things) 249

11.3 Layered Architecture of G-IoT 251

11.3.1 Data Center/Cloud 252

11.3.2 Data Analytics and Control Applications 252

11.3.3 Data Aggregation and Storage 253

11.3.4 Edge Computing 253

11.3.5 Communication and Processing Unit 254

11.4 Techniques for Implementation of G-IoT 257

11.5 Power Saving Methods Based on Components 266

11.6 Applications of G-IoT 266

11.7 Challenges and Future Scope 269

11.8 Case Study 269

11.9 Conclusion 270

References 271

12 Big-Data Analytics: A New Paradigm Shift in Micro Finance Industry 275
Vinay Pal Singh, Rohit Bansal and Ram Singh

12.1 Introduction 276

12.2 Reality of Area and Transcendent Difficulties 276

12.2.1 Probable Overlending 278

12.2.2 Information Imbalance 278

12.2.3 Retreating Not-for-Profit Sector 278

12.2.4 Neighbourhood Pressure 279

12.3 Data Analytics in Microfinance 280

12.3.1 Types of Data Analytics Used in Microfinance 280

12.3.2 Use of Big Data in Microfinance Industry 281

12.3.3 Risk and Data Based Credit Decisions 282

12.3.4 Product Development and Selection 283

12.3.5 Product or Service Positioning 283

12.3.6 M-Commerce and E-Payments 283

12.3.7 Making Reliable Credit Decisions 284

12.3.8 Big Data-Driven Model Promises Psychometric Evaluations 284

12.3.9 Product Build-Up, Service Positioning, and Offering 284

12.4 Opportunities and Risks in Using Data Analytics 284

12.5 Risk in Utilizing Big Data 287

12.6 Conclusion 290

References 290

13 Big Data Storage and Analysis 293
Namrata Dhanda

13.1 Introduction 293

13.1.1 6 V’s of Big Data 294

13.1.2 Types of Data 295

13.1.3 Issues in Handling Big Data 297

13.2 Hadoop as a Solution to Challenges of Big Data 297

13.2.1 The Hadoop Ecosystem 298

13.2.2 Rack Awareness Policy in HDFS 307

13.3 In-Memory Storage and NoSQL 308

13.3.1 Key-Value Data Stores 309

13.3.2 Document Stores 309

13.3.3 Wide Column Stores 310

13.3.4 Graph Stores 310

13.3.5 Multi-Modal Databases 310

13.4 Advantages of NoSQL Database 310

13.5 Conclusion 311

References 311

14 A Framework for Analysing Social Media and Digital Data by Applying Machine Learning Techniques for Pandemic Management 313
Mutyala Sridevi

14.1 Introduction 314

14.2 Literature Review 314

14.3 Understanding Pandemic Analogous to a Disaster 317

14.4 Application of Machine Learning Techniques at Various Phases of Pandemic Management 318

14.4.1 Mitigation Phase 319

14.4.2 Preparedness Phase 320

14.4.3 Response Phase 321

14.4.4 Recovery Phase 321

14.5 Generalized Framework to Apply Machine Learning Techniques for Pandemic Management 322

14.6 Conclusion 324

References 324

About the Editors 327

Index 329

Publication date	3 November 2022
Language	English
Weight	794 g
Subject area	Computer Science ► Databases ► Data Warehouse / Data Mining
	Computer Science ► Office Applications ► Outlook
	Engineering ► Civil Engineering
ISBN-13	9781119791881
Condition	New