Resource Management on Distributed Systems (eBook)
520 pages
Wiley-IEEE Press (publisher)
978-1-119-91295-8 (ISBN)
Comprehensive guide to the principles, algorithms, and techniques underlying resource management for clouds, big data, and sensor-based systems
Resource Management on Distributed Systems provides helpful guidance by describing algorithms and techniques for managing resources on parallel and distributed systems, including grids, clouds, and parallel processing-based platforms for big data analytics.
The book focuses on four general principles of resource management and their impact on system performance, energy usage, and cost, and includes end-of-chapter exercises. The text includes chapters on sensors, autoscaling on clouds, complex event processing for streaming data, and data filtering techniques for big data systems.
The book also covers results of applying the discussed techniques on simulated as well as real systems (including clouds and big data processing platforms), and techniques for handling errors associated with user-predicted task execution times.
Written by a highly qualified academic with significant research experience in the field, Resource Management on Distributed Systems includes information on sample topics such as:
- Attributes of parallel/distributed applications that have an intimate relationship with system behavior and performance, plus their related performance metrics.
- Handling a lack of a priori knowledge of local operating systems on individual nodes in a large system.
- Detection and management of complex events (that correspond to the occurrence of multiple raw events) on a platform for streaming analytics.
- Techniques for reducing data latency for multiple operator-based queries in an environment processing large textual documents.
With comprehensive coverage of core topics in the field, Resource Management on Distributed Systems brings the subject together in a single publication and is an essential read for professionals, researchers, and students working with distributed systems.
Shikharesh Majumdar is Chancellor's Professor and Director of the Real Time and Distributed Systems Research Centre at Carleton University, Canada. Professor Majumdar earned his PhD in Computational Science from the University of Saskatchewan in 1988 and is a Senior Member of the IEEE and a Fellow of the Institution of Engineering and Technology (IET). His research interests include parallel and distributed systems, operating systems, and middleware, among other areas. He has published many papers in journals and refereed conference proceedings, has contributed to several books, and is the recipient of multiple awards.
Preface
The availability of processors, memory, and high-speed interconnection networks at a reasonable cost is continuously increasing the use of parallel and distributed systems. Appropriate management of resources is crucial, however, for effectively harnessing the power of the underlying resource pool. While resource management techniques for conventional single-processor systems are covered in many standard operating systems books, resource management on parallel and distributed systems receives comparatively little coverage in existing books. This book aims to address this gap by describing algorithms and techniques for managing resources on parallel and distributed systems, including grids, clouds, smart facilities, and parallel processing-based platforms for big data analytics.
The book focuses on resource management on distributed systems, which is of interest to students, researchers, and industrial practitioners who work with systems comprising multiple resources that may range from clusters to clouds to smart facilities. In addition to a discussion of existing knowledge in the area, the book includes material based on research results that have made significant contributions to the state of the art. The book describes key concepts as well as summarizes research results. The key features of the book include the following:
- An introduction to five general principles of resource management that will be adapted by the techniques described in the following chapters.
- A description of the resource management techniques and algorithms with a discussion of their impact on system performance, energy usage, and cost. When applicable, attention is paid to the trade-offs among these three characteristics. Special techniques for achieving system scalability are discussed.
- Results of applying the techniques on simulated as well as real systems (including clouds and big data processing platforms).
- In addition to data processing on cloud environments, topics that include the following are discussed:
  ○ Big data platforms and frameworks, e.g. MapReduce.
  ○ Sensor-based smart systems that are becoming an important component of a smart society.
- Research results that include the description of the different experiments and pointers to research papers that provide supporting documentation for the research described. Insights into system behavior and performance resulting from these research results are provided.
The organization of the book is influenced by many years of teaching graduate courses in distributed systems in general and resource management in particular, as well as my research performed in the area. Examples and exercises are included in appropriate sections of the book.
Book Contents
This book concerns resource management in distributed systems, which can be classified into two categories: computing-intensive systems and data-intensive systems. The book has two parts, each focusing on one of these types of distributed system. Part 1 (Chapters 2–5) focuses on issues underlying distributed computing-intensive systems, whereas Part 2 (Chapters 6–10) is concerned with distributed data-intensive systems.
Basics: Introductory material, including definitions of the basic units of distributed computations and their characteristics, is discussed in Chapters 1 and 2.
Chapter 1 describes the evolution of distributed systems from nodes communicating with one another using basic communication mechanisms such as remote procedure calls (RPCs) to clusters, grids, and clouds, including edge computing systems and smart facilities. The basic units of computation used by various applications, such as threads and processes, are introduced. Three types of resource management operations performed on these computation units, including allocation and scheduling, are described. General principles underlying resource management, which form the backbone of a number of resource management techniques described in the later chapters, are introduced.
Chapter 2 focuses on the characterization of parallelism in applications. Both graph-based characteristics and single-point characteristics are introduced. These application attributes have an intimate relationship with the execution behavior and performance of the system running the application and are thus important in the context of resource management. Performance metrics that can be used for analyzing the performance of resource management algorithms are described in this chapter. Energy consumed by computation- and data-intensive applications is often of critical concern. The later part of the chapter introduces energy-related metrics and characteristics and discusses their interrelationship. The trade-off between energy consumption and performance is discussed. Energy-aware resource management is the subject of discussion in a later chapter.
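For orientation, two widely used single-point metrics for parallel application performance are speedup and efficiency, and the energy-delay product is one common way to express the energy/performance trade-off. The definitions below are standard textbook forms given purely for illustration; they are not taken from the book itself.

```latex
% Illustrative (assumed) definitions:
% T(1): execution time on one processor, T(n): time on n processors,
% E_consumed: total energy used by the run.
S(n) = \frac{T(1)}{T(n)}, \qquad
E(n) = \frac{S(n)}{n}, \qquad
\mathrm{EDP} = E_{\mathrm{consumed}} \cdot T(n)
```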
Allocation and Scheduling: Resource allocation and scheduling are two important resource management operations that are discussed in Chapter 3. The resource allocator maps the application work units to processing resources and determines which process/thread will execute on which processor. Well-known results in the area are discussed. The chapter discusses both optimal algorithms and approaches for devising heuristic resource allocation techniques. The discussion of resource allocation is followed by a discussion of process/task scheduling. The task scheduler decides the order in which computation units allocated to a processor will execute. Scheduling of tasks for an application with service level agreements that include deadlines for completion is considered. Scheduling algorithms for both single-processor systems and systems with multiple processors are described. Analyses of the performance of these algorithms are presented. The chapter ends with a discussion of scheduling techniques for client-server systems.
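As a concrete illustration of deadline-aware scheduling on a single processor, the sketch below applies non-preemptive earliest-deadline-first (EDF) ordering. EDF is a classic policy used here only as an example; it is not necessarily the algorithm analyzed in the chapter, and the task attributes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    arrival: float     # time the task becomes ready
    exec_time: float   # (estimated) execution time
    deadline: float    # absolute completion deadline

def edf_schedule(tasks):
    """Non-preemptive earliest-deadline-first on one processor.

    Returns (schedule, missed): schedule is a list of
    (task name, start, finish) tuples; missed lists tasks that
    finish after their deadline.
    """
    pending = sorted(tasks, key=lambda t: t.arrival)
    schedule, missed = [], []
    clock = 0.0
    while pending:
        ready = [t for t in pending if t.arrival <= clock]
        if not ready:                       # idle until the next arrival
            clock = pending[0].arrival
            continue
        nxt = min(ready, key=lambda t: t.deadline)
        pending.remove(nxt)
        start = clock
        clock += nxt.exec_time
        schedule.append((nxt.name, start, clock))
        if clock > nxt.deadline:
            missed.append(nxt.name)
    return schedule, missed

# Hypothetical workload:
tasks = [Task("A", 0, 3, 9), Task("B", 1, 2, 4), Task("C", 2, 1, 6)]
print(edf_schedule(tasks))
```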
Handling uncertainties in allocation and scheduling: Real systems are often characterized by uncertainties associated with system and workload parameters. Chapter 4 focuses on systems with such uncertainties and describes how to build robustness into resource management techniques to mitigate their adverse impact on system performance. Two types of uncertainties are discussed. The first results from the errors associated with user-predicted task execution times that are often specified as part of a service level agreement (SLA). Techniques for handling both underestimation and overestimation of task execution times are discussed. Analyses of the performance of the techniques are presented. A cloud data center often comprises hundreds or thousands of computing resources that are susceptible to changes with time, and the exact local scheduling policy used by each resource may not always be available a priori to the resource manager for the data center. Techniques for resource management to handle this second type of uncertainty, associated with the knowledge of the local scheduling policies used at the various resources, are described. Performance analyses of these techniques, referred to as techniques for resource management “in the dark”, are presented.
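One simple way to add robustness against underestimated execution times, shown here only as a hedged sketch and not as the book's specific technique, is to pad the user-supplied estimate by a factor derived from previously observed estimation errors before using it in scheduling decisions:

```python
def padded_estimate(user_estimate, observed_ratios, safety_quantile=0.9):
    """Pad a user-supplied execution-time estimate.

    observed_ratios: historical actual/estimated execution-time ratios
    for earlier tasks (assumed to be collected by the resource manager).
    The estimate is scaled by the chosen quantile of these ratios, so
    roughly that fraction of tasks should complete within the padded time.
    """
    if not observed_ratios:
        return user_estimate            # no history: trust the user value
    ratios = sorted(observed_ratios)
    idx = min(int(safety_quantile * len(ratios)), len(ratios) - 1)
    factor = max(1.0, ratios[idx])      # never shrink the estimate
    return user_estimate * factor

# Example: past tasks ran between 0.8x and 1.5x of their estimates.
history = [0.8, 1.0, 1.1, 1.2, 1.5]
print(padded_estimate(10.0, history))   # -> 15.0 (selected ratio 1.5)
```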
Handling changes in workload in resource allocation: Determining the number of resources to provision for a given workload is a complex undertaking. The problem is further complicated by dynamic changes in workload, which are typical on a distributed system used by multiple users. Chapter 5 addresses the problem of dynamically controlling the number of CPU resources by using resource auto-scaling. System capacity is increased or decreased automatically by the auto-scaling algorithm so that client satisfaction is achieved while keeping the cost of resource usage under control. The chapter introduces both vertical auto-scaling, in which the CPU power and memory capacity of a given resource are controlled in accordance with the system load, and horizontal auto-scaling, which increases or decreases the number of computing resources in accordance with the change in system workload. Most of the chapter concerns horizontal auto-scaling techniques. Three types of horizontal auto-scaling are discussed: (i) reactive auto-scaling, for which a change to the number of resources is made after a change has occurred in the system workload; (ii) proactive auto-scaling, for which the future system workload is predicted and the change in the number of resources needed to handle this future workload is computed proactively, before the change in workload intensity occurs; and (iii) hybrid auto-scaling, which is a combination of reactive and proactive auto-scaling. A performance analysis for each technique is reported.
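The sketch below illustrates the reactive variant of horizontal auto-scaling using simple utilization thresholds; the thresholds, limits, and monitoring interface are hypothetical, and the techniques analyzed in the chapter may be considerably more sophisticated.

```python
def reactive_autoscale(current_vms, avg_cpu_util,
                       scale_out_at=0.75, scale_in_at=0.30,
                       min_vms=1, max_vms=64):
    """Return the VM count for the next control interval.

    Scale out when average CPU utilization across the current VMs
    exceeds scale_out_at; scale in when it drops below scale_in_at.
    A production controller would also apply a cooldown period to
    avoid oscillation; proactive scaling would predict future load.
    """
    if avg_cpu_util > scale_out_at and current_vms < max_vms:
        return current_vms + 1
    if avg_cpu_util < scale_in_at and current_vms > min_vms:
        return current_vms - 1
    return current_vms

# Example control loop over hypothetical utilization samples:
vms = 4
for util in [0.80, 0.85, 0.60, 0.25, 0.20]:
    vms = reactive_autoscale(vms, util)
    print(f"util={util:.2f} -> {vms} VMs")
```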
Data-Intensive Systems and MapReduce: Part 2 of the book, which concerns data-intensive distributed systems, starts with Chapter 6, which focuses on platforms running MapReduce jobs that are used in big data analytics as well as for other data-intensive applications. The chapter describes techniques for allocation and scheduling for MapReduce jobs associated with SLAs that include job completion deadlines. Two resource management algorithms, a budget-based algorithm and a constraint programming-based algorithm, are discussed. The SLA associated with a job includes user estimates of task execution times that are often subject to error. Two techniques for handling such errors and increasing the robustness of resource management are described. The chapter includes a thorough discussion of the performance of the various...
Publication date (per publisher) | September 6, 2024 |
---|---|
Language | English |
Subject area | Mathematics / Computer Science ► Computer Science ► Networks |
Mathematics / Computer Science ► Computer Science ► Theory / Studies | |
Keywords | Big Data Analytics • big data platforms • computing resource management • distributed system cost • distributed system management • distributed system performance • operator based queries • parallel system cost • parallel system performance • reduce data latency • resource management on clouds |
ISBN-10 | 1-119-91295-4 / 1119912954 |
ISBN-13 | 978-1-119-91295-8 / 9781119912958 |
Size: 9.9 MB
Copy protection: Adobe DRM
File format: EPUB (Electronic Publication)