Serverless Machine Learning with Amazon Redshift ML - Phil Bates, Sumeet Joshi, Debu Panda, Bhanu Pittampally


Serverless Machine Learning with Amazon Redshift ML (eBook)

Create, train, and deploy machine learning models using familiar SQL commands

Phil Bates, Sumeet Joshi, Debu Panda, Bhanu Pittampally (Authors)

eBook Download: EPUB

2023
290 pages
Packt Publishing (Publisher)
9781804619698 (ISBN)


Amazon Redshift Serverless enables organizations to run petabyte-scale cloud data warehouses quickly and in a cost-effective way, enabling data science professionals to efficiently deploy cloud data warehouses and leverage easy-to-use tools to train models and run predictions. This practical guide will help developers and data professionals working with Amazon Redshift data warehouses to put their SQL knowledge to work for training and deploying machine learning models.
The book begins by helping you to explore the inner workings of Redshift Serverless as well as the foundations of data analytics and types of machine learning. With the help of step-by-step explanations of essential concepts and practical examples, you'll then learn to build your own classification and regression models. As you advance, you'll find out how to deploy various types of machine learning projects using familiar SQL code, before delving into Redshift ML. In the concluding chapters, you'll discover best practices for implementing serverless architecture with Redshift.
By the end of this book, you'll be able to configure and deploy Amazon Redshift Serverless, train and deploy machine learning models using Amazon Redshift ML, and run inference queries at scale.

Supercharge and deploy Amazon Redshift Serverless, train and deploy machine learning models using Amazon Redshift ML, and run inference queries at scale

Key Features
Leverage supervised learning to build binary classification, multi-class classification, and regression models
Learn to use unsupervised learning using the K-means clustering method
Master the art of time series forecasting using Redshift ML
Purchase of the print or Kindle book includes a free PDF eBook

What you will learn
Utilize Redshift Serverless for data ingestion, data analysis, and machine learning
Create supervised and unsupervised models and learn how to supply your own custom parameters
Discover how to use time series forecasting in your data warehouse
Create a SageMaker endpoint and use that to build a Redshift ML model for remote inference
Find out how to operationalize machine learning in your data warehouse
Use model explainability and calculate probabilities with Amazon Redshift ML

Who this book is for
Data scientists and machine learning developers working with Amazon Redshift who want to explore its machine-learning capabilities will find this definitive guide helpful. A basic understanding of machine learning techniques and working knowledge of Amazon Redshift is needed to make the most of this book.

1 Introduction to Amazon Redshift Serverless

“Hey, what’s a data warehouse?” John Doe, CEO and co-founder of Red.wines, a fictional specialty wine e-commerce company, asked Tathya Vishleshak*, the company’s CTO. John, who owned a boutique winery, had teamed up with Tathya for the project. The company’s success surged during the pandemic, driven by social media and the stay-at-home trend. John wanted detailed data analysis to align inventory and customer outreach. However, there was a problem – producing this analysis was slowing down their online transaction processing (OLTP) database.

“A data warehouse is like a big database where we store different data for a long time to find insights and make decisions,” Tathya explained.

John had a concern, “Sounds expensive; we’re already paying for unused warehouse space. Can we afford it?”

Tathya reassured him, “You’re right, but there are cloud data warehouses such as Amazon Redshift Serverless that let you pay as you use.”

Expanding on this, this chapter introduces data warehousing and Amazon Redshift. We’ll cover Amazon Redshift Serverless basics, such as namespaces and workgroups, and guide you in creating a data warehouse. Amazon Redshift can gather data from various sources, mainly Amazon Simple Storage Service (S3).

As we go through this chapter, you’ll learn about a crucial aspect of this, the AWS Identity and Access Management (IAM) role, needed for loading data from S3. This role connects to your Serverless namespace for smooth data transfer. You’ll also learn how to load sample data and run queries using Amazon Redshift query editor. Our goal is to make it simple and actionable, so you’re confident in navigating this journey.
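As a rough illustration of how the IAM role fits into loading data from S3, the following Python sketch assembles a Redshift COPY statement as a string. The table name, bucket, and role ARN are hypothetical placeholders, not values from the book, and the CSV options are just one possible choice of format.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files from S3.

    The IAM role named here must be associated with the Serverless
    namespace and must allow reading from the S3 location.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Hypothetical values, for illustration only.
statement = build_copy_statement(
    table="wine_sales",
    s3_uri="s3://example-bucket/wine_sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The resulting text could be pasted into the query editor once the role is attached to the namespace.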

Tathya Vishleshak

The phrase 'Tathya Vishleshak' can be loosely interpreted to reflect the concept of a data analyst in Sanskrit/Hindi. However, it's important to note that this is not a precise or established translation, but rather an attempt to convey a similar meaning based on the individual meanings of the words 'Tathya' and 'Vishleshak' in Sanskrit.

Additionally, Amazon Redshift is used to analyze structured and unstructured data in data warehouses, operational databases, and data lakes. It’s employed for traditional data warehousing, business intelligence, real-time analytics, and machine learning/predictive analytics. Data analysts and developers use Redshift data with machine learning (ML) models for tasks such as predicting customer behavior. Amazon Redshift ML streamlines this process using familiar SQL commands.
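To give a feel for what "familiar SQL commands" means here, this minimal Python sketch assembles a Redshift ML CREATE MODEL statement and a matching prediction query. Every identifier in it (the model, table, columns, function name, role ARN, and bucket) is a hypothetical placeholder chosen for illustration.

```python
def build_create_model(model, table, features, target, function_name,
                       iam_role_arn, s3_bucket):
    """Return a CREATE MODEL statement that trains on a SELECT query."""
    columns = ", ".join([*features, target])
    return (
        f"CREATE MODEL {model}\n"
        f"FROM (SELECT {columns} FROM {table})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


def build_prediction_query(function_name, features, table):
    """Return a SELECT that calls the model's SQL function for inference."""
    args = ", ".join(features)
    return f"SELECT {function_name}({args}) AS prediction FROM {table};"


# Hypothetical names, for illustration only.
features = ["age", "orders_last_year", "avg_order_value"]
create_sql = build_create_model(
    model="customer_churn_model",
    table="customer_activity",
    features=features,
    target="churned",
    function_name="predict_churn",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    s3_bucket="example-bucket",
)
predict_sql = build_prediction_query("predict_churn", features, "customer_activity")
print(create_sql)
print(predict_sql)
```

Training produces a SQL function (here `predict_churn`) that can then be called in ordinary queries.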

The book delves into ML, explaining supervised and unsupervised training. You’ll learn about problem-solving with binary classification, multi-class classification, and regression using real-world examples. You’ll also discover how to create deep learning models and custom models with XGBoost, as well as use time series forecasting. The book also covers in-database and remote inferences using existing models, applying ML for predictive analytics, and operationalizing machine learning models.

The following topics will be covered in this chapter:

What is Amazon Redshift?
Getting started with Amazon Redshift Serverless
Connecting to your data warehouse

This chapter requires a web browser and access to an AWS account.

What is Amazon Redshift?

Organizations churn out vast troves of customer data along with insights into these customers’ interactions with the business. This data gets funneled into various applications and stashed away in disconnected systems. A conundrum arises when attempting to decipher these data silos – a formidable challenge that hampers the derivation of meaningful insights essential for organizational clarity. Adding to this complexity, security and performance considerations typically prevent business analysts from accessing data within OLTP systems.

The hiccup is that intricate analytical queries weigh down OLTP databases, casting a shadow over their core operations. Here, the solution is the data warehouse, which is a central hub of curated data, used by business analysts and data scientists to make informed decisions by employing the business intelligence and machine learning tools at their disposal. These users make use of Structured Query Language (SQL) to derive insights from this data trove. From operational systems, application logs, and social media streams to the influx of IoT device-generated data, customers channel structured and semi-structured data into organizations’ data warehouses, as depicted in Figure 1.1, showcasing the classic architecture of a conventional data warehouse.

Figure 1.1 – Data warehouse

Here’s where Amazon Redshift Serverless comes in. It’s a key option within Amazon Redshift, a fully managed cloud data warehouse offered by Amazon Web Services (AWS). Amazon Redshift Serverless lets you set up your data warehouse without provisioning or managing infrastructure, and you pay only for the compute and storage you use.

Amazon Redshift Serverless goes beyond convenience, propelling modern data applications that seamlessly connect to the data lake. Enter the data lake – a structure that gathers all data strands under one roof, providing limitless space to store data at any scale, cost-effectively. Alongside other data repositories such as data warehouses, data lakes redefine how organizations handle data. And this is where it all comes together – the following diagram shows how Amazon Redshift Serverless injects SQL-powered queries into the data lake, driving a dynamic data flow:

Figure 1.2 – Data lake and data warehouse

So, let’s get started on creating our first data warehouse in the cloud!

Getting started with Amazon Redshift Serverless

You can create your data warehouse with Amazon Redshift Serverless using the AWS Command-Line Interface (CLI), the API, AWS CloudFormation templates, or the AWS console. We are going to use the AWS console to create a Redshift Serverless data warehouse. Log in to your AWS console and search for Redshift in the top bar, as shown in Figure 1.3:

Figure 1.3 – AWS console page showing services filtered by our search for Redshift

Click on Amazon Redshift, which will take you to the home page for the Amazon Redshift console, as shown in Figure 1.4. To help get you started, Amazon provides free credit for first-time Redshift Serverless customers. So, let’s start creating your trial data warehouse by clicking on Try Amazon Redshift Serverless. If you or your organization has tried Amazon Redshift Serverless before, you will have to pay for the service based on your usage:

Figure 1.4 – Amazon Redshift service page in the AWS console

If you have free credit available, it will be indicated at the top of your screen, as in Figure 1.5:

Figure 1.5 – AWS console showing the Redshift Serverless Get started page

You can either choose the defaults or use the customized settings to create your data warehouse. The customized settings give you more control, allowing you to specify many additional parameters for your compute configuration including the workgroup, data-related settings such as the namespace, and advanced security settings. We will use the customized settings, which will help us customize the namespace settings for our Serverless data warehouse. A namespace combined with a workgroup is what makes a data warehouse with Redshift Serverless, as we will now see in more detail.
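Although this chapter uses the console, the same namespace-plus-workgroup pairing can be sketched against the API. The function below assumes a client that exposes `create_namespace` and `create_workgroup`, as the boto3 `redshift-serverless` client does. The names and the base capacity are hypothetical, and the `RecordingClient` stub is only a stand-in so the sketch runs without an AWS account; its return values do not mirror the real response shape.

```python
class RecordingClient:
    """Stand-in for a real API client; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def create_namespace(self, **kwargs):
        self.calls.append(("create_namespace", kwargs))
        return {"namespace": kwargs}

    def create_workgroup(self, **kwargs):
        self.calls.append(("create_workgroup", kwargs))
        return {"workgroup": kwargs}


def create_serverless_warehouse(client, namespace_name, workgroup_name,
                                base_capacity_rpus):
    """Create a namespace (data side), then a workgroup (compute side)."""
    namespace = client.create_namespace(namespaceName=namespace_name)
    workgroup = client.create_workgroup(
        workgroupName=workgroup_name,
        namespaceName=namespace_name,  # ties the compute to the namespace
        baseCapacity=base_capacity_rpus,
    )
    return namespace, workgroup


# With real credentials, the client would instead come from:
#   import boto3
#   client = boto3.client("redshift-serverless")
client = RecordingClient()
create_serverless_warehouse(client, "example-namespace", "example-workgroup", 8)
for name, params in client.calls:
    print(name, params)
```

The namespace is created first because the workgroup refers to it by name.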

What is a namespace?

Amazon Redshift Serverless provides a separation of storage and compute for a data warehouse. A namespace is a collection of all your data stored in the database, such as your tables, views, database users, and their privileges. You are charged separately for storage based on the size of the data stored in your data warehouse. For compute, you are charged for the capacity used over a given duration, measured in Redshift Processing Unit (RPU) hours and billed on a per-second basis. Storage is billed as Redshift managed storage (RMS) by GB/month. You can view...
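The split between compute and storage charges can be made concrete with a little arithmetic. The sketch below is illustrative only: the two prices are made-up placeholders, not AWS list prices, and the function ignores any minimum charges or regional differences.

```python
def estimate_charges(rpus, busy_seconds, stored_gb,
                     price_per_rpu_hour, price_per_gb_month):
    """Return (compute_charge, storage_charge) for one month.

    Compute: capacity in RPUs multiplied by the time it was in use,
    converted from seconds to hours. Storage: GB stored for the month.
    """
    rpu_hours = rpus * busy_seconds / 3600
    compute_charge = rpu_hours * price_per_rpu_hour
    storage_charge = stored_gb * price_per_gb_month
    return compute_charge, storage_charge


# Placeholder prices and usage, NOT actual AWS pricing.
compute, storage = estimate_charges(
    rpus=8,
    busy_seconds=2 * 3600,      # queries ran for two hours in total
    stored_gb=100,
    price_per_rpu_hour=0.5,
    price_per_gb_month=0.25,
)
print(f"compute: {compute:.2f}, storage: {storage:.2f}")
```

Because compute is metered by time in use, an idle workgroup adds nothing to the compute figure, while the storage figure depends only on the data held in the namespace.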

Publication date (per publisher)	30.8.2023
Foreword	Colin Mahony
Language	English
Subject area	Computer Science ► Databases ► Data Warehouse / Data Mining
Subject area	Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics
ISBN-13	9781804619698


EPUB (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook from misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, so EPUB also works well on mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, as experience shows that it frequently has problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.

Print edition

Book | Softcover

CHF 66,30