Nicht aus der Schweiz? Besuchen Sie lehmanns.de

Data Wrangling with SQL (eBook)

A hands-on guide to manipulating, wrangling, and engineering data using SQL
eBook Download: EPUB
2023
350 Seiten
Packt Publishing (Verlag)
978-1-83763-430-9 (ISBN)

Lese- und Medienproben

Data Wrangling with SQL - Raghav Kandarpa, Shivangi Saxena
Systemvoraussetzungen
27,59 inkl. MwSt
(CHF 26,95)
Der eBook-Verkauf erfolgt durch die Lehmanns Media GmbH (Berlin) zum Preis in Euro inkl. MwSt.
  • Download sofort lieferbar
  • Zahlungsarten anzeigen

The amount of data generated continues to grow rapidly, making it increasingly important for businesses to be able to wrangle this data and understand it quickly and efficiently. Although data wrangling can be challenging, with the right tools and techniques you can efficiently handle enormous amounts of unstructured data.
The book starts by introducing you to the basics of SQL, focusing on the core principles and techniques of data wrangling. You'll then explore advanced SQL concepts like aggregate functions, window functions, CTEs, and subqueries that are very popular in the business world. The next set of chapters will walk you through different functions within SQL query that cause delays in data transformation and help you figure out the difference between a good query and bad one. You'll also learn how data wrangling and data science go hand in hand. The book is filled with datasets and practical examples to help you understand the concepts thoroughly, along with best practices to guide you at every stage of data wrangling.
By the end of this book, you'll be equipped with essential techniques and best practices for data wrangling, and will predominantly learn how to use clean and standardized data models to make informed decisions, helping businesses avoid costly mistakes.


Become a data wrangling expert and make well-informed decisions by effectively utilizing and analyzing raw unstructured data in a systematic mannerPurchase of the print or Kindle book includes a free PDF eBookKey FeaturesImplement query optimization during data wrangling using the SQL language with practical use casesMaster data cleaning, handle the date function and null value, and write subqueries and window functionsPractice self-assessment questions for SQL-based interviews and real-world case study roundsBook DescriptionThe amount of data generated continues to grow rapidly, making it increasingly important for businesses to be able to wrangle this data and understand it quickly and efficiently. Although data wrangling can be challenging, with the right tools and techniques you can efficiently handle enormous amounts of unstructured data. The book starts by introducing you to the basics of SQL, focusing on the core principles and techniques of data wrangling. You'll then explore advanced SQL concepts like aggregate functions, window functions, CTEs, and subqueries that are very popular in the business world. The next set of chapters will walk you through different functions within SQL query that cause delays in data transformation and help you figure out the difference between a good query and bad one. You'll also learn how data wrangling and data science go hand in hand. The book is filled with datasets and practical examples to help you understand the concepts thoroughly, along with best practices to guide you at every stage of data wrangling. By the end of this book, you'll be equipped with essential techniques and best practices for data wrangling, and will predominantly learn how to use clean and standardized data models to make informed decisions, helping businesses avoid costly mistakes.What you will learnBuild time series models using data wranglingDiscover data wrangling best practices as well as tips and tricksFind out how to use subqueries, window functions, CTEs, and aggregate functionsHandle missing data, data types, date formats, and redundant dataBuild clean and efficient data models using data wrangling techniquesRemove outliers and calculate standard deviation to gauge the skewness of dataWho this book is forThis book is for data analysts looking for effective hands-on methods to manage and analyze large volumes of data using SQL. The book will also benefit data scientists, product managers, and basically any role wherein you are expected to gather data insights and develop business strategies using SQL as a language. If you are new to or have basic knowledge of SQL and databases and an understanding of data cleaning practices, this book will give you further insights into how you can apply SQL concepts to build clean, standardized data models for accurate analysis.]]>

Table of Contents


Preface


Part 1: Data Wrangling Introduction


1


Database Introduction


Getting started


Establishing the foundation


Efficient data organization


Data integrity and consistency


Technical requirements


Decoding database structures – relational and non-relational


What is a database?


Types of databases


Tables and relationships


The SQL CREATE DATABASE statement


The SQL CREATE TABLE statement


SQL DROP TABLE versus TRUNCATE TABLE


SQL ALTER TABLE


SQL constraints


SQL keys


Database relationships


Comparing database normalization and denormalization


Normalization


Types of normalization


Denormalization


When to apply denormalization


Disadvantages of denormalization


Summary


Practical exercises


Practical exercise 1


Practical exercise 2


Practical exercise 3


Practical exercise 4


2


Data Profiling and Preparation before Data Wrangling


What is data wrangling?


Data wrangling steps


The importance of data wrangling


Benefits of data wrangling


Data wrangling use cases


Business use cases


Data capture


How does data get captured?


Data-capturing techniques


Web scraping


Structured versus unstructured data


Paid-for versus free data-wrangling tools


Data profiling


Data profiling types


Data profiling techniques


Practical exercise


Step 1 – Discovery


Step 2 – Structuring


Step 3 – Cleaning


Step 4 – Enriching


Step 5 – Validating


Step 6 – Publishing


Summary


Part 2: Data Wrangling Techniques Using SQL


3


Data Wrangling on String Data Types


SQL data types


Numeric data types


Date and time data types


String data type


SQL string functions


RIGHT()


LEFT()


LEN()


TRIM()


RTRIM()


LTRIM()


RPAD()


LPAD()


REPLACE()


REVERSE()


SUBSTRING()


CAST()


CONCATENATE()


CONCATENATE_WS()


UPPER function


LOWER function


INITCAP function


INSTR function


Summary


Practical exercises


Practical exercise 1


Practical exercise 2


Practical exercise 3


Practical exercise 4


4


Data Wrangling on the DATE Data Type


SQL DATE data type functions


EXTRACT


DATEDIFF()


TIMEDIFF()


DATE_ADD()


DATE_SUB()


DATE_FORMAT()


STR_TO_DATE()


Extracting the current date and time


Summary


5


Handling NULL Values


The impact of missing data and NULL values on data analysis


Understanding the importance of data validation and cleaning before analyzing data


Identifying NULL/missing values


NULL values versus zero values


Using the IS NULL and IS NOT NULL operators to filter and select data with NULL values


IS NULL() and IS NOT NULL() – scenario


Using the COALESCE and IFNULL functions to replace NULL values with a default value


IFNULL()


COALESCE()


IS NULL versus = NULL


Summary


6


Pivoting Data Using SQL


SQL Transpose – rows to columns


Use case scenario


SQL Cross Tab – columns to rows


Use case scenario


Unpivoting data in SQL


Analytical workflow – from SQL to business intelligence – transforming data into actionable insights


Summary


Part 3: SQL Subqueries, Aggregate And Window Functions


7


Subqueries and CTEs


Introduction to subqueries


Simple subqueries


Correlated subqueries


Using subqueries in SELECT statements


Using subqueries in FROM statements


Using subqueries in WHERE statements


Nested subqueries


Correlated subqueries


Using subqueries in INSERT, UPDATE, and DELETE statements


Managing and maintaining subqueries


Common table expressions


Performance considerations for subqueries and CTEs


Subquery versus CTEs


Summary


8


Aggregate Functions


Overview of aggregate functions in SQL


Using GROUP BY


COUNT()


SUM()


AVG()


MIN() and MAX()


COUNT(DISTINCT)


Case scenario – using all aggregate functions


Summary


9


SQL Window Functions


The importance of SQL window functions


SQL aggregate functions


SQL window functions versus aggregate functions


Window functions versus aggregate functions – an example to illustrate the differences


Window functions


SUM()


COUNT()


AVG()


ROW_NUMBER()


RANK() and DENSE_RANK()


Lead() and Lag()


NTILE()


Summary


Part 4: Optimizing Query Performance


10


Optimizing Query Performance


Introduction to query optimization


Query execution plan


Query optimization techniques


Example


Caching


Normalization


Query monitoring and troubleshooting


Query profiling


Query logging


Database monitoring


Tips and tricks for writing efficient queries


Summary


In the next chapter, we will learn about descriptive statistics using SQL, which will provide us with insights into the distribution, central tendency, and variability of data, which can, in turn, help us identify outliers and anomalies. Common SQL functions and statements used for descriptive statistics include COUNT, AVG, MIN, MAX, and GROUP BY. By using SQL to analyze data, researchers and analysts can efficiently extract and summarize information from large datasets.


Part 5:Data Science And Wrangling


11


Descriptive Statistics with SQL


Calculating descriptive statistics with SQL


Mean


Median


Mode


Standard deviation


Variance


Variability


Summary


In the next chapter, we will learn how SQL can be used for time series analysis.


12


Time Series with SQL


Running totals


Case scenario


Lead and lag for time series analysis


Case scenario


Key KPIs


Percentage change


Case scenario


Key KPIs


Moving averages


Case scenario


Key KPIs


Rank for time series analysis


Case scenario


Key KPIs


CTE for time series analysis


Importance of using CTEs while performing time series analysis


Forecasting with linear regression


Case scenario


Key KPIs


Summary


In the next chapter, we will learn different methods to find outliers in the data easily. Outlier detection is an important aspect of data analysis as it helps determine if the data is correct, looks at the skewness of the data, and removes any unexpected values.


13


Outlier Detection


Measures of central tendency and dispersion


Case scenario


Key KPIs


Methods for detecting...


EPUBEPUB (Ohne DRM)

Digital Rights Management: ohne DRM
Dieses eBook enthält kein DRM oder Kopier­schutz. Eine Weiter­gabe an Dritte ist jedoch rechtlich nicht zulässig, weil Sie beim Kauf nur die Rechte an der persön­lichen Nutzung erwerben.

Dateiformat: EPUB (Electronic Publication)
EPUB ist ein offener Standard für eBooks und eignet sich besonders zur Darstellung von Belle­tristik und Sach­büchern. Der Fließ­text wird dynamisch an die Display- und Schrift­größe ange­passt. Auch für mobile Lese­geräte ist EPUB daher gut geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen dafür die kostenlose Software Adobe Digital Editions.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen dafür eine kostenlose App.
Geräteliste und zusätzliche Hinweise

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Mehr entdecken
aus dem Bereich
der Grundkurs für Ausbildung und Praxis

von Ralf Adams

eBook Download (2023)
Carl Hanser Verlag GmbH & Co. KG
CHF 29,30
Das umfassende Handbuch

von Wolfram Langer

eBook Download (2023)
Rheinwerk Computing (Verlag)
CHF 38,95