Text Mining in Practice with R - Ted Kwartler

Ted Kwartler (Author)

Book | Hardcover
320 pages
2017
John Wiley & Sons Inc (Publisher)
978-1-119-28201-3 (ISBN)
CHF 99.95 incl. VAT
A reliable, cost-effective approach to extracting priceless business information from all sources of text

Excavating actionable business insights from data is a complex undertaking, and that complexity is magnified by an order of magnitude when the focus is on documents and other text information. This book takes a practical, hands-on approach to teaching you a reliable, cost-effective way to mine the vast, untold riches buried within all forms of text using R.

Author Ted Kwartler clearly describes all of the tools needed to perform text mining and shows you how to use them to identify practical business applications to get your creative text mining efforts started right away. With the help of numerous real-world examples and case studies from industries ranging from healthcare to entertainment to telecommunications, he demonstrates how to execute an array of text mining processes and functions, including sentiment scoring, topic modelling, predictive modelling, extracting clickbait from headlines, and more. You’ll learn how to:



Identify actionable social media posts to improve customer service  
Use text mining in HR to identify candidate perceptions of an organisation, match job descriptions with resumes, and more 
Extract priceless information from virtually all digital and print sources, including the news media, social media sites, PDFs, and even JPEG and GIF image files
Make text mining an integral component of marketing in order to identify brand evangelists, impact customer propensity modelling, and much more

Most companies’ data mining efforts focus almost exclusively on numerical and categorical data, while text remains a largely untapped resource. Especially in a global marketplace where being first to identify and respond to customer needs and expectations imparts an unbeatable competitive advantage, text represents a source of immense potential value. Until now, however, there has been no reliable, cost-effective technology for extracting analytical insights from the huge and ever-growing volume of text available online and in other digital sources, as well as from paper documents.

TED KWARTLER is a data science instructor at DataCamp.com. He has worked in analytical and executive roles at DataRobot, Liberty Mutual Insurance and Amazon.com.

Foreword

Chapter 1: What is Text Mining?

1.1 What is it?

1.1.1 What is text mining in practice?

1.1.2 Where does text mining fit?

1.2 Why we care about text mining?

1.2.1 What are the consequences of ignoring text?

1.2.2 What are the benefits of text mining?

1.2.3 Setting Expectations: When text mining should (and should not) be used.

1.3 A basic workflow. How the process works.

1.4 What tools do I need to get started with this?

1.5 A Simple Example

1.6 A Real World Use Case

1.7 Summary

Chapter 2: Basics of text mining

2.1 What is Text Mining in a practical sense?

2.2 Types of Text Mining: Bag of Words.

2.2.1 Types of Text Mining: Syntactic Parsing.

2.3 The text mining process in context

2.4 String Manipulation: Number of Characters & Substitutions

2.4.1 String Manipulations: Paste, Character Splits & Extractions

2.5 Keyword Scanning

2.6 String Packages stringr & stringi

2.7 Preprocessing Steps for Bag of Words Text Mining

2.8 Spell Check

2.9 Frequent Terms & Associations

2.9 Delta Assist Wrap Up

2.10 Summary

Chapter 3: Common Text Mining Visualizations

3.1 A tale of two (or three) cultures

3.2 Simple Exploration: Term Frequency, Associations & Word Networks

3.2.1 Term Frequency

3.2.2 Word Associations

3.2.3 Word Networks

3.3 Simple Word Clusters: Hierarchical Dendrograms

3.4 Word Clouds: Overused but Effective

3.4.1 One Corpus Word Clouds

3.4.2 Comparing and Contrasting Corpora in Word Clouds

3.4.3 Polarized Tag Plot

3.5 Summary

Chapter 4: Sentiment Scoring

4.1 What is Sentiment Analysis?

4.2 Sentiment Scoring: Parlor Trick or Insightful?

4.3 Polarity: Simple Sentiment Scoring

4.3.1 Subjectivity Lexicons

4.3.2 Qdap’s Scoring for positive and negative word choice

4.3.3 Revisiting Word Clouds…Sentiment Word Clouds

4.4 Emoticons :) Dealing with these perplexing clues

4.4.1 Symbol-Based Emoticons Native to R

4.4.2 Punctuation Based Emoticons

4.4.3 Emoji

4.5 R’s Archived Sentiment Scoring Library

4.5 Sentiment the tidytext way

4.6 Airbnb.com Boston Wrap Up

4.7 Summary

Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling

5.1 What is clustering?

5.1.1 K Means Clustering

5.1.2 Spherical K Means Clustering

5.1.3 K Medoid Clustering

5.1.4 Evaluating the cluster approaches

5.2 Calculating & Exploring String Distance

5.2.1 What is string distance?

5.2.2 Fuzzy Matching-amatch, ain

5.2.3 Similarity Distances- stringdist, stringdistmatrix

5.3 LDA Topic Modeling Explained

5.3.2 Topic Modeling Case Study

5.3.2 LDA & LDAvis

5.4 Text to Vectors using “text2vec”

5.4.1 text2vec

5.5 Summary

Chapter 6: Document Classification: Finding Clickbait from Headlines

6.1 What is document classification?

6.2 Clickbait Case Study

6.2.2 Session & Data Set Up

6.2.3 GLMNET Training

6.2.4 GLMNET Test Predictions

6.2.5 Test Set Evaluation

6.2.6 Finding the most impactful words

6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations

6.3 Summary

Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes.

7.1 Classification Vs Prediction

7.2 Case Study I: Will this patient come back to the hospital?

7.2.2 Patient Readmission in the Text Mining Workflow

7.2.3 Session & Data Set Up

7.2.4 Patient Modeling

7.2.5 More Model KPI: AUC, Recall, Precision & F1

7.2.5.1 Additional Evaluation Metrics

7.2.6 Apply the model to new patients

7.2.7 Patient Readmission Conclusion

7.3 Case Study II: Predicting Box Office Success

7.3.2 Opening Weekend Revenue in the Text Mining Workflow

7.3.3 Session & Data Set Up

7.3.4 Opening Weekend Modeling

7.3.5 Model Evaluation

7.3.6 Apply the Model to new Movie Reviews

7.3.7 Movie Revenue Conclusion

7.4 Summary

Chapter 8: The OpenNLP Project

8.1 What is the OpenNLP project?

8.2 R’s OpenNLP Package

8.3 Named Entities in Hillary Clinton’s Email

8.3.1 R Session Set-up

8.3.2 Minor Text Cleaning

8.3.3 Using OpenNLP on a single email

8.3.4 Using OpenNLP on multiple documents

8.3.5 Revisiting the Text Mining Workflow

8.4 Analyzing the Named Entities

8.4.1 Worldwide Map of Hillary Clinton’s Location Mentions

8.4.2 Mapping Only European Locations

8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity?

8.4.4 Stock Charts for Entities

8.4.5 Reach an Insight or Conclusion about Hillary Clinton’s Emails

8.5 Summary

Chapter 9: Text Sources

9.1 Sourcing Text

9.2 Web Sources

9.2.1 Web Scraping a Single Page with rvest

9.2.2 Web Scraping Multiple Pages with rvest

9.2.3 Application Program Interfaces (APIs)

9.2.4 Newspaper Articles from The Guardian Newspaper

9.2.5 Tweets using the “twitteR” Package

9.2.6 Calling an API without a dedicated R package

9.2.7 Using jsonlite to access the New York Times

9.2.8 Using RCurl & XML to Parse Google News Feeds

9.2.9 The tm library Web-Mining Plugin

9.3 Getting Text from File Sources

9.3.1 Individual CSV, TXT and Microsoft Office Files

9.3.2 Reading multiple files quickly

9.3.2 Extracting Text from PDFs

9.3.3 Optical Character Recognition: Extracting Text from Images

9.4 Summary

Place of publication New York
Language English
Dimensions 150 x 229 mm
Weight 590 g
Subject area Computer Science > Databases > Data Warehouse / Data Mining
Mathematics / Computer Science > Mathematics > Probability / Combinatorics
ISBN-10 1-119-28201-2 / 1119282012
ISBN-13 978-1-119-28201-3 / 9781119282013
Condition New