Webbots, Spiders, And Screen Scrapers, 2nd Edition - Michael Schrenk

Michael Schrenk (Author)

Book | Softcover
392 pages
2012
No Starch Press, US (Publisher)
978-1-59327-397-2 (ISBN)
CHF 64.30 incl. VAT
  • Title is unfortunately out of print;
    no new edition
There's a wealth of data online, but sorting and gathering it by hand can be tedious and time consuming. Rather than click through page after endless page, why not let bots do the work for you? Webbots, Spiders, and Screen Scrapers will show you how to create simple programs with PHP/CURL to mine, parse, and archive online data to help you make informed decisions.

Michael Schrenk, a highly regarded webbot developer, teaches you how to develop fault-tolerant designs, how best to launch and schedule the work of your bots, and how to create Internet agents that:
  • Send email or SMS notifications to alert you to new information quickly
  • Search different data sources and combine the results on one page, making the data easier to interpret and analyze
  • Automate purchases, auction bids, and other online activities to save time

Sample projects for automating tasks like price monitoring and news aggregation will show you how to put the concepts you learn into practice. This second edition of We

About the Author
About the Technical Reviewer
Acknowledgments
Introduction
    Old-School Client-Server Technology; The Problem with Browsers; What to Expect from This Book; About the Website; About the Code; Requirements; A Disclaimer (This Is Important)

Fundamental Concepts and Techniques
Chapter 1: What's in It for You?
    1.1 Uncovering the Internet's True Potential; 1.2 What's in It for Developers?; 1.3 What's in It for Business Leaders?; 1.4 Final Thoughts
Chapter 2: Ideas for Webbot Projects
    2.1 Inspiration from Browser Limitations; 2.2 A Few Crazy Ideas to Get You Started; 2.3 Final Thoughts
Chapter 3: Downloading Web Pages
    3.1 Think About Files, Not Web Pages; 3.2 Downloading Files with PHP's Built-in Functions; 3.3 Introducing PHP/CURL; 3.4 Installing PHP/CURL; 3.5 LIB_http; 3.6 Final Thoughts
Chapter 4: Basic Parsing Techniques
    4.1 Content Is Mixed with Markup; 4.2 Parsing Poorly Written HTML; 4.3 Standard Parse Routines; 4.4 Using LIB_parse; 4.5 Useful PHP Functions; 4.6 Final Thoughts
Chapter 5: Advanced Parsing with Regular Expressions
    5.1 Pattern Matching, the Key to Regular Expressions; 5.2 PHP Regular Expression Types; 5.3 Learning Patterns Through Examples; 5.4 Regular Expressions of Particular Interest to Webbot Developers; 5.5 When Regular Expressions Are (or Aren't) the Right Parsing Tool; 5.6 Final Thoughts
Chapter 6: Automating Form Submission
    6.1 Reverse Engineering Form Interfaces; 6.2 Form Handlers, Data Fields, Methods, and Event Triggers; 6.3 Unpredictable Forms; 6.4 Analyzing a Form; 6.5 Final Thoughts
Chapter 7: Managing Large Amounts of Data
    7.1 Organizing Data; 7.2 Making Data Smaller; 7.3 Thumbnailing Images; 7.4 Final Thoughts

Projects
Chapter 8: Price-Monitoring Webbots
    8.1 The Target; 8.2 Designing the Parsing Script; 8.3 Initialization and Downloading the Target; 8.4 Further Exploration
Chapter 9: Image-Capturing Webbots
    9.1 Example Image-Capturing Webbot; 9.2 Creating the Image-Capturing Webbot; 9.3 Further Exploration; 9.4 Final Thoughts
Chapter 10: Link-Verification Webbots
    10.1 Creating the Link-Verification Webbot; 10.2 Running the Webbot; 10.3 Further Exploration
Chapter 11: Search-Ranking Webbots
    11.1 Description of a Search Result Page; 11.2 What the Search-Ranking Webbot Does; 11.3 Running the Search-Ranking Webbot; 11.4 How the Search-Ranking Webbot Works; 11.5 The Search-Ranking Webbot Script; 11.6 Final Thoughts; 11.7 Further Exploration
Chapter 12: Aggregation Webbots
    12.1 Choosing Data Sources for Webbots; 12.2 Example Aggregation Webbot; 12.3 Adding Filtering to Your Aggregation Webbot; 12.4 Further Exploration
Chapter 13: FTP Webbots
    13.1 Example FTP Webbot; 13.2 PHP and FTP; 13.3 Further Exploration
Chapter 14: Webbots That Read Email
    14.1 The POP3 Protocol; 14.2 Executing POP3 Commands with a Webbot; 14.3 Further Exploration
Chapter 15: Webbots That Send Email
    15.1 Email, Webbots, and Spam; 15.2 Sending Mail with SMTP and PHP; 15.3 Writing a Webbot That Sends Email Notifications; 15.4 Further Exploration
Chapter 16: Converting a Website into a Function
    16.1 Writing a Function Interface; 16.2 Final Thoughts

Advanced Technical Considerations
Chapter 17: Spiders
    17.1 How Spiders Work; 17.2 Example Spider; 17.3 LIB_simple_spider; 17.4 Experimenting with the Spider; 17.5 Adding the Payload; 17.6 Further Exploration
Chapter 18: Procurement Webbots and Snipers
    18.1 Procurement Webbot Theory; 18.2 Sniper Theory; 18.3 Testing Your Own Webbots and Snipers; 18.4 Further Exploration; 18.5 Final Thoughts
Chapter 19: Webbots and Cryptography
    19.1 Designing Webbots That Use Encryption; 19.2 A Quick Overview of Web Encryption; 19.3 Final Thoughts
Chapter 20: Authentication
    20.1 What Is Authentication?; 20.2 Example Scripts and Practice Pages; 20.3 Basic Authentication; 20.4 Session Authentication; 20.5 Final Thoughts
Chapter 21: Advanced Cookie Management
    21.1 How Cookies Work; 21.2 PHP/CURL and Cookies; 21.3 How Cookies Challenge Webbot Design; 21.4 Further Exploration
Chapter 22: Scheduling Webbots and Spiders
    22.1 Preparing Your Webbots to Run as Scheduled Tasks; 22.2 The Windows XP Task Scheduler; 22.3 The Windows 7 Task Scheduler; 22.4 Non-calendar-based Triggers; 22.5 Final Thoughts
Chapter 23: Scraping Difficult Websites with Browser Macros
    23.1 Barriers to Effective Web Scraping; 23.2 Overcoming Webscraping Barriers with Browser Macros; 23.3 Final Thoughts
Chapter 24: Hacking iMacros
    24.1 Hacking iMacros for Added Functionality; 24.2 Further Exploration
Chapter 25: Deployment and Scaling
    25.1 One-to-Many Environment; 25.2 One-to-One Environment; 25.3 Many-to-Many Environment; 25.4 Many-to-One Environment; 25.5 Scaling and Denial-of-Service Attacks; 25.6 Creating Multiple Instances of a Webbot; 25.7 Managing a Botnet; 25.8 Further Exploration

Larger Considerations
Chapter 26: Designing Stealthy Webbots and Spiders
    26.1 Why Design a Stealthy Webbot?; 26.2 Stealth Means Simulating Human Patterns; 26.3 Final Thoughts
Chapter 27: Proxies
    27.1 What Is a Proxy?; 27.2 Proxies in the Virtual World; 27.3 Why Webbot Developers Use Proxies; 27.4 Using a Proxy Server; 27.5 Types of Proxy Servers; 27.6 Final Thoughts
Chapter 28: Writing Fault-Tolerant Webbots
    28.1 Types of Webbot Fault Tolerance; 28.2 Error Handlers; 28.3 Further Exploration
Chapter 29: Designing Webbot-Friendly Websites
    29.1 Optimizing Web Pages for Search Engine Spiders; 29.2 Web Design Techniques That Hinder Search Engine Spiders; 29.3 Designing Data-Only Interfaces; 29.4 Final Thoughts
Chapter 30: Killing Spiders
    30.1 Asking Nicely; 30.2 Building Speed Bumps; 30.3 Setting Traps; 30.4 Final Thoughts
Chapter 31: Keeping Webbots out of Trouble
    31.1 It's All About Respect; 31.2 Copyright; 31.3 Trespass to Chattels; 31.4 Internet Law; 31.5 Final Thoughts

PHP/CURL Reference
    Creating a Minimal PHP/CURL Session; Initiating PHP/CURL Sessions; Setting PHP/CURL Options; Executing the PHP/CURL Command; Closing PHP/CURL Sessions
Status Codes
    HTTP Codes; NNTP Codes
SMS Gateways
    Sending Text Messages; Reading Text Messages; A Sampling of Text Message Email Addresses

Place of publication: San Francisco
Language: English
Dimensions: 178 x 234 mm
Subject area: Mathematics / Computer Science; Computer Science; Web / Internet
ISBN-10: 1-59327-397-5 / 1593273975
ISBN-13: 978-1-59327-397-2 / 9781593273972
Condition: New