Webbots, Spiders, and Screen Scrapers, 2nd Edition
No Starch Press, US (publisher)
978-1-59327-397-2 (ISBN)
- This title is unfortunately out of print; there is no new edition.
About the Author; About the Technical Reviewer; Acknowledgments
Introduction; Old-School Client-Server Technology; The Problem with Browsers; What to Expect from This Book; About the Website; About the Code; Requirements; A Disclaimer (This Is Important)
Fundamental Concepts and Techniques
Chapter 1: What's in It for You?; 1.1 Uncovering the Internet's True Potential; 1.2 What's in It for Developers?; 1.3 What's in It for Business Leaders?; 1.4 Final Thoughts
Chapter 2: Ideas for Webbot Projects; 2.1 Inspiration from Browser Limitations; 2.2 A Few Crazy Ideas to Get You Started; 2.3 Final Thoughts
Chapter 3: Downloading Web Pages; 3.1 Think About Files, Not Web Pages; 3.2 Downloading Files with PHP's Built-in Functions; 3.3 Introducing PHP/CURL; 3.4 Installing PHP/CURL; 3.5 LIB_http; 3.6 Final Thoughts
Chapter 4: Basic Parsing Techniques; 4.1 Content Is Mixed with Markup; 4.2 Parsing Poorly Written HTML; 4.3 Standard Parse Routines; 4.4 Using LIB_parse; 4.5 Useful PHP Functions; 4.6 Final Thoughts
Chapter 5: Advanced Parsing with Regular Expressions; 5.1 Pattern Matching, the Key to Regular Expressions; 5.2 PHP Regular Expression Types; 5.3 Learning Patterns Through Examples; 5.4 Regular Expressions of Particular Interest to Webbot Developers; 5.5 When Regular Expressions Are (or Aren't) the Right Parsing Tool; 5.6 Final Thoughts
Chapter 6: Automating Form Submission; 6.1 Reverse Engineering Form Interfaces; 6.2 Form Handlers, Data Fields, Methods, and Event Triggers; 6.3 Unpredictable Forms; 6.4 Analyzing a Form; 6.5 Final Thoughts
Chapter 7: Managing Large Amounts of Data; 7.1 Organizing Data; 7.2 Making Data Smaller; 7.3 Thumbnailing Images; 7.4 Final Thoughts
Projects
Chapter 8: Price-Monitoring Webbots; 8.1 The Target; 8.2 Designing the Parsing Script; 8.3 Initialization and Downloading the Target; 8.4 Further Exploration
Chapter 9: Image-Capturing Webbots; 9.1 Example Image-Capturing Webbot; 9.2 Creating the Image-Capturing Webbot; 9.3 Further Exploration; 9.4 Final Thoughts
Chapter 10: Link-Verification Webbots; 10.1 Creating the Link-Verification Webbot; 10.2 Running the Webbot; 10.3 Further Exploration
Chapter 11: Search-Ranking Webbots; 11.1 Description of a Search Result Page; 11.2 What the Search-Ranking Webbot Does; 11.3 Running the Search-Ranking Webbot; 11.4 How the Search-Ranking Webbot Works; 11.5 The Search-Ranking Webbot Script; 11.6 Final Thoughts; 11.7 Further Exploration
Chapter 12: Aggregation Webbots; 12.1 Choosing Data Sources for Webbots; 12.2 Example Aggregation Webbot; 12.3 Adding Filtering to Your Aggregation Webbot; 12.4 Further Exploration
Chapter 13: FTP Webbots; 13.1 Example FTP Webbot; 13.2 PHP and FTP; 13.3 Further Exploration
Chapter 14: Webbots That Read Email; 14.1 The POP3 Protocol; 14.2 Executing POP3 Commands with a Webbot; 14.3 Further Exploration
Chapter 15: Webbots That Send Email; 15.1 Email, Webbots, and Spam; 15.2 Sending Mail with SMTP and PHP; 15.3 Writing a Webbot That Sends Email Notifications; 15.4 Further Exploration
Chapter 16: Converting a Website into a Function; 16.1 Writing a Function Interface; 16.2 Final Thoughts
Advanced Technical Considerations
Chapter 17: Spiders; 17.1 How Spiders Work; 17.2 Example Spider; 17.3 LIB_simple_spider; 17.4 Experimenting with the Spider; 17.5 Adding the Payload; 17.6 Further Exploration
Chapter 18: Procurement Webbots and Snipers; 18.1 Procurement Webbot Theory; 18.2 Sniper Theory; 18.3 Testing Your Own Webbots and Snipers; 18.4 Further Exploration; 18.5 Final Thoughts
Chapter 19: Webbots and Cryptography; 19.1 Designing Webbots That Use Encryption; 19.2 A Quick Overview of Web Encryption; 19.3 Final Thoughts
Chapter 20: Authentication; 20.1 What Is Authentication?; 20.2 Example Scripts and Practice Pages; 20.3 Basic Authentication; 20.4 Session Authentication; 20.5 Final Thoughts
Chapter 21: Advanced Cookie Management; 21.1 How Cookies Work; 21.2 PHP/CURL and Cookies; 21.3 How Cookies Challenge Webbot Design; 21.4 Further Exploration
Chapter 22: Scheduling Webbots and Spiders; 22.1 Preparing Your Webbots to Run as Scheduled Tasks; 22.2 The Windows XP Task Scheduler; 22.3 The Windows 7 Task Scheduler; 22.4 Non-calendar-based Triggers; 22.5 Final Thoughts
Chapter 23: Scraping Difficult Websites with Browser Macros; 23.1 Barriers to Effective Web Scraping; 23.2 Overcoming Webscraping Barriers with Browser Macros; 23.3 Final Thoughts
Chapter 24: Hacking iMacros; 24.1 Hacking iMacros for Added Functionality; 24.2 Further Exploration
Chapter 25: Deployment and Scaling; 25.1 One-to-Many Environment; 25.2 One-to-One Environment; 25.3 Many-to-Many Environment; 25.4 Many-to-One Environment; 25.5 Scaling and Denial-of-Service Attacks; 25.6 Creating Multiple Instances of a Webbot; 25.7 Managing a Botnet; 25.8 Further Exploration
Larger Considerations
Chapter 26: Designing Stealthy Webbots and Spiders; 26.1 Why Design a Stealthy Webbot?; 26.2 Stealth Means Simulating Human Patterns; 26.3 Final Thoughts
Chapter 27: Proxies; 27.1 What Is a Proxy?; 27.2 Proxies in the Virtual World; 27.3 Why Webbot Developers Use Proxies; 27.4 Using a Proxy Server; 27.5 Types of Proxy Servers; 27.6 Final Thoughts
Chapter 28: Writing Fault-Tolerant Webbots; 28.1 Types of Webbot Fault Tolerance; 28.2 Error Handlers; 28.3 Further Exploration
Chapter 29: Designing Webbot-Friendly Websites; 29.1 Optimizing Web Pages for Search Engine Spiders; 29.2 Web Design Techniques That Hinder Search Engine Spiders; 29.3 Designing Data-Only Interfaces; 29.4 Final Thoughts
Chapter 30: Killing Spiders; 30.1 Asking Nicely; 30.2 Building Speed Bumps; 30.3 Setting Traps; 30.4 Final Thoughts
Chapter 31: Keeping Webbots out of Trouble; 31.1 It's All About Respect; 31.2 Copyright; 31.3 Trespass to Chattels; 31.4 Internet Law; 31.5 Final Thoughts
PHP/CURL Reference; Creating a Minimal PHP/CURL Session; Initiating PHP/CURL Sessions; Setting PHP/CURL Options; Executing the PHP/CURL Command; Closing PHP/CURL Sessions
Status Codes; HTTP Codes; NNTP Codes
SMS Gateways; Sending Text Messages; Reading Text Messages; A Sampling of Text Message Email Addresses
Place of publication | San Francisco |
---|---|
Language | English |
Dimensions | 178 x 234 mm |
Subject area | Mathematics / Computer Science ► Computer Science ► Web / Internet |
ISBN-10 | 1-59327-397-5 / 1593273975 |
ISBN-13 | 978-1-59327-397-2 / 9781593273972 |
Condition | New |