Joe Celko served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards. Mr. Celko is author a series of books on SQL and RDBMS for Elsevier/MKP. He is an independent consultant based in Austin, Texas. He has written over 1200 columns in the computer trade and academic press, mostly dealing with data and databases.
SQL for Smarties was hailed as the first book devoted explicitly to the advanced techniques needed to transform an experienced SQL programmer into an expert. Now, 20 years later and in its fifth edition, this classic reference still reigns supreme as the only book written by a SQL master that teaches programmers and practitioners to become SQL masters themselves! These are not just tips and techniques; also offered are the best solutions to old and new challenges. Joe Celko conveys the way you need to think in order to get the most out of SQL programming efforts for both correctness and performance. New to the fifth edition, Joe features new examples to reflect the ANSI/ISO Standards so anyone can use it. He also updates data element names to meet new ISO-11179 rules with the same experience-based teaching style that made the previous editions the classics they are today. You will learn new ways to write common queries, such as finding coverings, partitions, runs in data, auctions and inventory, relational divisions and so forth. SQL for Smarties explains some of the principles of SQL programming as well as the code. A new chapter discusses design flaws in DDL, such as attribute splitting, non-normal forum redundancies and tibbling. There is a look at the traditional acid versus base transaction models, now popular in NoSQL products. You'll learn about computed columns and the DEFERRABLE options in constraints. An overview of the bi-temporal model is new to this edition and there is a longer discussion about descriptive statistic aggregate functions. The book finishes with an overview of SQL/PSM that is applicable to proprietary 4GL vendor extensions. - New to the 5th Edition:- Overview of the bitemporal model- Extended coverage of descriptive statistic aggregate functions- New chapter covers flaws in DDL- Examination of traditional acid versus base transaction models- Reorganized to help you navigate related topics with ease- Expert advice from a noted SQL authority and award-winning columnist Joe Celko, who served on the ANSI SQL standards committee for over a decade- Teaches scores of advanced techniques that can be used with any product, in any SQL environment, whether it is SQL 92 or SQL 2011- Offers tips for working around deficiencies and gives insight into real-world challenges
Front Cover 1
Joe Celko's sql for Smarties: Advanced sql Programming 4
Copyright 5
Dedication 6
Contents 8
Introduction to the Fifth Edition 18
What is New in this Edition 18
Corrections and Additions 19
Part 1: Data Declaration Features 20
Chapter 1: Databases Versus File Systems 22
1.1. The Schema Statement 24
1.2. Tables as Entities 29
1.3. Tables as Relationships 29
1.3.1. E-R Diagrams 29
1.4. Rows Versus Records 32
1.5. Columns Versus Fields 33
Chapter 2: Transactions and Concurrency Control 36
2.1. Sessions 37
2.2. Transactions and ACID 37
2.3. Concurrency Control 40
2.3.1. The Transaction Phenomena 40
2.4. The Isolation Levels 41
2.5. Pessimistic Concurrency Control 44
2.6. Snapshot Isolation Optimistic Concurrency 44
2.7. Logical Concurrency Control 47
2.8. Cap Theorem 48
2.9. Base 49
2.10. Server-Side Consistency 51
2.11. Error Handling 52
2.12. Deadlock and Livelocks 52
Chapter 3: Tables 54
3.1. Create Table Statements 55
3.1.1. Base Tables 55
3.1.2. [ Global | Local ] Temporary Tables 56
3.2. Column Definitions 57
3.2.1. DEFAULT Clause 57
3.2.2. NOT NULL Constraint 58
3.2.3. CHECK () Constraint 59
3.2.4. UNIQUE and PRIMARY KEY Constraints 60
3.2.5. REFERENCES Clause 61
3.2.6. Referential Actions 61
3.2.6.1. Nested UNIQUE Constraints 63
3.2.6.2. Overlapping Keys 67
3.3. Computed Columns 70
3.4. [ NOT ] DEFERRABLE Constraints 71
3.5. CREATE DOMAIN and CREATE SEQUENCE 73
3.5.1. CREATE DOMAIN 73
3.5.2. CREATE SEQUENCE 73
3.5.2.1. Using the SEQUENCE 76
3.6. Character Set Related Constructs 80
3.6.1. CREATE CHARACTER SET 81
3.6.2. CREATE COLLATION 81
3.6.3. CREATE TRANSLATION 82
Chapter 4: Keys, Locators, and Generated Values 84
4.1. Key Types 85
4.1.1. Natural Keys 86
4.1.2. Artificial Keys 86
4.1.3. Exposed Physical Locators 87
4.2. Practical Hints for Denormalization 88
4.2.1. Row Sorting 89
Chapter 5: Normalization 94
5.1. Functional and Multivalued Dependencies 97
5.2. First Normal Form (1NF) 97
5.2.1. Note on Repeating Groups 99
5.2.1.1. Repeating Columns 100
5.2.1.2. Parsing a List in a String 101
5.3. Second Normal Form (2NF) 102
5.4. Third Normal Form (3NF) 104
5.5. Elementary Key Normal Form (EKNF) 105
5.6. Boyce-Codd Normal Form (BCNF) 106
5.7. Fourth Normal Form (4NF) 108
5.8. Fifth Normal Form (5NF) 109
5.9. Domain-Key Normal Form (DKNF) 111
5.10. Practical Hints for Normalization 120
5.11. Non-Normal Form Redundancy 122
5.11.1. Aggregation Level Redundancy 122
5.11.2. Entire Table Redundancy 123
5.11.3. Access Path Redundancy 124
5.11.4. Attribute Splitting 125
Chapter 6: VIEWs, Derived, and Other Virtual Tables 126
6.1. VIEWs in Queries 126
6.2. Updatable and Read-Only VIEWs 127
6.3. Types of VIEWs 129
6.3.1. Single-Table Projection and Restriction 130
6.3.2. Calculated Columns 130
6.3.3. Translated Columns 130
6.3.4. Grouped VIEWs 131
6.3.5. UNION-ed VIEWs 132
6.3.6. JOINs in VIEWs 134
6.3.7. Nested VIEWs 134
6.4. How VIEWs are Handled in the Database Engine 135
6.4.1. View Column List 136
6.4.2. VIEW Materialization 136
6.4.3. In-Line Text Expansion 137
6.4.4. Pointer Structures 139
6.4.5. Indexing and Views 140
6.5. WITH CHECK OPTION Clause 140
6.5.1. WITH CHECK OPTION as CHECK() clause 144
6.6. Dropping VIEWs 146
6.7. Materialized Query Tables 147
6.7.1. CREATE TABLE 147
6.7.2. REFRESH TABLE Statement 148
Chapter 7: Auxiliary Tables 150
7.1. Series Table 151
7.1.1. Enumerating a List 152
7.1.2. Mapping a Series into a Cycle 154
7.1.3. Building Buckets of Fixed Size 155
7.1.4. Replacing an Iterative Loop 156
7.2. Look-up Auxiliary Tables 158
7.2.1. Simple Translation Auxiliary Tables 160
7.2.2. Multiple Translation Auxiliary Tables 160
7.2.3. Multiple Parameter Auxiliary Tables 161
7.2.4. Range Auxiliary Tables 164
7.2.5. Set Auxiliary Tables 165
7.2.6. Hierarchical Auxiliary Tables 166
7.2.7. One True Look-up Table 167
7.3. Advance Auxiliary Function Tables 170
7.3.1. Inverse Functions with Auxiliary Tables 171
7.3.2. Interpolation with Auxiliary Function Tables 181
7.4. Global Constants Tables 183
7.4.1. Prime Numbers 183
7.4.2. Fibonacci Numbers 187
7.4.3. Random Order Values 188
Word Length 190
7.5. A Note on Converting Procedural Code to Tables 191
Chapter 8: Other Schema Objects 198
8.1. CREATE SCHEMA Statement 200
8.2. Schema Tables 203
8.3. Temporary Tables 203
8.4. CREATE ASSERTION Statement 204
8.5. CREATE DOMAIN Statement 204
8.6. CREATE COLLATION Statement 206
8.7. CREATE TRANSLATION Statement 206
8.8. CREATE PROCEDURE Statement 207
8.9. TRIGGERs 207
8.10. The TRIGGER Model 208
8.10.1. DECLARE CURSOR Statement 210
8.10.2. Allocate Storage in the Host Program 210
8.10.3. DECLARE CURSOR Statement 210
8.10.4. OPEN Cursor Statement 212
8.10.5. FETCH FROM Cursor Statement 212
8.10.6. CLOSE Cursor Statement 214
8.10.7. The DEALLOCATE CURSOR Statement 214
8.10.8. Remarks About Cursors 215
Chapter 9: DDL Flaws to Avoid 218
9.1. Tibbling and Related Errors 218
9.1.1. ISO -11179 Standards 219
9.1.2. Data Type and Constraint Errors 221
9.2. Attribute Splitting 222
9.2.1. Schema Level Attribute Splitting 222
9.2.2. Column Level Attribute Splitting 222
9.2.3. Table Level Attribute Splitting 222
9.3. Overloading Design Flaws 222
9.3.1. Schema Level Overloading 223
9.3.2. Table Level Overloading 223
9.3.3. Column Level Overloading 226
9.4. Non-Normal Form Redundancy 227
9.4.1. Conflicting Relationships 228
Part 2: Data Types 230
Chapter 10: Numeric Data in SQL 232
10.1. Exact Numeric Data Types 233
10.1.1. Bit, Byte, and Boolean Data Types 233
10.2. Approximate Numeric Data Types 239
10.2.1. Float Versus Real Versus Double Precision 239
10.2.2. IEEE Floating Point Extensions 240
10.3. Numeric Type Conversions 240
10.3.1. Rounding and Truncating 241
10.3.2. CAST() Function 242
10.4. Four Function Arithmetic 242
10.4.1. Arithmetic and NULLs 244
10.5. Converting Values to and from NULL 245
10.5.1. NULLIF() Function 245
10.5.2. COALESCE() Function 246
10.6. Mathematical Functions 248
10.6.1. Number Theory Operators 248
10.6.2. Exponential Functions 250
10.6.3. Scaling Functions 251
10.6.4. Other Mathematical Functions 251
10.6.5. Converting Numbers to Words 251
10.7. IP Addresses 253
10.7.1. CHAR(39) Storage 253
10.7.2. Binary Storage 253
10.7.3. Separate SMALLINTs 253
Chapter 11: Character Data Types in SQL 256
11.1. Problems with SQL Strings 257
11.1.1. Problems of String Equality 257
11.1.2. Problems of String Ordering 258
11.1.3. Problems of String Grouping 259
11.2. Standard String Functions 259
11.3. Common Vendor Extensions 261
11.3.1. Phonetic Matching 262
11.3.1.1. Soundex Functions 262
11.3.1.2. The Original Soundex 263
11.3.1.3. Metaphone 265
11.3.1.4. NYSIIS Algorithm 269
11.4. Cutter Tables 270
Chapter 12: Temporal Data Types in SQL 272
12.1. Notes on Calendar Standards 272
12.2. The Nature of Temporal Data Models 276
12.3. SQL Temporal Data Types 277
12.3.1. Tips for Handling Dates, Timestamps, and Times 278
12.3.2. Date Format Standards 278
12.3.3. Time Format Standards 279
12.3.4. Basic Time 282
12.3.5. Time Zones 282
12.4. INTERVAL Data Types 284
12.5. Queries with Date Arithmetic 286
12.6. Use of NULL for “Eternity” 287
12.7. The OVERLAPS () Predicate 288
12.8. State-Transition Constraints 290
12.9. Calendar Tables 296
12.9.1. Report Period Tables 298
Chapter 13: Multiple Column Data Elements 300
13.1. Vector and Coordinate Data Elements 300
13.1.1. Longitude and Latitude 301
13.1.2. Storing an IPv4 Address in SQL 303
13.1.3. A Single VARCHAR(15) Column 303
13.1.4. One INTEGER Column 304
13.1.5. Four SMALLINT Columns 305
13.1.6. Storing an IPv6 Address in SQL 306
13.1.7. A Single CHAR(32) Column 306
13.1.8. Quantity and Unit Vectors 307
13.2. Hierarchical Data Elements 308
13.2.1. Social Security Numbers 308
13.2.2. Rational Numbers 311
13.2.3. Nondecimal and Mixed Units 312
13.2.4. Interrelated Columns 313
Chapter 14: NULL s—Missing Data in SQL 314
14.1. Empty and Missing Tables 316
14.2. Missing Values in Columns 316
14.3. Context and Missing Values 318
14.4. Comparing NULLs 319
14.5. NULLs and Logic 320
14.5.1. NULLS in Subquery Predicates 323
14.5.2. Logical Value Predicate 325
14.6. Math and NULLs 325
14.7. Functions for NULLs 326
14.8. NULLs and Host Languages 326
14.9. Design Advice for NULLs 328
14.9.1. Avoiding NULLs from the Host Programs 330
14.10. A Note on Multiple NULL Values 331
Chapter 15: Table Operations 334
15.1. DELETE FROM Statement 334
15.1.1. The DELETE FROM Clause 335
15.1.2. The WHERE Clause 335
15.1.3. Deleting Based on Data in a Second Table 337
15.1.4. Deleting Within the Same Table 338
15.1.5. Redundant Duplicates in a Table 339
15.2. INSERT INTO Statement 341
15.2.1. INSERT INTO Clause 341
15.2.2. The Nature of Inserts 342
15.2.3. Bulk Load and Unload Utilities 342
15.3. The UPDATE Statement 343
15.3.1. The UPDATE Clause 343
15.3.2. The SET Clause 344
15.3.3. The WHERE Clause 346
15.3.4. Updating with a Second Table Before MERGE 347
15.3.5. Using the CASE Expression in UPDATEs 350
15.4. A Note on Flaws in a Common Vendor Extension 352
15.5. MERGE Statement 353
Chapter 16: Set Operations 358
16.1. UNION and UNION ALL 359
16.1.1. Order of Execution 361
16.1.2. Mixed UNION and UNION ALL Operators 362
16.1.3. UNION of Columns from the Same Table 363
16.2. INTERSECT and EXCEPT 363
16.2.1. INTERSECT and EXCEPT Without NULLs and Duplicates 366
16.2.2. INTERSECT and EXCEPT with NULLs and Duplicates 367
16.3. A Note on ALL and SELECT DISTINCT 368
16.4. Equality and Proper Subsets 369
Part 3: Row and Column Level Features 372
Chapter 17: Comparison or Theta Operators 374
17.1. Converting Data Types 375
17.1.1. Date Display Formats 377
17.1.2. Other Display Formats 377
17.2. Row Comparisons in SQL 377
17.3. IS [NOT] DISTINCT FROM Operator 380
17.4. Monadic Operators 381
17.4.1. IS NULL 381
17.4.2. IS [NOT] {TRUE | FALSE | UNKNOWN} 382
17.4.3. IS [NOT] NORMALIZED 384
Chapter 18: Subquery Predicates 386
18.1. The UNIQUE Predicate 386
18.2. The [NOT] IN() Predicate 388
18.2.1. Optimizing the IN() Predicate 390
18.2.2. Replacing ORs with the IN() Predicate 393
18.2.3. NULLs and the IN() Predicate 393
18.2.4. IN() Predicate and Referential Constraints 396
18.2.5. IN() Predicate and Scalar Queries 397
18.3. [NOT] EXISTS() Predicate 399
18.3.1. EXISTS and NULLs 400
18.3.2. EXISTS and INNER JOINs 403
18.3.3. NOT EXISTS and OUTER JOINs 403
18.3.4. EXISTS() and Referential Constraints 404
18.3.5. EXISTS() and Quantifiers 405
18.3.6. EXISTS and Three Valued Logic 406
18.4. < theta >
18.5. < theta >
18.5.1. The ALL Predicate and Extrema Functions 411
Chapter 19: BETWEEN and OVERLAPS Predicates 414
19.1. The BETWEEN Predicate 414
19.1.1. Results with NULL Values 416
19.1.2. Results with Empty Sets 416
19.1.3. Programming Tips 416
19.2. OVERLAPS Predicate 417
19.2.1. Time Periods and OVERLAPS Predicate 417
Chapter 20: CASE Expression Family 430
20.1. CASE Expression 430
20.1.1. The COALESCE() and NULLIF() Functions 434
20.1.2. CASE Expressions with GROUP BY 434
20.1.3. CASE, CHECK() Clauses and Logical Implication 436
20.2. Subquery Expressions and Constants 439
Chapter 21: LIKE and SIMILAR TO Predicates 442
21.1. Tricks with Patterns 443
21.2. Results with NULL Values and Empty Strings 445
21.3. LIKE is Not Equality 445
21.4. Extending the LIKE Predicate with a Join 445
21.5. CASE Expressions and LIKE Predicates 446
21.6. SIMILAR TO Predicates 447
21.7. Tricks with Strings 448
21.7.1. String Character Content 448
21.7.2. Searching Versus Declaring a String 449
21.7.3. Creating an Index on a String 450
Chapter 22: Basic SELECT Statement 452
22.1. CTEs 452
22.2. FROM Clause 454
22.3. WHERE Clause 454
22.4. GROUP BY Clause 455
22.5. HAVING Clause 455
22.6. SELECT Clause 455
22.7. ORDER BY Clause 456
22.8. Nested Query Expressions and Orthogonality 456
Chapter 23: Basic Aggregate Functions 458
23.1. COUNT() Functions 459
23.1.1. Optimizing Aggregates with DISTINCT 462
23.2. SUM() Function 463
23.3. AVG() Function 465
23.3.1. Averages with Empty Groups 466
23.3.2. Averages Across Columns 469
23.4. Extrema Functions 470
23.4.1. Simple Extrema Functions 470
23.4.2. Generalized Extrema Functions 472
23.4.3. Multiple Criteria Extrema Functions 475
23.4.4. GREATEST() and LEAST() Functions 475
23.5. The LIST() Aggregate Function 478
23.5.1. LIST Aggregate with Recursive CTE 479
23.5.2. The LIST() Function by Crosstabs 480
23.6. The Mode Aggregate Function 481
23.7. The Median Aggregate Function 482
23.7.1. The Weighted Median 483
23.7.2. Modern Median 484
23.8. The PRD() Aggregate Function 484
23.8.1. PRD() Function by Expressions 485
23.8.2. The PRD() Aggregate Function by Logarithms 487
Chapter 24: Advance Descriptive Statistics 490
24.1. Binary Table Functions 490
24.2. Correlation 492
Chapter 25: OLAP Aggregation in SQL 494
25.1. Querying Versus Reporting 494
25.2. GROUPING Operators 495
25.2.1. GROUP BY GROUPING SET 495
25.2.2. ROLLUP 496
25.2.3. CUBES 497
25.2.4. OLAP Examples of SQL 497
25.3. The Window Clause 498
25.3.1. PARTITION BY Subclause 499
25.3.2. ORDER BY Subclause 499
25.3.3. Window Frame Subclause 501
25.4. Windowed Aggregate Functions 502
25.5. Ordinal Functions 503
25.5.1. Row Numbering 503
25.5.2. RANK() and DENSE_RANK() 503
25.5.3. PERCENT_RANK() and CUME_DIST 504
25.5.4. Some Examples 505
25.6. Vendor Extensions 507
25.6.1. LEAD and LAG Functions 507
25.6.1.1. Example with Gaps 508
25.6.2. FIRST and LAST Functions 510
25.7. A Bit of History 511
Chapter 26: Advanced SELECT Statements 514
26.1. Correlated Subqueries 514
26.2. Infixed INNER JOINs 519
26.3. OUTER JOINs 521
26.3.1. A Bit of History 522
26.3.2. NULLs and OUTER JOINs 527
26.3.3. NATURAL Versus Searched OUTER JOINs 529
26.3.4. Self OUTER JOINs 530
26.3.5. Two or More OUTER JOINs 531
26.3.6. OUTER JOINs and Aggregate Functions 533
26.3.7. FULL OUTER JOIN 533
26.4. UNION JOIN Operators 534
26.5. Scalar SELECT Expressions 536
26.6. Old Versus New JOIN Syntax 537
26.7. Constrained Joins 538
26.7.1. Inventory and Orders 539
26.7.2. Stable Marriages 540
26.7.3. Ball and Box Packing 545
26.8. Dr. Codd's T-Join 548
26.8.1. A Procedural Approach 552
26.9. Missing Values in Data 555
26.9.1. Last Known Value 556
26.9.2. Sequence Missing Readings 557
26.9.3. Smoothed Result 558
26.10. Missing and Mixed Data in Rows 560
Part 4: Data Structures in SQL 564
Chapter 27: Graphs in SQL 566
27.1. Basic Graph Characteristics 567
27.1.1. All Nodes in the Graph 567
27.1.2. Path Endpoints 567
27.1.3. Reachable Nodes 568
27.1.4. Edges 568
27.1.5. Indegree and Outdegree 569
27.1.6. Source, Sink, Isolated, and Internal Nodes 570
27.2. Paths in a Graph 571
27.2.1. Length of Paths 572
27.2.2. Shortest Path 572
27.2.3. Paths by Iteration 573
27.2.4. Listing the Paths 576
27.3. Acyclic Graphs as Nested Sets 580
27.4. Adjacency Matrix Model 582
27.5. Points Inside Polygons 583
27.6. Taxicab Geometry 585
27.6.1. Taxi Versus Euclidean Distance 586
27.6.2. Taxi Shapes 587
27.7. Equivalence Classes and Cliques 588
27.7.1. Graph for Equivalence Relations 589
27.7.2. Reflexive Rows 590
27.7.3. Symmetric Rows 591
27.7.4. Transitive Rows 593
27.7.5. Cliques 594
27.7.6. Adding Members 597
27.8. Conclusion 597
Chapter 28: Trees and Hierarchies in SQL 598
28.1. Adjacency List Model 599
28.2. Finding the Root Node 600
28.3. Finding Leaf Nodes 601
28.4. Finding Levels in a Tree 601
28.5. Tree Operations 602
28.5.1. Subtree Deletion 602
28.5.2. Subtree Insertion 603
28.6. Nested Sets Model 604
28.7. Finding Root and Leaf Nodes 606
28.8. Finding Subtrees 607
28.9. Finding Levels and Paths in a Tree 608
28.9.1. Finding the Height of a Tree 608
28.9.2. Finding Immediate Subordinates 608
28.9.3. Finding Oldest and Youngest Subordinates 609
28.9.4. Finding a Path 611
28.10. Functions in the Nested Sets Model 611
28.11. Deleting Nodes and Subtrees 612
28.11.1. Deleting Subtrees 613
28.11.2. Deleting a Single Node 613
28.11.3. Closing Gaps in the Tree 614
28.12. Summary Functions on Trees 615
28.13. Inserting and Updating Trees 622
28.13.1. Moving a Subtree Within a Tree 622
28.14. Converting Adjacency List to Nested Sets Model 625
28.15. Converting Nested Sets to Adjacency List Model 626
28.16. Comparing Nodes and Structure 626
Chapter 29: Queues 630
29.1. Basic DDL 630
29.2. Enqueue, Dequeue, and Empty Procedures 631
29.3. Rearrangement 632
29.4. Queues and Math 634
29.5. Priority Queues 635
29.6. FIFO and LIFO Queues 636
Chapter 30: Matrices in SQL 640
30.1. Arrays via Named Columns 641
30.2. Arrays via Subscript Columns 645
30.3. Matrix Operations in SQL 646
30.3.1. Matrix Equality 647
30.3.2. Matrix Addition 647
30.3.3. Matrix Multiplication 648
30.3.4. Other Matrix Operations 650
30.4. Flattening a Table into an Array 651
30.5. Comparing Arrays in Table Format 653
30.6. Other Matrix Operations 655
Part 5: Typical Queries 656
Chapter 31: Partitioning and Aggregating Data in Queries 658
31.1. Coverings and Partitions 658
31.1.1. Partitioning by Ranges 658
31.1.2. Partition by Functions 660
31.1.3. Partition by Sequential Order 661
31.2. Advanced Grouping, Windowed Aggregation, and OLAP in SQL 663
31.2.1. GROUPING Operators 665
31.2.2. GROUP BY GROUPING SET 666
31.2.3. ROLLUP 666
31.2.4. CUBES 667
31.2.5. OLAP Examples of SQL 668
31.2.6. The Window Clause 668
31.2.6.1. Partition by Subclause 669
31.2.6.2. ORDER BY Subclause 669
31.2.6.3. Window Frame Subclause 671
31.2.7. Windowed Aggregate Functions 673
31.2.8. Ordinal Functions 673
31.2.8.1. Row Numbering 673
31.2.8.2. RANK() and DENSE_RANK() 673
31.2.8.3. PERCENT_RANK() and CUME_DIST 674
31.2.8.4. Some Examples 675
31.2.9. Vendor Extensions 677
31.2.9.1. LEAD and LAG Functions 677
31.2.9.2. FIRST and LAST Functions 678
31.2.9.3. NTILE Function 680
31.2.10. A Bit of History 682
Chapter 32: Sub-sequences, Regions, Runs, Gaps, and Islands 684
32.1. Finding Subregions of Size ( n) 685
32.2. Numbering Regions 686
32.3. Finding Regions of Maximum Size 688
32.4. Bound Queries 692
32.5. Run and Sequence Queries 693
32.5.1. Filling in Missing Numbers 696
32.6. Summation of a Handmade Series 698
32.7. Swapping and Sliding Values in a List 701
32.8. Condensing a List of Numbers 703
32.9. Folding a List of Numbers 703
32.10. Coverings 704
32.11. Equivalence Classes and Cliques 710
32.11.1. Definition by Extension and Intention 711
32.11.2. Graphs in SQL 712
32.11.3. Reflexive Rows 713
32.11.4. Symmetric Rows 714
32.11.5. Transitive Rows 715
32.11.6. Cliques 717
Chapter 33: Auctions 720
33.1. General Types of Bidding 720
33.2. Types of Auctions 721
33.2.1. English Auctions 721
33.2.2. Japanese Auctions 721
33.2.3. Dutch Auctions 721
33.2.4. Vickrey Auctions 722
33.2.5. Auction Schema 722
33.3. LIFO and FIFO Inventory 723
33.3.1. LIFO Cost as a VIEW 725
33.3.2. CASE Expressions 726
33.3.3. Updating Inventory 727
33.4. Bin Packing 729
Chapter 34: Relational Division 732
34.1. Division with a Remainder 734
34.2. Exact Division 735
34.3. Note on Performance 736
34.4. Todd’s Division 737
34.5. Division with JOINs 740
34.6. Division with Set Operators 741
34.7. Romley’s Division 741
34.8. Boolean Expressions in Relational Division 745
Chapter 35: Temporal Queries 748
35.1. Temporal Math 749
35.2. Calendars 754
35.2.1. Holidays 755
35.2.2. Personal Calendars 756
35.3. Time Series 758
35.3.1. Gaps in a Time Series 758
35.3.2. Continuous Time Periods 761
35.3.2.1. Background 765
35.3.3. Missing Times in Contiguous Events 766
35.3.4. Locating Dates 771
35.3.5. Temporal Starting and Ending Points 772
35.3.5.1. Starting and Ending Times 773
35.4. Julian Dates 774
35.5. Other Temporal Functions 778
35.6. Multi-day Periods 779
35.6.1. Weeks 779
35.6.2. Report Periods 782
35.6.2.1. Report Period Table 782
35.7. Modeling Time in Tables 783
35.7.1. Using Duration Pairs 784
35.8. LEAD() and LAG() Functions 786
35.9. Problems with the Year 2000: A Historical Overview 786
35.9.1. The Zeros 787
35.9.2. Leap Year 788
35.9.3. The Millennium 789
35.9.4. Weird Dates in Legacy Data 790
35.9.5. The Aftermath 792
Part 6: Implementation and Coding Issues 794
Chapter 36: Procedural Semi-Procedural and Declarative Programming in SQL 796
36.1. Words Matter 797
36.2. Cleaning Code 797
Chapter 37: Nesting Levels in SQL 804
37.1. Derived Tables 805
37.2. Column Naming Rules 806
37.3. Scoping Rules 807
37.4. Exposed Table Names 809
37.5. Common Table Expressions (CTEs) 810
37.6. LATERAL Tables 811
37.7. Programming Tips 812
Chapter 38: Embedded SQL, CLI Dynamic SQL, and SQL/PSM 814
38.1. Embedded SQL 814
38.2. SQL/CLI 815
38.3. Dynamic SQL 816
38.4. SQL/PSM History 816
38.4.1. SQL Statements 817
38.4.2. Compound Statements 818
38.4.3. SIGNAL and RESIGNAL Statements 819
38.4.4. Assignment Statements 820
38.4.5. Conditional Statements 820
38.4.6. Loops 822
38.4.7. PRINT Statement 823
38.4.8. CALL Statements 823
38.4.9. CREATE PROCEDURE and CREATE FUNCTION 824
38.5. CSV Parameters 826
Index 834
Databases Versus File Systems
Abstract
A comparison of relational databases versus traditional file systems is discussed. Rows are not records; columns are not fields, and tables are not files.
Keywords
Database
File system
US standard railroad gauge
COBOL
FORTRAN
C
BASIC
PL/I
Java
Procedural programming language
OO programming language
E-R diagrams
Peter Chen
Data Declaration Language (DDL)
Data Control Language (DCL)
It ain’t so much the things we don’t know that get us in trouble. It’s the things we know that ain’t so.
—Artemus Ward (William Graham Sumner), American Writer and Humorist, 1834-1867
Perfecting oneself is as much unlearning as it is learning
—Edsgar Dijkstra
If you already have a background in data processing with traditional file systems, the first things to unlearn are
(0) Databases are not file sets.
(1) Tables are not files.
(2) Rows are not records.
(3) Columns are not fields.
(4) Values in RDBMS are scalar, not structured (arrays, lists, meta-data).
Do not feel ashamed of getting stuck in a conceptual rut; every new technology has this problem.
The US standard railroad gauge (distance between the rails) is 4 ft, 8.5 in. This gauge is used because the English built railroads to that gauge and US railroads were built by English expatriates.
Why did the English build railroads to that gauge? Because the first rail lines were built by the same people who built the pre-railroad tramways, and that’s the gauge they used. Why did those wheelwrights use that gauge then? Because the people who built the horse-drawn trams used the same jigs and tools that they used for building wagons, which used that wheel spacing.
Why did the wagons use that odd wheel spacing? For the practical reason that any other spacing would break an axle on some of the old, long distance roads, because this is the measure of the old wheel ruts.
So who built these old rutted roads? The first long distance roads in Europe were built by Imperial Rome for their legions and used ever since. The initial ruts were first made by Roman war chariots, which were of uniform military issue. The Imperial Roman chariots were made to be just wide enough to accommodate the back-ends of two war horses (this example is originally due to Professor Tom O’Hare, Germanic Languages, University of Texas at Austin; email: tohare@mail.utexas.edu).
This story does not end there, however. Look at a NASA Space Shuttle and the two big booster rockets attached to the sides of the main fuel tank. These are solid rocket boosters or SRBs. The SRBs are made by Thiokol at their factory at Utah. The engineers who designed the SRBs might have preferred to make them a bit fatter, but the SRBs had to be shipped by train from the factory to the launch site in Florida. The railroad line from the factory runs through a tunnel in the mountains and the SRBs have to fit through that tunnel. The tunnel is slightly wider than the railroad track. So, the major design feature of what is arguably the world’s most advanced transportation system was determined by the width of a horse’s ass.
In a similar fashion, modern data processing began with punch cards (Hollerith cards if you are really old) used by the Bureau of the Census. Their original size was that of a US dollar bill. This was set by their inventor, Herman Hollerith, because he could get furniture to store the cards from the US Treasury Department, just across the street. Likewise, physical constraints limited each card to 80 columns of holes in which to record a symbol.
The influence of the punch card lingered on long after the invention of magnetic tapes and disk for data storage. This is why early video display terminals were 80 columns across. Even today, files which were migrated from cards to magnetic tape files or disk storage still use 80 column physical records.
But the influence was not just on the physical side of data processing. The methods for handling data from the prior media were imitated in the new media.
Data processing first consisted of sorting and merging decks of punch cards (later, sequential magnetic tape files) in a series of distinct steps. The result of each step feed into the next step in the process. Think of the assembly line in a factory.
Databases and RDBMS in particular are nothing like the file systems that came with COBOL, FORTRAN, C, BASIC, PL/I, Java, or any of the procedural and OO programming languages. We used to say that SQL means “Scarcely Qualifies as a Language” because it has no I/O of its own. SQL depends on a host language to get and receive data to and from end users.
1.1 The Schema Statement
Programming languages are usually based on some underlying model; if you understand the model, the language makes much more sense. For example, FORTRAN is based on algebra. This does not mean that FORTRAN is exactly like algebra. But if you know algebra, FORTRAN does not look all that strange to you the way that LISP or APL would. You can write an expression in an assignment statement or make a good guess as to the names of library functions you have never seen before.
Likewise, COBOL is based on English narratives of business processes. The design of COBOL files (and almost every other early programming language) was derived from paper forms. The most primitive form of a file is a sequence of records that are ordered within the file and referenced by physical position.
You open a file (think file folder or in-basket on your desk) and then read a first record (think of the first paper form on the stack), followed by a series of next records (process the stack of paperwork, one paper form at a time) until you come to the last record to raise the end-of-file condition (put the file folder in the out-basket). Notice the work flow:
1. The records (paper forms) have to physically exist to be processed. Files are not virtual by nature. In fact, this mindset is best expressed by a quote from Samuel Goldwyn “a verbal contract ain’t worth the paper it is written on!”
2. You navigate among these records and perform actions, one record at a time. You can go backward or forward in the stack but nowhere else.
3. The actions you take on one file (think of a clerk with rubber stamps) have no effect on other files that are not in the same program. The files are like file folders in another in-basket.
4. Only programs (the clerk processing the paperwork) can change files. The in-basket will not pick up a rubber stamp and mark the papers by itself.
The model for SQL is data kept in abstract sets, not in physical files. The “unit of work” in SQL is the whole schema, not individual tables. This is a totally different model of work! Sets are those mathematical abstractions you studied in school. Sets are not ordered and the members of a set are all of the same type. When you do an operation on a set, the action happens “all at once” to the entire membership. That is, if I ask for the subset of odd numbers from the set of positive integers, I get all them back as a single set. I do not build the set of odd numbers by sequentially inspecting one element at a time. I define odd numbers with a rule—“If the remainder is ± 1 when you divide the number by 2, it is odd”—that could test any integer and classify it. Parallel processing is one of many, many advantages of having a set-oriented model. In RDBMS, everything happens all at once.
The Data Declaration Language (DDL) in SQL is what defines and controls access to the database content and maintains the integrity of that data for all programs that access the database. Data in a file is passive. It has no meaning until a program reads it. In COBOL, each program has a DATA DIVISION; in FORTRAN, each program has the FORMAT/READ statements; in Pascal, there is a RECORD declaration that serves the same purpose. Pick your non-SQL language.
These constructs provide a template or parsing rules to overlay upon the records in the file, split them into fields and get the data into the host program. Each program can split up the sequence of characters in a record anyway it wishes name and name the fields as it wished. This can lead to “job security” programming; I worked in a shop in the 1970’s where one programmer would pick a theme (nations of the world, flowers, etc.) and name his fields “Afghanistan” or “Chrysanthemum” or worse. Nobody could read his code, so we could not fire him.
Likewise, the Data Control Language (DCL) controls access to the schema objects that a user can create. Standard SQL divides the database users into USER and ADMIN roles. These schema objects require ADMIN privileges to be created, altered, or dropped (CREATE, ALTER, DROP, etc.). Those with...
Erscheint lt. Verlag | 5.12.2014 |
---|---|
Sprache | englisch |
Themenwelt | Mathematik / Informatik ► Informatik ► Betriebssysteme / Server |
Mathematik / Informatik ► Informatik ► Datenbanken | |
Mathematik / Informatik ► Informatik ► Programmiersprachen / -werkzeuge | |
Mathematik / Informatik ► Informatik ► Software Entwicklung | |
Sozialwissenschaften ► Kommunikation / Medien ► Buchhandel / Bibliothekswesen | |
ISBN-10 | 0-12-800830-X / 012800830X |
ISBN-13 | 978-0-12-800830-0 / 9780128008300 |
Haben Sie eine Frage zum Produkt? |
Größe: 5,8 MB
Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM
Dateiformat: PDF (Portable Document Format)
Mit einem festen Seitenlayout eignet sich die PDF besonders für Fachbücher mit Spalten, Tabellen und Abbildungen. Eine PDF kann auf fast allen Geräten angezeigt werden, ist aber für kleine Displays (Smartphone, eReader) nur eingeschränkt geeignet.
Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine
Geräteliste und zusätzliche Hinweise
Zusätzliches Feature: Online Lesen
Dieses eBook können Sie zusätzlich zum Download auch online im Webbrowser lesen.
Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.
Größe: 5,9 MB
Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM
Dateiformat: EPUB (Electronic Publication)
EPUB ist ein offener Standard für eBooks und eignet sich besonders zur Darstellung von Belletristik und Sachbüchern. Der Fließtext wird dynamisch an die Display- und Schriftgröße angepasst. Auch für mobile Lesegeräte ist EPUB daher gut geeignet.
Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine
Geräteliste und zusätzliche Hinweise
Zusätzliches Feature: Online Lesen
Dieses eBook können Sie zusätzlich zum Download auch online im Webbrowser lesen.
Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.
aus dem Bereich