This textbook examines database systems from the viewpoint of a software developer. This perspective makes it possible to investigate why database systems are the way they are. It is of course important to be able to write queries, but it is equally important to know how they are processed. We e.g. don’t want to just use JDBC; we also want to know why the API contains the classes and methods that it does. We need a sense of how hard is it to write a disk cache or logging facility. And what exactly is a database driver, anyway?
The first two chapters provide a brief overview of database systems and their use. Chapter 1 discusses the purpose and features of a database system and introduces the Derby and SimpleDB systems. Chapter 2 explains how to write a database application using Java. It presents the basics of JDBC, which is the fundamental API for Java programs that interact with a database. In turn, Chapters 3-11 examine the internals of a typical database engine. Each chapter covers a different database component, starting with the lowest level of abstraction (the disk and file manager) and ending with the highest (the JDBC client interface); further, the respective chapter explains the main issues concerning the component, and considers possible design decisions. As a result, the reader can see exactly what services each component provides and how it interacts with the other components in the system. By the end of this part, s/he will have witnessed the gradual development of a simple but completely functional system. The remaining four chapters then focus on efficient query processing, and focus on the sophisticated techniques and algorithms that can replace the simple design choices described earlier. Topics include indexing, sorting, intelligent buffer usage, and query optimization.
This text is intended for upper-level undergraduate or beginning graduate courses in Computer Science. It assumes that the reader is comfortable with basic Java programming; advanced Java concepts (such as RMI and JDBC) are fully explained in the text. The respective chapters are complemented by “end-of-chapter readings” that discuss interesting ideas and research directions that went unmentioned in the text, and provide references to relevant web pages, research articles, reference manuals, and books. Conceptual and programming exercises are also included at the end of each chapter. Students can apply their conceptual knowledge by examining the SimpleDB (a simple but fully functional database system created by the author and provided online) code and modifying it.
Author(s): Edward Sciore
Edition: 2
Publisher: Springer
Year: 2020
Language: English
Commentary: Java, JDBC, dbms, database, true pdf, 2nd Ed, 2020
Pages: 468
Tags: Java, JDBC, dbms, database
Preface
Organization of the Text
Text Prerequisites
The SimpleDB Software
End-of-Chapter Readings
End-of-Chapter Exercises
Contents
About the Author
Chapter 1: Database Systems
1.1 Why a Database System?
1.1.1 Record Storage
1.1.2 Multi-user Access
1.1.3 Dealing with Catastrophe
1.1.4 Memory Management
1.1.5 Usability
1.2 The Derby Database System
1.3 Database Engines
1.4 The SimpleDB Database System
1.5 The SimpleDB Version of SQL
1.6 Chapter Summary
1.7 Suggested Reading
1.8 Exercises
Chapter 2: JDBC
2.1 Basic JDBC
2.1.1 Connecting to a Database Engine
2.1.2 Disconnecting from a Database Engine
2.1.3 SQL Exceptions
2.1.4 Executing SQL Statements
2.1.5 Result Sets
2.1.6 Using Query Metadata
2.2 Advanced JDBC
2.2.1 Hiding the Driver
2.2.2 Explicit Transaction Handling
2.2.3 Transaction Isolation Levels
2.2.4 Prepared Statements
2.2.5 Scrollable and Updatable Result Sets
2.2.6 Additional Data Types
2.3 Computing in Java vs. SQL
2.4 Chapter Summary
2.5 Suggested Reading
2.6 Exercises
Chapter 3: Disk and File Management
3.1 Persistent Data Storage
3.1.1 Disk Drives
3.1.2 Accessing a Disk Drive
3.1.3 Improving Disk Access Time
3.1.4 Improving Disk Reliability by Mirroring
3.1.5 Improving Disk Reliability by Storing Parity
3.1.6 RAID
3.1.7 Flash Drives
3.2 The Block-Level Interface to the Disk
3.3 The File-Level Interface to the Disk
3.4 The Database System and the OS
3.5 The SimpleDB File Manager
3.5.1 Using the File Manager
3.5.2 Implementing the File Manager
3.6 Chapter Summary
3.7 Suggested Reading
3.8 Exercises
Chapter 4: Memory Management
4.1 Two Principles of Database Memory Management
4.2 Managing Log Information
4.3 The SimpleDB Log Manager
4.3.1 The API for the Log Manager
4.3.2 Implementing the Log Manager
4.4 Managing User Data
4.4.1 The Buffer Manager
4.4.2 Buffers
4.4.3 Buffer Replacement Strategies
4.5 The SimpleDB Buffer Manager
4.5.1 An API for the Buffer Manager
4.5.2 Implementing the Buffer Manager
4.6 Chapter Summary
4.7 Suggested Reading
4.8 Exercises
Chapter 5: Transaction Management
5.1 Transactions
5.2 Using Transactions in SimpleDB
5.3 Recovery Management
5.3.1 Log Records
5.3.2 Rollback
5.3.3 Recovery
5.3.4 Undo-Only and Redo-Only Recovery
5.3.4.1 Undo-Only Recovery
5.3.4.2 Redo-Only Recovery
5.3.5 Write-Ahead Logging
5.3.6 Quiescent Checkpointing
5.3.7 Nonquiescent Checkpointing
5.3.8 Data Item Granularity
5.3.9 The SimpleDB Recovery Manager
5.3.9.1 Log Records
5.3.9.2 Rollback and Recover
5.4 Concurrency Management
5.4.1 Serializable Schedules
5.4.2 The Lock Table
5.4.3 The Lock Protocol
5.4.3.1 Serializability Problems
5.4.3.2 Reading Uncommitted Data
5.4.4 Deadlock
5.4.5 File-Level Conflicts and Phantoms
5.4.6 Multiversion Locking
5.4.6.1 The Principle of Multiversion Locking
5.4.6.2 Implementing Multiversion Locking
5.4.7 Transaction Isolation Levels
5.4.8 Data Item Granularity
5.4.9 The SimpleDB Concurrency Manager
5.4.9.1 The Class LockTable
5.4.9.2 The Class ConcurrencyMgr
5.5 Implementing SimpleDB Transactions
5.6 Chapter Summary
5.7 Suggested Reading
5.8 Exercises
Chapter 6: Record Management
6.1 Designing a Record Manager
6.1.1 Spanned Versus Unspanned Records
6.1.2 Homogeneous Versus Nonhomogeneous Files
6.1.3 Fixed-Length Versus Variable-Length Fields
6.1.4 Placing Fields in Records
6.2 Implementing a File of Records
6.2.1 A Straightforward Implementation
6.2.2 Implementing Variable-Length Fields
6.2.3 Implementing Spanned Records
6.2.4 Implementing Nonhomogeneous Records
6.3 SimpleDB Record Pages
6.3.1 Managing Record Information
6.3.2 Implementing the Schema and Layout
6.3.3 Managing the Records in a Page
6.3.4 Implementing Record Pages
6.4 SimpleDB Table Scans
6.4.1 Table Scans
6.4.2 Implementing Table Scans
6.5 Chapter Summary
6.6 Suggested Reading
6.7 Exercises
Chapter 7: Metadata Management
7.1 The Metadata Manager
7.2 Table Metadata
7.3 View Metadata
7.4 Statistical Metadata
7.5 Index Metadata
7.6 Implementing the Metadata Manager
7.7 Chapter Summary
7.8 Suggested Reading
7.9 Exercises
Chapter 8: Query Processing
8.1 Relational Algebra
8.1.1 Select
8.1.2 Project
8.1.3 Product
8.2 Scans
8.3 Update Scans
8.4 Implementing Scans
8.4.1 Select Scans
8.4.2 Project Scans
8.4.3 Product Scans
8.5 Pipelined Query Processing
8.6 Predicates
8.7 Chapter Summary
8.8 Suggested Reading
8.9 Exercises
Chapter 9: Parsing
9.1 Syntax Versus Semantics
9.2 Lexical Analysis
9.3 The SimpleDB Lexical Analyzer
9.4 Grammars
9.5 Recursive-Descent Parsers
9.6 Adding Actions to the Parser
9.6.1 Parsing Predicates and Expressions
9.6.2 Parsing Queries
9.6.3 Parsing Updates
9.6.4 Parsing Insertions
9.6.5 Parsing Deletions
9.6.6 Parsing Modifications
9.6.7 Parsing Table, View, and Index Creation
9.7 Chapter Summary
9.8 Suggested Reading
9.9 Exercises
Chapter 10: Planning
10.1 Verification
10.2 The Cost of Evaluating a Query Tree
10.2.1 The Cost of a Table Scan
10.2.2 The Cost of a Select Scan
10.2.3 The Cost of a Project Scan
10.2.4 The Cost of a Product Scan
10.2.5 A Concrete Example
10.3 Plans
10.4 Query Planning
10.4.1 The SimpleDB Query Planning Algorithm
10.4.2 Implementing the Query Planning Algorithm
10.5 Update Planning
10.5.1 Delete and Modify Planning
10.5.2 Insert Planning
10.5.3 Planning for Table, View, and Index Creation
10.6 The SimpleDB Planner
10.7 Chapter Summary
10.8 Suggested Reading
10.9 Exercises
Chapter 11: JDBC Interfaces
11.1 The SimpleDB API
11.2 Embedded JDBC
11.3 Remote Method Invocation
11.3.1 Remote Interfaces
11.3.2 The RMI Registry
11.3.3 Thread Issues
11.4 Implementing the Remote Interfaces
11.5 Implementing the JDBC Interfaces
11.6 Chapter Summary
11.7 Suggested Reading
11.8 Exercises
Chapter 12: Indexing
12.1 The Value of Indexing
12.2 SimpleDB Indexes
12.3 Static Hash Indexes
12.3.1 Static Hashing
12.3.2 Implementing Static Hashing
12.4 Extendable Hash Indexes
12.4.1 Sharing Index Blocks
12.4.2 Compacting the Bucket Directory
12.5 B-Tree Indexes
12.5.1 How to Improve a Dictionary
12.5.2 The B-Tree Directory
12.5.3 A Directory Tree
12.5.4 Inserting Records
12.5.5 Duplicate Datavals
12.5.6 Implementing B-Tree Pages
12.5.7 Implementing the B-Tree Index
12.6 Index-Aware Operator Implementations
12.6.1 An Indexed Implementation of Select
12.6.2 An Indexed Implementation of Join
12.7 Index Update Planning
12.8 Chapter Summary
12.9 Suggested Reading
12.10 Exercises
Chapter 13: Materialization and Sorting
13.1 The Value of Materialization
13.2 Temporary Tables
13.3 Materialization
13.3.1 An Example of Materialization
13.3.2 The Cost of Materialization
13.3.3 Implementing the Materialize Operator
13.4 Sorting
13.4.1 Why Sort Needs to Materialize Its Input
13.4.2 The Basic Mergesort Algorithm
13.4.3 Improving the Mergesort Algorithm
13.4.4 The Cost of Mergesort
13.4.5 Implementing Mergesort
13.5 Grouping and Aggregation
13.6 Merge Joins
13.6.1 An Example of Mergejoin
13.6.2 Implementing Mergejoin
13.7 Chapter Summary
13.8 Suggested Reading
13.9 Exercises
Chapter 14: Effective Buffer Utilization
14.1 Buffer Usage in Query Plans
14.2 Multibuffer Sorting
14.3 Multibuffer Product
14.4 Determining Buffer Allocation
14.5 Implementing Multibuffer Sorting
14.6 Implementing Multibuffer Product
14.7 Hash Joins
14.7.1 The Hashjoin Algorithm
14.7.2 An Example of Hashjoin
14.7.3 Cost Analysis
14.8 Comparing the Join Algorithms
14.9 Chapter Summary
14.10 Suggested Reading
14.11 Exercises
Chapter 15: Query Optimization
15.1 Equivalent Query Trees
15.1.1 Rearranging Products
15.1.2 Splitting Selections
15.1.3 Moving Selections Within a Tree
15.1.4 Identifying Join Operators
15.1.5 Adding Projections
15.2 The Need for Query Optimization
15.3 The Structure of a Query Optimizer
15.4 Finding the Most Promising Query Tree
15.4.1 The Cost of a Tree
15.4.2 Pushing Select Nodes Down the Tree
15.4.3 Replacing Select-Product Nodes by Join
15.4.4 Using Left-Deep Query Trees
15.4.5 Choosing a Join Order Heuristically
15.4.6 Choosing a Join Order by Exhaustive Enumeration
15.5 Finding the Most Efficient Plan
15.6 Combining the Two Stages of Optimization
15.6.1 The Heuristic-Based SimpleDB Optimizer
15.6.2 Selinger-Style Optimization
15.7 Merging Query Blocks
15.8 Chapter Summary
15.9 Suggested Reading
15.10 Exercises
Index