Relational Database Management Systems (RDBMS) have been around since the 1970s. They were among the earliest multi-user server systems to be widely deployed, and although a number of alternatives now exist, the RDBMS is still the de facto storage engine for most applications. RDBMS have evolved greatly over that time. Yet despite this long history, a vendor-independent architectural overview of an RDBMS is hard to find in the current literature: the algorithms and abstractions of database systems are often well described, but discussions of RDBMS architectural design principles are hard to come by.
Getting an architectural overview of any system is the first step towards an in-depth technical understanding of it, and for an RDBMS the importance of that overview cannot be overstated. It is also a solid foundation for understanding the wider evolution of database technologies, e.g. NoSQL and NewSQL databases. I have found digging into RDBMS design principles invaluable, and I consider this understanding a must for working with, and making sense of, any new storage technology.
One of the best resources for gaining this holistic understanding is the book "Architecture of a Database System" by J. M. Hellerstein, M. Stonebraker and J. Hamilton. It belongs on every developer's reference shelf. I highly recommend reading the whole book, but Chapter 1 is a must-read for anyone who deals with an RDBMS even remotely; I would go as far as to call it mandatory reading for all developers.
Chapter 1 introduces the reader to the main components of an RDBMS. These components are shown in the image below. The chapter follows the life of a query to show how these components interact with each other.
There are five major components that are exercised in a typical interaction with an RDBMS:
- Client Communication Manager - To communicate with a database, an application first establishes a connection, typically over a network, with the Client Communication Manager. This component handles communication with the various database clients through both local and remote protocols. Its main responsibilities are to remember communication state, forward the client's requests to other parts of the DBMS, and return data and control messages (result codes, errors) to the client.
- Process Manager - The Process Manager provides a “thread of computation” for each database request from a client and links that thread's data and control output to the appropriate client through the communication manager. Its first decision is one of admission control: whether to begin executing the query immediately or to defer it until enough system resources are available.
- Relational Query Processor - On receiving a request, the Relational Query Processor first checks that the user is authorised to run the query. It then compiles the SQL text into an internal query plan, which is optimised along the way. The resulting plan is executed by the "plan executor", which calls into the Transaction and Storage Manager to access the data.
- Transaction and Storage Manager - While the plan runs, the plan executor retrieves the requested data through the Transaction and Storage Manager, which is the gatekeeper for all data access and manipulation calls. It also ensures that the ACID properties of transactions are upheld, which is why it includes a lock manager and a log manager.
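- Shared Components and Utilities - A number of shared components and utilities are also essential for the database to run, such as the catalog manager and the memory manager, along with administration, monitoring, replication, loading and batch utilities.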
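To make the life of a query a little more concrete, here is a toy Python sketch that routes a request through the first four components in the order described above. It is not taken from the book or from any real RDBMS: every class and method name (`CommunicationManager`, `ProcessManager`, `QueryProcessor`, `StorageManager`, `LockManager`, `LogManager`, `handle`, `run`, and so on) is hypothetical, the "SQL" is reduced to two hard-coded statement shapes, locks are exclusive per table, the log lives only in memory, and a deferred request is simply reported back to the client rather than queued.

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
import threading

# Toy model of the life of a query. Every class and method name here is
# hypothetical and only illustrates the roles of the components above.

class LockManager:
    """One exclusive lock per table; no shared modes, no deadlock detection."""
    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}
    @contextmanager
    def exclusive(self, table):
        with self._guard:
            lock = self._locks.setdefault(table, threading.Lock())
        with lock:
            yield

class LogManager:
    """In-memory, append-only log (nothing is persisted in this sketch)."""
    def __init__(self):
        self.records = []
    def append(self, record):
        self.records.append(record)

class StorageManager:
    """Gatekeeper for every read and write; uses the lock and log managers."""
    def __init__(self):
        self.tables = {}
        self.locks = LockManager()
        self.log = LogManager()
    def insert(self, table, row):
        with self.locks.exclusive(table):
            # Log the change before applying it: the ordering idea behind
            # write-ahead logging, without any durability here.
            self.log.append(("INSERT", table, row))
            self.tables.setdefault(table, []).append(row)
    def scan(self, table):
        with self.locks.exclusive(table):
            return list(self.tables.get(table, []))

class QueryProcessor:
    """Parses, authorizes and executes two hypothetical statement shapes."""
    def __init__(self, storage, grants):
        self.storage = storage
        self.grants = grants  # set of (user, table) pairs allowed access
    def execute(self, user, sql):
        words = sql.split()
        # "Compile" the text into a trivial plan tuple.
        if words[0].upper() == "SELECT":  # SELECT * FROM <table>
            plan = ("scan", words[3])
        elif words[0].upper() == "INSERT":  # INSERT INTO <table> VALUES <row>
            plan = ("insert", words[2], " ".join(words[4:]))
        else:
            raise ValueError(f"unsupported statement: {sql}")
        if (user, plan[1]) not in self.grants:
            raise PermissionError(f"{user} may not access {plan[1]}")
        # Plan executor: call into the storage manager.
        if plan[0] == "scan":
            return ("OK", self.storage.scan(plan[1]))
        self.storage.insert(plan[1], plan[2])
        return ("OK", 1)

class ProcessManager:
    """Admission control plus a worker thread for each admitted request."""
    def __init__(self, max_active=2):
        self._slots = threading.BoundedSemaphore(max_active)
        self._pool = ThreadPoolExecutor(max_workers=max_active)
    def run(self, fn, *args):
        if not self._slots.acquire(blocking=False):
            return ("DEFERRED", None)  # not enough resources right now
        try:
            return self._pool.submit(fn, *args).result()
        finally:
            self._slots.release()

class CommunicationManager:
    """Entry point for clients; turns failures into error results."""
    def __init__(self, process_mgr, query_proc):
        self.process_mgr = process_mgr
        self.query_proc = query_proc
    def handle(self, user, sql):
        try:
            return self.process_mgr.run(self.query_proc.execute, user, sql)
        except (PermissionError, ValueError, IndexError) as exc:
            return ("ERROR", str(exc))

if __name__ == "__main__":
    storage = StorageManager()
    query_proc = QueryProcessor(storage, grants={("alice", "users")})
    db = CommunicationManager(ProcessManager(max_active=2), query_proc)
    print(db.handle("alice", "INSERT INTO users VALUES bob"))  # ('OK', 1)
    print(db.handle("alice", "SELECT * FROM users"))  # ('OK', ['bob'])
    print(db.handle("eve", "SELECT * FROM users"))  # ('ERROR', 'eve may not access users')
```

The layering is the point of the sketch: the client only ever talks to the communication manager, and only the storage manager touches the tables, lock manager and log. A real system would typically queue deferred requests instead of rejecting them, support shared as well as exclusive locks, and force log records to stable storage before acknowledging a commit.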
The above is only a very brief introduction to "Architecture of a Database System"; I highly recommend reading the entire book.