If you are in anyway related to software in the service of organizations – you MUST be familiar with Relational Databases * Let’s understand the basics of RDB: what, why, strengths and weaknesses
Relational Databases (RDB) drive nearly all of the information systems in the service of any organization today. Despite being a central and critical technology, it’s only 40 years old, 10 years younger than me.
It was Edgar Codd from IBM who first devised the RDB concept in the early 1970’s. Only in 1978 was the first commercial RDBMS (Relational Database Management System) offered. A year later, in 1979, the RDBMS known today as Oracle DB was introduced.
The Basic Concept
Databases serve as the persistence component in a software application architecture. As opposed to volatile memory that temporarily stores data, the Database is where the data eventually persists. Popular persistence media used for Databases would be magnetic or optic.
RDBs are one approach of storing data in a Database. The basic idea is that of tables of information with relationships defined between these tables (hence the “Relational” in RDB).
Each table consists of columns defined by their column names.
The table rows, or records, represent the actual data (values) stored and serviced for consumption and manipulation.
Each column is defined to store data of a specific type, such as a number, a date or a string (text).
A table has a Primary Key defined, which is one or more columns, by which the rows are sorted and physically stored in the Database. A typical Primary Key of an Employees table would be Employee ID.
RDBMS are designed to be optimized for the expected services of a good Database: storage of data, secured access to the data, manipulation of the data, retrieval of the data, servicing multiple requests (from many users) concurrently and keeping data integrity and consistency.
Some of the most popular RDBMS providers are MySQL, Oracle DB, Microsoft SQL, PostgreSQL, IBM DB2, SAP Sybase. Some offer a free version (a community edition) that can carry a full-blown production application.
In the following image we see how the Employees tables is related to the Regions table. As each single row in the Regions table may be related to any number of rows in the Employees table, this would be a One-to-Many relationship (depicted as 1–>n in the image).
As the Region Code column in the Employees table facilitates the relationship to the Regions table’s Primary Key, that Region Code column in the Employees table is said to be a Foreign Key. Foreign Key and Primary key therefore represent a relationship between two tables.
In the above image, by the way, we see another One-to-Many relationship. It is from the Manager column in the Regions table (being a Foreign Key) to the Emp ID column in the Employees table (being the Primary Key).
A good Database design for any application starts with a good Entity Relationship Diagram (ERD), showing the complete Schema of the tables and their relationships. The below image is the ERD of Excelarate Care, taken from my online course: Databases and SQL Mastery.
In Excelarate Care, a fictious company used in my course, each product is comprised of multiple ingredients, and each ingredient can be found in many products. This is a Many-to-Many relationship. We implement a Many-to-Many relationship by a “middle-man” table, as shown in the above ERD.
In addition to the Primary Key of a table, additional Indexes may be defined. An Index is a structure stored outside of the table, in which the table rows are sorted by any combination of required columns. The Index rows maintain their sync with the primary table by way of “pointing” to their matched rows in the primary table. The Primary Key serves as the “handle” to be pointed at for all indexes defined on a table. The below image shows an Index sorting Employees by their name (the green table).
Indexes allow for fast lookup of data in the table and are key to efficient provisioning of requests by the DB server. Defining the right Indexes is a balancing act that requires some experience. Too many indexes will become a burden when manipulating data. Missing indexes, on the other hand, may significantly slow response time of queries asking for data.
Core RDBMS Principles: ACID
A Database serving multiple concurrent requests coming from different users, poses some challenges in keeping the data integrity, prioritizing requests and presenting a consistent view of the data.
A simple example is that of storing a Sales Order in the Database. Among the many updates this operation dictates, you should at least add an order header record to the Orders table and add several order lines items to the Orders Lines table. If another client is asking to see that newly added order at the exact instance by which the order header record was saved, but the order line items were not yet saved, this client will see a broken order: only header with no line items. This should not be allowed.
Another example would be withdrawing money from an ATM machine. How about the machine breaking down on you after your account balance was updated, but before the money was dispatched to you? I know I wouldn’t want that to happen to me.
We treat a series of operations that must be committed “at once” together, or not happen at all, as a Transaction.
Four principles ensure the consistency and concurrency of the data and its use:
- All changes to data are performed as if they are a single operation.
- Data is in a consistent state when a transaction starts and when it ends.
- The intermediate state of a transaction is invisible to other transactions.
- After a transaction successfully completes, changes to data persist and are not undone.
Working with a RDBMS
An RDBMS is hosted on a server accessible over the network to any client.
The RDBMS provider (and other partners) typically offers connectors for different client types. This way, the DB is communicating with programs written in different technologies: .NET, Java, Excel VBA, Python, to name a few.
Once you have established the “chain” from the runtime environment of your application to the DB server, you can submit requests to the DB server. For example, the Windows computer running Excel that needs access to a MySQL DB, will require the ODBC Connector to MySQL to be installed on the Excel (client) machine.
A typical exchange with the DB server would consist of establishing a connection, submitting the requests and disconnecting from the DB Server. This episode is called a Session.
The most common and popular language to communicate with RDBMS is SQL (Structured Query Language).
We can generally classify the requests types (or, SQL commands) into four categories:
- Data Definition Language (DDL). These commands affect the STRUCTURE of the DB (or Schema). These are commands for creating new tables, altering a table structure (e.g. adding new column), dropping a table (or a whole Schema) and renaming a table. The following SQL command creates a new table:
CREATE TABLE Employees;
- Data Manipulation Language. These commands affect the CONTENT of the tables and facilitate the CRUD operations: Creating, Reading, Updating and Deleting data.
The following command will insert 3 records into the Regions table:
INSERT INTO regions (region_code, region_name)
The following command will return a list of all employees in region 1. The result set consists of 3 columns. Note how we exploit the relationship between the Employees table and the Regions table using the JOIN keyword, in order to get access to columns of both tables. This allows us to show the region’s NAME coming from the Regions table, alongside the employee’s name and quota coming from the Employees table.
SELECT emp_name, emp_quota, region_name
JOIN regions ON emp_region=region_code
- Data Control Language: PERMISSIONS. These commands grant or revoke privileges (permissions) to the users connecting to the RDBMS.
- Transaction Control Language: CONSISTENCY. These commands govern the ACID principles, allowing for starting a transaction, committing a transaction, or rolling-back a transaction.
RDB: Strength and Weaknesses
Among the strengths of a relational DB, I’d list:
- A Robust, proven and mature concept
- Robust, proven and mature technologies with a competitive and vibrant market
- Easy and fast learning curve of the concept and the SQL language
- Designed for strict business rules enforcement as required in many business scenarios
Among its weaknesses, I’d list:
- Not reflective of an object-based approach
- Poor support and performance in highly changing, large scale related systems such as social networks
- Challenging to predict and tune for performance, especially when the scale and complexity of the schema are significant
Databases and SQL are a rich world filling heavy books and courses. In this post I attempted to offer a “soft” introduction that I believe every developer, system analyst, data modeler and otherwise information systems professional should be familiar with, at the minimum.
Not coincidentally, I publish this post in conjunction with the launch of the 4th course in my online program: Computer Programming and Databases with Excel VBA and SQL. The 4th course is all about Databases, SQL, Excel VBA to Databases and SQL over Excel tables. All of the images and examples in this post are taken from this course.
WANT TO LEARN PROGRAMMING WITH EXCEL VBA?
I created a complete 8-course program for you.
From programming to databases with Excel VBA to deliver business-grade solutions.
Take the first course today: Computer Programming with Excel VBA.
Click the button to see the complete program and start now!