Basics of Querying
Understanding the Foundations
Standard SQL is based on the relational model, which is a mathematical model for data
management and manipulation. The relational model was initially created and proposed by
Edgar F. Codd in 1969. Since then, it has been explained and developed by Chris Date, Hugh
Darwen, and others.
the true source for the model’s name is the
mathematical concept relation. A relation in the relational model is what SQL calls a table. The two are not synonymous. SQL attempts to represent with a table: a relation
has a heading and a body. The heading is a set of attributes (what SQL attempts to represent
with columns), each of a given type. An attribute is identified by name and type name. The
body is a set of tuples (what SQL attempts to represent with rows). Each tuple’s heading is the
heading of the relation. Each value of each tuple’s attribute is of its respective type.
Some of the most important principals to understand about SQL stem from the relational model’s core foundations—set theory and predicate logic.
The other branch of mathematics that the relational model is based on is called predicate
logic. A predicate is an expression that when attributed to some object, makes a proposition
either true or false. For example, “salary greater than $50,000” is a predicate. You can evaluate this predicate for a specific employee, in which case you have a proposition.
The relational model uses predicates as one of its core elements. You can enforce data
integrity by using predicates. You can filter data by using predicates. You can even use predicates to define the data model itself. You first identify propositions that need to be stored
in the database. Here’s an example proposition: an order with order ID 10256 was placed on
July 07, 2023 by the customer with ID 45, and handled by the employee with ID 9. You
then create predicates from the propositions by removing the data and keeping the heading.
Remember, the heading is a set of attributes, each identified by name and type name. In this
example, you have orderid INT, orderdate DATE, custid INT, and empid INT
For people with a procedural programming background, the natural way to interact with
data (in a file, record set, or data reader) is with iterations. So using cursors and other iterative
constructs in SQL is, in a way, an extension to what they already know. However, the correct
way from the relational model’s perspective is not to interact with the rows one at a time;
rather, use relational operations and return a relational result. This, in SQL, translates to
writing queries
Understanding Logical Query Processing
SQL has both logical and physical sides to it. The logical side is the conceptual interpretation of the query that explains what the correct result of the query is. The physical side is the
processing of the query by the database engine.
SQL, is a declarative English-like language. In this language,
declarative means you define what you want, as opposed to imperative languages that define
also how to achieve what you want.
Logical Query processing phases
The main statement used to retrieve data in SQL is the SELECT statement. Following are
the main query clauses specified in the order that you are supposed to type them (known as
“keyed-in order”):
1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY
But as mentioned, the logical query processing order, which is the conceptual interpretation order, is different. It starts with the FROM clause. Here is the logical query processing
order of the six main query clauses:
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
Each phase operates on one or more tables as inputs and returns a virtual table as output.
The output table of one phase is considered the input to the next phase.
Evaluate the FROM Clause
In the first phase, the FROM clause is evaluated. That’s where you indicate the tables you want
to query and table operators like joins if applicable.
Filter Rows Based on the WHERE Clause
The second phase filters rows based on the predicate in the WHERE clause. Only rows for which the predicate evaluates to true are returned
Group Rows Based on the GROUP BY Clause
This phase defines a group for each distinct combination of values in the grouped elements
from the input table. It then associates each input row to its respective group.
Filter Rows Based on the HAVING Clause
This phase is also responsible for filtering data based on a predicate, but it is evaluated after
the data has been grouped; hence, it is evaluated per group and filters groups as a whole. As
is usual in SQL, the filtering predicate can evaluate to true, false, or unknown
Process the SELECT Clause
The fifth phase is the one responsible for processing the SELECT clause. What’s interesting
about it is the point in logical query processing where it gets evaluated—almost last. That’s
interesting considering the fact that the SELECT clause appears first in the query.
Handle Presentation Ordering
The sixth phase is applicable if the query has an ORDER BY clause. This phase is responsible
for returning the result in a specific presentation order according to the expressions that appear in the ORDER BY list.
Comments
Post a Comment