I started taking a free online course on introduction to database (Standford’s professor Jennifer Widom). The course is going over basic terminology of database and DBMS and it focuses on SQL for the most of the course. I’d like to write down basic SQL that I’m learning from this course.
Steps in creating and using a relational database
- Design schema; create using DDL
- Bulk load initial data
- Repeat: execute queries and modifications
Query Language
Query language not only queries data but also can modify data.
- Query languages are compositional, meaning when you run a query over relations, you get a relation as a result
- ex: Relational algebra, SQL (founded on top of relational algebra)
# IDs of students with GPA > 3.7 applying to Stanford
SELECT Student.ID
FROM Student, Apply
WHERE Student.ID = Apply.ID
AND GPA > 3.7 and college = 'Stanford';
Query Optimizer
Query optimizer in DBMS is extremely important as it takes queries in SQL and figures out the fastest way to execute that on the database.
Terminology
DDL
Data Definition Language. CREATE TABLE...
, DROP TABLE...
DML
Data Manipulation Language. SELECT
, INSERT
, DELETE
, UPDATE
Others
indexes, constraints, views, triggers, transactions, authorization, …
Create Database Table
CREATE TABLE Student(ID, name, GPA, photo)
# Create table with each attribute name with its type
CREATE TABLE College(name string, state char(2), enrollment integer)
CREATE TABLE employee (
employee_id INT PRIMARY KEY,
first_name VARCHAR (255) NOT NULL,
last_name VARCHAR (255) NOT NULL,
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employee (employee_id)
ON DELETE CASCADE
);
Basic SELECT
SELECT A1, A2, ..., An # what to return
FROM R1, R2, ..., Rm # relations
WHERE condition # combine filter
SELECT DISTINCT sName, major # unique row
FROM Student, Apply # natural join
WHERE Student.sID = Apply.sID; # join condition
SELECT cName # ambiguous select because there are cName in both College and Apply. We can do College.cName
FROM College, Apply
WHERE College.cName = Apply.cName
AND enrollment > 20000 AND major = 'CS';
SELECT Student.sID, sName, GPA, Apply.cName, enrollment
FROM Student, College, Apply
WHERE Apply.sID = Student.sID AND Apply.cName = College.cName
ORDER BY GPA DESC, enrollment; # ASCENDING is default
SELECT sID, major
FROM Apply
WHERE major LIKE '%bio%'; # pattern match string
SELECT * # Get all attributes
FROM Student, College;
SELECT sID, sName, GPA, sizeHS, GPA * (sizeHS/1000.0) AS scaledGPA # Adds a new column
FROM Student;
ORDER BY
When there are more than one ORDER BY
clause, then PostgreSQL (example) sorts rows by the first clause first.
SELECT first_name, last_name
FROM customer
ORDER BY first_name ASC, last_name DESC;
# sorts by first_name first, and then sorts by last_name
Table Variables
Purpose of table queries (FROM
clause):
- make queries more readable
- rename relations that are used in the
FROM
clause (this is useful especially when we have two instances of the same relation)
SELECT S.sID, sName, GPA, A.cName, enrollment
FROM Student S, College C, Apply A # name as S, C, and A
WHERE A.sID = S.sID AND A.cName = C.cName;
SELECT S1.sID, S1.sName, S1.GPA, S2.sID, S2.sName, S2.GPA
FROM Student S1, Student S2
WHERE S1.GPA = S2.GPA AND S1.sID < S2.sID;
# S1.sID < S2.sID ensures that it does not have duplicate tuple (A, B) and (B, A)
Set Operators
ex: UNION
, INTERSECT
, EXCEPT
.
Union
The
UNION
operator combines result sets of two or moreSELECT
statements into a single result set.
The nmber and the order of columns in the select list must be the same.
SELECT cName AS name FROM College # rename 'AS' name
UNION ALL
SELECT sName AS name FROM Student
ORDER BY name;
UNION
operator in SQL, by default, eliminates duplicates.- If we want to have duplicates, we can have
UNION ALL
instead ofUNION
.
Intersect
The
INTERSECT
operator returns any rows that are available in both result sets.
The nmber and the order of columns in the select list must be the same.
SELECT sID FROM Apply WHERE major = 'CS'
INTERSECT
SELECT sID FROM Apply WHERE major = 'EE';
# the above is similar to the bottom but has duplicates
SELECT A1.sID
FROM Apply A1, Apply A2
WHERE A1.sID = A2.sID AND A1.major = 'CS' AND A2.major = 'EE';
- Find intersection of data that applies to both queries (A INTERSECT B).
Except
The
EXCEPT
operator returns distinct rows from the first (left) query that are not in the output of the second (right) query.
The nmber and the order of columns in the select list must be the same.
# Find students who applied to CS but not to EE
SELECT sID FROM Apply WHERE major = 'CS'
EXCEPT
SELECT sID FROM Apply WHERE major = 'EE';
# the above is similar to the bottom but has duplicates and additional data
SELECT A1.sID
FROM Apply A1, Apply A2
WHERE A1.sID = A2.sID AND A1.major = 'CS' AND A2.major <> 'EE';
- Some databases don’t support
EXCEPT
.
Sub-queries in WHERE
clause
sub-queries
are nested SELECT
statements and they are powerful.
IN
We can use IN
operator to check if a values matches in a list of values.
value IN (value1, value2, ...)
# with a subquery
value IN (SELECT column_name FROM table_name);
IN
checks whether a specified value matches any value in the sub-query.
# IDs and names of students who have applied to major in CS at some college
SELECT sID, sName
FROM Student
WHERE sID IN (SELECT sID FROM Apply WHERE major = 'CS');
# Equivalent SQL without sub-query
# We got more database rows compared to the above query
# due to duplicate values (a student applies to multiple college)
# To remove duplicates, we can use 'DISTINCT'
SELECT sID, sName
FROM Student, Apply # Join
WHERE Student.sID = Apply.sID and major = 'CS'; # Join condition
SELECT sName
FROM Student
WHERE sID in (SELECT sID FROM Apply WHERE major = 'CS');
SELECT GPA
FROM Student
WHERE sID IN (SELECT sID from Apply WHERE major = 'CS');
# Below use JOIN and DISTINCT but this may result in
# incorrect data
SELECT DISTINCT GPA
FROM Student, Apply
WHERE Student.sID = Apply.sID AND major = 'CS';
As we see in the above example, sometimes the only way to get the correct data we want is to use sub-query instead of JOINs.
# Find students who have applied to CS but have not applied
# to major in EE
SELECT sID, sName
FROM Student
WHERE sID IN (SELECT sID FROM Apply WHERE major = 'CS')
AND sID NOT IN (SELECT sID FROM Apply WHERE major = 'EE');
EXISTS / NOT EXISTS
It is fairly common to write equivalent SQL in different ways.
EXISTS
: checks whether the result of the sub-query returns any rowNOT EXISTS
: checks whether the result of the sub-query returns no row
SELECT cName, state
FROM College C1
WHERE EXISTS (SELECT * FROM College C2
WHERE C2.state = C1.state AND C1.cName <> C2.cName);
# EXISTS here checks whether the suq-query is empty or not
SELECT cName
FROM College C1
WHERE NOT EXISTS (SELECT * FROM College C2
WHERE C2.enrollment > C1.enrollment);
SELECT sName, GPA
FROM Student S1
WHERE NOT EXISTS (SELECT * FROM Student S2
WHERE S2.GPA > S1.GPA);
# This query may be empty if there are multiple students who have
# the highest GPA (equal GPA)
SELECT S1.sName, S1.GPA
FROM Student S1, Student S2
WHERE S1.GPA > S2.GPA;
# This query is similar to the above one with sub-query but this
# gets much larger database rows, and this is not equivalent to
# finding students with the highest GPA
# In fact, this query finds students except those with the lowest
# GPA.
ALL
ALL
compares a scalar value with a single-column set of values returned by a sub-query. ALL
returns TRUE
if the scalar value evaluates to TRUE
with all value in the single column set result.
SELECT sName, GPA
FROM Student
WHERE GPA >= ALL (SELECT GPA FROM Student);
# ALL checks whether GPA is greater than or equal to sub-query
SELECT sName
FROM Student S1
WHERE GPA > ALL (SELECT GPA FROM Student S2
WHERE S2.sID <> S1.sID);
# We got empty result. Remember that we have four students with
# the highest GPA.This query would work if all students have
# unique GPA.
ANY
ANY
compares a scalar value with a single-column set of values returned by a sub-query. ANY
returns TRUE
if any scalar value evaluates to TRUE
with any value in the single set result.
SELECT cName
FROM College C1
WHERE NOT enrollment <= ANY (SELECT enrollment FROM College C2
WHERE C2.cName <> C1.cName);
# ANY is checking if enrollment satisfies at least one element
# of the sub-query.
SELECT sID, sName, sizeHS
FROM Student
WHERE sizeHS > ANY (SELECT sizeHS FROM Student);
# This query finds students who are not from the smallest
# high school in the database
Some database systems such as SQLite do not support ANY
or ALL
operators. In such cases, we can write equivalent queries by using EXISTS
or NOT EXISTS
.
SELECT sID, sName, sizeHS
FROM Student S1
WHERE EXISTS (SELECT * FROM Student S2
WHERE S2.sizeHS < S1.sizeHS);
# This query finds students who have applied to CS but have not
# applied to EE.
SELECT sID, sName
FROM Student
WHERE sID = ANY (SELECT sID FROM Apply WHERE major = 'CS')
AND not sID = ANY (SELECT sID FROM Apply WHERE major = 'EE');
# Not (there is at least one...) == There is no one that..
LIKE
The LIKE
operator can be used for pattern matching.
When there is no wildcard, LIKE
operator works like =
equal operator.
You construct a pattern by combining literal values with wildcard characters and use the
LIKE
orNOT LIKE
operator to find the matches.
Wildcards:
- % (percent): match any sequence of zero or more characters
- _ (underscore): any single character
# true if the value matches the pattern
value LIKE pattern
# true if the value does not match the pattern
value NOT LIKE pattern
ILIKE
- If using PostgreSQL, it supports
ILIKE
operator that matches value case-insensitively.
BETWEEN
We can use the BETWEEN
operator to match a value against a range of values.
# returns true if value is greater than or equal to low
# AND if value is less than or equal to high
value BETWEEN low AND high;