SQL: SELECT, Set Operators, Sub-queries in WHERE clause

created:

updated:

tags: database sql

I started taking a free online course on introduction to database (Standford’s professor Jennifer Widom). The course is going over basic terminology of database and DBMS and it focuses on SQL for the most of the course. I’d like to write down basic SQL that I’m learning from this course.

Steps in creating and using a relational database

  1. Design schema; create using DDL
  2. Bulk load initial data
  3. Repeat: execute queries and modifications

Query Language

Query language not only queries data but also can modify data.

  • Query languages are compositional, meaning when you run a query over relations, you get a relation as a result
  • ex: Relational algebra, SQL (founded on top of relational algebra)
# IDs of students with GPA > 3.7 applying to Stanford
SELECT Student.ID
FROM Student, Apply
WHERE Student.ID = Apply.ID
AND GPA > 3.7 and college = 'Stanford';

Query Optimizer

Query optimizer in DBMS is extremely important as it takes queries in SQL and figures out the fastest way to execute that on the database.

Terminology

DDL

Data Definition Language. CREATE TABLE..., DROP TABLE...

DML

Data Manipulation Language. SELECT, INSERT, DELETE, UPDATE

Others

indexes, constraints, views, triggers, transactions, authorization, …

Create Database Table

CREATE TABLE Student(ID, name, GPA, photo)

# Create table with each attribute name with its type
CREATE TABLE College(name string, state char(2), enrollment integer)

CREATE TABLE employee (
  employee_id INT PRIMARY KEY,
  first_name VARCHAR (255) NOT NULL,
  last_name VARCHAR (255) NOT NULL,
  manager_id INT,
  FOREIGN KEY (manager_id) REFERENCES employee (employee_id)
  ON DELETE CASCADE
);

Basic SELECT

SELECT A1, A2, ..., An  # what to return
FROM R1, R2, ..., Rm  # relations
WHERE condition  # combine filter

SELECT DISTINCT sName, major  # unique row
FROM Student, Apply  # natural join
WHERE Student.sID = Apply.sID;  # join condition

SELECT cName  # ambiguous select because there are cName in both College and Apply. We can do College.cName
FROM College, Apply
WHERE College.cName = Apply.cName
AND enrollment > 20000 AND major = 'CS';

SELECT Student.sID, sName, GPA, Apply.cName, enrollment
FROM Student, College, Apply
WHERE Apply.sID = Student.sID AND Apply.cName = College.cName
ORDER BY GPA DESC, enrollment;  # ASCENDING is default

SELECT sID, major
FROM Apply
WHERE major LIKE '%bio%';  # pattern match string

SELECT *  # Get all attributes
FROM Student, College;

SELECT sID, sName, GPA, sizeHS, GPA * (sizeHS/1000.0) AS scaledGPA # Adds a new column
FROM Student;

ORDER BY

When there are more than one ORDER BY clause, then PostgreSQL (example) sorts rows by the first clause first.

SELECT first_name, last_name
FROM customer
ORDER BY first_name ASC, last_name DESC;

# sorts by first_name first, and then sorts by last_name

Table Variables

Purpose of table queries (FROM clause):

  • make queries more readable
  • rename relations that are used in the FROM clause (this is useful especially when we have two instances of the same relation)
SELECT S.sID, sName, GPA, A.cName, enrollment
FROM Student S, College C, Apply A  # name as S, C, and A
WHERE A.sID = S.sID AND A.cName = C.cName;

SELECT S1.sID, S1.sName, S1.GPA, S2.sID, S2.sName, S2.GPA
FROM Student S1, Student S2
WHERE S1.GPA = S2.GPA AND S1.sID < S2.sID;  
# S1.sID < S2.sID ensures that it does not have duplicate tuple (A, B) and (B, A)

Set Operators

ex: UNION, INTERSECT, EXCEPT.

Union

The UNION operator combines result sets of two or more SELECT statements into a single result set.

The nmber and the order of columns in the select list must be the same.

SELECT cName AS name FROM College  # rename 'AS' name
UNION ALL
SELECT sName AS name FROM Student
ORDER BY name;
  • UNION operator in SQL, by default, eliminates duplicates.
  • If we want to have duplicates, we can have UNION ALL instead of UNION.

Intersect

The INTERSECT operator returns any rows that are available in both result sets.

The nmber and the order of columns in the select list must be the same.

SELECT sID FROM Apply WHERE major = 'CS'
INTERSECT
SELECT sID FROM Apply WHERE major = 'EE';

# the above is similar to the bottom but has duplicates

SELECT A1.sID
FROM Apply A1, Apply A2
WHERE A1.sID = A2.sID AND A1.major = 'CS' AND A2.major = 'EE';
  • Find intersection of data that applies to both queries (A INTERSECT B).

Except

The EXCEPT operator returns distinct rows from the first (left) query that are not in the output of the second (right) query.

The nmber and the order of columns in the select list must be the same.

# Find students who applied to CS but not to EE
SELECT sID FROM Apply WHERE major = 'CS'
EXCEPT
SELECT sID FROM Apply WHERE major = 'EE';

# the above is similar to the bottom but has duplicates and additional data

SELECT A1.sID
FROM Apply A1, Apply A2
WHERE A1.sID = A2.sID AND A1.major = 'CS' AND A2.major <> 'EE';
  • Some databases don’t support EXCEPT.

Sub-queries in WHERE clause

sub-queries are nested SELECT statements and they are powerful.

IN

We can use IN operator to check if a values matches in a list of values.

value IN (value1, value2, ...)
# with a subquery
value IN (SELECT column_name FROM table_name);

IN checks whether a specified value matches any value in the sub-query.

# IDs and names of students who have applied to major in CS at some college
SELECT sID, sName
FROM Student
WHERE sID IN (SELECT sID FROM Apply WHERE major = 'CS');

# Equivalent SQL without sub-query
# We got more database rows compared to the above query
# due to duplicate values (a student applies to multiple college)
# To remove duplicates, we can use 'DISTINCT'
SELECT sID, sName
FROM Student, Apply  # Join
WHERE Student.sID = Apply.sID and major = 'CS';  # Join condition

SELECT sName
FROM Student
WHERE sID in (SELECT sID FROM Apply WHERE major = 'CS');

SELECT GPA
FROM Student
WHERE sID IN (SELECT sID from Apply WHERE major = 'CS');

# Below use JOIN and DISTINCT but this may result in
# incorrect data
SELECT DISTINCT GPA
FROM Student, Apply
WHERE Student.sID = Apply.sID AND major = 'CS';

As we see in the above example, sometimes the only way to get the correct data we want is to use sub-query instead of JOINs.

# Find students who have applied to CS but have not applied
# to major in EE
SELECT sID, sName
FROM Student
WHERE sID IN (SELECT sID FROM Apply WHERE major = 'CS')
  AND sID NOT IN (SELECT sID FROM Apply WHERE major = 'EE');

EXISTS / NOT EXISTS

It is fairly common to write equivalent SQL in different ways.

  • EXISTS: checks whether the result of the sub-query returns any row
  • NOT EXISTS: checks whether the result of the sub-query returns no row
SELECT cName, state
FROM College C1
WHERE EXISTS (SELECT * FROM College C2
              WHERE C2.state = C1.state AND C1.cName <> C2.cName);
# EXISTS here checks whether the suq-query is empty or not

SELECT cName
FROM College C1
WHERE NOT EXISTS (SELECT * FROM College C2
                  WHERE C2.enrollment > C1.enrollment);

SELECT sName, GPA
FROM Student S1
WHERE NOT EXISTS (SELECT * FROM Student S2
                  WHERE S2.GPA > S1.GPA);
# This query may be empty if there are multiple students who have
# the highest GPA (equal GPA)

SELECT S1.sName, S1.GPA
FROM Student S1, Student S2
WHERE S1.GPA > S2.GPA;
# This query is similar to the above one with sub-query but this
# gets much larger database rows, and this is not equivalent to
# finding students with the highest GPA
# In fact, this query finds students except those with the lowest
# GPA.

ALL

ALL compares a scalar value with a single-column set of values returned by a sub-query. ALL returns TRUE if the scalar value evaluates to TRUE with all value in the single column set result.

SELECT sName, GPA
FROM Student
WHERE GPA >= ALL (SELECT GPA FROM Student);
# ALL checks whether GPA is greater than or equal to sub-query

SELECT sName
FROM Student S1
WHERE GPA > ALL (SELECT GPA FROM Student S2
                 WHERE S2.sID <> S1.sID);
# We got empty result. Remember that we have four students with
# the highest GPA.This query would work if all students have
# unique GPA.

ANY

ANY compares a scalar value with a single-column set of values returned by a sub-query. ANY returns TRUE if any scalar value evaluates to TRUE with any value in the single set result.

SELECT cName
FROM College C1
WHERE NOT enrollment <= ANY (SELECT enrollment FROM College C2
                        WHERE C2.cName <> C1.cName);
# ANY is checking if enrollment satisfies at least one element
# of the sub-query.

SELECT sID, sName, sizeHS
FROM Student
WHERE sizeHS > ANY (SELECT sizeHS FROM Student);
# This query finds students who are not from the smallest
# high school in the database

Some database systems such as SQLite do not support ANY or ALL operators. In such cases, we can write equivalent queries by using EXISTS or NOT EXISTS.

SELECT sID, sName, sizeHS
FROM Student S1
WHERE EXISTS (SELECT * FROM Student S2
              WHERE S2.sizeHS < S1.sizeHS);

# This query finds students who have applied to CS but have not
# applied to EE.
SELECT sID, sName
FROM Student
WHERE sID = ANY (SELECT sID FROM Apply WHERE major = 'CS')
  AND not sID = ANY (SELECT sID FROM Apply WHERE major = 'EE');
# Not (there is at least one...) == There is no one that..

LIKE

The LIKE operator can be used for pattern matching.

When there is no wildcard, LIKE operator works like = equal operator.

You construct a pattern by combining literal values with wildcard characters and use the LIKE or NOT LIKE operator to find the matches.

Wildcards:

  • % (percent): match any sequence of zero or more characters
  • _ (underscore): any single character
# true if the value matches the pattern
value LIKE pattern
# true if the value does not match the pattern
value NOT LIKE pattern

ILIKE

  • If using PostgreSQL, it supports ILIKE operator that matches value case-insensitively.

BETWEEN

We can use the BETWEEN operator to match a value against a range of values.

# returns true if value is greater than or equal to low
# AND if value is less than or equal to high 
value BETWEEN low AND high;

References