Skip to content

8.2 Flight analysis in SQL

This lesson aims to introduce students to data analysis and data management using SQL.

Data

We will work with three CSV files introduced in Lesson 5. Full dataset is available at https://geo.fsv.cvut.cz/courses/155isdp/data/08/.

We will import all three files into SQLite database using DB Browser for SQLite. First create a new database (skip enter new table definition). A CSV file can be imported into database using Import -> Table from CSV file.

Import data

Note

Don't forget to write changes into database.

Warm-up exercise

select count(*) from airports;
select count(*) from airlines;
select count(*) from flights;
select * from airports limit 1;
select * from airports where state = 'PA';
select * from airports where airport like '%Regional%';
select (
 (select count(*) from airports where airport like '%Regional%') /
 (select cast(count(*) as float) from airports)) * 100;
select * from airlines limit 1;
select * from airlines where airline like '%america%;
select * from flights limit 1;
select count(*) from flights WHERE origin_airport = 'ANC';
select * from flights WHERE origin_airport = 'ANC' ORDER BY DESTINATION_AIRPORT;
select month,count(*) from flights group by month;
select month,count(*) as count from flights group by month having origin_airport = 'ANC' order by count desc;

Indexing data

Compare query plan with/without index created on columns origin_airport and month.

explain query plan select count(*) from flights WHERE origin_airport = 'ANC';
explain query plan select month,count(*) as count from flights group by month having origin_airport = 'ANC' order by count desc;

Indeces are created by CREATE INDEX SQL statement:

create index flights_origin_airport on flights (ORIGIN_AIRPORT);
create index flights_month on flights (month);

Investigate:

select count(*) as count from flights where origin_airport = 'ANC' and month = 1;

Note that an index may be created on multiple columns:

create index flights_origin_airport_month on flights (origin_airport, month);

Views

View is created by CREATE VIEW statement.

create view flights_anc as select * from flights where origin_airport = 'ANC';
select * from flights_anc;

Views query plan is affected by indeces. Compare query plan below with and without index on column destination_airport.

create view flights_anc_all as select * from flights where origin_airport = 'ANC' or destination_airport = 'ANC';
explain query plan select * from flights_anc_all;
create index flights_destination_airport on flights (destination_airport);

Joins

select flight_number, day, month, year, a.airline from flights as f
 join airlines as a on f.airline = a.iata_code;

Compare performance with indeces defined:

create index airlines_code on airlines (iata_code);
create index flights_airline on flights (airline);
create index airports_code on airports (iata_code);

select flight_number, day, month, year, a.airline,
 ao.airport as origin_airport, ao.city as origin_city, ao.state as origin_state,
 ad.airport as destination_airport, ad.city as destination_city, ad.state as destination_state
 from flights as f
 join airlines as a on f.airline = a.iata_code
 join airports as ao on f.origin_airport = ao.iata_code
 join airports as ad on f.destination_airport = ad.iata_code;

Join Types

SQL defines three major types of joins:

  • INNER JOIN (or JOIN)
  • OUTER JOIN (LEFT, RIGHT, FULL)
  • CROSS JOIN

Join types Source: https://www.geeksforgeeks.org/python-sqlite-join-clause/

Let's create view with flights on 1/1/2015 from Ted Stevens Anchorage International Airport to Seattle-Tacoma International Airport:

create view flights_anc_sea_1_1_2015 as select * from flights
 where day = 1 and month = 1 and year = 2015 and origin_airport = 'ANC' and destination_airport = 'SEA';

JOIN statement:

select flight_number, day, month, year, a.airline from flights_anc_sea_1_1_2015 as f
 join airlines as a on f.airline = a.iata_code;

The statement gives the same result as INNER JOIN:

select flight_number, day, month, year, a.airline from flights_anc_sea_1_1_2015 as f
 inner join airlines as a on f.airline = a.iata_code;

Compare with LEFT/RIGHT/FULL (OUTER) JOIN:

select flight_number, day, month, year, a.airline from flights_anc_sea_1_1_2015 as f
 left join airlines as a on f.airline = a.iata_code;

select flight_number, day, month, year, a.airline from flights_anc_sea_1_1_2015 as f
 right join airlines as a on f.airline = a.iata_code; 

select flight_number, day, month, year, a.airline from flights_anc_sea_1_1_2015 as f
 full join airlines as a on f.airline = a.iata_code; 

Note

RIGHT/FULL OUTER JOIN is supported by newer SQLite version. In older version RIGHT OUTER JOIN may be replaced by LEFT OUTER JOIN and tables switched:

select flight_number, day, month, year, a.airline from airlines as a
left join flights_anc_sea_1_1_2015 as f on f.airline = a.iata_code; 

FULL OUTER JOIN may be replaced by combination of LEFT OUTER JOIN and UNION:

select flight_number, day, month, year, a.airline from flights_anc_sea_1_1_2015 as f
left join airlines as a on f.airline = a.iata_code
union 
select flight_number, day, month, year, a.airline from airlines as a
left join flights_anc_sea_1_1_2015 as f on f.airline = a.iata_code; 

Delete Alaska Airlines record and try to perform INNER and OUTER JOINs. Compare results with previous run.

delete from airlines where iata_code = 'AS';

CROSS JOIN

CROSS JOIN is a very special thing that is not used very often because it allows you to combine attributes without considering their relations. Eg. you have 5 airlines and you have 5 airports and you want all their combinations. CROSS JOIN is a cartesian product. The result is therefore 25 rows = 5 x 5.

with five_airports as (select airport from airports limit 5)
select airport, airline from five_airports as f 
cross join (select airline from airlines a limit 5)
order by airport, airline;

Primary and Foreign keys

Note

SQLite3 doesn't support adding primary and foreign keys by ALTER TABLE statement.

CREATE TABLE "airlines_pk" (
    "IATA_CODE" TEXT PRIMARY KEY,
    "AIRLINE"   TEXT
);
INSERT INTO airlines_pk select * from airlines;

CREATE TABLE "airports_pk" (
    "IATA_CODE" TEXT PRIMARY KEY,
    "AIRPORT"   TEXT,
    "CITY"  TEXT,
    "STATE" TEXT,
    "COUNTRY"   TEXT,
    "LATITUDE"  REAL,
    "LONGITUDE" REAL
);
INSERT INTO airports_pk select * from airports;

CREATE TABLE "flights_pk" (
    "YEAR"  INTEGER,
    "MONTH" INTEGER,
    "DAY"   INTEGER,
    "DAY_OF_WEEK"   INTEGER,
    "AIRLINE"   TEXT,
    "FLIGHT_NUMBER" INTEGER, 
    "TAIL_NUMBER"   TEXT,
    "ORIGIN_AIRPORT"    TEXT,
    "DESTINATION_AIRPORT"   TEXT,
    "SCHEDULED_DEPARTURE"   INTEGER,
    "DEPARTURE_TIME"    INTEGER,
    "DEPARTURE_DELAY"   INTEGER,
    "TAXI_OUT"  INTEGER,
    "WHEELS_OFF"    INTEGER,
    "SCHEDULED_TIME"    INTEGER,
    "ELAPSED_TIME"  INTEGER,
    "AIR_TIME"  INTEGER,
    "DISTANCE"  INTEGER,
    "WHEELS_ON" INTEGER,
    "TAXI_IN"   INTEGER,
    "SCHEDULED_ARRIVAL" INTEGER,
    "ARRIVAL_TIME"  INTEGER,
    "ARRIVAL_DELAY" INTEGER,
    "DIVERTED"  INTEGER,
    "CANCELLED" INTEGER,
    "CANCELLATION_REASON"   TEXT,
    "AIR_SYSTEM_DELAY"  TEXT,
    "SECURITY_DELAY"    TEXT,
    "AIRLINE_DELAY" TEXT,
    "LATE_AIRCRAFT_DELAY"   TEXT,
    "WEATHER_DELAY" TEXT,
    FOREIGN KEY (airline) REFERENCES airlines_pk (IATA_CODE)
    FOREIGN KEY (origin_airport) REFERENCES airports_pk (IATA_CODE)
    FOREIGN KEY (destination_airport) REFERENCES airports_pk (IATA_CODE)
    );
PRAGMA foreign_keys = OFF;
INSERT INTO flights_pk select * from flights;
PRAGMA foreign_keys = ON;

Foreign keys improves consistency check:

delete from airports_pk where iata_code = 'ANC';