JOIN clause in Phoenix
Overview
Phoenix fully supports the standard SQL JOIN
syntax. It allows combining records from different tables if they have common values.
In this article, three example tables with the following columns are used.
Table | Column | Description |
---|---|---|
states |
name |
State name |
abbr |
State abbreviation |
|
capitol |
State capital |
|
largest |
State largest city |
|
enterprises |
name |
Enterprise name |
country |
Country name where the enterprise is located |
|
state |
State abbreviation (in USA) or region name where the enterprise is located |
|
town |
Town name where the enterprise is located |
|
people |
name |
Name of the person |
age |
Age of the person |
|
town |
Town name where the person lives |
|
state |
State abbreviation (in USA) or region name where the person lives |
|
country |
Country name where the person lives |
To be able to follow examples given in this article, you need to create those tables in your Phoenix environment. Do the following:
-
Download the CSV files with sample data: people.csv, enterprises.csv, and states.csv.
-
Put the downloaded files on the filesystem of the host to which the Phoenix Query Server component is mapped, to the directory of your choice.
-
Connect to this host and load the Phoenix shell by executing the following command:
$ /usr/lib/phoenix/bin/sqlline.py
-
Create the empty tables by executing the following commands:
CREATE TABLE enterprises ( name VARCHAR(50) NOT NULL, country VARCHAR(50), state VARCHAR(50), town VARCHAR(50) CONSTRAINT pk PRIMARY KEY (name));
CREATE TABLE people ( name VARCHAR(50) NOT NULL, age SMALLINT, town VARCHAR(50), state VARCHAR(50), country VARCHAR(50) CONSTRAINT pk PRIMARY KEY (name));
CREATE TABLE states ( name VARCHAR(50) NOT NULL, abbr VARCHAR(50), capitol VARCHAR(50), largest VARCHAR(50) CONSTRAINT pk PRIMARY KEY (name));
-
Exit the shell by pressing
Ctrl+D
and load the data from the CSV files into the tables by executing the following commands:$ /usr/lib/phoenix/bin/psql.py /<csv_dir>/people.csv $ /usr/lib/phoenix/bin/psql.py /<csv_dir>/enterprises.csv $ /usr/lib/phoenix/bin/psql.py /<csv_dir>/states.csv
where <csv_dir>
is the path to the directory where the downloaded CSV files are located.
JOIN types
INNER JOIN
The INNER JOIN
takes each row in the first table and searches for a row in the second table that matches the specified condition. If such a row is found, the INNER JOIN
composes a row with values of the columns specified in the query. This row is added to the result and INNER JOIN
proceeds.
This joining mode is used by default, so you may omit the INNER
modifier in the query text.
For example, to find all american enterprises along with their hometowns and the respective state capitols, you need the following query:
SELECT e.name, e.town, s.capitol
FROM states AS s
INNER JOIN enterprises AS e
ON s.abbr=e.state;
Result:
+------------------+---------------+-------------+ | E.NAME | E.TOWN | S.CAPITOL | +------------------+---------------+-------------+ | Rockstar | New York City | Albany | | Bethesda | Rockville | Annapolis | | DigitalAnvil | Austin | Austin | | Shaw Group | Baton Rouge | Baton Rouge | | Microsoft | Redmond | Olympia | | Avnet | Phoenix | Phoenix | | Apple | Cupertino | Sacramento | | Obsidian | Irvine | Sacramento | | Sierra | Los Angeles | Sacramento | | Intel | Sacramento | Sacramento | | Northrop Grumman | Albuquerque | Santa Fe | +------------------+---------------+-------------+
LEFT/RIGHT OUTER JOIN
The LEFT OUTER JOIN
works like INNER JOIN
, but adds values from the first table anyway, regardless of whether the specified condition is satisfied.
For example, to list all the enterprises along with their hometowns, and also the state capitols for those located in the USA, you need the following query:
SELECT e.name, e.town, s.capitol
FROM enterprises AS e
LEFT OUTER JOIN states AS s
ON e.state=s.abbr;
Result:
+------------------+---------------+-------------+ | E.NAME | E.TOWN | S.CAPITOL | +------------------+---------------+-------------+ | Northrop Grumman | Albuquerque | Santa Fe | | DigitalAnvil | Austin | Austin | | Shaw Group | Baton Rouge | Baton Rouge | | Apple | Cupertino | Sacramento | | Obsidian | Irvine | Sacramento | | Eidos | London | | | Sierra | Los Angeles | Sacramento | | Alluxi | Monterrey | | | Rockstar | New York City | Albany | | Empresa | Nogales | | | Avnet | Phoenix | Phoenix | | Microsoft | Redmond | Olympia | | Bethesda | Rockville | Annapolis | | Intel | Sacramento | Sacramento | | Ubisoft | Saint-Mande | | +------------------+---------------+-------------+
The RIGHT OUTER JOIN
does the same, but in regard to the second table instead of the first.
For example, to list all the USA states along with their capitols, and also the enterprises that may be located in those states along with their hometowns, you need the following query:
SELECT s.name, s.capitol, e.name, e.town
FROM enterprises AS e
RIGHT OUTER JOIN states AS s
ON e.state=s.abbr
ORDER BY s.name
LIMIT 20;
Result:
+-------------+--------------+----------+-------------+ | S.NAME | S.CAPITOL | E.NAME | E.TOWN | +-------------+--------------+----------+-------------+ | Alabama | Montgomery | | | | Alaska | Juneau | | | | Arizona | Phoenix | Avnet | Phoenix | | Arkansas | Little Rock | | | | California | Sacramento | Apple | Cupertino | | California | Sacramento | Obsidian | Irvine | | California | Sacramento | Sierra | Los Angeles | | California | Sacramento | Intel | Sacramento | | Colorado | Denver | | | | Connecticut | Hartford | | | | Delaware | Dover | | | | Florida | Tallahassee | | | | Georgia | Atlanta | | | | Hawaii | Honolulu | | | | Idaho | Boise | | | | Illinois | Springfield | | | | Indiana | Indianapolis | | | | Iowa | Des Moines | | | | Kansas | Topeka | | | | Kentucky | Frankfort | | | +-------------+--------------+----------+-------------+
The OUTER
modifier is implied with the LEFT
and RIGHT
modifiers, so it can be omitted.
FULL OUTER JOIN
The FULL OUTER JOIN
unites the LEFT JOIN
and the RIGHT JOIN
: it combines the rows that satisfy the specified condition with the rows that do not from both tables.
Example:
SELECT e.name, e.town, s.capitol
FROM enterprises AS e
FULL OUTER JOIN states AS s
ON e.state=s.abbr
LIMIT 20;
Result:
+------------------+---------------+----------------+ | E.NAME | E.TOWN | S.CAPITOL | +------------------+---------------+----------------+ | | | Juneau | | | | Montgomery | | | | Little Rock | | Avnet | Phoenix | Phoenix | | Apple | Cupertino | Sacramento | | Obsidian | Irvine | Sacramento | | Intel | Sacramento | Sacramento | | Sierra | Los Angeles | Sacramento | | | | Denver | | | | Hartford | | | | Dover | | Eidos | London | | | | | Tallahassee | | | | Atlanta | | | | Honolulu | | | | Des Moines | | | | Boise | | | | Springfield | | | | Indianapolis | | Ubisoft | Saint-Mande | | +------------------+---------------+----------------+
The OUTER
modifier is implied with the FULL
modifier, so it can be omitted.
SELF JOIN
You can join a table with itself (e.g., to filter it). For example, to list the states the capitols of which are also their largest cities, you need the following query:
SELECT s.name, s.capitol
FROM states AS s
JOIN states AS ss
ON s.capitol=ss.largest;
Result:
+---------------+----------------+ | S.NAME | S.CAPITOL | +---------------+----------------+ | Arizona | Phoenix | | Arkansas | Little Rock | | Colorado | Denver | | Georgia | Atlanta | | Hawaii | Honolulu | | Idaho | Boise | | Indiana | Indianapolis | | Iowa | Des Moines | | Massachusetts | Boston | | Mississippi | Jackson | | Ohio | Columbus | | Oklahoma | Oklahoma City | | Rhode Island | Providence | | Tennessee | Nashville | | Utah | Salt Lake City | | West Virginia | Charleston | | West Virginia | Charleston | | Wyoming | Cheyenne | +---------------+----------------+
CROSS JOIN
Normally, there is not much use for CROSS JOIN
in Phoenix as the tables are usually very big. For that reason, Phoenix ignores the CROSS
modifier. However, cross joins are still possible in Phoenix and employ the following syntax:
SELECT <col_1>, <col_2>, ... <col_n>
FROM t1, t2;
GROUP JOIN
Phoenix supports complex JOIN
syntax. The result of joining two tables can be joined with another table using parenthesis. Also, instead of any table in a JOIN
clause, you can use a subquery (derived table), which in turn can also contain its own JOIN
clause.
For example, to list the people who live in the capitols of the states where there is also an enterprise located, you need the following query:
SELECT p.name, e.name, s.capitol
FROM people AS p
JOIN
(enterprises AS e
JOIN states AS s
ON e.town=s.capitol)
ON p.town=e.town
LIMIT 20;
Result:
+------------------+--------------+-------------+ | P.NAME | E.NAME | S.CAPITOL | +------------------+--------------+-------------+ | Abbott Howard | Shaw Group | Baton Rouge | | Alvarado Melvin | DigitalAnvil | Austin | | Baker Daniel | Shaw Group | Baton Rouge | | Barnes Mollie | Shaw Group | Baton Rouge | | Becker Samuel | Shaw Group | Baton Rouge | | Bush Cameron | Shaw Group | Baton Rouge | | Carpenter Carrie | Avnet | Phoenix | | Clarke Joe | Avnet | Phoenix | | Cox Dennis | Intel | Sacramento | | Cox Shawn | Avnet | Phoenix | | Daniel Ada | DigitalAnvil | Austin | | Day Hattie | Intel | Sacramento | | Erickson Mayme | Intel | Sacramento | | Evans Sophie | Avnet | Phoenix | | Farmer Dean | Shaw Group | Baton Rouge | | Fisher Gordon | Intel | Sacramento | | Fletcher Julia | Avnet | Phoenix | | Garner Juan | Shaw Group | Baton Rouge | | Gill Josephine | Shaw Group | Baton Rouge | | Gonzales Edna | Shaw Group | Baton Rouge | +------------------+--------------+-------------+
Exclusion of INNER part from OUTER JOIN
Sometimes you might want to exclude the rows that satisfy the specified condition from the JOIN
result. To that end, you need to add a WHERE
clause to the query.
For example, to list all the enterprises outside of USA, you need the following query:
SELECT e.name, e.country, e.state, e.town
FROM enterprises AS e
LEFT JOIN states AS s
ON e.state=s.abbr
WHERE s.abbr IS NULL;
Result:
+---------+-----------+---------------+-------------+ | E.NAME | E.COUNTRY | E.STATE | E.TOWN | +---------+-----------+---------------+-------------+ | Alluxi | Mexico | Nuevo Leon | Monterrey | | Eidos | UK | England | London | | Empresa | Mexico | Sonora | Nogales | | Ubisoft | France | Ile-de-France | Saint-Mande | +---------+-----------+---------------+-------------+
JOIN with indices
You can create secondary indices for your tables which will be used automatically for JOIN
queries. Example:
CREATE INDEX ipeople ON people (town) INCLUDE (state);
CREATE INDEX istates ON states (capitol) INCLUDE (name, abbr);
CREATE INDEX ienterprises ON enterprises (town) INCLUDE (name, state);
These indices take precedence when running certain queries. Example:
EXPLAIN SELECT p.name, e.name, s.capitol
FROM people AS p
JOIN
(enterprises AS e
JOIN states AS s
ON e.town=s.capitol)
ON p.town=e.town;
Result:
+----------------------------------------------------------------------------------+----------------+---------------+-------------+ | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | +----------------------------------------------------------------------------------+----------------+---------------+-------------+ | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER IPEOPLE | null | null | null | | SERVER FILTER BY FIRST KEY ONLY | null | null | null | | PARALLEL INNER-JOIN TABLE 0 | null | null | null | | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER IENTERPRISES | null | null | null | | SERVER FILTER BY FIRST KEY ONLY | null | null | null | | PARALLEL INNER-JOIN TABLE 0 | null | null | null | | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER ISTATES | null | null | null | | SERVER FILTER BY FIRST KEY ONLY | null | null | null | | DYNAMIC SERVER FILTER BY "P.0:TOWN" IN ("E.LOCATION:TOWN") | null | null | null | +----------------------------------------------------------------------------------+----------------+---------------+-------------+
As you can see, for this particular query Phoenix uses all three previously created indices rather than original tables.
Hints
Phoenix uses hints to switch between different JOIN
ways of operation.
Hash and sort-merge joins
Phoenix employs two types of JOIN
algorithms: hash and sort-merge. Hash JOIN
is the fastest algorithm, so it is used by default. However, it cannot be used if the total size of the tables is too large and exceeds the available memory. In this case, you have to use the sort-merge JOIN
algorithm by means of the /*+ USE_SORT_MERGE_JOIN */
hint in queries:
SELECT /*+ USE_SORT_MERGE_JOIN */ <values>
FROM <table>
JOIN <clause_condition>
Star join optimization
Phoenix uses star join optimization for queries that have multiple JOIN
clauses. For example, your query has the following structure:
SELECT p.name, s.abbr, e.name
FROM people AS p
JOIN states AS s
ON p.state=s.abbr
JOIN enterprises AS e
ON p.state=e.state;
When processing such query, Phoenix will load all the tables involved into the RAM and perform both JOIN
clauses simultaneously. This can be seen in the query plan, which you can get by using the EXPLAIN
operator:
EXPLAIN SELECT p.name, s.abbr, e.name
FROM people AS p
JOIN states AS s
ON p.state=s.abbr
JOIN enterprises AS e
ON p.state=e.state;
Result:
+----------------------------------------------------------------------------------------------------------+----------------+---------------+--------+ | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_IN | +----------------------------------------------------------------------------------------------------------+----------------+---------------+--------+ | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER PEOPLE | null | null | null | | PARALLEL INNER-JOIN TABLE 0 | null | null | null | | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER STATES | null | null | null | | PARALLEL INNER-JOIN TABLE 1 | null | null | null | | CLIENT 1-CHUNK 976934 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER ENTERPRISES | null | null | null | +----------------------------------------------------------------------------------------------------------+----------------+---------------+--------+
Sometimes, the tables can be very big to be loaded into the RAM at the same time. In such cases, you need to turn the star join optimization off by using the /+ NO_STAR_JOIN/
hint in the query:
EXPLAIN SELECT /*+ NO_STAR_JOIN*/ p.name, s.abbr, e.name
FROM people AS p
JOIN states AS s
ON p.state=s.abbr
JOIN enterprises AS e
ON p.state=e.state;
Result:
+--------------------------------------------------------------------------------------------------+----------------+---------------+-------------+ | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | +--------------------------------------------------------------------------------------------------+----------------+---------------+-------------+ | CLIENT 1-CHUNK 976934 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER ENTERPRISES | null | null | null | | PARALLEL INNER-JOIN TABLE 0 | null | null | null | | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER PEOPLE | null | null | null | | PARALLEL INNER-JOIN TABLE 0 | null | null | null | | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER STATES | null | null | null | +--------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
As can be seen, in this case, the JOIN
operations are performed one by one.