JOIN clause in Phoenix

Overview

Phoenix fully supports the standard SQL JOIN syntax. It allows combining records from different tables if they have common values.

In this article, three example tables with the following columns are used.

Example tables
Table Column Description

states

name

State name

abbr

State abbreviation

capitol

State capital

largest

State largest city

enterprises

name

Enterprise name

country

Country name where the enterprise is located

state

State abbreviation (in USA) or region name where the enterprise is located

town

Town name where the enterprise is located

people

name

Name of the person

age

Age of the person

town

Town name where the person lives

state

State abbreviation (in USA) or region name where the person lives

country

Country name where the person lives

To be able to follow examples given in this article, you need to create those tables in your Phoenix environment. Do the following:

  1. Download the CSV files with sample data: people.csv, enterprises.csv, and states.csv.

  2. Put the downloaded files on the filesystem of the host to which the Phoenix Query Server component is mapped, to the directory of your choice.

  3. Connect to this host and load the Phoenix shell by executing the following command:

    $ /usr/lib/phoenix/bin/sqlline.py
  4. Create the empty tables by executing the following commands:

    CREATE TABLE enterprises (
    name VARCHAR(50) NOT NULL,
    country VARCHAR(50),
    state VARCHAR(50),
    town VARCHAR(50)
    CONSTRAINT pk PRIMARY KEY (name));
    CREATE TABLE people (
    name VARCHAR(50) NOT NULL,
    age SMALLINT,
    town VARCHAR(50),
    state VARCHAR(50),
    country VARCHAR(50)
    CONSTRAINT pk PRIMARY KEY (name));
    CREATE TABLE states (
    name VARCHAR(50) NOT NULL,
    abbr VARCHAR(50),
    capitol VARCHAR(50),
    largest VARCHAR(50)
    CONSTRAINT pk PRIMARY KEY (name));
  5. Exit the shell by pressing Ctrl+D and load the data from the CSV files into the tables by executing the following commands:

    $ /usr/lib/phoenix/bin/psql.py /<csv_dir>/people.csv
    $ /usr/lib/phoenix/bin/psql.py /<csv_dir>/enterprises.csv
    $ /usr/lib/phoenix/bin/psql.py /<csv_dir>/states.csv

where <csv_dir> is the path to the directory where the downloaded CSV files are located.

JOIN types

INNER JOIN

The INNER JOIN takes each row in the first table and searches for a row in the second table that matches the specified condition. If such a row is found, the INNER JOIN composes a row with values of the columns specified in the query. This row is added to the result and INNER JOIN proceeds.

This joining mode is used by default, so you may omit the INNER modifier in the query text.

For example, to find all american enterprises along with their hometowns and the respective state capitols, you need the following query:

SELECT e.name, e.town, s.capitol
FROM states AS s
INNER JOIN enterprises AS e
ON s.abbr=e.state;

Result:

+------------------+---------------+-------------+
|      E.NAME      |    E.TOWN     |  S.CAPITOL  |
+------------------+---------------+-------------+
| Rockstar         | New York City | Albany      |
| Bethesda         | Rockville     | Annapolis   |
| DigitalAnvil     | Austin        | Austin      |
| Shaw Group       | Baton Rouge   | Baton Rouge |
| Microsoft        | Redmond       | Olympia     |
| Avnet            | Phoenix       | Phoenix     |
| Apple            | Cupertino     | Sacramento  |
| Obsidian         | Irvine        | Sacramento  |
| Sierra           | Los Angeles   | Sacramento  |
| Intel            | Sacramento    | Sacramento  |
| Northrop Grumman | Albuquerque   | Santa Fe    |
+------------------+---------------+-------------+

LEFT/RIGHT OUTER JOIN

The LEFT OUTER JOIN works like INNER JOIN, but adds values from the first table anyway, regardless of whether the specified condition is satisfied.

For example, to list all the enterprises along with their hometowns, and also the state capitols for those located in the USA, you need the following query:

SELECT e.name, e.town, s.capitol
FROM enterprises AS e
LEFT OUTER JOIN states AS s
ON e.state=s.abbr;

Result:

+------------------+---------------+-------------+
|      E.NAME      |    E.TOWN     |  S.CAPITOL  |
+------------------+---------------+-------------+
| Northrop Grumman | Albuquerque   | Santa Fe    |
| DigitalAnvil     | Austin        | Austin      |
| Shaw Group       | Baton Rouge   | Baton Rouge |
| Apple            | Cupertino     | Sacramento  |
| Obsidian         | Irvine        | Sacramento  |
| Eidos            | London        |             |
| Sierra           | Los Angeles   | Sacramento  |
| Alluxi           | Monterrey     |             |
| Rockstar         | New York City | Albany      |
| Empresa          | Nogales       |             |
| Avnet            | Phoenix       | Phoenix     |
| Microsoft        | Redmond       | Olympia     |
| Bethesda         | Rockville     | Annapolis   |
| Intel            | Sacramento    | Sacramento  |
| Ubisoft          | Saint-Mande   |             |
+------------------+---------------+-------------+

The RIGHT OUTER JOIN does the same, but in regard to the second table instead of the first.

For example, to list all the USA states along with their capitols, and also the enterprises that may be located in those states along with their hometowns, you need the following query:

SELECT s.name, s.capitol, e.name, e.town
FROM enterprises AS e
RIGHT OUTER JOIN states AS s
ON e.state=s.abbr
ORDER BY s.name
LIMIT 20;

Result:

+-------------+--------------+----------+-------------+
|   S.NAME    |  S.CAPITOL   |  E.NAME  |   E.TOWN    |
+-------------+--------------+----------+-------------+
| Alabama     | Montgomery   |          |             |
| Alaska      | Juneau       |          |             |
| Arizona     | Phoenix      | Avnet    | Phoenix     |
| Arkansas    | Little Rock  |          |             |
| California  | Sacramento   | Apple    | Cupertino   |
| California  | Sacramento   | Obsidian | Irvine      |
| California  | Sacramento   | Sierra   | Los Angeles |
| California  | Sacramento   | Intel    | Sacramento  |
| Colorado    | Denver       |          |             |
| Connecticut | Hartford     |          |             |
| Delaware    | Dover        |          |             |
| Florida     | Tallahassee  |          |             |
| Georgia     | Atlanta      |          |             |
| Hawaii      | Honolulu     |          |             |
| Idaho       | Boise        |          |             |
| Illinois    | Springfield  |          |             |
| Indiana     | Indianapolis |          |             |
| Iowa        | Des Moines   |          |             |
| Kansas      | Topeka       |          |             |
| Kentucky    | Frankfort    |          |             |
+-------------+--------------+----------+-------------+

The OUTER modifier is implied with the LEFT and RIGHT modifiers, so it can be omitted.

FULL OUTER JOIN

The FULL OUTER JOIN unites the LEFT JOIN and the RIGHT JOIN: it combines the rows that satisfy the specified condition with the rows that do not from both tables.

Example:

SELECT e.name, e.town, s.capitol
FROM enterprises AS e
FULL OUTER JOIN states AS s
ON e.state=s.abbr
LIMIT 20;

Result:

+------------------+---------------+----------------+
|      E.NAME      |    E.TOWN     |   S.CAPITOL    |
+------------------+---------------+----------------+
|                  |               | Juneau         |
|                  |               | Montgomery     |
|                  |               | Little Rock    |
| Avnet            | Phoenix       | Phoenix        |
| Apple            | Cupertino     | Sacramento     |
| Obsidian         | Irvine        | Sacramento     |
| Intel            | Sacramento    | Sacramento     |
| Sierra           | Los Angeles   | Sacramento     |
|                  |               | Denver         |
|                  |               | Hartford       |
|                  |               | Dover          |
| Eidos            | London        |                |
|                  |               | Tallahassee    |
|                  |               | Atlanta        |
|                  |               | Honolulu       |
|                  |               | Des Moines     |
|                  |               | Boise          |
|                  |               | Springfield    |
|                  |               | Indianapolis   |
| Ubisoft          | Saint-Mande   |                |
+------------------+---------------+----------------+

The OUTER modifier is implied with the FULL modifier, so it can be omitted.

SELF JOIN

You can join a table with itself (e.g., to filter it). For example, to list the states the capitols of which are also their largest cities, you need the following query:

SELECT s.name, s.capitol
FROM states AS s
JOIN states AS ss
ON s.capitol=ss.largest;

Result:

+---------------+----------------+
|    S.NAME     |   S.CAPITOL    |
+---------------+----------------+
| Arizona       | Phoenix        |
| Arkansas      | Little Rock    |
| Colorado      | Denver         |
| Georgia       | Atlanta        |
| Hawaii        | Honolulu       |
| Idaho         | Boise          |
| Indiana       | Indianapolis   |
| Iowa          | Des Moines     |
| Massachusetts | Boston         |
| Mississippi   | Jackson        |
| Ohio          | Columbus       |
| Oklahoma      | Oklahoma City  |
| Rhode Island  | Providence     |
| Tennessee     | Nashville      |
| Utah          | Salt Lake City |
| West Virginia | Charleston     |
| West Virginia | Charleston     |
| Wyoming       | Cheyenne       |
+---------------+----------------+

CROSS JOIN

Normally, there is not much use for CROSS JOIN in Phoenix as the tables are usually very big. For that reason, Phoenix ignores the CROSS modifier. However, cross joins are still possible in Phoenix and employ the following syntax:

SELECT <col_1>, <col_2>, ... <col_n>
FROM t1, t2;

GROUP JOIN

Phoenix supports complex JOIN syntax. The result of joining two tables can be joined with another table using parenthesis. Also, instead of any table in a JOIN clause, you can use a subquery (derived table), which in turn can also contain its own JOIN clause.

For example, to list the people who live in the capitols of the states where there is also an enterprise located, you need the following query:

SELECT p.name, e.name, s.capitol
FROM people AS p
JOIN
    (enterprises AS e
    JOIN states AS s
    ON e.town=s.capitol)
ON p.town=e.town
LIMIT 20;

Result:

+------------------+--------------+-------------+
|      P.NAME      |    E.NAME    |  S.CAPITOL  |
+------------------+--------------+-------------+
| Abbott Howard    | Shaw Group   | Baton Rouge |
| Alvarado Melvin  | DigitalAnvil | Austin      |
| Baker Daniel     | Shaw Group   | Baton Rouge |
| Barnes Mollie    | Shaw Group   | Baton Rouge |
| Becker Samuel    | Shaw Group   | Baton Rouge |
| Bush Cameron     | Shaw Group   | Baton Rouge |
| Carpenter Carrie | Avnet        | Phoenix     |
| Clarke Joe       | Avnet        | Phoenix     |
| Cox Dennis       | Intel        | Sacramento  |
| Cox Shawn        | Avnet        | Phoenix     |
| Daniel Ada       | DigitalAnvil | Austin      |
| Day Hattie       | Intel        | Sacramento  |
| Erickson Mayme   | Intel        | Sacramento  |
| Evans Sophie     | Avnet        | Phoenix     |
| Farmer Dean      | Shaw Group   | Baton Rouge |
| Fisher Gordon    | Intel        | Sacramento  |
| Fletcher Julia   | Avnet        | Phoenix     |
| Garner Juan      | Shaw Group   | Baton Rouge |
| Gill Josephine   | Shaw Group   | Baton Rouge |
| Gonzales Edna    | Shaw Group   | Baton Rouge |
+------------------+--------------+-------------+

Exclusion of INNER part from OUTER JOIN

Sometimes you might want to exclude the rows that satisfy the specified condition from the JOIN result. To that end, you need to add a WHERE clause to the query.

For example, to list all the enterprises outside of USA, you need the following query:

SELECT e.name, e.country, e.state, e.town
FROM enterprises AS e
LEFT JOIN states AS s
ON e.state=s.abbr
WHERE s.abbr IS NULL;

Result:

+---------+-----------+---------------+-------------+
| E.NAME  | E.COUNTRY |    E.STATE    |   E.TOWN    |
+---------+-----------+---------------+-------------+
| Alluxi  | Mexico    | Nuevo Leon    | Monterrey   |
| Eidos   | UK        | England       | London      |
| Empresa | Mexico    | Sonora        | Nogales     |
| Ubisoft | France    | Ile-de-France | Saint-Mande |
+---------+-----------+---------------+-------------+

JOIN with indices

You can create secondary indices for your tables which will be used automatically for JOIN queries. Example:

CREATE INDEX ipeople ON people (town) INCLUDE (state);
CREATE INDEX istates ON states (capitol) INCLUDE (name, abbr);
CREATE INDEX ienterprises ON enterprises (town) INCLUDE (name, state);

These indices take precedence when running certain queries. Example:

EXPLAIN SELECT p.name, e.name, s.capitol
FROM people AS p
JOIN
    (enterprises AS e
    JOIN states AS s
    ON e.town=s.capitol)
ON p.town=e.town;

Result:

+----------------------------------------------------------------------------------+----------------+---------------+-------------+
|                                       PLAN                                       | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------+----------------+---------------+-------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER IPEOPLE                 | null           | null          | null        |
|     SERVER FILTER BY FIRST KEY ONLY                                              | null           | null          | null        |
|     PARALLEL INNER-JOIN TABLE 0                                                  | null           | null          | null        |
|         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER IENTERPRISES    | null           | null          | null        |
|             SERVER FILTER BY FIRST KEY ONLY                                      | null           | null          | null        |
|             PARALLEL INNER-JOIN TABLE 0                                          | null           | null          | null        |
|                 CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER ISTATES | null           | null          | null        |
|                     SERVER FILTER BY FIRST KEY ONLY                              | null           | null          | null        |
|     DYNAMIC SERVER FILTER BY "P.0:TOWN" IN ("E.LOCATION:TOWN")                   | null           | null          | null        |
+----------------------------------------------------------------------------------+----------------+---------------+-------------+

As you can see, for this particular query Phoenix uses all three previously created indices rather than original tables.

Hints

Phoenix uses hints to switch between different JOIN ways of operation.

Hash and sort-merge joins

Phoenix employs two types of JOIN algorithms: hash and sort-merge. Hash JOIN is the fastest algorithm, so it is used by default. However, it cannot be used if the total size of the tables is too large and exceeds the available memory. In this case, you have to use the sort-merge JOIN algorithm by means of the /*+ USE_SORT_MERGE_JOIN */ hint in queries:

SELECT /*+ USE_SORT_MERGE_JOIN */ <values>
FROM <table>
JOIN <clause_condition>

Star join optimization

Phoenix uses star join optimization for queries that have multiple JOIN clauses. For example, your query has the following structure:

SELECT p.name, s.abbr, e.name
FROM people AS p
JOIN states AS s
ON p.state=s.abbr
JOIN enterprises AS e
ON p.state=e.state;

When processing such query, Phoenix will load all the tables involved into the RAM and perform both JOIN clauses simultaneously. This can be seen in the query plan, which you can get by using the EXPLAIN operator:

EXPLAIN SELECT p.name, s.abbr, e.name
FROM people AS p
JOIN states AS s
ON p.state=s.abbr
JOIN enterprises AS e
ON p.state=e.state;

Result:

+----------------------------------------------------------------------------------------------------------+----------------+---------------+--------+
|                                                   PLAN                                                   | EST_BYTES_READ | EST_ROWS_READ | EST_IN |
+----------------------------------------------------------------------------------------------------------+----------------+---------------+--------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER PEOPLE                                          | null           | null          | null   |
|     PARALLEL INNER-JOIN TABLE 0                                                                          | null           | null          | null   |
|         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER STATES                                  | null           | null          | null   |
|     PARALLEL INNER-JOIN TABLE 1                                                                          | null           | null          | null   |
|         CLIENT 1-CHUNK 976934 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER ENTERPRISES | null           | null          | null   |
+----------------------------------------------------------------------------------------------------------+----------------+---------------+--------+

Sometimes, the tables can be very big to be loaded into the RAM at the same time. In such cases, you need to turn the star join optimization off by using the /+ NO_STAR_JOIN/ hint in the query:

EXPLAIN SELECT /*+ NO_STAR_JOIN*/ p.name, s.abbr, e.name
FROM people AS p
JOIN states AS s
ON p.state=s.abbr
JOIN enterprises AS e
ON p.state=e.state;

Result:

+--------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
|                                               PLAN                                               | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
| CLIENT 1-CHUNK 976934 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER ENTERPRISES | null           | null          | null        |
|     PARALLEL INNER-JOIN TABLE 0                                                                  | null           | null          | null        |
|         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER PEOPLE                          | null           | null          | null        |
|             PARALLEL INNER-JOIN TABLE 0                                                          | null           | null          | null        |
|                 CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER STATES                  | null           | null          | null        |
+--------------------------------------------------------------------------------------------------+----------------+---------------+-------------+

As can be seen, in this case, the JOIN operations are performed one by one.

Found a mistake? Seleсt text and press Ctrl+Enter to report it