The JSON data type
To work with JSON data in ADQM, you can:
-
use JSON data type (available since version 22.3);
-
explicitly define table columns mapping them to JSON fields;
-
store JSON objects as strings and use special functions to extract values.
The current article describes these approaches using examples based on a test data set that includes information about persons (with fields of different types — name, age, hire_date, address) and can be structured differently in JSON format.
JSON data type
Starting with version 22.3, ADQM supports the Object(JSON) data type (an alias is JSON) that allows automatic creation of columns for a table based on incoming JSON data. That is, the table structure can dynamically change depending on data changes/updates.
|
IMPORTANT
The JSON data type is currently under development, but you can use it experimentally. To do this, enable the following option before creating a table: SET allow_experimental_object_type = 1.
|
Example
Create the persons.json file with JSON data in the /var/lib/clickhouse/user_files folder:
[
{"name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA", "city":"New York"}},
{"name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}},
{"name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}}
]
When importing data, use the JSONAsObject format to specify that each JSON object should be stored as a separate table row in a JSON column.
For example, the following query specifies that JSON objects from the persons.json file are stored in the person column of the JSON type (if the column is not explicitly specified, the json column will be used):
SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON');
Query execution result:
┌─person──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ {"address.city":"New York","address.country":"USA","age":42,"hire_date":"2021-01-01","name":"john"} │
│ {"address.city":"London","address.country":"UK","age":25,"hire_date":"2022-02-02","name":"mary"} │
│ {"address.city":"Berlin","address.country":"Germany","age":33,"hire_date":"2023-03-03","name":"andrew"} │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
To query this data effectively, it is recommended to create a MergeTree table and insert data into it via the INSERT INTO query. Run the following commands sequentially:
SET allow_experimental_object_type = 1;
CREATE TABLE json_table (person JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO json_table SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON');
Run the DESCRIBE query to verify that the table contains a single JSON column:
DESCRIBE json_table;
┌─name───┬─type───────────┬
│ person │ Object('json') │
└────────┴────────────────┴
With the describe_extend_object_types setting, you can see which columns will be dynamically created in the table according to fields of the JSON objects:
DESCRIBE json_table SETTINGS describe_extend_object_types=1;
┌─name───┬─type───────────────────────────────────────────────────────────────────────────────────────┬ │ person │ Tuple(address Tuple(city String, country String), age Int8, hire_date String, name String) │ └────────┴────────────────────────────────────────────────────────────────────────────────────────────┴
|
NOTE
Note that dynamically created columns cannot be used as primary keys or sort keys, as ADQM does not know about these columns when creating a table. Therefore, the |
Select data from dynamic columns
You can access dynamically created columns of a table in queries. For example:
SELECT person.name, person.address.country FROM json_table;
┌─person.name─┬─person.address.country─┐ │ john │ USA │ │ mary │ UK │ │ andrew │ Germany │ └─────────────┴────────────────────────┘
When fetching data from JSON-based columns, you can use them to sort and aggregate data.
For example, the following query returns rows sorted by person names:
SELECT person.name, person.address.city FROM json_table ORDER BY person.name;
┌─person.name─┬─person.address.city─┐ │ andrew │ Berlin │ │ john │ New York │ │ mary │ London │ └─────────────┴─────────────────────┘
Handle data changes
If JSON object fields change in a source file (for example, new fields are added or types of existing ones change), ADQM will automatically adjust table columns according to a new structure of JSON objects when inserting data.
Add new columns
-
Add an
idfield to the persons.json source file:[ {"id":1, "name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA", "city":"New York"}}, {"id":2, "name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}}, {"id":3, "name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}} ] -
Insert data into the table from the JSON file:
INSERT INTO json_table SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON'); -
Make sure a new
idcolumn of the Int8 type is added to the table schema:DESCRIBE json_table SETTINGS describe_extend_object_types=1;┌─name───┬─type────────────────────────────────────────────────────────────────────────────────────────────────┬ │ person │ Tuple(address Tuple(city String, country String), age Int8, hire_date String, id Int8, name String) │ └────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────┴
Change columns
-
Change the
idfield values in the original JSON file from numbers to strings, for example:[ {"id":"user_1", "name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA", "city":"New York"}}, {"id":"user_2", "name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}}, {"id":"user_3", "name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}} ] -
Insert data into the table:
INSERT INTO json_table SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON'); -
Make sure the
idcolumn type is automatically changed to String:DESCRIBE json_table SETTINGS describe_extend_object_types=1;┌─name───┬─type──────────────────────────────────────────────────────────────────────────────────────────────────┬ │ person │ Tuple(address Tuple(city String, country String), age Int8, hire_date String, id String, name String) │ └────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────┴
Unstructured data
If JSON data is unstructured (for example, JSON objects have different sets of fields), ADQM will anyway consider all fields and fill in fields for which no values are defined with default values for the corresponding data types (for example, 0 for numbers, empty strings for strings).
Example
In the persons.json source file, change the structure of the first JSON object — remove the address.city field and add a new position field:
[
{"name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA"}, "position":"manager"},
{"name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}},
{"name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}}
]
Create a new table and paste data from the source file into it:
SET allow_experimental_object_type = 1;
CREATE TABLE unstruct_json_table (person JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO unstruct_json_table SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON');
Check values of table columns:
SELECT person.name, person.position, person.address.city FROM unstruct_json_table;
┌─person.name─┬─person.position─┬─person.address.city─┐ │ john │ manager │ │ │ mary │ │ London │ │ andrew │ │ Berlin │ └─────────────┴─────────────────┴─────────────────────┘
Default values in table columns are taken into account when calculating aggregate functions. For example, the following query with the groupArray function returns an array of values of the person.address.city column, where the first value is an empty string as the first JSON object does not contain the city field:
SELECT groupArray(person.address.city) FROM unstruct_json_table;
┌─groupArray(person.address.city)─┐ │ ['','London','Berlin'] │ └─────────────────────────────────┘
Add primary keys
As noted above, dynamically created columns cannot be used as primary or sort keys. When creating a table, it is recommended that you use the JSON type for semi-structured parts of data that can potentially be modified in the JSON source, but explicitly specify separate columns for data which type and structure can be clearly declared, and use those columns as primary keys.
Example
In the persons.json source data file, add the person dedicated field for JSON objects:
[
{"person":{"address":{"city":"New York","country":"USA"},"age":42,"hire_date":"2021-01-01","name":"john"}},
{"person":{"address":{"city":"London","country":"UK"},"age":25,"hire_date":"2022-02-02","name":"mary"}},
{"person":{"address":{"city":"Berlin","country":"Germany"},"age":33,"hire_date":"2023-03-03","name":"andrew"}}
]
Create a table with the following columns:
-
name— to store values of the JSON objects'namefield extracted with theJSONExtractStringfunction (this column is used as a sort key); -
person— to store JSON objects.
SET allow_experimental_object_type = 1;
CREATE TABLE sort_key_json_table (name String DEFAULT JSONExtractString(toJSONString(person), 'name'), person JSON)
ENGINE = MergeTree
ORDER BY name;
Insert data into the table from the persons.json file:
INSERT INTO sort_key_json_table (name, person) SELECT
JSONExtractString(toJSONString(person), 'person.name'),
person
FROM file('persons.json', 'JSONAsObject', 'person JSON');
In the resulting table, rows are sorted by the name column:
SELECT * FROM sort_key_json_table;
┌─name───┬─person────────────────────────────────────────────┐
│ andrew │ ((('Berlin','Germany'),33,'2023-03-03','andrew')) │
│ john │ ((('New York','USA'),42,'2021-01-01','john')) │
│ mary │ ((('London','UK'),25,'2022-02-02','mary')) │
└────────┴───────────────────────────────────────────────────┘
Map table columns to JSON fields
ADQM can work with JSON data without using the Object(JSON) data type — this is relevant for all versions of ADQM (including versions earlier than 22.3). For this, it is recommended to create a table with explicitly defined columns mapping them to all JSON fields. This requires you to know the JSON data structure and field types, and make sure the table schema is maintained if JSON fields are added/changed. For nested JSON objects, you can use the Tuple, Map, and Nested data types when defining table columns.
This approach is optimal when JSON data has a known fixed structure — it can provide better performance and the ability to control codecs and flexibly configure primary and sort keys.
Example
Prepare the /var/lib/clickhouse/user_files/persons.json file with JSON data without nested structures:
[
{"name":"john", "age":42, "hire_date":"2021-01-01"},
{"name":"mary", "age":25, "hire_date":"2022-02-02"},
{"name":"andrew", "age":33, "hire_date":"2023-03-03"}
]
Create a table with appropriate columns for all JSON fields:
CREATE TABLE struct_json_table (name String, age Int32, hire_date Date) ENGINE = MergeTree ORDER BY name;
Insert data from the file into the table using the file table function:
INSERT INTO struct_json_table SELECT * FROM file('persons.json', 'JSONEachRow');
As a result, data from JSON fields are written to the corresponding table columns:
SELECT * FROM struct_json_table;
┌─name───┬─age─┬──hire_date─┐ │ andrew │ 33 │ 2023-03-03 │ │ john │ 42 │ 2021-01-01 │ │ mary │ 25 │ 2022-02-02 │ └────────┴─────┴────────────┘
This example uses JSONEachRow — one of the supported input formats to import JSON data. The JSON import formats section describes features of other formats available in ADQM.
For examples of creating Nested, Tuple, and Map columns to work with nested JSON structures, see the Handle as Structured Data article in the ClickHouse documentation.
Functions to work with JSON
ADQM can store JSON objects as strings and provides special functions for parsing JSON-based strings and extracting field values.
For example, the following query interprets JSON objects from the specified file as strings (the JSONAsString format is used), extracts values of the name and age fields, and writes them to the appropriate columns:
SELECT
JSONExtractString(json, 'name') AS name,
JSONExtractUInt(json, 'age') AS age
FROM (SELECT * FROM file('persons.json', JSONAsString) AS json);
The output:
┌─name───┬─age──┐ │ john │ 42 │ │ mary │ 25 │ │ andrew │ 33 │ └────────┴──────┘
Refer to the json-functions article in the ClickHouse documentation for the full list of supported functions.
Import JSON data
ADQM provides a set of formats and settings for importing JSON data. To choose appropriate options, you first need to determine how input data is structured.
JSON import formats
The table below lists examples of choosing formats to parse incoming JSON data depending on its structure. To run these data import examples, first create a table:
CREATE TABLE import_json_table(name String, age Int32, hire_date Date) ENGINE = MergeTree ORDER BY hire_date;
Queries in the Import data column insert data to the import_json_table table from the persons.json file with the content described in the Input JSON column (the persons.json file should be located in a folder from which clickhouse-client is started).
| Input JSON | Import data |
|---|---|
Array of JSON objects:
|
Use the
|
NDJSON (newline-delimited JSON):
|
Use the
|
JSON arrays:
|
Use the
|
A single JSON object that contains fields with arrays of values:
|
Use the
|
Array of JSON arrays:
|
Use the
|
Each of the examples above imports data into the table as follows:
┌─name───┬─age─┬──hire_date─┐ │ john │ 42 │ 2021-01-01 │ │ mary │ 25 │ 2022-02-02 │ │ andrew │ 33 │ 2023-03-03 │ └────────┴─────┴────────────┘
The full list of formats for JSON data import is available in the Formats for Input and Output Data article (see formats prefixed with JSON).
Import JSON objects as strings
Use the JSONAsString format to import JSON objects into an ADQM table as strings (without parsing). In this case, data is written to a single column of the String type.
If there are several comma-separated JSON objects in the source file, each of them is considered a separate row. The input in square brackets is considered an array of JSON objects.
Example
-
Prepare JSON data in the persons.json local file:
[ {"name":"john", "age":42, "hire_date":"2021-01-01"}, {"name":"mary", "age":25, "hire_date":"2022-02-02"}, {"name":"andrew", "age":33, "hire_date":"2023-03-03"} ] -
Create a table with a string column:
CREATE TABLE json_as_str_table(data String) ENGINE = MergeTree ORDER BY tuple(); -
Insert data into the table using the
JSONAsStringformat:INSERT INTO json_as_str_table FROM INFILE 'persons.json' FORMAT JSONAsString; -
Resulting table:
SELECT * FROM json_as_str_table;┌─data──────────────────────────────────────────────────┐ │ {"name":"john", "age":42, "hire_date":"2021-01-01"} │ │ {"name":"mary", "age":25, "hire_date":"2022-02-02"} │ │ {"name":"andrew", "age":33, "hire_date":"2023-03-03"} │ └───────────────────────────────────────────────────────┘
To further process JSON objects stored as strings, use JSON functions.
In the following query, the JSONExtractString function parses strings representing JSON objects from the data column and returns string values of the name field:
SELECT JSONExtractString(data, 'name') AS person_name, data FROM json_as_str_table;
┌─person_name─┬─data──────────────────────────────────────────────────┐
│ john │ {"name":"john", "age":42, "hire_date":"2021-01-01"} │
│ mary │ {"name":"mary", "age":25, "hire_date":"2022-02-02"} │
│ andrew │ {"name":"andrew", "age":33, "hire_date":"2023-03-03"} │
└─────────────┴───────────────────────────────────────────────────────┘
Skip unknown fields
By default, ADQM ignores unknown columns when importing JSON data.
For example, create a table with two columns (name and hire_date) and import JSON with three fields (name, hire_date, and age — the same JSON as in the previous section):
CREATE TABLE short_table(name String, hire_date Date) ENGINE = MergeTree ORDER BY hire_date;
INSERT INTO short_table FROM INFILE 'persons.json' FORMAT JSONEachRow;
Ensure that the age field is ignored:
SELECT * FROM short_table;
┌─name───┬──hire_date─┐ │ john │ 2021-01-01 │ │ mary │ 2022-02-02 │ │ andrew │ 2023-03-03 │ └────────┴────────────┘
To change this behavior, disable the input_format_skip_unknown_fields setting:
SET input_format_skip_unknown_fields = 0;
ADQM will throw an exception if table and JSON structures are inconsistent:
INSERT INTO short_table FROM INFILE 'persons.json' FORMAT JSONEachRow;
Exception on client: Code: 117. DB::Exception: Unknown field found while parsing JSONEachRow format: age: (at row 1)
Import JSON with nested objects
To import JSON objects with nested structures, use complex data types (JSON, Array, or Tuple) for columns to store nested object.
Example
-
Prepare the /var/lib/clickhouse/user_files/persons.json file:
[ {"name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA", "city":"New York"}}, {"name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}}, {"name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}} ] -
Select data using the
filetable function with a table structure passed as a parameter:SELECT * FROM file('persons.json', JSONEachRow, 'name String, age Int32, hire_date Date, address JSON'); -
In the resulting table, the
addresscolumn contains nested JSON objects:
┌─name───┬─age─┬──hire_date─┬─address───────────────────────────────┐
│ john │ 42 │ 2021-01-01 │ {"city":"New York","country":"USA"} │
│ mary │ 25 │ 2022-02-02 │ {"city":"London","country":"UK"} │
│ andrew │ 33 │ 2023-03-03 │ {"city":"Berlin","country":"Germany"} │
└────────┴─────┴────────────┴───────────────────────────────────────┘
Fields of nested JSONs
To access nested object fields, enable the input_format_import_nested_json setting:
SET input_format_import_nested_json = 1
After that, you can use the dot notation to specify fields of nested JSON objects in queries, wrapping them in backquotes.
For example, to display cities from address nested objects in a separate column, use the query:
SELECT * FROM file('persons.json', JSONEachRow, 'name String, `address.city` String');
┌─name───┬─address.city─┐ │ john │ New York │ │ mary │ London │ │ andrew │ Berlin │ └────────┴──────────────┘
Auto detect data types on JSON import
ADQM can automatically detect data types when importing JSON data. This functionality allows you to quickly create a table with columns which data types match types of fields in the JSON source.
Example
-
Prepare the /var/lib/clickhouse/user_files/persons.json source file:
{"name":"john","age":42,"hire_date":"2021-01-01"} {"name":"mary","age":25,"hire_date":"2022-02-02"} {"name":"andrew","age":33,"hire_date":"2023-03-03"} -
Run the
DESCRIBEquery to check which data types are defined for values in the persons.json file:DESCRIBE TABLE file('persons.json', JSONEachRow);┌─name──────┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐ │ name │ Nullable(String) │ │ │ │ │ │ │ age │ Nullable(Int64) │ │ │ │ │ │ │ hire_date │ Nullable(Date) │ │ │ │ │ │ └───────────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
-
Create a table using the
filetable function:CREATE TABLE persons ENGINE = MergeTree ORDER BY tuple() AS SELECT * FROM file('persons.json', JSONEachRow); -
Make sure the table columns match types of fields from the imported JSON:
DESCRIBE TABLE persons;┌─name──────┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐ │ name │ Nullable(String) │ │ │ │ │ │ │ age │ Nullable(Int64) │ │ │ │ │ │ │ hire_date │ Nullable(Date) │ │ │ │ │ │ └───────────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
Export JSON data
Most of the JSON formats that ADQM supports can be used to arrange results of SELECT queries.
JSON output formats
Below are examples of formats for exporting data in JSON format from the export_json_table table:
┌─name───┬─age─┬──hire_date─┐ │ john │ 42 │ 2021-01-01 │ │ mary │ 25 │ 2022-02-02 │ │ andrew │ 33 │ 2023-03-03 │ └────────┴─────┴────────────┘
CREATE TABLE export_json_table (`name` String, `age` Int32, `hire_date` Date) ENGINE = MergeTree ORDER BY hire_date;
INSERT INTO export_json_table VALUES ('john', 42, '2021-01-01'), ('mary', 25, '2022-02-02'), ('andrew', 33, '2023-03-03');
-
JSONEachRow
Outputs each table row as a separate JSON object on a new line.
SELECT * FROM export_json_table FORMAT JSONEachRow;{"name":"john","age":42,"hire_date":"2021-01-01"} {"name":"mary","age":25,"hire_date":"2022-02-02"} {"name":"andrew","age":33,"hire_date":"2023-03-03"} -
JSONStringsEachRow
Differs from
JSONEachRowonly in that it outputs field values as strings rather than typed values of JSON objects.For example, the following query outputs the
agefield values as strings instead of numbers:SELECT * FROM export_json_table FORMAT JSONStringsEachRow;{"name":"john","age":"42","hire_date":"2021-01-01"} {"name":"mary","age":"25","hire_date":"2022-02-02"} {"name":"andrew","age":"33","hire_date":"2023-03-03"} -
JSONCompactEachRow
Outputs table rows as JSON arrays — arrays of values without column names.
SELECT * FROM export_json_table FORMAT JSONCompactEachRow;["john", 42, "2021-01-01"] ["mary", 25, "2022-02-02"] ["andrew", 33, "2023-03-03"] -
JSON
Outputs data in JSON format. In addition to the table data, it also outputs column names and types, as well as some additional information (for example, the number of output rows and the total number of rows that could have been output if there were not
LIMITin a query).SELECT * FROM export_json_table LIMIT 2 FORMAT JSON;{ "meta": [ { "name": "name", "type": "String" }, { "name": "age", "type": "Int32" }, { "name": "hire_date", "type": "Date" } ], "data": [ { "name": "john", "age": 42, "hire_date": "2021-01-01" }, { "name": "mary", "age": 25, "hire_date": "2022-02-02" } ], "rows": 2, "rows_before_limit_at_least": 3, "statistics": { "elapsed": 0.000559985, "rows_read": 3, "bytes_read": 59 } } -
JSONCompact
Outputs the same metadata, but uses a compressed form for data itself.
SELECT * FROM export_json_table LIMIT 2 FORMAT JSONCompact;{ "meta": [ { "name": "name", "type": "String" }, { "name": "age", "type": "Int32" }, { "name": "hire_date", "type": "Date" } ], "data": [ ["john", 42, "2021-01-01"], ["mary", 25, "2022-02-02"] ], "rows": 2, "rows_before_limit_at_least": 3, "statistics": { "elapsed": 0.000571299, "rows_read": 3, "bytes_read": 59 } }
The full list of formats to export data as JSON is available in the Formats for Input and Output Data article of the ClickHouse documentation.
Export JSON to a file
To save JSON data from an ADQM table to a file, you can use the INTO OUTFILE clause in the SELECT query, for example:
SELECT * FROM export_json_table INTO OUTFILE 'out.json' FORMAT JSONEachRow;
Check that the specified file with data has been created in a folder from which clickhouse-client is started:
$ cat out.json
{"name":"john","age":42,"hire_date":"2021-01-01"}
{"name":"mary","age":25,"hire_date":"2022-02-02"}
{"name":"andrew","age":33,"hire_date":"2023-03-03"}