The JSON data type
To work with JSON data in ADQM, you can:
-
use JSON data type (available since version 22.3);
-
explicitly define table columns mapping them to JSON fields;
-
store JSON objects as strings and use special functions to extract values.
The current article describes these approaches using examples based on a test data set that includes information about persons (with fields of different types — name
, age
, hire_date
, address
) and can be structured differently in JSON format.
JSON data type
Starting with version 22.3, ADQM supports the Object(JSON) data type (an alias is JSON) that allows automatic creation of columns for a table based on incoming JSON data. That is, the table structure can dynamically change depending on data changes/updates.
IMPORTANT
The JSON data type is currently under development, but you can use it experimentally. To do this, enable the following option before creating a table: SET allow_experimental_object_type = 1 .
|
Example
Create the persons.json file with JSON data in the /var/lib/clickhouse/user_files folder:
[
{"name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA", "city":"New York"}},
{"name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}},
{"name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}}
]
When importing data, use the JSONAsObject
format to specify that each JSON object should be stored as a separate table row in a JSON column.
For example, the following query specifies that JSON objects from the persons.json file are stored in the person
column of the JSON type (if the column is not explicitly specified, the json
column will be used):
SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON');
Query execution result:
ββpersonβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β {"address.city":"New York","address.country":"USA","age":42,"hire_date":"2021-01-01","name":"john"} β β {"address.city":"London","address.country":"UK","age":25,"hire_date":"2022-02-02","name":"mary"} β β {"address.city":"Berlin","address.country":"Germany","age":33,"hire_date":"2023-03-03","name":"andrew"} β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
To query this data effectively, it is recommended to create a MergeTree table and insert data into it via the INSERT INTO
query. Run the following commands sequentially:
SET allow_experimental_object_type = 1;
CREATE TABLE json_table (person JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO json_table SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON');
Run the DESCRIBE
query to verify that the table contains a single JSON column:
DESCRIBE json_table;
ββnameββββ¬βtypeββββββββββββ¬ β person β Object('json') β ββββββββββ΄βββββββββββββββββ΄
With the describe_extend_object_types
setting, you can see which columns will be dynamically created in the table according to fields of the JSON objects:
DESCRIBE json_table SETTINGS describe_extend_object_types=1;
ββnameββββ¬βtypeββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬ β person β Tuple(address Tuple(city String, country String), age Int8, hire_date String, name String) β ββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄
NOTE
Note that dynamically created columns cannot be used as primary keys or sort keys, as ADQM does not know about these columns when creating a table. Therefore, the |
Select data from dynamic columns
You can access dynamically created columns of a table in queries. For example:
SELECT person.name, person.address.country FROM json_table;
ββperson.nameββ¬βperson.address.countryββ β john β USA β β mary β UK β β andrew β Germany β βββββββββββββββ΄βββββββββββββββββββββββββ
When fetching data from JSON-based columns, you can use them to sort and aggregate data.
For example, the following query returns rows sorted by person names:
SELECT person.name, person.address.city FROM json_table ORDER BY person.name;
ββperson.nameββ¬βperson.address.cityββ β andrew β Berlin β β john β New York β β mary β London β βββββββββββββββ΄ββββββββββββββββββββββ
Handle data changes
If JSON object fields change in a source file (for example, new fields are added or types of existing ones change), ADQM will automatically adjust table columns according to a new structure of JSON objects when inserting data.
Add new columns
-
Add an
id
field to the persons.json source file:[ {"id":1, "name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA", "city":"New York"}}, {"id":2, "name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}}, {"id":3, "name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}} ]
-
Insert data into the table from the JSON file:
INSERT INTO json_table SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON');
-
Make sure a new
id
column of the Int8 type is added to the table schema:DESCRIBE json_table SETTINGS describe_extend_object_types=1;
ββnameββββ¬βtypeβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬ β person β Tuple(address Tuple(city String, country String), age Int8, hire_date String, id Int8, name String) β ββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄
Change columns
-
Change the
id
field values in the original JSON file from numbers to strings, for example:[ {"id":"user_1", "name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA", "city":"New York"}}, {"id":"user_2", "name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}}, {"id":"user_3", "name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}} ]
-
Insert data into the table:
INSERT INTO json_table SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON');
-
Make sure the
id
column type is automatically changed to String:DESCRIBE json_table SETTINGS describe_extend_object_types=1;
ββnameββββ¬βtypeβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬ β person β Tuple(address Tuple(city String, country String), age Int8, hire_date String, id String, name String) β ββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄
Unstructured data
If JSON data is unstructured (for example, JSON objects have different sets of fields), ADQM will anyway consider all fields and fill in fields for which no values are defined with default values for the corresponding data types (for example, 0
for numbers, empty strings for strings).
Example
In the persons.json source file, change the structure of the first JSON object — remove the address.city
field and add a new position
field:
[
{"name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA"}, "position":"manager"},
{"name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}},
{"name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}}
]
Create a new table and paste data from the source file into it:
SET allow_experimental_object_type = 1;
CREATE TABLE unstruct_json_table (person JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO unstruct_json_table SELECT * FROM file('persons.json', 'JSONAsObject', 'person JSON');
Check values of table columns:
SELECT person.name, person.position, person.address.city FROM unstruct_json_table;
ββperson.nameββ¬βperson.positionββ¬βperson.address.cityββ β john β manager β β β mary β β London β β andrew β β Berlin β βββββββββββββββ΄ββββββββββββββββββ΄ββββββββββββββββββββββ
Default values in table columns are taken into account when calculating aggregate functions. For example, the following query with the groupArray
function returns an array of values of the person.address.city
column, where the first value is an empty string as the first JSON object does not contain the city
field:
SELECT groupArray(person.address.city) FROM unstruct_json_table;
ββgroupArray(person.address.city)ββ β ['','London','Berlin'] β βββββββββββββββββββββββββββββββββββ
Add primary keys
As noted above, dynamically created columns cannot be used as primary or sort keys. When creating a table, it is recommended that you use the JSON type for semi-structured parts of data that can potentially be modified in the JSON source, but explicitly specify separate columns for data which type and structure can be clearly declared, and use those columns as primary keys.
Example
In the persons.json source data file, add the person
dedicated field for JSON objects:
[
{"person":{"address":{"city":"New York","country":"USA"},"age":42,"hire_date":"2021-01-01","name":"john"}},
{"person":{"address":{"city":"London","country":"UK"},"age":25,"hire_date":"2022-02-02","name":"mary"}},
{"person":{"address":{"city":"Berlin","country":"Germany"},"age":33,"hire_date":"2023-03-03","name":"andrew"}}
]
Create a table with the following columns:
-
name
— to store values of the JSON objects'name
field extracted with theJSONExtractString
function (this column is used as a sort key); -
person
— to store JSON objects.
SET allow_experimental_object_type = 1;
CREATE TABLE sort_key_json_table (name String DEFAULT JSONExtractString(toJSONString(person), 'name'), person JSON)
ENGINE = MergeTree
ORDER BY name;
Insert data into the table from the persons.json file:
INSERT INTO sort_key_json_table (name, person) SELECT
JSONExtractString(toJSONString(person), 'person.name'),
person
FROM file('persons.json', 'JSONAsObject', 'person JSON');
In the resulting table, rows are sorted by the name
column:
SELECT * FROM sort_key_json_table;
ββnameββββ¬βpersonβββββββββββββββββββββββββββββββββββββββββββββ β andrew β ((('Berlin','Germany'),33,'2023-03-03','andrew')) β β john β ((('New York','USA'),42,'2021-01-01','john')) β β mary β ((('London','UK'),25,'2022-02-02','mary')) β ββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββββββ
Map table columns to JSON fields
ADQM can work with JSON data without using the Object(JSON) data type — this is relevant for all versions of ADQM (including versions earlier than 22.3). For this, it is recommended to create a table with explicitly defined columns mapping them to all JSON fields. This requires you to know the JSON data structure and field types, and make sure the table schema is maintained if JSON fields are added/changed. For nested JSON objects, you can use the Tuple, Map, and Nested data types when defining table columns.
This approach is optimal when JSON data has a known fixed structure — it can provide better performance and the ability to control codecs and flexibly configure primary and sort keys.
Example
Prepare the /var/lib/clickhouse/user_files/persons.json file with JSON data without nested structures:
[
{"name":"john", "age":42, "hire_date":"2021-01-01"},
{"name":"mary", "age":25, "hire_date":"2022-02-02"},
{"name":"andrew", "age":33, "hire_date":"2023-03-03"}
]
Create a table with appropriate columns for all JSON fields:
CREATE TABLE struct_json_table (name String, age Int32, hire_date Date) ENGINE = MergeTree ORDER BY name;
Insert data from the file into the table using the file
table function:
INSERT INTO struct_json_table SELECT * FROM file('persons.json', 'JSONEachRow');
As a result, data from JSON fields are written to the corresponding table columns:
SELECT * FROM struct_json_table;
ββnameββββ¬βageββ¬ββhire_dateββ β andrew β 33 β 2023-03-03 β β john β 42 β 2021-01-01 β β mary β 25 β 2022-02-02 β ββββββββββ΄ββββββ΄βββββββββββββ
This example uses JSONEachRow
— one of the supported input formats to import JSON data. The JSON import formats section describes features of other formats available in ADQM.
For examples of creating Nested, Tuple, and Map columns to work with nested JSON structures, see the Handle as Structured Data article in the ClickHouse documentation.
Functions to work with JSON
ADQM can store JSON objects as strings and provides special functions for parsing JSON-based strings and extracting field values.
For example, the following query interprets JSON objects from the specified file as strings (the JSONAsString
format is used), extracts values of the name
and age
fields, and writes them to the appropriate columns:
SELECT
JSONExtractString(json, 'name') AS name,
JSONExtractUInt(json, 'age') AS age
FROM (SELECT * FROM file('persons.json', JSONAsString) AS json);
The output:
ββnameββββ¬βageβββ β john β 42 β β mary β 25 β β andrew β 33 β ββββββββββ΄βββββββ
Refer to the json-functions article in the ClickHouse documentation for the full list of supported functions.
Import JSON data
ADQM provides a set of formats and settings for importing JSON data. To choose appropriate options, you first need to determine how input data is structured.
JSON import formats
The table below lists examples of choosing formats to parse incoming JSON data depending on its structure. To run these data import examples, first create a table:
CREATE TABLE import_json_table(name String, age Int32, hire_date Date) ENGINE = MergeTree ORDER BY hire_date;
Queries in the Import data column insert data to the import_json_table
table from the persons.json file with the content described in the Input JSON column (the persons.json file should be located in a folder from which clickhouse-client is started).
Input JSON | Import data |
---|---|
Array of JSON objects:
|
Use the
|
NDJSON (newline-delimited JSON):
|
Use the
|
JSON arrays:
|
Use the
|
A single JSON object that contains fields with arrays of values:
|
Use the
|
Array of JSON arrays:
|
Use the
|
Each of the examples above imports data into the table as follows:
ββnameββββ¬βageββ¬ββhire_dateββ β john β 42 β 2021-01-01 β β mary β 25 β 2022-02-02 β β andrew β 33 β 2023-03-03 β ββββββββββ΄ββββββ΄βββββββββββββ
The full list of formats for JSON data import is available in the Formats for Input and Output Data article (see formats prefixed with JSON
).
Import JSON objects as strings
Use the JSONAsString
format to import JSON objects into an ADQM table as strings (without parsing). In this case, data is written to a single column of the String type.
If there are several comma-separated JSON objects in the source file, each of them is considered a separate row. The input in square brackets is considered an array of JSON objects.
Example
-
Prepare JSON data in the persons.json local file:
[ {"name":"john", "age":42, "hire_date":"2021-01-01"}, {"name":"mary", "age":25, "hire_date":"2022-02-02"}, {"name":"andrew", "age":33, "hire_date":"2023-03-03"} ]
-
Create a table with a string column:
CREATE TABLE json_as_str_table(data String) ENGINE = MergeTree ORDER BY tuple();
-
Insert data into the table using the
JSONAsString
format:INSERT INTO json_as_str_table FROM INFILE 'persons.json' FORMAT JSONAsString;
-
Resulting table:
SELECT * FROM json_as_str_table;
ββdataβββββββββββββββββββββββββββββββββββββββββββββββββββ β {"name":"john", "age":42, "hire_date":"2021-01-01"} β β {"name":"mary", "age":25, "hire_date":"2022-02-02"} β β {"name":"andrew", "age":33, "hire_date":"2023-03-03"} β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
To further process JSON objects stored as strings, use JSON functions.
In the following query, the JSONExtractString
function parses strings representing JSON objects from the data
column and returns string values of the name
field:
SELECT JSONExtractString(data, 'name') AS person_name, data FROM json_as_str_table;
ββperson_nameββ¬βdataβββββββββββββββββββββββββββββββββββββββββββββββββββ β john β {"name":"john", "age":42, "hire_date":"2021-01-01"} β β mary β {"name":"mary", "age":25, "hire_date":"2022-02-02"} β β andrew β {"name":"andrew", "age":33, "hire_date":"2023-03-03"} β βββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Skip unknown fields
By default, ADQM ignores unknown columns when importing JSON data.
For example, create a table with two columns (name
and hire_date
) and import JSON with three fields (name
, hire_date
, and age
— the same JSON as in the previous section):
CREATE TABLE short_table(name String, hire_date Date) ENGINE = MergeTree ORDER BY hire_date;
INSERT INTO short_table FROM INFILE 'persons.json' FORMAT JSONEachRow;
Ensure that the age
field is ignored:
SELECT * FROM short_table;
ββnameββββ¬ββhire_dateββ β john β 2021-01-01 β β mary β 2022-02-02 β β andrew β 2023-03-03 β ββββββββββ΄βββββββββββββ
To change this behavior, disable the input_format_skip_unknown_fields
setting:
SET input_format_skip_unknown_fields = 0;
ADQM will throw an exception if table and JSON structures are inconsistent:
INSERT INTO short_table FROM INFILE 'persons.json' FORMAT JSONEachRow;
Exception on client: Code: 117. DB::Exception: Unknown field found while parsing JSONEachRow format: age: (at row 1)
Import JSON with nested objects
To import JSON objects with nested structures, use complex data types (JSON, Array, or Tuple) for columns to store nested object.
Example
-
Prepare the /var/lib/clickhouse/user_files/persons.json file:
[ {"name":"john", "age":42, "hire_date":"2021-01-01", "address": {"country":"USA", "city":"New York"}}, {"name":"mary", "age":25, "hire_date":"2022-02-02", "address": {"country":"UK", "city":"London"}}, {"name":"andrew", "age":33, "hire_date":"2023-03-03", "address": {"country":"Germany", "city":"Berlin"}} ]
-
Select data using the
file
table function with a table structure passed as a parameter:SELECT * FROM file('persons.json', JSONEachRow, 'name String, age Int32, hire_date Date, address JSON');
-
In the resulting table, the
address
column contains nested JSON objects:
ββnameββββ¬βageββ¬ββhire_dateββ¬βaddressββββββββββββββββββββββββββββββββ β john β 42 β 2021-01-01 β {"city":"New York","country":"USA"} β β mary β 25 β 2022-02-02 β {"city":"London","country":"UK"} β β andrew β 33 β 2023-03-03 β {"city":"Berlin","country":"Germany"} β ββββββββββ΄ββββββ΄βββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββ
Fields of nested JSONs
To access nested object fields, enable the input_format_import_nested_json
setting:
SET input_format_import_nested_json = 1
After that, you can use the dot notation to specify fields of nested JSON objects in queries, wrapping them in backquotes.
For example, to display cities from address
nested objects in a separate column, use the query:
SELECT * FROM file('persons.json', JSONEachRow, 'name String, `address.city` String');
ββnameββββ¬βaddress.cityββ β john β New York β β mary β London β β andrew β Berlin β ββββββββββ΄βββββββββββββββ
Auto detect data types on JSON import
ADQM can automatically detect data types when importing JSON data. This functionality allows you to quickly create a table with columns which data types match types of fields in the JSON source.
Example
-
Prepare the /var/lib/clickhouse/user_files/persons.json source file:
{"name":"john","age":42,"hire_date":"2021-01-01"} {"name":"mary","age":25,"hire_date":"2022-02-02"} {"name":"andrew","age":33,"hire_date":"2023-03-03"}
-
Run the
DESCRIBE
query to check which data types are defined for values in the persons.json file:DESCRIBE TABLE file('persons.json', JSONEachRow);
ββnameβββββββ¬βtypeββββββββββββββ¬βdefault_typeββ¬βdefault_expressionββ¬βcommentββ¬βcodec_expressionββ¬βttl_expressionββ β name β Nullable(String) β β β β β β β age β Nullable(Int64) β β β β β β β hire_date β Nullable(Date) β β β β β β βββββββββββββ΄βββββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββββββββ΄ββββββββββ΄βββββββββββββββββββ΄βββββββββββββββββ
-
Create a table using the
file
table function:CREATE TABLE persons ENGINE = MergeTree ORDER BY tuple() AS SELECT * FROM file('persons.json', JSONEachRow);
-
Make sure the table columns match types of fields from the imported JSON:
DESCRIBE TABLE persons;
ββnameβββββββ¬βtypeββββββββββββββ¬βdefault_typeββ¬βdefault_expressionββ¬βcommentββ¬βcodec_expressionββ¬βttl_expressionββ β name β Nullable(String) β β β β β β β age β Nullable(Int64) β β β β β β β hire_date β Nullable(Date) β β β β β β βββββββββββββ΄βββββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββββββββ΄ββββββββββ΄βββββββββββββββββββ΄βββββββββββββββββ
Export JSON data
Most of the JSON formats that ADQM supports can be used to arrange results of SELECT
queries.
JSON output formats
Below are examples of formats for exporting data in JSON format from the export_json_table
table:
ββnameββββ¬βageββ¬ββhire_dateββ β john β 42 β 2021-01-01 β β mary β 25 β 2022-02-02 β β andrew β 33 β 2023-03-03 β ββββββββββ΄ββββββ΄βββββββββββββ
CREATE TABLE export_json_table (`name` String, `age` Int32, `hire_date` Date) ENGINE = MergeTree ORDER BY hire_date;
INSERT INTO export_json_table VALUES ('john', 42, '2021-01-01'), ('mary', 25, '2022-02-02'), ('andrew', 33, '2023-03-03');
-
JSONEachRow
Outputs each table row as a separate JSON object on a new line.
SELECT * FROM export_json_table FORMAT JSONEachRow;
{"name":"john","age":42,"hire_date":"2021-01-01"} {"name":"mary","age":25,"hire_date":"2022-02-02"} {"name":"andrew","age":33,"hire_date":"2023-03-03"}
-
JSONStringsEachRow
Differs from
JSONEachRow
only in that it outputs field values as strings rather than typed values of JSON objects.For example, the following query outputs the
age
field values as strings instead of numbers:SELECT * FROM export_json_table FORMAT JSONStringsEachRow;
{"name":"john","age":"42","hire_date":"2021-01-01"} {"name":"mary","age":"25","hire_date":"2022-02-02"} {"name":"andrew","age":"33","hire_date":"2023-03-03"}
-
JSONCompactEachRow
Outputs table rows as JSON arrays — arrays of values without column names.
SELECT * FROM export_json_table FORMAT JSONCompactEachRow;
["john", 42, "2021-01-01"] ["mary", 25, "2022-02-02"] ["andrew", 33, "2023-03-03"]
-
JSON
Outputs data in JSON format. In addition to the table data, it also outputs column names and types, as well as some additional information (for example, the number of output rows and the total number of rows that could have been output if there were not
LIMIT
in a query).SELECT * FROM export_json_table LIMIT 2 FORMAT JSON;
{ "meta": [ { "name": "name", "type": "String" }, { "name": "age", "type": "Int32" }, { "name": "hire_date", "type": "Date" } ], "data": [ { "name": "john", "age": 42, "hire_date": "2021-01-01" }, { "name": "mary", "age": 25, "hire_date": "2022-02-02" } ], "rows": 2, "rows_before_limit_at_least": 3, "statistics": { "elapsed": 0.000559985, "rows_read": 3, "bytes_read": 59 } }
-
JSONCompact
Outputs the same metadata, but uses a compressed form for data itself.
SELECT * FROM export_json_table LIMIT 2 FORMAT JSONCompact;
{ "meta": [ { "name": "name", "type": "String" }, { "name": "age", "type": "Int32" }, { "name": "hire_date", "type": "Date" } ], "data": [ ["john", 42, "2021-01-01"], ["mary", 25, "2022-02-02"] ], "rows": 2, "rows_before_limit_at_least": 3, "statistics": { "elapsed": 0.000571299, "rows_read": 3, "bytes_read": 59 } }
The full list of formats to export data as JSON is available in the Formats for Input and Output Data article of the ClickHouse documentation.
Export JSON to a file
To save JSON data from an ADQM table to a file, you can use the INTO OUTFILE
clause in the SELECT
query, for example:
SELECT * FROM export_json_table INTO OUTFILE 'out.json' FORMAT JSONEachRow;
Check that the specified file with data has been created in a folder from which clickhouse-client is started:
$ cat out.json
{"name":"john","age":42,"hire_date":"2021-01-01"}
{"name":"mary","age":25,"hire_date":"2022-02-02"}
{"name":"andrew","age":33,"hire_date":"2023-03-03"}