Use PL/Python
Overview
The PL/Python procedural language allows PostgreSQL functions and procedures to be written in the Python language. ADPG supports the language version Python 3.
PL/Python is only available as an untrusted language — it does not provide a possibility to restrict user operations and is therefore named plpython3u
. The author of a function in the untrusted PL/Python language must ensure that the function cannot be misused, since it has the same permissions as a database administrator. Only superusers can create functions in plpython3u
and other untrusted languages.
To install PL/Python in a particular database, use the CREATE EXTENSION command:
CREATE EXTENSION plpython3u;
NOTE
If the plpython3u extension is created in the template1 database used as the default template, all subsequently created databases will have this extension installed.
|
PL/Python functions
To create a function in the PL/Python language, use the standard CREATE FUNCTION command:
CREATE FUNCTION function_name (argument-list)
RETURNS return-type
AS $$
# PL/Python function body
$$ LANGUAGE plpython3u;
The function body is a Python script. When the function is called, its arguments are passed as elements of the args
list. Named arguments are also passed as ordinary variables to the Python script. Named arguments make code more readable. The result is returned from the Python code with return
or yield
(in case of a function returning a set). If you do not provide a return value, Python returns the default None
. PL/Python converts None
to the SQL NULL
value. In a procedure, the result from the Python code must be None
. You can end the procedure without a return
statement or use return
without argument to achieve this. Otherwise, an error will be raised.
For example, the following function returns the greater of two integers:
CREATE FUNCTION get_max (a integer, b integer)
RETURNS integer
AS $$
if a > b:
return a
return b
$$ LANGUAGE plpython3u;
SELECT get_max(28,3);
Result:
get_max ---------- 28
The arguments are set as global variables. Because of the scoping rules of Python, an argument variable cannot be reassigned inside the function to the value of an expression that involves the variable name itself, unless the variable is redeclared as global.
For example, the following function converts a string to lower case:
CREATE FUNCTION fn_lower (str text)
RETURNS text
AS $$
str = str.lower() # error
return str
$$ LANGUAGE plpython3u;
It will be created but will return the error: ERROR: UnboundLocalError: local variable 'str' referenced before assignment
.
Add the global
definition to the str
variable to overcome the problem:
CREATE FUNCTION fn_lower (str text)
RETURNS text
AS $$
global str
str = str.lower()
return str
$$ LANGUAGE plpython3u;
SELECT fn_lower('TEsT STRinG');
Result:
fn_lower ------------- test string
However, it is recommended not to rely on such implementation detail of the PL/Python, but to assume that function parameters are read-only.
Data types
When a PL/Python function is called, its arguments are converted from their PostgreSQL data type to a corresponding Python type. The table below contains type mappings of function arguments.
ADPG/PostgreSQL type | Python type |
---|---|
Boolean |
Bool |
Smallint, int, bigint, oid |
Int |
Real, double |
Float |
Numeric |
Decimal |
Bytea |
Bytes |
Null |
None. See the NULL → None section |
Array |
List. See the Arrays → Lists section |
Composite types |
Composite-type arguments are passed to the function as Python mappings. See the Composite types section |
Other data types, including the PostgreSQL character string types |
Str (in Unicode, like all Python strings) |
A return value of a PL/Python function is converted to the declared PostgreSQL return type using the rules described in the table below.
ADPG/PostgreSQL return type | Conversion rules |
---|---|
Boolean |
The return value will be evaluated according to the Python rules. That is, |
Bytea |
The return value will be converted to Python |
NULL |
None. See the NULL → None section |
Array |
List. See the Arrays → Lists section |
Composite types |
A composite result can be returned as a sequence, mapping, or an object. You can also use a function with |
Other PostgreSQL return types |
The return value is converted to a string using the Python built-in |
Information about non-scalar data types is given below.
Note that logical inconsistencies between the declared return type in PostgreSQL and the returned Python data type are ignored — the value is converted anyway.
NULL → None
If an SQL NULL
value is passed to a function, the argument value will appear as None
in Python. You can add STRICT
to the function definition to make ADPG/PostgreSQL return the NULL
result if a NULL
value is passed as a parameter. In this case, the function is not called. Alternatively, it is possible to check for NULL
inputs in the function body:
CREATE FUNCTION get_max (a integer, b integer)
RETURNS integer
AS $$
if (a is None) or (b is None):
return None
if a > b:
return a
return b
$$ LANGUAGE plpython3u;
The following code returns NULL
:
SELECT get_max(NULL, 5);
Arrays → Lists
SQL array values are passed into PL/Python as a Python list. To return an SQL array value out of a PL/Python function, return a Python list:
CREATE FUNCTION get_array()
RETURNS char[]
AS $$
return ['a', 'b', 'c', 'd', 'e']
$$ LANGUAGE plpython3u;
SELECT get_array();
Result:
get_array ------------- {a,b,c,d,e}
Multidimensional arrays are passed into PL/Python as nested Python lists. For example, a 2-dimensional array is a list of lists. When returning a multi-dimensional SQL array out of a PL/Python function, the inner lists at each level must all be of the same size.
CREATE FUNCTION get_two_dim_array(x int4[]) RETURNS int4[] AS $$
plpy.info(x, type(x))
return x
$$ LANGUAGE plpython3u;
SELECT * FROM get_two_dim_array(ARRAY[[1,2,3],[11,12,13],[21,22,23]]);
Result:
INFO: ([[1, 2, 3], [11, 12, 13], [21, 22, 23]], <class 'list'>) get_two_dim_array --------------------------------- {{1,2,3},{11,12,13},{21,22,23}}
Other Python sequences, like tuples, are also accepted for backward compatibility with PostgreSQL versions 9.6 and below. However, they are always treated as one-dimensional arrays, because they are ambiguous with composite types. For the same reason, when a composite type is used in a multi-dimensional array, it must be represented by a tuple, rather than a list.
Note that in Python, strings are sequences, which can have undesirable effects:
CREATE FUNCTION get_str_arr()
RETURNS varchar[]
AS $$
return "test"
$$ LANGUAGE plpython3u;
SELECT get_str_arr();
Result:
get_str_arr ------------- {t,e,s,t}
Composite types
Composite-type arguments are passed to a function as Python mappings. The attribute names of the composite type are the element names of the mapping. Example:
CREATE TABLE book (
book_id SERIAL PRIMARY KEY,
title VARCHAR(50) NOT NULL,
in_stock INTEGER DEFAULT 0);
INSERT INTO book(title, in_stock) VALUES
('Mrs. Dalloway',25),
('To the Lighthouse',13),
('To Kill a Mockingbird',2),
('The Great Gatsby',3),
('The Lord of the Rings',45);
CREATE FUNCTION sufficient_in_stock(b book)
RETURNS boolean
AS $$
if b["in_stock"] >= 10 :
return True
return False
$$ LANGUAGE plpython3u;
SELECT book_id, title, sufficient_in_stock(b.*) AS sufficient
FROM book b;
Result:
book_id | title | sufficient ---------+-----------------------+------------ 1 | Mrs. Dalloway | t 2 | To the Lighthouse | t 3 | To Kill a Mockingbird | f 4 | The Great Gatsby | f 5 | The Lord of the Rings | t
If an attribute has the NULL
value in the passed row, it has the None
value in the mapping.
There are multiple ways to return row or composite types from a Python function. The examples below assume that the following composite type exists:
CREATE TYPE named_value AS (
name text,
value integer
);
A composite result can be returned as one of the options described below.
A returned sequence object must have the same number of items as the composite result type has fields. The item with the 0
index is assigned to the first field of the composite type, 1
— to the second, and so on. For example:
CREATE FUNCTION create_named_value1 (name text, value integer)
RETURNS named_value
AS $$
return ( name, value )
# return [ name, value ] -- or alternatively, return as list
$$ LANGUAGE plpython3u;
SELECT create_named_value1('name1', 1);
Result:
create_named_value1 --------------------- (name1,1)
To return an SQL NULL
for any column, insert None
at the corresponding position.
When an array of composite types is returned, it cannot be returned as a list, because it is ambiguous whether the Python list represents a composite type or another array dimension.
A composite type cannot be a Python set because it is not indexable.
The value for each result type column is retrieved from the mapping with the column name as a key.
CREATE FUNCTION create_named_value2 (name text, value integer)
RETURNS named_value
AS $$
return { "name": name, "value": value }
$$ LANGUAGE plpython3u;
SELECT create_named_value2('name2',2);
Result:
create_named_value2 --------------------- (name2,2)
Any extra dictionary key/value pairs are ignored. A missing key raises an error. To return an SQL NULL
value for any column, insert None
with the corresponding column name as the key.
This works the same as a mapping. Example:
CREATE FUNCTION create_named_value3 (name text, value integer)
RETURNS named_value
AS $$
class named_value:
def __init__ (self, n, v):
self.name = n
self.value = v
return named_value(name, value)
$$ LANGUAGE plpython3u;
Alternatively, the function can be rewritten as follows:
CREATE FUNCTION create_named_value3 (name text, value integer)
RETURNS named_value
AS $$
class nv: pass
nv.name = name
nv.value = value
return nv
$$ LANGUAGE plpython3u;
SELECT create_named_value3('name3',3);
Result:
create_named_value3 --------------------- (name3,3)
You can use functions with output parameters:
CREATE FUNCTION return_out_params(OUT id integer, OUT name text ) AS $$
return (1, "text1")
$$ LANGUAGE plpython3u;
SELECT * FROM return_out_params();
Result:
id | name ----+------- 1 | text1
Output parameters of procedures are passed back in the same way:
CREATE PROCEDURE calculate_cube (INOUT x integer, INOUT y integer) AS $$
return (pow(x, 3), pow(y, 3))
$$ LANGUAGE plpython3u;
CALL calculate_cube(2, 5);
Result:
x | y ---+----- 8 | 125
Set-returning functions
A PL/Python function can also return sets of scalar or composite types. This can be done in different ways, since the returned object is internally turned into an iterator. The examples below utilize the following composite type:
CREATE TYPE author_book AS (
author text,
book text
);
A set result can be returned from a sequence type, iterator, or generator. Set-returning functions with OUT
parameters are also supported.
Examples
CREATE FUNCTION get_books(author text)
RETURNS SETOF author_book
AS $$
# return tuple containing lists as composite types
return ( [ author, "War and Peace" ], [ author, "Anna Karenina" ], [ author, "The Kreutzer Sonata" ] )
$$ LANGUAGE plpython3u;
SELECT get_books('Tolstoy');
Result:
get_books --------------------------------- (Tolstoy,"War and Peace") (Tolstoy,"Anna Karenina") (Tolstoy,"The Kreutzer Sonata")
A function must return an object providing the __iter__
and __next__
methods.
CREATE FUNCTION get_books1(author text)
RETURNS SETOF author_book
AS $$
class producer:
def __init__ (self, author, book):
self.author = author
self.book = book
self.ndx = -1
def __iter__ (self):
return self
def __next__ (self):
self.ndx += 1
if self.ndx == len(self.book):
raise StopIteration
return ( self.author, self.book[self.ndx] )
return producer(author, [ "War and peace", "Anna Karenina", "The Kreutzer sonata" ])
$$ LANGUAGE plpython3u;
SELECT get_books1('Tolstoy');
Result:
get_books1 --------------------------------- (Tolstoy,"War and peace") (Tolstoy,"Anna Karenina") (Tolstoy,"The Kreutzer sonata")
CREATE FUNCTION get_books2 (author text)
RETURNS SETOF author_book
AS $$
for book in ["War and peace", "Anna Karenina", "The Kreutzer sonata"]:
yield (author, book)
$$ LANGUAGE plpython3u;
SELECT get_books2('Tolstoy');
Result:
get_books2 --------------------------------- (Tolstoy,"War and peace") (Tolstoy,"Anna Karenina") (Tolstoy,"The Kreutzer sonata")
CREATE FUNCTION multiout_simple_setof(n integer, OUT integer, OUT integer) RETURNS SETOF record AS $$
return [(1, 2)] * n
$$ LANGUAGE plpython3u;
SELECT * FROM multiout_simple_setof(3);
Result:
column1 | column2 ---------+--------- 1 | 2 1 | 2 1 | 2
Anonymous code blocks
PL/Python also supports anonymous code blocks called with the DO
statement. An anonymous code block takes no arguments, and any value it might return is discarded. Otherwise, it works like function code.
DO $$
# PL/Python code
$$ LANGUAGE plpython3u;
Example:
DO $$
plpy.notice('This code is executed inside the Do block')
$$ LANGUAGE plpython3u;
Result:
NOTICE: This code is executed inside the Do block DO
Sharing data
PL/Python provides the SD
global dictionary to store private data between repeated calls to the same function. It also has another global dictionary — GD
for public data, that is available to all Python functions within a session.
Each function gets its own execution environment in the Python interpreter, so that global data and function arguments from one function are not available to another function. The exception is the data in the GD
dictionary, as mentioned above.
Database access
The PL/Python language module automatically imports a Python module called plpy
. The functions and constants in this module are available in the Python code as plpy.<function_name>
.
The plpy
module provides several functions to execute database commands.
execute
The plpy.execute
command executes the given query with the optional limit
parameter and returns a result object.
plpy.execute(<query> [, <limit>])
If a limit is specified and is greater than 0
, plpy.execute
returns the number of rows that do not exceed the limit, as if the query includes a LIMIT
clause. If you omit limit
or set it to 0
, the row limit will not be applied.
For example, the following code returns up to 3 rows from the book
table:
CREATE FUNCTION access_db() RETURNS SETOF book AS $$
rows=plpy.execute("SELECT * FROM book", 3)
return rows
$$ LANGUAGE plpython3u;
SELECT * FROM access_db();
Result:
book_id | title | author_id | in_stock ---------+-----------------------+-----------+---------- 1 | Mrs. Dalloway | 1 | 25 2 | To the Lighthouse | 1 | 13 3 | To Kill a Mockingbird | 2 | 2
The result object emulates a list or dictionary object. The result object can be accessed by row number and column name. For example, you can access the title
column as:
col = rows[i]["title"]
You can use the len function to obtain the number of returned rows.
Also, the result object has additional methods listed in the table below.
Method | Description |
---|---|
nrows() |
Returns the number of rows processed by the command. Note that this is not necessarily the same as the number of rows returned. For example, an |
status() |
The |
colnames() |
Returns a list of column names |
coltypes() |
Returns a list of column type OIDs |
coltypmods() |
Returns a list of type-specific modifiers for the columns |
__str__() |
The standard Python |
The colnames()
, coltypes()
, and coltypmods()
methods raise an exception when called on a result object from a command that did not produce a result set, for example, UPDATE
without RETURNING
, or DROP TABLE
. However, you can use these methods on a result set containing zero rows.
TIP
Calling plpy.execute causes the entire result set to be read into memory. Use this function only when you are sure that the result set will be relatively small. If you do not want to risk excessive memory usage when fetching large results, use plpy.cursor rather than plpy.execute .
|
prepare and execute
The plpy.prepare
function prepares the execution plan for a query. It is called with a query string and a list of parameter types if you have parameter references in the query. For example:
plan = plpy.prepare("SELECT in_stock FROM book WHERE title = $1", ["text"])
text
is the type of the variable that will be passed as $1
.
After prepare
, use the plpy.execute
function to run the query plan. Pass the plan as the first argument (instead of the query string), and a list of values to substitute parameters in the query as the second argument. It can be omitted if the query does not expect any parameters.
row = plpy.execute(plan, ["To the Lighthouse"])
You can also add the optional third parameter — limit
that specifies the row limit as mentioned above.
Alternatively, you can call the execute
method on the plan
object:
row = plan.execute(["To the Lighthouse"])
When you prepare a plan using the PL/Python module, it is automatically saved. For more information, see Server Programming Interface. To effectively use this across multiple function calls, you may need to use the SD
or GD
persistent storage dictionary. For example:
CREATE FUNCTION use_plan(min_value int) RETURNS SETOF book AS $$
if "plan" in SD:
plan = SD["plan"]
else:
plan = plpy.prepare("SELECT * FROM book WHERE in_stock >=$1", ["int"])
SD["plan"] = plan
rows = plpy.execute(plan, [min_value])
plpy.notice(rows.colnames()) # display column names
total_books=0
for i, elem in enumerate(rows):
plpy.notice(rows[i]["title"]) # display book titles
total_books = total_books + rows[i]["in_stock"]
plpy.notice("total books: " + str(total_books)) # display total number of books
return rows
$$ LANGUAGE plpython3u;
SELECT * from use_plan(10);
In the example above, the plpy.notice
method is used to display information inside and outside the loop body for demonstration purposes. Generally, it is used to generate PostgreSQL log messages. See Utility functions.
Result:
NOTICE: ['book_id', 'title', 'author_id', 'in_stock'] NOTICE: Mrs. Dalloway NOTICE: To the Lighthouse NOTICE: The Lord of the Rings NOTICE: total books: 83 book_id | title | author_id | in_stock ---------+-----------------------+-----------+---------- 1 | Mrs. Dalloway | 1 | 25 2 | To the Lighthouse | 1 | 13 5 | The Lord of the Rings | 4 | 45
cursor
The plpy.cursor
function accepts the same arguments as plpy.execute
(except for the row limit) and returns a cursor object, which allows you to process large result sets in smaller chunks. As with plpy.execute
, a query string or a plan object with a list of arguments can be used.
plpy.cursor(query)
plpy.cursor(plan [, arguments])
The cursor
function can also be called as a method of the plan object. Below is an example that, like the other examples in this section, uses the pgbench_accounts
table created by pgbench.
CREATE FUNCTION count_odd_prepared() RETURNS integer AS $$
odd = 0
plan = plpy.prepare("SELECT aid FROM pgbench_accounts WHERE aid % $1 <> 0", ["integer"])
rows = list(plpy.cursor(plan, [2]))
# rows = list(plan.cursor([2])) -- call cursor as a method of the plan object
return len(rows)
$$ LANGUAGE plpython3u;
SELECT count_odd_prepared();
Result:
count_odd_prepared -------------------- 2500000
The cursor object provides the fetch
method that accepts an integer parameter and returns a result object. Each time you call fetch
, the returned object contains the next batch of rows. The amount of rows cannot be larger than the parameter value. Once all rows are exhausted, fetch starts returning an empty result object.
CREATE FUNCTION count_odd_fetch(batch_size integer) RETURNS integer AS $$
odd = 0
cursor = plpy.cursor("SELECT aid FROM pgbench_accounts")
while True:
rows = cursor.fetch(batch_size)
if not rows:
break
for row in rows:
if row['aid'] % 2:
odd += 1
return odd
$$ LANGUAGE plpython3u;
SELECT count_odd_fetch(1000);
Result:
count_odd_fetch ----------------- 2500000
Cursor objects also provide an iterator interface. It yields one row at a time until all rows are exhausted. Data fetched that way is not returned as result objects, but rather as dictionaries. Each dictionary corresponds to a single result row.
CREATE FUNCTION count_odd_iterator() RETURNS integer AS $$
odd = 0
for row in plpy.cursor("SELECT aid FROM pgbench_accounts"):
if row['aid'] % 2:
odd += 1
return odd
$$ LANGUAGE plpython3u;
SELECT count_odd_iterator();
Result:
count_odd_iterator -------------------- 2500000