Use PL/Python

Eugenia Kuzina

Contents

Overview
PL/Python functions
Data types
Anonymous code blocks
Sharing data
Database access

Overview

The PL/Python procedural language allows PostgreSQL functions and procedures to be written in the Python language. ADPG supports the language version Python 3.

PL/Python is only available as an untrusted language — it does not provide a possibility to restrict user operations and is therefore named plpython3u. The author of a function in the untrusted PL/Python language must ensure that the function cannot be misused, since it has the same permissions as a database administrator. Only superusers can create functions in plpython3u and other untrusted languages.

To install PL/Python in a particular database, use the CREATE EXTENSION command:

CREATE EXTENSION plpython3u;

NOTE

If the plpython3u extension is created in the template1 database used as the default template, all subsequently created databases will have this extension installed.

PL/Python functions

To create a function in the PL/Python language, use the standard CREATE FUNCTION command:

CREATE FUNCTION function_name (argument-list)
  RETURNS return-type
AS $$
  # PL/Python function body
$$ LANGUAGE plpython3u;

The function body is a Python script. When the function is called, its arguments are passed as elements of the args list. Named arguments are also passed as ordinary variables to the Python script. Named arguments make code more readable. The result is returned from the Python code with return or yield (in case of a function returning a set). If you do not provide a return value, Python returns the default None. PL/Python converts None to the SQL NULL value. In a procedure, the result from the Python code must be None. You can end the procedure without a return statement or use return without argument to achieve this. Otherwise, an error will be raised.

For example, the following function returns the greater of two integers:

CREATE FUNCTION get_max (a integer, b integer)
  RETURNS integer
AS $$
  if a > b:
    return a
  return b
$$ LANGUAGE plpython3u;

SELECT get_max(28,3);

Result:

 get_max
----------
       28

The arguments are set as global variables. Because of the scoping rules of Python, an argument variable cannot be reassigned inside the function to the value of an expression that involves the variable name itself, unless the variable is redeclared as global.

For example, the following function converts a string to lower case:

CREATE FUNCTION fn_lower (str text)
RETURNS text
AS $$
    str = str.lower() # error
    return str
$$ LANGUAGE plpython3u;

It will be created but will return the error: ERROR: UnboundLocalError: local variable 'str' referenced before assignment.

Add the global definition to the str variable to overcome the problem:

CREATE FUNCTION fn_lower (str text)
  RETURNS text
AS $$
  global str
  str = str.lower()
  return str
$$ LANGUAGE plpython3u;

SELECT fn_lower('TEsT STRinG');

Result:

  fn_lower
-------------
 test string

However, it is recommended not to rely on such implementation detail of the PL/Python, but to assume that function parameters are read-only.

Data types

When a PL/Python function is called, its arguments are converted from their PostgreSQL data type to a corresponding Python type. The table below contains type mappings of function arguments.

Type mappings of arguments
ADPG/PostgreSQL type	Python type
Boolean	Bool
Smallint, int, bigint, oid	Int
Real, double	Float
Numeric	Decimal
Bytea	Bytes
Null	None. See the NULL → None section
Array	List. See the Arrays → Lists section
Composite types	Composite-type arguments are passed to the function as Python mappings. See the Composite types section
Other data types, including the PostgreSQL character string types	Str (in Unicode, like all Python strings)

A return value of a PL/Python function is converted to the declared PostgreSQL return type using the rules described in the table below.

Return type conversion
ADPG/PostgreSQL return type	Conversion rules
Boolean	The return value will be evaluated according to the Python rules. That is, `false` will be returned for `0` and the empty string, but note that `true` will be returned for `f`
Bytea	The return value will be converted to Python `bytes` using Python built-in facilities and then converted to the `bytea` type
NULL	None. See the NULL → None section
Array	List. See the Arrays → Lists section
Composite types	A composite result can be returned as a sequence, mapping, or an object. You can also use a function with `OUT` parameters. See the Composite types section
Other PostgreSQL return types	The return value is converted to a string using the Python built-in `str`, and the result is passed to the input function of the PostgreSQL data type. If a Python value has the type of `float`, it is converted using the `repr` instead of `str` to avoid loss of precision. Strings are converted to the PostgreSQL server encoding when they are passed to PostgreSQL

Information about non-scalar data types is given below.

Note that logical inconsistencies between the declared return type in PostgreSQL and the returned Python data type are ignored — the value is converted anyway.

NULL → None

If an SQL NULL value is passed to a function, the argument value will appear as None in Python. You can add STRICT to the function definition to make ADPG/PostgreSQL return the NULL result if a NULL value is passed as a parameter. In this case, the function is not called. Alternatively, it is possible to check for NULL inputs in the function body:

CREATE FUNCTION get_max (a integer, b integer)
  RETURNS integer
AS $$
  if (a is None) or (b is None):
    return None
  if a > b:
    return a
  return b
$$ LANGUAGE plpython3u;

The following code returns NULL:

SELECT get_max(NULL, 5);

Arrays → Lists

SQL array values are passed into PL/Python as a Python list. To return an SQL array value out of a PL/Python function, return a Python list:

CREATE FUNCTION get_array()
RETURNS char[]
AS $$
return ['a', 'b', 'c', 'd', 'e']
$$ LANGUAGE plpython3u;

SELECT get_array();

Result:

  get_array
-------------
 {a,b,c,d,e}

Multidimensional arrays are passed into PL/Python as nested Python lists. For example, a 2-dimensional array is a list of lists. When returning a multi-dimensional SQL array out of a PL/Python function, the inner lists at each level must all be of the same size.

CREATE FUNCTION get_two_dim_array(x int4[]) RETURNS int4[] AS $$
plpy.info(x, type(x))
return x
$$ LANGUAGE plpython3u;

SELECT * FROM get_two_dim_array(ARRAY[[1,2,3],[11,12,13],[21,22,23]]);

Result:

INFO:  ([[1, 2, 3], [11, 12, 13], [21, 22, 23]], <class 'list'>)

        get_two_dim_array
---------------------------------
 {{1,2,3},{11,12,13},{21,22,23}}

Other Python sequences, like tuples, are also accepted for backward compatibility with PostgreSQL versions 9.6 and below. However, they are always treated as one-dimensional arrays, because they are ambiguous with composite types. For the same reason, when a composite type is used in a multi-dimensional array, it must be represented by a tuple, rather than a list.

Note that in Python, strings are sequences, which can have undesirable effects:

CREATE FUNCTION get_str_arr()
  RETURNS varchar[]
AS $$
return "test"
$$ LANGUAGE plpython3u;

SELECT get_str_arr();

Result:

 get_str_arr
-------------
 {t,e,s,t}

Composite types

Composite-type arguments are passed to a function as Python mappings. The attribute names of the composite type are the element names of the mapping. Example:

CREATE TABLE book (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(50) NOT NULL,
    in_stock INTEGER DEFAULT 0);

INSERT INTO book(title, in_stock) VALUES
('Mrs. Dalloway',25),
('To the Lighthouse',13),
('To Kill a Mockingbird',2),
('The Great Gatsby',3),
('The Lord of the Rings',45);

CREATE FUNCTION sufficient_in_stock(b book)
  RETURNS boolean
AS $$
  if b["in_stock"] >= 10 :
    return True
  return False
$$ LANGUAGE plpython3u;

SELECT book_id, title, sufficient_in_stock(b.*) AS sufficient
 FROM book b;

Result:

 book_id |         title         | sufficient
---------+-----------------------+------------
       1 | Mrs. Dalloway         | t
       2 | To the Lighthouse     | t
       3 | To Kill a Mockingbird | f
       4 | The Great Gatsby      | f
       5 | The Lord of the Rings | t

If an attribute has the NULL value in the passed row, it has the None value in the mapping.

There are multiple ways to return row or composite types from a Python function. The examples below assume that the following composite type exists:

CREATE TYPE named_value AS (
  name   text,
  value  integer
);

A composite result can be returned as one of the options described below.

Sequence type (a tuple or list)

A returned sequence object must have the same number of items as the composite result type has fields. The item with the 0 index is assigned to the first field of the composite type, 1 — to the second, and so on. For example:

CREATE FUNCTION create_named_value1 (name text, value integer)
    RETURNS named_value
    AS $$
    return ( name, value )
#   return [ name, value ] -- or alternatively, return as list
$$ LANGUAGE plpython3u;

SELECT create_named_value1('name1', 1);

Result:

 create_named_value1
---------------------
 (name1,1)

To return an SQL NULL for any column, insert None at the corresponding position.

When an array of composite types is returned, it cannot be returned as a list, because it is ambiguous whether the Python list represents a composite type or another array dimension.

A composite type cannot be a Python set because it is not indexable.

Mapping (dictionary)

The value for each result type column is retrieved from the mapping with the column name as a key.

CREATE FUNCTION create_named_value2 (name text, value integer)
  RETURNS named_value
AS $$
  return { "name": name, "value": value }
$$ LANGUAGE plpython3u;

SELECT create_named_value2('name2',2);

Result:

 create_named_value2
---------------------
 (name2,2)

Any extra dictionary key/value pairs are ignored. A missing key raises an error. To return an SQL NULL value for any column, insert None with the corresponding column name as the key.

Object (any object providing the getattr method)

This works the same as a mapping. Example:

CREATE FUNCTION create_named_value3 (name text, value integer)
  RETURNS named_value
AS $$
  class named_value:
    def __init__ (self, n, v):
      self.name = n
      self.value = v
  return named_value(name, value)
$$ LANGUAGE plpython3u;

Alternatively, the function can be rewritten as follows:

CREATE FUNCTION create_named_value3 (name text, value integer)
  RETURNS named_value
AS $$
  class nv: pass
  nv.name = name
  nv.value = value
  return nv
$$ LANGUAGE plpython3u;

SELECT create_named_value3('name3',3);

Result:

 create_named_value3
---------------------
 (name3,3)

OUT parameters

You can use functions with output parameters:

CREATE FUNCTION return_out_params(OUT id integer, OUT name text ) AS $$
    return (1, "text1")
$$ LANGUAGE plpython3u;

SELECT * FROM return_out_params();

Result:

 id | name
----+-------
  1 | text1

Output parameters of procedures are passed back in the same way:

CREATE PROCEDURE calculate_cube (INOUT x integer, INOUT y integer) AS $$
    return (pow(x, 3), pow(y, 3))
$$ LANGUAGE plpython3u;

CALL calculate_cube(2, 5);

Result:

 x |  y
---+-----
 8 | 125

Set-returning functions

A PL/Python function can also return sets of scalar or composite types. This can be done in different ways, since the returned object is internally turned into an iterator. The examples below utilize the following composite type:

CREATE TYPE author_book AS (
  author text,
  book text
);

A set result can be returned from a sequence type, iterator, or generator. Set-returning functions with OUT parameters are also supported.

Examples

Sequence type (tuple, list, set)

CREATE FUNCTION get_books(author text)
  RETURNS SETOF author_book
AS $$
  # return tuple containing lists as composite types
   return ( [ author, "War and Peace" ], [ author, "Anna Karenina" ], [ author, "The Kreutzer Sonata" ] )
$$ LANGUAGE plpython3u;

SELECT get_books('Tolstoy');

Result:

            get_books
---------------------------------
 (Tolstoy,"War and Peace")
 (Tolstoy,"Anna Karenina")
 (Tolstoy,"The Kreutzer Sonata")

Iterator

A function must return an object providing the __iter__ and __next__ methods.

CREATE FUNCTION get_books1(author text)
  RETURNS SETOF author_book
AS $$
  class producer:
    def __init__ (self, author, book):
      self.author = author
      self.book = book
      self.ndx = -1

    def __iter__ (self):
      return self

    def __next__ (self):
      self.ndx += 1
      if self.ndx == len(self.book):
        raise StopIteration
      return ( self.author, self.book[self.ndx] )

  return producer(author, [ "War and peace", "Anna Karenina", "The Kreutzer sonata" ])
$$ LANGUAGE plpython3u;

SELECT get_books1('Tolstoy');

Result:

           get_books1
---------------------------------
 (Tolstoy,"War and peace")
 (Tolstoy,"Anna Karenina")
 (Tolstoy,"The Kreutzer sonata")

Generator (yield)

CREATE FUNCTION get_books2 (author text)
  RETURNS SETOF author_book
AS $$
  for book in ["War and peace", "Anna Karenina", "The Kreutzer sonata"]:
    yield (author, book)
$$ LANGUAGE plpython3u;

SELECT get_books2('Tolstoy');

Result:

           get_books2
---------------------------------
 (Tolstoy,"War and peace")
 (Tolstoy,"Anna Karenina")
 (Tolstoy,"The Kreutzer sonata")

Set-returning functions with OUT parameters

CREATE FUNCTION multiout_simple_setof(n integer, OUT integer, OUT integer) RETURNS SETOF record AS $$
    return [(1, 2)] * n
$$ LANGUAGE plpython3u;

SELECT * FROM multiout_simple_setof(3);

Result:

 column1 | column2
---------+---------
       1 |       2
       1 |       2
       1 |       2

Anonymous code blocks

PL/Python also supports anonymous code blocks called with the DO statement. An anonymous code block takes no arguments, and any value it might return is discarded. Otherwise, it works like function code.

DO $$
# PL/Python code
$$ LANGUAGE plpython3u;

Example:

DO $$
plpy.notice('This code is executed inside the Do block')
$$ LANGUAGE plpython3u;

Result:

NOTICE:  This code is executed inside the Do block
DO

Sharing data

PL/Python provides the SD global dictionary to store private data between repeated calls to the same function. It also has another global dictionary — GD for public data, that is available to all Python functions within a session.

Each function gets its own execution environment in the Python interpreter, so that global data and function arguments from one function are not available to another function. The exception is the data in the GD dictionary, as mentioned above.

Database access

The PL/Python language module automatically imports a Python module called plpy. The functions and constants in this module are available in the Python code as plpy.<function_name>.

The plpy module provides several functions to execute database commands.

execute

The plpy.execute command executes the given query with the optional limit parameter and returns a result object.

plpy.execute(<query> [, <limit>])

If a limit is specified and is greater than 0, plpy.execute returns the number of rows that do not exceed the limit, as if the query includes a LIMIT clause. If you omit limit or set it to 0, the row limit will not be applied.

For example, the following code returns up to 3 rows from the book table:

CREATE FUNCTION access_db() RETURNS SETOF book AS $$
    rows=plpy.execute("SELECT * FROM book", 3)
    return rows
$$ LANGUAGE plpython3u;

SELECT * FROM access_db();

Result:

 book_id |         title         | author_id | in_stock
---------+-----------------------+-----------+----------
       1 | Mrs. Dalloway         |         1 |       25
       2 | To the Lighthouse     |         1 |       13
       3 | To Kill a Mockingbird |         2 |        2

The result object emulates a list or dictionary object. The result object can be accessed by row number and column name. For example, you can access the title column as:

col = rows[i]["title"]

You can use the len function to obtain the number of returned rows.

Also, the result object has additional methods listed in the table below.

Method Description

nrows()

Returns the number of rows processed by the command. Note that this is not necessarily the same as the number of rows returned. For example, an UPDATE command sets this value but does not return any rows (unless RETURNING is used)

status()

The SPI_execute() return value

colnames()

Returns a list of column names

coltypes()

Returns a list of column type OIDs

coltypmods()

Returns a list of type-specific modifiers for the columns

__str__()

The standard Python __str__() method that returns a string representation of an object. So, it is possible, for example, to debug query execution results using plpy.debug(row)

The colnames(), coltypes(), and coltypmods() methods raise an exception when called on a result object from a command that did not produce a result set, for example, UPDATE without RETURNING, or DROP TABLE. However, you can use these methods on a result set containing zero rows.

TIP

Calling plpy.execute causes the entire result set to be read into memory. Use this function only when you are sure that the result set will be relatively small. If you do not want to risk excessive memory usage when fetching large results, use plpy.cursor rather than plpy.execute.

prepare and execute

The plpy.prepare function prepares the execution plan for a query. It is called with a query string and a list of parameter types if you have parameter references in the query. For example:

plan = plpy.prepare("SELECT in_stock FROM book WHERE title = $1", ["text"])

text is the type of the variable that will be passed as $1.

After prepare, use the plpy.execute function to run the query plan. Pass the plan as the first argument (instead of the query string), and a list of values to substitute parameters in the query as the second argument. It can be omitted if the query does not expect any parameters.

row = plpy.execute(plan, ["To the Lighthouse"])

You can also add the optional third parameter — limit that specifies the row limit as mentioned above.

Alternatively, you can call the execute method on the plan object:

row = plan.execute(["To the Lighthouse"])

When you prepare a plan using the PL/Python module, it is automatically saved. For more information, see Server Programming Interface. To effectively use this across multiple function calls, you may need to use the SD or GD persistent storage dictionary. For example:

CREATE FUNCTION use_plan(min_value int) RETURNS SETOF book AS $$
    if "plan" in SD:
        plan = SD["plan"]
    else:
        plan = plpy.prepare("SELECT * FROM book WHERE in_stock >=$1", ["int"])
        SD["plan"] = plan
    rows = plpy.execute(plan, [min_value])
    plpy.notice(rows.colnames()) # display column names
    total_books=0
    for i, elem in enumerate(rows):
        plpy.notice(rows[i]["title"]) # display book titles
        total_books = total_books + rows[i]["in_stock"]
    plpy.notice("total books: " + str(total_books)) # display total number of books
    return rows
$$ LANGUAGE plpython3u;

SELECT * from use_plan(10);

In the example above, the plpy.notice method is used to display information inside and outside the loop body for demonstration purposes. Generally, it is used to generate PostgreSQL log messages. See Utility functions.

Result:

NOTICE:  ['book_id', 'title', 'author_id', 'in_stock']
NOTICE:  Mrs. Dalloway
NOTICE:  To the Lighthouse
NOTICE:  The Lord of the Rings
NOTICE:  total books: 83
 book_id |         title         | author_id | in_stock
---------+-----------------------+-----------+----------
       1 | Mrs. Dalloway         |         1 |       25
       2 | To the Lighthouse     |         1 |       13
       5 | The Lord of the Rings |         4 |       45

cursor

The plpy.cursor function accepts the same arguments as plpy.execute (except for the row limit) and returns a cursor object, which allows you to process large result sets in smaller chunks. As with plpy.execute, a query string or a plan object with a list of arguments can be used.

plpy.cursor(query)

plpy.cursor(plan [, arguments])

The cursor function can also be called as a method of the plan object. Below is an example that, like the other examples in this section, uses the pgbench_accounts table created by pgbench.

CREATE FUNCTION count_odd_prepared() RETURNS integer AS $$
    odd = 0
    plan = plpy.prepare("SELECT aid FROM pgbench_accounts WHERE aid % $1 <> 0", ["integer"])
    rows = list(plpy.cursor(plan, [2]))
#   rows = list(plan.cursor([2])) -- call cursor as a method of the plan object
    return len(rows)
$$ LANGUAGE plpython3u;

SELECT count_odd_prepared();

Result:

 count_odd_prepared
--------------------
            2500000

The cursor object provides the fetch method that accepts an integer parameter and returns a result object. Each time you call fetch, the returned object contains the next batch of rows. The amount of rows cannot be larger than the parameter value. Once all rows are exhausted, fetch starts returning an empty result object.

CREATE FUNCTION count_odd_fetch(batch_size integer) RETURNS integer AS $$
odd = 0
cursor = plpy.cursor("SELECT aid FROM pgbench_accounts")
while True:
    rows = cursor.fetch(batch_size)
    if not rows:
        break
    for row in rows:
        if row['aid'] % 2:
            odd += 1
return odd
$$ LANGUAGE plpython3u;

SELECT count_odd_fetch(1000);

Result:

 count_odd_fetch
-----------------
         2500000

Cursor objects also provide an iterator interface. It yields one row at a time until all rows are exhausted. Data fetched that way is not returned as result objects, but rather as dictionaries. Each dictionary corresponds to a single result row.

CREATE FUNCTION count_odd_iterator() RETURNS integer AS $$
odd = 0
for row in plpy.cursor("SELECT aid FROM pgbench_accounts"):
    if row['aid'] % 2:
         odd += 1
return odd
$$ LANGUAGE plpython3u;

SELECT count_odd_iterator();

Result:

 count_odd_iterator
--------------------
            2500000

Found a mistake? Seleсt text and press Ctrl+Enter to report it