Use C-language functions
In ADPG, you can use C language (or a language compatible with C, for example, C++) to write user-defined functions. The main feature of these functions is that they are compiled into dynamically loadable objects (also called shared libraries). The server loads them on demand. The dynamic loading feature distinguishes C-language functions from internal functions — the actual coding conventions are essentially the same for both. Hence, the standard internal function library is a rich source of coding examples for user-defined C functions.
Currently, only one calling convention is used for C functions (version 1). Support for this convention is indicated by a function declaration with the PG_FUNCTION_INFO_V1
macro as shown below.
Dynamic loading
The first time a user-defined function in a particular loadable object file is called in a session, the dynamic loader loads that object file into memory so that the function can be called. The CREATE FUNCTION command for a user-defined C function must specify two options for the function: the name of the loadable object file and the name of the C function to call within that object file. If the C function name is not explicitly specified, it is assumed to be the same as the SQL function name.
The following algorithm is used to locate the shared object file based on the name given in the CREATE FUNCTION
command:
-
If the name is an absolute path, the given file is loaded.
-
If the name starts with the
$libdir
string, that part is replaced by the PostgreSQL package library directory name, which is determined at build time. -
If the name does not contain a directory part, the file is searched for in the path specified by the
dynamic_library_path
configuration variable. -
Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. It is unreliable to depend on the current working directory.
If these options do not work, the platform-specific shared library file name extension (often .so
) is appended to the given name, and the options above are tried again. If that fails, the load will fail.
The default library path is /usr/lib/adpg16/lib
. You can use the following psql commands to verify it:
SHOW dynamic_library_path;
dynamic_library_path ---------------------- $libdir (1 row)
SELECT setting FROM pg_config WHERE name='PKGLIBDIR';
setting --------------------- /usr/lib/adpg16/lib
Also, you can determine which path contains $libdir
with the pg_config --pkglibdir
command. For this, the adpg16-devel package should be installed as described below.
The user, on whose behalf the ADPG server runs, must be able to traverse the path to the file you intend to load. Making the file or a higher-level directory not readable and/or not executable by this user is a common mistake.
To ensure that a dynamically loaded object file is not loaded into an incompatible server, ADPG/PostgreSQL checks if the file contains a "magic block" with the appropriate contents. This allows the server to detect incompatibilities, such as code compiled for a different major version of ADPG/PostgreSQL. To include a magic block, write this in one of the module source files, after the #include "fmgr.h"
header:
PG_MODULE_MAGIC;
After it is used for the first time, a dynamically loaded object file is retained in memory. In the same session, future calls to the functions in that file will only incur the small overhead of a symbol table lookup. If you need to force a reload of an object file, for example after recompiling it, start a new session.
Optionally, a dynamically loaded file can contain initialization and finalization functions. If the file includes a function named _PG_init
, that function will be called immediately after loading the file. The _PG_init
function receives no parameters and should return void. There is presently no way to unload a dynamically loaded file, so the finalization function is not used.
Create a sample function
ADPG/PostgreSQL will not compile C functions automatically. The object file must be compiled before it is referenced in the CREATE FUNCTION
command.
Creating shared libraries is generally analogous to linking executables: first the source files are compiled into object files, then the object files are linked together. The object files need to be created as position-independent code (PIC), which means that they can be placed at an arbitrary location in memory when they are loaded by the executable.
If your system does not have a C compiler, you can install GNU Compiler Collection (GCC) as described below.
The default CentOS repositories contain a package group named Development Tools that contains the GCC compiler and a lot of libraries and other utilities required for compiling software.
To install Development Tools including the GCC Compiler, run the following command:
$ sudo yum group install "Development Tools"
To check if the GCC compiler is successfully installed, use the following command that prints the GCC version:
$ gcc --version
Install the adpg16-devel package that contains development files for ADPG/PostgreSQL server-side programming:
$ sudo yum install adpg16-devel
To install the GCC Compiler, run the following command:
$ sudo apt install gcc
Alternatively, install the build-essential package that contains additional utilities used with GCC:
$ sudo apt install build-essential
To check if the GCC compiler is successfully installed, use the following command that prints the GCC version:
$ gcc --version
Install the adpg16-devel package that contains development files for ADPG/PostgreSQL server-side programming:
$ sudo apt-get install adpg16-devel
Create a C function that returns numbers to the second power from 1 to a passed parameter value. Note that you should always include postgres.h first in any source file of server code. The postgres.h file contains definitions of a lot of necessary types and functions, fmgr.h includes the function manager interfaces (PG_FUNCTION_ARGS
, etc.), funcapi.h is required to return a set of rows.
By default, postgres.h and other header files are located at the following path: /usr/lib/adpg16/include/server. You can put the created file to this folder or specify a path to header files (#include "postgres.h"
and others) according to the created file location.
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(function_test);
Datum
function_test(PG_FUNCTION_ARGS)
{
FuncCallContext *function_call_context;
MemoryContext old_context;
if (SRF_IS_FIRSTCALL()) {
function_call_context = SRF_FIRSTCALL_INIT();
old_context = MemoryContextSwitchTo(function_call_context->multi_call_memory_ctx);
function_call_context->max_calls = PG_GETARG_INT32(0);
MemoryContextSwitchTo(old_context);
}
function_call_context = SRF_PERCALL_SETUP();
if (function_call_context->call_cntr < function_call_context->max_calls) {
SRF_RETURN_NEXT(function_call_context, pow(Int32GetDatum(function_call_context->call_cntr),2));
} else {
SRF_RETURN_DONE(function_call_context);
}
}
Specify the name of the file containing the function as foo.c.
Set the C_INCLUDE_PATH
environment variable to a path to the header files included via #include
for the GCC compiler:
export C_INCLUDE_PATH=$C_INCLUDE_PATH:/lib/adpg16/include/server
In this example, the source code is in the foo.c file, and a shared library with the foo.so name is created. The intermediate object file is called foo.o. A shared library can contain more than one object file, but the example uses one.
$ cc -fPIC -c foo.c
$ cc -shared -o foo.so foo.o
For more information on creating shared libraries, see Compiling and linking dynamically-loaded functions.
Create a function in PostgreSQL:
CREATE OR REPLACE FUNCTION
function_test(integer) RETURNS setof int4 AS
'/usr/lib/adpg16/include/server/foo.so', 'function_test'
LANGUAGE C
STRICT;
Test the function:
SELECT function_test(4);
Result:
function_test --------------- 1 4 9 16
Base types in C-Language functions
ADPG/PostgreSQL regards a base type as a "blob of memory". The user-defined functions define the way that PostgreSQL can operate on it. That is, ADPG/PostgreSQL only stores and retrieves the data from disk, and uses user-defined functions to input, process, and output the data.
Base types can have one of three internal formats:
-
pass by value, fixed-length;
-
pass by reference, fixed-length;
-
pass by reference, variable-length.
By-value types can only be 1, 2, or 4 bytes in length (also 8 bytes, if sizeof(Datum)
is 8 on the current machine). You should be careful to define your types such that they will be the same size (in bytes) on all architectures. For example, the long
type is 4 bytes on some machines and 8 bytes on others, whereas the int
type is 4 bytes on most Unix machines. A reasonable implementation of the int4
type on Unix machines can be:
/* 4-byte integer, passed by value */
typedef int int4;
The actual PostgreSQL code considers this type as int32
, because there is a convention in C that intXX
means XX
bits. Note therefore that the C type int8
is 1 byte in size. The SQL type int8
is called int64
in C.
Fixed-length types of any size can be passed by reference. For example, here is a sample implementation of a PostgreSQL type:
/* 16-byte structure, passed by reference */
typedef struct
{
double x, y;
} Point;
Only pointers to such types can be used when passing them in and out of PostgreSQL functions. To return a value of such a type, allocate the memory with the palloc
function, fill in the allocated memory, and return a pointer to it. Also, if you just want to return the same value as one of your input arguments, just return the pointer to the input value.
All variable-length types must also be passed by reference. These types must begin with a required length field of 4 bytes, which should be specified by the SET_VARSIZE
macro. Never set this field manually. All data to be stored within that type must be located in the memory immediately following that length field.
Never modify the contents of a pass-by-reference input value. This operation can corrupt data on the disk, since the pointer might point directly into a disk buffer.
As an example, the text
type can be defined as follows:
typedef struct {
int32 length;
char data[FLEXIBLE_ARRAY_MEMBER];
} text;
The [FLEXIBLE_ARRAY_MEMBER]
notation means that the actual length of data is not specified by this declaration.
When manipulating variable-length types, you must be careful to allocate the correct amount of memory and set the length field correctly. For example, to store 40 bytes in a text structure, use a code fragment like this:
#include "postgres.h"
...
char buffer[40]; /* the source data */
...
text *destination = (text *) palloc(VARHDRSZ + 40);
SET_VARSIZE(destination, VARHDRSZ + 40);
memcpy(destination->data, buffer, 40);
...
VARHDRSZ
is the same as sizeof(int32)
, but it is a good style to use the VARHDRSZ
macro to refer to the size of the overhead for a variable-length type.
The table below lists the C types corresponding to the built-in SQL data types of PostgreSQL. The Defined in column contains the header file name that should be included to get the type definition. As mentioned above, it is necessary to include postgres.h first in any source file of server code.
SQL type | C type | Defined in |
---|---|---|
boolean |
bool |
postgres.h (maybe compiler built-in) |
box |
BOX* |
utils/geo_decls.h |
bytea |
bytea* |
postgres.h |
"char" |
char |
(compiler built-in) |
character |
BpChar* |
postgres.h |
cid |
CommandId |
postgres.h |
date |
DateADT |
utils/date.h |
float4 (real) |
float4 |
postgres.h |
float8 (double precision) |
float8 |
postgres.h |
int2 (smallint) |
int16 |
postgres.h |
int4 (integer) |
int32 |
postgres.h |
int8 (bigint) |
int64 |
postgres.h |
interval |
Interval* |
datatype/timestamp.h |
lseg |
LSEG* |
utils/geo_decls.h |
name |
Name |
postgres.h |
numeric |
Numeric |
utils/numeric.h |
oid |
Oid |
postgres.h |
oidvector |
oidvector* |
postgres.h |
path |
PATH* |
utils/geo_decls.h |
point |
POINT* |
utils/geo_decls.h |
regproc |
RegProcedure |
postgres.h |
text |
text* |
postgres.h |
tid |
ItemPointer |
storage/itemptr.h |
time |
TimeADT |
utils/date.h |
time with time zone |
TimeTzADT |
utils/date.h |
timestamp |
Timestamp |
datatype/timestamp.h |
timestamp with time zone |
TimestampTz |
datatype/timestamp.h |
varchar |
VarChar* |
postgres.h |
xid |
TransactionId |
postgres.h |
Version 1 сalling сonventions
The version 1 calling convention relies on macros to perform operations involving passing arguments and results. By version 1 convention, a C function should always be defined as follows:
Datum <function_name>(PG_FUNCTION_ARGS)
In addition, the following macro call must be written just before the function itself:
PG_FUNCTION_INFO_V1(<function_name>);
In a version-1 function, each argument is fetched using the PG_GETARG_xxx()
macro that corresponds to the argument’s data type (xxx
is a data type). In non-strict functions, this call must be preceded by a check for NULL
in the argument using PG_ARGISNULL()
. The PG_RETURN_xxx()
macro for the required type returns a result. PG_GETARG_xxx()
takes the number of the function argument (starting with 0
) as its argument. PG_RETURN_xxx()
takes the actual value to return as an argument.
Examples:
/* The funcs.c file that is compiled into funcs.so */
#include "postgres.h"
#include <string.h>
#include "fmgr.h"
#include "utils/geo_decls.h"
#include "varatt.h"
PG_MODULE_MAGIC;
/* by value */
PG_FUNCTION_INFO_V1(add_one);
Datum
add_one(PG_FUNCTION_ARGS)
{
int32 arg = PG_GETARG_INT32(0);
PG_RETURN_INT32(arg + 1);
}
/* by reference, fixed length */
PG_FUNCTION_INFO_V1(add_one_float8);
Datum
add_one_float8(PG_FUNCTION_ARGS)
{
float8 arg = PG_GETARG_FLOAT8(0);
PG_RETURN_FLOAT8(arg + 1.0);
}
/* by reference, variable length */
PG_FUNCTION_INFO_V1(concat_text);
Datum
concat_text(PG_FUNCTION_ARGS)
{
text *arg1 = PG_GETARG_TEXT_PP(0);
text *arg2 = PG_GETARG_TEXT_PP(1);
int32 arg1_size = VARSIZE_ANY_EXHDR(arg1);
int32 arg2_size = VARSIZE_ANY_EXHDR(arg2);
/* VARSIZE_ANY_EXHDR is the size of the struct in bytes, minus VARHDRSZ or VARHDRSZ_SHORT of its header. */
int32 new_text_size = arg1_size + arg2_size + VARHDRSZ;
text *new_text = (text *) palloc(new_text_size);
SET_VARSIZE(new_text, new_text_size);
/*
* VARDATA is a pointer to the data region of the new struct. The source
* could be a short datum, so retrieve its data through VARDATA_ANY.
*/
memcpy(VARDATA(new_text), /* destination */
VARDATA_ANY(arg1), /* source */
arg1_size); /* size in bytes */
memcpy(VARDATA(new_text) + arg1_size, VARDATA_ANY(arg2), arg2_size);
PG_RETURN_TEXT_P(new_text);
}
After the code was prepared in the funcs.c file, compiled into a shared object as described in Create a sample function, and placed in the directory specified by the default library path, you can define the functions in ADPG/PostgreSQL with the following commands:
CREATE FUNCTION add_one(integer) RETURNS integer
AS 'funcs', 'add_one'
LANGUAGE C STRICT;
-- Overloading of the "add_one" SQL function
CREATE FUNCTION add_one(double precision) RETURNS double precision
AS 'funcs', 'add_one_float8'
LANGUAGE C STRICT;
CREATE FUNCTION concat_text(text, text) RETURNS text
AS 'funcs', 'concat_text'
LANGUAGE C STRICT;
Alternatively, you can place funcs.so in any other directory and specify the full path to this file instead of funcs
. Make sure that the user, on whose behalf the ADPG server runs, is able to execute this file.
Notice that we have specified the functions as STRICT
, meaning that the system should assume a NULL
result if any input value is NULL
. It avoids checking for NULL
inputs in the function code. Without this, you should check for NULL
values explicitly, using PG_ARGISNULL()
. For example:
isnull = PG_ARGISNULL(0);
if (isnull)
element = (Datum) 0;
else
element = PG_GETARG_DATUM(0);
Basic rules for writing code
The basic rules for writing and building C functions are as follows:
-
Define a "magic block" for your shared library —
PG_MODULE_MAGIC;
. -
When allocating memory, use the
palloc
andpfree
PostgreSQL functions instead of the correspondingmalloc
andfree
C-library functions. The memory allocated bypalloc
will be freed automatically at the end of each transaction, preventing memory leaks. -
Always zero the bytes of your structures using
memset
(or allocate the memory withpalloc0
). Even if you assign a value to each field of your structure, there might be alignment paddings that contain garbage values. Without this, it is difficult to support hash indexes or hash joins, as you must pick out only the significant bits of your data structure to compute a hash. The planner also sometimes relies on comparing constants via bitwise equality, so you can get undesirable planning results if logically-equivalent values are not bitwise equal. -
Most of the internal PostgreSQL types are declared in postgres.h, while the function manager interfaces (
PG_FUNCTION_ARGS
, etc.) are in fmgr.h. So you need to include at least these two files. As mentioned above, by default, ADPG server header files are located at the following path: /usr/lib/adpg16/include/server. For portability reasons, it is the best practice to include postgres.h first, before any other system or user header files. Including postgres.h will also include elog.h and palloc.h. -
Symbol names defined within object files must not conflict with each other or with symbols defined in the PostgreSQL server executable. You should rename your functions or variables if you get error messages related to this effect.
Composite-type arguments
Composite types do not have a fixed layout like C structures. Instances of a composite type can contain NULL
fields. In addition, composite types that are part of an inheritance hierarchy can have different fields than other members of the same inheritance hierarchy. Therefore, PostgreSQL provides a function interface for accessing fields of composite types from C.
As an example, consider a function that returns TRUE
if a book is written by the author with the specified id:
SELECT title, books_by_author_id(book, 1) AS is_author
FROM book WHERE is_author = TRUE;
/* The foo2.c file that is compiled into foo2.so */
#include "postgres.h"
#include "executor/executor.h" /* for GetAttributeByName() */
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(books_by_author_id);
Datum
books_by_author_id(PG_FUNCTION_ARGS)
{
HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);
int32 author_id = PG_GETARG_INT32(1);
bool isnull;
Datum author;
author = GetAttributeByName(t, "author_id", &isnull);
if (isnull)
PG_RETURN_BOOL(false);
PG_RETURN_BOOL(DatumGetInt32(author) == author_id);
}
CREATE FUNCTION books_by_author_id(book, integer) RETURNS boolean
AS 'foo2', 'books_by_author_id'
LANGUAGE C STRICT;
SELECT title, books_by_author_id(book, 1) AS is_author FROM book;
Result:
title | is_author -----------------------+----------- The Great Gatsby | f The Lord of the Rings | f 1984 | f Animal Farm | f Mrs. Dalloway | t To the Lighthouse | t To Kill a Mockingbird | f
GetAttributeByName
is the PostgreSQL system function that returns attributes of the specified row. It has three arguments: the argument of the HeapTupleHeader
type passed into the function, the name of the desired attribute, and a return parameter that tells whether the attribute is NULL
. GetAttributeByName
returns a Datum value that you can convert to the proper data type by using the appropriate DatumGetxxx()
function. Note that the return value is meaningless if the null flag is set. Check the null flag before trying to do anything with the result.
Return sets
C-language functions have two options for returning sets (multiple rows):
-
The ValuePerCall mode — a set-returning function is called repeatedly (passing the same arguments each time). It returns one new row on each call, until it has no more rows to return and signals that by returning
NULL
. The set-returning function (SRF) must therefore save its state across calls and return the correct next item on each call. The example from the Create a sample function section uses the ValuePerCall mode. -
The Materialize mode — an SRF fills and returns a tuplestore object containing its entire result. Then only one call occurs for the whole result, and no inter-call state is needed.
This section is dedicated to the ValuePerCall mode. When using ValuePerCall mode, it is important to remember that the query is not guaranteed to be run to completion due to options such as LIMIT
, the executor might stop making calls to the set-returning function before all rows have been fetched. This means it is not safe to perform cleanup activities in the last call, because that might not ever happen.
The macros that support ValuePerCall use the FuncCallContext
structure. FuncCallContext
contains the state that should to be saved across calls.
typedef struct FuncCallContext
{
uint64 call_cntr;
uint64 max_calls;
void *user_fctx;
AttInMetadata *attinmeta;
MemoryContext multi_call_memory_ctx;
TupleDesc tuple_desc;
} FuncCallContext;
Name | Description |
---|---|
call_cntr |
Number of times the function has been called before. |
max_calls |
Maximum number of calls. This setting is optional. If not set, you must provide alternative way to determine when the function is done |
*user_fctx |
Optional pointer to miscellaneous user-provided context information. You can use |
*attinmeta |
Optional pointer to struct containing attribute type input metadata. |
multi_call_memory_ctx |
Is used for structures that must exist for multiple calls. |
tuple_desc |
Optional pointer to struct containing tuple description. |
Within the calling SRF, fcinfo->flinfo->fn_extra
is used to hold a pointer to FuncCallContext across calls. The macros automatically fill that field on first use, and expect to find the same pointer there on subsequent uses.
Name | Description |
---|---|
SRF_IS_FIRSTCALL() |
Used to determine if the function is being called for the first or a subsequent time |
SRF_FIRSTCALL_INIT() |
Executed on the first call only to initialize |
SRF_PERCALL_SETUP() |
Called on every function call, including the first, to set up for using |
SRF_RETURN_NEXT(funcctx, result) |
Used to return data to the caller if the function has data to return during the current call. The result must be of the |
SRF_RETURN_DONE(funcctx) |
Used to clean up and end the SRF when the function is finished returning data |
A complete example of a simple SRF returning a composite type:
/* The my_foo.c file that is compiled into my_foo.so */
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(my_func);
Datum
my_func(PG_FUNCTION_ARGS)
{
FuncCallContext *funcctx;
int call_cntr;
int max_calls;
TupleDesc tupdesc;
AttInMetadata *attinmeta;
/* operations only on the first call of the function */
if (SRF_IS_FIRSTCALL())
{
MemoryContext oldcontext;
/* create a function context for cross-call persistence */
funcctx = SRF_FIRSTCALL_INIT();
/* switch to memory context appropriate for multiple function calls */
oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
/* total number of tuples to be returned */
funcctx->max_calls = PG_GETARG_INT32(0);
/* build a tuple descriptor for the result type */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("function returning record called in context "
"that cannot accept type record")));
/* generate attribute metadata needed later to produce tuples from raw C strings*/
attinmeta = TupleDescGetAttInMetadata(tupdesc);
funcctx->attinmeta = attinmeta;
MemoryContextSwitchTo(oldcontext);
}
/* operations on every call of the function */
funcctx = SRF_PERCALL_SETUP();
call_cntr = funcctx->call_cntr;
max_calls = funcctx->max_calls;
attinmeta = funcctx->attinmeta;
if (call_cntr < max_calls)
{
char **values;
HeapTuple tuple;
Datum result;
/*
* Prepare an array of values for building the returned tuple.
* This should be an array of C strings which will
* be processed later by the type input functions.
*/
values = (char **) palloc(3 * sizeof(char *));
values[0] = (char *) palloc(16 * sizeof(char));
values[1] = (char *) palloc(16 * sizeof(char));
values[2] = (char *) palloc(16 * sizeof(char));
snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
snprintf(values[1], 16, "%d", PG_GETARG_INT32(1) * PG_GETARG_INT32(1));
snprintf(values[2], 16, "%d", PG_GETARG_INT32(1) * PG_GETARG_INT32(1) * PG_GETARG_INT32(1));
/* build a tuple */
tuple = BuildTupleFromCStrings(attinmeta, values);
/* create a Datum */
result = HeapTupleGetDatum(tuple);
/* clean up (this is not really necessary in this case) */
pfree(values[0]);
pfree(values[1]);
pfree(values[2]);
pfree(values);
SRF_RETURN_NEXT(funcctx, result);
}
else
{
SRF_RETURN_DONE(funcctx);
}
}
Declare this function in PostgreSQL:
CREATE TYPE __my_func AS (f1 integer, f2 integer, f3 integer);
CREATE OR REPLACE FUNCTION my_func(integer, integer)
RETURNS SETOF __my_func
AS '/usr/lib/adpg16/include/server/my_foo.so', 'my_func'
LANGUAGE C IMMUTABLE STRICT;
A different way to declare this function is to use OUT parameters:
CREATE OR REPLACE FUNCTION my_func(IN integer, IN integer,
OUT f1 integer, OUT f2 integer, OUT f3 integer)
RETURNS SETOF record
AS '/usr/lib/adpg16/include/server/my_foo.so', 'my_func'
LANGUAGE C IMMUTABLE STRICT;
Call the function:
SELECT my_func(6,5);
Result:
my_func ------------ (5,25,125) (5,25,125) (5,25,125) (5,25,125) (5,25,125) (5,25,125) (6 rows)