38.10. C-Language Functions
User-defined functions can be written in C (or a language that can be made compatible with C, such as C++). Such functions are compiled into dynamically loadable objects (also called shared libraries) and are loaded by the server on demand. The dynamic loading feature is what distinguishes “C language” functions from “internal” functions — the actual coding conventions are essentially the same for both. (Hence, the standard internal function library is a rich source of coding examples for user-defined C functions.)
Currently only one calling convention is used for C functions (“version 1”). Support for that calling convention is indicated by writing a PG_FUNCTION_INFO_V1()
macro call for the function, as illustrated below.
38.10.1. Dynamic Loading
The first time a user-defined function in a particular loadable object file is called in a session, the dynamic loader loads that object file into memory so that the function can be called. The CREATE FUNCTION
for a user-defined C function must therefore specify two pieces of information for the function: the name of the loadable object file, and the C name (link symbol) of the specific function to call within that object file. If the C name is not explicitly specified then it is assumed to be the same as the SQL function name.
The following algorithm is used to locate the shared object file based on the name given in the CREATE FUNCTION
command:
If the name is an absolute path, the given file is loaded.
If the name starts with the string
$libdir
, that part is replaced by the PostgreSQL package library directory name, which is determined at build time.If the name does not contain a directory part, the file is searched for in the path specified by the configuration variable dynamic_library_path.
Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. (It is unreliable to depend on the current working directory.)
If this sequence does not work, the platform-specific shared library file name extension (often .so
) is appended to the given name and this sequence is tried again. If that fails as well, the load will fail.
It is recommended to locate shared libraries either relative to $libdir
or through the dynamic library path. This simplifies version upgrades if the new installation is at a different location. The actual directory that $libdir
stands for can be found out with the command pg_config --pkglibdir
.
The user ID the PostgreSQL server runs as must be able to traverse the path to the file you intend to load. Making the file or a higher-level directory not readable and/or not executable by the postgres user is a common mistake.
In any case, the file name that is given in the CREATE FUNCTION
command is recorded literally in the system catalogs, so if the file needs to be loaded again the same procedure is applied.
Note
PostgreSQL will not compile a C function automatically. The object file must be compiled before it is referenced in a CREATE FUNCTION
command. See Section 38.10.5 for additional information.
To ensure that a dynamically loaded object file is not loaded into an incompatible server, PostgreSQL checks that the file contains a “magic block” with the appropriate contents. This allows the server to detect obvious incompatibilities, such as code compiled for a different major version of PostgreSQL. To include a magic block, write this in one (and only one) of the module source files, after having included the header fmgr.h
:
After it is used for the first time, a dynamically loaded object file is retained in memory. Future calls in the same session to the function(s) in that file will only incur the small overhead of a symbol table lookup. If you need to force a reload of an object file, for example after recompiling it, begin a fresh session.
Optionally, a dynamically loaded file can contain an initialization function. If the file includes a function named _PG_init
, that function will be called immediately after loading the file. The function receives no parameters and should return void. There is presently no way to unload a dynamically loaded file.
38.10.2. Base Types in C-Language Functions
To know how to write C-language functions, you need to know how PostgreSQL internally represents base data types and how they can be passed to and from functions. Internally, PostgreSQL regards a base type as a “blob of memory”. The user-defined functions that you define over a type in turn define the way that PostgreSQL can operate on it. That is, PostgreSQL will only store and retrieve the data from disk and use your user-defined functions to input, process, and output the data.
Base types can have one of three internal formats:
pass by value, fixed-length
pass by reference, fixed-length
pass by reference, variable-length
By-value types can only be 1, 2, or 4 bytes in length (also 8 bytes, if sizeof(Datum)
is 8 on your machine). You should be careful to define your types such that they will be the same size (in bytes) on all architectures. For example, the long
type is dangerous because it is 4 bytes on some machines and 8 bytes on others, whereas int
type is 4 bytes on most Unix machines. A reasonable implementation of the int4
type on Unix machines might be:
(The actual PostgreSQL C code calls this type int32
, because it is a convention in C that int
XX
means XX
bits. Note therefore also that the C type int8
is 1 byte in size. The SQL type int8
is called int64
in C. See also Table 38.2.)
On the other hand, fixed-length types of any size can be passed by-reference. For example, here is a sample implementation of a PostgreSQL type:
Only pointers to such types can be used when passing them in and out of PostgreSQL functions. To return a value of such a type, allocate the right amount of memory with palloc
, fill in the allocated memory, and return a pointer to it. (Also, if you just want to return the same value as one of your input arguments that's of the same data type, you can skip the extra palloc
and just return the pointer to the input value.)
Finally, all variable-length types must also be passed by reference. All variable-length types must begin with an opaque length field of exactly 4 bytes, which will be set by SET_VARSIZE
; never set this field directly! All data to be stored within that type must be located in the memory immediately following that length field. The length field contains the total length of the structure, that is, it includes the size of the length field itself.
Another important point is to avoid leaving any uninitialized bits within data type values; for example, take care to zero out any alignment padding bytes that might be present in structs. Without this, logically-equivalent constants of your data type might be seen as unequal by the planner, leading to inefficient (though not incorrect) plans.
Warning
Never modify the contents of a pass-by-reference input value. If you do so you are likely to corrupt on-disk data, since the pointer you are given might point directly into a disk buffer. The sole exception to this rule is explained in Section 38.12.
As an example, we can define the type text
as follows:
The [FLEXIBLE_ARRAY_MEMBER]
notation means that the actual length of the data part is not specified by this declaration.
When manipulating variable-length types, we must be careful to allocate the correct amount of memory and set the length field correctly. For example, if we wanted to store 40 bytes in a text
structure, we might use a code fragment like this:
VARHDRSZ
is the same as sizeof(int32)
, but it's considered good style to use the macro VARHDRSZ
to refer to the size of the overhead for a variable-length type. Also, the length field must be set using the SET_VARSIZE
macro, not by simple assignment.
Table 38.2 shows the C types corresponding to many of the built-in SQL data types of PostgreSQL. The “Defined In” column gives the header file that needs to be included to get the type definition. (The actual definition might be in a different file that is included by the listed file. It is recommended that users stick to the defined interface.) Note that you should always include postgres.h
first in any source file of server code, because it declares a number of things that you will need anyway, and because including other headers first can cause portability issues.
Table 38.2. Equivalent C Types for Built-in SQL Types
boolean
bool
postgres.h
(maybe compiler built-in)
box
BOX*
utils/geo_decls.h
bytea
bytea*
postgres.h
"char"
char
(compiler built-in)
character
BpChar*
postgres.h
cid
CommandId
postgres.h
date
DateADT
utils/date.h
float4
(real
)
float4
postgres.h
float8
(double precision
)
float8
postgres.h
int2
(smallint
)
int16
postgres.h
int4
(integer
)
int32
postgres.h
int8
(bigint
)
int64
postgres.h
interval
Interval*
datatype/timestamp.h
lseg
LSEG*
utils/geo_decls.h
name
Name
postgres.h
numeric
Numeric
utils/numeric.h
oid
Oid
postgres.h
oidvector
oidvector*
postgres.h
path
PATH*
utils/geo_decls.h
point
POINT*
utils/geo_decls.h
regproc
RegProcedure
postgres.h
text
text*
postgres.h
tid
ItemPointer
storage/itemptr.h
time
TimeADT
utils/date.h
time with time zone
TimeTzADT
utils/date.h
timestamp
Timestamp
datatype/timestamp.h
timestamp with time zone
TimestampTz
datatype/timestamp.h
varchar
VarChar*
postgres.h
xid
TransactionId
postgres.h
Now that we've gone over all of the possible structures for base types, we can show some examples of real functions.
38.10.3. Version 1 Calling Conventions
The version-1 calling convention relies on macros to suppress most of the complexity of passing arguments and results. The C declaration of a version-1 function is always:
In addition, the macro call:
must appear in the same source file. (Conventionally, it's written just before the function itself.) This macro call is not needed for internal
-language functions, since PostgreSQL assumes that all internal functions use the version-1 convention. It is, however, required for dynamically-loaded functions.
In a version-1 function, each actual argument is fetched using a PG_GETARG_
xxx
()
macro that corresponds to the argument's data type. (In non-strict functions there needs to be a previous check about argument null-ness using PG_ARGISNULL()
; see below.) The result is returned using a PG_RETURN_
xxx
()
macro for the return type. PG_GETARG_
xxx
()
takes as its argument the number of the function argument to fetch, where the count starts at 0. PG_RETURN_
xxx
()
takes as its argument the actual value to return.
Here are some examples using the version-1 calling convention: