The bytea data type

The bytea data type allows you to store variable-length binary strings. A binary string is a sequence of octets (bytes). Binary strings differ from character strings in the following:

  • Binary strings allow storing octets of zero values and other non-printable octets. Character strings do not allow zero octets and any other octet values that are invalid according to the database encoding.

  • Operations on binary strings process bytes, the character string processing depends on locale settings.

Binary strings are appropriate for "raw bytes", the character strings are suitable for text.

The bytea data type adds 1 or 4 bytes to the binary string size.

Bytea supports two formats for input and output: the hex format and PostgreSQL historical escape format. The input data can be one of these formats. The bytea_output configuration parameter determines the output format, its default value is hex.

The bytea hex format

The hex format encodes binary data as two hexadecimal digits per byte. It has the following format:

 '\x<binary string>'

Example:

SELECT '\xDEADBEEF';

The hexadecimal digits can be upper or lower case, whitespaces are permitted between digit pairs and prohibited within a digit pair and at the beginning of the \x sequence. The hex format is compatible with a wide range of applications and protocols, and it tends to be faster to convert than the escape format.

The bytea escape format

The escape format is the traditional PostgreSQL format for the bytea type. It represents a binary string as a sequence of ASCII characters and converts bytes that cannot be represented as an ASCII character into escape sequences.

To escape an octet, convert it into its three-digit octal value and precede it with a backslash. To add the backslash character to a binary string, use the double backslash (\\). While any octet value can be escaped, the values in the table below must be escaped.

Decimal octet value Description Escaped input representation Example Hex representation

0

zero octet

'\000'

'\000'::bytea

\x00

39

single quote

'''' or '\047'

''''::bytea

\x27

92

backslash

'\\' or '\134'

'\\'::bytea

\x5c

0 to 31 and 127 to 255

"non-printable" octets

'\xxx' (octal value)

'\001'::bytea

\x01

The requirement to escape "non-printable" octets depends on locale settings.

The bytea octets are output in the hex format by default.

If you change bytea_output to escape, PostgreSQL converts "non-printable" octets (decimal octet values from 0 to 31 and from 127 to 255) to equivalent three-digit octal values. Printable octets are output as their standard representation in the client character set. The backslash octet is doubled in the output.

Example:

SET bytea_output = 'escape';

SELECT 'abc \144\145\146 \052 \251 \134'::bytea;

The result:

       bytea
-------------------
 abc def * \251 \\
TIP
The escape format blurs the differences between binary strings and character strings and has unwieldy escape mechanism. It is preferable to use the hex format instead of escape.

Binary string functions and operators

PostgreSQL provides a lot of functions and operators for binary string manipulations. Many of them are equivalent to the text string functions. This section gives multiple examples. For more information on binary string operators and functions, refer to the PostgreSQL documentation: Binary string functions and operators.

SQL functions and operators

Some of the SQL operators and functions for working with binary strings are described below:

  • Use the || operator to concatenate two binary strings:

    SELECT '\x123456'::bytea || '\x789a00bcde'::bytea;

    The result: \x123456789a00bcde.

  • The position function returns the first starting index of the specified substring in bytes, or zero if the substring is not found:

    SELECT position('\x5678'::bytea in '\x1234567890'::bytea);

    The result: 3.

  • The substring function extracts a substring that starts at the specified byte and has the specified length in bytes. The function has the following syntax:

    substring ( bytes bytea [ FROM start integer ] [ FOR count integer ] ) → bytea

    Example:

    SELECT substring('\x1234567890'::bytea from 3 for 2);

    The result: \x5678.

Text/binary string conversion functions

PostgreSQL allows you to convert strings between different encodings and transform binary data to text form. The available conversions are listed in the following article: Available character set conversions. You can use the following functions:

  • convert. The function converts a binary string that contains text in the src_encoding encoding to a binary string in the dest_encoding encoding.

    convert ( <bytes> bytea, <src_encoding>, <dest_encoding> ) → bytea

    Example:

    SELECT convert('text_in_utf8', 'UTF8', 'LATIN1');

    The result: \x746578745f696e5f75746638.

  • convert_from. The function converts a binary string that contains text in the src_encoding encoding to text in the database encoding.

    convert_from ( <bytes> bytea, <src_encoding> ) → text

    Example:

    SELECT convert_from('\x746578745f696e5f75746638', 'UTF8');

    The result: text_in_utf8.

  • convert_to. The function converts a text string in the database encoding to a binary string in the dest_encoding encoding.

    convert_to ( <string> text, <dest_encoding> ) → bytea

    Example:

    SELECT convert_to('some_text', 'UTF8');

    The result: \x736f6d655f74657874.

  • encode. The function encodes binary data to a text representation. Supported formats are: base64, escape, hex.

    encode ( <bytes> bytea, <format> text ) → text

    Example:

    SELECT encode('\x41445047', 'base64');

    The result: QURQRw==.

  • decode. The function decodes binary data from a text string. Supported formats are: base64, escape, hex.

    decode ( <string> text, <format> text ) → bytea

    Example:

    SELECT decode('QURQRw==', 'base64');

    The result: \x41445047.

Found a mistake? Seleсt text and press Ctrl+Enter to report it