What Characters Are Allowed in PostgreSQL Strings?

Start supercharging your PostgreSQL today.

Written by Warda Bibi

Strings represent data in a readable and meaningful format, making it easier to interpret, search for, and communicate information effectively. They bridge the gap between raw data and human interpretation.

PostgreSQL offers different data types to work with strings, allowing you to store a variety of characters. These strings can include regular characters as well as special characters that have specific roles or functions.

In this blog, we will explore which characters are allowed in PostgreSQL strings, how they are used, and ways to handle them in string constants.

Overview of PostgreSQL Character Data Types

In PostgreSQL, character data types are used to store strings. String values are represented as literals in single quotes. For example, 'hello' is a string literal.

PostgreSQL provides three primary character types:

CHARACTER(n) or CHAR(n)
CHARACTER VARYING(n) or VARCHAR(n)
TEXT

Here, n is a positive integer that specifies the number of characters.

The CHAR(n) and VARCHAR(n) data types are well-suited for strings with a defined or limited length, such as usernames or email addresses. In contrast, the TEXT type is better suited for storing large, variable-length strings like blog posts or product descriptions. The following table illustrates the character types in PostgreSQL:

Character Types	Description
CHARACTER(n), CHAR(n)	Fixed-length, blank padded
CHARACTER VARYING(n), VARCHAR(n)	Variable-length with length limit
TEXT, VARCHAR	Variable unlimited length

CHAR(n)

CHAR(n) stores strings in a fixed length. If a string is shorter than n, it is padded with spaces. Suppose you define a column as CHAR(5) in a table:

CREATE TABLE Users(username CHAR(5));

Now, insert a username shorter than five characters: INSERT INTO Users (username) VALUES ('Sam');

The value 'Sam' is stored as 'Sam ' (with two trailing spaces) to make it exactly five characters long. This is because CHAR(n) always stores strings in fixed length by padding shorter strings with spaces.

The CHARACTER or CHAR without the length specifier (n) is the same as the CHARACTER(1) or CHAR(1). CHAR can be inefficient if most of the strings stored are shorter than the defined length.

VARCHAR(n)

VARCHAR(n) can store strings of different lengths up to a specified maximum (n). It can store a string up to 65,535 bytes long. Unlike CHAR, VARCHAR does not add extra spaces to make the string fit a fixed length. If the string is shorter than the defined maximum length, it uses only the exact space needed for the characters. Suppose you create a table with a VARCHAR column:

CREATE TABLE Users (username VARCHAR(20));

If you insert 'Sam', the database stores it as 'Sam' without adding any padding. The remaining 17 characters are not allocated or reserved, so there is no wasted space.
If you insert 'Alexander', it stores the full string 'Alexander'.
If you insert 'A string longer than 20 characters', it will throw an error.

If n is not specified, a VARCHAR column can store strings of any length, constrained only by PostgreSQL's maximum field size limit of one GB.

TEXT

The TEXT data type is designed to store large amounts of variable-length text. It can hold strings of virtually unlimited length, although it is typically constrained by database system limits, such as a maximum of 65,535 bytes in PostgreSQL. Unlike VARCHAR, which can be used with or without a specified length limit (e.g., VARCHAR(n) or VARCHAR), TEXT does not allow length constraints.

When VARCHAR is used without a specified length (VARCHAR without n), it behaves similarly to the TEXT data type. This means VARCHAR can store strings of unlimited length, subject only to the maximum size of a field in PostgreSQL, which is about one GB.

TEXT is commonly used for columns where the length of the input data is not predictable or varies significantly, such as storing descriptions, comments, or unstructured text.

Here’s an example of creating a table with a TEXT column: CREATE TABLE book( description TEXT );

Exceptions to Length Limits

Both CHAR(n) and VARCHAR(n) can store up to n characters, and If you try to insert a string longer than n characters, PostgreSQL will raise an error. However, one exception is that if the excessive characters are all spaces, PostgreSQL truncates the spaces to the maximum length (n) and stores the trimmed characters.

To illustrate this, create a column with CHAR(5):

CREATE TABLE sensor_data ( sensor_id CHAR(5), description TEXT);

Now try inserting data with more than five characters.

INSERT INTO sensor_data (sensor_id) VALUES ('SEN123'); -- Error

Here, since "SEN123" contains six characters but the allowed storage length is only five characters, PostgreSQL will raise an error because it exceeds the limit.
However, if the exceeded characters are spaces, PostgreSQL will trim them to fit within the five-character limit. For instance, "SEN12 " (with two spaces at the end) will be stored as "SEN12" with the trailing spaces removed.

One more thing: If a string explicitly casts to a CHAR(n) or VARCHAR(n), PostgreSQL will truncate the string to n characters before inserting it into the table.

For instance, if "SEN123" is explicitly cast to VARCHAR(5), it will be stored as "SEN12".

INSERT INTO sensor_data (sensor_id) VALUES ('SEN123'::VARCHAR(5));

This query will insert sensor_id as 'SEN12'.

Character Encoding in PostgreSQL

Encoding is the process of converting human-readable characters (e.g., "A," "é") into a computer-readable binary format (0s and 1s). Different languages have unique character sets, requiring various encoding standards like ASCII, UTF-8, or Latin-1 to map these characters to binary.

PostgreSQL supports multiple encodings, but UTF-8 is the default and most versatile option. UTF-8 can store characters from nearly all languages and symbol sets, making it ideal for modern, multilingual applications.

The chosen encoding affects how PostgreSQL stores and retrieves text data. Using the wrong encoding can lead to data corruption, such as:

Special characters are replaced with question marks (e.g., ?).
Characters are displayed incorrectly.

For example, if your PostgreSQL database uses Latin-1 encoding but you attempt to store Japanese characters, they may not render properly. You can check your database's encoding with the command:

SHOW server_encoding;

To ensure compatibility, it’s recommended to use UTF-8 encoding.

Characters Allowed in PostgreSQL Strings

PostgreSQL’s string data types can store a wide range of characters, including standard characters, whitespace characters, escape characters, and Unicode characters.

Standard characters

PostgreSQL supports a variety of standard characters, such as:

Alphanumeric characters, including the following
- Letters (A-Z, a-z)
- Digits (0-9)
Common symbols, such as the following:
- Punctuation marks (! , ? , ; , ‘ )
- Mathematical operators (+, - ,* , /)
- Special characters ( @ , # , $ , % ), etc.

Whitespace characters

PostgreSQL strings can include whitespace characters:

Spaces
Tabs
Newlines

This flexibility allows you to store multi-line strings or text with varying formatting.

Escape characters

Escaping is a process of indicating that the mentioned character should be treated differently than its usual meaning within a string, code, or expression. This is often done to handle special characters or reserved symbols that might otherwise be interpreted in a different way by the programming language, database system, or environment.

The backslash (\) is the most common escape character. It signals that the next character should be treated differently. A combination of the backslash and the character following it forms an escape sequence. This changes how that character is interpreted.

Here’s an example:

\' represents a single quote
\\ represents a literal backslash
\n represents a new line
\t represents a tab

To properly escape these sequences in a string, PostgreSQL requires the use of the E prefix, which tells PostgreSQL that escape sequences should be interpreted.

Unicode characters

There are about 7,164 languages spoken worldwide. The Unicode Standard supports over 168 scripts, covering many of these languages. Unicode is a standardized character encoding system that allows the representation of characters from practically:

All modern and historical languages
Emojis
Specialized symbols

PostgreSQL supports Unicode, which means it can store and process characters from any language or symbol set that is part of the Unicode standard.

Handling Special Characters

Certain characters in PostgreSQL have special meanings and functions. They perform specific actions, such as escaping other characters or denoting certain operations in SQL queries. When these special characters appear in string literals, they might cause errors or behave unexpectedly if not properly handled. Here are some common special characters and how they are typically handled to avoid issues. For a quick reference, check the cheat sheet at the end.

Single quote (')

The single quote (') in PostgreSQL is a special character because it is used to define the beginning and end of a string literal like this 'active'.

Suppose we need to include an actual single quote within the string. In that case, we can't directly use a single quote, as it would prematurely terminate the string or cause an error, as it would be seen as the end of the string.

To handle this situation, we need to escape it. Here are the different ways we can escape single quotes.

Method 1: Two consecutive single quotes

One way to escape single quotes is by using two consecutive single quotes ( ' ' ). For example,

INSERT INTO sensor_data (description) VALUES ('This string contains ''single quotes'' correctly ');

will be inserted into the 'description' column as:

'This string contains 'single quotes' correctly'

Method 2: Dollar quoting($$)

We can also use the $$ delimiter to enclose string literals, allowing us to include single quotes within the string without the need to escape them. Here's an example:

SELECT $$This string contains, 'single quotes'$$ AS example;

The output of this query will be:

This string contains, 'single quotes'

In this case, the $$ delimiter is used to enclose the string literal, and the single quotes inside the string are treated as part of the content, not as string delimiters, making it easier to include special characters like single quotes without escaping them.

Method 3: Escape string literal

We can also use the E prefix before the string literal and escape the single quote with a backslash (\).

SELECT E'This string contains, \'single quotes\'' AS description;

The E before the string literal indicates that the string is an escape string literal, which means it can interpret escape sequences. In this example, the single quote (') is escaped as \'.

The query will return the following:

This string contains, 'single quotes'

Backslash (\)

The backslash (\) in PostgreSQL is also a special character because it is used for escaping special characters within E-prefixed string literals. Backslash followed by certain characters have special meanings (e.g., \n for newline, \t for tab). But to include a backslash within a string, we need to use two backslashes (\\) with a prefix E or double quoting.

Method 1: Double backslash (\\)

Let's say we want to insert the following file path into the description column of the sensor_data table:

C:\sensor_data\logs\2024\12\27.txt

If we try to insert the path like this:

INSERT INTO sensor_data (description) VALUES ('C:\sensor_data\logs\2024\12\27.txt');PostgreSQL will return an error because the backslash is treated as an escape character. Specifically, \s is an invalid escape sequence in PostgreSQL, which causes the query to fail.

To correctly insert file paths containing backslashes, we need to escape them by using two backslashes (\\). Additionally, we have to use the E prefix to indicate that the string includes escape sequences.

INSERT INTO sensor_data (description) VALUES (E'C:\\sensor_data\\logs\\2024\\12\\27.txt');

In this example, the description column stores the file path, which is actually stored as C:\sensor_data\logs\2024\12\27.txt.

To include the backslashes within the string literal, we use two backslashes (\\) for each backslash in the actual path.

Note: You can also use dollar quoting (as mentioned above) to handle backslashes (\) within the string itself.

NewLine (\n)

The newline character (\n) is a special character that signifies the end of a line and the beginning of a new one. So, when used within an E-prefixed string, \n has a specific meaning. It doesn't literally represent the characters \ and n. Instead, it instructs the database to insert a newline at that position.

To include a literal \n in an E-prefixed string, we must escape the backslash (\) by doubling it (\\n). Alternatively, we can avoid using the E-string syntax and directly include \n as plain text. Suppose we are storing a log entry in a sensor_data table. The entry includes both a newline character (\n) and a single quote (') within the text. INSERT INTO sensor_data (description) VALUES (E'Sensor reading: 98.6°C\nWarning: \'Temperature exceeds threshold!\'');

Here \n will insert a new line instead. To add a string literal, add another \ to escape it.

INSERT INTO sensor_data (description) VALUES (E'Sensor reading: 98.6°C\\nWarning: \'Temperature exceeds threshold!\'');

Here, the \n remains literal in the output, as shown below: Sensor reading: 98.6°C\nWarning: 'Temperature exceeds threshold!'

Percent Sign (%)

The percent sign is used as a wildcard character for pattern matching in LIKE and ILIKE patterns. It matches for zero or more characters. To include a literal percent sign (%) within a string of pattern matching, we need to use the escape character (\).

We have a sensor_data table that logs messages about the battery status of devices. These messages include a literal % sign, such as "Battery at 100%."

INSERT INTO sensor_data (description) VALUES ('Battery at 100%'), ('Battery at 100 but the sensors operating time is significantly shorter than expected');

If we want to search for messages containing the exact string "100%" (where the % is a literal character, not a wildcard), we can use the LIKE operator:

SELECT * FROM sensor_data WHERE description LIKE '%100%';

This query will output:

Battery at 100% Battery at 100 but the sensors operating time is significantly shorter than expected

Because the % symbol in the LIKE pattern is a wildcard that matches any number of characters (zero or more), both entries are returned.

To specifically match the literal 100% and not treat % as a wildcard, we can escape the percent symbol by using a backslash (\).

SELECT * FROM sensor_data WHERE description LIKE '%100\%';

This query will output:

Battery at 100%

By using the escape character (\), we tell PostgreSQL to treat the % symbol as a literal character, rather than as a wildcard for pattern-matching.

Underscore ( _ )

In PostgreSQL, the underscore character (_) acts as a wildcard that matches any single character in LIKE and ILIKE patterns. However, to include a literal underscore within a string pattern, we need to escape it using \.

Let’s say we have a sensor_data table that logs messages about sensor device files. These messages include filenames with underscores, such as:

"Temperature_sensor_log_01.csv" INSERT INTO sensor_data (description) VALUES ('temperature_sensor_log_01.csv'), ('humidity_sensorLog_01.csv'), ('sensorData_2024_report.txt'), ('sensor_log_2024_data.csv');

If we want to search for filenames containing the string "sensor_" (where the underscore _ is a literal character and not a wildcard), we can use the LIKE operator. By default, the underscore _ is treated as a wildcard character in the LIKE pattern, which matches any single character.

SELECT * FROM sensor_data WHERE description LIKE '%sensor_%';

This query will output the following: temperature_sensor_log_01.csv humidity_sensorLog_01.csv sensorData_2024_report.txt sensor_log_2024_data.csv

However, filenames such as:

humidity_sensorLog_01.csv
sensorData_2024_report.txt

are not exactly "sensor_" followed by more characters, but they still get matched because the underscore _ is interpreted as a pattern-matching operator for a single character.

To make sure the underscore is treated as a literal character (and not a wildcard), we need to escape it using a backslash \. Here’s how we can do it: SELECT * FROM sensor_data WHERE description LIKE '%sensor\_%';

This query will correctly output : temperature_sensor_log_01.csv sensor_log_2024_data.csv

By using the escape character \, we tell PostgreSQL to treat the underscore as a literal character and not as a wildcard for pattern matching.

Additional Case

Double quote (")

Double quotes are essential for the following:

Identifiers with special characters and reserved keywords
Maintaining case sensitivity

Case 1: Identifier with special characters

PostgreSQL typically restricts identifiers to alphanumeric characters and underscores. Special characters like spaces, reserved keywords, or hyphens are not allowed. By enclosing them in double quotes, PostgreSQL recognizes these as part of the name. The use of double quotes helps distinguish these identifiers and allows for special characters or case sensitivity.

An identifier enclosed within double quotes (") is called a delimited identifier. These identifiers are the names of database objects such as tables, columns, or even roles.

Suppose you are creating a table named Sensor Data. Since this name contains a space, which is not a standard character for identifiers, we will use double quotes to define it as a delimited identifier:

CREATE TABLE "Sensor Data" ( sensor_id serial PRIMARY KEY, battery_level INT );

By enclosing the table name in double quotes, we indicate that it's a delimited identifier, allowing PostgreSQL to treat the space as part of the name. Similarly, if we try to add a hyphen in the identifier, it will return an error.

CREATE TABLE Sensor-Data ( id INT ); -- Error - is a special character.

This will cause an error because 'Sensor-Data' contains a hyphen. However, with double quotes, we can include such characters:

CREATE TABLE "Sensor-Data" ( id INT );

Without double quotes, reserved keywords cannot be used as table or column names:

CREATE TABLE select ( id INT ); -- Error 'select' is a reserved keyword

With double quotes, we can use reserved keywords as identifiers:

CREATE TABLE "select" ( id INT );

This works because ''select'' is quoted.

Case 2: Case sensitivity

PostgreSQL treats unquoted identifiers as case-insensitive and converts them to lowercase. However, enclosing an identifier in double quotes makes it case-sensitive. For instance, if you create a column named "TotalDevices" with double quotes, you must always reference it with the same case and double quotes:

CREATE TABLE "Sensor Devices" ( "TotalDevices" INT ); SELECT "TotalDevices" FROM "Sensor Devices";

Without the double quotes, PostgreSQL would interpret the column name in a case-insensitive manner.

Special Characters in PostgreSQL—Cheat Sheet

Here’s a cheat sheet for handling special characters in PostgreSQL:

Character	Description	Usage	Example
Single Quote ( ' )	Used to define string literals, causing errors when included in strings	Two single quotes: 'It''s a sensor' Dollar quoting: $$It's a sensor$$ Escape string literal: E'It\'s a sensor'	INSERT INTO data VALUES ('It''s working');
Backslash ( \ )	Acts as an escape character in E-prefixed strings	Double backslash: E'C:\\path\\to\\file.txt' Dollar quoting: $$C:\path\to\file.txt$$	INSERT INTO data VALUES (E'C:\\folder\\file.txt');
Newline ( \n )	Inserts a new line instead of displaying \n	New line: E'Temperature: 98.6°C\nWarning: High!' Literal \n: E'Temperature: 98.6°C\\nWarning: High!'	INSERT INTO logs VALUES (E'Alert:\\nTemperature High!');
Percent Sign ( % )	Used as a wildcard in LIKE queries	Escape with \	SELECT * FROM sensor_data WHERE description LIKE '%100\%';
Underscore ( _ )	Matches any single character in LIKE queries	Escape with \	SELECT * FROM sensor_data WHERE description LIKE '%sensor\_%';
Double Quote ( " )	Needed for identifiers with special characters or case sensitivity	For special character identifiers: CREATE TABLE "Sensor-Data" (id INT); For reserved keywords: CREATE TABLE "select" (id INT); For case-sensitive identifiers: CREATE TABLE "Devices" ("TotalDevices" INT);	SELECT "TotalDevices" FROM "Devices";

Best Practices for PostgreSQL Special Characters

standard_conforming_strings

In PostgreSQL, the standard_conforming_strings setting is a configuration parameter that determines how escape sequences within string literals (such as \n, \t, and \\) are interpreted.

When standard_conforming_strings is enabled (default), the following happens:

Escape sequences such as \n, \t, and \\ are treated as literal characters.
The backslash (\) is treated as a regular character, and if we want to use escape sequences (like newline or tab), we must use the E prefix for escape string literals.

When standard_conforming_strings is disabled (older PostgreSQL versions), the following happens:

Escape sequences like \n, \t, and \\ are automatically interpreted. The backslash (\) is treated as the escape character for these sequences, even without the E prefix.

We can check the current value of standard_conforming_strings using the following code:

SHOW standard_conforming_strings;

E-prefix for escape sequences

When working with special characters like newlines (\n), tabs (\t), quotes (\’), or backslashes (\\) in PostgreSQL, we should always use the E prefix before the string. This ensures the database correctly interprets escape sequences instead of treating them as literal text.

Conclusion

Effectively handling characters in PostgreSQL is crucial for successful database development. Understanding data types, escaping techniques, and best practices can help ensure the proper handling of characters, making applications more reliable and secure. To wrap things up, here's a little PostgreSQL message for you: SELECT $$Happy Coding$$ AS message;

To learn more about PostgreSQL or solve common PostgreSQL problems, make sure to visit the PostgreSQL Tips section and the comprehensive PostgreSQL Guides, where you'll find advice on how to fine-tune PostgreSQL performance, handle indexes, and much more.