Written by Warda Bibi
Postgres is designed to handle various types of data efficiently, making it a powerful choice for businesses and developers. One common application is storing large files, such as audio, images, videos, and logs, along with related metadata like titles and descriptions. While these files can be stored in a separate system, keeping them within the database ensures data integrity, consistency, and security.
Under the hood, these files are stored as binary data. However, such files can be very large, and databases impose limits on how much data can be stored in a single tuple; this limit varies from one database system to another.
For example, in Oracle, a dedicated large object (LOB) data type is used to store such large files:
BLOB (Binary Large Object): stores binary data like images, videos, and PDFs
CLOB (Character Large Object): for large text-based content
These data types can hold up to 128 TB and keep their data in a separate LOB segment outside the table, which optimizes performance for large objects.
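For illustration, a hypothetical Oracle table using these types might look like the following sketch (the table and column names are made up for this example):
CREATE TABLE documents (
  id NUMBER PRIMARY KEY,
  title VARCHAR2(200),
  body CLOB,          -- large text content
  scanned_copy BLOB   -- binary content such as a PDF scan
);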
So, how does Postgres handle storing large files like images, videos, or logs? In Postgres, storage is organized into fixed-size units called pages, which are 8 kB by default. This page size works well for most data, but larger files need a special mechanism to keep things running smoothly.
In this blog, we will explore these mechanisms and discuss best practices for storing large data efficiently.
There are mainly two ways to store large objects in Postgres: you can use existing data types, such as BYTEA for binary large objects or TEXT for character-based large objects, or you can use pg_largeobject.
BYTEA stands for byte array. It is a binary data type that stores variable-length binary strings or byte sequences. The maximum size of a BYTEA value is 1 GB, which means a single BYTEA field can hold at most 1 GB of binary data.
This data type is suitable for storing images, multimedia files, and raw binary data. Unlike text or character data types, BYTEA stores raw binary data without any encoding.
The following example shows how to define a column with the BYTEA data type:
First, create a table called binary_data to store binary strings:
CREATE TABLE binary_data(
id INT,
data BYTEA
);
Second, insert data into the BYTEA column. This requires converting the binary data into a format Postgres can understand: use either hex format (prefixed with \x) or escape format (backslash-escaped octal bytes). The hex format is preferred for its readability and ease of use.
INSERT INTO binary_data(id,data)
VALUES (1,'\x012345');
Third, retrieve data from the BYTEA column:
SELECT * FROM binary_data;
Postgres returns the binary data in the format it was stored.
Output:
id | data
----+----------
1 | \x012345
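If the binary data arrives as text in another representation, the built-in decode function can also produce a BYTEA value. The statements below are a small sketch against the binary_data table created above (the ids are arbitrary):
-- Hex input via decode(); equivalent to the '\x012345' literal above
INSERT INTO binary_data(id, data) VALUES (2, decode('012345', 'hex'));
-- Base64 input, handy when the bytes come from an application or API
INSERT INTO binary_data(id, data) VALUES (3, decode('ASNF', 'base64'));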
The TEXT data type is used to store large text-based content such as logs, documents, and other textual content. It allows the storage of variable-length character data. Similar to BYTEA, the maximum size of a TEXT value is 1 GB.
The following example shows how to define a column with the TEXT data type:
CREATE TABLE text_data (
id SERIAL PRIMARY KEY,
content TEXT
);
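As a quick sketch of this table in use (the sample value is arbitrary), insert a large text value and compare its character length with the size it actually occupies on disk; pg_column_size reports the stored size after any compression:
-- Insert a large, repetitive text value
INSERT INTO text_data(content) VALUES (repeat('application log line. ', 1000));
-- Compare the logical length with the stored (possibly compressed) size
SELECT id, length(content) AS characters, pg_column_size(content) AS stored_bytes
FROM text_data;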
Although TEXT and BYTEA have a size limit of 1 GB, Postgres has a default page size of 8 kB, and a single tuple in a Postgres table cannot span multiple pages. Postgres also tries to keep several tuples on each page: if tuples are too large, fewer tuples fit on each page, leading to increased I/O operations and reduced performance.
This means that when the combined size of all fields in a tuple exceeds approximately 2 kB, Postgres automatically offloads the excess data using TOAST (The Oversized-Attribute Storage Technique).
TOAST refers to the automatic mechanism that Postgres uses to efficiently store and manage values that do not fit within a page. TOAST handles this in two primary ways:
Compression: Postgres can compress the large field values within the tuple to reduce their size using a compression algorithm. By default, if compression is sufficient to bring the tuple's total size below the threshold, the data will remain in the main table, albeit in a compressed format.
Out-of-line storage: If compression alone isn't effective enough to reduce the size of the large field values, Postgres moves them to a separate TOAST table. This process is known as "out-of-line" storage because the original tuple in the main table doesn’t hold the large field values anymore. Instead, it contains a "pointer" or reference to the location of the large data in the TOAST table.
Almost every table we create has its own associated (unique) TOAST table, which may or may not ever be used, depending on the size of the rows we insert. A table with only fixed-width columns like integers may not have an associated TOAST table. All of this is transparent to the user and enabled by default.
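To see this in practice, you can look up the TOAST table Postgres created for a given table. The query below is a small sketch against the text_data table from earlier; the TOAST table name will differ on your system, and toast_table will be NULL for tables that never needed one:
-- Find the TOAST table associated with text_data (if any)
SELECT c.relname AS table_name, t.relname AS toast_table
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relname = 'text_data';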
Read more about TOAST and why it isn't enough for data compression.
While TOAST efficiently handles large values, it has a practical limit of 1 GB per value. Furthermore, if you need to update a large object stored in a TOASTed column, Postgres rewrites the entire value, which can be inefficient for large files. This is where Postgres' pg_largeobject comes into play. It is designed specifically for managing very large objects and provides a more flexible and efficient way to store, access, and manipulate massive datasets.
In Postgres, pg_largeobject is a system table used to store and manage large binary objects. Each large object is broken into chunks or “pages” small enough to be conveniently stored as rows in pg_largeobject.
Each large object also has an entry in the system table pg_largeobject_metadata. Large objects are created, modified, and deleted through a read/write API, and a single large object can be up to 4 TB in size.
Here is the schema of the two system tables behind large objects.
Table "pg_catalog.pg_largeobject"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
loid | oid | | not null | | plain | | |
pageno | integer | | not null | | plain | | |
data | bytea | | not null | | extended | | |
Indexes:
"pg_largeobject_loid_pn_index" PRIMARY KEY, btree (loid, pageno)
Access method: heap
Here:
loid is the object ID for identifying the large object.
pageno is the sequential page number of the chunk.
data is the actual binary content of that chunk (stored as BYTEA). Data is stored row by row, with each page linked to its respective object via loid.
Table "pg_catalog.pg_largeobject_metadata"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
----------+-----------+-----------+----------+---------+----------+-------------+--------------+-------------
oid | oid | | not null | | plain | | |
lomowner | oid | | not null | | plain | | |
lomacl | aclitem[] | | | | extended | | |
Indexes:
"pg_largeobject_metadata_oid_index" PRIMARY KEY, btree (oid)
Access method: heap
Here:
oid is the unique identifier for each large object.
lomowner represents the owner (role/user) of the large object, identifying which user created or owns it.
lomacl represents the Access Control List (ACL) which is used to store permissions for who can read/write/delete the large object.
Postgres' pg_largeobject is like a file system inside your database, where you can create, open, read, write, and even seek (jump to a specific part of a file) just like you would with files on your computer.
To make this easy, Postgres offers a set of tools (or "interfaces") through its libpq library, which is the standard way for applications to talk to Postgres. These tools let us work with large objects in a way that feels familiar if you have ever worked with files in a programming language or on a Unix-like system. Below are the interfaces related to pg_largeobject:
postgres=# \dfS lo*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+---------------+------------------+---------------------------+------
pg_catalog | lo_close | integer | integer | func
pg_catalog | lo_creat | oid | integer | func
pg_catalog | lo_create | oid | oid | func
pg_catalog | lo_export | integer | oid, text | func
pg_catalog | lo_from_bytea | oid | oid, bytea | func
pg_catalog | lo_get | bytea | oid | func
pg_catalog | lo_get | bytea | oid, bigint, integer | func
pg_catalog | lo_import | oid | text | func
pg_catalog | lo_import | oid | text, oid | func
pg_catalog | lo_lseek | integer | integer, integer, integer | func
pg_catalog | lo_lseek64 | bigint | integer, bigint, integer | func
pg_catalog | lo_open | integer | oid, integer | func
pg_catalog | lo_put | void | oid, bigint, bytea | func
pg_catalog | lo_tell | integer | integer | func
pg_catalog | lo_tell64 | bigint | integer | func
pg_catalog | lo_truncate | integer | integer, integer | func
pg_catalog | lo_truncate64 | integer | integer, bigint | func
pg_catalog | lo_unlink | integer | oid | func
pg_catalog | lowrite | integer | integer, bytea | func
Note: When using the interface to manipulate a large object, it must happen within an SQL transaction block. This is because the system uses file descriptors to manage large objects, and these file descriptors are only valid for the duration of a transaction.
To store a large object, we first need to create it using lo_create. The function generates a new large object and returns an OID (object identifier), which uniquely identifies the object in the database. For example:
SELECT lo_create(12345);
This creates a large object with an OID of 12345. If you pass 0 instead (or use lo_creat), Postgres assigns a system-generated OID.
lo_create
-----------
12345
An entry is made in pg_largeobject_metadata, storing information such as ownership and access privileges for this object. The actual binary data is stored in pg_largeobject, referenced by the OID. At this stage, the large object contains no data, so the pg_largeobject system table has no entry.
postgres=# select * from pg_largeobject_metadata;
oid | lomowner | lomacl
-------+----------+--------
12345 | 10 |
postgres=# select * from pg_largeobject;
loid | pageno | data
------+--------+------
(0 rows)
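If you would rather let Postgres pick the identifier, pass 0 (InvalidOid) and the server assigns an unused OID; the value it returns will vary from database to database:
SELECT lo_create(0);  -- returns a server-assigned OID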
After creating a large object, we need to write data into it. This is typically done using a combination of lo_open, lowrite, and lo_close inside a transaction block.
Opening a large object is necessary because Postgres treats large object operations like file handling. We open the object to obtain a file descriptor, perform read/write operations using that file descriptor, and close it afterward. Furthermore, Postgres requires large object read/write operations to be performed inside a transaction block. Here are the syntax and descriptions of these functions:
lo_open(OID, MODE)
This opens a large object. The MODE parameter determines whether the object is opened for one of these purposes:
Reading (INV_READ = 0x40000 or 262144 in decimal)
Writing (INV_WRITE = 0x20000 or 131072 in decimal)
Both (INV_READ | INV_WRITE = 0x60000 or 393216 in decimal)
On success, this function returns a non-negative large object descriptor, which is further used with functions like loread, lowrite, lo_lseek, lo_tell, lo_truncate, and lo_close. The descriptor remains valid only for the duration of the current transaction.
lowrite(fd, data)
This writes data to the large object identified by file descriptor fd. It returns the number of bytes actually written. In the event of an error, the return value is -1.
loread(fd, length)
This reads length bytes of data from a large object, starting at the current location pointer.
lo_close(fd)
This closes the large object file descriptor (fd). On success, lo_close returns zero. In the case of an error, the return value is -1. Any large object descriptors that remain open at the end of a transaction will be closed automatically.
Let's go step by step to manipulate a large object:
postgres=# BEGIN;
This starts a transaction block, meaning all the subsequent operations will either commit (save) together or roll back if an error occurs.
postgres=*# SELECT lo_open(12345, 131072);
lo_open
---------
0
This query opens the large object in write mode (131072 = INV_WRITE) and returns 0, which is a file descriptor-like number assigned to this large object. This descriptor (0) will be used in subsequent operations. (The server does not distinguish INV_WRITE from INV_READ | INV_WRITE, so the descriptor can also be read from.)
postgres=*# SELECT lowrite(0, 'Hello, World!');
lowrite
---------
13
This query writes "Hello, World!" into the large object identified by descriptor 0. The return value 13 represents the number of bytes written (since "Hello, World!" has 13 characters).
Because lowrite leaves the location pointer at the end of the data it just wrote, first seek back to the beginning of the object (lo_lseek is covered in detail below):
postgres=*# SELECT lo_lseek(0, 0, 0);
postgres=*# SELECT loread(0, 12);
loread
----------------------------
\x48656c6c6f2c20576f726c64
This query reads 12 bytes from the large object associated with file descriptor 0. The result is in hexadecimal (hex) format. If you want to display the text instead of hex, seek back to the start again and decode the bytes with convert_from:
postgres=*# SELECT lo_lseek(0, 0, 0);
postgres=*# SELECT convert_from(loread(0, 12), 'UTF8');
convert_from
--------------
Hello, World
postgres=*# SELECT lo_close(0);
This releases the descriptor but does not delete the object from the database.
postgres=*# COMMIT;
This permanently saves all operations inside the transaction.
The lo_lseek function moves the current location pointer for the large object descriptor identified by fd to the new location specified by offset.
Syntax:
lo_lseek(fd, offset, whence);
Here:
fd is a file descriptor returned by lo_open.
offset represents the number of bytes to move.
whence specifies where to seek from:
0 (SEEK_SET) means move from the beginning.
1 (SEEK_CUR) means move relative to the current position.
2 (SEEK_END) means move from the end.
The return value is the new location pointer or -1 on error.
For example:
SELECT lo_lseek(0, 7, 0);
This moves the cursor seven (7) bytes from the beginning. Now your read/write operation will read/write from the new position.
SELECT loread(0, 5);
If the large object contains "Hello, World!", this will read:
'World'
Note: When dealing with large objects that might exceed 2 GB in size, instead use:
lo_lseek64(fd, offset, whence);
This function behaves like lo_lseek, but it can accept an offset larger than 2 GB and/or deliver a result larger than 2 GB. Note that lo_lseek will fail if the new location pointer is greater than 2 GB.
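As a minimal sketch (reusing large object 12345 and assuming, as in the earlier examples, that lo_open returns descriptor 0), the 64-bit variants can move and report the location pointer past the 2 GB mark:
BEGIN;
SELECT lo_open(12345, 393216);        -- INV_READ | INV_WRITE
SELECT lo_lseek64(0, 3000000000, 0);  -- seek roughly 2.8 GB from the beginning
SELECT lo_tell64(0);                  -- reports the current position as bigint
SELECT lo_close(0);
COMMIT;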
One major advantage of using pg_largeobject is that it allows us to modify a specific portion of a large object instead of replacing the entire content. This is possible because Postgres provides chunk-based access, enabling efficient updates.
The following example demonstrates how to modify an existing large object by seeking a specific position and writing new data over it. Instead of replacing the entire object, we update only a portion of it.
begin;
select lo_open(12345, x'60000'::int);
select lo_lseek(0, 7, 0);
select lowrite(0, ', + large object.\012');
select lo_close(0);
commit;
Explanation:
Start a transaction to ensure atomicity.
Open the large object (12345) in read-write mode (x'60000'::int = INV_READ | INV_WRITE).
Seek to byte position 7 using lo_lseek(0, 7, 0), where 0 (SEEK_SET) moves the cursor from the beginning. Any write operation now overwrites content starting from this position.
Write new data at the current position using lowrite().
Close the descriptor and commit the transaction, saving the modifications.
By default, the BYTEA output is hexadecimal (\x48656c6c6f). To make it more readable:
SET bytea_output = 'escape';
Now, to see the updated object:
postgres=# select * from pg_largeobject;
loid | pageno | data
-------+--------+------------------------------
12345 | 0 | Hello, , + large object.\012
This demonstrates partial modification of a large object without replacing the entire content. This is particularly useful when doing one of the following:
Appending additional metadata to a stored binary file (see the append sketch after this list)
Fixing or updating content without rewriting the entire object
Overwriting specific sections of an image, video, or other binary data
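For instance, here is a small sketch of appending to the object from the previous example: seek to the end with whence = 2 (SEEK_END) and write from there (again assuming lo_open returns descriptor 0):
begin;
select lo_open(12345, x'60000'::int);
select lo_lseek(0, 0, 2);             -- move to the end of the existing content
select lowrite(0, ' appended bytes'); -- the new data is added after the current end
select lo_close(0);
commit;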
We can also import an external file into pg_largeobject. The syntax for that is:
lo_import(File_Path)
This will create a large object, store the file's content in it, and return the object's OID. When called from SQL, lo_import reads the file from the database server's filesystem, so the path must be accessible to the server process.
To properly test the use case, let's create a large file using the following command:
dd if=/dev/urandom of=/tmp/large_file.bin bs=1M count=1024
This creates a 1 GB file of random binary data (large_file.bin). Now import this file into a Postgres large object:
SELECT lo_import('/tmp/large_file.bin');
lo_import
-----------
24600
This command stores the file in pg_largeobject with OID 24600. Postgres will split the file into 2 kB chunks for efficient retrieval and modification. To confirm this, use:
postgres=# SELECT loid, pageno, length(data) FROM pg_largeobject WHERE loid = 24600 LIMIT 10;
loid | pageno | length
-------+--------+--------
24600 | 0 | 2048
24600 | 1 | 2048
24600 | 2 | 2048
24600 | 3 | 2048
24600 | 4 | 2048
24600 | 5 | 2048
24600 | 6 | 2048
24600 | 7 | 2048
24600 | 8 | 2048
24600 | 9 | 2048
Explanation:
The file is chunked into 2048-byte (2 kB) pages.
A 1 GB file has ~524,288 pages (1 GB / 2 kB = 524,288).
So, instead of loading 1 GB into memory, we can read a portion using:
SELECT encode(loread(lo_open(24600, 262144), 50), 'hex');
encode
-------------------------------------------------------------------------------------------
173c11a19ded7b79c5a4b01a3ae4683c5a5ba3a44714e8dbaec50db73c147ccb4ee9791c8f9cabcb406bec5a0aaf99b38fda
Here:
lo_open opens the file (OID 24600) in read mode (262144 = INV_READ).
loread reads 50 bytes, and encode renders them as hex for readability.
This is useful for streaming large data instead of loading it all at once.
To demonstrate efficient modification, let’s overwrite some bytes at page 100.
BEGIN;
SELECT lo_open(24600, 131072 | 262144); -- Open in Read-Write mode
SELECT lo_lseek(0, 204800, 0); -- Move to byte offset 204800 (200 kB, the start of pageno 100)
SELECT lowrite(0, 'MODIFIED_DATA_HERE');
COMMIT;
Here:
Seek 200 kB into the file (100 pages × 2048 bytes).
Overwrite only a small part instead of replacing 1 GB.
After working with a large object (modifying or appending data), we can export it back to the filesystem using:
SELECT lo_export(24600, '/tmp/lo_test_new.txt');
lo_export
-----------
1
This writes the large object with OID 24600 to the file /tmp/lo_test_new.txt. The return value (1) indicates success. Now, let's check the exported file:
ls -l /tmp/lo_test_new.txt
-rw-r--r-- 1 user wheel 1073741824 Feb 21 00:31 /tmp/lo_test_new.txt
The exported file contains the modified data, including any changes made while it was stored in pg_largeobject.
When a large object is no longer needed, it should be deleted to free up storage space. The lo_unlink(OID) function removes the large object from the pg_largeobject system catalog.
For example:
SELECT lo_unlink(24600);
lo_unlink
-----------
1
This command deletes the large object with OID 24600. A return value of 1 indicates that the deletion was successful.
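To double-check, you can query the catalogs directly (as a superuser, as in the earlier examples); both counts should be 0 for the deleted OID:
-- Verify that neither data pages nor metadata remain for OID 24600
SELECT count(*) FROM pg_largeobject WHERE loid = 24600;
SELECT count(*) FROM pg_largeobject_metadata WHERE oid = 24600;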
Postgres offers various tools to handle large files, each tailored to different needs and scenarios. Use BYTEA for smaller binary data, rely on TOAST for moderate-sized objects, and turn to pg_largeobject when working with extremely large files that demand advanced features like streaming or partial updates. Developers who understand these mechanisms can design databases that are both performant and scalable.
If you're working with massive but structured datasets, TimescaleDB's hypercore columnar compression is designed for structured, time-series, and analytical workloads rather than raw file storage. By leveraging column-oriented storage and time-based chunking, it achieves superior compression ratios (often 90% or more) while maintaining compatibility with standard SQL queries. TimescaleDB's compression works transparently, reducing storage costs and improving query performance without additional complexity.
For developers managing structured datasets at scale, TimescaleDB provides a more efficient and scalable alternative to traditional Postgres large object storage mechanisms. To try it yourself, create a free account today.