Apr 11, 2024
Posted by
Kirk Laurence Roybal
In our last article about PostgreSQL shared_buffers
we went back in time to 1990. Let's Marty McFly our way over to about 1998. This was when Stella got her groove back, and PostgreSQL got its SQL. (Actually, PostgreSQL was released in 1997, but it's funnier telling it our way.)
At about the same time, the shared memory system was largely finalized into what we know today as shared_buffers
. So today, we'll talk about how the memory management part of the system works. This is one of the most important parts of PostgreSQL scalability. Memory management is second only to CPU management in vertical scalability tuning for PostgreSQL.
In future articles, we'll talk about how the backends work (the processes that serve each connection), how the autovacuum works (yes, the vacuum process is involved in shared memory for some strange reason), and how the background writer is involved (everybody wanted in on the party, so why not?).
For now, we'll concern ourselves with the postmaster process.
PostgreSQL was designed by a bunch of UNIX nuts familiar with the SysVinit system. All the OGs in the Linux community know this system as the way to start system services, also known as daemons. The idea is that a process will get kicked off by a system user, then that process will kick off another process and kill itself. This rather strange phenomenon is called "fork off and die."
But why would you do that? Well, any process that gets orphaned like that on a Unix system has to get included in the process list somehow. So, on Unix variants, it gets inherited by a process called "init." This "init" process ends up "owning" (that is, being the parent of) all of the system services running as daemons. The init process generally runs as root, the main system user.
PostgreSQL creates a system user called "postgres" during installation. When the time comes to start the service, the pg_ctl
process is invoked by the unprivileged system user "postgres" and passed some parameters for the location of the database, log files, etc. pg_ctl
then spawns off the postmaster process (comically called "postgres" in the system process list).
If you're counting, there are about five things in the PostgreSQL system named "postgres":
Welcome to open-source naming conventions (or lack thereof). The PostgreSQL Global Development Group is a loose set of volunteers. It is normal for the author to name each component they create. postgres
is another wonderful way developers named their own creations. The point is that you must watch out for context in the PostgreSQL world. In this article, we will be talking about the first three in the list and will attempt to be very explicit about which one at which point.
To make matters even worse, the PostgreSQL main process was, at one point, actually called postmaster
. A few revisions ago, in a fit of silliness at a bar after a convention, the team renamed it postgres
for maximum confusion. It would have been better to have named the system user "secret squirrel" and the process "postgresd," the PostgreSQL user alvin
, and the database nuts
.
This would have been more explicit, and this conversation would have been much simpler. But alas.
Back to the point here. pg_ctl
launches postgres
(the process, which the documentation calls "postmaster"). It launches it in the name of postgres
(the system user). Then pg_ct
` kills itself, leaving the postgres
process abandoned in the system. The postgres
process (again, also called "postmaster") gets inherited by init
, thereby becoming a system process, also called a "daemon."
In the final form, what we see in the process list using a utility like top or htop, is the postgres
process with a parent process of init
. All of the final processes are owned by root
.
Now, we finally get to the shared memory part of the article. The postmaster (postgres
process) then allocates all requested physical memory for the shared memory system in one lump. At least, it requests it that way. In developer terms, it is a single malloc() function call. The hope is that the memory it requests will be mostly contiguous with physical memory. This allocation makes searching the memory much more efficient.
The operating system is under no obligation to provide the memory in the requested lump. In fact, it usually doesn't. The default allocation method is to hand off a much smaller block while pretending it is the requested size. Then, allocate the actual memory later when it is used for the first time. This is called "overcommit" in the operating system jargon. It allows the operating system to pretend that it has a lot more memory than what is physically present.
This "overcommit" is not good for PostgreSQL memory management. For the sake of shared memory (and work_mem
, for that matter—more on that later), we would much rather get the whole memory block in a single piece, if possible. There is a great description of this in the PostgreSQL documentation. For that, we would turn off the "overcommit" for the operating system:
sysctl -w vm.overcommit_memory=2
The contents of a simple shell script that outputs some helpful information to see how the shared_buffers
are currently configured are appended to the end of this article. It is an expansion of the script provided in the documentation above for Ubuntu. Your system may vary slightly for the file system paths to the utilities.
The output should look something like this:
Data Directory: /var/lib/postgresql/14/main
Postmaster PID: 1262
PG Buffers: 145376K
Hugepagesize: 2048kB
Huge Pages Allocated: hugepages-1048576kB hugepages-2048kB
Of course, your actual values will be somewhat different.
Immediately following that explanation for "overcommit" in the documentation, there is a section about "Huge Pages." Huge Pages are a good thing. They allow 64-bit and above systems to allocate huge chunks of memory. This is particularly good for PostgreSQL systems.
There is a library for compatibility for 32-bit systems called "Transparent Huge Pages." This is not so good for PostgreSQL caching and memory management. It allows large blocks of memory to be accessed by dividing them into segments. This is exactly what we *don't* want.
#to see if they are on:
cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
The current setting is enclosed in square brackets. In this case, [always].
You can turn them off permanently by adding this to the grub kernel command line in /etc/default/grub:
transparent_hugepage=never
The result will look something like this:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet numa=off transparent_hugepage=never"
Which will take effect after the next reboot.
Or, you can temporarily disable this on the command like this:
sudo su -
echo never > /sys/kernel/mm/transparent_hugepage/enabled
These two settings alone will dramatically help shared memory usage.
If you run the pg_hugepages
utility, you'll notice that the changes only become active when the PostgreSQL service is restarted. This is because the memory is allocated when the postmaster process is launched.
cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
Notice that the square braces are now around the [never] setting.
shared_buffers
Let's get back to the part where PostgreSQL requests all of the shared memory in one lump. This happens when the postgres
process starts and only happens once. There is no way to modify (or even affect) the memory size after that initial allocation. If we have set the kernel parameters correctly to disable overcommit, we will see that the memory is now allocated to the postgres
process.
Unfortunately, this does not mean that the memory is being used efficiently. It just shows that PostgreSQL is in charge of it, namely the postgres
process. The memory may not be useful in any way at all.
That brings us to another PostgreSQL setting, shared_buffers
. This setting controls the amount of memory that PostgreSQL will request at startup. It understands unit measurements, so it may be set using standardized acronyms, such as "kB" or "MB" or, most interestingly in 2022, "GB."
Strangely, the PostgreSQL documentation states that:
If you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers
is 25% of the memory in your system.
Well, uh, not really. Kinda. Sorta. Maybe.
The short answer is: "It depends." It depends on the workload. The long answer is that the next article will be about this subject. For now, we're just going to take the documentation at face value (normally a fine thing to do and will do no harm).
So, for the moment, let's take the total amount of memory, divide it by four, and use that as a starting point, just as the documentation suggests. The example system that this article is based on is 64GB/4=16GB.
Let's make this setting as safe as possible using an SQL command as the postgres
user.
ALTER SYSTEM SET shared_buffers TO '16 yadas';
ERROR: invalid value for parameter "shared_buffers": "16 yadas"
HINT: Valid units for this parameter are "B", "kB", "MB", "GB", and "TB".
Oh, we made a mistake, and PostgreSQL was smart enough to catch us at it.
That's great. Now let's do it for real this time.
ALTER SYSTEM SET shared_buffers TO '16 GB';
ALTER SYSTEM
That's better. As mentioned before, this setting will only become active after PostgreSQL is restarted.
sudo systemctl restart postgresql
pg_hugepages
Data Directory: /var/lib/postgresql/14/main
PG huges_pages: try
PG huge_page_size: 0
Postmaster PID: 5535
PG Buffers: 17171256K
Hugepagesize: 2048kB
Huge Pages Allocated: hugepages-1048576kB hugepages-2048kB
Transparent Huge Pages: always madvise [never]
There we go. Now the PostgreSQL shared_buffers setting looks like 16 GB of memory.
These settings are somewhat reasonable for a minimally configured system. There are still a lot of inefficiencies here. We will explore such inefficiencies in the next few articles. It is hard to put a value on how far we've come in configuration so far. The vagaries of load balancing, workload, and many other settings make it hard to guess. Making many assumptions about whether the shared memory would be useful, the system doesn't have other bottlenecks, etc., this configuration is still quite a bit better than no configuration at all.
But we won't leave it at that. Please stay tuned for further explanations and continued configuration details. In the next article, we'll discuss when caching doesn't help so much, and the proper setting is a very low amount of memory.
After that one, we'll explore how to tune the memory settings we just made and follow that up with an explanation of all the other systems that affect the performance of the buffers.
If you’re looking to expand your PostgreSQL scalability without the hand tuning mentioned in this article series, try our managed cloud service, Timescale. You will get all the PostgreSQL juice with extra features for time series (continuous aggregation, compression, automatic retention policies, hyperfunctions). Plus, a platform with automated backups, high availability, automatic upgrades, flexible resizing with autoscaling, and much more. You can use it for free for 30 days; no credit card required.
Shell script example to see how the shared_buffers
are currently configured:
#!/bin/bash
#===============================================================================
#
# FILE: pg_hugepages
#
# USAGE: ./pg_hugepages
#
# DESCRIPTION: Get the huge pages for PostgreSQL, compare to physical
#
# OPTIONS: Any options on the command line will be passed to psql
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Kirk L. Roybal (), [email protected]
# COMPANY: Timescale
# VERSION: 1.0
# CREATED: 08/27/2022 05:11:49 PM WEST
# REVISION: ---
#===============================================================================
function die(){
retval=$1
shift
echo -e "$@"
exit $retval
}
pmap_opts=""
pmap="$(which pmap)"
[[ $pmap == "" ]] && die 3 "Can't find pmap" || pmap="$pmap $pmap_opts"
psql=$(which psql)
[[ $psql == "" ]] && die 4 "Can't find the psql utility."
sudo=$(which sudo)
[[ $sudo == "" ]] && die 5 "Can't find the sudo utility."
data_dir=$($psql $@ -qtAc "SELECT current_setting('data_directory');")
[[ $data_dir == "" ]] && die 1 "Can't find the data directory."
pg_huge_pages=$($psql $@ -qtAc "SELECT current_setting('huge_pages')")
pg_huge_page_size=$($psql $@ -qtAc "SELECT current_setting('huge_page_size')")
pg_pid=$($sudo head -n 1 $data_dir/postmaster.pid)
[[ ${pg_pid// /} == "" ]] && die 2 "Can't get the postmaster process id"
echo -e "Data Directory: $data_dir"
echo -e "PG huges_pages: $pg_huge_pages"
echo -e "PG huge_page_size: $pg_huge_page_size"
echo -e "Postmaster PID: $pg_pid"
pg_buf=$($sudo $pmap ${pg_pid})
[[ $pg_buf == "" ]] && {
echo -e "$sudo $pmap ${pg_pid}"
$sudo $pmap ${pg_pid}
die 3 "Can't figure out the pg memory allocation"
}
pg_buf=$(echo -e "$pg_buf" | awk '/rw-s/ && /zero/ {print $2}')
echo -e "PG Buffers: $pg_buf"
grep ^Hugepagesize /proc/meminfo | sed -e 's/[[:space:]]*//g' -e 's/:/: /'
echo -en "Huge Pages Allocated: "
ls /sys/kernel/mm/hugepages/
echo -en "Transparent Huge Pages: "
cat /sys/kernel/mm/transparent_hugepage/enabled
```