Nov 28, 2024
Posted by Aleksander Alekseev
Contributing to open-source projects can be intimidating, and PostgreSQL is no exception. As a long-time PostgreSQL contributor, Aleks shares his hard-earned tips to help you make your first contribution, or start contributing more.
PostgreSQL is one of the most popular and loved databases in the world. It's no secret that we are big fans of PostgreSQL at Timescale: we've built TimescaleDB on top of it, we employ open-source PostgreSQL contributors (like me!), and we've developed features to make using PostgreSQL better for time-series scenarios (like Skip Scan, which makes certain queries in PostgreSQL 8000x faster). But in addition to helping improve the database itself, we're committed to the success of the PostgreSQL community at large.
Open-source is not just my passion; it's my career. I've been a PostgreSQL contributor since 2016 and recently joined Timescale as a full-time open-source PostgreSQL contributor. I've contributed not only to PostgreSQL but also to Insolar, Sigrok, and other open-source projects. I'm the author of the pg_protobuf and ZSON extensions for PostgreSQL and of several open-source libraries for STM32 microcontrollers.
I love open-source because it enables us to see what's inside the software, learn from it, and improve it. Quality standards tend to be higher in open-source software than in proprietary software because you can't hide any cut corners. Last but not least, nobody can refuse to sell you open-source software or to renew your license because of geopolitical events or whatnot. (I have encountered this at least twice in my career.)
This brings me to the impetus for this post. Earlier this year, we ran the "State of PostgreSQL" survey to learn how people use PostgreSQL, from their community experiences to popular tools and areas to improve.
You can read the State of PostgreSQL 2021 report to explore all the findings and trends, but one result stood out to me:
85% of respondents haven't contributed to the PostgreSQL codebase, docs, or commitfests, and only 4% have contributed several times.
The survey also highlighted several places where we, as a PostgreSQL community, can be more welcoming to new developers to help them use and contribute to PostgreSQL.
For example, one respondent said: "First code contributions can be traumatic...sometimes we're not very welcome [sic] with new developers. We should improve..."
That got me thinking about how we can make it easier for folks to overcome the initial fear and other barriers - be it technical difficulty, confusing processes, or lack of information - that often surround contributing to an open-source project. After all, we want more people to be a part of the PostgreSQL community and to make contributions; that's how we make it even better.
To help more people get started, I wanted to share my observations, what I've learned over the past 5+ years, what I wish I had known when I started, and the advice I typically give new contributors.
In my experience, little depends on the specifics of the project. So, while I'll use PostgreSQL-specific examples, the following guidance is quite universal, whether you want to contribute to PostgreSQL for the first (or second, or third) time or have another open-source project in mind.
I also included a few ways to give back to the community or help a project grow beyond code contributions: the important, yet easily overlooked, elements of building a sustainable, healthy open-source community.
One of the most important questions to ask is: "Why do you want to be an open-source contributor?" Unless you recognize and understand your motivation, it will be difficult for you to find time for the project, especially as time goes on.
Here is a list of potential reasons why you might want to start working on an open-source project:
To gain a unique experience: If you're a backend developer who's been writing microservices for a while, you might look for a new challenge. Open-source software presents many (many!) such challenges and new technologies to learn.
To learn the internals of your favorite operating system, database, language, or compiler: Understanding the internals of your favorite open-source project allows you to use it more efficiently and to learn its limitations. For example, not many users know that running SELECT queries may cause PostgreSQL to write to disk. Or that creating many temporary tables may significantly affect the performance of the entire database. Or that synchronous_commit = remote_apply doesn't actually wait for replicas before committing the transaction. (The transaction is committed locally right away; the user just isn't notified about it until the replicas confirm, which may cause problems.)
To work with great people: Open-source attracts some of the most talented people from around the world. There is always something you can learn from them, big or small. The original idea for the ZSON extension came from Alexander Korotkov and Teodor Sigaev, both PostgreSQL committers I was lucky to work with. ZSON is now the most popular project I have on GitHub (390+ stars at the time of writing), and there is a possibility that it will be shipped with PostgreSQL by default.
To make users happier: Let's say you contributed several lines of code and, as a result, made PostgreSQL 10% faster in some scenarios. PostgreSQL is used by thousands of companies whose products are used by millions of customers. It's satisfying to realize that your small patch made all of these users just a little bit happier, even if you never hear from them directly.
To boost your resume: It's natural to seek a job that suits you better. Several years of contributing to well-known open-source software will open new doors for you, from more technical experience to connections with community members you may wind up working with later.
To be fair, there are probably dozens of other reasons why you might want to start contributing to an open-source project. This list isn't exhaustive, and each case is unique, but I tried to distill a few of the reasons I see again and again.
Once you've taken some time to reflect on and establish why you want to contribute, the next step is to familiarize yourself with the project's development process.
Before starting work on a new patch, there are several things to learn about the project:
You can usually find this information, or most of it, in the project's GitHub README or somewhere in the documentation. For PostgreSQL, look at the installation docs.
PostgreSQL is written in C, uses the GNU Autoconf build system, and relies on Perl for testing and DocBook XML for documentation. It uses Git as the version control system, and the repository itself is self-hosted (although there is a mirror on GitHub).
PostgreSQL can be compiled and tested like this:
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
./configure --prefix=/home/user/pginstall --enable-tap-tests --enable-cassert --enable-debug
make world
make check-world
The details are a little bit more complicated, though. Firstly, you have to install several dependencies, which is done differently depending on the operating system.
For instance, on Ubuntu 20.04 LTS you will need:
# for basic build
sudo apt install gcc make flex bison libreadline-dev zlib1g-dev
# to build the documentation as well
sudo apt install docbook docbook-dsssl docbook-xsl libxml2-utils \
openjade opensp xsltproc
Secondly, there are several common mistakes that you can make, e.g., forgetting to run make distclean after changing the header files.
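The clean rebuild that avoids this mistake can be sketched as a small shell function. This is only a sketch: the name pg_rebuild is hypothetical (it is not part of PostgreSQL or of any published script), and it assumes you run it from the top of the source tree with PGINSTALL set to your install prefix. It reuses the configure flags shown above.

```shell
# Hypothetical helper (pg_rebuild is an illustrative name, not a PostgreSQL
# tool): rebuild from scratch after editing header files.
# Assumes it runs from the top of the source tree with PGINSTALL set.
pg_rebuild() {
  if [ -z "${PGINSTALL:-}" ]; then
    echo "pg_rebuild: please set PGINSTALL first" >&2
    return 1
  fi
  # Remove build products and configure output; ignore the error
  # if the tree has never been configured.
  make distclean >/dev/null 2>&1 || true
  ./configure --prefix="$PGINSTALL" \
    --enable-tap-tests --enable-cassert --enable-debug &&
    make world
}
```

Alternatively, adding --enable-depend to the configure flags turns on automatic dependency tracking, so object files that include a changed header are rebuilt by the next make. That is convenient during development at the cost of some build overhead.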
There is a set of scripts on GitHub that will help you avoid these mistakes.
Here is how to use them:
# where to install Postgres (you can add this line to your ~/.bash_profile)
export PGINSTALL="/home/user/pginstall"
# build and test Postgres
./full-build.sh
# install it to $PGINSTALL
./single-install.sh
# execute a little more tests on running Postgres
make installcheck-world
# check the documentation (open is macOS-specific; use xdg-open on Linux):
open "$PGINSTALL/share/doc/postgresql/html/index.html"
Additionally, the PostgreSQL community uses mailing lists as the main communication channel for discussions and for submitting and reviewing patches.
To get an idea of the types of messages, subscribe to pgsql-hackers@ (be aware that there are many messages per day on this mailing list). Two other important mailing lists are pgsql-general@ and pgsql-bugs@, and there are many others for assorted topics.
(As an aside: in the anonymized State of PostgreSQL survey data, a number of responses mentioned that the mailing lists aren't the friendliest way to track bugs and may be a barrier to getting involved. For example: "Mailing lists are considered 'hard' by people nowadays. Not the most welcoming interface for interacting with the community for quite a large number of people I'd expect.")
After you've gotten familiar with the development process, it's time to start thinking about ideas for your first patch.
Your first patch doesn't have to be anything fancy. Here are several examples that I think make good first patches, both for PostgreSQL and as general places to start:
Find and fix mistakes in comments and documentation. Start with something simple. In my experience, projects are bound to have typos and mistakes in code comments and documentation. Use your favorite text editor with a spell checker to find them. This is a great place to start and to overcome the fear of contributing, since there is virtually no risk of breaking anything. Interestingly, this is exactly how I submitted my first (and so far only) patch to the Linux kernel. (My first patch to PostgreSQL was rather complicated and thus not very representative.)
Participate in code review, testing, and discussions. Many software developers like to write code, but few like to review and test it. One of the most valuable contributions one can make to PostgreSQL is being a reviewer. As a reviewer, your primary task is to check that the patch compiles, passes the tests, implements the claimed functionality, and includes documentation. And, of course, it's worth checking that it doesn't have any obvious bugs.
Find and fix a bug. Check the bug tracker or, in the case of PostgreSQL, the pgsql-bugs@ mailing list archive. Try to reproduce the bug. If it doesn't reproduce, it might already have been fixed by another patch, or maybe the steps to reproduce it aren't very clear. In any case, reply on the mailing list to let the community know what you found. If you managed to reproduce the bug, you are lucky: from there, write a corresponding test and change the code so that it passes.
Find a bottleneck and optimize the code. After using a piece of software for a while, you may discover cases where its performance is far from ideal. Use suitable tools (e.g., perf and eBPF-based tracers) to find the bottleneck and then eliminate it. Before submitting the patch, make sure it doesn't degrade performance in other scenarios.
Write tests. Use a suitable tool to measure code coverage. For a C project like PostgreSQL, that tool would be lcov (a front end for gcov). With a code coverage report in hand, write a test that increases the coverage.
Improve documentation. Ideally, documentation is structured in a way that allows users to download it as a PDF and read it like a book. PostgreSQL documentation is quite good in terms of covering many - many! - topics, but could benefit from more experienced, dedicated technical writers (e.g., people who could add more sample scenarios and illustrations to help new users understand concepts). PostgreSQL's documentation has 71 chapters and 2300+ pages in total, and those pages mostly describe configuration parameters and query syntax rather than examples of how to solve concrete tasks. The FreeBSD Handbook comes to mind as a good "read it like a book" example.
Refactor the code. Refactoring has a clear goal: you rewrite the code so that it does the same thing but is more readable. It's worth noting that the PostgreSQL community can sometimes be a little skeptical about the value of such patches. The reason is that the community supports the five most recent major releases of PostgreSQL, and refactoring can make backporting bugfixes to those branches more complicated.
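Several of these ideas come down to a handful of commands. The sketch below wraps three of them (reviewing a patch, measuring test coverage, and profiling a bottleneck) in shell functions. The function names are hypothetical, and the sketch assumes a git clone of the PostgreSQL repository whose origin remote points upstream. The coverage part relies on PostgreSQL's --enable-coverage configure option and its coverage-html make target; treat the whole thing as a starting point rather than an official workflow.

```shell
# Hypothetical helper names; run them from the top of a PostgreSQL source tree.

# Review: apply a patch on a fresh branch off master, then build and test.
# Assumes the patch was made with `git format-patch`; for a plain diff,
# use `git apply` plus a manual commit instead of `git am`.
review_patch() {
  local patch_file=$1
  local branch
  branch="review-$(basename "$patch_file" .patch)"
  git fetch origin || return 1
  git checkout -b "$branch" origin/master || return 1
  if ! git am "$patch_file"; then
    git am --abort
    return 1
  fi
  make world && make check-world
}

# Tests: rebuild with gcov instrumentation, run the core regression tests,
# and render an HTML report (requires gcov and lcov, which ships genhtml).
coverage_report() {
  # Start from a clean tree so every object file gets the coverage flags.
  make distclean >/dev/null 2>&1 || true
  ./configure --enable-coverage &&
    make &&
    make check &&
    make coverage-html
  # The report ends up in coverage/index.html.
}

# Bottlenecks: sample a running backend with perf for N seconds (default 30),
# then browse the call graph. Get the PID with: SELECT pg_backend_pid();
profile_backend() {
  local pid=$1
  local secs=${2:-30}
  perf record -g -p "$pid" -o perf.data -- sleep "$secs" &&
    perf report -i perf.data
}
```

For instance, review_patch v1-0001-example.patch (a hypothetical file name) leaves you on a review branch with the patch applied and the full test suite run. Note that perf may need root or a relaxed kernel.perf_event_paranoid setting. When optimizing, running the same pgbench workload before and after the patch (e.g., pgbench -i -s 10 bench to initialize an existing database named bench, then pgbench -c 8 -j 8 -T 60 bench) is a crude but quick way to spot regressions.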
I recommend accumulating small wins - by submitting several "first patch"-like contributions - before you move on to more ambitious patches. This will help you get familiar with the contribution process, tools, and community, not to mention boosting your confidence.
Once you've settled on an idea for your first patch, you're all set to go ahead and start contributing! (I recommend reading the following section on common mistakes to avoid first.)
There are a few common "mistakes" people make when joining an open-source project, so I've compiled the following pieces of advice to help you avoid some of them.
Thus far, I've focused on contributing patches with new code or bugfixes. But submitting patches is not the be-all and end-all of contributing to open-source projects.
Next, we'll look at how to contribute to open-source projects and the surrounding community without writing code.
There are many ways to contribute to a project besides writing code and documentation, and these non-code contributions are invaluable.
Here are several of my favorite ideas, although the list does not claim to be complete:
Help newcomers. There are always people who have recently started using a given open-source project (for reference, in the State of PostgreSQL survey, almost 50% of respondents said they were new-ish to the project, with 0-5 years of experience). Usually, there is a mailing list and/or Slack where they can ask questions. Join the corresponding channel and help newcomers.
Participate in conferences. Make a presentation on something you used, learned, or have been working on lately. Share the knowledge. For PostgreSQL, there are several popular conferences, like PGconf.asia, Postgres London, and PGconf.us, as well as many local meetups.
Create a blog / podcast / YouTube channel. This article, for instance, can be considered a small contribution to open-source. Make sure your blog, podcast, etc., is added to the prominent community news aggregator(s), so people can learn about it. For PostgreSQL, this is Planet PostgreSQL.
Write a book. Writing a book is an ambitious and time-consuming goal, but there are many ways to approach it. For example, Manning is a publisher known for helping new technical writers publish their first book; alternatively, you can simply make a PDF in Google Docs and distribute it for free.
Participate in Google Summer of Code or Google Season of Docs. Google Summer of Code (GSoC) is a program focused on bringing student software developers into open-source development. Participate as a student or as a mentor. If you are a technical writer, consider participating in Google Season of Docs (GSoD). See GSoC and GSoD pages on PostgreSQL Wiki for more information.
Donate hardware for the CI system. CI stands for continuous integration; in the PostgreSQL world, the equivalent is called the Buildfarm. The community is interested in adding unusual platforms or combinations of architecture, operating system, and compiler to the Buildfarm. For example, at the time of writing, there is no server with the RISC-V architecture. RISC-V is an open instruction set architecture (ISA) that has gained support from many leading hardware manufacturers, especially after the discovery of the Meltdown and Spectre vulnerabilities. See Application to join PostgreSQL Buildfarm for more details.
The following additional materials are recommended for self-study in the context of contributing to PostgreSQL:
This post is merely a collection of advice, best practices, and various other things I've observed over the years to help would-be contributors make the jump from "never contributed" to "contributed once or twice" (and, ultimately, hopefully some make it to "contributed many times").
There are many more technical and community topics that I didn't cover here. If there's something you'd like to read more about (e.g., debugging, profiling, and benchmarking PostgreSQL, or writing extensions), reach out and let me and the team know.
We're also looking for ways to help the community, share knowledge, and contribute to things community members are already working on.
Lastly, I'd be remiss if I didn't mention that we're hiring across multiple teams. TimescaleDB is the leading open-source relational database for time-series data. It's packaged as a PostgreSQL extension (installed with CREATE EXTENSION, not a fork, nor a set of patches).
If you know C and SQL, have experience with PostgreSQL, and want to be a full-time database developer, I encourage you to consider joining Timescale. Timescale is a remote-first company, with people located on all continents (except Antarctica; but if that's you, we're happy to outfit your home office with a space heater).
If you'd like to discuss technical topics with me, Timescale engineers, and other developers and community members, you can find us in the TimescaleDB Slack (8K+ members).
Happy contributing!