Hello everyone,
I am seeking advice on a performance issue I’ve been experiencing with AWS RDS Postgres. Specifically, I’m noticing a substantial delay when inserting a large number of rows (around 3 million) into a database.
The structure of my tables is as follows:
CREATE TABLE table_a (
field_a varchar,
field_b integer,
field_c date,
primary key (field_a, field_b, field_c)
);
CREATE TABLE table_b (
field_a varchar,
field_b integer,
field_c date,
primary key (field_a, field_b, field_c)
);
When I perform different insert operations, here’s what I observe:
- Without primary key/index, executing a simple insert:
INSERT INTO table_b SELECT * FROM table_a;
Execution time is 909.044 ms.
- Without primary key/index, insert with NO CONFLICT:
INSERT INTO table_b SELECT * FROM table_a ON CONFLICT DO NOTHING;
Execution time is 919 ms.
- With a primary key/index, executing a simple insert:
INSERT INTO table_b SELECT * FROM table_a;
Execution time is 6188 ms.
- With a primary key/index, insert with NO CONFLICT:
INSERT INTO table_b SELECT * FROM table_a ON CONFLICT DO NOTHING;
Execution time jumps to 14287 ms.
The most significant delay appears when inserting with a primary key and using ON CONFLICT DO NOTHING. I’m puzzled why this operation takes almost 14 seconds.
So, I would like to ask if there are any strategies or solutions to improve this performance? We need to use a primary key with the condition ON CONFLICT DO NOTHING. Also, I am considering TimescaleDB as a possible alternative, given its touted performance benefits. Would this be a viable solution to my problem?
I appreciate your time in helping me understand this better.
Thank you.