PC

PostgreSQL Deadlock with concurrent inserts

February 12, 2023

I'm finally taking the time to write about this because I've found that the problem was interesting when I was debugging it.

A couple of months ago, I was seeding some random data into a table for a personal project of mine (Guesswho App), my seed script was fairly simple but was using two processes to insert data in parallel.

The script looked like this:

function seedData(csvReader) {
  db.connect().execute(async (db) => {
    for await (rows of csvReader.take(1000)) {
      await db.insert("features")
        .values(rows.map(r => r.name))
        .onConflict(t => t.doNothing());
    }
  });
}
 
const csvReader = createCsvReader(csvFilePath);
 
// Start 4 "tasks" to insert data in "parallel"
Promise.all([
  seedData(csvReader),
  seedData(csvReader),
  seedData(csvReader),
  seedData(csvReader)
]);

The first time I ran the script everything went well, but the second time the script hanged and ended up telling me that there was a deadlock. I was pretty surprised because I was not expecting this to happen.


Debugging the deadlock

Postgres is a nice great database, it doesn't wait indefinitely, not just tell you that you have exceeded the lock timeout. Instead it precisely tells you that there is a Deadlock.

A deadlock error looks like this:

ERROR:  deadlock detected
Process 50659 waits for ShareLock on transaction 1287723; blocked by process 49703.
Process 49703 waits for ShareLock on transaction 1287724; blocked by process 50659. 
 
SQL state: 40P01
Detail: Process 50659 waits for ShareLock on transaction 1287723; blocked by process 49703.
Process 49703 waits for ShareLock on transaction 1287724; blocked by process 50659.
Hint: See server log for query details.
Context: while locking tuple (0,1) in relation "features"

It turned out that the process 50659 and 49703 were actually both performing the INSERT INTO ... ON CONFLICT operations during the deadlock.

Digging deeper: ON CONFLICT DO NOTHING

Let’s take a moment to look at what ON CONFLICT DO NOTHING does under the hood.

When you run:

INSERT INTO features(name) VALUES ('height', 'eye_color')
ON CONFLICT DO NOTHING;

PostgreSQL has to:

  1. Check if a row with that conflicting value already exists (e.g., a unique constraint on name)
  2. Lock that row or the index entry that would correspond to it
  3. Decide whether to insert or skip

Turns out that if at the same time you run:

INSERT INTO features(name) VALUES ('eye_color', 'height')
ON CONFLICT DO NOTHING;

with the two same values but in a different order, you you can get a deadlock!

One transaction would lock height and the other would lock eye_color and then they would wait for each other to finish.


Solution

When dealing with deadlock, the solution is often the same: Make sure that locks are always taken in the same order.

In our case, we can simply sort the values before inserting them.

function seedData(csvReader) {
  db.connect().execute(async (db) => {
    for await (rows of csvReader.take(1000)) {
      await db.insert("features")
        .values(rows.map(r => r.name).sort())
        .onConflict(t => t.doNothing());
    }
  });
}
 
const csvReader = createCsvReader(csvFilePath);
 
// Start 4 "tasks" to insert data in "parallel"
Promise.all([
  seedData(csvReader),
  seedData(csvReader),
  seedData(csvReader),
  seedData(csvReader)
]);