Wednesday, May 15, 2019

"But how can I be waiting on a lock when I'm writing an entirely different table?"

Anybody who has run long transactions on PostgreSQL has run into situations where an update on one table, let's call it sphere_config, is waiting on a write from another table, let's call it component_state, to complete.

But how, you ask? How can that be? Those are two different tables, there are no rows in common between the two writes, thus no lock to wait upon!

Well, here's the deal: The transaction that's writing component_state *earlier* wrote to sphere_config. But it's still chugging away doing writes to component_state.

But wait, you say, yes, that transaction wrote to sphere_config, but wrote to records that are not in common with my transaction, so what gives? Well, what gives is this: Uniqueness constraints. Transaction A grabbed a share lock on sphere_state when it wrote to sphere_state because there's a uniqueness constraint. Until transaction A, that's writing to component_state, finishes chugging away doing its things and COMMITs its transaction, its writes to sphere_config aren't visible to transaction B, your transaction that's trying to write to sphere_config. After all, it might ROLLBACK instead because of an error. Thus your transaction B can't update the row it's trying to update because it doesn't know yet whether it would be violating a uniqueness constraint because it doesn't know yet whether transaction A wrote that same value (because transaction A hasn't committed or rolled back yet).

Now I hear you saying, "but transaction A is operating on a completely different subset of the table, there's no way that anything it's writing can violate uniqueness for transaction B." Well, Postgres doesn't know that because of the way Postgres time travels. So when transaction B goes to write to that table, it tries to grab a share lock on it... but can't, because transaction A still has it. Until transaction A finishes, transaction B is stuck.

So, how can you deal with this? Well:

  1. If you know you're never going to violate the uniqueness constraint, remove it.
  2. If you need the uniqueness constraint, try to break up your big long transactions into smaller transactions. This will clear those locks faster.
  3. Make sure your transactions get terminated by either a COMMIT or a ROLLBACK. Unfortunately some programs in languages that do exceptions have very bad error handling and, when an exception occurs that terminates execution of the database trasnaction, don't clean up the database connection behind them by issuing a ROLLBACK to the database session. Note that most people are using a connection pool between themselves and Postgres, and that connection pool may not automatically drop a connection when the program exceptions. Instead, the still-open connection (still unterminated by a ROLLBACK) may be simply put back into the pool for re-use, keeping that lock open on the . So: Try your darndest to use scaffolding like Spring's Service scaffolding that will automatically roll back transactions when there is an uncaught exception that terminates execution of a transaction. And do a lot of praying.
  4. If you are absolutely positively sure you will never have a long-running transaction, you can add a idle_in_transaction_session_timeout (integer) parameter to your postgresql.conf . Be very careful here though, if your connection pool doesn't do checking for dropped connections you can get *very* ugly user interaction with this!
That's pretty much it. Uniqueness checking in a time-traveling database like Postgres is pretty hard to do. In MySQL, if you try to write a duplicate into the index, it will collide with the record already there and your transaction will fail at that point, but Postgres doesn't ever actually update records, it just adds a new record with a higher transaction number and lets the record with a lower transaction number eventually get garbage-collected once all transactions referring to it are finished executing. This allows multiple transactions to have multiple views of the record as of the time that each transaction started, thus insuring chronological integrity (as well as adding a potential for deadlock but that's another story), but definitely makes uniqueness harder to accomplish.

So now you know how your transaction can be waiting on a lock when you're writing an entirely different table -- uniqueness checking. How... unique. And now that you know, now you know how to fix it.

-ELG