Thursday, February 20, 2014

Hibernate gotchas: ordering of operations

Grails / GORM was throwing a Hibernate error from time to time:

org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1

What was confusing was that there was no update anywhere in the code in question, which was a queue runner. The answer to what was causing this was interesting, and says a lot about Hibernate and its (in)capabilities.

The error in question was being thrown by transaction flush in a queue runner. The queue is in a Postgres database. Each site gets locked in the Postgres database, then its queue run with each queued-up item in the Postgres database deleted after it is processed, then the site gets unlocked.

The first problem arose with the lock/unlock code. There was a race condition, clearly, when two EC2 instances tried to lock the same site at the same time. The way it was originally implemented was with Hibernate, the first would create its lock record, then flush the transaction, then re-query to see whether there was other lockers with a lower ID holding a lock on the object. If so, it'd release the lock by deleting its lock record. Meanwhile the other instance finished processing that queue and released all locks on that site. So the first instance would go to delete its lock, find that it'd already been deleted, and throw that exception.

Once that was resolved, the queue runner itself started throwing the exception occasionally when the transaction was flushed after the unlock. What was discovered by turning on Hibernate debug was that Hibernate was re-ordering operations so that the unlock got applied to the database *before* the deletes got applied to the database. So the site would get unlocked, another queue runner would then re-lock the site to itself and start processing the same records that previously got processed, then go to delete the records, and find that the records had already been deleted out from under it. Bam.

The solution, in this case, was to rewrite the code to use the Groovy SQL API rather than use GORM/Hibernate.

What this does emphasize is that you cannot rely on operations being executed by Hibernate in the same order in which you specified them. For the most part this isn't a big deal, because everything you're operating on is going to be applied in a reasonable order so that the dependencies get satisfied. E.g. if you create a site then a host inside a site, the site record will get created in the database before the host record gets created in the database. But if ordering matters... time to use a different tool. Hibernate isn't it.

-ELG