Glorp and #commitUnitOfWorkAndContinue

[Please note: This post was updated on Sept. 13th, 2012. The solution I suggested in the original version of this post completely breaks subsequent commits for objects that were read from the DB before this commit and are changed later: consecutive commits simply ignore changes to such objects. You can visit the discussion thread on Google Groups to see why and where all of this is heading. The problem, however, is not trivial in situations where objects get deleted and should cascade their deletion to dependent objects. I suggest a change to Glorp in that thread and am currently waiting for Alan to comment on it.]

I just spent a bit over a day hunting a strange problem in a Seaside-based web application that uses VA Smalltalk and Glorp. The problem was that it was almost impossible to cleanly delete an object that resides in two OneToManyMappings.
First, let me mention that our application was prototyped on top of an object database and it ran just fine, so it was not an application issue.

The object has some removal code that removes the object from the collections in the objects on the Many-side of the relationship. We considered this business code, and therefore just left it in place and added a

self session delete: theObject.

to the deletion method.

We thought all was running well for quite a while, until we realized we had lots of undeleted objects in the DB tables with a foreign key of NULL. So the application was not really affected by the problem, but still we’d like to not collect too much rubbish in our DB. Once we’re live, we want to keep things small and manageable, and data garbage is not a good thing to start with.

So what exactly happened? We had no idea until we tried deleting objects and fired up our debuggers. At first, all seemed fine: a transaction would be started, the objects deleted and then the transaction would be committed. But still, after a while, we’d find new corpses in the table (in fact, more than one table was affected).

So what the heck was going on? By accident I saw that the very next transaction after the delete would re-INSERT the objects with foreign keys set to NULL. That following transaction didn’t have to touch the kind of object we had deleted at all; it could be completely unrelated, as long as it ran in the same GlorpSession. Uhh!

Then followed a day of playing around, debugging and wondering.
We tried leaving our deletion business code out of the game, thinking it might somehow lead to cache entries that need updating with SET NULL for the foreign keys (at some stage during our search we saw this exact behaviour), but this did not give us what we were looking for. The best we could achieve was that the objects got deleted in the DB but were still present in the Smalltalk collections of our business objects. This leads to interesting effects in later updates, of course 😉

Then we tried refresh:ing all objects that hold the deletion candidates in collections. To no avail.

So it was time to debug Glorp a bit harder. Not a fun job, but a good exercise in a lot of ways.

What we found was that the deletion code removed the objects from the session subcaches, but after the transaction was committed and all seemed fine, the objects were back in the caches. Hughh???

It took another few hours to understand the problem a bit better. The reason it took so long is that debugging some Glorp calls freezes the image. You can break out of that by clicking the little break button, but you’d have to start over and set your breakpoint after the statement that was hanging.
We now think we know the cause of the problem: it seems to be GlorpSession>>#commitUnitOfWorkAndContinue.

commitUnitOfWorkAndContinue
   "Commit the current unit of work, but then keep going with the same set of registered objects, with their state updated to reflect current values."
   | registeredObjects |
   currentUnitOfWork isNil ifTrue: [^self error: 'Not in unit of work'].
   registeredObjects := currentUnitOfWork registeredObjects.
   currentUnitOfWork deletedObjects do: [:each | registeredObjects remove: each].
   self commitUnitOfWork.
   self beginUnitOfWork.
   self registerAll: registeredObjects.

This method tries to speed up committing and starting a new transaction, but it fails to clean up the objects that it re-registers. The deleted objects themselves get removed from the list, but the collections holding them do not. So the call to registerAll: at the end of the method re-registers the collections that reference the deleted objects and, as a logical consequence, also re-registers the deleted objects as new. After a commitUnitOfWorkAndContinue, the deleted objects are therefore back in the caches and want to be inserted the next time a transaction is committed.
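
To make the mechanism a bit more tangible, here is a small workspace sketch that does not use Glorp at all. It only mimics the two steps with plain collections: removing the deleted objects from the list, then re-registering whatever is left. All variable names are made up for illustration, and walking the elements of an OrderedCollection is just my simplified stand-in for the way registration reaches the objects a registered object refers to; the real Glorp code is more involved.

```smalltalk
| child parent registered deleted reregistered |
child := Object new.                      "stands in for the deleted business object"
parent := OrderedCollection with: child.  "a collection that still references it"

registered := IdentitySet new.
registered add: parent; add: child.
deleted := IdentitySet with: child.

"Step 1, as in the method above: drop the deleted objects from the list."
deleted do: [:each | registered remove: each].

"Step 2: re-register what is left. Following the elements of a
 collection is an assumption of this sketch, not Glorp's actual code."
reregistered := IdentitySet new.
registered do: [:each |
    reregistered add: each.
    (each isKindOf: OrderedCollection)
        ifTrue: [each do: [:element | reregistered add: element]]].

"Inspect this: the deleted object has come back through its parent collection."
reregistered includes: child
```

Evaluating the last expression answers true, which is the toy version of what we saw in the session caches: the object was removed from the list directly, but came back in through a collection that still pointed at it.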

Maybe I should mention that we had this issue with the version of Glorp that ships with VA Smalltalk 8.5.0, and I haven’t taken the time to check whether newer versions of Glorp have the same issue. The implementation of #commitUnitOfWorkAndContinue, however, is still the same in VisualWorks 7.9, so I guess the problem persists.