For months I’ve been hunting a strange problem that I thought was related to Glorp or (less likely) our business code.

Every once in a while, our production server would log strange NOT NULL problems on inserts of records that did not correctly resolve their foreign keys, even though the objects these records refer to can only be persistent objects that had been in the database long before the current Glorp session was started.

Needless to say, I could never reproduce the problem on any of our test or development machines. The business code looked absolutely safe, and there seemed to be no way to produce these objects in the application.

There was, however, one little hint: this problem only occurred on our production server, and it was always triggered by the very same action in the application. We never got these errors anywhere other than in one particular form callback.

Suspect…

We still had Glorp under suspicion when we introduced loads of logging and other output to our production machine, just to understand what was going on. The good news: Glorp would raise an exception, the server would correctly roll back and inform the users of a strange NOT NULL problem on inserts. So no harm was ever done to production data (jeez, am I glad we’re using an SQL DB with ACID compliance!).

The bad news: we still couldn’t see what was going on. Glorp would do two inserts in very quick succession: one where all the id instvars of the objects about to be inserted are nil, and one with these IDs being numbers.

Wait: the same objects would be inserted twice? Once without an id on the Smalltalk side, and then with IDs? The normal behavior would be that there is only the INSERT with ids all being nil, because DB2 would assign the ids during the insert (well, it’s more complicated than that, but this is irrelevant for our story at hand).

So there it was: the clear proof that Glorp, for some reason, once in a few hundred or thousand cases would decide to INSERT objects twice in one transaction and fail miserably.

But why would it do that? If the mappings were wrong, this would happen each and every time the new objects are created (and these objects are created many times a day on our server, because they are accounting entries, and that is the whole point of the system…).

It was Alan Knight again who came up with the correct hint, but I didn’t understand it at first: there might be a race condition hidden somewhere. But how on earth would that ever be possible?

…and how it was completely wrong

To make a long story a bit shorter: one day, when I angrily hammered on the dialog using IE (which is a rare coincidence, because I usually use Mozilla for development and testing), I could reproduce the problem. It took me a while to understand how I did it, but I did manage to make the error pop up right in my VA Smalltalk debugger: ExGlorpWriteFail!

After another 30 minutes of trying to reproduce the error once again, I finally found it, and you won’t believe it.

It’s a double submit problem that has to do with the way IE handles form submits. If processing a submit takes long enough that you can click the submit button or press the Enter key a second time before the response arrives, IE (and Safari, I found) submits the form twice.

So what happens is this: a user submits a form and Glorp starts inserting the new objects. Before Glorp is done inserting them, committing the transaction and updating its session cache, the second submit from the browser arrives in the same session. It registers the same objects for insertion again and tries to add them to the operations of the same transaction. From Glorp’s perspective these objects are still new, but by now they already have IDs, while not all of their references have been updated to reflect the new foreign keys. The second round of INSERTs therefore fails.
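The race described above is a server-side effect of a browser-side double submit. Purely as an illustration, and not something the application in this post or Glorp actually does, a request handler could refuse a second submit for a session while the first one is still in progress. The names `createInFlightGuard`, `handleSubmit` and the session id string are hypothetical, and the sketch is written in JavaScript only to match the plugin code in this post:

```javascript
// Hypothetical helper, not part of Glorp or the application described here.
// It remembers which sessions currently have a submit in progress.
function createInFlightGuard() {
  const busy = new Set();
  return {
    // Returns true if the caller may proceed, false if a submit is already running.
    tryBegin(sessionId) {
      if (busy.has(sessionId)) return false;
      busy.add(sessionId);
      return true;
    },
    // Must be called when the submit has finished (successfully or not).
    end(sessionId) {
      busy.delete(sessionId);
    }
  };
}

const guard = createInFlightGuard();

// Runs `work` for a session unless another submit for it is still in flight.
function handleSubmit(sessionId, work) {
  if (!guard.tryBegin(sessionId)) return "rejected";
  try {
    work();
    return "done";
  } finally {
    guard.end(sessionId);
  }
}

// Simulate a second submit arriving while the first one is still being processed.
const outcomes = [];
outcomes.push(handleSubmit("session-1", () => {
  outcomes.push(handleSubmit("session-1", () => {}));
}));
// A later submit, after the first has finished, is accepted again.
outcomes.push(handleSubmit("session-1", () => {}));
console.log(outcomes); // prints [ 'rejected', 'done', 'done' ]
```

A guard like this would only complement a browser-side fix: it keeps the duplicate request away from the shared session, but the user still needs sensible feedback for the rejected submit.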

The funny thing here is that this does not happen in all browsers, and maybe none of us ever had the idea to press Enter twice in our form, but our customers obviously did.

So I ducked the web for possible solutions to the problem, and here is what we did to solve it: we added a little jQuery plugin that gets bound to the form. It blocks re-submits within one second, which seems to be enough (I found this on Stack Overflow):

$.fn.preventDoubleSubmit = function() {
  // Timestamp of the previous submit and the time elapsed since then (in ms).
  var last_clicked, time_since_clicked;
  $(this).bind("submit", function(event) {
    // On the very first submit last_clicked is undefined, so nothing is blocked.
    if (last_clicked) {
      time_since_clicked = jQuery.now() - last_clicked;
    }
    last_clicked = jQuery.now();
    if (time_since_clicked < 1000) {
      // Blocking form submit because it was too soon after the last submit.
      event.preventDefault();
    }
    return true;
  });
};
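The plugin would be attached to a form with something like `$("#myForm").preventDoubleSubmit();`, where the `#myForm` selector is only an assumed example. To make the timing rule easier to see and to test without a browser, here is a minimal sketch of the same one-second window as a plain function. `createSubmitGuard` and the injectable clock are hypothetical names introduced for illustration, not part of jQuery or of the original snippet:

```javascript
// Hypothetical helper: returns a function that reports whether a submit
// happening "now" should be blocked because it follows the previous one
// too closely. `now` can be replaced by a fake clock in tests.
function createSubmitGuard(windowMs, now = Date.now) {
  let lastSubmit = null; // time of the previous submit attempt, if any
  return function shouldBlock() {
    const current = now();
    const tooSoon = lastSubmit !== null && current - lastSubmit < windowMs;
    // Like the plugin, remember every attempt, including blocked ones.
    lastSubmit = current;
    return tooSoon;
  };
}

// Drive the guard with a simulated clock instead of real time.
let fakeTime = 0;
const shouldBlock = createSubmitGuard(1000, () => fakeTime);

const results = [];
fakeTime = 5000; results.push(shouldBlock()); // first submit
fakeTime = 5200; results.push(shouldBlock()); // 200 ms later: double submit
fakeTime = 7000; results.push(shouldBlock()); // 1800 ms later: a genuine new submit
console.log(results); // prints [ false, true, false ]
```

In a browser, the result of `shouldBlock()` would decide whether the submit handler calls `event.preventDefault()`. Because blocked attempts also reset the timer, hammering the button keeps the form blocked until the user pauses for a full second.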

 

So far, the fix seems to work with current versions of Firefox, Internet Explorer, Safari and Opera on both Mac and Windows.

Isn’t it funny how some problems turn out to be caused by very unexpected things? And how much time and energy it can take to follow completely wrong paths to find the cause of a problem?

 

5 responses to “When hardcore errors with double Glorp INSERTs turn out to be ancient web problems”

  1. Esteban

    Joachim:

    Last week I had an error very much like yours, but instead of a browser performing a double “split-second” submission, it was an Android app I was developing that invoked the submission twice.

    Because on the Android side everything was asynchronous (and parallel) it was hard to spot.
    I lost an entire morning dealing with that, until I spotted my stupid error on the client side.

    1. Joachim

      Esteban,

      good to know it’s not only me. It becomes even harder once you start stumbling from one of these into the next. These are the days when I think I might have chosen the wrong job and dream of being a hotel receptionist or gardener 😉

      1. Esteban

        There is a small comeback to RDBMS, so there are going to be more people sharing experiences. I know about a couple of folks developing their systems with GLORP persistence.

        There’ll be time to do gardening 😉

        ps: I posted a new question to the mailing list, I hope you can help me there.

  2. FDominicus

    Well, in hindsight you can find it funny. But while working on it you could despair. Recently I worked on an improvement which turned out to make things much worse. I wasn’t happy, and believe me, it was no fun at all. Till one day I got it, and guess what: it seems to work. And it’s much less error-prone, and even better, it gives the expected results. Anyway, that took the better part of 3 weeks. And that is so expensive…

    1. Joachim

      Hi Friedrich,

      well, in the end it’s not funny at all. Having an error pop up in your most important dialog from time to time and not having the slightest idea why it happens is not funny if you need to convince people to use your software. So the only funny thing is that, as you say, in hindsight it is funny that I never came to the right conclusions and made such a long journey around an obvious problem that hundreds or thousands of people have encountered and solved before.
      One thing I keep thinking when I lean back after such an experience is that I have learned something and maybe will be able to solve this in no time.

      Joachim
