This is a little lesson I learned with my VAST-based Seaside Application on a headless Linux Server that is probably not really a gem in every Seasider’s toolbox, but the initial problem cost me quite a few nerves.
Let me start by explaining my workflow for a new Image Version:
- Use XD Packaging to build a headless VAST image with Seaside
- Test it locally on my Kubuntu 12.10 32 bit VM
- SFTP it to the test or production machine
- Kill the running image
- Restart the vm and image (I am working on starting it as a daemon, more on that in another post any time soon)
- Make a few short tests if it’s the production server or test as hard as possible if it is the test stage
- Leave the server alone and hope for paying customers (the application is not open to the public yet – although I had hoped to go public months ago)
So let’s pretend we forget step 4 before starting the new image. The old application is listening to a TCP/IP port and the new image tries to grab this port and listen to it from now on.
You’d expect the new image to tell you that it couldn’t setup its machinery (namely SST and a WASstServerAdaptor) by throwing an exception, so that you immediately know it didn’t start correctly.
Well, that’s what’s NOT happening. Old and new image seem to coexist in peace and harmony. In fact, the new image seems to think it has to be respectful and leave the field to the old guy who has so much experience with what it does. At least for now. Or maybe forever. Well, actually forever, because I never told the new image to try again any time later.
I had thought I was clever and had implemented the server startup method like this:
startServerAdaptor [ (WASstServerAdaptor port: self port) start. EsLogManager info: 'WASStServerAdaptor started on port: ' , port asString] on: Error do: [:ex | EsLogManager error: 'Seaside Adaptor couldn''t start due to: ' , ex description. ex pass].
And never worried about it again. Since most of the time I followed my workflow described above, all was fine.
Until one day I forgot step 4 and somewhat forgot about having forgotten it. The only thing I realized was that my application just refused to accept the bug fix I had added. It insisted on the old way of doing something wrong. On my local Kubuntu machine, the bug was fixed, but I double checked the .es files and the Package Contents Browser to see if the new methods really had been packaged.
It also ran on the test server.
Unfortunately, I came to the conclusion that there must be some tiny and surely tricky difference in the data on my test and production server. Something that even my bug fix didn’t handle. I really had to take a deep look at my test data and try to isolate the bug in this very special situation where the bug fix still wouldn’t work.
I won’t go into more detail about the things I did then. I’d turn red and to be honest, I don’t want to remember the details any more….
At one point in time (I guess it was in the bathroom or when I tried to fall asleep later that day or when I tried to fight the bug with eben more sugar) I decided to look at the problem from another perspective: maybe I was talking to the wrong image, even though my server startup code was so clever…
netstat -lnptu told me there was only one process listening to the port of my application. And that process was called esnx. All was good. Except for the fact that there were multiple processes running (ps aux | grep esnx). I knew that each image startup starts two or sometimes three es(nx) processes, but not more than 4. I was finally close to the solution: I had been talking to the old image all the time!
So what was happening:
The new image started, set up a WASstServerAdaptor that didn’t actually run, connected to the database, logged that all is good and it’s ready to answer requests. Since both old and new image logged to the same file (what a stupid mistake, btw.) everything looked good in the logs, the requests that were answered got logged as well, so from the log you could’t tell what’s happening. But it actually never really answered to http requests, because the already running image handled all of these. What a mess!!!
And all of that because the WASstServerAdaptor doesn’t throw an Error if it can’t start!
But there is a way to ask an adaptor if it is running and if you start two (or more) on the same port, only one of them will be running.
So here is my new startup code that handles the situation a bit better and will hopefully keep me from more new stupid errors (although I think now that I’ve learned this, I will not forget step 4 easily any more), because it will exit the image right after startup and write something to the log. I’ll probably also add Transcript output to this so that I can immediately see that the new image isn’t really active.
startServerAdaptor "self startServerAdaptor" |adaptor| [ (adaptor := WASstServerAdaptor port: self port) start. [adaptor isStarting] whileTrue: [ EsLogManager info: 'Waiting for the Adapter to finish starting'. (Delay forSeconds: 0.5) wait. ]. adaptor isRunning ifFalse: [Error signal: ('Adaptor is not Running - maybe port %1 is already in use?' bindWith: self port asString)]. EsLogManager info: 'WASStServerAdaptor started on port: ' , port asString] on: Error do: [:ex | EsLogManager error: 'Seaside Adaptor couldn''t start due to: ' , ex description. ex pass]
I am not really happy with this piece of code, because I think #start whould tell me if it couldn’t do its job (namely START and listen for requests), but I can run this twice now and get an error. And the server image quits immediately if it cannot handle requests, so I know something must be looked after. Here is a discussion thread I started on the VAST support group. I am eager to see if others agree they’d rather get an exception from #start or if there is anything I don’t know about how SST or TCP/IP in general works and that requires #start to not tell me about its inability to “start” in the way I mean “start”.