The WASstServerAdaptor story and its (preliminary) end

Well, a developer’s day is full of victories and waterloos. So I’ve had my developer’s day today.

After I had discussed my Server startup problems related to ports already in use, Marten came up with an explanation for VAST’s behavior on the VAST Support Group:

The reason for this behaviour is located in the method: SstTcpTransport>>#basicOpenListeningSocket and it seems to be correctly handled by VASmalltalk.

In this method a socket option (“SO_REUSEADDR”) is set to true (if the configuration of the tcp network allows it, which is normally the case for VASmalltalk).

Hmm. Even if it’s not incorrect, I still don’t like it.
So my first waterloo was that Marten proved me wrong: this is not a bug or wrong behavior – from a very technical standpoint. I still think the server pretends to be ready when it is not, and I don’t like it if my software tells me lies right into my face, especially if that even costs me lots of time…

The thread at the discussion group then continues with suggestions like changing the setting Marten mentioned in some very low level method for binding to a Socket. That is of course very deep down and you never know what you break if you open ALL TCP/IP Sockets as exclusive listeners.

So I decided to stick with my workaround that I described in my initial post on the subject (the reworked one, of course). It ran great – on Windows.

So here was my second waterloo. When I had XD packaged this and tried to start the same image twice on the Linux test machine, the Adaptor never left the #isStarting phase. The log contained lots of entries telling me we’re waiting for the adaptor to finish starting:

'2013-04-24 17:32:48,917: [INFO] Waiting for the Adapter to finish starting'
'2013-04-24 17:32:49,519: [INFO] Waiting for the Adapter to finish starting'
'2013-04-24 17:32:50,121: [INFO] Waiting for the Adapter to finish starting'
'2013-04-24 17:32:50,723: [INFO] Waiting for the Adapter to finish starting'
'2013-04-24 17:32:51,325: [INFO] Waiting for the Adapter to finish starting'

... and so on. So the Socket behavior differs between Windows and Linux. On Windows, the retry loop never ran: the Adaptor left the #isStarting phase at once and simply was not running.

So I decided to extend my ugly workaround a little more with a retry count. I hate the code as it looks now, because it reminds me of really dark days in my programmer career. I’ll show it to you nevertheless, because it may be useful for others as well (I know, it’s not rocket science, but copying it from here is faster than thinking about it – you’re welcome!):

startServerAdaptor

	"self startServerAdaptor"

	|adaptor maxRetries retries|

	maxRetries := 5.
	retries := 1.

	[
		(adaptor := WASstServerAdaptor port: self port) start.
		"Poll while the adaptor is still starting, but give up after maxRetries attempts"
		[adaptor isStarting and: [retries <= maxRetries]] whileTrue: [
			EsLogManager info: ('Waiting for the Adapter to finish starting, retry count = %1' bindWith: retries asString).
			(Delay forSeconds: 0.5) wait.
			retries := retries + 1].
		"Not running by now (failed or still starting): the port is most likely taken"
		adaptor isRunning ifFalse: [Error signal: ('Adaptor is not Running - maybe port %1 is already in use?' bindWith: self port asString)].
		EsLogManager info: 'WASstServerAdaptor started on port: ' , self port asString]
			on: Error
			do: [:ex |
				EsLogManager
					error: 'Seaside Adaptor couldn''t start due to: ' , ex description.
				ex pass]

This is not a victory to be proud of, but one that will help me waste far less time looking for phantom bugs. Here’s what happens in the Server ssh session when I start the image a second time:

UIProcess reportError: Adaptor is not Running - maybe port xxxxx is already in use?
 Dumping walkback to file: walkback.log

And the server process exits immediately. When I look into the application log I can see the server didn’t start because the port was in use. Great! Why not be happy about a small victory that may lay the ground for faster progress. It’s not much ado about nothing, and I’m pretty sure it pays back one day.

Issuing a REORG TABLE command to DB2 from VA Smalltalk (and Glorp)

You may have realized already that I misuse my blog and therewith you, my valued reader, as a swap space for small and maybe not so small little tricks I find in my day job from time to time.

And here is one little thing I just learned about how to invoke DB2 commands that are not SQL statements from VA Smalltalk. In this specific case I mean REORG TABLE, but there are many other commands for which this may be useful.

Let me give you a little bit of context on why on earth I’d need that. Real men and even more so real DBAs would simply fire up their DB2 command prompt and solve the problem at hand like a man. A simple table reorg would never stop a real man from saving the world in a day…

But here’s my problem. I am working on a Seaside Application in VA Smalltalk that is to be deployed to a Linux server. This application uses GLORP for persistence and I added some code that changes the database tables on server startup whenever I deploy a new image version. So if I add a new attribute to some persistent class, I have to add a new column to the underlying table(s). Some changes to the object model or some optimizations also need changes to foreign keys, indexes or even primary keys. These changes often cause DB2 to stop doing anything before I Reorg the modified table (Hint to IBM: Maybe that could be automated. The error message already tells me I need to Reorg, so why doesn’t it just do it for me???). The Error after such a change looks like this:

[SQLSTATE=57016 - [IBM][CLI Driver][DB2/LINUXX8664] SQL0668N  Operation not allowed for reason code "7" on 
table "MYTABLE".  SQLSTATE=57016
 [Native Error=-668]]

Yes, you’re guessing right: this error has made my life harder than I wanted it to be more than once in the past.

So one step in these schema migrations often is to change tables and then move data around, add foreign keys and stuff. In theory, that’s not too hard (I’ve learned a whole lot about this stuff from one of my friends and customers, hi Peter!). Unless DB2 gets in my way and tells me that, now that I’ve changed the primary key, I need to reorg the table first before I can change data.

Unfortunately, REORG TABLE is not a normal SQL statement. It is not intended to be used by normal SQL users and therefore cannot be issued just like a normal SQL statement. Here’s what you get from DB2 if you inspect myGlorpSession accessor executeSQLString: 'REORG TABLE schema.tablename':

AbtError:  rc=-1 for '42601' in an AbtIbmCliCSDatabaseConnection at (24.04.2013 15:50:41)  '[SQLSTATE=42601 - [IBM][CLI Driver][DB2/LINUXX8664] SQL0104N  An unexpected token "TABLE" was found following "REORG ".  Expected tokens may include:  "JOIN <joined_table>".  SQLSTATE=42601
 [Native Error=-104]]

So this meant whenever I wanted to change a table’s primary key or add an index, I had to start the server once, stop it, use db2 to REORG tables by hand and start the server again. The steps of modifying the indexes and the following steps had to be separate migrations, each of them in their own transaction. Quite annoying.

But I found a solution to this problem. Real men, of course, knew it already. You can use a normal CLI client and issue admin commands by wrapping them into this:

myGlorpSession accessor executeSQLString:  'CALL ADMIN_CMD (''REORG TABLE schema.tablename'')' .

With this I can reorg a table first and then modify data in one single step during my server startup.
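If several tables need a reorg during one migration, it may help to build these CALL strings in one place, mainly so that single quotes get doubled consistently. Here is a minimal sketch, written in JavaScript purely for illustration; adminCmd and reorgTable are hypothetical helper names, and the resulting string is what would be handed to executeSQLString: as shown above.

```javascript
// Wraps a DB2 administrative command in a CALL ADMIN_CMD statement.
// Single quotes inside the command are doubled, as SQL string literals require.
function adminCmd(command) {
  const escaped = command.replace(/'/g, "''");
  return "CALL ADMIN_CMD ('" + escaped + "')";
}

// Hypothetical convenience wrapper for the REORG case from this post.
function reorgTable(schema, table) {
  return adminCmd("REORG TABLE " + schema + "." + table);
}

console.log(reorgTable("schema", "tablename"));
// CALL ADMIN_CMD ('REORG TABLE schema.tablename')
```

The same few lines translate directly to a Smalltalk method that concatenates the strings before calling executeSQLString:.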

Doesn’t sound like a big deal? You’re wrong. This was a real hurdle to simple deployment. Just imagine you have to redo the same stuff on a development machine, a test server and a production server over and over again. And you must remember when to do what in the right order. Especially on a production server which shouldn’t be offline for too long, this is very important.

Before you ask: I am aware of the AUTO_REORG parameter, but as far as I know, table reorganization can only run in an offline window. That’s not exactly what I am looking for. I need to do it in the middle of a migration script, not sometime later tonight …

More on WASstServerAdaptor and used ports

There are probably a few  questions that you may ask about this problem I just posted about. And I try to answer them here so you may understand better why I ask for an exception (or at least a Warning):

Why don’t you simply make sure the image is only running once on the server?

Because that will hopefully not be the case for long. I hope for hundreds, better thousands of customers for my service, and scaling requires running multiple images behind a load balancer. Restricting myself to just one esnx process or only one image of a certain name seems like a bad idea here. Maybe the same server will one day have to run another Seaside Server Application in addition to this one, and then the whole solution falls like a house of cards. BTW: You’ve heard about the possible ill effects of the Singleton pattern, haven’t you? This is one example that supports the theorem.

Why don’t you simply see if the port is taken before you start the Server Adaptor?

I don’t want to. It’s not smalltalkish to check beforehand. We have great exception handling capabilities in Smalltalk and this is a great use case for them. Look at the code before and after the change in my last post. The first version is clean and easy to understand and completely sufficient to handle the situation the Smalltalk way. The final version is littered with stupid error condition checks and even polling. The only thing worse I can think of would be some is...Error checks. Ah, no, there is an even worse one: an integer return value that needs to be interpreted. That was old school back in 1983. C’mon!

You don’t like this answer? I’ll give you another one. I don’t know how to ask the OS whether a port is in use or not. Especially not if I have to keep my code portable between Windows and Linux. I am using a high-level language that aims to keep me abstracted from this OS level stuff whenever it can, but still allows me to take the stairs down to the basement and dig in the dirt there.

Why don’t you just check for instances of the Adaptor and stop the running one?

At least you’re still reading. Thanks for that, you’re a true friend.
As I’ve mentioned before, this already running instance is in another image, meaning in another OS Process and another address space. I cannot access it without using some sophisticated Inter Process Communication feature. Just for checking if I can use a port, that is a bit much. At least in my opinion.

There’s so much you can do with PID files and such on Linux. Why don’t you automate your deployment and add some fancy PID / Port lock files and stuff so that step 4 is never forgotten any more?

First, this is a great idea. And I am working on getting better in that area. Automation is a hobby of mine and I’ve helped quite a few customers automate their Packaging and Deployment in VAST projects.
Still, this would be an incomplete solution. What if I decide to use another copy of my image (e.g. when I need to scale) and to use a port that is used by one of the daemons on the machine (I know, you can always check open ports, there are tons of lists on the web where you can find out what ports are usually used by the most-popular 1000000 applications)? What if that other application simply doesn’t use my mechanism, even though I’ve spent years making it perfect?

Starting WASstServerAdaptor and handling used ports

[Update: Please be sure to read this post as well if you're interested in more details on this subject and this one if you want to see what my solution finally looks like]

This is a little lesson I learned with my VAST-based Seaside Application on a headless Linux Server that is probably not really a gem in every Seasider’s toolbox, but the initial problem cost me quite a few nerves.

Let me start by explaining my workflow for a new Image Version:

Getting jQuery ajax and callbacks to Seaside into the right order

I guess having learned something new is what makes a day a good one. So I’ve had a good day today.

In my Seaside Application, I am trying to add a few fancy gadgets that use Ajax and do something before and after the actual Ajax call is performed. Especially when you need to do something only after the call has finished, you always have to keep in mind that the Ajax call is asynchronous. That means that if you do an $.ajax, the following JavaScript statements will run as soon as the call is issued (in other words: immediately), not after the call has finished.

I thought I knew all about this.

Until today.

Because I had something that looked like this:

$.ajax({
  "url": options.url,
  "data": [
    options.queryFields,
    options.serializeCallback+"="+self.val(),
  ].join("&"),
  "complete": drawTarget(),
});

Don’t worry about the data parameter, that’s not important for the moment. Just imagine I send out a Seaside Callback’s identification as part of the options to my plugin and that is what gets handed back to the server in this call.

The point of my post is about the complete callback.

I saw a strange effect: the callbacks (the one from this ajax call and one in the drawTarget() function) arrived in my Seaside image in the wrong order, but only after the ajax call was done. So the complete callback worked to some extent, but not quite right.
A look at the documentation showed that the complete, success and error callbacks on the jqXHR object have long been deprecated:

Deprecation Notice: The jqXHR.success(), jqXHR.error(), and jqXHR.complete() callbacks are deprecated as of jQuery 1.8. To prepare your code for their eventual removal, use jqXHR.done(), jqXHR.fail(), and jqXHR.always() instead.

So after I changed my ajax code to

$.ajax({
  "url": options.url,
  "data": [
    options.queryFields,
    options.serializeCallback+"="+self.val(),
  ].join("&"),
}).always(drawTarget());

the callbacks arrive in the correct order and my application works as it should. This is just another proof of the “once you do it right, it simply works” theorem, I guess ;-)
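A side note that may be worth checking in snippets like the two above: writing drawTarget() with parentheses calls the function right away, while the request is still being set up, and hands its return value to complete or .always(). That only does what it looks like if drawTarget returns the actual handler function. The sketch below shows the difference without jQuery; fakeAjax, deliverResponse and the log array are made-up stand-ins and not part of jQuery or Seaside.

```javascript
// Hypothetical stand-in for $.ajax: it queues the request and returns at once,
// just like a real asynchronous call would.
const log = [];
const pending = [];

function fakeAjax(settings) {
  log.push("request sent");
  pending.push(settings.complete);
}

// Simulates the moment the server's response arrives.
function deliverResponse() {
  const handler = pending.shift();
  log.push("response received");
  if (typeof handler === "function") {
    handler();
  }
}

function drawTarget() {
  log.push("drawTarget ran");
}

// Variant 1: drawTarget() is CALLED here, before fakeAjax even starts.
// Its return value (undefined) is what ends up in "complete".
fakeAjax({ complete: drawTarget() });
deliverResponse();

log.push("---");

// Variant 2: the function itself is passed and runs after the response.
fakeAjax({ complete: drawTarget });
deliverResponse();

console.log(log.join("\n"));
```

In the first variant "drawTarget ran" shows up before "request sent"; in the second it shows up after "response received". Whether this applies to a given plugin depends on what its drawTarget actually returns.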

And, it also is another hint that reading documentation can sometimes be a good idea.

How to waste less time debugging a Seaside/Glorp application

Yesterday I packaged my Seaside Application for the first time on VA Smalltalk 8.5.2 and deployed it to a staging server.

And promptly, as expected, I got some errors. The first few were easy to find, one of them being a missing rule in AbtXDSingleImagePackagingRule (or some superclass) to include the new EsTimeZone code. That could be fixed by hand.

But this morning I spent quite some time searching for a problem in the walkback.log that didn’t exist. This post is mostly intended for myself to remember next time, but it might also save you some time. The second purpose of this post (or, to be exact, the next one) is to underline why I think the VAST port of Glorp has a lousy adaptation of error handling.

But let’s start at the beginning: My image crashed as soon as a web user logged on to the web application. The image exited with Error code 60 and wrote a walkback log. So far, so good. The first thing this tells me is that my error handling code is not perfect yet, because I should have seen an error page instead of an HTTP-503.

So I started reading the walkback. My usual way of doing so is to start with the beginning of the walkback:

Walkback at 11:25:20 on 12.12.2012
Database error: 
[] in AbtHeadlessRuntimeStartUp class>>#outputWalkback:process:  
    receiver = AbtHeadlessRuntimeStartUp  
    arg1 = 'Database error: '  
    arg2 = Process:Dispatch worker: 8{running,3}  
    temp1 = 'walkback.log'  
    temp2 = a CfsWriteFileStream
BlockContextTemplate(Block)>>#valueWithErrorHandler:oldHandler:onReturnDo:  
    receiver = [] in AbtHeadlessRuntimeStartUp class>>#outputWalkback:process:
   ... and so on

So one thing was for sure: this is another one of those useless error messages that come from Glorp. Unfortunately, I decided to skip step two of my usual …

Web sockets are coming to Smalltalk

The web is more and more becoming the fat client of tomorrow. Not that it offers much more than what we had fifteen years ago, but at least it is finally promising to become the one-platform-for-everything that Java never proved to be.

Nevertheless, web sockets promise to offer a new level of interactivity between browser-side applications and some server backend, because the technology offers a steady, low-latency communication channel between server and frontend. Think Ajax as it should always have been: whenever the server needs to share information with a client, it simply sends it there, without the need for the client to poll or anything.

So web sockets are a really great opportunity for server-side frameworks like Seaside or AIDA. If you, like me, thought “Wow, if only I could use web sockets in my web application!”, I have good news for you: We’ve seen the first announcements for web socket implementations in Smalltalk:

So once again the Smalltalk world is in the middle of the hottest and most exciting web technologies and people can implement cool, advanced, highly interactive web applications in Smalltalk, using the most advanced interactive technology to develop and debug code on the server side.

I guess it won’t be long until someone comes up with an implementation of web sockets for GemStone/S or VA Smalltalk…

Seaside 3.0.7 (partially) and jQueryMobile 1.1.1 ported to VA Smalltalk

Marten’s been busy over the last few days and just released a new version of a partial Seaside 3.0.7 port (Instantiations ships VA 8.5.2 with Seaside 3.0.6), which he needed for the latest bells and whistles of Nick Ager’s Seaside integration for jQueryMobile V 1.1.1, which he also ported and released on VASTGoodies.

From the package comments of Marten’s Seaside version:

V 8.5.1 - with FileLibrary Addition from Seaside 3.0.7
  -> Caution: Not an official version. Use on your own risk.
  -> version needed for JQM 1.1.1 development  (Marten Feldtmann)

So it is not actually a full port of Seaside to VA ST 8.5.2, but the File Library additions that are important for jQM (I guess that means virtual file libraries).

Glorp and #commitUnitOfWorkAndContinue

[Please note: This post has been updated on Sept. 13th, 2012. The solution that I suggested in the original version of this post completely messes up subsequent commits for objects that got read from the DB before this commit and are changed later. Subsequent commits simply ignore changes to such objects. You can visit the discussion thread on Google Groups to see why and what all of this evolves to. The problem, however, is not trivial in situations where objects get deleted and should cascade their deletion to dependent objects. I do suggest a change to Glorp in that thread and am currently waiting for Alan to comment on it.]

I just spent a bit over a day hunting some strange problem in a Seaside-based Web Application that uses VA Smalltalk and Glorp. The problem was that it was almost impossible to do a clean delete of an object that resides in two OneToManyMappings.
First, let me mention that our application was prototyped on top of an object database and it ran just fine, so it was not an application issue.

The object has some removal code that removes the object from the collections in the objects on the Many-side of the relationship. We considered this business code, and therefore just left it in place and added a

self session delete: theObject.

to the deletion method.

We thought all was running well for quite a while, until we realized we have lots of undeleted objects in the DB tables which have a foreign key of NULL. So the application was not really affected by the problem, but still we’d like to not collect too much rubbish in our DB. Once we’re live, we want to keep things small and manageable, and data garbage is not a good thing to start with.

So what exactly happened? We had no idea until we tried deleting objects and fired off our debuggers. At first, all seemed fine: a Transaction would be started, the objects deleted and then the transaction would be committed. But still, after a while, we’d find new corpses in the table (in fact we had more tables that were affected).

So what the heck was going on? By accident I saw that the very next transaction after the delete would re-INSERT the objects with foreign keys set to NULL. The following transaction didn’t really have to deal with the kind of object we had deleted, it could be completely unrelated, just run in the same GlorpSession. Uhh!

Then followed a day of playing around, debugging and wondering.
We tried leaving our deletion-business code out of the game, …

Chrome, HTML5 and jQuery – welcome to the 21st century

HTML5 makes the world much better. At least in general.

One of the nice features of HTML5 is the possibility to give input fields a type-attribute, which helps browsers determine what kind of input the field is supposed to accept. The Browser can then use this info to both validate input and suppress form submission if the input doesn’t match, but also to offer some additional help for input of valid data. This is especially nice for mobile devices where the device can switch to an appropriate keyboard or show some fancy date input widgets.

But in some cases, this does not fit well with what jQuery (or, I am sure, any other JavaScript Library) provides to enable such things for older as well as current browsers.

The latest nicety I came across was the combination of jQuery UI’s datepicker and Chrome’s datepicker when a text input is defined as type="date".

This exact combination causes several problems:

  1. You get two datepickers: one from Chrome and one from jQuery. Chrome renders its activation button for the datepicker into the input field, and jQuery renders an additional button right after the input field. So at least the user can choose which one they’d like to use. Of course they’ll be irritated.

    Two datepickers are one too many

    Chrome and jQuery both want to help you enter dates, and the user can even choose who they want to be supported by. In German there is a saying: Too many cooks will spoil the mash…

  2. For some reason, Chrome renders a placeholder text into the text input field, no matter whether I sent a valid String as value from the server side (a Smalltalk/Seaside Application in my case). Only once you select a date in one of the two datepickers do you see an actual date in the input field.
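
One way out that might be worth trying is to attach the jQuery UI datepicker only where the browser does not handle type="date" itself. The following is a rough sketch of the usual feature test, not something taken from the application described here: a browser that does not know the date type falls back to reporting the input's type as "text". supportsDateInput is a made-up name, and the document is passed in as a parameter so the function can also be tried outside a browser.

```javascript
// Returns true when the browser implements <input type="date"> natively.
// Browsers that do not know the type report "text" when asked for input.type.
function supportsDateInput(doc) {
  const input = doc.createElement("input");
  input.setAttribute("type", "date");
  return input.type === "date";
}

// In a page that loads jQuery and jQuery UI, this could be used like so:
//
//   if (!supportsDateInput(document)) {
//     $('input[type="date"]').datepicker();
//   }
```

As far as the HTML specification goes, a date input only accepts values in yyyy-mm-dd form, so a differently formatted string sent from the server is treated as empty. That may be what is behind problem 2, but it is a guess.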

It is things like this that make it a hard requirement to test your web application in all browsers that you want to support. And by “all” I really mean all-damnit-%&$%&$ing-versions-of-all-kinds-of-browsers-you-want-your-application-to-look-and-feel-good-on.

Forget about jQuery and cross-browser and one-javascript-to-tame-them-all. It’s all bloody stupid marketing-speak.

So what do we learn from that?

  • Be prepared to find tricky little problems with Ajax / jQuery on some version of some browser that don’t show up anywhere else. You may have tested this stinkin page on IE, FF and Chrome, but be sure that the one Browser you forgot to also test it on can break the whole thing. Sometimes, it is something you may have seen, but never paid attention to, but it will come back as a trouble ticket.
  • Be prepared to accept a looong test period in which you constantly hunt for impossible bugs that can only be reproduced in Version 19.5.105 of Browser X.
  • Be prepared to either make compromises or spend double the estimated time on seemingly irrelevant problems that do not really affect your functionality, but look ugly or are possibly irritating for your users.
  • Always resist equating JavaScript with portability or cross-platformness. You’ll be punished for it, believe me.
  • Forget about the 21st century. We’re back in the mid-80s.

(Sorry for this post, but I am a bit frustrated today)