VMTRAP with VA Smalltalk due to strange bytecode differences (or whatever)

Today I was on the hunt. And I finally shot the beast that killed hundreds of good men over the last decades or so.

Well, at least it seems it’s dead, and I don’t really know why. I know it’s dead and I was hunting for it, but I am not sure I pulled the trigger.

But let’s start at the beginning. Yesterday I had a shiny new version of my Seaside Application and I had cross packaged it for Linux and tested it on two different Linux machines. And it crashed. It crashed without much comment, the Server just didn’t respond any more, the process was dead and all it left for me was a vmtrap.log and an vmtrap.img file in the image directory. Funnily, it crashed very reliably, within only a few Web page requests, but it died in many different places. The vmtrap.log almost always indicated the image had gotten to some method that was not the same of any of the previous crashes.

The funny thing was that the packaged image of the day before ran nicely, without any crashes. It ran using the very same VM and all external libs and stuff. The only difference was that other image. The code changes were only in application code, no external calls, strange extensions to UndefinedObject or any evil class initialization stuff. Just a few bug fixes and stuff.

So to sum the situatuon up: the dev image on Windows ran, a dev image on Linux ran as well, the packaged image of the previous day ran – even on the same machines using the same path and vm and libs and ini and whatever. The VM crashed in various places, and I could not find any pattern why it may have crashed.

I even created two completely fresh XD images and repackaged and went through the .es files and whatnot. I also repackaged the image of the previous day and all was the same again: old one was good, new one was broken.

So I was somewhat out of luck. I asked the wizard of G and found a very old post that seemed to talk about the very same problem. It came up with the (crude?) theory that someone must have screwed bytecodes in the library (Maybe I should mention I use a Linux XD image for packaging while I develop on Windows, so it’s not related to any image problems. I throw images away very often to avoid such problems right from the start) and recompiling methods helped their problem.

So I thought there’s not much to loose. I fired up a Config Map Browser, browsed the Changes between previous day and current (incl. required maps) and simply changed all changed methods by entering a blank or newline somewhere in the Methods. Luckily, I hadn’t changed that much code yesterday, maybe 50 methods. Have I ever mentioned I really love envy?

And what can I say? I fired up the Linux VM, cross-packaged the new (unchanged, but recompiled) code and the problem seems to be solved. The packaged image is running and seems to be stable.

So what could have gone wrong? I am packaging once or twice a week and haven’t seen this ever before.

Ah, and before you ask: None of the methods I had changed were ever mentioned in the vmtrap.log files…

Very strange…

But here’s my theory (please take it with a grain of salt):

I remember two things that happened to me yesterday. One was that I added an instance variable with accessors (using RB) named ‘class’ which wasn’t a good idea, so I later renamed the variable (but not the Accessors) to ‘cssClass’. I know I later couldn’t version/release that class because it was “inconsistent withe the edition in the Library”, an issue that is easy to solve most of the times.

I remember I once had lots of variables and accessors to rename in a very poor code base and I’ve had similar problems with these accessor methods that obviously didn’t work right. The source code said it would be doing one thing, but the results of executing them was different… Back then, changing the accessor methods by hand and saving /compiling new editions of them solved the issue.

I am not sure this theory is any good and am not sure how I could ever build a test for this stuff or write a somewhat useful support case. I only hope I remember this post  if I ever encounter such a strange problem again.

So what do we learn from this?

  • Version your code frequently to have milestones to check for changes!
  • Envy is one of your best friends when it comes to finding differences (handling them is another story and envy is not yet good at that)
  • Package early. package and test the packaged code often so that you know when things go wrong (and so that you don’t have too many of these surprises the night before shipping date)
  • Sometimes the fix of a bug doesn’t feel right. How can I know the next build tomorrow isn’t screwed again?

GemTalk Systems ist das neue Zuhause von GemStone/S und GLASS – ein Deutungsversuch

Es hat sich sicher schon herumgesprochen, dass die Entwicklungs- und Supportmannschaft von GemStone/S und GLASS sowie das darauf basierende MagLev für Ruby nun in einer neuen Firma zuhause sind. Das neue Unternehmen heisst GemTalk Systems und besteht aus der kompletten Mannschaft, die bisher an GemStone gearbeitet hat. Für das Produkt sind das sehr gute Nachrichten. GemStone/S war bei vmware irgendwie ein Stiefkind. Man musste auf der vmware-Website lange suchen, bis man GemStone/S fand. Selbst auf der zentralen Seite aller Produkte waren diverse Produkte aufgelistet, die ursprünglich von GemStone stammten (z.B. GemFire, SQLFire), nicht jedoch das eigentliche Flaggschiff.

Aus den Pressemitteilungen und Blog-Einträgen ist nicht zu entnehmen, wer die Investoren sind, die GemStone von vmware übernommen und GemTalk daraus gemacht haben. Auch auf der Website sind keine Informationen dazu zu finden. Dafür aber, dass vmware sich auch von den anderen Produkten wieder getrennt hat, wegen derer sie GemStone vor drei Jahren gekauft hatte. Diese sind ein paar Wochen zuvor in die Firma Pivotal ausgelagert worden und werden dort weitergeführt. Was das ganze für vmware für einen Sinn hatte, erschliesst sich mir nicht. Aber wer GemStone ein bisschen kennt, weiss, dass es schon in den neunziger Jahren mal ganz ähnlich zuging: der Börsenkracher Brokat aus Böblingen/Stuttgart hatte GemStone gekauft, um sein Portfolio mit GemStone/J (das es meines Wissens inzwischen nicht mehr gibt) aufzupeppeln. Glücklicher Weise ging Brokat erst vor sdie Hunde, nachdem GemStone mit viel Verlust wieder verkauft worden war. Auch bei Brokat hatte man den eigentlichen Edelstein von GemStone nie verstanden. Martin McClure hat auf der letzten Smalltalks 2012 einen sehr interessanten Vortrag zur Geschichte seines Arbeitgebers gehalten (hier gibt es das Video dazu). Um einen running gag aus diesem Vortrag fortzuführen: GemTalk Systems bleibt im selben Bürogebäude und zieht dort einen Stock höher ;-)

Nun könnte man meinen, sowohl Brokat als auch vmware haben sich an GemStone verschluckt, weil Smalltalk einfach nicht mehr zeitgerecht ist und niemand Smalltalk nutzt, zumindest niemand, der dafür Geld ausgeben möchte. Also war es nur logisch, das Zeug wieder abzustossen, und die Perlen des Unternehmens zu behalten.

Schaut man genauer hin, ergibt sich dann aber doch ein ganz anderes Bild: Die Kundenliste von GemTalk Systems ist sehr beeindruckend. Nicht, weil die Namen darauf so bekannt sind und Eindruck machen. Vielmehr, weil dahinter Unternehmen mit sehr komplexen, datenintensiven und äusserst flexiblen, veränderlichen Abläufen stecken.

Wie sonst, ausser durch seine besonderen Eigenschaften gerade für sehr komplexe Abläufe und hohe Datenvolumina liesse sich erklären, dass es Investoren gibt,  die eine Ausgründung von GemTalk Systems ermöglichen? Warum sollte jemand Geld in eine Firma stecken, deren Geschäft eine heisse Kartoffel ist, die vmware einfach fallen lassen möchte?

GemTalk hat eine beeindruckende Kundenliste, die – so wird gemunkelt – für einen attraktiven Cash-Flow sorgt. GemStone ist ein profitables Produkt, und das schon seit Jahrzehnten. GemStone war für Brokat eine Cash Cow und bei VMware war es nicht anders. Wie sonst liesse sich erklären, dass GemStone weiter existierte und das Produkt weiter entwickelt wurde? Wie sonst liesse sich erklären, dass vmware die kostenlose Verwendung von GLASS auch für kommerzielle Zwecke weiter gestatttete und sogar den Umfang der kostenfreien Lizenz erweiterte, das kommerzielle Produkt aber weiter führte? Wie sonst wäre zu erklären, dass erst unter VMware so recht Schwung in die Öffnung von GemStone hin zu open source-Projekten kam, sich Dale Henrichs zunehmend auch Themen wie Metacello und der Kompatibiltät von GemStone mit Pharo/Squeak widmen konnte? GemStone macht gerade einen sehr wichtigen Schritt: Weg von GemBuilder und der Notwendigkeit einer zweiten Smalltalk-Tools als Entwicklungs-Frontend hin zu einem kompletten Werkzeug mit einer eigenen (Webbasierten) IDE. Hier wird sehr viel mit Amber und Pharo gearbeitet und es gibt Initiativen hin zu einer Versionsverwaltung in Tools wie git oder subversion.

Der Mond über GemStone ist also ganz sicher nicht erblasst durch diese Ausgründung. Im Gegenteil, wir werden sicher noch viel Ineteressantes über GemStone hören in den nächsten Monaten. Die erste Etappe hierzu wird sicher die im Juni stattfindende STIC 2013 in den USA sein, auf der GemTalk dabei ist. Auch MagLev wird GemTalk weiterführen, und was uns Smalltalkern sicher am wichtigsten sein wird: es wird weiterhin ein kostenloses GLASS geben.

Die Ausgründung von GemTalk Systems steht nicht alleine da. Erst vor drei Jahren passierte bei Instantiations, dem Anbieter von VA Smalltalk, etwas ganz ähnliches: die Firma stiess ihr gesamtes Java-Produktportfolio samt Entwicklungsmannschaft an Google ab, und führte ihre Geschäfte als reine Smalltalk-Firma weiter (Joachims Small World berichtete). Seither hat sich auch an dem Produkt mehr getan, als in den Jahren zuvor.

Smalltalk scheint also nach wie vor eine Umgebung zu sein, in der es sich lohnt, zu investieren. Alle drei wichtigen kommerziellen Anbieter (Cincom, GemStone und Instantiations) scheinen profitabel zu sein und arbeiten fleissig an ihren Produkten. Auch im open source-Umfeld tut sich eine ganze Menge, angefangen bei Pharo, über Amber bis hin zu Squeak und GNU Smalltalk.

Um einen der abgeklopfteren Sprüche berühmter Persönlichkeiten zu bemühen: Die Gerüchte über das Ableben von Smalltalk sind noch immer völlig übertrieben….

Zum Abschluss noch ein paar weiterführende Links:

The WASstServerAdaptor story and its (preliminary) end

Well, a developer’s day is full of victories and waterloos. So I’ve had my developer’s day today.

After I had discussed my Server startup problems related to ports already in use, Marten came up with an explanation for VAST’s behavior on the VAST Support Group:

The reason for this behaviour is located in the method: SstTcpTransport>>#basicOpenListeningSocket and it seems to be correctly handled by VASmalltalk.

In this method a socket option (“SOREUSEADDR”) is set to true (if the configuration of the tcp network allows it, which is normally the case of VASmalltalk).

Hmm. Even if it’s not incorrect, I still don’t like it.
So my first waterloo was that Marten proved me wrong: this is not a bug or wrong behavior – from a very technical standpoint. I still thinnk the server pretends to be ready when it is not, and I don’t like it if my software tells me lies right into my face, especially if that even costs me lots of time…

The thread at the discussion group then continues with suggestions like changing the setting Marten mentioned in some very low level method for binding to a Socket. That is of course very deep down and you never know what you break if you open ALL TCP/IP Sockets as exclusive listeners.

So I decided to stick with my workaround that I described in my initial post on the subject (the reworked one, of course). It ran great – on Windows.

So here was my second waterloo. When I had XD packaged this and tried to start the same image twice on the Linux test machine, the Adaptor never left the #isStarting phase. The log contained lots of entries telling me we’re waiting for the adaptor to finish starting:

'2013-04-24 17:32:48,917: [INFO] Waiting for the Adapter to finish starting'
'2013-04-24 17:32:49,519: [INFO] Waiting for the Adapter to finish starting'
'2013-04-24 17:32:50,121: [INFO] Waiting for the Adapter to finish starting'
'2013-04-24 17:32:50,723: [INFO] Waiting for the Adapter to finish starting'
'2013-04-24 17:32:51,325: [INFO] Waiting for the Adapter to finish starting'

.. and so on. So the Socket behavior differs between Windows and Linux. On Windows, the retry loop was never run, the Adapter was not running immediately.

So I decided to extend my ugly workaround a little more with a retry count. I hate the code as it looks now, because it reminds me of really dark days in my programmer career. I’ll show it to you nevertheless, because it may be useful for others as well (I know, it’s not rocket science, but copying it from here is faster than thinking about it – you’re welcome!):

startServerAdaptor

	"self startServerAdaptor"

	|adaptor maxRetries retries|

	maxRetries := 5.
	retries := 1.

	[
		(adaptor := WASstServerAdaptor port: self port) start.
		[adaptor isStarting and: [retries <= maxRetries]] whileTrue: [
			EsLogManager info: ('Waiting for the Adapter to finish starting, retry count = %1' bindWith: retries asString).
			(Delay forSeconds: 0.5) wait.
			retries := retries +1.
			].
		adaptor isRunning ifFalse: [Error signal: ('Adaptor is not Running - maybe port %1 is already in use?' bindWith: self port asString)].
		EsLogManager info: 'WASStServerAdaptor started on port: ' , port asString]
			on: Error
			do: [:ex |
				EsLogManager
					error: 'Seaside Adaptor couldn''t start due to: ' , ex description.
				ex pass]

This is not a victory to be proud of, but one that will help me waste far less time lookinkg for phantom bugs. Here’s what happens in the Server ssh session when I start the image a second time:

UIProcess reportError: Adaptor is not Running - maybe port xxxxx is already in use?
 Dumping walkback to file: walkback.log

And the server process is immediately exiting. When I look into the application log I can see the server didn’t start because the port was in use. Great! Why not be happy about a small victory that may lay the ground for faster progress. It’s not much ado about nothing, and II’m pretty sure it pays back one day.

VAST improvements: Navigation in Config Maps

I’ve spent quite some time merging code recently, releasing Applications into Maps and Changing Required Map versions in other maps in an effort of porting a significant amount of existing VA Smalltalk code from Version 7 to the latest version 8.5.2.

There is a lot to say about how you should or should not structure your code and the fact that merging code and releasing lots and lots of open editions can be a hard job. You need to be fully awake and always be careful not to make mistakes that only adds to the pile of tiresome and demanding work to do. But there are a few ideas for small improvements to the configuration map browser that would make life so much easier and I thought I’d start with a little collection here and ask people to comment on them or add their own. Hopefully this will lead to a list that Instantiations considers as an inspiration for improvements in the tool…

  • First of all, I painfully missed a back/forward button. Really. You jump between maps and map editions all the time and loose a lot of context without the ability to just jump back.
  • A cheap alternative would be the option to spawn a new config map browser for certain lookups. My most painfully missed ones was the “find” menu item on the required maps list and the “Dependents” family of menu items on the maps list. Let me elaborate a lttle on the dependents thing: You are in process of releasing new apps or app editions to a config map and now need to find out which other maps need to be updated because this new edition needs to be released as a required map to these. All that VAST offers right now is that it selects all dependent maps in the same list. This is really stupid, because as soon as you click on one of them to change the required maps there, the other selected dependent maps are deselected. Bummer. So you after releasing into the first of the dependent maps, you have to navigate back to the one you came from (which sometimes is hard to remember if you’ve been doing this for hours now), search for dependents again and go to the second hit just to start the whole cycle again. So how did I help myself here? I did the search for dependent maps, went to the Transcript, opened a new Config Maps browser and navigated  to all the hits marked in the first browser one after the other. Was that a good experience? No. Because you need to dig down deeper on each of them, so you end up with 5 or more config maps browsers open and get lost very easily. So a first and cheap improvement would be to let the search for dependent maps open a new config map browser with only the found dependent maps in it. The window title could be “Maps dependent on MyConfigMap (Immediate)”. This wouldn’t really keep me from loosing track from time to time, but it would help me stay much more focussed on my job than the existing tool chain.
  • The MED Required Maps browser is of great help when it comes to understanding what maps and editions thereof are going to be loaded when you load a map. Such a tree is a great way to visualize what you’re dealing with. But it is a read-only tool You always have to have a normal config map browser open next to it in order to act on what you see.
  • When you load a map with required maps, and one of the maps in that graph is referenced/used in more than one edition, VAST warns you about it before it loads the code. But it doesn’t really support you in finding where these ambiguous references are. I may miss some existing menu item in the config maps browser, but I wished for an option to find all occurrences of editions of a certain config map in a given graph of required maps of one particular map edition. It can be very hard to resolve the issue at hand without it. The same of course applies to application editions/versions. Two maps in a graph can load an app in separate editions, but there is no way that I’m aware of that you could search for these occurrences. It is hard enough to decide if that may or may not be a wanted constellation (like “I want to load this feature, but there is a fix to a class in that particular application that I need to load a newer version of on top of that”), but as VAST stands today it is already hard to find out that there is such an issue at hand. Or maybe I am just missing some menu item and someone can shed some light on it for me…
  • In the real world, projects tend to build up code structures over the years where there are lots of ambiguous references, circular and conflicting required maps associations and there’s a lot of stuff that can go terribly wrong in such code bases. One case where such problems show their ugly face is migrations of code between VAST releases. You typically open up all maps during the process and produce lots of open map and application editions. Every time I try to resolve such issues, I start doing the very same thing: take a sheet of paper and draw boxes (for map editions) and connecting lines (for req. map relationships), just to understand the graph of code artefacts at hand. In this particular project, I simply gave up on it after a few hours, because 3 sheets of paper simply didn’t suffice. I knew there are lots of unnecessary req. map relationships but without a visual way of understanding the whole graph, it is almost impossible to decide which relationships may or may not be superfluous. So what would really help a lot would be a tool that visualizes the required maps graph of a map. This graph should at least show all relationships between maps, but it would best also include version numbers / edition timestamps.

So I am eager to learn what other VAST users think of these suggestions, where they see potential for improvements, what kind of tools they wrote (we do use a little box of utilities of our own as well), or what tricks people use to solve the kinds of problems i describe here. Feel free to comment and let’s start some discussion that could help Instantiations improve VAST and Envy tooling in future releases…

…and now for my Glorp on VAST error handling rant.

I’ve written in my previous post about a debugging session that took longer than necessary for two reasons: my stupidity and a combination of several unfriendly factors. So let’s continue our journey up through my walkback of today’s image crash.

Once I figured there was no Seaside problem in the crash, I scrolled further up through the walkback and what I found there really made me upset about the error handling in Glorp (I was also upset about myself and the fact that I had just spent an hour for nothing). Just about 1500 lines towards the top of the walkback was this:

ExceptionalEvent>>#signalWith:
  receiver = Exception: Database error
  arg1 = AbtError:  rc=-1 for '57016' in an AbtIbmCliCSDatabaseConnection at (12.12.2012 11:25:20)  '[SQ
LSTATE=57016 - [IBM][CLI Driver][DB2/LINUXX8664] SQL0668N  Operation not allowed for reason code "7" on 
table "MYTABLE".  SQLSTATE=57016
 [Native Error=-668]]
'
[] in VADatabaseAccessor>>#loginIfError:
  receiver = a VADatabaseAccessor
  blockarg1 = AbtError:  rc=-1 for '57016' in an AbtIbmCliCSDatabaseConnection at (12.12.2012 11:25:20)  '[SQLSTATE=57016 - [IBM][CLI Driver][DB2/LINUXX8664] SQL0668N  Operation not allowed for reason code "7" on table "MYTABLE".  SQLSTATE=57016
 [Native Error=-668]]
'

Holy Moly! So this was the cause of my problem. I had altered MYTABLE in some schema migration routine and the table needed reorg. This is easy to handle once you know it needs to be done: fire up some DB2 client and do a REORG TABLE MYTABLE.

But what is Joachim complaining about? It’s his job to know what he did to the database, not Glorp’s…

I am complainning about the fact that Glorp simply obfuscates a very clear error message. Remember what the walkback started with? If you forgot, here are the first lines of the walkback:

Walkback at 11:25:20 on 12.12.2012
Database error: 
    [] in AbtHeadlessRuntimeStartUp class>>#outputWalkback:process:  
    ...etc...

That’s what I am complaining about. I could have fixed the problem in just three minutes if the walkback told me what’s wrong.
DB2 tries hard to tell the stupid programmer what’s the problem. AbtIbmCliDatabaseConnection>>#fetchNextRowFromCursor:ifError: hands this exception on to its caller, and Glorp (or better, the VAST port of it) simply takes that message and puts it into the dustbin or keeps it as a secret. Eat this, programmer!

This is due to some strange error handling code that comes from the original Glorp implementation. It has to be adopted to VAST, because the adaptation that was done in the original Glorp code is defunct. I tried to fix that on my more than once, but only found another problem in VAST that completely befuddled me so I gave up. Whatever I tried, something didn’t work. That problem can be fixed very easily, and I think it will be integrated into VAST soon, but it isn’t yet.

So this is my rant about the current VA ST Glorp port and its error handling mess. Other than that, I must say Glorp is stable and works very well, even if the VAST version is a few versions behind.

Ah, before I forget it: After a reorg of the table it seems my application runs fine and I am in the middle of testing and fixing.

How to waste less time debugging a Seaside/Glorp application

Yesterday I packaged my Seaside Application for the first time on VA Smalltalk 8.5.2 and deployed it to a staging server.

And promptly – as expected, I got some errors: The first few were easy to find. One of them being a missing rule in AbtXDSingleImagePackagingRule (or some superclass) to include the new EsTimeZone code. That could be fixed by hand.

But this morning I spent quite some time searching for a problem in the walkback.log that didn’t exist. And this post is mostly intended for myself to remember next time. But it might also save you some time. The second purpose of this post (or, to be exact, the next one) is to underline why I think the VAST port of Glorp has a lousy adaption of error handling.

But let’s start at the beginning: My image crashed as soon as a web user logged on to the web application. The image exited with Error code 60 and wrote a walkback log. So far, so good. The first thing this tells me is that my error handling code is not perfect yet, because I should have seen an error page instead of an HTTP-503.

So I started reading the walkback. My usual way of doing so is to start with the beginning of the walkback:

Walkback at 11:25:20 on 12.12.2012
Database error: 
[] in AbtHeadlessRuntimeStartUp class>>#outputWalkback:process:  
    receiver = AbtHeadlessRuntimeStartUp  
    arg1 = 'Database error: '  
    arg2 = Process:Dispatch worker: 8{running,3}  
    temp1 = 'walkback.log'  
    temp2 = a CfsWriteFileStream
BlockContextTemplate(Block)>>#valueWithErrorHandler:oldHandler:onReturnDo:  
    receiver = [] in AbtHeadlessRuntimeStartUp class>>#outputWalkback:process:
   ... and so on

So one thing was for sure: this is another one of those useless error messages that come from Glorp. Unfortunately, I decided to skip step two of my usual Continue reading

VA Smalltalk on openSuSE: much ado about nothing?

For several reasons, I had to reinstall a new openSuSE 12.2 VM on my computer. Funnily, installing and starting VA Smalltalk was no problem at all this time. I didn’t have to call xhost before starting any of the VA Smalltalk installation or start scripts.

Not all problems that VA Smalltalk has on Linux are gone, but at least all the strange ones. Most of the issues I see are related to Locales. Instantiations is well aware of these problems and working on them.

So if you want to run VA Smalltalk on a US-ASCII Linux environment, openSuSE and VA Smalltalk fit nicely together. VAST is much snappier on KDE than on Ubuntu’s Unity Desktop, and if I didn’t need German Umlauts, I would really consider developing on Linux, even if syntax coloring is missing on Linux (in the current version).

But for now, I develop on Windows and test on Linux, because that is my deployment platform for a headless Seaside Image…

Starting VA Smalltalk on openSuSE 12.2

As I’ve written a few days ago, there are problems installing and starting VA Smalltalk on openSuSE 12.2. The problem is that root (nor any normal user) cannot start VAST or its setup program in an X session. The installation or start of VA Smalltalk quits with the following message:

Xlib: connection to ":0.0" refused by server
Xlib: No protocol specified

ERROR: Failed to open default display - exiting.

I already wrote the solution is to tell the X server to allow a connection to the X server for the current user. You can use xhost for this. The command to execute is host local: <username>. I couldn’t figure out which configration file would have to be changed to set this permanently, but I wanted to share what I did to at least make VA Smalltalk start without any problems. I just added the following line to the abt startup script (the text in black is as it ships with VAST, but with it you can find where to add the line):

#
# Runtime determination logic:
#
# 1. If <shellName> = "abt" then development time; otherwise runtime.
# 2. If development time, run es adding -mcd switch to disable code caching;
# if runtime, run es adding -no_break switch so user break window not shown
#
# As of VAST 5.5 the install no longer sets up symbolic links to /usr/bin/es.
# (This permits the coexistence of multiple VAST versions on the same UNIX machine)

xhost local:$(whoami)
if [ "${0##*/}" = "abt" ]
 then
   $VASTROOT_85/bin/es -mcd $IMAGE "$@"
 else
  $VASTROOT_85/bin/es -no_break $IMAGE "$@"
fi

I am not completely sure if I like this solution, but I’ve found a number of posts on the net where other applications  had the same startup problem and the only advice that seemed to reliably work was to call xhost before the start of the affected application.

Especially surprising for me was that adding my normal user to the user group xok, which is meant for users who may start X applications does not change anything.

So do you know what has to be done in openSuSE to allow certain users to start an application on the X server? Please add a comment and let me know. I don’t like the fact that I have to remember to change my act file after each VAST update. I could of course also put the call into .profile or any other startup script, but that still requires me to remember to do that for every user…

Using ltrace to debug VA Smalltalk calls to external libraries

This post is a bit technical, but sometimes even Smalltalk developers need to dive down into depths were it feels like noone else has ever been there before.

In this case I am looking for a segmentation fault in a call to an external library on Linux. From logfiles, I know that the library is found and the external function executes. This is because I am in the luxurious position to use an external library that writes a logfile.  I could even tell from log4s logging output in my application that a log output immediately after the call to the library is not executed any more. That’s all I know so far.

For Linux there are very helpful tools to trace what an application does, like strace and ltrace. I’ve used both, but strace didn’t really help me much. This post is about how to use ltrace together with VAST on Linux. There are a few little difficulties with using ltrace, one of them being that you cannot start ltrace on a shell script, and VAST typically is started using the abt shell script on Linux. Luckily, you can also start ltrace and tell it to sniff on a running process, which you have to name by its PID (process id). This option is quite useful for another reason: trace output can be very big and there’ll be a lot of information in it that’s not helpful for your case. If you need to run a few Smalltalk statements before the actual error, you can save yourself a lot of searching for relevant information in the log output if you start tracing as late as possible.

So first you need to find out the process’ PID. The easiest way to do so is to look at the shell output of the abt script:

joachim@BooberFraggle:~/image852> ./abt
VA Smalltalk, Version 8.5.2 

VM Timestamp: 4.0,(NC) 8/7/2012 (84)

(C) Copyright Instantiations 1994, 2012. All rights reserved.
(C) Copyright International Business Machines Corp. 1994, 2006. All rights reserved.
Virtual Machine PID: 30749

Commandline args {
 -mcd
 -i./abt.icx

 }

VM Options {
... etc. ...

You find the PID in this output (see red line).

So after starting VAST and executing whatever is needed to do before you want to start tracing, you start ltrace with a command line like this (best done as root, of course):

ltrace -p 30749 -S  -o ltrace.log -l /path/to/library

Then you can start evaluating the Smalltalk code that will crash the image and exit with a segmentation fault. YXou can of course even use the smalltalk debugger to run the code and see what happens – up to the point of the segmentation fault ;-)

Here’s what the parameters mean:

  • -p is the process id to trace (the one from above)
  • -S Display system calls as well as library calls
  • -o name of the trace log file
  • -l Display only the symbols included in the library  filename (If you know which library causes problems).

ltrace does have a lot more parameters and as always the man command is your friend to learn more about them.

So in my case, I get a trace file that contains the following info:

30749 SYS_getcwd("/home/joachim/image852", 4096) = 23
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x92763d4, 0x92763dc) = -2
30749 SYS_getcwd("/home/joachim/image852", 4097) = 23
30749 SYS_lstat64(0x92765dc, 0xbfa018ac, 0xb725aff4, 3, 0xbfa01954) = -2
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x9276575, 0x9276579) = 0
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x927657a, 0x9276581) = 0
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x9276582, 0x927658a) = 0
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x927658b, 0x9276593) = -2
30749 SYS_lstat64(0x9276574, 0xbfa018ac, 0xb725aff4, 3, 0xbfa01954) = -2
30749 SYS_gettimeofday(0xbfa0178c, NULL) = 0
30749 SYS_write(12, "2012-11-20 10:43:14,003 INFO - "..., 70) = 70
30749 SYS_lseek(12, 0, 2) = 1004
30749 SYS_brk(0x092a5000) = 0x092a5000
30749 SYS_brk(0x092a4000) = 0x092a4000
30749 SYS_brk(0x092c5000) = 0x092c5000
30749 SYS_brk(0x092e6000) = 0x092e6000
30749 SYS_brk(0x09307000) = 0x09307000
30749 SYS_brk(0x09328000) = 0x09328000
30749 SYS_munmap(0xb6375000, 65588) = 0
30749 SYS_gettimeofday(0xbfa02840, NULL) = 0
30749 SYS_write(12, "2012-11-20 10:43:14,015 INFO - "..., 70) = 70
30749 SYS_lseek(12, 0, 2) = 1074
30749 SYS_gettimeofday(0xbfa02980, NULL) = 0
30749 SYS_write(12, "2012-11-20 10:43:14,016 INFO - "..., 96) = 96
30749 SYS_lseek(12, 0, 2) = 1170
30749 --- SIGSEGV (Segmentation fault) ---
30749 SYS_rt_sigprocmask(1, 0xbfa02790, 0, 8, 0xb725aff4) = 0
30749 SYS_open("vmtrap.log", 577, 0644) = 13
30749 SYS_write(13, "-Platform Information-----------"..., 40) = 40
30749 SYS_write(13, "\nVM Timestamp: ", 15) = 15
30749 SYS_write(13, "4.0,(NC) 8/7/2012 (84)", 22) = 22
30749 SYS_write(13, "\nCPU Architecture: ", 19) = 19

So as you can see, the last few lines are the beginning of the ouptut of the VM into vmtrap.log, which it writes when it crashes. The most interesting part is above the coloured line: The last few statements that were executed before the crash. What did I learn from that file?

You can see that the Library is producing output onto its log file (30749 SYS_write(12, “2012-11-20 10:43:14,016 INFO – “…, 96) = 96). Somewhere above the excerpt reproduced here, there is a line that says:

30749 SYS_open(“/home/joachim/image852/myLib.log”, 65, 0600) = 12

That obviously opens the log file and returns its file handle (12).

So I still don’t know what happens exactly, but ltrace at least can tell me a bit about what happened last before a crash.

This article, as you may have guessed, is mainly there to save me time next time I need to trace stuff and may not be extremely entertaining ;-)

Installing VA Smalltalk on openSuSE 12.2 32bit

Regular readers of my blog may know that we are working on a Seaside Application that is to be deployed on a Linux Web Server on the Internet. VA Smalltalk does support Linux, even though using VA Smalltalk for development work on Linux is not as nice and convenient as on Windows. But at least for testing, packaging and fixing some last bugs, VA Smalltalk runs reasonably well on Linux.

Over the last few days, I was on the hunt (and still am) for a problem with calls to an external library that causes no troubles on Windows, but ends with a General Protection Fault on Linux. Luckily, the Library does log its activity, so I know the library finishes its job, but control never returns back to our VAST application (We’ve added log entries right before and after the call to the library). Maybe I’ll write a little bit more about this later.

So I wanted to test whether this is only a problem in Ubuntu 12.04 (missing libraries or whatever) or if it is reproducible on openSuSE 12.2. I’ve been using SuSE for more than a decade, even longer than that, because I remember running a distro named LST (Linux Support Team) back in the early nineties, and I think that was the predecessor of SuSE that shipped on a CD or some twenty floppy discs…

Installing on Ubuntu runs quite flawless (I just updated to VAST 8.5.2 a few days ago and hod no problems at all), so I expected no problems on openSuSE. The installation of openSuSE in VmWare Fusion is extremely easy, and I was full of energy to install VAST on openSuSE. I kicked off a terminal, entered su – and ./setup & . Usually, this shell script starts a graphical installer that looks pretty similar to the one you know from Windows. From there on, all is very simple and boring.

Not this time.

The terminal said something like it had errors and I should please look into /tmp/vastInstall-output  . No graphical installer in sight.

And here’s what I found in that file:

VA Smalltalk, Version 8.5.2 
VM Timestamp: 4.0,(NC) 8/7/2012 (84)
(C) Copyright Instantiations 1994, 2012.  All rights reserved.
(C) Copyright International Business Machines Corp. 1994, 2006.  All rights reserved.
Virtual Machine PID: 3908
Commandline args {
 -mcd
 -no_break
 -isetup.icx
 -z.us./home/joachim/Downloads/cd_c
 }

VM Options {
  }
Xlib: connection to ":0.0" refused by server 
Xlib: No protocol specified

ERROR: Failed to open default display - exiting.

This happened in Gnome, KDE and IceWM. Luckily, Seth and John from Instantiations knew what I have to do and posted it to the VA Smalltalk support group. All that needs to be done is to configure the X server to allow local connections to the display for all users who need to start X programs. So before starting the installation, you have to execute the following commands (as root):

xhost local:root
xhost local: <user name who wants to start VAST>

You can then run setup to install VA Smalltalk as usual (please make sure you run it as root). These settings are not permanent, so you have to repeat the commands for every start of VA Smalltalk (so it’s helpful to add these commands to the abt startup shell script – which I plan to do once I have a little time. All that’s needed is to find out the current user that runs the script…).

I don’t fully understand the reasons for this, because you can start xclock, xeyes etc. even before that. But I keep this info here for further reference. VAST now runs on openSuSE 12.2 and I am happy I asked on the support forum.