Using ltrace to debug VA Smalltalk calls to external libraries


This post is a bit technical, but sometimes even Smalltalk developers need to dive down into depths were it feels like noone else has ever been there before.

In this case I am looking for a segmentation fault in a call to an external library on Linux. From logfiles, I know that the library is found and the external function executes. This is because I am in the luxurious position to use an external library that writes a logfile.  I could even tell from log4s logging output in my application that a log output immediately after the call to the library is not executed any more. That’s all I know so far.

For Linux there are very helpful tools to trace what an application does, like strace and ltrace. I’ve used both, but strace didn’t really help me much. This post is about how to use ltrace together with VAST on Linux. There are a few little difficulties with using ltrace, one of them being that you cannot start ltrace on a shell script, and VAST typically is started using the abt shell script on Linux. Luckily, you can also start ltrace and tell it to sniff on a running process, which you have to name by its PID (process id). This option is quite useful for another reason: trace output can be very big and there’ll be a lot of information in it that’s not helpful for your case. If you need to run a few Smalltalk statements before the actual error, you can save yourself a lot of searching for relevant information in the log output if you start tracing as late as possible.

So first you need to find out the process’ PID. The easiest way to do so is to look at the shell output of the abt script:

joachim@BooberFraggle:~/image852> ./abt
VA Smalltalk, Version 8.5.2 

VM Timestamp: 4.0,(NC) 8/7/2012 (84)

(C) Copyright Instantiations 1994, 2012. All rights reserved.
(C) Copyright International Business Machines Corp. 1994, 2006. All rights reserved.
Virtual Machine PID: 30749

Commandline args {
 -mcd
 -i./abt.icx

 }

VM Options {
... etc. ...

You find the PID in this output (see red line).

So after starting VAST and executing whatever is needed to do before you want to start tracing, you start ltrace with a command line like this (best done as root, of course):

ltrace -p 30749 -S  -o ltrace.log -l /path/to/library

Then you can start evaluating the Smalltalk code that will crash the image and exit with a segmentation fault. YXou can of course even use the smalltalk debugger to run the code and see what happens – up to the point of the segmentation fault😉

Here’s what the parameters mean:

  • -p is the process id to trace (the one from above)
  • -S Display system calls as well as library calls
  • -o name of the trace log file
  • -l Display only the symbols included in the library  filename (If you know which library causes problems).

ltrace does have a lot more parameters and as always the man command is your friend to learn more about them.

So in my case, I get a trace file that contains the following info:

30749 SYS_getcwd("/home/joachim/image852", 4096) = 23
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x92763d4, 0x92763dc) = -2
30749 SYS_getcwd("/home/joachim/image852", 4097) = 23
30749 SYS_lstat64(0x92765dc, 0xbfa018ac, 0xb725aff4, 3, 0xbfa01954) = -2
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x9276575, 0x9276579) = 0
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x927657a, 0x9276581) = 0
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x9276582, 0x927658a) = 0
30749 SYS_lstat64(0xbfa008f4, 0xbfa005e4, 0xb725aff4, 0x927658b, 0x9276593) = -2
30749 SYS_lstat64(0x9276574, 0xbfa018ac, 0xb725aff4, 3, 0xbfa01954) = -2
30749 SYS_gettimeofday(0xbfa0178c, NULL) = 0
30749 SYS_write(12, "2012-11-20 10:43:14,003 INFO - "..., 70) = 70
30749 SYS_lseek(12, 0, 2) = 1004
30749 SYS_brk(0x092a5000) = 0x092a5000
30749 SYS_brk(0x092a4000) = 0x092a4000
30749 SYS_brk(0x092c5000) = 0x092c5000
30749 SYS_brk(0x092e6000) = 0x092e6000
30749 SYS_brk(0x09307000) = 0x09307000
30749 SYS_brk(0x09328000) = 0x09328000
30749 SYS_munmap(0xb6375000, 65588) = 0
30749 SYS_gettimeofday(0xbfa02840, NULL) = 0
30749 SYS_write(12, "2012-11-20 10:43:14,015 INFO - "..., 70) = 70
30749 SYS_lseek(12, 0, 2) = 1074
30749 SYS_gettimeofday(0xbfa02980, NULL) = 0
30749 SYS_write(12, "2012-11-20 10:43:14,016 INFO - "..., 96) = 96
30749 SYS_lseek(12, 0, 2) = 1170
30749 --- SIGSEGV (Segmentation fault) ---
30749 SYS_rt_sigprocmask(1, 0xbfa02790, 0, 8, 0xb725aff4) = 0
30749 SYS_open("vmtrap.log", 577, 0644) = 13
30749 SYS_write(13, "-Platform Information-----------"..., 40) = 40
30749 SYS_write(13, "\nVM Timestamp: ", 15) = 15
30749 SYS_write(13, "4.0,(NC) 8/7/2012 (84)", 22) = 22
30749 SYS_write(13, "\nCPU Architecture: ", 19) = 19

So as you can see, the last few lines are the beginning of the ouptut of the VM into vmtrap.log, which it writes when it crashes. The most interesting part is above the coloured line: The last few statements that were executed before the crash. What did I learn from that file?

You can see that the Library is producing output onto its log file (30749 SYS_write(12, “2012-11-20 10:43:14,016 INFO – “…, 96) = 96). Somewhere above the excerpt reproduced here, there is a line that says:

30749 SYS_open(“/home/joachim/image852/myLib.log”, 65, 0600) = 12

That obviously opens the log file and returns its file handle (12).

So I still don’t know what happens exactly, but ltrace at least can tell me a bit about what happened last before a crash.

This article, as you may have guessed, is mainly there to save me time next time I need to trace stuff and may not be extremely entertaining😉

6 thoughts on “Using ltrace to debug VA Smalltalk calls to external libraries

    1. Hi Mariano,

      not yet. But now that we’re talking😉 :
      Do you have any idea how I could find out which flag the VM uses for the dlopen of my library? The library’s vendor suggests using RTLD_GLOBAL. I am playing around with strace, but it seems that doesn’t want to tell me…

      Joachim

      1. Hi. The only way to know how the VM uses dlopen is by checking the code, sorry😦 I don’t know if there is a way to know it from outside. For example, with NativeBoost FFI I see the code:

        primitiveDlopenFn
        | addr |

        addr := 0.
        self cCode: ‘
        #ifdef RTLD_DEFAULT
        addr = (void*) &dlopen
        #endif
        ‘.
        “return a dlopen() function address”
        interpreterProxy pop: interpreterProxy methodArgumentCount + 1 thenPush:
        (
        interpreterProxy positive32BitIntegerFor: addr).

        So it seems it is RTLD_DEFAULT. But this is all I can tell without further investigation.

        1. HI Mariano

          thanks for the info. It doesn’t actually help with the VM of VA Smalltalk, but it was worth asking. I’m playing with ltrace parameters to see if I can find it.

Comments are closed.