In my seemingly endless hunt for potential problems with Umlauts travelling between our users’ web browsers and our Seaside images, I find new areas of “interest” almost daily. As a little background information, it might be useful to mention that at least the Smalltalk dialect we are using (VA Smalltalk) does not speak UTF-8 natively, so a German Ü in our images is encoded in ISO-8859-1 (or Windows 1252 or such), while the rest of the web uses UTF-8 these days. Since version 9.1 of VA Smalltalk, we at least have a reliably working Seaside adapter which converts between the web client (UTF-8) and the server (ISO-8859-15 in our case).

This time, the problem was with special characters that had been entered into Smalltalk browsers in FileLibrary methods. In our case, we’re talking about JavaScript code that is stored in WAFileLibraries and edited inside the Smalltalk IDE (if you have no idea why anyone would want to do that, this article is probably not suitable for you 😉 ).

Since we deploy our file library contents to the file system on the web server for performance reasons (Apache will serve these from the file system instead of the Smalltalk image), the above-mentioned adapter is not in the game any more when a user accesses our site. So our wait dialogs and other stuff implemented in these JavaScript methods would always display wrong characters.

Once you remember that this deployment is the difference between your development machine (where things work) and production (where they don’t), it is clear that the deployment process is not converting the files on output. So they end up on the file system in the wrong encoding.
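To make the mismatch concrete, here is a minimal sketch of what happens to a single Ü on the byte level. It assumes iconv and od are installed; the helper name hex_of is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print stdin as a plain hex string without spaces.
hex_of() { od -An -tx1 | tr -d ' \n'; }

# \334 is octal for 0xDC, the single byte that encodes Ü in ISO-8859-1.
latin1=$(printf '\334' | hex_of)

# The same character converted to UTF-8 takes two bytes.
utf8=$(printf '\334' | iconv -f ISO-8859-1 -t UTF-8 | hex_of)

echo "ISO-8859-1: $latin1, UTF-8: $utf8"
```

A browser that is told the file is UTF-8 but receives the single ISO-8859-1 byte sees an invalid sequence, which is why the dialogs show wrong characters.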

There are at least two ways you could fix that:

  1. fix deployFiles in WAFileLibrary to add the step of converting to UTF-8 before saving to disk
  2. use the power of bash and iconv to convert the files on the file system on the deployment machine(s)

There are lots of good reasons for 1. However, I don’t have access to Seaside’s base code and am not even using Pharo, and I’d have to test what happens to binary files (like .png and such). So I decided to use approach 2.

The script for doing this is easily cobbled together with the help of StackOverflow:

find . -name '*.js' -exec iconv -f iso-8859-1 -t utf-8 "{}" -o "{}" \;

As I said, integrating this into WAFileLibrary in Seaside would be much better. There are a few caveats when using this script:

  • you must run it exactly once, otherwise the second conversion will treat the already converted UTF-8 bytes as ISO-8859-1 again and completely break Umlauts
  • you need to remember to run it every time you update the files on the server
  • there are more, I am sure. I just can’t think of them right now
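The "exactly once" caveat can be softened by skipping files that already are valid UTF-8. The following is only a sketch under a few assumptions: the function names is_utf8 and convert_js_tree and the SRC_ENC variable are made up for this example, the source encoding is taken to be ISO-8859-1 as in the one-liner above, and bash plus an iconv that exits non-zero on invalid input are available. It also writes to a temporary file first, because reading and writing the same file in a single iconv call may not be safe with every iconv version.

```shell
#!/usr/bin/env bash
# Assumed source encoding; override via the environment if yours differs.
SRC_ENC="${SRC_ENC:-ISO-8859-1}"

# Succeeds if the file decodes cleanly as UTF-8 (pure ASCII counts as UTF-8 too).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Convert every *.js file below the given directory that is not valid UTF-8 yet.
convert_js_tree() {
    local dir="$1" file tmp
    while IFS= read -r -d '' file; do
        # Already converted (or plain ASCII): leave it alone.
        if is_utf8 "$file"; then
            continue
        fi
        tmp="$(mktemp "${file}.XXXXXX")" || return 1
        if iconv -f "$SRC_ENC" -t UTF-8 "$file" >"$tmp"; then
            # Copy the content back instead of moving the temp file,
            # so the original file keeps its permissions for Apache.
            cat "$tmp" >"$file"
        else
            echo "conversion failed: $file" >&2
        fi
        rm -f "$tmp"
    done < <(find "$dir" -type f -name '*.js' -print0)
}
```

Calling convert_js_tree with the deployment directory after each update should then be harmless to repeat. The check is a heuristic, though: a file in ISO-8859-1 whose special characters happen to form valid UTF-8 sequences would be skipped.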

So this is not perfect. I will try to integrate it into the two relevant implementations of #deployFiles and see if I can contribute that code back to the Seaside maintainers…