A friend just asked if I knew what kind of encoding he's seeing when he parses strings with umlauts from vCards into VA Smalltalk and gets something like =FC in places where there should be a ü.
It turned out I had no idea, even though you often see text with such =XX pieces in it. Somehow I had assumed it was just some code page issue. Had I thought a bit harder, I would have seen that this makes no sense: text transcoded between code pages is typically littered with strange characters, not with sequences of characters that you find on your keyboard.
A few minutes later I thought, well, that could be URL encoding. But it wasn’t, because URL encoding uses % to mark an encoded character, not =. But the FC for ü was correct. So I was close. Searching the web a little, I found out about quoted-printable encoding for mails and such. So I learnt something.
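To make the difference concrete, here is a small Python sketch (Python is only used for illustration here; it is not the Smalltalk code discussed in this post). It assumes the vCard text is in ISO-8859-1 (Latin-1), where ü is the single byte 0xFC, which would explain the =FC. The sample name "Müller" is made up. In UTF-8 the same letter would take two bytes and show up as =C3=BC instead.

```python
import quopri
from urllib.parse import quote

text = "Müller"  # hypothetical sample value
raw = text.encode("latin-1")  # assumption: Latin-1, so ü becomes the byte 0xFC

# Quoted-printable marks an encoded byte with "=" followed by two hex digits.
qp = quopri.encodestring(raw).decode("ascii")

# URL (percent) encoding uses the same hex value, but "%" as the marker.
url = quote(text, encoding="latin-1")

print(qp)   # M=FCller
print(url)  # M%FCller

# Decoding reverses the step: bytes first, then back to text via the charset.
decoded = quopri.decodestring(qp.encode("ascii")).decode("latin-1")
print(decoded == text)  # True
```

So the hex value is the same in both schemes; only the escape character differs.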
And it turned out I couldn’t find any method in my VA Smalltalk Image that seemed to help with this kind of encoding. So I had no good tip for my friend.
As always when you learn something, you should immediately try and use your new knowledge. And the job seemed to be easy enough to be done in an hour or so. So I went and implemented a tiny tool to encode and decode Strings in quoted-printable. So far, it seems to work and the results are compatible with several online encoders.
To be honest, it's just one single class, QuotedPrintableCoder, with two class methods, one for encoding and one for decoding (decode:), so I guess I am not changing the world with it. But my friend is already using it in production code to import vCards, and it seems to do its job nicely.
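For readers who want to see roughly what such a coder has to do, here is a minimal sketch in Python. It is not the Smalltalk implementation from VASTGoodies; the function names qp_encode and qp_decode, the charset parameter, and the Latin-1 default are assumptions made for this example. It is also deliberately simplified: the encoder treats its input as a single line and does not wrap long lines at 76 characters, which a complete quoted-printable encoder should do.

```python
import string


def qp_encode(text, charset="latin-1"):
    """Encode text as quoted-printable (simplified, no line wrapping)."""
    out = []
    for byte in text.encode(charset):
        is_printable = 33 <= byte <= 126 and byte != 0x3D  # 0x3D is "="
        if is_printable or byte == 0x20:  # keep spaces readable
            out.append(chr(byte))
        else:
            out.append("=%02X" % byte)  # e.g. 0xFC becomes "=FC"
    return "".join(out)


def qp_decode(encoded, charset="latin-1"):
    """Decode quoted-printable text back into a string."""
    # A "=" at the very end of a line is a soft line break: join the lines.
    encoded = encoded.replace("=\r\n", "").replace("=\n", "")
    out = bytearray()
    i = 0
    while i < len(encoded):
        pair = encoded[i + 1:i + 3]
        is_escape = (
            encoded[i] == "="
            and len(pair) == 2
            and all(c in string.hexdigits for c in pair)
        )
        if is_escape:
            out.append(int(pair, 16))  # "=FC" becomes the byte 0xFC
            i += 3
        else:
            # Anything else, including a malformed "=", is copied as-is.
            out.extend(encoded[i].encode(charset))
            i += 1
    return out.decode(charset)


print(qp_encode("Müller"))    # M=FCller
print(qp_decode("M=FCller"))  # Müller
```

Note that the equal sign itself has to be escaped (as =3D), otherwise the decoder could not tell a literal "=" from the start of an encoded byte.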
So if you need to decode mails with strange equal signs in them, you can now download a little coder from VASTGoodies.com.
Please let me know if you find bugs, or even better, fix them and publish your improved version to VASTGoodies. And if you know that such a coder already exists somewhere in VA Smalltalk, please keep it to yourself, because I'd look quite stupid (although I'd like to know about it) 😉