NeoCSV for VA Smalltalk


There are tasks that a developer has to perform over and over again. Whenever you need this particular thing done, you usually just hammer out some code to get the task at hand done. So over the years you end up with two or three, or even a few handfuls, of implementations that cover one and the same task. Every time you solve the problem, you approach it from another angle and end up with an implementation that is almost the same as the last three, but there is this one little aspect that is different this time.

There are several approaches to this problem:

  1. You remember you once wrote that thing and look at it. You find it covers what you need, but not fully. But the task at hand is so simple that reusing and extending your old code would take too much time. So you write another variant of your old code.
  2. You have completely forgotten you did it, but have a few patterns in mind that would be just a few lines of code, so you just start right away.
  3. You look for existing solutions from other sources. Most of the time, you find you don’t really like them or they are too heavyweight for the job at hand. So you implement your own subset of the problem.

A few classic problems come to mind: reading INI files, arranging the 28 to 31 days of a month in a nice table, reading and writing CSV files.

This time, my task at hand was to import and export data from and to our Kontolino.de production database. Some of that data is simple and homogeneous enough to use plain old CSV files. We need a way to upload new tax report definitions to all stages (development, test, production) or to move some subset of data from production to test machines to debug more complex problems using real data.

What comes to mind? XML or CSV. The great thing about CSV is that you can edit the data very nicely in your office suite’s spreadsheet program and use it as a transport medium that is extremely easy to read and write. Since some of the data that we get from third parties is available in CSV or Excel format anyway, this is the natural choice.

So here we are: We need to import CSV data and convert it to all kinds of Smalltalk objects.

We had a few half-baked implementations of a CSV reader in our portfolio. But in the first two I looked at, I found missing pieces, like the ability to not only import a list of dictionaries that would then be converted to Smalltalk objects line by line, but to directly map from a comma-separated attribute to a Smalltalk object, just like in an O/R-Mapper.

So I decided to use NeoCSV, because I had already used Sven’s STOMP package before and knew that he writes excellent code. He also comments his code very nicely, so it is easy to find your way into using it within a few minutes.

I ported his package to VA Smalltalk in something like 15 minutes. Unfortunately, I had somehow forgotten about Instantiations’ Monticello Importer, which would have imported the mcz in 2 minutes. But I used it for the tests, which turned out to be a great idea.

The importer comments out code that doesn’t compile in VAST, so you can change it without risking code loss due to load errors. And while the NeoCSV code ran in VAST almost instantly without any problems, the tests were packed with Pharo-specific convenience methods. Things like {} to create an Array or String>>#join: would be handy in VAST, but I didn’t know where I could put these to make life easier (and I have no idea what would be needed to implement the curly braces thing). So I changed the test code quite a bit.
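
To give an idea of the kind of rewriting involved, here is a small sketch of portable replacements for the two Pharo conveniences mentioned above. It is an illustration only, not the actual changes made to the NeoCSV tests, and it assumes that do:separatedBy: is available in your VAST image.

| parts stream |
"Pharo: { 1. 'two'. #three }"
Array with: 1 with: 'two' with: #three.

"Pharo: ', ' join: #('a' 'b' 'c')"
parts := #('a' 'b' 'c').
stream := WriteStream on: String new.
parts
  do: [:each | stream nextPutAll: each]
  separatedBy: [stream nextPutAll: ', '].
stream contents

The brace array evaluates its elements at run time, so Array with:with:with: is the closest match. A literal array like #(1 'two' #three) only works when all elements are literals.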

I even left two test methods in an error state (red in the SUnit runner). These rely on the implementation of #= for Dictionaries, and Pharo and VAST seem to have different opinions on what makes two dictionaries equal. I tried Dictionaries in my own code, so I am sure they work in NeoCSV, but I didn’t take the time to fix the tests.

So here is a little code snippet that reads a CSV file and converts each line into an object:

| file customers |
[
"Each add...Field: message describes one CSV column, in column order"
customers := (NeoCSVReader on: (file := CfsReadFileStream open: 'customers.csv'))
  skipHeader;
  recordClass: Customer;
  addIntegerField: #id:;
  addField: #firstname:;
  addField: #lastname:;
  "Converter blocks turn the raw string into a domain value"
  addField: #gender: converter: [:str | str = 'm' ifTrue: [Gender male] ifFalse: [Gender female]];
  addField: #classification: converter: [:inp | inp asSymbol];
  addField: #locked: converter: [:str | str ~= '1'];
  addField: #editable: converter: [:str | str ~= '1'];
  upToEnd.
] ensure: [file close]

That’s nice and easy. And it performs well! The really nice thing about this is that it returns ready-made Customer instances. You can of course just use Dictionaries and work with them as usual, but you can also get objects straight from the CSV file. I never took the time to implement this in my own attempts, so even if it’s not rocket science, it helps me solve my tasks a bit faster.
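
For comparison, here is a minimal sketch of the Dictionary variant. It assumes that the addFieldAt: family of methods from the Pharo version of NeoCSV is also present in the VAST port. The keys are made up for this example, and every column of the file has to be declared.

| file rows |
[
rows := (NeoCSVReader on: (file := CfsReadFileStream open: 'customers.csv'))
  skipHeader;
  recordClass: Dictionary;
  "Keys are illustrative; each value is stored with at:put:"
  addIntegerFieldAt: #id;
  addFieldAt: #firstname;
  addFieldAt: #lastname;
  addFieldAt: #gender;
  addFieldAt: #classification;
  addFieldAt: #locked;
  addFieldAt: #editable;
  upToEnd.
] ensure: [file close]

Each element of rows is then a Dictionary, so (rows first at: #lastname) would answer the last name from the first data line as a String.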

So if you’d like to stop reimplementing CSV import/export, you can download NeoCSV for VA Smalltalk on VASTgoodies.com. Of course there is also some documentation.