Monday 18 February 2013

A weekend with Talend

I am constantly surprised by the wealth of open-source software that is available for use. So when flu struck me and my family this weekend and put paid to our plans, I decided to do some software evaluation. Talend have been occupying the 'visionary' side of Gartner's Magic Quadrant for Data Quality software for some time now.

Last year, I evaluated their Open Profiler and found it useful, if a little clunky. But that is only the tip of the iceberg. They also provide a Data Integration tool, which I decided to have a go with.

The version I was using is 5.2.1. Installing was simple: I merely extracted the zip file that you download from Talend's website and selected the file that would run the software. There are Linux, OS X, Solaris and Windows options, all packaged and ready to run in 32-bit or 64-bit versions. The program is built on the well-known Eclipse platform, so it depends upon having Java installed on your machine.

Once opened, the program took time to fully load all of the tools. But when selecting them from the panel on the right, I could see why. There is just about everything you need to be a one-man data integration specialist. It contains enough JDBC connectors to enable you to connect to just about any database. 

The database I chose was an old instance of MySQL that I have had on my computer for some time. I set up some dummy data and dived right in.

The whole package strikes just the right balance between simplicity and configurability. Extracting, parsing, joining and transforming data is very straightforward. The way the program deals with type 1, type 2 and type 3 slowly changing dimensions is fantastic. That function alone makes it an outstanding piece of work that should save you a huge amount of development time. All of the modular jobs can export their results, and the details of any errors, back into the database of your choice, or into files. This means you can produce comprehensive management information on the efficiency of your processes.
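For anyone who hasn't met the three types before, here is a rough Python sketch of what each one does to a dimension row when an attribute changes. To be clear, this is not Talend's implementation or the code it generates; the function names and the columns (customer_id, valid_from, valid_to, is_current, previous_city) are my own made-up examples, and the "table" is just a list of dictionaries.

```python
from datetime import date


def make_dimension():
    """A one-row customer dimension (hypothetical columns, for illustration)."""
    return [{
        "customer_id": 1,
        "city": "Leeds",
        "valid_from": date(2012, 1, 1),
        "valid_to": None,
        "is_current": True,
    }]


def scd_type1(rows, key, attr, new_value):
    """Type 1: overwrite the attribute in place, so no history is kept."""
    for row in rows:
        if row["customer_id"] == key:
            row[attr] = new_value
    return rows


def scd_type2(rows, key, attr, new_value, changed_on):
    """Type 2: close off the current row and append a new version of it."""
    for row in rows:
        if row["customer_id"] == key and row["is_current"]:
            if row[attr] == new_value:
                return rows  # nothing changed, so no new version is needed
            new_row = dict(row)  # copy the old version before closing it
            new_row.update({attr: new_value, "valid_from": changed_on})
            row["valid_to"] = changed_on
            row["is_current"] = False
            rows.append(new_row)
            return rows
    return rows


def scd_type3(rows, key, attr, new_value):
    """Type 3: keep only the previous value, in an extra column on the same row."""
    for row in rows:
        if row["customer_id"] == key and row[attr] != new_value:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value
    return rows


if __name__ == "__main__":
    moved_on = date(2013, 2, 18)
    print(scd_type1(make_dimension(), 1, "city", "York"))
    print(scd_type2(make_dimension(), 1, "city", "York", moved_on))
    print(scd_type3(make_dimension(), 1, "city", "York"))
```

Type 1 leaves a single row with no trace of the old value, type 2 leaves two rows (the closed-off old version and the current one), and type 3 leaves a single row with the old value parked in an extra column. Doing this by hand against real tables, with surrogate keys and proper change detection, is where the development time usually goes.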

Once you have built your DI jobs, you can export them as self-contained programs that can be deployed within the package or platform of your choice. As long as Java is installed, the jobs will run. Before the weekend was out, I had a fully functioning, scheduled data warehouse, with a comprehensive detail layer and a presentation layer of summarised MI ready to plug an OLAP portal into.

There are some limitations to this package. If you are going to work with a group of analysts, a shared repository is vital. However, you have to get the enterprise version for that, and that doesn't come cheap. But leaving that aside, Talend have to be congratulated for putting together quite an impressive piece of Data Integration software. Honestly, I just can't believe it's free.

Next week I will be attending Talend's roadshow to see their new developments in the data science discipline of 'Big Data'. I will let you know how it goes.
