by Esteban Agustin D'Amico
It is no secret that data offers a ridiculously important competitive advantage to those who know how to turn it into straightforward information. However, many companies do not make the best out of the available data. Transforming raw data into actionable insights is the key to improved decision-making.
As organizations grow, they store and produce more and more data. Based on immediate needs, each system or business unit opts for a database option. But it soon turns out that these many databases use different technologies, different data organization and/or even different data formats. These heterogeneous databases turn data consolidation into a nightmare.
This is the common problem our client was facing when he turned to us. His accounting system was not unified. It involved its invoicing system, its marketing data and its consignment management software. All platforms – Filemaker, MySQL, AWS – were different and they were cohabitating on different environments, databases, vpns, etc.
This interfered with the readability of the data and resulted in the absence of a unified accounting source of trust. Long story short, an ETL problem.
To deal with that problem, our client was running scripts based synchronizations coded in Python and Bash. It became an impediment when trying to validate processes because the complexity of the data sources made the error management a chaotic procedure. Automatically notifying managers about error occurrences, and notifying developers the cause of the mentioned errors could be described as a pain in the neck.
Debugging accounting reconciliation differences with multiple scripts in different languages was another big issue. Teams could spend hours trying to guess where the problem was.
The problems were not only on the development side but also impacted production environments. The lack of good logging practices made the team blind over the quality of the reported information and could not accurately decide if the report passed the tests before using it.
Changing approach was unavoidable, and so was choosing the right tool to tackle the pain and needs of this company.
Complex data integration from the different source systems
Lack of error management strategy
Providing actionable accounting insights
When it comes to challenging these complex situations, Pentaho Data Integration (aka Kettle), the ETL tool part of the Pentaho suite, is a robust tool that should not be underestimated.
And here is why we decided to go for Pentaho.
It is a powerful tool to integrate multiple data sources and processes (such as sending emails, downloading documents, scheduling actions…) into a smoothly functioning alliance
It spares manual logging of information, thereby reducing error probabilities
It offers robust error handling possibilities through process testing
"Pentaho data integration prepares and blends data to create a complete picture of your business that drives actionable insights.”
We modeled the problem “synchronising data sources”.
We adapted the different sources, removed useless information and aggregated when needed.
We tested the intermediate information. If it automatically passed the tests, a report was created and a notification was sent by email to confirm that the process had been successfully completed.
It is when things go wrong that the power of Pentaho best stands out. It proved to be remarkable to identify why a process is not completed when dealing with flows, schedules, testing and notifications. Now if an execution fails, we have a very detailed log to track the problem…and fix it!
And that is not it. Pentaho has an enormous potential that deserves to be exploited to not only load the information but deeply analyze it and let it do the talking.
We could create a Business Intelligence cube by creating dimensions. Or create useful dynamic reports using Pentaho Business Analytics. Thanks to data visualization tools, analysis could become almost instantaneous. And we could also integrate PDI with Hadoop to take advantage of tons of information that are currently squandered.
It would be great to hear about your experiences with Pentaho or other ETL tools. Comment section is all yours!