What is ETL?
Think of it as housecleaning for your data. ETL stands for extract, transform, and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place, and it doesn't really matter that that data is in different forms or formats. The data can come from any source. ETL is powerful enough to handle such data disparities.
For example, a financial institution might have information on a customer in several departments and each department might have that customer's information listed in a different way. The membership department might list the customer by name, whereas the accounting department might list the customer by number. ETL can bundle all this data and consolidate it into a uniform presentation, such as for storing in a database or data warehouse.
Another way that companies use ETL is to move information to another application permanently. For instance, word-processing data might be translated into numbers and letters, which are easier to track in a spreadsheet or database program. This is particularly useful in backing up information as companies transition to new software altogether.
One important function of ETL is "cleansing" data. The ETL consolidation protocols also include the elimination of duplicate or fragmentary data, so that what passes from the E portion of the process to the L portion is easier to assimilate and/or store. Such cleansing operations can also include eliminating certain kinds of data from the process. If you don't want to include certain information, you can customize your ETL to eliminate that kind of information from your transformation.
The T portion of the equation, of course, is the most powerful. ETL can transform not only data from different departments but also data from different sources altogether. For example, data in an email program such as Microsoft Outlook could be transformed right along with data from an SAP manufacturing application, with the result being data of a common thread in the end. Microsoft, of course, makes an ETL package, as do Oracle and IBM.
This article was written very well and it gave a good description of what ETL is. The problem with ETL tools on the market is the fact that many of them do not focus on the most important aspect of ETL and that is the "T", or the Transform, of ETL. Although extraction and loading are very important, the transforming of the data from a legacy or unique system is important because it needs to respond properly with the language of the database where you are loading the information into. Most of the times complex transformations are required to make the structure and format of incoming data conform to the destination.
If you want to run a demo of the software that specializes in transforming data, take a look at Centerprise from Astera Software.
A very nice and easy to understand article. Well done! --
Nice article, easy to understand and correlates with day to day work in layman's language.
@rallenwriter -- There are a bunch of open source ETL tools out there, and more keep coming up every day.
Just like with open source business intelligence or data warehousing tools, you kind of have to pick and choose what works best for you.
I personally use Talend, but that may not be the best choice for your business.
Since they are open source, I suggest you read the reviews on a few and download them. Play around, see what works best for you.
Are there any really good open source ETL tools?
I am a big fan of ETL tools, but sometimes they're just so darn expensive.
Does anybody have a favorite open source ETL tool that they use for their business?
That's pretty cool -- I keep hearing all this buzz about ETL data warehousing and ETL for business intelligence, but this article summed it up really nicely.
It's so easy, especially in the business world to get caught up in jargon, and this article cleared away all that verbal clutter for me.
very nice article.
Post your comments