
Sunday, September 14, 2008


Tahir Akhtar

I think another use of ESP could be to replace ETL while keeping the data warehouse intact. This would remove the ETL-related bottlenecks while still preserving the ability to do historical analysis.
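A minimal sketch of this idea, assuming a hypothetical `SaleEvent` feed and in-memory structures standing in for a real event processor and warehouse. Each event is appended to a fact table as it arrives, which keeps the history needed for historical analysis, and a running aggregate is updated at the same time for low-latency analytics. The class and field names are illustrative only, not any ESP product's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleEvent:
    # Hypothetical event shape; a real feed would carry more fields.
    timestamp: str
    product: str
    amount: float


class StreamingLoader:
    """Handles each event as it arrives instead of in a periodic ETL batch."""

    def __init__(self) -> None:
        # Stand-in for the warehouse fact table: every event is kept,
        # so historical analysis remains possible.
        self.fact_rows: list[SaleEvent] = []
        # Continuously maintained aggregate for low-latency analytics.
        self.totals_by_product: dict[str, float] = defaultdict(float)

    def on_event(self, event: SaleEvent) -> None:
        self.fact_rows.append(event)
        self.totals_by_product[event.product] += event.amount


loader = StreamingLoader()
for e in [
    SaleEvent("2008-09-14T10:00", "widget", 10.0),
    SaleEvent("2008-09-14T10:05", "gadget", 4.5),
    SaleEvent("2008-09-14T10:07", "widget", 2.5),
]:
    loader.on_event(e)

print(dict(loader.totals_by_product))  # {'widget': 12.5, 'gadget': 4.5}
print(len(loader.fact_rows))  # 3
```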

Farshid Zaker

I think this solution could have some performance issues, since every change to the database would have to be intercepted.
Are there any references for further study?

Anthony Eden

Some feedback on your post:

"Unfortunately, there are analytical problems that need a holistic view of data. This is very typical of data warehousing applications. As a result, data warehouses are expensive, often out of the reach of smaller organizations."

Having a holistic view of data does not mean that a data warehouse has to be expensive and out of reach of smaller organizations. With a little bit of knowledge, some good books and open source software, even small organizations can build a functional data warehouse on a limited budget.

"ETL places a significant load on your production databases."

If it does, then your ETL process is poorly designed. ETL should occur in a staging environment that is independent of the production environment. Most databases support online replication, which provides an easy way to keep a staging environment in sync and ready to be processed.

"As your business grows this lag will grow as well."

Yes, this is an issue with ETL systems that rely on bulk processing. However, ETL has been evolving toward extracting the relevant information from transaction logs and processing only the deltas, which greatly reduces latency.
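A minimal sketch of delta processing, assuming a hypothetical change log in which each entry carries a monotonically increasing log sequence number (LSN), an operation, a key and the new row. Only entries newer than the last saved watermark are applied to the staging table, so each run handles just the changes since the previous one. Real change-data-capture formats differ by database; this is not any vendor's format.

```python
from typing import Optional

# Hypothetical change-log entry: (lsn, op, key, row). Real CDC formats differ.
ChangeEntry = tuple[int, str, int, Optional[dict]]


def apply_deltas(staging: dict[int, dict], log: list[ChangeEntry], last_lsn: int) -> int:
    """Apply entries newer than last_lsn to staging; return the new watermark.

    Assumes the log is ordered by LSN, so later changes overwrite earlier ones.
    """
    high_water = last_lsn
    for lsn, op, key, row in log:
        if lsn <= last_lsn:
            continue  # already processed in an earlier run
        if op in ("insert", "update"):
            staging[key] = row
        elif op == "delete":
            staging.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        high_water = max(high_water, lsn)
    return high_water


staging: dict[int, dict] = {}
log: list[ChangeEntry] = [
    (1, "insert", 100, {"customer": "Acme", "total": 50}),
    (2, "insert", 101, {"customer": "Initech", "total": 20}),
    (3, "update", 100, {"customer": "Acme", "total": 75}),
    (4, "delete", 101, None),
]
watermark = apply_deltas(staging, log, last_lsn=0)
print(watermark)  # 4
print(staging)  # {100: {'customer': 'Acme', 'total': 75}}

# A later run skips everything at or below the saved watermark.
log.append((5, "insert", 102, {"customer": "Globex", "total": 5}))
watermark = apply_deltas(staging, log, last_lsn=watermark)
print(watermark)  # 5
```

Returning the watermark rather than storing it inside the function lets whatever schedules the runs persist it between them, which is what keeps each run proportional to the size of the delta rather than the size of the database.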

"While it is able to provide analytics cost effectively, it does not provide the ability to perform historical analysis"

This is a pretty big deficiency, but as you say, there are ways around it. Recreating the transaction logs is a one-time expense, and it can be done in non-production environments.

My biggest issue with your post, though, is that it glosses over the usability issues that come with trying to work directly with operational data. Data warehouses are about more than just optimizing for analytics from a technical perspective; they are also about creating an easy-to-understand schema that business folks can use directly in a self-service fashion. This means converting normalized operational data into denormalized structures oriented around business processes. Unless your event processor is doing that, you are missing one of the greatest advantages of data warehouses when they are done right.
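To illustrate the kind of conversion this comment describes, here is a minimal sketch, assuming hypothetical normalized `customers`, `products` and `orders` tables held in memory. It joins them into flat, business-oriented sales rows with readable column names and a precomputed revenue figure, roughly what a denormalized fact table would expose to business users. The table and column names are invented for the example.

```python
# Hypothetical normalized operational tables, keyed by id.
customers = {
    1: {"name": "Acme", "region": "West"},
    2: {"name": "Initech", "region": "East"},
}
products = {
    10: {"name": "Widget", "category": "Hardware"},
    11: {"name": "Support plan", "category": "Services"},
}
orders = [
    {"order_id": 500, "customer_id": 1, "product_id": 10, "qty": 3, "unit_price": 2.5},
    {"order_id": 501, "customer_id": 2, "product_id": 11, "qty": 1, "unit_price": 40.0},
]


def to_sales_rows(orders: list[dict], customers: dict[int, dict], products: dict[int, dict]) -> list[dict]:
    """Join orders with their customer and product into one flat row each."""
    rows = []
    for o in orders:
        c = customers[o["customer_id"]]
        p = products[o["product_id"]]
        rows.append({
            "order_id": o["order_id"],
            "customer_name": c["name"],
            "customer_region": c["region"],
            "product_name": p["name"],
            "product_category": p["category"],
            "quantity": o["qty"],
            # Derived measure, so business users don't have to compute it.
            "revenue": o["qty"] * o["unit_price"],
        })
    return rows


sales = to_sales_rows(orders, customers, products)
for row in sales:
    print(row["customer_name"], row["product_category"], row["revenue"])
# Acme Hardware 7.5
# Initech Services 40.0
```

The point of the flat shape is that a question like "revenue by region" becomes a single group-by over one table instead of a multi-way join that the business user has to understand.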
