A Data Lakehouse without Data?

Imagine you bought a beautiful lake house and invited all your friends to visit, only to find the lake was dry. Not much value, and a little embarrassing, right?

Now imagine that the same lake house comes with special water valves that control not just whether there is water in the lake, but also the water's quality and clarity and which fish the lake is stocked with. Much more impressive, correct?

In this metaphor, Delta Lake on Databricks is that beautiful lake house, and Stitch from Talend is the water valve.

A Powerful BI Experience on Data Lakes

Databricks recently announced the availability of SQL Analytics, and several months ago Talend released support for Delta Lake on Databricks in Stitch. The combination of these two services enables data analysts and business intelligence developers to load data into Delta Lake on Databricks in a few clicks, with the agility and scale of a fully managed data pipeline service.

Stitch lowers the barriers to loading data into Delta Lake by providing a simple but powerful interface for defining data replication processes from over 100 different data sources into Delta Lake on Databricks. Defining a replication process is simple, and Stitch runs it on a highly scalable, reliable SaaS-based engine. It takes five minutes or less to define the process. You turn it on and forget about it, because it just works, every time.

This matters for Databricks SQL Analytics because anyone who wants to perform SQL-based analytics can get their enterprise data into the data lake in minutes and benefit from the reliability and performance of Delta Lake, with none of the hassle traditionally expected. To deliver a powerful BI experience on a data lake, especially one enabled by Delta Lake, Databricks relies on its newly launched Delta Engine, a vectorized query engine that includes a query optimizer and caching capabilities to accelerate query performance on Delta Lake. This allows analysts and BI developers to access complete, recent, and reliable data and build dashboards and reports in minutes. It also allows ETL/ELT developers using Talend Studio to use the Delta Engine for data engineering jobs.

For example, your VP of Sales asks for an analysis of regional sales performance using your Salesforce data.  You set up the integration in Talend Stitch (5 minutes), allow the data to replicate (5 to 15 minutes), and start building SQL reports in the Databricks UI.  In under an hour, you have your sales analytics results.
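As a rough illustration of the kind of report this enables, the sketch below runs a regional sales roll-up in SQL. It uses Python's built-in sqlite3 module as a stand-in so it can run anywhere. The table name `opportunity`, its columns, and the sample rows are hypothetical and are not the schema Stitch actually produces. On Databricks, a similar SELECT would be pointed at the replicated Delta table instead.

```python
import sqlite3

# Hypothetical stand-in for replicated Salesforce data. The table and
# column names are assumptions for illustration, not Stitch's real schema.
ROWS = [
    ("East", "Closed Won", 120000.0),
    ("East", "Closed Lost", 40000.0),
    ("West", "Closed Won", 90000.0),
    ("West", "Closed Won", 60000.0),
    ("Central", "Closed Won", 30000.0),
]

# Count and total the won deals per region, largest total first.
REGIONAL_SALES_SQL = """
SELECT region,
       COUNT(*)    AS deals_won,
       SUM(amount) AS total_won
FROM opportunity
WHERE stage = 'Closed Won'
GROUP BY region
ORDER BY total_won DESC
"""


def regional_sales(rows):
    """Return (region, deals_won, total_won) tuples, largest total first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE opportunity (region TEXT, stage TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO opportunity VALUES (?, ?, ?)", rows)
        return conn.execute(REGIONAL_SALES_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, deals, total in regional_sales(ROWS):
        print(f"{region}: {deals} deals, {total:,.0f} won")
```

Keeping the query in a plain SQL string means the same statement can be tried locally first and then adapted to the real table and column names once the replicated data is available.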

And then your VP of Marketing asks to augment that analysis with Marketo and Google AdWords data to see which sales region delivered the best ROI on marketing spend. Just set up integrations from Marketo and Google AdWords to Databricks, and again, in minutes you have your results.
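One way to picture the marketing ROI question is as a join between revenue per region and spend per region. The sketch below is only an illustration under stated assumptions: the `sales` and `marketing_spend` tables, their columns, and the sample figures are hypothetical, ROI is taken here as (revenue - spend) / spend, and Python's built-in sqlite3 module stands in for the Databricks SQL endpoint.

```python
import sqlite3

# Hypothetical sample data. Real Marketo and Google AdWords tables
# replicated by Stitch will have different names and columns.
SALES = [("East", 120000.0), ("West", 150000.0), ("Central", 30000.0)]
SPEND = [
    ("marketo", "East", 20000.0),
    ("adwords", "East", 10000.0),
    ("marketo", "West", 30000.0),
    ("adwords", "West", 20000.0),
    ("adwords", "Central", 15000.0),
]

# Spend is summed per region in a subquery before the join, so revenue
# is not repeated once per marketing source.
ROI_SQL = """
SELECT s.region,
       s.revenue,
       m.total_spend,
       (s.revenue - m.total_spend) / m.total_spend AS roi
FROM sales AS s
JOIN (
    SELECT region, SUM(spend) AS total_spend
    FROM marketing_spend
    GROUP BY region
) AS m ON m.region = s.region
ORDER BY roi DESC
"""


def marketing_roi(sales, spend):
    """Return (region, revenue, total_spend, roi) tuples, best ROI first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
        conn.execute(
            "CREATE TABLE marketing_spend (source TEXT, region TEXT, spend REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
        conn.executemany("INSERT INTO marketing_spend VALUES (?, ?, ?)", spend)
        return conn.execute(ROI_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, revenue, total_spend, roi in marketing_roi(SALES, SPEND):
        print(f"{region}: revenue={revenue:,.0f} spend={total_spend:,.0f} roi={roi:.2f}")
```

Aggregating spend before joining is a deliberate choice: joining the raw spend rows directly would count each region's revenue once per marketing source and inflate the result.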

With traditional Hadoop or data warehouse approaches, this kind of work used to take days, weeks, or even months. With Databricks and Talend, it can be done in minutes or hours.

The combination of Delta Lake on Databricks, Databricks SQL Analytics, and Stitch from Talend gives you a data lake stocked with data. It helps you be more agile and more efficient and, most importantly, lets you impress your business leaders with fast, high-quality answers without spending a lot of money. Start impressing your business leaders now with a trial account, or find Stitch in the Databricks Partner Gallery.

 

The post A Data Lakehouse without Data? appeared first on Talend Real-Time Open Source Data Integration Software.
