Simply put, a data lakehouse combines the best features of data warehouses and data lakes while overcoming their limitations – making it much faster and easier for businesses to extract insights from all of their data, whatever its format or volume.
Traditionally, data warehouses have been very good at applying business intelligence to structured data (e.g. organised content like tables of numbers), but have required time-consuming extract, transform, and load (ETL) tools to import data from other systems of record. Data lakes were built to capture the vast (and continually growing) wealth of unstructured data (e.g. unorganised data like social media posts, sensor logs, and mobile coordinates) that today’s organisations would like to take action on. But extracting useful insights often requires expensive data science resources, and can present security and compliance challenges.
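To make the ETL step concrete, here is a minimal, hypothetical Python sketch of an extract, transform, and load pipeline. All function and field names are illustrative, not any real tool's API; real ETL jobs run at far greater scale, which is exactly why they become time-consuming.

```python
# Minimal, hypothetical ETL sketch: extract rows from a system of record,
# transform them into a flat, typed, warehouse-friendly shape, then load them
# into a target table. All names here are illustrative.

def extract(source_rows):
    """Extract: pull raw records from a source system."""
    return list(source_rows)

def transform(raw_rows):
    """Transform: normalise each record into the warehouse schema."""
    transformed = []
    for row in raw_rows:
        transformed.append({
            "customer_id": int(row["id"]),
            "name": row["name"].strip().title(),
            "total_spend": round(float(row.get("spend", 0)), 2),
        })
    return transformed

def load(rows, target_table):
    """Load: append the cleaned rows to the destination table."""
    target_table.extend(rows)
    return len(rows)

# Usage: one small batch through the pipeline.
source = [{"id": "42", "name": "  ada lovelace ", "spend": "19.5"}]
warehouse_table = []
loaded = load(transform(extract(source)), warehouse_table)
# warehouse_table[0] -> {"customer_id": 42, "name": "Ada Lovelace", "total_spend": 19.5}
```

Every source system needs its own version of this cleanup logic, and the copies it produces must be kept in sync – which is the overhead a lakehouse aims to remove.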
A data lakehouse removes the walls between lakes and warehouses – marrying the low-cost, flexible storage of a data lake with the data management, schema, and governance of a warehouse. Some data lakehouses even follow a “zero-copy principle”: rather than duplicating data and moving it through cumbersome ETL tools, IT teams can query it where it lives, improving compute performance. The end result is less time, effort, cost, and latency spent not just managing data but, most importantly, getting insight and value from it.
Today’s businesses need to manage ever-greater volumes of customer data – petabytes generated across hundreds of thousands of daily interactions. It's no wonder they have invested in a variety of solutions to keep up: 976 different applications on average, all to track customers. But all these apps can lead to data silos across a business. That's 976 versions of one customer, when only one will do.
This is exactly the challenge a data lakehouse solves, delivering the scale and flexibility CIOs need to handle all this data, with the structure and schema to keep it organised.
This isn't empty talk, either. Data lakehouses can make a real impact on a company's bottom line, reducing silos and increasing operational efficiency—core concerns for today's IT and business decision-makers, according to the Salesforce IT & Business Alignment Barometer. Every business is looking for ways to accelerate time-to-market for their products and time-to-value for their customers. Data lakehouses can do both.
Best of all, data lakehouses can help your business lower costs, reduce developer backlogs, and drive efficiencies at a time when we're all being challenged to do even more with even less. By separating compute from storage, they let businesses easily add more storage without having to scale up compute. This is a very cost-effective way to extend analytics efforts, because the cost of storing data stays low.
Your existing solutions can stay put. There’s no need to “rip and replace” when adopting a data lakehouse. Thanks to their open data protocols, data lakehouses can integrate easily with legacy apps and systems – whether that means pulling in first-party ad data, connecting BI tools, or feeding proprietary AI models. You can then begin to phase out, on your own timetable, obsolete data management tools that require a lot of “care and feeding.”
Like any powerful technology, a data lakehouse should adapt to changes in your business requirements, and not box you in.
With the right data lakehouse, businesses can drastically simplify data governance and compliance without slowing the pace of innovation – a top concern for many of today's IT and business leaders, according to Salesforce's IT & Business Alignment Barometer.
They can consolidate multiple data management systems into one platform – reducing the amount of data spread across systems, and the number of hands it passes through. They can also enjoy more control over security, authorisation levels, audit support, and more, thanks to the standardised open schema of lakehouses.
What does that look like in practice? CIOs and IT leaders can implement role-based access, so that marketing teams can see only segmentation data, commerce teams only order data, and so on. They can also audit who's requesting data from the lakehouse, from where, and under which roles. It’s all thanks to the power and flexibility lakehouses put at their fingertips.
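The role-based access and auditing described above can be sketched in a few lines of Python. This is a hypothetical illustration of the pattern only – the role names, dataset names, and audit-log format are invented for this example, not Data Cloud's actual API.

```python
# Minimal sketch of role-based access over a shared lakehouse catalogue,
# with every request audited. Roles, datasets, and the audit-log format
# are all hypothetical.

from datetime import datetime, timezone

# Which datasets each role may read.
ROLE_GRANTS = {
    "marketing": {"segmentation"},
    "commerce": {"orders"},
}

audit_log = []  # every request is recorded, whether allowed or denied

def request_data(role, dataset):
    """Return True if the role may read the dataset; always write an audit entry."""
    allowed = dataset in ROLE_GRANTS.get(role, set())
    audit_log.append({
        "role": role,
        "dataset": dataset,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return allowed

# Marketing can read segmentation data, but not order data.
assert request_data("marketing", "segmentation") is True
assert request_data("marketing", "orders") is False
```

Because every request flows through one gate, the audit trail comes for free – which is the practical payoff of consolidating data access into a single platform.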
Salesforce’s Data Cloud – the next evolution of our Customer 360 platform – is powered by data lakehouse architecture, but we didn’t stop there. Built to work with broader industry and open source standards, Data Cloud takes the best of the data lakehouse and makes it even more flexible. With Data Cloud, you can make sense of all your data streams across a growing diversity of data types and sources, including REST APIs, SQL APIs, and file-level access. That makes Data Cloud a uniquely active and engaged platform across your enterprise.
Using Data Cloud, CIOs can integrate data from every step in the customer experience to deliver service faster, do more with less, and take advantage of new opportunities – all while creating unprecedented levels of personalisation.