Folks from all over the world descended on Las Vegas just after the Thanksgiving holiday for one of the biggest conferences of the year: Amazon Web Services (AWS) re:Invent. The SnapLogic team was there in full force, raising awareness and meeting customers, partners, and prospects. SnapLogic’s Michael Nixon presented a wonderful theater session on how organizations can ‘Cure Cloud Data Architecture Complexity’. If you were not there, or if you did not get a chance to watch any of the keynotes, here are the key themes from the event.
Growing importance of data management, governance, and sharing
As organizations face exponential growth in data, they need tools to manage and govern that data in order to harness it effectively. Additionally, to deliver superior customer experiences, they also need tools to share that data without copying it.
AWS announced Amazon DataZone, which provides fine-grained controls to govern data access. It also includes an ML-powered data catalog that lets users discover data sources by searching for business terms. Other key enhancements included the AWS Glue Data Quality feature, which allows teams to measure, monitor, and manage the quality of their data, and new governance and auditability features for end-to-end ML development in Amazon SageMaker.
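For readers curious what a Glue Data Quality ruleset looks like in practice, here is a minimal sketch. The table, column names, and ruleset name ("orders", "order_id", "orders-dq") are hypothetical, and the rules use Glue's DQDL syntax; the boto3 call that would register the ruleset is shown but not executed, since it requires AWS credentials.

```python
# A minimal sketch of a Glue Data Quality ruleset in DQDL.
# Table and column names are hypothetical examples.
ruleset = """
Rules = [
    RowCount > 0,
    IsComplete "order_id",
    IsUnique "order_id",
    ColumnValues "status" in ["OPEN", "SHIPPED", "CLOSED"]
]
"""

def make_ruleset_request(name: str, database: str, table: str) -> dict:
    """Build the request payload for glue.create_data_quality_ruleset."""
    return {
        "Name": name,
        "Ruleset": ruleset,
        "TargetTable": {"DatabaseName": database, "TableName": table},
    }

request = make_ruleset_request("orders-dq", "sales", "orders")

# With AWS credentials configured, the ruleset could be registered with:
# import boto3
# boto3.client("glue").create_data_quality_ruleset(**request)
```

Once registered, Glue can evaluate these rules on a schedule and publish scores that teams monitor over time.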
To avoid the delays and costs associated with data movement, Amazon has built interoperability between various AWS services such as Redshift, SageMaker, and Athena. Amazon also introduced centralized access controls for managing Redshift data sharing, providing an improved experience.
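To give a flavor of sharing data without copying it, here is a hedged sketch of Redshift data sharing: the producer cluster creates a datashare and grants it to a consumer namespace, which then surfaces the share as a database. The cluster, schema, and namespace identifiers are hypothetical; the SQL statements are built as strings here, and actually running them would need the Redshift Data API and AWS credentials.

```python
# Hypothetical consumer namespace GUID for illustration only.
CONSUMER_NAMESPACE = "11111111-2222-3333-4444-555555555555"

# Statements the producer cluster would run to share a schema in place.
producer_statements = [
    "CREATE DATASHARE sales_share;",
    "ALTER DATASHARE sales_share ADD SCHEMA sales;",
    "ALTER DATASHARE sales_share ADD ALL TABLES IN SCHEMA sales;",
    f"GRANT USAGE ON DATASHARE sales_share TO NAMESPACE '{CONSUMER_NAMESPACE}';",
]

# On the consumer side, the share is surfaced as a local database:
consumer_statement = (
    f"CREATE DATABASE sales_from_producer FROM DATASHARE sales_share "
    f"OF NAMESPACE '{CONSUMER_NAMESPACE}';"
)

# With credentials configured, each statement could be submitted via the
# Redshift Data API, e.g.:
# import boto3
# client = boto3.client("redshift-data")
# client.execute_statement(ClusterIdentifier="producer-cluster",
#                          Database="dev", Sql=producer_statements[0])
```

No rows are copied at any point: the consumer queries the producer's data live through the share.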
Data mesh as a concept is great. How do you implement it in practice?
Data mesh provides a great operating model that empowers domain experts to be data product owners. This distributed ownership model unburdens central data teams, who can then focus on operational and strategic issues in their data architecture. But not many organizations have put it into practice, so it was great to learn from those who have built a data mesh or have helped their customers build one.
Capital One realized their data mesh through a two-pronged approach:
- Defining common standards around org structure, metadata curation, data quality, and entitlements based on data sensitivity, and
- Creating excellent user experiences for data publishers, data consumers, risk managers, and business data owners
The AWS team shared how you can leverage various AWS services to implement a data mesh pattern. Users can use data services such as DynamoDB, EMR, Aurora, SageMaker, Redshift, or OpenSearch to bring operational and analytical data into an Amazon S3 data lake. They can then use the AWS Lake Formation Data Catalog to catalog all available data and define governance standards for each data set. Domain experts can then publish governed data sets to consumers inside and outside the organization using AWS Data Exchange.
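As a concrete illustration of the governance step in that pattern, here is a minimal sketch of granting a consumer role SELECT access to a cataloged table through AWS Lake Formation. The account ID, role name, database, and table names are hypothetical, and the boto3 call is shown but not executed (it needs AWS credentials).

```python
def build_grant(account_id: str, role_name: str,
                database: str, table: str) -> dict:
    """Build the payload for lakeformation.grant_permissions.

    All identifiers passed in are illustrative placeholders.
    """
    return {
        "Principal": {
            "DataLakePrincipalIdentifier":
                f"arn:aws:iam::{account_id}:role/{role_name}"
        },
        "Resource": {
            "Table": {"DatabaseName": database, "Name": table}
        },
        "Permissions": ["SELECT"],
        "PermissionsWithGrantOption": [],
    }

grant = build_grant("123456789012", "marketing-analysts",
                    "customer_domain", "orders_curated")

# With AWS credentials configured:
# import boto3
# boto3.client("lakeformation").grant_permissions(**grant)
```

In a data mesh, the domain team owning `customer_domain` would issue grants like this for each consuming team, rather than a central team brokering every request.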
Simplify your data architecture complexity
The data tools landscape continues to evolve, and as organizations adopt separate tools for data loading, ETL/data transformation, data integration, reverse ETL, app-to-app integrations, and managing API-based data services, they end up with a complex data architecture. While some teams can meet their short-term needs this way, IT teams, and in turn the organization, end up with tool sprawl and a fragmented view of integrations. And if things go wrong, or if you have to debug something, you have a mess to sort through to get to the root cause. AWS and other vendors realize this and want to help simplify the architecture. AWS unveiled its zero-ETL vision: in effect, a plan to automate most ETL processes. As a first step towards that vision, AWS announced a zero-ETL integration between Amazon Aurora and Amazon Redshift.
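For the curious, setting up that Aurora-to-Redshift zero-ETL integration boils down to pointing a source at a target. The sketch below builds the request payload; the ARNs and names are hypothetical, and the boto3 `rds.create_integration` call is shown but not executed since it requires AWS credentials and provisioned clusters.

```python
def build_integration(source_cluster_arn: str,
                      target_arn: str,
                      name: str) -> dict:
    """Build the payload for rds.create_integration.

    ARNs here are illustrative placeholders for an Aurora cluster
    (source) and a Redshift target.
    """
    return {
        "SourceArn": source_cluster_arn,
        "TargetArn": target_arn,
        "IntegrationName": name,
    }

params = build_integration(
    "arn:aws:rds:us-east-1:123456789012:cluster:orders-aurora",
    "arn:aws:redshift:us-east-1:123456789012:namespace:analytics-ns",
    "orders-zero-etl",
)

# With AWS credentials configured:
# import boto3
# boto3.client("rds").create_integration(**params)
```

Once the integration is active, transactional data written to Aurora is replicated into Redshift for analytics without a hand-built ETL pipeline in between.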
The SnapLogic team met a number of people at our booth who wanted to simplify their data architecture. Some wanted to move away from legacy ETL platforms such as Informatica or IBM DataStage. Some felt limited by data loaders, which carry unpredictable pricing yet offer few features. Others were looking for an alternative to code-heavy tools such as MuleSoft, or wanted to empower their business teams. The underlying theme across all these conversations was organizations’ desire to simplify their data architecture by consolidating multiple tools into one.
SnapLogic helps you cure cloud data architecture complexity with a single platform that can do ETL, ELT, App-to-App integration, Reverse-ETL, and API Management for both technical and business users, with hybrid deployments, all in one seamless user experience.
We hope to see you at an AWS event near you, but in the meantime, if you would like to learn more about SnapLogic, let us know and we would be happy to walk you through it.