Previously published on medium.com.
This post is the second in a series providing an overview of Enterprise Integration. See Enterprise Integration for the previous post in this series. This post will give an overview of various integration tooling options and how Enterprises risk creating tooling silos while trying to build integrations to avoid data silos.
Enterprise Data Landscape
There are a variety of applications and systems which store data in an Enterprise. Considering a typical Enterprise, some of the applications in use and the integration approach to use with those systems are:
- SaaS Applications: Most SaaS applications expose REST or SOAP APIs for integration purposes. For some, custom SDKs can be used to develop connectors.
- Custom On-Premises Applications: This includes custom applications running in the customer's data centers or in their IaaS accounts. Integration with such applications would use REST or SOAP APIs. Since the application is hosted by the customer, directly accessing the underlying database is also an option.
- Databases: Integrating with databases would use SQL queries, or bulk load/unload tools for higher throughput. The same applies to cloud-hosted databases. For NoSQL databases, custom SDKs or REST APIs would have to be used.
- Data Warehouses: Integrating with data warehouses would use SQL queries, or bulk load/unload tools for higher throughput. The same applies to cloud-hosted data warehouses.
- Data Lakes: Data lakes usually expose filesystem-level APIs for loading data into the lake (also known as data lake hydration). This applies to on-premises Hadoop-based data lakes and also to cloud-hosted data lakes using S3, ABFS, GCS, etc.
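As a rough illustration, the list above can be read as a lookup from system category to candidate integration approaches. The sketch below is only a restatement of that list in Python; the category keys and the `approaches_for` helper are hypothetical names chosen for this example, not part of any product.

```python
# Illustrative lookup of the integration approaches listed above.
# The keys and the helper name are assumptions made for this sketch.
INTEGRATION_APPROACHES = {
    "saas": ["REST API", "SOAP API", "custom SDK connector"],
    "custom_on_premises": ["REST API", "SOAP API", "direct database access"],
    "database": ["SQL queries", "bulk load/unload"],
    "nosql_database": ["custom SDK", "REST API"],
    "data_warehouse": ["SQL queries", "bulk load/unload"],
    "data_lake": ["filesystem-level API"],
}


def approaches_for(system_type: str) -> list[str]:
    """Return the candidate integration approaches for a system category."""
    # Normalize labels such as "Data Lake" or "custom-on-premises".
    key = system_type.strip().lower().replace(" ", "_").replace("-", "_")
    if key not in INTEGRATION_APPROACHES:
        raise ValueError(f"unknown system type: {system_type!r}")
    return INTEGRATION_APPROACHES[key]
```

For example, `approaches_for("Data Lake")` returns the filesystem-level API entry, while an unrecognized category raises a `ValueError` rather than guessing.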
Integration Tooling Options
The integration tooling options available when working with these applications include:
On-Premises Integration Tools
- Support ELT use cases for data integration
- Limited support for SaaS applications and non-relational data formats
- Difficult to maintain and update
Custom Integration Applications
- Custom-developed, in-house applications for integration
- Non-trivial to maintain and update, difficult to add new functionality
Message Queue Focused Tooling
- Provide message queue interface for application integration
- Developer-focused approach, rather than no-code
- Focused on in-house applications rather than SaaS applications
SaaS Focused Integration Tools
- Focused primarily on the integration of SaaS applications
- Limited support for on-premises use cases and data integration use cases
IaaS Ecosystem Focused Tools
- Tools focused on integration within a particular IaaS ecosystem
- Limited support for applications outside of the particular ecosystem
API Management Tools
- Support development of APIs to expose application data
- Require additional integration tooling for API authoring
Data Warehouse Integration Tools
- Support loading of application data into data warehouses
- Focused on data load into warehouses, limited support for data extract from warehouses
- No support for integration between SaaS applications, limited support for data lakes
Data Lake or Big Data Tools
- Support loading of application data into data lakes
- No support for integration between SaaS applications, limited support for databases and data warehouses
Tooling Silos
With the variety of integration tools available, enterprises risk creating integration tooling silos. Consider an Enterprise with an on-premises SAP instance and a Workday cloud application. It could face a scenario where multiple integration tools are required to integrate with these applications:
- On-Premises Integration Tools for integrating SAP data with other on-premises applications
- SaaS Focused Integration Tools for integrating Workday with other SaaS applications
- Data Warehouse Integration Tools for loading Workday data into a Cloud Warehouse (because the SaaS focused tool does not support data warehouse loads)
- Custom Integration applications for integrating SAP with Workday, since the SaaS integration tools do not work with on-premises SAP and the on-premises tools do not work well with complex data formats used by Workday
This leads to a scenario of integration tooling silos, where specialized tools are required for specific integration use cases. Multiple tools are used with one application endpoint since no single tool supports all the required use cases. The endpoint credentials need to be maintained in multiple integration tools. Every new integration needs to start with the consideration of which tool would have to be used for that use case.
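The credential duplication in this scenario can be made concrete with a small sketch. It records which tool category touches which endpoint in the SAP and Workday example, then lists the endpoints whose credentials end up stored in more than one tool. The data structure and function names are assumptions made for illustration, and the "other" endpoints are placeholders.

```python
from collections import defaultdict

# (tool category, endpoints it connects to) for the SAP/Workday scenario.
# Endpoint labels such as "other SaaS apps" are placeholders.
INTEGRATIONS = [
    ("On-Premises Integration Tool", {"SAP", "other on-premises apps"}),
    ("SaaS Focused Integration Tool", {"Workday", "other SaaS apps"}),
    ("Data Warehouse Integration Tool", {"Workday", "cloud warehouse"}),
    ("Custom Integration Application", {"SAP", "Workday"}),
]


def tools_per_endpoint(integrations):
    """Group tool names by the endpoint they connect to."""
    result = defaultdict(set)
    for tool, endpoints in integrations:
        for endpoint in endpoints:
            result[endpoint].add(tool)
    return dict(result)


def duplicated_credentials(integrations):
    """Return endpoints whose credentials live in more than one tool."""
    return {
        endpoint: sorted(tools)
        for endpoint, tools in tools_per_endpoint(integrations).items()
        if len(tools) > 1
    }


if __name__ == "__main__":
    for endpoint, tools in sorted(duplicated_credentials(INTEGRATIONS).items()):
        print(f"{endpoint}: credentials held by {len(tools)} tools")
```

In this model, Workday credentials are held by three tools and SAP credentials by two, which is the maintenance burden described above.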
Some of the technical challenges which come from use case-specific integration tooling are:
- Wrong tool for the job: If the only integration tooling available is a big data-focused tool, then all integration problems become big data problems. When all you have is a hammer, everything looks like a nail.
- Wrong design choices: If the integration tooling available is a data warehouse focused tool, then all integration problems require the use of the warehouse as a staging area. This is usually not the right design for application integration use cases and for scenarios involving log data where data lakes would be better suited.
- Uni-directional data movement: Some use case-specific tools work in one direction only. For example, data warehouse loading tools usually load data into a warehouse but do not support loading data from the warehouse into applications.
The challenges with such tooling silos are not just technical. There are also organizational challenges: multiple teams are needed to maintain the various integration use cases across different tools, and it becomes difficult to assign clear ownership of integration requirements within the enterprise.
SnapLogic Solution
At SnapLogic, we believe that a modern enterprise needs to be able to satisfy all its integration use cases with one integration platform. Such a platform needs to support a range of integration requirements, including:
- Data Integration: supporting a variety of filesystems, file formats, authentication mechanisms, etc.
- Application Integration: supporting native connectivity to most enterprise applications and providing REST and SOAP-based connectivity with a flexible data model to support modern data formats
- Ease of Use: making it easy for citizen integrators to develop integrations, rather than focusing on developers
- Hybrid Deployment: making it easy to access On-premises and Cloud endpoints seamlessly
- Big Data support: making it possible to automatically scale out and process data in parallel when data volume is larger than what can be processed on a single instance
- API management functionality: making it possible to use the data and application integration functionality to author APIs and expose them to end users without requiring external APIM tools
- Data Science functionality: making it possible to build, train and deploy machine learning models, using the integration features to load data for building models, ML features for training, and APIM functionality for deploying the model.
Each of the above requirements needs significant depth to satisfy the diverse use cases faced by enterprises. For example, a data integration use case could look, at a high level, like a simple case of reading files from S3. Drilling down, the actual details could involve Parquet format files that have to be read from S3, where the S3 file is accessible only through a custom IAM role and the data is encrypted with customer-managed keys. It is useful to look at the details of the use cases when evaluating integration tooling.
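One way to carry out that kind of detailed evaluation is to write each use case down as a set of required capabilities and compare it against what a candidate tool supports. The sketch below does this for the S3 example; the capability labels, class names, and the example tool are all hypothetical and do not describe any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    required: frozenset  # capability labels the use case depends on


@dataclass(frozen=True)
class Tool:
    name: str
    capabilities: frozenset  # capability labels the tool supports


def missing_capabilities(tool: Tool, use_case: UseCase) -> list:
    """Return the required capabilities the tool lacks, sorted by name."""
    return sorted(use_case.required - tool.capabilities)


# "Reading files from S3" once the details are spelled out.
S3_PARQUET = UseCase(
    name="Read files from S3",
    required=frozenset({
        "s3_read",
        "parquet_format",
        "custom_iam_role",
        "customer_managed_key_decryption",
    }),
)

# A made-up tool that only covers the high-level description.
basic_tool = Tool(
    name="hypothetical-basic-tool",
    capabilities=frozenset({"s3_read", "csv_format"}),
)

if __name__ == "__main__":
    gaps = missing_capabilities(basic_tool, S3_PARQUET)
    print(f"{basic_tool.name} is missing: {', '.join(gaps) or 'nothing'}")
```

A tool that matches the high-level description ("reads from S3") can still fail on every detail, which is why the checklist is built from the drilled-down requirements rather than the summary.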
The SnapLogic platform is deployed at some of the largest enterprises around the world and can solve all the integration challenges faced by enterprises. Using a single integration platform has significant advantages for the customer and helps avoid the challenges which come from integration tooling silos.