This post originally appeared on Data Informed.
As organizations look to increase their agility, IT and lines of business need to connect faster. Companies need to adopt cloud applications more quickly, and they need to be able to access and analyze all their data, whether it comes from a legacy data warehouse, a new SaaS application, or an unstructured source such as social media. In short, a unified integration platform has become a critical requirement for most enterprises.
According to Gartner, “unnecessarily segregated application and data integration efforts lead to counterproductive practices and escalating deployment costs.”
Don’t let your organization get caught in that trap. Whether you are evaluating what you already have or shopping for something completely new, you should measure any platform by how well it addresses the “three A’s” of integration: Anything, Anytime, Anywhere.
Anything Goes
For today’s enterprise, the spectrum of what needs to be integrated is broader than ever. Most companies are dealing with many different data sources and targets, from software-as-a-service applications to on-premises ERP/CRM, databases and data warehouses, Internet of Things (IoT) sensors, clickstreams, logs, and social media data, just to name a few. Some older sources are being retired, but new sources are being added, so don’t expect simplicity any time soon. Instead, focus on making your enterprise ready for “anything.”
Beyond point-to-point. You may have managed integration before on a point-to-point basis. This approach is labor intensive: it requires hand-coding to get up and running, plus additional coding any time either “point” changes. When that happens, the integration can break, and you are left waiting for your IT department to get around to fixing it. The more serious problem, though, is that this inflexible approach simply doesn’t scale to support enterprise-wide integration in a time of constant change.
Some modern concepts, when applied to integration, provide this flexibility and scale.
Microservices. In this architectural approach, IT develops a single application as a suite of small services that communicate with each other through lightweight REST APIs. During the past year or so, microservices have become the standard architecture for developing enterprise applications.
When applied to integration, this approach opens up tremendous opportunity for achieving large-scale integration at a very low cost. Instead of one big execution engine running every integration, many smaller execution engines each run a subset of them. This way, you can supply more compute power to the integrations that need it, when they need it. You can also distribute the integrations among nodes in a cluster based on volume variations, for horizontal scaling.
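To make the idea concrete, here is a minimal sketch, assuming Flask is available, of one integration step packaged as its own small service. The /transform endpoint, port, and field handling are illustrative assumptions, not any particular product’s API.

```python
# Hypothetical example: one integration step as a standalone microservice.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/transform", methods=["POST"])
def transform():
    # One small, self-contained integration: normalize an inbound record
    # before handing it to the next service over a lightweight REST call.
    record = request.get_json(force=True)
    normalized = {key.lower(): value for key, value in record.items()}
    return jsonify(normalized)

if __name__ == "__main__":
    # Supplying more compute to a "hot" integration means running more
    # replicas of this one process behind a load balancer.
    app.run(port=8080)
```

Because each integration runs as its own process, a cluster scheduler can place extra replicas of only the busy ones, rather than scaling one monolithic engine.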
The document data model. Modern applications produce more than just row-and-column data. So how do you achieve loose coupling while simultaneously supporting semi-structured and unstructured data, all without sacrificing performance? By using a document data model to store data, you can group related data more naturally and logically and loosen the restrictions on database schema. Document-based data models help with loose coupling, brevity of expression, and overall reuse.
In this approach, each record and its associated data are treated as a “document,” an independent unit that improves performance and makes it easier to distribute data across multiple servers while preserving locality. You can turn hierarchical object data into a document, but the conversion only goes one way: documents are a superset of row-and-column records, so while you can fit rows and columns into a document, the reverse doesn’t hold.
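As a minimal sketch of that one-way mapping, the plain-Python example below folds a hypothetical order table and its order-line table into a single JSON-style document; all the field names are illustrative.

```python
import json

# Relational form: the order and its lines live in separate tables,
# joined by order_id.
order_row = {"order_id": 1001, "customer": "Acme Corp"}
order_line_rows = [
    {"order_id": 1001, "sku": "A-100", "qty": 2},
    {"order_id": 1001, "sku": "B-205", "qty": 1},
]

# Document form: the record and its associated data travel as one unit,
# which preserves locality when data is distributed across servers.
order_doc = {
    "order_id": 1001,
    "customer": "Acme Corp",
    # Nesting replaces the join; schema restrictions are loosened, so
    # individual lines could carry extra fields without a migration.
    "lines": [{"sku": r["sku"], "qty": r["qty"]} for r in order_line_rows],
}

print(json.dumps(order_doc, indent=2))
```

Going the other way, flattening an arbitrarily nested document back into fixed rows and columns, is where the mapping breaks down.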
Anytime is Real Time, and It’s Happening Now
Today’s use cases, such as recommendation engines, predictive analytics, and fraud detection, increasingly demand real-time, “anytime” capture and processing of data from applications. A modern integration platform needs a streaming layer that can handle real-time use cases as well as batch processing.
Many organizations are used to choosing tools based on data velocity: ESB platforms for event-based, low-latency application integration, and ETL tools for high-volume batch jobs. Now, though, enterprises have to look for the simplicity and flexibility of a framework that can support both batch and real-time, “anytime” processing. Architectures like the Lambda architecture are a result of that need.
The Lambda architecture is designed to balance latency and throughput when handling batch and real-time use cases. The batch layer provides comprehensive, accurate views and can reprocess the entire data set available to it if an error occurs; because it operates at high latency, it is complemented by a speed layer that processes streaming data in real time. The serving layer consists of appropriate databases for the batch and speed layers, whose views can be combined and queried to get answers from the data.
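Here is a minimal, in-memory sketch of that three-layer split, with illustrative names; a real deployment would use a distributed batch store and a streaming engine rather than Python dictionaries.

```python
from collections import defaultdict

master_dataset = []               # immutable, append-only record of all events
realtime_view = defaultdict(int)  # speed layer: incremental, low latency
batch_view = {}                   # batch layer: complete, high latency

def ingest(event):
    """New events feed both layers, as in the Lambda architecture."""
    master_dataset.append(event)
    realtime_view[event["user"]] += event["amount"]

def run_batch_layer():
    """Recompute the batch view from scratch; errors are self-healing
    because the entire data set is reprocessed."""
    global batch_view
    totals = defaultdict(int)
    for event in master_dataset:
        totals[event["user"]] += event["amount"]
    batch_view = dict(totals)
    realtime_view.clear()  # these events are now covered by the batch view

def query(user):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view.get(user, 0) + realtime_view.get(user, 0)

ingest({"user": "alice", "amount": 40})
ingest({"user": "alice", "amount": 2})
run_batch_layer()                  # slow, comprehensive pass
ingest({"user": "alice", "amount": 8})  # arrives after the batch run
print(query("alice"))              # 50: batch view (42) + speed layer (8)
```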
Because of these real-time use cases, streaming platforms have become very desirable.
Anywhere Should Look Like Everywhere
With today’s hybrid data and deployment architecture, your data can be anywhere, in any format, and might need a different processing treatment based on the particular use case. So, for example:
- If all your applications and data are in the cloud, you would need a cloud-first approach for all the other parts of the application ecosystem, including integration.
- If you have a hybrid architecture comprising both on-premises data and cloud applications, you may need to restrict data from leaving the premises. Consider an on-premises data plane for processing within the firewall.
- If you have a big data ecosystem, you probably need the flexibility to run natively on Hadoop, using the YARN resource manager, and to run any integration or transformation jobs as MapReduce, as in the sketch below.
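For example, a transformation job can be written against Hadoop Streaming’s stdin/stdout contract so that YARN schedules it as an ordinary MapReduce job. The comma-separated input format and the “OK” status filter below are assumptions for illustration.

```python
#!/usr/bin/env python3
# Hypothetical transformation job for Hadoop Streaming: count valid
# records per key, filtering out rows whose status field is not "OK".
import sys

def mapper():
    # Emit key<TAB>count pairs; Hadoop shuffles and sorts them by key.
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) >= 3 and fields[2] == "OK":
            print(f"{fields[0]}\t1")

def reducer():
    # Input arrives sorted by key; sum the counts for each key.
    current, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = key, 0
        total += int(value)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # Hadoop Streaming would invoke this file as "job.py map" for the
    # map phase and "job.py reduce" for the reduce phase.
    if len(sys.argv) > 1 and sys.argv[1] == "map":
        mapper()
    else:
        reducer()
```

A job like this would be submitted through the Hadoop Streaming jar with its -input, -output, -mapper, and -reducer options, letting YARN handle placement and scaling.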
Meanwhile, Spark has been gaining a lot of traction for low-latency jobs. For use cases that require analysis in real time, such as fraud detection, log processing, and processing data from IoT devices, Spark is an ideal processing engine.
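As a minimal sketch of that kind of low-latency job, the Structured Streaming example below uses Spark’s built-in "rate" source as a stand-in for a real event feed, and a trivial modulo rule as a stand-in for a real fraud-detection model.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fraud-screen-sketch").getOrCreate()

# The built-in "rate" source emits (timestamp, value) rows; treat "value"
# as a hypothetical transaction amount for illustration.
events = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

# Flag "transactions" matching an arbitrary rule in near real time; a real
# job would apply an actual scoring model here.
suspicious = (events
              .filter((F.col("value") % 97) == 0)
              .withColumn("flag", F.lit("review")))

query = (suspicious.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```

The same job can run locally during development and be submitted to a YARN-managed cluster unchanged, which is exactly the deployment flexibility a hybrid architecture calls for.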
Integration is at the heart of every successful social, mobile, analytics (big data), cloud, and IoT initiative. It’s no longer possible to scale up successfully while juggling separate tools and teams for application, process, and data integration. Successful enterprises today need to access and stream resources instantly through a single platform. When you have the right platform – one that provides anything, anytime, anywhere it’s needed – your users will never need to stop and ask what resources are being used, whether information is current, or where it’s located. Whatever they need will be available to them when they need it and wherever they are.
I also posted this on LinkedIn. Comments are welcome on Data Informed, LinkedIn or here.