Data volumes are growing exponentially, and many organizations are starting to realize how complex their data movement and data management solutions have become. Data lives in many different systems, and getting meaningful value out of it has become a major challenge for many companies. Most of this data is stored in relational systems like MySQL, PostgreSQL and Oracle, the mainstream databases used primarily for OLTP. NoSQL systems like Cassandra, MongoDB and DynamoDB have also emerged, with tunable consistency models, to store some of this mission-critical data. Customers then typically move the data to much bigger systems like Teradata and Hadoop (OLAP) that can store large amounts of it, so they can run analytics, reporting or complex queries against it. There is also a recent trend of moving some of this data to the cloud, especially to Amazon Redshift or Snowflake, and also to Azure HDInsight or Azure SQL Data Warehouse.
Here is how the previous and current data flow trends across these systems look:
As an example, consider a major food or retail company. The traditional way to analyze buying patterns and trends was to store data in flat files or operational databases and later move it to a data warehouse system like Teradata (the previous trend). This was largely a batch-oriented process with no real-time insights. For example, the coupons customers received when they shopped were based purely on historical searches and recommendations. Some of these coupons may have been relevant for many purchasers, but most of the time they were not.
Fast forward to the current trends, and data is exploding globally. The number of systems capturing it (sensors, etc.) is increasing too, so scalable NoSQL systems have emerged to store these varied data sets. These ever-growing data sets have to be moved to low-cost, scalable systems for analytics. This is where big data evolved, and users started moving data to Hadoop instead of Teradata, mainly for cost reasons. All of this is then typically integrated with BI tools like Tableau for data visualization. With this setup, the food and retail industry can quickly send data to different systems and analyze it on the fly with a quick feedback loop. This leads to an efficient, customer-centric approach, such as printing coupons based on a combination of what customers have bought recently and earlier.
Looking ahead, here is how future data flow trends across these systems might look:
As seen in the above diagram, the cloud is here to stay and is becoming more prominent than ever. Customers are slowly moving data to the cloud so they can reduce in-house infrastructure maintenance, cost and other complexities. Some of the future data trends include:
- Moving data from operational and NoSQL systems to Redshift or Snowflake
- Moving data from Teradata to Hadoop in the cloud, such as Azure HDInsight, or to Azure SQL Data Warehouse
- Moving data from sensors/event feeds to Azure Data Lake or Amazon S3. Later, Amazon Glacier could be used to store cold/infrequently used data.
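To make the last flow in the list more concrete, here is a minimal Python sketch of the hot/cold decision behind it. Everything in it is an assumption for illustration: the function names, the 90-day threshold and the plain `"s3"` / `"glacier"` labels are hypothetical, and no cloud API is called. In practice, a storage service's own lifecycle rules would typically handle this kind of transition.

```python
from datetime import datetime, timedelta


def choose_tier(last_accessed, now, cold_after_days=90):
    """Return 'glacier' for data not read within cold_after_days, else 's3'.

    The 90-day default is an illustrative assumption, not a recommendation.
    """
    age = now - last_accessed
    return "glacier" if age > timedelta(days=cold_after_days) else "s3"


def plan_tiers(objects, now, cold_after_days=90):
    """Map each object key to the storage tier it should live in."""
    return {
        key: choose_tier(last_accessed, now, cold_after_days)
        for key, last_accessed in objects.items()
    }
```

A sensor feed read last week would stay in the hot tier, while one untouched for six months would be planned for cold storage.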
All in all, data movement across these systems has become critical to deriving core business value. Done manually, it requires additional ETL development effort. SnapLogic makes it easy for users to move data between systems, both applications and databases/data warehouses (on-premises or cloud), by building pipelines from pre-built connectors called Snaps, without any coding effort. With the simple drag-and-drop mechanism provided by SnapLogic’s scalable platform, customers can build pipelines with the relevant Snaps and move data between systems so they can quickly derive business value from it. Snaps also perform data transformation, cleanup and wrangling so that downstream systems get the data they need to support business outcomes.
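For a sense of what the manual ETL effort involves, here is a minimal hand-written extract-transform-load sketch in Python. It is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the operational source and the analytics target, and the table names (`orders`, `purchases`), columns and cleanup rules are all hypothetical. It does not use or represent any SnapLogic API.

```python
import sqlite3


def extract(source):
    """Read raw order rows from the (stand-in) operational database."""
    return source.execute("SELECT customer, item, amount FROM orders").fetchall()


def transform(rows):
    """Clean up rows: normalize names and drop rows with no amount."""
    cleaned = []
    for customer, item, amount in rows:
        if amount is None:
            continue  # skip incomplete records
        cleaned.append(
            (customer.strip().lower(), item.strip().lower(), round(float(amount), 2))
        )
    return cleaned


def load(target, rows):
    """Write cleaned rows into the (stand-in) analytics database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS purchases (customer TEXT, item TEXT, amount REAL)"
    )
    target.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    target.commit()
    return len(rows)


def run_pipeline(source, target):
    """Chain the three steps and return the number of rows loaded."""
    return load(target, transform(extract(source)))


def demo():
    """Build sample data, run the pipeline, and total spend per customer."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (customer TEXT, item TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(" Alice ", "Milk", 2.499), ("Bob", " Bread", None), ("alice", "Eggs ", 3.0)],
    )
    target = sqlite3.connect(":memory:")
    loaded = run_pipeline(source, target)
    totals = target.execute(
        "SELECT customer, SUM(amount) FROM purchases "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return loaded, totals
```

Even this toy version needs schema handling, cleanup rules and load logic written by hand for a single source and target; each additional system adds more of the same.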
A few SnapLogic pipelines for the current and future trends are shown below:
In summary, the cloud will become mainstream as more and more enterprises move data to it, thanks to the benefits and flexibility mentioned here. And SnapLogic’s cloud platform is at the forefront of helping this happen.