5 Reasons To Retire DataStage

Dhananjay Bapat

Legacy software is a drag on any enterprise. Organizations that have been around for decades have accumulated a lot of software to run their business. This includes various applications for business functions, data stores for operational data and analytical data, and software that connects them. But no software is perfect. To meet specific business needs, IT teams have to come up with workarounds, custom code, or complementary tools to cover those shortcomings. One name for this accumulation is “technical debt,” and just like financial debt, if your organization does not routinely pay it down (e.g., eliminate the custom code, consolidate different tools, and productize those workarounds), it piles up and makes your entire business inefficient.

IBM DataStage is one such tool. It has been around for decades. It might have served the needs of organizations in the past, but now it is weighing them down. It is on-premises software built for the enterprises of the 1990s, when the primary need was to move data between various data stores for analytics; it often runs on custom hardware and relies on a thick client. IBM DataStage is the technology of yesteryear, and here are five reasons why you should move away from it.

1. Painful upgrades

DataStage upgrades are complex, and they put development teams between a rock and a hard place. If you do choose to upgrade, your developers have to devote weeks to the pre-work (such as environment setup) and post-work (such as regression testing). You need a skilled DBA to make sure that upgrades are done correctly and in the right sequence. And while you are doing all of that, you can’t develop any new integrations. If you have a growing backlog of projects, you can’t afford to stop development even for a week, let alone a month. So before you embark on any upgrade, you have to weigh how much it will actually help you deliver on business objectives.

And if you decide not to upgrade, you lose out on the latest features. Often, teams forgo upgrades to keep delivering value to the business, and every skipped upgrade makes the next one that much more complex. DataStage sometimes does not support upgrading directly from an older version, and in that case your upgrade process can take twice as long and cost twice as much.

2. Compounding costs 

IBM DataStage is an expensive platform. It might have delivered ROI in prior decades, when most data was on-premises, but in today’s world the value delivered per dollar/pound/euro spent on DataStage is quite low. Enterprises are often locked into multi-year contracts that cost millions of dollars/pounds/euros a year. With 5- or 7-year contracts, you also cannot accurately predict your data volumes and compute capacity needs over the life of the agreement. Most organizations have seen their data volumes grow exponentially and feel severely constrained by their DataStage license and resources when they try to implement new business use cases.

Costs are also exacerbated by the long time it takes to develop integrations on DataStage and the skilled resources needed to build them. With DataStage, you have to define the schema for every object of every endpoint, which is time-consuming considering that each SaaS endpoint has hundreds of objects. It also leads to brittle integrations that need rework every time the SaaS applications change. DataStage is also a hard platform to learn, with a long learning curve, so you have to rely on expensive expert developers or outside consultants to build your integrations, which drives down your ROI.

3. Not suitable for a modern data platform

Since AWS’s founding in the 2000s, companies have been modernizing legacy software and moving data stores to the cloud. Cloud data platforms such as Amazon Redshift, Snowflake, and Databricks relieve IT and data teams of the costly and complicated task of installing, maintaining, scaling, and upgrading underlying hardware and software. Cloud data platforms have also been massively successful because enterprises’ needs for data have grown exponentially. DataStage provides connectivity to Snowflake, but your licensed DataStage software often will not support the latest versions of Snowflake, Redshift, and Databricks, negating the benefits of this cloud migration.

Additionally, DataStage is an XML-based platform and provides poor support for modern APIs. Most modern APIs are RESTful; they accept requests and return responses in JSON, whose structure can vary from one record to the next. Because most integrations today deal with data from modern cloud-based applications and data stores, the IBM DataStage platform is clunky and unsuitable for them.
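
To make "variably structured" concrete, here is a minimal Python sketch of three records that could plausibly come back from the same REST endpoint, each with a different shape. The payload, the field names, and the helper functions `flatten` and `to_rows` are all hypothetical and exist only for illustration; this is not how DataStage or SnapLogic processes data. It simply shows why a pipeline that expects one fixed schema per object struggles with this kind of input, and what a schema-tolerant approach has to do instead.

```python
import json

# Hypothetical API response: one endpoint, three differently shaped records.
RAW = """
[
  {"id": 1, "name": "Acme", "address": {"city": "Austin", "zip": "78701"}},
  {"id": 2, "name": "Globex", "tags": ["priority", "emea"]},
  {"id": 3, "contact": {"email": "ops@example.com"}}
]
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys; non-dict values are kept as-is."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_rows(records):
    """Build rows over the union of all observed columns, filling gaps with None."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({key for rec in flat_records for key in rec})
    rows = [{col: rec.get(col) for col in columns} for rec in flat_records]
    return columns, rows


records = json.loads(RAW)
columns, rows = to_rows(records)

print(columns)
for row in rows:
    print(row)
```

The column set here is discovered from the data rather than declared up front, so a new field in the source adds a column instead of breaking the mapping. A fixed, hand-defined schema would need to be edited for each such change.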

4. Fragmented operational view

IBM DataStage can only do data integration. It was built for connecting on-premises databases and data warehouses. It provides some connectivity to cloud data sources such as Snowflake and Redshift but often does not support their latest features. That means you will have to use other integration tools for API management and for automating business processes such as quote-to-cash and employee onboarding. With multiple integration tools, your team needs multiple skill sets to build the integrations that automate the enterprise. Additionally, you get a fragmented operational view of your integrations, which forces context-switching, complicates debugging when something fails, and significantly reduces integrator productivity.

5. Slow time to innovation and insights

It takes a long time to build integrations with IBM DataStage. Having to specify schemas, and the lack of a standard way to handle errors, slow down development teams. As a result, with DataStage, your development team’s backlog will only grow as your business teams look to automate more business processes and data movements. Because of less-than-optimal connectivity to popular cloud-based services such as Snowflake, Redshift, Databricks, and Google BigQuery, your team also misses out on the latest innovations from these future-defining data platforms. These hurdles ultimately affect your business’s ability to glean insights from all sources of data, something that will help you stand out and thrive in this competitive world.

Any one of these reasons can be important for your organization. If you want to move away from IBM DataStage and learn how SnapLogic can help you solve your challenges, request a demo here.

Dhananjay Bapat
Senior Technical Product Marketing Manager at SnapLogic