My Not Hot Take: AI Isn't Replacing ETL Platforms But It Is Redefining What They Need to Be

AI won't replace ETL platforms, but it's exposing how outdated most are. Why unified, reliable data infrastructure still wins in the AI era.
Benjamin Segal, Co-Founder & CEO

I keep getting asked the same question: is AI going to replace ETL platforms?

The short answer is no. The longer answer is that AI is exposing how outdated most ETL platforms actually are, and forcing the entire category to evolve or become irrelevant. This was originally going to be a LinkedIn post, but I realized I had way too much to say about this topic.

The problem is getting worse, not better

According to a recent MIT Technology Review survey of 400 senior tech executives, 77% say their data engineering teams' workloads are getting heavier despite deploying AI tools. Gartner estimates poor data quality still costs organizations an average of $12.9 million per year.

Teams are adding more tools and spending more money. The problem keeps growing. That should tell us something.

Where AI actually helps

AI is solving real problems in data engineering. Schema mapping that used to take weeks now takes days. Pipelines can detect anomalies and self-correct. Natural language interfaces let analysts self-serve instead of filing tickets and waiting. AI can write a script, generate SQL, build dbt models, and debug transformation logic faster than any human.

But the pipeline itself still needs to run deterministically. Filtering records, joining tables, running calculations: these need to produce the exact same result every time. The value of AI is in building and maintaining that logic faster. The value of the platform is making sure it executes reliably.
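To make "deterministic" concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only (the `run_pipeline` and `fingerprint` names, the order and customer fields); it is not how any particular platform implements this. The idea is simply that a filter, a join, and a calculation with no randomness, no clock reads, and a fixed output order will hash to the same value on every run.

```python
import hashlib
import json


def run_pipeline(orders, customers):
    """Filter, join, and aggregate without randomness or clock reads."""
    # Filter: keep only completed orders.
    completed = [o for o in orders if o["status"] == "completed"]

    # Join: look up each order's region by customer id.
    region_by_id = {c["id"]: c["region"] for c in customers}

    # Calculate: total revenue per region, in integer cents so that
    # summation order cannot change the result.
    totals = {}
    for o in completed:
        region = region_by_id.get(o["customer_id"], "unknown")
        totals[region] = totals.get(region, 0) + o["amount_cents"]

    # Sort so the output order never depends on the input order.
    return sorted(totals.items())


def fingerprint(rows):
    """Hash the output so two runs can be compared cheaply."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample data.
ORDERS = [
    {"customer_id": 1, "status": "completed", "amount_cents": 1000},
    {"customer_id": 2, "status": "cancelled", "amount_cents": 500},
    {"customer_id": 2, "status": "completed", "amount_cents": 250},
]
CUSTOMERS = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]

first = fingerprint(run_pipeline(ORDERS, CUSTOMERS))
second = fingerprint(run_pipeline(list(reversed(ORDERS)), CUSTOMERS))
print(first == second)  # True: same input rows, same result, any order
```

An AI agent can write or repair a function like `run_pipeline`. Checking that every scheduled run produces the same fingerprint for the same input is the kind of job that stays with the platform.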

"Why not just run Claude Code on open-source tools?"

This is the other question I keep hearing. And it's a fair one. Teams are already using AI agents to debug Airflow DAGs, fix broken dbt models, and investigate data quality issues.

But there's a meaningful gap between "AI can fix a pipeline when you point it at one" and "AI can run your data infrastructure." Who's monitoring at 3am? Who handles schema drift automatically across hundreds of connectors? Who owns lineage at scale?

The connectors themselves are becoming table stakes. The real value is in operational reliability. And that's still platform territory.

That said, I've seen some awesome implementations of open-source tools with Claude running maintenance and self-healing pipelines. It's impressive, and it's only going to get better. Our goal with Matia is to run alongside these, not compete with them.

The market is shifting fast

Look at what's happened in just the last two years. Salesforce acquired Informatica. Qlik absorbed Talend. Fivetran and dbt merged. A number of smaller catalog and observability players have been gobbled up by larger companies. That's a market screaming for consolidation and unification.

Open table formats like Apache Iceberg are accelerating this. When your data lives in open, engine-agnostic formats, you're no longer locked into one vendor's view of your ecosystem. A platform built on that foundation can provide truly unified context (lineage, schemas, dependencies) across everything. And that's the context AI needs to be genuinely useful.

Unification matters more than people realize. AI needs context to work. An agent can fix a broken pipeline, but without visibility into the full data ecosystem, it's working blind. It doesn't know what's connected to what, what depends on what, or what breaks downstream when something changes upstream. A unified platform provides that context. It knows the lineage, the schemas, the dependencies, the schedules. That's what makes AI actually useful at scale, not just another tool duct-taped to the side.
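Here is a rough sketch of what "knowing what breaks downstream" means in practice. It assumes lineage is available as a simple mapping from each asset to the assets that read from it; the `downstream_of` function and the table names are made up for illustration and do not reflect any specific product's API.

```python
from collections import deque


def downstream_of(lineage, changed):
    """Return every asset reachable from `changed`, breadth-first."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        # Assets with no recorded consumers simply have no children.
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: each key feeds the assets in its list.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.churn"],
    "marts.revenue": ["dashboard.finance"],
    "marts.churn": [],
}

impacted = downstream_of(LINEAGE, "raw.orders")
print(impacted)
```

An agent pointed at a single broken model only sees that model. Hand it a list like `impacted` along with the fix request, and it can reason about the finance dashboard before it changes a column upstream.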

The platforms that win won't be the ones that just bolted an AI assistant onto the sidebar. They'll be the ones that can handle the full breadth of modern data needs: real-time streaming, unstructured data, integration with LLM and RAG architectures, and expanding connector ecosystems that keep pace with how fast teams adopt new tools.

And they still need to do the "boring" stuff reliably. Move data from point A to point B. Clean it up. Land it somewhere useful. Every time, without fail.

The question was never really "will AI replace ETL platforms."

It's which platforms are built for what data engineering actually looks like, both now and in the future.

I think I know the answer.