Your data science workflow is broken...



Hello Reader,

In 2026, if you’re still manually exploring data by typing commands like df.isnull().sum(), you’re probably wasting your time.

I used to be pretty skeptical about using AI for data analysis. It felt wrong somehow. Like you’d lose rigor. Or worse, you’d stop understanding your own data.

But lately… something has shifted.

The tools have gotten better. A lot better. And I’m starting to see a real tipping point in how data scientists actually work day-to-day.

In this week’s newsletter, I want to share 4 practical ways you can leverage AI in your workflow as a data scientist or analyst.

Let's go! 💪


1. Less time cleaning data, more time (actually) analyzing data

If you’ve ever worked on a real data science project, you know the hard truth: too much time spent preparing data, and not enough time actually analyzing it.

Most of your hours go into writing repetitive code: checking data types, handling missing values, plotting distributions, looking at correlations, etc.

Not finding patterns, generating hypotheses, or digging into the interesting questions.
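For context, a typical round of those manual first-pass checks looks something like this in plain pandas (the toy DataFrame is invented for illustration):

```python
# The repetitive first-pass checks, in plain pandas
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, None, 41, 29],
    "income": [40_000, 55_000, 48_000, None, 52_000],
    "churned": [0, 1, 0, 0, 1],
})

print(df.isnull().sum())   # missing values per column
print(df.dtypes)           # data types
print(df.describe())       # summary statistics at a glance
print(df.corr())           # pairwise correlations
```

Nothing hard, but you end up writing some version of this for every single dataset.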

So here are 2 tools I’ve come across that can take a lot of that repetitive exploratory work off your plate:

1️⃣ Pandas AI

PandasAI is a Python library that makes it easy to ask questions of your data in natural language. It's fully open source, and you can hook it up to any LLM you like.
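Here's a minimal sketch of how that looks (illustrative only — the API shown is from the PandasAI 2.x releases and may differ in newer versions, the data is made up, and you'd need your own API key to actually run it):

```python
# Illustrative sketch — requires `pip install pandasai` and an LLM API key
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

df = pd.DataFrame({"country": ["NL", "DE", "FR"], "sales": [120, 340, 210]})

llm = OpenAI(api_token="YOUR_API_KEY")       # any supported LLM works here
sdf = SmartDataframe(df, config={"llm": llm})

sdf.chat("Which country has the highest sales?")  # ask in plain English
```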

2️⃣ Data Formulator

This one comes out of Microsoft Research. And the easiest way to think about it is this:

Excel formulas + Power Query + modern AI.

You load your dataset and start interacting with it in natural language. But the nice part is that the tool shows you exactly how the data is being transformed and visualized, step by step. Nothing hidden. Which means the workflow stays transparent and reusable.

There’s also an agent mode that can help surface ideas from the data. Give it a broad prompt like “show me interesting trends in this dataset”, and it will start transforming the data, generating visualizations, and suggesting directions you can explore further.

If you already spend a lot of time in the Microsoft ecosystem (e.g. Excel or Power Query), it's definitely worth experimenting with.

2. Automating your workflow with MCP

Top data scientists automate most of their workflow.

One thing I recently discovered that honestly changed how I think about productivity as a data scientist is MCP (Model Context Protocol).

Think about your typical day.

You're jumping between 5 different apps. You're querying databases, pushing code to GitHub, sharing updates on Slack, pulling docs from Google Drive.

That constant back and forth kills your productivity.

MCP basically lets you connect all of those tools to an AI assistant like Claude, so instead of bouncing between apps, you're doing everything from one place.

Here’s an example:

I installed Claude Desktop and connected my PostgreSQL database through MCP. Now, instead of opening a SQL client, writing a query, running it, and copying the results, I can just ask:

“Show me the top 10 customers by revenue this month.”

And the result comes back instantly in the chat. One conversation instead of 4 separate steps!
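For reference, the wiring lives in Claude Desktop's claude_desktop_config.json file. A sketch of what the PostgreSQL setup can look like (the server package name follows the MCP reference servers; the connection string is a placeholder you'd swap for your own):

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://localhost/mydb"
      ]
    }
  }
}
```

Restart Claude Desktop after editing the config, and the database shows up as a tool the assistant can query.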

Now, not every employer allows tools like Claude Desktop due to security policies, and that's fair. But even if you can’t use it at work yet, it’s worth experimenting with on personal projects.

And honestly, I think it's only a matter of time before this kind of workflow becomes standard.

3. Creating Synthetic Datasets for Prototyping

Before you can build any model, you need data. But getting it is often messy.

Maybe the data is locked behind privacy restrictions, which happens all the time in areas like healthcare or finance. Maybe the dataset doesn’t exist yet. Or sometimes you just want to test an idea quickly without waiting three weeks for IT to grant database access.

That’s where synthetic data becomes really useful.

Here’s a simple example. I asked Claude to generate a full e-commerce database with customers, products, orders, and order items — complete with relationships between the tables and realistic fake data so the dataset actually behaves like a real system.

This was the prompt I used:

Create a SQL script for an e-commerce database called demo_store. I need four tables: customers, products, orders, and order_items, all connected with proper relationships. Fill them with realistic synthetic data so I can run queries and get meaningful results.

Within seconds, I had a working dataset I could use to test queries, build dashboards, or prototype analysis ideas.
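To make this concrete, here's a scaled-down sketch of the kind of script that prompt produces — the same four tables, wired up with foreign keys, written against SQLite so it runs anywhere (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway demo_store database
cur = conn.cursor()

cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id));
CREATE TABLE order_items (order_id   INTEGER REFERENCES orders(id),
                          product_id INTEGER REFERENCES products(id),
                          quantity   INTEGER);

INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO products  VALUES (1, 'Laptop', 900.0), (2, 'Mouse', 25.0);
INSERT INTO orders    VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO order_items VALUES (1, 1, 1), (1, 2, 2), (2, 2, 1), (3, 2, 3);
""")

# A query you can now prototype against: revenue per customer
rows = cur.execute("""
    SELECT c.name, SUM(p.price * oi.quantity) AS revenue
    FROM customers c
    JOIN orders o       ON o.customer_id = c.id
    JOIN order_items oi ON oi.order_id   = o.id
    JOIN products p     ON p.id          = oi.product_id
    GROUP BY c.name
    ORDER BY revenue DESC
""").fetchall()
print(rows)
```

A real generated script would have hundreds of rows instead of a handful, but the shape is the same: a schema with proper relationships, plus enough fake data that queries return meaningful results.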

4. Modelling with foundation models

Unlike traditional ML models that are custom-built for specific tasks, foundation models are trained on massive, varied datasets.

Foundation models are no longer just for chatbots and image generation. They're being adopted in core data science work like time series forecasting, tabular prediction, and even recommendation systems.

How it works: Instead of training a model from scratch for every task, you start with a foundation model, then you simply adapt it to your problem.

Let's say you're building a time series forecasting model. You usually collect your data → clean it → engineer your features → pick a model → train it → fine-tune it → evaluate it. And if your company wants forecasts across 200 product lines? You would normally have to repeat that process over and over.

But nowadays, you can use foundation models like Amazon's Chronos or Nixtla's TimeGPT. You just feed in your data and get a forecast.

No feature engineering. No model selection. No training loop.
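For a flavour of what that looks like with Chronos (a sketch, not run here — it assumes `pip install chronos-forecasting` plus a pretrained model download, and the sales history is a placeholder):

```python
# Sketch: zero-shot forecasting with a pretrained Chronos model
import torch
from chronos import ChronosPipeline

# Load a pretrained model — no training loop of your own
pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")

history = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])
forecast = pipeline.predict(history, prediction_length=4)  # samples for the next 4 steps
```

That's the whole pipeline: hand the model your raw series and it produces forecast samples out of the box.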

Netflix recently replaced a whole stack of separate recommendation models with ONE foundation model trained on billions of user interactions, and now every team just fine-tunes on top of it.


Whether we like it or not, the data science field is shifting pretty quickly.

But honestly, that’s not a bad thing. It just means the people who learn to use these new tools will have opportunities that didn’t even exist a couple of years ago.

I'd encourage you to pick 1 thing I covered in this newsletter, and try it out:

  • If you want to speed up exploratory analysis, check out tools like Pandas AI or Data Formulator.
  • If you’re tired of jumping between 5 different apps all day, MCP can connect much of your workflow in one place.
  • If you need a quick dataset for demos or experiments, use tools like Claude or ChatGPT to generate synthetic data.
  • And if you want to build models faster, explore using foundation models so you don’t always have to start from scratch.

I’ll be diving deeper into this topic in an upcoming video on my YouTube channel. Stay tuned! 😊

Have a great week ahead! 🙌
Thu


P.S. Work with me:

If you want a comprehensive course from Python fundamentals to building AI applications, check out my Python for AI Projects course. It’s packed with everything you need to build solid fundamentals and transform your skills in 2026.

Thu Vu

Say hi 🙌 on Youtube, LinkedIn, or Medium

Check out my older posts Here

