By Juan Fernandez (data engineer, Amarji Ltd)
At Amarji, we helped a client slash their Microsoft Fabric daily refresh time from around 80 minutes to just 15 — a reduction of roughly 81%.
Instead of throwing compute at the problem, we focused on removing overhead, optimising data flows, and using the right tools for the right job.
1. Full-Load Tables
Problem:
Some wide tables were being fully reloaded each day, even though several large columns weren’t used by analytics. These unnecessary columns increased I/O and slowed extraction.
Change:
We excluded the unused columns directly at the data source.
Result:
Refresh time dropped from 27 minutes to 7 — a 74% improvement.
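The idea of pushing column pruning all the way to the source can be sketched as follows. This is a minimal illustration, not our client's actual code; the table and column names are hypothetical.

```python
# Sketch: name only the analytics columns in the extraction query, so
# unused wide columns never leave the source. Names are hypothetical.

NEEDED_COLUMNS = ["order_id", "customer_id", "order_date", "amount"]

def build_extract_query(table: str, columns: list[str]) -> str:
    """Build a SELECT listing only the needed columns, instead of a
    SELECT * that drags every wide column through the pipeline."""
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table}"

query = build_extract_query("dbo.Orders", NEEDED_COLUMNS)
print(query)  # SELECT order_id, customer_id, order_date, amount FROM dbo.Orders
```

The same pruning can be done with a view on the source or an explicit query in the Copy activity; the point is that fewer bytes are read, serialized, and written per refresh.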
2. Append-Only Tables with a Monotonic Primary Key
Problem:
Daily deltas were small, but the PySpark notebooks managing the append logic still had to start a Spark cluster each time, adding ~4 minutes per run just for startup.
Change:
We eliminated Spark for this task. Instead, a Lookup activity reads the max key directly, and a Copy activity appends only the new rows.
Result:
Refresh time fell from 17 minutes to 2 — an 88% improvement.
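The Lookup + Copy pattern is configured in the Fabric pipeline rather than written as code, but its logic is simple enough to sketch. Below, sqlite3 stands in for both the source system and the Lakehouse destination; in Fabric, the first query is the Lookup activity and the second drives the Copy activity.

```python
import sqlite3

# Sketch of the incremental-append logic behind Lookup + Copy, using
# sqlite3 as a stand-in for the source and destination stores.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
dst.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

src.executemany("INSERT INTO events VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "c")])
dst.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])

# "Lookup": read the max key already present in the destination.
max_key = dst.execute("SELECT COALESCE(MAX(id), 0) FROM events").fetchone()[0]

# "Copy": append only rows whose key is newer than that watermark.
new_rows = src.execute(
    "SELECT id, payload FROM events WHERE id > ?", (max_key,)).fetchall()
dst.executemany("INSERT INTO events VALUES (?, ?)", new_rows)

print(dst.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 3
```

Because the key is monotonic, a single comparison replaces any need for change tracking, and no cluster has to spin up to move a handful of rows.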
3. Tables with Updatable Rows
Problem:
For tables that allowed updates, we ran frequent micro-batches using PySpark notebooks. Again, the compute startup cost outweighed the actual work.
Change:
We switched to Python notebooks using DuckDB for lightweight local joins and merges, writing back to the Lakehouse afterward.
Result:
Refresh time dropped from 35 minutes to 5 — an 86% improvement.
The Overall Impact
Across all patterns, total refresh time dropped from ~80 minutes to 15, an 81% faster pipeline overall.
For the client, that means:
- More frequent Power BI refreshes
- Shorter compute windows
- Lower operating cost per run
By cutting refresh times by 81%, the client didn't just save minutes; they gained flexibility, lower costs, and the ability to make data-driven decisions faster.
Key Takeaways
- Trim early. Remove unnecessary data at the source — fewer bytes moved means less time spent.
- Right-size your engine. Use Spark where parallelism matters; rely on Fabric-native activities for small deltas.
- Lightweight compute wins. DuckDB is a great fit for fast, in-process upserts on modest datasets.
At Amarji, we help teams get more out of their data stack by designing with intent, not excess.