How We Cut Microsoft Fabric Refresh Times by 81%, Reducing Cost and Speeding Up Business Decisions

By Juan Fernandez (Data Engineer, Amarji Ltd)
 
At Amarji, we helped a client slash their Microsoft Fabric daily refresh time from around 80 minutes to just 15 — a reduction of roughly 81%.
Instead of throwing compute at the problem, we focused on removing overhead, optimising data flows, and using the right tools for the right job.
 
1. Full-Load Tables
 
Problem:
Some wide tables were being fully reloaded each day, even though several large columns weren’t used by analytics. These unnecessary columns increased I/O and slowed extraction.
 
Change:
We excluded the unused columns directly at the data source.
 
Result:
Refresh time dropped from 27 minutes to 7 — a 74% improvement.
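The idea is simply to stop moving bytes that analytics never reads. A minimal sketch of the pattern, using an in-memory sqlite3 database as a stand-in for the client's actual source system (the `orders` table and its columns are hypothetical):

```python
import sqlite3

# Stand-in for the source database; "raw_payload" and "notes" are the
# wide columns analytics never touches.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, amount REAL, notes TEXT, raw_payload TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i, i * 1.5, "n" * 500, "x" * 5000) for i in range(3)],
)

# Instead of SELECT *, name only the columns the model needs;
# the wide text columns never leave the source.
rows = con.execute("SELECT id, amount FROM orders").fetchall()
print(rows)  # [(0, 0.0), (1, 1.5), (2, 3.0)]
```

The same column list goes into the source query of the Copy activity (or the view the extract reads from), so the pruning happens before any data crosses the wire.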
 
2. Append-Only Tables with a Monotonic Primary Key
 
Problem:
Daily deltas were small, but the PySpark notebooks managing the append logic still had to start a Spark cluster on each run, adding ~4 minutes per run just for startup.
 
Change:
We eliminated Spark for this task. Instead, a Lookup activity reads the max key directly, and a Copy activity appends only the new rows.
 
Result:
Refresh time fell from 17 minutes to 2 — an 88% improvement.
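The Lookup-then-Copy pipeline is a classic high-watermark pattern. A minimal sketch of the logic in plain Python, with two sqlite3 databases standing in for the source system and the Lakehouse (the `events` table is hypothetical):

```python
import sqlite3

# Stand-in for the source system: rows 1..10 exist upstream.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
src.executemany("INSERT INTO events VALUES (?, ?)", [(i, f"e{i}") for i in range(1, 11)])

# Stand-in for the Lakehouse: only rows 1..7 have landed so far.
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
dest.executemany("INSERT INTO events VALUES (?, ?)", [(i, f"e{i}") for i in range(1, 8)])

# "Lookup" step: read the max key already landed in the destination.
(max_id,) = dest.execute("SELECT COALESCE(MAX(id), 0) FROM events").fetchone()

# "Copy" step: append only the rows beyond the watermark.
new_rows = src.execute("SELECT id, payload FROM events WHERE id > ?", (max_id,)).fetchall()
dest.executemany("INSERT INTO events VALUES (?, ?)", new_rows)

print(max_id, len(new_rows))  # 7 3
```

In the actual pipeline, the Lookup activity's output feeds the Copy activity's source query as a parameter, and no Spark cluster is ever started.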
 
3. Tables with Updatable Rows
 
Problem:
For tables that allowed updates, we ran frequent micro-batches using PySpark notebooks. Again, the compute startup cost outweighed the actual work.
 
Change:
We switched to Python notebooks using DuckDB for lightweight local joins and merges, writing back to the Lakehouse afterward.
 
Result:
Refresh time dropped from 35 minutes to 5 — an 86% improvement.
 
The Overall Impact
 
Across all three patterns, total refresh time dropped from ~80 minutes to ~15, a reduction of roughly 81%.
For the client, that means:
 
  • More frequent Power BI refreshes
  • Shorter compute windows
  • Lower operating cost per run

The client didn't just save minutes; they gained flexibility, lower costs, and the ability to make data-driven decisions faster.
 
Key Takeaways
 
  • Trim early. Remove unnecessary data at the source — fewer bytes moved means less time spent.
  • Right-size your engine. Use Spark where parallelism matters; rely on Fabric-native activities for small deltas.
  • Lightweight compute wins. DuckDB is a great fit for fast, in-process upserts on modest datasets.
 
At Amarji, we help teams get more out of their data stack by designing with intent, not excess.