Google Drive to SharePoint migration — what could go wrong and how we handle it
A cross-cloud migration is a long-running operation against two APIs you don't control, both of which rate-limit aggressively. This page walks through what MigrationFox checks before it starts, what it retries silently, what it skips on purpose, and what you should actually watch for on the live dashboard.
Example mid-run dashboard for a ~40K-file Google Drive migration. A non-zero Retries column is normal and healthy.
WHAT YOU'LL SEE
A healthy run is defined by Failed = 0, not Retries = 0. Retries climbing into the hundreds while Failed stays at zero means the retry logic is doing its job — the destination API throttled us and we backed off. Only panic when the Failed counter starts ticking.
Pre-flight checks (before a single byte moves)
When you click Start migration, MigrationFox runs a silent pre-flight pass against both endpoints. You see it as the few seconds between clicking and the progress bar appearing. If any of these fail, the job never enters the transfer phase — no partial state to clean up.
- Source credential. A test call against `files.list?pageSize=1` on the Google Drive API. Confirms the service account or OAuth token is still valid and scoped for `drive.readonly` or higher. Expired service account keys surface here, not an hour into the run.
- Source scope. The scanned domain / shared drive / user is resolvable. A common failure mode: service account delegation was authorized for only one admin's domain and you're scanning a second domain that was added later.
- Destination credential. A `GET` against the destination site collection. Confirms the Azure AD app registration's app-only token can still authenticate and that the site URL didn't change since credentials were saved.
- Destination write permission. A tiny `PUT` of a zero-byte probe file into `_mfox_probe/`, followed by a `DELETE`. The probe file is never visible to end users. If you see leftover probe files in SharePoint, a previous job was force-killed mid-preflight; delete them manually.
- Tenant blocked-file-type policy. We read the `Get-SPOTenant DisallowedFileTypes` equivalent from the tenant and cache it for the run. This is how we know to mark `.exe` / `.dll` as pre-skip instead of failing at upload time. See the executable handling doc for how that list is enforced.
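The fail-fast behavior of the pre-flight pass can be sketched as an ordered list of checks where the first failure stops the job before any transfer begins. This is a minimal illustration, not MigrationFox's internal code: `run_preflight` and the check callables are hypothetical names, and each real check would wrap one of the API calls listed above.

```python
from typing import Callable, List, Optional, Tuple

# A check is a (name, callable) pair. The callable returns None on success
# or a human-readable error string on failure. All names are illustrative.
Check = Tuple[str, Callable[[], Optional[str]]]


def run_preflight(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks in order and stop at the first failure.

    Returns (ok, log). When ok is False the transfer phase must not start,
    so there is no partial state to clean up.
    """
    log: List[str] = []
    for name, check in checks:
        error = check()
        if error is not None:
            log.append(f"FAIL {name}: {error}")
            return False, log
        log.append(f"ok   {name}")
    return True, log
```

Checks that come after a failing one are never executed, which mirrors the rule that a job with a bad credential never reaches the write-permission probe.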
What fails at 99% and how MigrationFox retries
The biggest cause of "my migration stalled" support tickets isn't credentials or disk space — it's transient 429s, 5xxs, and socket timeouts that would look like failures if we didn't retry them. The retry ladder:
| Symptom | What happens | When you'd see it |
|---|---|---|
| 429 Too Many Requests (either side) | Exponential backoff, starting at 2s, doubling up to 60s, capped at 8 attempts per file. Retry count increments, file is re-queued. | Very common on large jobs: Google Drive and SharePoint both rate-limit per-user and per-tenant. |
| 500 / 502 / 503 / 504 | Retried up to 5 times with jittered backoff. Treated as transient infrastructure noise. | Expected a few dozen times on any multi-GB job. |
| Socket timeout / connection reset | Retried up to 5 times. On the 6th, the worker reconnects with a fresh HTTP/2 session (old one was probably half-closed by a load balancer). | More common on long jobs — load balancers idle-timeout connections after ~30 minutes. |
| Checksum mismatch after upload | Destination file is deleted and the source is re-uploaded, up to 3 attempts. On the 4th, the file is marked Failed (not Skipped) and logged with both checksums. | Rare but happens — usually a partial network-level corruption. |
| Token expiry mid-run | Token refresh triggers automatically; in-flight requests complete with the old token and new ones use the refreshed token. | Invisible on the dashboard unless the refresh itself fails, which surfaces as a pause-and-prompt. |
| 401 Unauthorized (persistent) | Job pauses, not fails. You get a notification to re-authenticate; the job resumes exactly where it left off. | Password rotation or admin consent revocation mid-run. |
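To make the 429 row concrete, the delay schedule it describes (start at 2s, double, cap at 60s, 8 attempts) can be computed directly. This is a sketch under two assumptions: that one delay precedes each of the 8 attempts, and that jitter, where used, is "full jitter" drawn uniformly between zero and the computed delay. The function names are illustrative.

```python
import random
from typing import List


def backoff_schedule(base: float = 2.0, cap: float = 60.0, attempts: int = 8) -> List[float]:
    """Delay for each attempt: base, doubled each time, never above cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def with_jitter(delay: float, rng: random.Random) -> float:
    """Full jitter (an assumed strategy): a uniform value in [0, delay]."""
    return rng.uniform(0.0, delay)


schedule = backoff_schedule()
print(schedule)       # [2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0, 60.0]
print(sum(schedule))  # 242.0
```

Under these assumptions a file that exhausts all 8 attempts has waited roughly four minutes in total, which is why a heavily throttled job looks slow rather than broken.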
TIP
The Retries column climbing into the thousands on a huge job is not a problem; it is how the backoff ladder protects the run from throttling on either side. The numbers to watch are Failed and Skipped (unexpected). Both should be near zero by end of job.
Reliability scenarios MigrationFox handles automatically
Piping bytes between Google Drive and SharePoint looks simple — a stream on one side, a chunked upload session on the other — but four distinct behaviors of the underlying APIs can truncate or stall a transfer if the pipeline is naive. MigrationFox accounts for each one at the protocol level so the job’s Failed counter stays at zero without manual intervention.
Compressed source streams with drifting byte counts
Google Drive serves many file types with Content-Encoding: gzip. The Content-Length on the response describes the compressed byte count on the wire, but the stream most HTTP clients expose is the uncompressed payload. A pipeline that naively trusts the wire-reported length sizes its destination upload session too small, then sees the session close at roughly 99% of the file while the last chunk is still arriving.
MigrationFox requests the source with Accept-Encoding: identity so Google returns uncompressed bytes with a truthful length. For file formats where the source still does not report a reliable total (Google-native exports generated on the fly), the pipeline stages the content to worker-local disk first, measures the true size, and opens the destination upload session with the correct total before streaming begins. The destination and source agree on the byte count from the first chunk onward, which is the only way the upload session can finalize cleanly.
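The stage-then-measure step can be sketched as follows. This is an illustration only: `stage_and_measure` is a hypothetical helper, the source is modeled as any binary stream, and `IDENTITY_HEADERS` simply shows the request header named above.

```python
import io
import tempfile
from typing import BinaryIO, Tuple

# Header that asks the server for uncompressed bytes with a truthful length.
IDENTITY_HEADERS = {"Accept-Encoding": "identity"}


def stage_and_measure(source: BinaryIO, chunk_size: int = 1024 * 1024) -> Tuple[BinaryIO, int]:
    """Copy a stream of unknown length to a temp file and return (file, size).

    The returned file is rewound to offset 0, so the caller can open the
    destination upload session with the measured size and then stream from it.
    """
    staged = tempfile.TemporaryFile()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        staged.write(chunk)
        total += len(chunk)
    staged.seek(0)
    return staged, total


staged, size = stage_and_measure(io.BytesIO(b"x" * 2500), chunk_size=1000)
print(size)  # 2500
staged.close()
```

The trade-off is extra disk I/O on the worker in exchange for a byte count that is known before the first chunk is sent.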
Large media files that drop mid-stream
Long-running uploads of large files — video, design archives, backup bundles — occasionally experience the source socket being closed by an upstream load balancer or the Google edge idle-timing-out when the destination pauses to accept a chunk. A naive pipe propagates that disconnect as an upload failure, and the whole file has to restart.
MigrationFox inserts a buffered reader between the source and destination so the source socket is continuously drained even when SharePoint briefly pauses — there is no idle period for the source edge to time out on. If the source stream does drop mid-transfer anyway, the worker resumes the download with a Range request from the last successfully received byte offset rather than restarting from zero. The destination upload session stays open across the reconnect; the customer-visible effect is a brief throughput dip and a retry counter increment, not a failed file.
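The resume-from-offset behavior can be sketched like this. It is a simplified model, not the worker's actual code: `open_stream` is a hypothetical callable standing in for a GET that sends the `Range` header built by `range_header`, and the reconnect limit of 5 is an assumption borrowed from the retry table.

```python
from typing import Callable, Dict, Iterator


def range_header(offset: int) -> Dict[str, str]:
    """Open-ended byte range starting at the first byte not yet received."""
    return {"Range": f"bytes={offset}-"}


def download_with_resume(
    open_stream: Callable[[int], Iterator[bytes]],
    total: int,
    max_reconnects: int = 5,
) -> bytes:
    """Read `total` bytes, reopening at the current offset if the stream drops.

    `open_stream(offset)` yields chunks starting at `offset` and may raise
    ConnectionError part-way through. Bytes already received are kept.
    """
    received = bytearray()
    reconnects = 0
    while len(received) < total:
        try:
            for chunk in open_stream(len(received)):
                received.extend(chunk)
        except ConnectionError:
            pass  # fall through to the reconnect accounting below
        if len(received) < total:
            reconnects += 1
            if reconnects > max_reconnects:
                raise ConnectionError(f"gave up at byte {len(received)} of {total}")
    return bytes(received)
```

Because the offset is taken from the bytes actually received, a drop at any point costs only the in-flight chunk rather than the whole file.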
Destination-side throttling at scale
Microsoft Graph enforces a per-tenant request budget. A multi-worker migration can exceed that budget during peak hours and receive 429 Too Many Requests with a Retry-After header telling the client to wait — sometimes for tens of seconds. Clients that ignore Retry-After and retry immediately are answered with longer backoffs on subsequent attempts, compounding the slowdown.
MigrationFox honors Retry-After verbatim and additionally tracks the destination tenant’s steady-state throttle behavior over the life of the job. When 429 frequency climbs, the worker pool shrinks its in-flight request count dynamically; when 429s subside, it scales back up. Concurrency defaults for Google Drive to SharePoint are calibrated to typical tenant budgets rather than to the worker hardware, so a fresh job starts at a rate the destination can sustain rather than at a rate that will earn a throttle response in the first minute. The net result is higher wall-clock throughput than pushing harder ever produces, because the tax of earning a 429 is larger than the cost of leaving a few request slots unused.
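One way to picture the two mechanisms together is a `Retry-After` parser plus a limiter that halves concurrency on a throttle and adds one slot back after a quiet window. The additive-increase / multiplicative-decrease rule is a stand-in chosen for illustration; the actual scaling policy is not documented here, and the class name and the limits of 8 and 16 are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], default: float = 2.0) -> float:
    """Return the wait requested by a Retry-After header given in seconds.

    Retry-After may also carry an HTTP date; this sketch handles only the
    delta-seconds form and falls back to `default` otherwise. The lookup is
    case-sensitive, so pass headers with canonical names.
    """
    value = headers.get("Retry-After", "").strip()
    return float(value) if value.isdigit() else default


@dataclass
class ConcurrencyLimiter:
    """Additive-increase / multiplicative-decrease cap on in-flight requests."""

    limit: int = 8
    floor: int = 1
    ceiling: int = 16

    def on_throttled(self) -> None:
        # A 429 arrived: halve the number of allowed in-flight requests.
        self.limit = max(self.floor, self.limit // 2)

    def on_success_window(self) -> None:
        # A window passed with no 429s: cautiously add one slot back.
        self.limit = min(self.ceiling, self.limit + 1)
```

Backing off quickly and recovering slowly keeps the worker pool just under the tenant budget instead of oscillating across it.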
SharePoint upload session lock contention
Graph’s chunked upload protocol records the total file size from the first chunk of the session. Subsequent chunks that disagree with that total are rejected with 416 Requested Range Not Satisfiable, even when the disagreement is by a handful of bytes. Separately, two workers updating the same library at the same time, or a metadata write racing a content finalize, can each surface as a lock-contention error on the destination.
MigrationFox sizes the upload session against the verified source total (see compressed-source handling above) so the first-chunk declaration is always accurate. Metadata writes (content-type assignment, custom column values, permission replay) are serialized per destination item so content finalize is never racing a property set against the same file. When lock contention does occur — another process writing to the same library from outside the migration, typically — the worker backs off with jittered delays and retries the specific item rather than failing the entire batch.
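The reason the verified total matters is visible in the `Content-Range` header that accompanies every fragment of a Graph upload session: each one restates the total. The sketch below generates those header values for a file of known size. It assumes Graph's documented requirement that fragment sizes be multiples of 320 KiB (the final fragment may be shorter); the function name and the default fragment size are illustrative.

```python
from typing import Iterator, Tuple

FRAGMENT_UNIT = 320 * 1024  # Graph upload sessions expect multiples of 320 KiB


def content_ranges(
    total: int, fragment_size: int = 10 * FRAGMENT_UNIT
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end_inclusive, Content-Range value) for each fragment."""
    if fragment_size <= 0 or fragment_size % FRAGMENT_UNIT != 0:
        raise ValueError("fragment_size must be a positive multiple of 320 KiB")
    start = 0
    while start < total:
        end = min(start + fragment_size, total) - 1
        yield start, end, f"bytes {start}-{end}/{total}"
        start = end + 1


for _, _, header in content_ranges(700_000, FRAGMENT_UNIT):
    print(header)
# bytes 0-327679/700000
# bytes 327680-655359/700000
# bytes 655360-699999/700000
```

If the total declared in the first header were off by even a few bytes, the last fragment's range would no longer line up with it, which is the 416 case described above.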
Customer-visible effect
None of these scenarios requires operator intervention. The job’s Retries counter may climb — which is expected and healthy on large jobs — but the Failed counter stays at zero and the final audit report shows every file accounted for. Watch Failed, not Retries.
File types that skip automatically
Two categories of file never get a real upload attempt. Both are counted under Skipped on the dashboard, broken out on the final report.
- OS / app artifacts. `.DS_Store`, `Thumbs.db`, `desktop.ini`, `~$*`, Trash folders, and similar noise; the full pattern list lives in the auto-skip reference. On a typical Google Drive source these are 3–10% of file count and essentially zero bytes.
- Executables and other destination-blocked types. Any extension on the destination tenant's `DisallowedFileTypes` list is skipped at the source pre-read step, not at upload. See the executable handling doc for how to relax the list if you actually need installers in SharePoint.
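The two skip categories can be sketched as a single classification function. This is illustrative only: it covers just the four filename patterns named above (not Trash folders or the full auto-skip list), `skip_reason` is a hypothetical name, and the blocked extensions are passed in by the caller, standing in for the cached tenant policy.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable

# Subset of the artifact patterns named in this section.
ARTIFACT_PATTERNS = (".DS_Store", "Thumbs.db", "desktop.ini", "~$*")


def skip_reason(path: str, blocked_extensions: Iterable[str]) -> str:
    """Return 'artifact', 'blocked-type', or '' when the file should be uploaded."""
    name = PurePosixPath(path).name
    # Compare case-insensitively so THUMBS.DB and Thumbs.db are treated alike.
    if any(fnmatch(name.lower(), pattern.lower()) for pattern in ARTIFACT_PATTERNS):
        return "artifact"
    extension = PurePosixPath(name).suffix.lower().lstrip(".")
    blocked = {e.lower().lstrip(".") for e in blocked_extensions}
    if extension and extension in blocked:
        return "blocked-type"
    return ""


print(skip_reason("Docs/~$budget.xlsx", ["exe", "dll"]))  # artifact
print(skip_reason("Tools/setup.EXE", ["exe", "dll"]))     # blocked-type
```

Classifying at the source pre-read step means a blocked file costs one metadata read instead of a download plus a rejected upload.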
Google-native files (Docs, Sheets, Slides) are not skipped: they're exported to Office formats (.docx, .xlsx, .pptx) and uploaded normally. Google Drawings and Google Sites pages are exported to PDF. Google Forms and Jamboards are metadata-only (the destination has no equivalent) and surface as warnings on the final report.
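The export rules can be expressed as a lookup from Google Drive MIME type to target format. The sketch below covers a subset (Docs, Sheets, Slides, Drawings, Forms, Jamboards) and treats Forms and Jamboards as metadata-only, following the last sentence above. `plan_for` is a hypothetical helper; the MIME strings are the standard Google Workspace and Office Open XML types.

```python
from typing import Dict, Optional, Tuple

DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
XLSX = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
PPTX = "application/vnd.openxmlformats-officedocument.presentationml.presentation"

# Google-native MIME type -> (export MIME type, file extension)
EXPORT_TARGETS: Dict[str, Tuple[str, str]] = {
    "application/vnd.google-apps.document": (DOCX, ".docx"),
    "application/vnd.google-apps.spreadsheet": (XLSX, ".xlsx"),
    "application/vnd.google-apps.presentation": (PPTX, ".pptx"),
    "application/vnd.google-apps.drawing": ("application/pdf", ".pdf"),
}

# Types with no destination equivalent: recorded as warnings, not uploaded.
METADATA_ONLY = {
    "application/vnd.google-apps.form",
    "application/vnd.google-apps.jam",
}


def plan_for(mime_type: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (action, export_mime, extension) for a source file."""
    if mime_type in EXPORT_TARGETS:
        target, extension = EXPORT_TARGETS[mime_type]
        return "export", target, extension
    if mime_type in METADATA_ONLY:
        return "metadata-only", None, None
    return "copy", None, None  # ordinary binary file: stream as-is
```

Anything that is not a Google-native type falls through to a plain byte copy, so the table only needs entries for formats that require conversion.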
How to check progress
Three views, all live:
- Job Detail page. The dashboard shown above — full counters, log tail, retry breakdown. Polls every 10 seconds.
- Client Live View. A share URL that shows the same progress bar + phase strip without credential names or source paths. Send this to the client instead of screenshots. See client live view.
- Webhooks. `job.started`, `job.progress` (every 5%), `job.completed`, `job.failed`. Wire into Slack or a PSA for overnight unattended runs.
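For the webhook route, a receiver typically turns each event into a one-line message before forwarding it to Slack or a PSA. The sketch below does only that formatting step. The payload shape is an assumption made for illustration: the keys `event`, `job_id`, and `percent` are hypothetical and should be checked against the actual webhook payload; only the four event names come from this page.

```python
from typing import Any, Dict


def format_event(event: Dict[str, Any]) -> str:
    """Turn a webhook payload (assumed shape) into a one-line status message."""
    kind = event.get("event", "")
    job = event.get("job_id", "unknown job")
    if kind == "job.progress":
        return f"{job}: {event.get('percent', '?')}% complete"
    if kind == "job.failed":
        return f"{job}: FAILED, needs attention"
    if kind in ("job.started", "job.completed"):
        return f"{job}: {kind.split('.', 1)[1]}"
    return f"{job}: unrecognized event {kind!r}"


print(format_event({"event": "job.progress", "job_id": "drive-wave-1", "percent": 35}))
# drive-wave-1: 35% complete
```

Handling unrecognized event names explicitly keeps an overnight run from going silent if new event types are added later.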
Common reasons a migration slows
Throughput drops from 4 MB/s to sub-1 MB/s mid-run
Almost always source-side throttle. Google Drive enforces a per-user and a per-project quota; the per-project quota resets every 100 seconds. If the scan hit a burst of heavy files, the first few hundred seconds look fast, then throttle kicks in. The backoff ladder smooths this out — throughput settles into whatever the steady-state quota allows, which is often around 2–4 MB/s for a single service account.
Long idle-looking gaps in the log strip
Worker is waiting on a single giant file. One 8 GB video upload at 3 MB/s is a 45-minute block during which no other activity shows. The other 15 workers are still busy — check the Workers tile to confirm.
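The 45-minute figure follows from simple arithmetic, which is handy for estimating how long any single large file will occupy a worker. The helper below is illustrative and assumes 1 GB = 1024 MB.

```python
def transfer_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Minutes needed to move size_gb at a sustained rate (1 GB = 1024 MB)."""
    return size_gb * 1024 / rate_mb_per_s / 60


print(round(transfer_minutes(8, 3), 1))  # 45.5
```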
SharePoint destination suddenly 503s for minutes
Microsoft 365 is doing a regional failover or applying a patch. The backoff ladder handles it; you'll see Retries spike, then recover. No action needed unless it persists more than ~15 minutes, at which point the job auto-pauses and notifies you.
When to split into multiple jobs
A single Google Drive source scales cleanly up to about 500 GB / 500K files per job. Beyond that, splitting into multiple parallel jobs is almost always faster and always easier to recover from.
- By shared drive. One job per shared drive is the cleanest split — scope boundaries match API permission boundaries.
- By top-level folder. For a single mega-drive, split at the first folder layer. MigrationFox respects these splits and each job gets its own independent retry ladder.
- By wave. For weekend cutovers, split the inactive/archival content into a weekday job that runs at 25% bandwidth, then run the active content in a 2-hour window on Saturday night.
Each job runs on its own worker pool with its own rate-limit budget, so two 250 GB jobs in parallel finish faster than one 500 GB job. The practical cap is around 4 parallel jobs per tenant before the destination-side throttle becomes the bottleneck.
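When splitting by top-level folder, the grouping can be planned ahead of time. The sketch below packs folders into jobs that stay under the 500 GB / 500K-file guideline using a first-fit-decreasing heuristic. The heuristic, the `Folder` type, and `plan_jobs` are illustrative choices, not a MigrationFox feature.

```python
from typing import Iterable, List, NamedTuple


class Folder(NamedTuple):
    name: str
    size_gb: float
    file_count: int


def plan_jobs(
    folders: Iterable[Folder], max_gb: float = 500.0, max_files: int = 500_000
) -> List[List[Folder]]:
    """Group folders into jobs, largest first, placing each in the first job it fits.

    A folder that exceeds a cap on its own still gets a job to itself; it would
    need to be split one level deeper before migrating.
    """
    jobs: List[List[Folder]] = []
    for folder in sorted(folders, key=lambda f: f.size_gb, reverse=True):
        for job in jobs:
            fits_size = sum(f.size_gb for f in job) + folder.size_gb <= max_gb
            fits_count = sum(f.file_count for f in job) + folder.file_count <= max_files
            if fits_size and fits_count:
                job.append(folder)
                break
        else:
            jobs.append([folder])
    return jobs
```

If the plan yields more than about four jobs, run them in waves rather than all at once, since the destination-side throttle becomes the bottleneck beyond that point.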
DON'T
Never stop-and-restart a job because the throughput looks "too slow." Restarts waste the built-up backoff context and force the job to re-learn the tenant's rate limits from scratch. If you're worried, let it run for 30 more minutes and watch the Failed counter — if it's still zero, the job is fine.
What's next
- Auto-skipped files reference: the full pattern list and why typical sites shrink 40–60% after filtering.
- Executable file handling: why `.exe` and `.dll` show as Skipped and how to relax the policy.
- Migration Rehearsal dry-run: validate a real run's plan before cutover night.
- Share a migration live view with your client: a read-only window into the running job.