MigrationFox Docs

Skip Junk Files

MigrationFox filters nine classes of operating-system junk at scan time so the migration scope reflects real user content, not OS debris. Every filtered file is logged by category with a count, so the scan report is an honest scope statement rather than a surprise that shows up on the other side.

Why scan-time, not upload-time

Filtering at scan time means MigrationFox never asks the source platform for junk-file metadata, permissions, or content. On a long-lived drive that saves tens of thousands of source-side requests before migration even starts. It also means the file count on the progress bar is the real count — the number you see is the number MigrationFox is going to move, not a padded figure that drops during upload.
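The saving can be pictured with a small sketch. It assumes a source listing that yields names cheaply and a separate per-item metadata call; `scan`, `is_junk`, and `fake_fetch` are hypothetical names for illustration, not MigrationFox APIs, and `is_junk` covers only a few of the default patterns.

```python
from typing import Callable, Iterable, List, Tuple

def is_junk(name: str) -> bool:
    # Simplified stand-in: only a handful of the default patterns.
    exact = {".DS_Store", "Thumbs.db", "desktop.ini"}
    return name in exact or name.startswith(("._", "~$"))

def scan(names: Iterable[str],
         fetch_metadata: Callable[[str], dict]) -> Tuple[List[dict], int]:
    """Fetch metadata only for names that pass the filter.

    Returns the kept items and the number of skipped junk entries.
    """
    kept, skipped = [], 0
    for name in names:
        if is_junk(name):
            skipped += 1  # no source-side request is made for this entry
            continue
        kept.append(fetch_metadata(name))
    return kept, skipped

# Count how many metadata requests actually reach the (fake) source.
calls: List[str] = []

def fake_fetch(name: str) -> dict:
    calls.append(name)
    return {"name": name}

items, skipped = scan(
    ["Report.docx", "._Report.docx", ".DS_Store", "Budget.xlsx"], fake_fetch
)
print(len(calls), skipped)
```

In this toy listing, half of the entries never generate a metadata request, and the item count equals the number of files that would be moved.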

Full default filter list

Nine categories are filtered by default on every scan. The first seven are high-confidence OS junk and are filtered silently into the scan report. The last two (.tmp family) are logged with a soft flag so you can un-filter them if your source tooling writes real data to those extensions.

| Pattern | Match type | Origin | What it is |
| --- | --- | --- | --- |
| .DS_Store | Exact filename | macOS | Finder metadata: window position, view mode, icon layout per folder. |
| ._* | Prefix | macOS | Resource-fork sidecars written on non-HFS filesystems (USB drives, network shares, cloud sync roots). |
| Thumbs.db | Exact filename | Windows | Explorer’s cached image thumbnails so folders render faster. |
| desktop.ini | Exact filename | Windows | Custom folder icons, localized folder names, view templates. |
| ~$* | Prefix | Microsoft Office | Lock files written when a document is opened. Meant to be deleted on clean close; orphaned when the app crashes or OneDrive syncs before cleanup. |
| thumbdata3-* | Prefix | Android | Gallery thumbnail cache written into DCIM/.thumbnails/. Can grow to hundreds of megabytes on long-used phones. |
| .Trash-* | Folder prefix | Linux | Trash directories created on any mount where the user deleted files. Sync flows propagate them to cloud storage. |
| .Spotlight-V100 | Folder name | macOS | Search index written at the root of every mounted volume. Binary, not portable, pure metadata. |
| .localized | Exact filename | macOS | Marker file that tells Finder to translate the folder name. Harmless on macOS, meaningless anywhere else. |
| .tmp / ~*.tmp | Suffix / prefix | Multiple | Temp and autosave remnants (Word’s ~WRD0001.tmp, Photoshop autosave, partial-write leftovers). Logged, user can un-filter. |
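As a rough illustration of how the table's match types could be applied, here is a minimal sketch using glob-style patterns. The category keys and the `classify` function are hypothetical, and case-sensitive matching is an assumption; only the patterns themselves come from the table.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical category keys; patterns are taken from the table above.
FILE_PATTERNS = {
    "ds_store": [".DS_Store"],
    "appledouble": ["._*"],
    "thumbs_db": ["Thumbs.db"],
    "desktop_ini": ["desktop.ini"],
    "office_lock": ["~$*"],
    "android_thumbdata": ["thumbdata3-*"],
    "localized": [".localized"],
    "tmp": ["*.tmp"],  # the suffix match also covers ~*.tmp
}
FOLDER_PATTERNS = {
    "linux_trash": [".Trash-*"],
    "spotlight": [".Spotlight-V100"],
}

def classify(name: str, is_folder: bool = False) -> Optional[str]:
    """Return the junk category for a name, or None if it is user content."""
    table = FOLDER_PATTERNS if is_folder else FILE_PATTERNS
    for category, patterns in table.items():
        if any(fnmatchcase(name, p) for p in patterns):
            return category
    return None

for name in ["._Report.docx", "~$Budget.xlsx", "~WRD0001.tmp", ".bashrc"]:
    print(name, "->", classify(name))
```

Because only the named patterns are listed, a general dotfile such as .bashrc falls through and is treated as user content.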

Why each pattern is filtered

macOS Finder metadata (.DS_Store, ._*, .localized, .Spotlight-V100)

Every time a macOS user opens a folder in Finder, macOS writes .DS_Store into that folder with window position, view mode, icon layout, and custom background image. It is useful on the Mac that created it. It is nonsense on every other machine. When someone drops that folder into a cloud drive, the .DS_Store comes along and propagates forever.

._* files are worse. macOS writes them on non-HFS filesystems to preserve extended attributes and resource forks the destination cannot store natively. For every Report.docx on a shared drive you get a ._Report.docx sidecar. Each one is usually only about 4 KB, but on a typical long-lived drive they are the single largest filtered category by count.

.Spotlight-V100 is macOS’s search index, written at volume roots. Binary, not portable, and never useful in another filesystem. .localized marks folders for Finder translation; harmless elsewhere but also useless.

Windows Explorer artifacts (Thumbs.db, desktop.ini)

Windows writes Thumbs.db (cached image thumbnails) and desktop.ini (custom folder icons, localized folder names, view templates) automatically when Explorer renders folders. Both are classified as hidden system files, invisible in Explorer unless you actively show hidden items. When a user uploads a folder through the OneDrive or SharePoint client, both tag along. SharePoint has no concept of a thumbnail cache or a folder view template — they are just files, sitting in every library folder the user has ever viewed on Windows.

Office lock files (~$*)

When you open Budget.xlsx, Office writes ~$Budget.xlsx in the same folder as a lock file — it tells any other Excel instance that the workbook is open and by whom. When Excel closes cleanly, Office deletes the lock file. When Excel does not close cleanly (laptop slept, app crashed, OneDrive flushed before the cleanup), the lock file is orphaned and stays forever.

Orphaned ~$* files are misleading at best and harmful at worst. If one syncs to SharePoint it can confuse new Office sessions into thinking a file is locked by a session that ended three months ago. The user sees the file marked as in-use and cannot edit. Filtering them out during migration is pure upside.

Android gallery cache (thumbdata3-*)

Any Android phone used as a camera accumulates binary files named thumbdata3-1763508120, thumbdata3-1967290299, and so on inside DCIM/.thumbnails/. These are the platform’s cached gallery thumbnails. They can easily grow to several hundred megabytes apiece on phones with years of photos.

When a user backs up their phone’s photo folder to Google Drive (default behavior for many Android users), the thumbdata3-* files go with them. They are not photos, they are not viewable, they are caches. Filtering them out typically saves multiple gigabytes of upload per account.

Linux trash (.Trash-*)

Linux distributions write a .Trash-1000/ folder (numbered by user ID) into any mount where the user has sent files to the trash. The folder contains the deleted files plus an info/ subdirectory with metadata. If the user mounts a cloud sync folder on Linux and deletes anything from it, .Trash-1000 gets created and syncs up to the cloud — and the intentionally deleted files live on forever.

Temp and autosave remnants (.tmp, ~*.tmp)

Every application that implements autosave creates temporary files. Word saves ~WRD0001.tmp alongside the document. Photoshop writes .tmp files for every uncommitted edit. Many get cleaned up. Many do not — the application crashes mid-write, the user force-quits, the network disconnects during a save. The orphans stay on disk and eventually sync to the cloud.

Filtering these has a different character from the OS-junk categories. An autosave remnant could, in a small fraction of cases, be the only surviving copy of a document the user was editing when the system crashed. MigrationFox filters them by default but logs them under “recovered garbage” in the scan report so a careful operator can do a one-pass check before confirming the migration scope.
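One way to picture the soft flag is a report that keeps only counts for the silent categories and also keeps paths for the .tmp family. This is a sketch under assumptions: `SOFT_FLAGGED`, `build_report`, and the category keys are hypothetical names, not the product's data model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

SOFT_FLAGGED = {"tmp"}  # hypothetical key for the .tmp family

def build_report(
    filtered: Iterable[Tuple[str, str]]
) -> Tuple[Counter, Dict[str, List[str]]]:
    """Take (category, path) pairs; return per-category counts and review paths."""
    counts: Counter = Counter()
    review: Dict[str, List[str]] = defaultdict(list)
    for category, path in filtered:
        counts[category] += 1
        if category in SOFT_FLAGGED:
            # Keep the path so an operator can inspect it before confirming scope.
            review[category].append(path)
    return counts, dict(review)

counts, review = build_report([
    ("ds_store", "a/.DS_Store"),
    ("tmp", "docs/~WRD0001.tmp"),
    ("ds_store", "b/.DS_Store"),
])
print(counts["ds_store"], review["tmp"])
```

The operator's one-pass check then amounts to reading through the review list for anything that looks like a real document.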

How to opt out

Filters are defaults, not policy. Opt-out paths exist for every category.

Per-job override

Toggle a category off for a single migration

On the scan profile panel for any migration job, each filter category has a toggle. Turning a category off means:

  • The pattern is not applied during scan.
  • Matching files are enumerated and will be migrated like any other content.
  • The scan report shows the category with a “disabled” indicator rather than a filtered count.

Per-job overrides do not affect other jobs in the same project. Use them for one-off audits (“move the Android thumbnails once for a forensic snapshot”) rather than blanket policy changes.
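The report behavior described in the list above can be sketched as follows. The function name, category keys, and the exact wording of each entry are assumptions for illustration.

```python
from typing import Dict, Set

def report_lines(filtered_counts: Dict[str, int], disabled: Set[str]) -> Dict[str, str]:
    """Render one entry per category: a filtered count or a 'disabled' marker."""
    lines: Dict[str, str] = {}
    for category in sorted(set(filtered_counts) | disabled):
        if category in disabled:
            # Pattern was not applied; matching files migrate like any other content.
            lines[category] = "disabled"
        else:
            lines[category] = f"{filtered_counts[category]} filtered"
    return lines

lines = report_lines({"ds_store": 120, "office_lock": 4}, {"android_thumbdata"})
for category, text in lines.items():
    print(category, text)
```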

Per-tenant default

Change the default for every new job

For organizations that consistently need to preserve a specific category of file (records-retention shops that archive Office lock files as evidence, for instance), a tenant admin can change the default filter state for a category in tenant settings. New jobs created after the change inherit the new default; existing jobs keep whatever they started with.
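The inheritance rule (new jobs pick up the changed default, existing jobs keep theirs) behaves like a copy taken at job creation. A minimal sketch, with hypothetical `Tenant`, `Job`, and `create_job` names:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Tenant:
    # True means the category is filtered by default (hypothetical keys).
    filter_defaults: Dict[str, bool] = field(
        default_factory=lambda: {"office_lock": True, "tmp": True}
    )

@dataclass
class Job:
    filters: Dict[str, bool]

def create_job(tenant: Tenant) -> Job:
    # Copy the defaults so later tenant changes do not reach this job.
    return Job(filters=dict(tenant.filter_defaults))

tenant = Tenant()
old_job = create_job(tenant)
tenant.filter_defaults["office_lock"] = False  # admin changes the default
new_job = create_job(tenant)
print(old_job.filters["office_lock"], new_job.filters["office_lock"])
```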

Do-not-filter

Dotfiles in general are never filtered

Only the specific named patterns listed above are filtered. General dotfiles (.bashrc, .env, .gitconfig) are treated as user content and migrated as-is. A user who deliberately placed a dotfile in their drive put it there on purpose. MigrationFox does not try to guess which dotfiles are OS junk and which are intentional.

Performance impact

On a representative scan of a long-lived Google Drive account with 90,522 files totaling 577 GB, the filter removed 34,272 files (roughly 38% by count, about 36 GB by volume). The remaining 56,250 files were what the user actually wanted on SharePoint. Three things improve as a result:

The migration is shorter, the migration is cheaper, and the destination is not cluttered with OS debris nobody will ever look at.
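The figures quoted above can be checked with a few lines of arithmetic. The inputs are the numbers from the example scan; the variable names are illustrative.

```python
total_files, filtered_files = 90_522, 34_272
total_gb, filtered_gb = 577, 36

share_by_count = filtered_files / total_files
share_by_volume = filtered_gb / total_gb

print(f"{share_by_count:.1%} by count, {share_by_volume:.1%} by volume")
# prints "37.9% by count, 6.2% by volume"
```

The gap between the two shares shows that the junk is many small files: a large part of the file count, a small part of the bytes.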

The 30% figure is a long-tail average

Drives that are mostly large files (video archives, DICOM dumps, engineering builds) see a smaller relative speedup because the junk-file count is proportionally smaller. Drives that are mostly small files mixed with OS debris (user home directories, shared team drives that have been synced from many devices) see the full 30–35% improvement. Your mileage varies with content shape; the scan report tells you in advance what to expect.

What shows up in the scan report

Every scan that runs with filters on produces a per-category breakdown in the report, with a filtered count for each category.

The migration estimate (time, request budget, storage delta) uses the after-filter numbers. If you turn a category back on, the estimate refreshes to match.
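As a sketch of how an estimate could track the toggles, the after-filter totals can be derived by subtracting only the categories that are currently enabled. The function name, category keys, and all numbers below are made up for illustration.

```python
from typing import Dict, Set, Tuple

def after_filter_totals(
    total: Tuple[int, int],
    junk: Dict[str, Tuple[int, int]],
    enabled: Set[str],
) -> Tuple[int, int]:
    """Subtract (files, bytes) of every enabled filter category from the scan totals."""
    files, size = total
    for category, (n_files, n_bytes) in junk.items():
        if category in enabled:
            files -= n_files
            size -= n_bytes
    return files, size

# Illustrative numbers only.
total = (1000, 5_000_000)
junk = {"ds_store": (200, 800_000), "tmp": (50, 200_000)}

all_on = after_filter_totals(total, junk, {"ds_store", "tmp"})
tmp_off = after_filter_totals(total, junk, {"ds_store"})
print(all_on, tmp_off)
```

Disabling the tmp filter in this example puts its 50 files and their bytes back into the migration scope.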

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| The scan shows more filtered files than I expected | The source drive has been synced from multiple devices over several years, accumulating junk from each. | Review the per-category breakdown in the report. If the counts are plausible for the categories shown (macOS users produce .DS_Store proportional to folder count, Android users produce thumbdata3-* proportional to camera use), the filter is working correctly. |
| A file I needed was filtered out | The filename matched a default pattern. Most commonly, a legitimate document was named thumbs.db or a data file was saved with a .tmp extension. | Turn the affected category off in the scan profile and re-scan. Use the per-job override rather than the tenant default unless every scan needs the change. |
| A category shows 0 filtered files but I know there are junk files in the source | Either the category is already toggled off for this job, or the files in question do not match the exact pattern (e.g. Thumbs.DB with uppercase-DB on a case-sensitive source where the match is case-sensitive). | Check the toggle state in the scan profile. If the toggle is on and files still are not filtered, compare the filenames against the exact patterns in the table above; matches are by the patterns specified, not by heuristics. |
| Junk files appeared on the destination after migration | The job was created before the junk-file filter was enabled for this tenant, or a category that matches the junk was toggled off on the job. | Delete the affected files on the destination (bulk delete by filename pattern works well for .DS_Store and Thumbs.db), then adjust the filter defaults so the next migration does not reintroduce them. |
| The .tmp filter removed an autosave file I needed to recover | The .tmp family is filtered by default but logged under “recovered garbage” specifically because rare edge cases like this exist. | Open the scan report, expand the .tmp category, locate the file by path, and either un-filter the category on a re-scan or copy the file manually from the source if still available. |
| Filter counts on a re-scan differ from the original scan | The source drive changed between scans: new junk accumulated from recent sync activity or files were cleaned up by the user. | Expected. The filter always matches against the current source state. The most recent scan’s numbers are the authoritative ones for the migration. |
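The case-sensitivity pitfall from the table is easy to reproduce when checking filenames by hand. The helper below is a hypothetical diagnostic, not part of MigrationFox; whether a given source is matched case-sensitively is an assumption you would need to confirm for your setup.

```python
from fnmatch import fnmatchcase

def matches(name: str, pattern: str, case_sensitive: bool = True) -> bool:
    """Compare a filename against one filter pattern, optionally ignoring case."""
    if not case_sensitive:
        name, pattern = name.lower(), pattern.lower()
    return fnmatchcase(name, pattern)

print(matches("Thumbs.DB", "Thumbs.db"))                        # False
print(matches("Thumbs.DB", "Thumbs.db", case_sensitive=False))  # True
print(matches("._Report.docx", "._*"))                          # True
```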
