DedupeUI is a two-stage, cross-platform GUI (Windows, macOS, Linux) that finds files in Folder B with an identical copy in Folder A and removes them from Folder B.
Identity rule: same filename (case-insensitive) and same content hash.
Deletion safety: only files in Folder B can be deleted—and only after an explicit hash match (green).
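The deletion rule can be pictured as a single guard that every delete request must pass. The following is a minimal sketch, not the app's actual code: the function name `may_delete` and the status strings are assumptions made for illustration.

```python
from pathlib import Path


def may_delete(path: Path, folder_b: Path, status: str) -> bool:
    """Allow deletion only for hash-verified files located inside Folder B.

    Hypothetical helper: `status` is assumed to be "MATCH" for a green row.
    """
    if status != "MATCH":
        return False  # unverified or differing rows are never deletable
    resolved = path.resolve()
    root = folder_b.resolve()
    # Resolving first means a symlink that points outside Folder B is rejected.
    # Path.is_relative_to requires Python 3.9 or newer.
    return resolved.is_file() and resolved.is_relative_to(root)
```

Checking the location on the resolved path keeps Folder A out of reach even if a row were ever constructed with the two paths swapped.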
- Stage 1 – Fast filter: find candidates by name + size (no hashing).
- Stage 2 – On-demand verify: hash only the rows you select.
- MATCH → row turns green (identical content).
- DIFF → row turns red (different content).
- Unchecked rows remain neutral.
- Already-verified rows are automatically skipped on re-runs.
This avoids hashing everything, keeps the UI responsive, and lets you control what to verify.
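Stage 1 needs only directory metadata, which is why it is fast. A rough sketch of a name+size filter follows, assuming a plain `os.walk` traversal; the function names `index_by_name_and_size` and `find_candidates` are illustrative and not taken from the app.

```python
import os
from collections import defaultdict
from pathlib import Path


def index_by_name_and_size(root: Path) -> dict:
    """Map (case-folded filename, size in bytes) to the files found under root."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                continue  # unreadable entry: skip it rather than abort the scan
            index[(name.casefold(), size)].append(path)
    return index


def find_candidates(folder_a: Path, folder_b: Path) -> list:
    """Return (file_in_a, file_in_b) pairs that share a name and a size.

    No file content is read here; the pairs are only candidates until hashed.
    """
    index_a = index_by_name_and_size(folder_a)
    index_b = index_by_name_and_size(folder_b)
    pairs = []
    for key, b_paths in index_b.items():
        for b_path in b_paths:
            for a_path in index_a.get(key, []):
                pairs.append((a_path, b_path))
    return pairs
```

Two files of different sizes cannot have identical content, so the size check discards most non-duplicates before any hashing happens.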
- Qt (PySide6) UI
- Two-stage flow (name+size → selective hashing)
- Built-in BLAKE3 hasher (much faster than SHA-256)
- Parallel hashing with adjustable worker count
- Hash cache on disk using the OS-specific cache directory
- Long path support (`\\?\` prefix)
- Delete only from Folder B; Folder A is never touched
- Clear progress text + counters
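Parallel hashing with an adjustable worker count could be structured roughly as below. This is a sketch under assumptions: it uses a thread pool, falls back to SHA-256 from the standard library when the third-party `blake3` package is not installed, and the names `hash_file` and `hash_many` are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

try:
    from blake3 import blake3 as new_hasher  # third-party package, if installed
except ImportError:
    new_hasher = hashlib.sha256  # stand-in so the sketch runs without blake3


def hash_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large files are never fully in memory."""
    hasher = new_hasher()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def hash_many(paths, workers: int = 8) -> dict:
    """Hash files concurrently; a file that cannot be read maps to None."""
    paths = list(paths)

    def safe_hash(path):
        try:
            return hash_file(path)
        except OSError:
            return None  # report the failure without stopping the other files

    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(safe_hash, paths))
    return dict(zip(paths, digests))
```

The `workers` argument corresponds to the adjustable worker count; the best value depends on the storage device.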
DedupeUI can be used in two ways.
- Install Python 3.11+.
- Install dependencies:
python -m pip install -r requirements.txt
- Launch:
python ./dedupe_ui.py
- In the app:
- Pick Folder A (keep) and Folder B (dedupe target).
- Click Stage 1: Find name+size candidates.
- Select rows and click Stage 2: Verify hash (selected).
- Only green (MATCH) rows are true duplicates; select them and click Delete Selected from Folder B.
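The Stage 2 step above can be sketched as a loop over the selected rows that skips anything already verified. The `Row` class, the status strings, and `verify_selected` are assumptions made for illustration, and SHA-256 from the standard library stands in for BLAKE3.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Row:
    """One Stage 1 candidate pair (hypothetical structure)."""
    path_a: Path
    path_b: Path
    status: str = "UNCHECKED"  # becomes "MATCH" (green) or "DIFF" (red)


def digest_of(path: Path) -> str:
    """Hash a file in chunks; SHA-256 is a stand-in for the app's hasher."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(1024 * 1024):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_selected(rows: list, selected: list) -> int:
    """Hash only the selected rows that are still unchecked.

    Returns how many rows were actually hashed in this run.
    """
    hashed = 0
    for index in selected:
        row = rows[index]
        if row.status != "UNCHECKED":
            continue  # verified on an earlier run: skip
        same = digest_of(row.path_a) == digest_of(row.path_b)
        row.status = "MATCH" if same else "DIFF"
        hashed += 1
    return hashed
```

Rows that were never selected keep the `UNCHECKED` status, which matches the neutral rows in the table.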
Download a pre-built executable from the releases page and run it. No Python installation is required.
Bundle your own single-file executable with PyInstaller:
python -m pip install -U pyinstaller
pyinstaller --onefile --noconsole --name DedupeUI ^
--hidden-import blake3 ^
.\dedupe_ui.py
# Executable is in .\dist\DedupeUI.exe
Why `--hidden-import blake3`? The app imports BLAKE3 lazily; this flag ensures PyInstaller bundles it.
- Back up Folder B before deleting.
- Only files in Folder B are ever deleted; the deletion logic never touches Folder A.
- The Delete button works only on green (MATCH) rows that you select.
- Long or locked paths: the app uses the Windows `\\?\` long-path prefix and reports permission/lock errors without stopping the whole run.
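For readers unfamiliar with the `\\?\` prefix, the sketch below shows how an absolute Windows path is typically rewritten into its extended-length form. It is pure string handling and not the app's own code; the name `add_long_path_prefix` is hypothetical.

```python
LONG_PREFIX = "\\\\?\\"          # the literal characters \\?\
UNC_LONG_PREFIX = "\\\\?\\UNC\\"  # the literal characters \\?\UNC\


def add_long_path_prefix(absolute_path: str) -> str:
    """Return the extended-length form of an absolute Windows path.

    The input is assumed to be absolute, normalized, and backslash-separated,
    because Windows does not normalize paths that carry this prefix.
    """
    if absolute_path.startswith(LONG_PREFIX):
        return absolute_path  # already prefixed
    if absolute_path.startswith("\\\\"):
        # UNC share: \\server\share\... becomes \\?\UNC\server\share\...
        return UNC_LONG_PREFIX + absolute_path[2:]
    return LONG_PREFIX + absolute_path
```

On Windows a caller would typically pass `os.path.abspath(path)` so that the input is absolute and normalized; on macOS and Linux the prefix does not apply.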
- Use BLAKE3 (default) for best speed.
- Tweak Workers:
- USB/SD: 4–8
- SSD/NVMe: 8–16
- Stage 1 is fast; in Stage 2, verify only the rows you care about.
- The hash cache speeds up repeat runs when files have not changed.
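One common way to build such a hash cache is to key each digest on the file's path, size, and modification time, and to rehash only when one of them changes. The sketch below assumes a JSON file whose location is chosen by the caller; the `HashCache` class, its layout, and the SHA-256 stand-in are illustrative, not the app's actual cache format.

```python
import hashlib
import json
from pathlib import Path


class HashCache:
    """Digest cache keyed by resolved path, validated by size and mtime."""

    def __init__(self, cache_file: Path):
        self.cache_file = cache_file
        try:
            self.entries = json.loads(cache_file.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            self.entries = {}  # missing or corrupt cache: start empty

    def get_hash(self, path: Path) -> str:
        stat = path.stat()
        key = str(path.resolve())
        entry = self.entries.get(key)
        if (
            entry
            and entry["size"] == stat.st_size
            and entry["mtime_ns"] == stat.st_mtime_ns
        ):
            return entry["digest"]  # file unchanged: reuse the stored digest
        # Whole-file read keeps the sketch short; real code would read in chunks.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        self.entries[key] = {
            "size": stat.st_size,
            "mtime_ns": stat.st_mtime_ns,
            "digest": digest,
        }
        return digest

    def save(self) -> None:
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)
        self.cache_file.write_text(json.dumps(self.entries), encoding="utf-8")
```

Size and modification time are a heuristic: a file rewritten with the same size and timestamp would still hit the cache, which is the usual trade-off for skipping the read.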
- Stuck mid-scan: the two-stage flow avoids whole-tree hashing; use Stage 2 to hash only the rows you select.
- Anti-virus slowdowns: exclude the target folders temporarily while scanning (remember to re-enable).
- Permissions: run PowerShell as admin if needed for protected paths.
- `dedupe_ui_backup.py`: original single-file version
- `dedupe_ui.py`: app entry point
- `utils.py`, `hashing.py`, `stage1.py`, `verifier.py`, `gui.py`: modules split by responsibility
- `README.md`: this file
MIT