Remove duplicate files

This command opens the "Remove duplicate files" dialog, which searches for and deletes duplicate files within a single folder or across two folders.

Warning: Some operations may delete files.
Please always BACK UP your files before proceeding.
The author will NOT be liable for any data loss while using or misusing this function.

Remove duplicate files within the same folder

  1. Specify the folder in the "Source folder".
  2. Uncheck the "Find duplicates between two directories" option.
  3. Set the searching criteria. Only files that meet the criteria will be searched.
    • In the "Prefix" box, enter a string that must appear somewhere in the prefix of a file name. For example, if you enter "img", files with prefixes such as "img01", "dc_img_0001", and "00img" will be found. If you leave this box blank, any prefix is valid.
    • In the "Extension" box, input the extension you want to include or exclude. For example, if you want to find all ".jpg" and ".tif" files and exclude all ".bmp" files, input "jpg;tif;-bmp". If you want to find any files except all ".bmp" files, input "-bmp". Leaving this box blank means any extension is valid.
    • In the "Size(>=)" box, enter the lower limit (in bytes) of the file size. Enter "0" to accept files of any size.
  4. Specify the type of operation applied to the duplicate files.
    • "Test only" records the duplicate files in a log file (if enabled) and does not delete any file;
    • "Delete" records the duplicate files in a log file (if enabled) and deletes the files permanently;
    • "Move to" (which enables the "Destination folder" box below) records the duplicate files in a log file (if enabled), renames each file according to its full path, and moves it to the destination folder that you specified. The renaming rule is: replace the drive separator ":\" with "__" and every other "\" in the path with "_". For example, a duplicate file "C:\output\aa.jpg" will be moved to "C:\temp\C__output_aa.jpg" if the destination folder is "C:\temp".
  5. Click the "Remove" button to start the process. The following description of the searching algorithm for the duplicate files may help you understand the behavior of this function better.
    1. The algorithm first gets the size information of all files in the source folder that meet the searching criteria.
    2. Then it groups files with the same size.
    3. It reads and compares the content of each file in the same group to identify the duplicate files within that group. Because only content is compared, the algorithm finds all duplicates regardless of the name, date, and location of the files.
    4. The algorithm sorts the duplicates by their file names in either ascending or descending order, depending on what you have chosen.
    5. The algorithm only keeps the first file on the sorted list and deletes the remaining files.
  6. The number of duplicate sets and the number of deleted files are shown at the bottom of the dialog.
  7. Click the "View" button to see the list of duplicate files in the log file (if enabled), or click the "Clear" button to delete the log file.
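
The "Extension" filter from step 3 and the "Move to" renaming rule from step 4 can be illustrated with a short Python sketch. This is not VIPBase's own code: the function names extension_allowed and move_target_name are hypothetical, and case-insensitive extension matching is an assumption.

```python
from pathlib import PureWindowsPath


def extension_allowed(file_name, spec):
    """Check a file name against an "Extension" box value such as "jpg;tif;-bmp"."""
    ext = PureWindowsPath(file_name).suffix.lstrip(".").lower()
    items = [s.strip().lower() for s in spec.split(";") if s.strip()]
    included = {s for s in items if not s.startswith("-")}
    excluded = {s[1:] for s in items if s.startswith("-")}
    if ext in excluded:
        return False
    # With no included extensions listed, anything not excluded is valid.
    return not included or ext in included


def move_target_name(source_path, destination_folder):
    """Build the "Move to" target by flattening the source path into one file name."""
    # Replace the drive separator first, then every remaining backslash.
    flat = source_path.replace(":\\", "__").replace("\\", "_")
    return destination_folder.rstrip("\\") + "\\" + flat
```

With these definitions, extension_allowed("photo.bmp", "jpg;tif;-bmp") is False, and move_target_name(r"C:\output\aa.jpg", r"C:\temp") gives C:\temp\C__output_aa.jpg, which matches the example in step 4.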

Remove duplicate files across two folders

  1. Specify the folder in the "Source folder".
  2. Check the "Find duplicates between two directories" option and specify "Source folder 2" (called a reference folder).
  3. Set the searching criteria and type of operation, as described in steps 3 and 4 in the previous section.
  4. Click "Remove" to start the process. In addition to what is described in the previous section, the algorithm also finds files in the first folder that have duplicates in the reference folder.
  5. Similarly, the algorithm sorts the duplicates by their file names in either ascending or descending order and deletes all files on the list except the first one. Note that none of the files in the reference folder will be deleted.
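
The search described in both sections can be sketched in Python as follows. This is only an illustration of the steps above, not VIPBase's implementation. The names list_files and find_removable are hypothetical, and several details are assumptions: subfolders are searched recursively, ties between equal file names are broken by full path, and in two-folder mode the reference files are sorted together with the source files but are never returned for removal.

```python
import filecmp
from collections import defaultdict
from pathlib import Path


def list_files(folder):
    """Return every file under `folder` (recursing into subfolders is an assumption)."""
    return [p for p in Path(folder).rglob("*") if p.is_file()]


def find_removable(source, reference=None, descending=False):
    """Return the files in `source` that would be removed as duplicates."""
    reference_files = set(list_files(reference)) if reference else set()
    candidates = list_files(source) + sorted(reference_files)

    # Group files by size: only files of equal size can have equal content.
    by_size = defaultdict(list)
    for path in candidates:
        by_size[path.stat().st_size].append(path)

    removable = []
    for group in by_size.values():
        # Split each size group into sets of byte-for-byte identical files.
        duplicate_sets = []
        for path in group:
            for dup_set in duplicate_sets:
                if filecmp.cmp(dup_set[0], path, shallow=False):
                    dup_set.append(path)
                    break
            else:
                duplicate_sets.append([path])

        for dup_set in duplicate_sets:
            # Sort by file name (full path breaks ties) and keep the first entry.
            dup_set.sort(key=lambda p: (p.name, str(p)), reverse=descending)
            # Files in the reference folder are never removed.
            removable.extend(p for p in dup_set[1:] if p not in reference_files)
    return removable
```

Calling find_removable(folder) corresponds to a "Test only" run within one folder, and passing a second folder as reference corresponds to the two-folder mode. The sketch only reports files; it does not delete or move anything.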

VIPBase © 2006 Fengjun Lv
Last update: 08/01/2006