Welcome!
Thank you visiting ICB - Iceberg Code Blog
This is the place, where I'd like to share ideas and some useful code (VB, Excel, T-SQL, JavaScript, ...)
Table of Content
This is the place, where I'd like to share ideas and some useful code (VB, Excel, T-SQL, JavaScript, ...)
Table of Content
Tuesday, March 28, 2006
Comparing files ... when you have too many files
In the last post I uploaded a spreadsheet to search all files in your PC
but now you got too many files, and if you are like me (I never throw away a
file and I do some kind of confusing versioning) you can get lost quick to find
latest version.
I've been looking around, but I didn't find anything that fit my needs, so
as usual I did my own.
Compare two files and decide if they are identical it's a kind of art
(without open them). I remarked that 2 identical files can be different
in byte comparing just because one has been closed with cell A1 selected
and another with cell B2 ...
So comparing them just byte by byte it's not enough.
For this reason I personally consider identical files with same lenght
and 98% or more bytes identical ...
I did 2 functions, the first to know if 2 file are 100% identical
and the second to know how much they are similar
If you don't like to read code, but you need just a solution
please download it know: File_Move_Rename.xls
Remarks that in the spreadsheet there are other functions:
FileMoveFile, FileDeleteFile, FolderCreate, FolderExists, ...
So first you need a simple code to compare files
Here's the code (ops ... comments are in Italian):
But, if the too files are similar for 99% you don't
get the result you expect so with a little modification
you can get a better result
Remark that Function FileSimilitude is a lot slower than FileBynaryCompare
but now you got too many files, and if you are like me (I never throw away a
file and I do some kind of confusing versioning) you can get lost quick to find
latest version.
I've been looking around, but I didn't find anything that fit my needs, so
as usual I did my own.
Compare two files and decide if they are identical it's a kind of art
(without open them). I remarked that 2 identical files can be different
in byte comparing just because one has been closed with cell A1 selected
and another with cell B2 ...
So comparing them just byte by byte it's not enough.
For this reason I personally consider identical files with same lenght
and 98% or more bytes identical ...
I did 2 functions, the first to know if 2 file are 100% identical
and the second to know how much they are similar
If you don't like to read code, but you need just a solution
please download it know: File_Move_Rename.xls
Remarks that in the spreadsheet there are other functions:
FileMoveFile, FileDeleteFile, FolderCreate, FolderExists, ...
So first you need a simple code to compare files
Here's the code (ops ... comments are in Italian):
Sub testFileBynaryCompare() Dim BinaryFile1, BinaryFile2, MyStart, MyResult BinaryFile1 = "c:\P4.html" BinaryFile2 = "c:\p3.html" MyStart = Now MyResult = FileBynaryCompare(BinaryFile1, BinaryFile2) MsgBox (Now - MyStart) * 86400 * 1000000000 / FileLen(BinaryFile1) & " sec / giga" MsgBox MyResult End Sub Function FileBynaryCompare(ByVal BinaryFile1 As String, ByVal BinaryFile2 As String) As Boolean ' funzione che verifica se 2 file sono uguali per ogni singolo byte ' verifico che i file esistono If (Len(Dir(BinaryFile1)) = 0) Then GoTo ExitFunction If (Len(Dir(BinaryFile2)) = 0) Then GoTo ExitFunction 'verifico che abbiano dimensioni uguali If FileLen(BinaryFile1) <> FileLen(BinaryFile2) Then GoTo ExitFunction ' dimensione della stringa estratta Dim BYTE_SIZE BYTE_SIZE = 5000 Dim StrVar1 As String * 5000 Dim StrVar2 As String * 5000 Dim FileNum1, FileNum2, i ' Apro il primo file FileNum1 = FreeFile Open BinaryFile1 For Binary Access Read As #FileNum1 ' Apro il secondo file FileNum2 = FreeFile Open BinaryFile2 For Binary Access Read As #FileNum2 'confronto 5000 byte alla volta i = 1 Do Until EOF(FileNum1) Get #FileNum1, i, StrVar1 Get #FileNum2, i, StrVar2 If StrVar1 <> StrVar2 Then GoTo ExitFunction i = i + BYTE_SIZE Loop ' se sono arrivato qui vuol dire che sono uguali FileBynaryCompare = True ExitFunction: Close #FileNum1 Close #FileNum2 End Function |
But, if the too files are similar for 99% you don't
get the result you expect so with a little modification
you can get a better result
Sub testFileSimilitude() Dim BinaryFile1, BinaryFile2, MyStart, MyResult BinaryFile1 = "c:\P4.html" BinaryFile2 = "c:\p3.html" MyStart = Now MyResult = FileSimilitude(BinaryFile1, BinaryFile2) MsgBox (Now - MyStart) * 86400 * 1000000000 / FileLen(BinaryFile1) & " sec / giga" MsgBox MyResult End Sub Function FileSimilitude(ByVal BinaryFile1 As String, ByVal BinaryFile2 As String) As Single ' funzione che verifica se 2 file sono uguali per ogni singolo byte ' verifico che i file esistono If (Len(Dir(BinaryFile1)) = 0) Then GoTo ExitFunction If (Len(Dir(BinaryFile2)) = 0) Then GoTo ExitFunction 'verifico che abbiano dimensioni uguali Dim FileLen1 FileLen1 = FileLen(BinaryFile1) If FileLen1 <> FileLen(BinaryFile2) Then GoTo ExitFunction If FileBynaryCompare(BinaryFile1, BinaryFile2) = True Then FileSimilitude = 1 GoTo ExitFunction End If ' dimensione della stringa estratta Dim BYTE_SIZE BYTE_SIZE = 5000 Dim StrVar1 As String * 5000 Dim StrVar2 As String * 5000 Dim FileNum1, FileNum2, i, k, j As Single ' Apro il primo file FileNum1 = FreeFile Open BinaryFile1 For Binary Access Read As #FileNum1 ' Apro il secondo file FileNum2 = FreeFile Open BinaryFile2 For Binary Access Read As #FileNum2 'confronto 5000 byte alla volta i = 1 j = 0 Do Until EOF(FileNum1) Get #FileNum1, i, StrVar1 Get #FileNum2, i, StrVar2 If StrVar1 <> StrVar2 Then For k = 1 To Len(StrVar1) If Mid(StrVar1, k, 1) <> Mid(StrVar2, k, 1) Then j = j + 1 If j > FileLen1 / 20 Then GoTo ExitFunction Next End If i = i + BYTE_SIZE Loop ' se sono arrivato qui vuol dire che sono uguali If FileLen1 = 0 Then FileSimilitude = 1 Else FileSimilitude = (FileLen(BinaryFile1) - j) / FileLen(BinaryFile1) End If ExitFunction: Close #FileNum1 Close #FileNum2 End Function |
Remark that Function FileSimilitude is a lot slower than FileBynaryCompare