Welcome!

Thank you visiting ICB - Iceberg Code Blog
This is the place, where I'd like to share ideas and some useful code (VB, Excel, T-SQL, JavaScript, ...)
Table of Content

Tuesday, March 28, 2006

Comparing files ... when you have too many files

In the last post I uploaded a spreadsheet to search all files in your PC
but now you got too many files, and if you are like me (I never throw away a
file and I do some kind of confusing versioning) you can get lost quick to find
latest version.

I've been looking around, but I didn't find anything that fit my needs, so
as usual I did my own.

Compare two files and decide if they are identical it's a kind of art
(without open them). I remarked that 2 identical files can be different
in byte comparing just because one has been closed with cell A1 selected
and another with cell B2 ...

So comparing them just byte by byte it's not enough.
For this reason I personally consider identical files with same lenght
and 98% or more bytes identical ...

I did 2 functions, the first to know if 2 file are 100% identical
and the second to know how much they are similar

If you don't like to read code, but you need just a solution
please download it know: File_Move_Rename.xls

Remarks that in the spreadsheet there are other functions:
FileMoveFile, FileDeleteFile, FolderCreate, FolderExists, ...

So first you need a simple code to compare files
Here's the code (ops ... comments are in Italian):



Sub testFileBynaryCompare()
Dim BinaryFile1, BinaryFile2, MyStart, MyResult
     BinaryFile1 = "c:\P4.html"
     BinaryFile2 = "c:\p3.html"
     MyStart = Now
     MyResult = FileBynaryCompare(BinaryFile1, BinaryFile2)
     MsgBox (Now - MyStart) * 86400 * 1000000000 / FileLen(BinaryFile1) & " sec / giga"
     MsgBox MyResult
End Sub


Function FileBynaryCompare(ByVal BinaryFile1 As String, ByVal BinaryFile2 As String) As Boolean
' funzione che verifica se 2 file sono uguali per ogni singolo byte
' verifico che i file esistono
     If (Len(Dir(BinaryFile1)) = 0) Then GoTo ExitFunction
     If (Len(Dir(BinaryFile2)) = 0) Then GoTo ExitFunction
'verifico che abbiano dimensioni uguali
     If FileLen(BinaryFile1) <> FileLen(BinaryFile2) Then GoTo ExitFunction

' dimensione della stringa estratta
Dim BYTE_SIZE
BYTE_SIZE = 5000
Dim StrVar1 As String * 5000
Dim StrVar2 As String * 5000
Dim FileNum1, FileNum2, i
' Apro il primo file
FileNum1 = FreeFile
Open BinaryFile1 For Binary Access Read As #FileNum1
' Apro il secondo file
FileNum2 = FreeFile
Open BinaryFile2 For Binary Access Read As #FileNum2
'confronto 5000 byte alla volta
i = 1
Do Until EOF(FileNum1)
     Get #FileNum1, i, StrVar1
     Get #FileNum2, i, StrVar2
         If StrVar1 <> StrVar2 Then GoTo ExitFunction
     i = i + BYTE_SIZE
Loop
' se sono arrivato qui vuol dire che sono uguali
FileBynaryCompare = True

ExitFunction:
Close #FileNum1
Close #FileNum2

End Function



But, if the too files are similar for 99% you don't
get the result you expect so with a little modification
you can get a better result



Sub testFileSimilitude()
Dim BinaryFile1, BinaryFile2, MyStart, MyResult
     BinaryFile1 = "c:\P4.html"
     BinaryFile2 = "c:\p3.html"
     MyStart = Now
     MyResult = FileSimilitude(BinaryFile1, BinaryFile2)
     MsgBox (Now - MyStart) * 86400 * 1000000000 / FileLen(BinaryFile1) & " sec / giga"
     MsgBox MyResult
End Sub


Function FileSimilitude(ByVal BinaryFile1 As String, ByVal BinaryFile2 As String) As Single
' funzione che verifica se 2 file sono uguali per ogni singolo byte
' verifico che i file esistono
     If (Len(Dir(BinaryFile1)) = 0) Then GoTo ExitFunction
     If (Len(Dir(BinaryFile2)) = 0) Then GoTo ExitFunction
'verifico che abbiano dimensioni uguali
Dim FileLen1
     FileLen1 = FileLen(BinaryFile1)
     If FileLen1 <> FileLen(BinaryFile2) Then GoTo ExitFunction

If FileBynaryCompare(BinaryFile1, BinaryFile2) = True Then
FileSimilitude = 1
GoTo ExitFunction

End If
' dimensione della stringa estratta
Dim BYTE_SIZE
BYTE_SIZE = 5000
Dim StrVar1 As String * 5000
Dim StrVar2 As String * 5000
Dim FileNum1, FileNum2, i, k, j As Single
' Apro il primo file
FileNum1 = FreeFile
Open BinaryFile1 For Binary Access Read As #FileNum1
' Apro il secondo file
FileNum2 = FreeFile
Open BinaryFile2 For Binary Access Read As #FileNum2
'confronto 5000 byte alla volta
i = 1
j = 0
Do Until EOF(FileNum1)
Get #FileNum1, i, StrVar1
Get #FileNum2, i, StrVar2
If StrVar1 <> StrVar2 Then
For k = 1 To Len(StrVar1)
If Mid(StrVar1, k, 1) <> Mid(StrVar2, k, 1) Then j = j + 1
If j > FileLen1 / 20 Then GoTo ExitFunction
Next
End If
i = i + BYTE_SIZE
Loop
' se sono arrivato qui vuol dire che sono uguali
If FileLen1 = 0 Then
FileSimilitude = 1
Else
FileSimilitude = (FileLen(BinaryFile1) - j) / FileLen(BinaryFile1)
End If
ExitFunction:
Close #FileNum1
Close #FileNum2

End Function



Remark that Function FileSimilitude is a lot slower than FileBynaryCompare

Comments: Post a Comment



<< Home