[ Home Page ] [ Eiffel Archive ] [ Eiffel Classes and Clusters ]
![]() |
Comparison Library |
Written by Ian Elliott.
comparison.zip (23,321 bytes) - source code
This library provides classes that compare two sequences of like items and report their differences. A sequence is taken to be an ordered collection of items where each item has an associated positive integer index which defines the items unique position in the collection. Successive items of the sequence may be accessed and for each access the index of the item accessed increases by 1. Thus files of lines, characters, bits and other objects, arrays and linear lists of items and even traversals through structured objects such as trees and graphs may be treated as sequences. Two sequences are the same if they contain items of equal value (by is_equal) at every corresponding index, otherwise the sequences are different. The smallest index at which corresponding items differ specifies the position of the first difference. At higher indexes the two sequences may contain subsequences of items in common i.e. they may resynchronise and further on they may differ again, and so on. The aim of the library is to provides means to identify the differences, report them and report where they occur.
A comparison algorithm is implemented for those sequences which are rewindable i.e. sequences for which reading may be re-commenced at a position already read. Such sequences include arrays and random access files. The classes are given in the Comparison cluster. A simple brute force pattern matching algorithm is used and it is reasonably effective for small files. An implementation comparing large files or files of bits would probably require a more sophisticated approach. A full implementation is provided for line based ASCII files i.e. for files which are sequences of ASCII strings separated by end of line characters. These classes are in the Line_file_comparison cluster. The program fcompare provides a simple command line interface to this implementation.
Notes:
This algorithm applies to rewindable sequences i.e. sequences where the items are typically read one after the other but an item already read can be re-read and the reading the sequence continued from there.
The two sequences are read an item at a time in concert. The items in each pair are compared. If the items are the same i.e. have the same value by is_equal, the next pair is read. This continues until one of the following occurs:
Considering each case in turn:
A possible extension is to ignore items of certain value e.g. blank lines in file comparisons. Another extension is to create a Windows version of fcompare using Object Tools Display Machine and another includes the ability to compare different file formats such as binary.
[ Home Page ] [ Eiffel Archive ] [ Eiffel Classes and Clusters ]