File comparison

File comparison in computing is the automatic comparison of data between files on a file system. The results of such comparisons are typically displayed to the user, but can also be used to accomplish tasks in networks, file systems, and revision control.

Examples of programs that perform this task include diff and cmp. Many text editors and word processors can also highlight the changes made to a document or file.

Examples of file comparison utilities include FileMerge and Microsoft File Compare. Diffutils is a GNU package that includes the diff command among other utilities. Free software tools that provide file comparison include WinMerge and Meld.

Method Types

Most file comparison tools find the longest common subsequence between two files. Others instead find the longest increasing subsequence between two files (US patent 7031972). The file comparison tool used in Bazaar uses an algorithm described by Bram Cohen [http://bramcohen.livejournal.com/37690.html]. The rsync protocol uses a rolling hash function to compare two files on two distant computers with low communication overhead.
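As a rough illustration of the longest-common-subsequence approach, the Python sketch below builds the textbook dynamic-programming table for two lists of lines and walks it to emit kept, removed, and added lines. The function names and the `"  "`, `"- "`, `"+ "` prefixes are assumptions made for this example; production tools such as GNU diff use faster algorithms than this O(n·m) table and offer richer output formats.

```python
from typing import List


def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """t[i][j] holds the LCS length of the suffixes a[i:] and b[j:]."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def lcs_diff(a: List[str], b: List[str]) -> List[str]:
    """Prefix lines with '  ' (in both), '- ' (only in a) or '+ ' (only in b)."""
    t = lcs_table(a, b)
    i = j = 0
    out: List[str] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Matching equal lines is always part of some longest common subsequence.
            out.append("  " + a[i])
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            # Dropping a[i] keeps the LCS length at least as long.
            out.append("- " + a[i])
            i += 1
        else:
            out.append("+ " + b[j])
            j += 1
    # Whatever remains on either side has no counterpart.
    out.extend("- " + line for line in a[i:])
    out.extend("+ " + line for line in b[j:])
    return out
```

For example, `lcs_diff(["a", "b", "c", "d"], ["a", "c", "d", "e"])` returns `["  a", "- b", "  c", "  d", "+ e"]`: the common subsequence `a, c, d` is kept, and only `b` and `e` are reported as changes.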
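The rolling-hash idea behind rsync can be sketched as follows, assuming the two-part weak checksum described in the rsync technical report by Tridgell and Mackerras: a plain sum of the bytes in a window and a position-weighted sum, both kept modulo 2^16. Because both parts can be updated in constant time when the window slides by one byte, one side can test every offset of its file against block signatures sent by the other side. In this sketch both files are local for simplicity, the block size `n` is a hypothetical parameter, SHA-256 merely stands in for rsync's strong checksum, and names such as `find_matches` are illustrative.

```python
import hashlib
from typing import Dict, List, Tuple

M = 1 << 16  # both checksum halves are kept modulo 2**16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute (a, b): plain byte sum and position-weighted sum of the block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window forward by one byte in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_matches(old: bytes, new: bytes, n: int) -> List[Tuple[int, int]]:
    """Return (offset in new, block index in old) for each old block found in new."""
    # Signatures of the old file's non-overlapping n-byte blocks. In rsync the
    # side holding the old file computes these and sends them to the other side.
    sigs: Dict[Tuple[int, int], List[Tuple[int, bytes]]] = {}
    for idx in range(len(old) // n):
        block = old[idx * n:(idx + 1) * n]
        sigs.setdefault(weak_checksum(block), []).append(
            (idx, hashlib.sha256(block).digest()))

    matches: List[Tuple[int, int]] = []
    if len(new) < n:
        return matches
    k = 0
    a, b = weak_checksum(new[:n])
    while k + n <= len(new):
        hit = None
        # Cheap weak checksum first; confirm candidates with the strong hash.
        for idx, strong in sigs.get((a, b), []):
            if hashlib.sha256(new[k:k + n]).digest() == strong:
                hit = idx
                break
        if hit is not None:
            matches.append((k, hit))
            k += n  # jump past the matched block and restart the window
            if k + n <= len(new):
                a, b = weak_checksum(new[k:k + n])
        else:
            if k + n < len(new):
                a, b = roll(a, b, new[k], new[k + n], n)
            k += 1
    return matches
```

With `old = b"abcdefgh"` and `n = 4`, searching `b"XXabcdefghY"` finds both old blocks, at offsets 2 and 6. Bytes not covered by any match (here `XX` and `Y`) are the only data that would have to be sent literally, which is where the low communication overhead comes from.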

When an exact comparison is required, one uses what is referred to as "byte-level" file comparison. This method compares two or more files byte by byte to determine precisely where they differ. Algorithmic methods can be highly accurate, but none is more exact than a byte-level comparison. The trade-off is speed, since confirming that two files are identical requires reading every byte of both. This cost matters mainly during an initial scan of all files; it becomes fairly negligible during later, incremental scans or comparisons.
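A byte-level comparison in the spirit of `cmp` can be sketched in Python by reading both files in fixed-size chunks and reporting the offset of the first differing byte. The chunk size and function names are assumptions for illustration. This sketch reports a zero-based offset, whereas `cmp` counts byte positions from 1.

```python
import io
from typing import BinaryIO, Optional


def first_difference(fa: BinaryIO, fb: BinaryIO,
                     chunk_size: int = 64 * 1024) -> Optional[int]:
    """Return the zero-based offset of the first differing byte, or None if identical.

    Assumes read() returns full chunks until EOF, as buffered binary files do.
    """
    offset = 0
    while True:
        ca = fa.read(chunk_size)
        cb = fb.read(chunk_size)
        if ca != cb:
            # Locate the first mismatch inside this pair of chunks.
            for i, (x, y) in enumerate(zip(ca, cb)):
                if x != y:
                    return offset + i
            # No mismatch in the overlap: one file is a prefix of the other.
            return offset + min(len(ca), len(cb))
        if not ca:
            return None  # both streams reached EOF together
        offset += len(ca)


def compare_files(path_a: str, path_b: str) -> Optional[int]:
    """Open two files in binary mode and compare them byte by byte."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa, fb)


if __name__ == "__main__":
    # In-memory streams stand in for real files in this demo.
    print(first_difference(io.BytesIO(b"hello world"), io.BytesIO(b"hello World")))
```

The demo prints `6`, the position of `w` versus `W`. Reading in chunks keeps memory use constant regardless of file size, and the comparison stops as soon as a difference is found, so only identical files pay the full cost of reading every byte.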

Reasoning

Different comparison tools suit different purposes. For binary files, byte-level comparison is probably best. For text files, a side-by-side visual comparison is usually best; this also applies to source code and scripts written in human-readable languages. A visual comparison lets the user decide whether to keep one file rather than the other, to merge the two into a single file containing all of the differences, or to keep both as-is for later reference through some form of version control. Versioning is also important for backup purposes.
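One way to act on this distinction is sketched below. It treats data as binary if a NUL byte appears in the first few thousand bytes (a common heuristic, similar in spirit to the one Git uses; the 8000-byte limit here is an assumption), and for text it renders a simple side-by-side view with `difflib.SequenceMatcher` from Python's standard library. Note that `SequenceMatcher` uses its own matching heuristic rather than a strict longest common subsequence, and the `|`, `<`, `>` markers are chosen to resemble `diff -y` output; the function names are illustrative.

```python
import difflib
from typing import List


def is_probably_binary(head: bytes, sniff_bytes: int = 8000) -> bool:
    """Heuristic: a NUL byte near the start suggests binary content."""
    return b"\x00" in head[:sniff_bytes]


def side_by_side(left: List[str], right: List[str], width: int = 30) -> List[str]:
    """Render two line lists side by side.

    Markers: ' ' same, '|' changed, '<' only on the left, '>' only on the right.
    """
    markers = {"equal": " ", "replace": "|", "delete": "<", "insert": ">"}
    rows: List[str] = []
    matcher = difflib.SequenceMatcher(None, left, right)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        l_part, r_part = left[i1:i2], right[j1:j2]
        # A "replace" group may have unequal lengths; pad the shorter side.
        for k in range(max(len(l_part), len(r_part))):
            l_text = l_part[k] if k < len(l_part) else ""
            r_text = r_part[k] if k < len(r_part) else ""
            rows.append(f"{l_text:<{width}} {markers[tag]} {r_text}".rstrip())
    return rows


if __name__ == "__main__":
    old = ["alpha", "beta", "gamma"]
    new = ["alpha", "BETA", "gamma", "delta"]
    for row in side_by_side(old, new, width=8):
        print(row)
```

A caller would typically read the first few kilobytes of each file, fall back to a byte-level comparison when `is_probably_binary` is true, and otherwise split both files into lines and show `side_by_side` to the user.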

File comparison is an important, and most likely integral, part of file synchronization and backup. In backup methodologies, corruption is a particular concern: it occurs without warning and usually goes unnoticed until it is too late to recover the damaged parts. Often the only way to know for certain that a file has become corrupted is to use or open it. Short of that, a comparison tool is needed to at least recognize that a difference has occurred. File synchronization and backup programs therefore need to include file comparison if they are to be useful and trusted.
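As a minimal sketch of recognizing that a difference has occurred, assume a hypothetical scheme in which a backup tool records a SHA-256 digest of each file and later recomputes it; any file whose digest no longer matches, or that has disappeared, is flagged for attention. A mismatch only shows that the content changed: it cannot tell a legitimate edit from corruption, or say which copy is correct. The function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths: Iterable[Path]) -> Dict[str, str]:
    """Record the current digest of every path that is an existing regular file."""
    return {str(p): sha256_of_file(p) for p in paths if p.is_file()}


def changed_since(recorded: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Paths whose digest differs from the recorded one, or that are now missing.

    Files that appear only in `current` (newly created) are not reported.
    """
    return sorted(p for p, digest in recorded.items() if current.get(p) != digest)
```

Under these assumptions, a tool would store `recorded = snapshot(files)` at backup time and later call `changed_since(recorded, snapshot(files))`; any path returned deserves a closer look, for example with a byte-level comparison against the backup copy.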

In automated processes, file comparison can be configured to decide automatically how a changed file is saved. A sensible default is to save a new version of the file automatically, so that the user does not have to monitor the process as it runs. Unneeded versions can then be reviewed and removed later, at a more convenient time.
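A minimal sketch of the "save another version by default" policy follows, assuming a hypothetical layout in which versions of a file `name` are stored in a backup directory as `name.v1`, `name.v2`, and so on. It uses the standard library's `filecmp.cmp` with `shallow=False`, which compares file contents rather than trusting `os.stat` signatures alone, so a new version is written only when the content actually changed. The naming scheme and function names are illustrative, not taken from any particular backup program.

```python
import filecmp
import re
import shutil
from pathlib import Path
from typing import Iterable, List, Optional


def version_numbers(name: str, existing: Iterable[str]) -> List[int]:
    """Extract N from entries named '<name>.vN' (hypothetical naming scheme)."""
    pattern = re.compile(re.escape(name) + r"\.v(\d+)")
    return sorted(int(m.group(1)) for e in existing if (m := pattern.fullmatch(e)))


def save_if_changed(src: Path, backup_dir: Path) -> Optional[Path]:
    """Copy src to a new version unless it matches the newest stored version."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    nums = version_numbers(src.name, (p.name for p in backup_dir.iterdir()))
    if nums:
        newest = backup_dir / f"{src.name}.v{nums[-1]}"
        if filecmp.cmp(src, newest, shallow=False):
            return None  # content unchanged; no new version needed
    target = backup_dir / f"{src.name}.v{(nums[-1] if nums else 0) + 1}"
    shutil.copy2(src, target)  # copy2 also preserves metadata such as mtime
    return target
```

Calling `save_if_changed(Path("report.txt"), Path("backups"))` twice without editing the file would, under these assumptions, create `backups/report.txt.v1` the first time and return `None` the second time. Old versions are never overwritten, which leaves pruning to the later review step.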

Historical Uses

Before file comparison on computers, machines existed to compare magnetic tapes or punched cards. The IBM 519 Card Reproducer could determine whether decks of punched cards were equivalent. In 1957, John Van Gardner developed a system to compare the checksums of loaded sections of Fortran programs to debug compilation problems on the IBM 704. [http://www.softwarepreservation.org/projects/FORTRAN/paper/John%20Van%20Gardner%20-%20Fortran%20And%20The%20Genesis%20Of%20Project%20Intercept.pdf]

See also

* Comparison of file comparison tools
* Computer-assisted reviewing

External links

* File Comparison (DMOZ category: Computers/Software/File_Management/File_Comparison/)


Wikimedia Foundation. 2010.
