Data synchronization


Data synchronization is the process of establishing consistency between data in a source and a target data store, in both directions, and of keeping that data harmonized over time. It is fundamental to a wide variety of applications, including file synchronization and mobile device synchronization (e.g. for PDAs).[1]

File-based solutions

Tools are available for file synchronization, version control (CVS, Subversion, etc.), distributed filesystems (Coda, etc.), and mirroring (rsync, etc.); all of these attempt to keep sets of files synchronized. However, only version control and file synchronization tools can deal with modifications to more than one copy of the files.

  • File synchronization is commonly used for home backups on external hard drives or for updating files carried on USB flash drives. Because the automatic process skips files that are already identical, it can save considerable time compared with a manual copy and is less error-prone.[2]
  • Version control tools are intended to deal with situations where more than one person wants to simultaneously modify the same file, while file synchronizers are optimized for situations where only one copy of the file will be edited at a time. For this reason, although version control tools can be used for file synchronization, dedicated programs require less overhead.
  • Distributed filesystems may also be seen as ensuring multiple versions of a file are synchronized. This normally requires that the devices storing the files are always connected, but some distributed file systems like Coda allow disconnected operation followed by reconciliation. The merging facilities of a distributed file system are typically more limited than those of a version control system because most file systems do not keep a version graph.
  • Mirroring: A mirror is an exact copy of a data set. On the Internet, a mirror site is an exact copy of another Internet site. Mirror sites are most commonly used to provide multiple sources of the same information, and are of particular value as a way of providing reliable access to large downloads.
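
The "skip already identical files" behaviour described for file synchronizers can be illustrated with a minimal Python sketch. It is a one-way copy only and is not how any particular tool (such as rsync) works; the function name sync_one_way and the choice of a full byte-by-byte comparison are assumptions made for illustration.

```python
import filecmp
import shutil
from pathlib import Path


def sync_one_way(source: Path, target: Path) -> list[str]:
    """Copy files from source to target, skipping files that already match.

    Hypothetical helper for illustration: returns the relative paths
    (POSIX style) of the files that actually had to be copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue  # directories are created on demand below
        rel = src.relative_to(source)
        dst = target / rel
        # shallow=False forces a content comparison, not just a stat() check
        if dst.is_file() and filecmp.cmp(src, dst, shallow=False):
            continue  # already identical, nothing to transfer
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves modification times
        copied.append(rel.as_posix())
    return copied
```

Running the function a second time without changing the source should copy nothing, which is where the time saving over a manual copy comes from. A sketch like this cannot reconcile edits made on both sides; that is the case version control and two-way file synchronizers address.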

Synchronization is also useful in encryption, for keeping public key servers consistent with one another.[3]

Theoretical models

Several theoretical models of data synchronization exist in the research literature, and the problem is also related to the problem of Slepian-Wolf coding in information theory. The models are classified according to how they treat the data to be synchronized.

Unordered data

The problem of synchronizing unordered data (also known as the set reconciliation problem) is modeled as an attempt to compute the symmetric difference $S_A \oplus S_B = (S_A - S_B) \cup (S_B - S_A)$ between two remote sets $S_A$ and $S_B$ of $b$-bit numbers.[4] Some solutions to this problem are typified by:

Wholesale transfer
In this case all data is transferred to one host for a local comparison.
Timestamp synchronization
In this case all changes to the data are marked with timestamps. Synchronization proceeds by transferring all data with a timestamp later than the previous synchronization.[5]
Mathematical synchronization
In this case data are treated as mathematical objects and synchronization corresponds to a mathematical process.[4][6][7]
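
The first two approaches can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a protocol from the cited references: both sets live in one process (so the "transfer" is just a function argument), and each record is assumed to be stored as a (timestamp, value) pair. The names wholesale_difference and changes_since are hypothetical.

```python
def wholesale_difference(set_a: set[int], set_b: set[int]) -> set[int]:
    """Wholesale transfer: one host receives the other's entire set and
    computes the symmetric difference (S_A - S_B) | (S_B - S_A) locally."""
    return (set_a - set_b) | (set_b - set_a)


def changes_since(
    records: dict[str, tuple[int, str]], last_sync: int
) -> dict[str, tuple[int, str]]:
    """Timestamp synchronization: select only the records whose timestamp
    is later than the previous synchronization."""
    return {
        key: (stamp, value)
        for key, (stamp, value) in records.items()
        if stamp > last_sync
    }


# Elements held by exactly one of the two hosts.
difference = wholesale_difference({1, 2, 3, 5}, {2, 3, 4})

# Only records "b" and "c" changed after the sync at time 20.
pending = changes_since({"a": (10, "x"), "b": (25, "y"), "c": (30, "z")}, 20)
```

Wholesale transfer sends data proportional to the size of the whole set, however small the difference is; timestamp synchronization sends only the records changed since the last synchronization.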

Ordered data

In this case, two remote strings $\sigma_A$ and $\sigma_B$ need to be reconciled. Typically, it is assumed that these strings differ by up to a fixed number of edits (i.e. character insertions, deletions, or modifications). Data synchronization is then the process of reducing the edit distance between $\sigma_A$ and $\sigma_B$ down to the ideal distance of zero. This model applies to all filesystem-based synchronization (where the data is ordered). Many practical applications of this are discussed or referenced above.
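
To make the notion of edit distance concrete, the following Python sketch computes the Levenshtein distance, which counts exactly the three edit types named above (insertion, deletion, modification). It assumes both strings are available locally, so it measures how far apart two copies are rather than performing a remote reconciliation; the function name edit_distance is chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b; initially the prefix of a is empty.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # modify ch_a into ch_b, or keep it
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

Two copies are synchronized when this value is zero.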

It is sometimes possible to transform the problem to one of unordered data through a process known as shingling (splitting the strings into shingles, i.e. short overlapping substrings).[8]
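
A minimal Python sketch of the shingling step follows. It shows only how a string becomes an unordered set, assuming fixed-length shingles of k characters; it does not cover rebuilding the string from its shingles, and it discards repeated shingles, which a full scheme would have to account for. The function name shingles and the parameter k are assumptions for illustration.

```python
def shingles(text: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (shingles) of text."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {text[i:i + k] for i in range(len(text) - k + 1)}


# Once both strings are sets, the ordered problem reduces to set
# reconciliation: only shingles present on one side need to be exchanged.
shingles_a = shingles("abcd", 2)   # {"ab", "bc", "cd"}
shingles_b = shingles("abxd", 2)   # {"ab", "bx", "xd"}
to_exchange = shingles_a ^ shingles_b
```

A single modified character affects at most k shingles on each side, so a small number of edits yields a small symmetric difference.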

See also

  • SyncML, a standard mainly for calendar, contact and email synchronization

Notes

  1. ^ Agarwal, S.; Starobinski, D.; Ari Trachtenberg (2002). "On the scalability of data synchronization protocols for PDAs and mobile devices". Network, IEEE 16 (4): 22–28. doi:10.1109/MNET.2002.1020232. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1020232&isnumber=21950. Retrieved 2007-07-27.
  2. ^ A. Tridgell (February 1999). Efficient algorithms for sorting and synchronization. PhD thesis. The Australian National University. http://samba.org/~tridge/phd_thesis.pdf. 
  3. ^ sks.dnsalias.net
  4. ^ a b Minsky, Y.; Ari Trachtenberg; Zippel, R. (2003). "Set reconciliation with nearly optimal communication complexity". Information Theory, IEEE Transactions on 49 (9): 2213–2218. doi:10.1109/TIT.2003.815784. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1226606. Retrieved 2007-07-27. 
  5. ^ Palm developer knowledgebase manuals
  6. ^ Ari Trachtenberg; D. Starobinski and S. Agarwal. "Fast PDA Synchronization Using Characteristic Polynomial Interpolation". IEEE INFOCOM 2002. doi:10.1109/INFCOM.2002.1019402. http://people.bu.edu/staro/infocom02pda.pdf. 
  7. ^ Y. Minsky and A. Trachtenberg, Scalable set reconciliation, Allerton Conference on Communication, Control, and Computing, Oct. 2002
  8. ^ S. Agarwal; V. Chauhan and Ari Trachtenberg (November 2006). "Bandwidth efficient string reconciliation using puzzles". IEEE Transactions on Parallel and Distributed Systems 17 (11): 1217–1225. doi:10.1109/TPDS.2006.148. http://ipsit.bu.edu/documents/puzzles_journal.pdf. Retrieved 2007-05-23. 

Wikimedia Foundation. 2010.
