Scientific data archiving

Scientific data archiving refers to the long-term storage of scientific data and methods. Scientific journals have differing policies regarding how much of their data and methods scientists are required to store in a public archive, and what is actually archived varies widely between disciplines. Similarly, the major grant-giving institutions have varying attitudes towards public archival of data. In general, the tradition of science has been for publications to contain sufficient information to allow fellow researchers to replicate and therefore test the research. In recent years this approach has become strained, as research in certain areas increasingly depends on large datasets that cannot easily be replicated independently.

Data archiving is more important in some fields than others. In a few fields, all of the data necessary to replicate the work is already available in the journal article. In drug development, a great deal of data is generated and must be archived so researchers can verify that the reports the drug companies publish accurately reflect the data.

The requirement of data archiving is a recent development in the history of science. It was made possible by advances in information technology that allow large amounts of data to be stored and accessed from central locations. For example, the American Geophysical Union (AGU) adopted its first policy on data archiving in 1993, about three years after the beginning of the World Wide Web. ["Policy on Referencing Data in and Archiving Data for AGU Publications"] This policy mandates that datasets cited in AGU papers must be archived by a recognised data center; it permits the creation of "data papers"; and it establishes AGU's role in maintaining data archives. However, it places no requirement on paper authors to archive their data.

Prior to data archiving, researchers who wanted to evaluate or replicate a paper had to request data and methods information from the author. The scientific community expects authors to share supplemental data, but this process was recognized as wasteful of time and energy and produced mixed results. Information could become lost or corrupted over the years, and in some cases authors simply refused to provide it.

The need for data archiving and due diligence is greatly increased when the research deals with health issues or public policy formation. ["The Case for Due Diligence When Empirical Research is Used in Policy Formation" by Bruce McCullough and Ross McKitrick] ["Data Sharing and Replication", a website by Gary King]

Policies by journals


Nature

"Such material must be hosted on an accredited independent site (URL and accession numbers to be provided by the author), or sent to the Nature journal at submission, either uploaded via the journal's online submission service, or if the files are too large or in an unsuitable format for this purpose, on CD/DVD (five copies). Such material cannot solely be hosted on an author's personal or institutional web site." ["Availability of Data and Materials: The Policy of Nature Magazine"]

"Nature" requires the reviewer to determine whether all of the supplementary data and methods have been archived. The policy advises reviewers to consider several questions, including: "Should the authors be asked to provide supplementary methods or data to accompany the paper online? (Such data might include source code for modelling studies, detailed experimental protocols or mathematical derivations.)" ["Guide to Publication Policies of the Nature Journals", published March 14, 2007]


Science

Database deposition policy – "Science" supports the efforts of databases that aggregate published data for the use of the scientific community. Therefore, before publication, large data sets (including microarray data, protein or DNA sequences, and atomic coordinates or electron microscopy maps for macromolecular structures) must be deposited in an approved database and an accession number provided for inclusion in the published paper. ["General Policies of Science Magazine"]

Materials and methods – "Science" now requests that, in general, authors place the bulk of their description of materials and methods online as supporting material, providing only as much methods description in the print manuscript as is necessary to follow the logic of the text. (Obviously, this restriction will not apply if the paper is fundamentally a study of a new method or technique.) ["Preparing Your Supporting Online Material"]

Problems caused by lack of data archiving

In heart research

Dr. Ram Singh, a cardiologist practicing in India, has published research in many prestigious journals, including "The Lancet" and the "American Journal of Cardiology". In 1992, Singh published research on heart attack victims in "BMJ", the British Medical Association's flagship journal. The study was cited more than 200 times in scientific journals and in recommendations to doctors. His research was questioned in 1994. Dr. Richard Smith, BMJ's editor, wanted to investigate and consulted a statistician named Stephan Evans. Evans said a full review could only be done if he had the raw data. Smith feared that Singh would refuse to provide raw data. However, Smith did ask for raw data on a study submitted by Singh in 1994. Eight months later a box of papers arrived. Evans's statistical analysis found Singh's work to be full of inconsistencies and errors and concluded that it should be retracted. The journal's investigation lasted 12 years before concluding that the research was probably fraudulent. The Alliance for Human Research Protection looked into the matter and recommended that journal editors must "adopt a PUBLICATION REQUIREMENT for all authors submitting clinical trial reports if they want to protect the integrity of both the journals and the scientific literature. Authors should be REQUIRED to submit ALL RAW DATA along with their research report." (emphasis in the original) ["Medical Journal Editor Finds Truth Hard to Track Down", published by the Alliance for Human Research Protection]

Data archives

* National Archive of Computerized Data on Aging
* National Archive of Criminal Justice Data
* National Climatic Data Center
* National Geophysical Data Center
* National Snow and Ice Data Center
* National Oceanographic Data Center
* International Tree-Ring Data Bank
* ESO/ST-ECF Science Archive Facility
* CISL Research Data Archive
* World Data Center


External links

* Statistical checklist required by Nature
* Policies of Proceedings of the National Academy of Sciences (U.S.)
* The US National Committee for CODATA
* The Role of Data and Program Code Archives in the Future of Economic Research
* Data sharing and replication – Gary King website
* The Case for Due Diligence When Empirical Research is Used in Policy Formation by McCullough and McKitrick
* Thoughts on Refereed Journal Publication by Chuck Doswell
* "How to encourage the right behaviour", an opinion piece published March 2002
* NASA Astrophysics Data System

Wikimedia Foundation. 2010.
