Case One: Creating a Public Archive of Sensitive Data

RCR Casebook: Data Acquisition and Management

Frances is a researcher studying the molecular basis of cancer. She plans to sequence the genomes of children with cancer and to make the sequencing data publicly available online. Internet-based DNA sequence databases would allow other scientists to analyze the data and, ideally, reach important findings more quickly, which could lead to the rapid identification of targets for new pediatric cancer treatments.

 “Re-identification occurs when two conditions are satisfied: 1) uniqueness, and 2) linkage. The first condition is satisfied when unique values exist in the shared medical records. The second condition is satisfied when attributes in the medical record can be used to link to identified information external to the medical record.”

- Bradley Malin and Kathleen Benitez (Vanderbilt University)

If a large amount of sequence data is made publicly available, individuals may eventually be re-identified, which could have negative consequences. For example, participants in childhood cancer studies could become known to future employers or insurers on the basis of their genetic information (their history of cancer and their publicly archived data). Further, data obtained for one study might later reveal other information, such as susceptibility to other diseases or previously unknown family relationships. If a subject in a genome-wide sequencing study later released a small set of genetic information to another party for a different purpose, that information might be matched to the more extensive sequence data online, revealing more about the subject to that party than the subject intended. Subjects of such research should be made aware of this risk during the informed consent process.


However, children, unlike adults, cannot legally consent. Publishing their personal DNA sequences would therefore rest on parental permission. Because publication of such data is irreversible, parents would be making a permanent decision on their children's behalf, one the children could not later undo.

Federal regulations generally permit pediatric research that offers no prospect of direct benefit to the child only when the risks are minimal, or, under narrower conditions, no more than a minor increase over minimal risk. Determining that a study involves only minimal risk requires evaluating both the magnitude of possible harms and the probability that they will occur.

How should Frances proceed?

Discussion Questions for the Facilitator

  • Can you describe scenarios in which the data could be re-identified?
  • What are some possible harms of re-identification?
  • What are ways in which publicly available DNA sequencing data are used by others?
  • How might a subject learn that a breach of confidentiality has actually occurred?
  • Do you think that children should have the opportunity to withhold assent to the public use of their genetic information?