Confidential refers to private information that a subject discloses with the expectation that it will not be divulged to others without that subject’s permission. When an investigator promises confidentiality, the subject is asked to supply information that could potentially identify that subject. This information is then linked to the research data collected from the subject, with the understanding that the investigator will not disclose it to anyone other than those with whom the subject has given explicit consent to share it (i.e., the research team).
- Example: Surveys and interviews conducted in-person are considered confidential rather than anonymous, as the investigator can identify the subjects, even if the investigator has collected no other identifying information.
It is important to note the use of the terms divulge and disclose, as they point to an aspect of confidentiality that the investigator must always keep in mind: privacy cannot be guaranteed. While the investigator may promise not to share the subjects’ private information, it may still be discoverable by outside parties. When dealing with confidential information, then, the investigator must ensure the information is collected and stored in such a way as to minimize the risk of discovery by outside parties.
- Example: One-to-one interviews that are conducted in a public place may be overheard by others.
- Example: The investigator stores identifiable data on their computer unencrypted. The computer is left unattended and found by an outside party who traces the data to the subjects.
Confidential data collection involves a higher level of risk or potential for harm to the subjects than does anonymous data collection. It should be noted that there are multiple levels of possible risk in confidential data collection and storage.
- Example: The fewer the number of individuals who have access to the data, the lower the level of risk. Focus groups involve a higher level of risk than do one-to-one interviews, as the subjects must rely on the ability of all other subjects as well as the investigator to maintain the confidentiality of the information shared.
- Example: The more securely the data is stored, the lower the level of risk. Encrypted computers are more secure than locked file cabinets, and encrypted servers are more secure than personal computers.
Protecting Confidentiality
Prior to a subject’s participation in research, they must be told whether their involvement and the data collected will be anonymous or confidential. If the data are to remain confidential, it is also important for the investigator to discuss with the subject during the process of informed consent the level of confidentiality that can be offered and the potential for breach of confidentiality. This should, as well, be noted on the informed consent form.
Note that there are times when breaking confidentiality may be required. Investigators who are mandated reporters, for example, must disclose to subjects that they are legally obligated to report suspected child or elder abuse, as well as situations in which the participant or others are at immediate risk of harm. Possibilities of this type must be noted in the consent form, as applicable to each project.
An important consideration in the use of confidential data is the investigator’s responsibility to keep the data as safe and secure as possible. The investigator can do this in a variety of ways (note the following list is in no way exhaustive):
- Limit access to the data to as few individuals as possible.
- Code the data whenever feasible.
- Store hard copies of the data in locked cabinets in locked rooms.
- Store the data, master code list and informed consent forms in separate locations.
- Transfer (from person to person, place to place) the data (field notes, recorded interviews, informed consent forms) promptly and securely.
- Transcribe recorded data as soon as possible and destroy original recordings.
- Store data on an encrypted computer or server.
- Upload data to an encrypted server promptly (do not wait until all data is collected).
- Delete identifiers (or de-identify) as soon as is feasible.
It is imperative that investigators keep in mind at all times the potential harm (social, legal, economic, physical) to subjects that may result from a breach in confidentiality. Plans for data security must be outlined in the research protocol when discussing the provisions for managing risk, and approved by the IRB.
Coded Data
A common practice for reducing the risk of a breach of confidentiality is for the investigator to code the information and data collected from the subject. When data is coded, a subject’s identifying information is separated from the subject’s research data and replaced with a code. The investigator then keeps a “master list” of the subjects’ names and identifying codes. For security purposes, the master list is kept separately from the subjects’ data.
- Example: Studies that involve interviews often utilize pseudonyms to mask subjects’ identities. In such cases, the investigator assigns each subject a code name, which is used in all interview notes in lieu of the subject’s name. A master list linking the subjects’ names to their pseudonyms is developed to keep track of the data and is secured in a locked file cabinet. The interview notes are secured separately on an encrypted server.
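The coding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed procedure: the function name `code_subjects`, the field names `name` and `email`, the `S-` code prefix and the sample records are all hypothetical. The sketch covers only the separation of identifiers from research data; storing the two outputs in separate, secured locations remains the investigator's responsibility.

```python
import secrets


def code_subjects(records, id_fields=("name", "email")):
    """Split records into a master list and coded research data.

    Hypothetical helper: `id_fields` names the identifying fields that
    must be kept apart from the research data.
    """
    master_list = {}  # code -> identifying information (store separately)
    coded_data = []   # research data carrying only the code

    for record in records:
        # Random codes avoid leaking enrollment order or any personal detail.
        code = "S-" + secrets.token_hex(4)
        while code in master_list:  # regenerate on the unlikely collision
            code = "S-" + secrets.token_hex(4)

        master_list[code] = {f: record[f] for f in id_fields if f in record}
        research = {k: v for k, v in record.items() if k not in id_fields}
        coded_data.append({"code": code, **research})

    return master_list, coded_data


# Made-up sample records, for illustration only.
records = [
    {"name": "Alice Example", "email": "alice@example.org", "response": "agree"},
    {"name": "Bob Example", "email": "bob@example.org", "response": "disagree"},
]
master_list, coded_data = code_subjects(records)
```

After this step, `coded_data` contains only codes and responses, while `master_list` is the sole link back to the subjects. Destroying the master list (and any remaining identifiers) is what would later turn coded data into de-identified data.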
De-identification
Another common practice, and one that often leads to confusion between anonymous and confidential data, is for the investigator (or the individual/organization from which the data originates) to de-identify the data collected from the subject. De-identification is the process by which all links between the subjects’ personally identifying information and their research data are severed and the investigator has no code by which to re-identify them. Data that will be de-identified is considered confidential data, because the researcher is able to identify the subjects prior to de-identification.
- Example: Often when an investigator conducts secondary data analysis, all identifying information has been scrubbed from the data prior to the investigator receiving it. (Note that in this case, the individual/organization from which the data originated may retain the identifying information with the data.)
- Example: An investigator removes all identifying information from their research data and maintains no code with which to re-identify the data.
It is important to remember that the more indirect identifiers investigators collect, the higher the risk of re-identification. In addition, the investigator must keep in mind that other information about the subjects, unconnected to the data collected for the specific research in question, may be available, and that cumulatively this information may be used to re-identify the subjects.
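One rough way to see how indirect identifiers add up is to count how many records share each combination of them. This idea is closely related to what is sometimes called k-anonymity. The sketch below is an assumption-laden illustration, not an approved risk assessment: the function names, the field names `age_range` and `department`, and the sample rows are all hypothetical.

```python
from collections import Counter


def smallest_group_size(rows, indirect_identifiers):
    """Size of the smallest group of rows sharing the same combination
    of indirect identifiers (0 if there are no rows)."""
    counts = Counter(
        tuple(row[field] for field in indirect_identifiers) for row in rows
    )
    return min(counts.values(), default=0)


def unique_combinations(rows, indirect_identifiers):
    """Combinations that occur exactly once, i.e. that single out one subject."""
    counts = Counter(
        tuple(row[field] for field in indirect_identifiers) for row in rows
    )
    return [combo for combo, n in counts.items() if n == 1]


# Made-up de-identified rows, for illustration only.
rows = [
    {"age_range": "30-39", "department": "Biology"},
    {"age_range": "30-39", "department": "History"},
    {"age_range": "40-49", "department": "History"},
    {"age_range": "40-49", "department": "History"},
]

# With one indirect identifier, every subject shares a group with another.
print(smallest_group_size(rows, ["age_range"]))                # 2
# Adding a second identifier singles out two of the four subjects.
print(smallest_group_size(rows, ["age_range", "department"]))  # 1
print(unique_combinations(rows, ["age_range", "department"]))
```

In this toy data set, age range alone never isolates a subject, but age range combined with department does. A check of this kind looks only at the research data itself. It cannot account for outside information about the subjects, so a large smallest group does not by itself rule out re-identification.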
We acknowledge the Brandeis University IRB as the source for some of the information and examples provided above.